Very similar to programmed I/O, but interrupt driven.
CPU does its normal tasks and only focuses on the I/O module when it receives an interrupt.
Until then, the interface (not the CPU) monitors the device.
On an interrupt, the CPU executes the ISR (interrupt service routine) and then resumes its tasks.
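A minimal sketch of the idea, assuming a toy device that raises an interrupt after a fixed number of time steps. The names Device and run_cpu are illustrative only, and the simulation loop stands in for the interrupt hardware; it is not how a real CPU checks for interrupts.

```python
class Device:
    """Hypothetical device that becomes ready after a fixed number of ticks."""

    def __init__(self, ready_after, data):
        self.ticks_left = ready_after
        self.data = data

    def tick(self):
        """Advance one time step; return True when an interrupt is raised."""
        self.ticks_left -= 1
        return self.ticks_left == 0


def run_cpu(device, total_ticks):
    """Do normal work and service the device only when it interrupts."""
    work_done = 0
    received = []
    for _ in range(total_ticks):
        if device.tick():
            # Interrupt: the "ISR" collects the data, then normal work resumes.
            received.append(device.data)
        else:
            work_done += 1  # normal task; the CPU never polls device status
    return work_done, received


work, data = run_cpu(Device(ready_after=3, data=0x41), total_ticks=10)
print(work, data)  # 9 [65]
```

Nine of the ten time steps go to normal work; only the step on which the interrupt arrives is spent on the device.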
Direct Memory Access
In this method, the DMA controller (DMAC) takes complete control of the buses and writes directly to memory, bypassing the CPU.
The DMAC sends a BR (bus request) to the CPU, asking it to give up control of the bus.
The CPU responds with a BG (bus grant) after putting the address, data and read-write lines into high impedance.
After the transfer, the DMAC deasserts BR.
CPU resumes normal operation.
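The BR/BG handshake above can be sketched as a small state model. This is only an illustration: Bus, dma_handshake and the list used as memory are assumed names, and the tri-stating of the CPU lines is reduced to a single flag.

```python
class Bus:
    """Toy shared bus: exactly one master drives it at a time."""

    def __init__(self):
        self.br = False  # bus request, driven by the DMAC
        self.bg = False  # bus grant, driven by the CPU

    @property
    def master(self):
        # The DMAC owns the bus only while the CPU holds BG high.
        return "DMAC" if self.bg else "CPU"


def dma_handshake(bus, memory, start, words):
    """Follow the BR/BG sequence and write `words` into `memory` at `start`."""
    log = []
    bus.br = True                      # DMAC asserts BR
    log.append(("BR", bus.master))     # CPU is still the bus master here
    bus.bg = True                      # CPU floats its lines and asserts BG
    log.append(("BG", bus.master))
    for offset, word in enumerate(words):
        memory[start + offset] = word  # DMAC writes directly to memory
    bus.br = False                     # DMAC deasserts BR after the transfer
    bus.bg = False                     # CPU drops BG and resumes
    log.append(("done", bus.master))
    return log


memory = [0] * 8
log = dma_handshake(Bus(), memory, start=2, words=[7, 8, 9])
print(log)
print(memory)
```

The log shows the bus master at each step: the CPU until BG is asserted, the DMAC during the transfer, and the CPU again afterwards.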
Different ways of DMA transfer
Burst transfer: A sequence of words is transferred in a continuous burst.
Cycle stealing: Only one word is transferred at a time, after which bus control is returned to the CPU.
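To make the difference concrete, here is a rough sketch that prints who owns the bus on each cycle in the two modes. The function names and the cpu_cycles_between parameter are assumptions for illustration; real controllers interleave at the granularity the hardware defines.

```python
def burst(words):
    """DMAC holds the bus for the whole block."""
    return ["DMA"] * words


def cycle_stealing(words, cpu_cycles_between=1):
    """DMAC moves one word, then hands the bus back to the CPU."""
    timeline = []
    for i in range(words):
        timeline.append("DMA")
        if i < words - 1:
            # The CPU gets the bus back between consecutive DMA words.
            timeline.extend(["CPU"] * cpu_cycles_between)
    return timeline


print(burst(3))           # ['DMA', 'DMA', 'DMA']
print(cycle_stealing(3))  # ['DMA', 'CPU', 'DMA', 'CPU', 'DMA']
```

Both modes move the same number of words. Burst finishes sooner but locks the CPU out of the bus for the whole block, while cycle stealing spreads the transfer out so the CPU is only briefly delayed each time.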
Configurations
Single bus with detached DMA: Each transfer suspends the CPU twice - I/O to DMA, DMA to memory.
Single bus with integrated DMA: Each transfer suspends the CPU once - DMA to memory. The I/O module has the DMA logic integrated, so unlike the previous configuration it does not transfer data to the DMA over the system bus.
Separate I/O bus: The CPU is suspended only once. There is a separate bus for I/O, which interacts with memory via the DMA and then the data bus.
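As a rough arithmetic check on the suspension counts above, the sketch below multiplies the per-word count for each configuration by a block size. The dictionary keys and the 512-word block are made-up values for illustration.

```python
# System-bus uses (CPU suspensions) per transferred word, as listed above.
SUSPENSIONS_PER_WORD = {
    "single_bus_detached": 2,    # I/O to DMA, then DMA to memory
    "single_bus_integrated": 1,  # DMA to memory only
    "separate_io_bus": 1,        # DMA to memory only; I/O side uses its own bus
}


def cpu_suspensions(config, words):
    """Total number of times the CPU is suspended while moving `words` words."""
    return SUSPENSIONS_PER_WORD[config] * words


for name in SUSPENSIONS_PER_WORD:
    print(name, cpu_suspensions(name, 512))
```

For the same block, the detached configuration costs the CPU twice as many suspensions as the other two.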
Working
The DMA controller has three registers that store the address, the number of words to be transferred, and the mode of transfer (the three blue registers). BG = 0: the CPU can write to or read from the DMA registers. BG = 1: the DMA controller accesses memory directly, for reading or writing. The RD/WR lines are bidirectional, and their direction depends on BG.
The CPU sends the following to the DMA:
Starting address of the memory block (that is to be read or written)
Word count
Control - read/write
A control signal to start the transfer
Peripheral makes a DMA request.
DMAC makes a bus request.
CPU responds with a bus grant.
DMAC places the address on the address bus and activates read or write.
DMAC sends an acknowledgement to the peripheral.
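The sequence above can be sketched as a small model of the controller. The class and method names are hypothetical. The sketch also assumes cycle-stealing behaviour and that the address register is incremented and the word count decremented after every word, which is the usual arrangement but is not stated in the steps above.

```python
class DMAController:
    """Toy DMAC with address, word count and control registers."""

    def __init__(self):
        self.address = 0     # address register
        self.word_count = 0  # word count register
        self.control = None  # control register: "read" or "write" (memory side)
        self.started = False
        self.bg = False      # bus grant from the CPU

    def program(self, address, word_count, control):
        """CPU initializes the registers; only possible while BG = 0."""
        if self.bg:
            raise RuntimeError("CPU cannot access DMA registers while BG = 1")
        self.address = address
        self.word_count = word_count
        self.control = control
        self.started = True  # the start signal from the CPU

    def dma_request(self, memory, word=None):
        """Peripheral requests one word; returns the word read from memory, if any."""
        if not self.started or self.word_count == 0:
            raise RuntimeError("DMA not programmed or block finished")
        self.bg = True                     # BR sent, CPU answers with BG
        result = None
        if self.control == "write":
            memory[self.address] = word    # peripheral to memory
        else:
            result = memory[self.address]  # memory to peripheral
        self.address += 1                  # assumed: next word, next address
        self.word_count -= 1               # assumed: count down to zero
        self.bg = False                    # bus released back to the CPU
        return result                      # returning stands in for DMA acknowledge


memory = [0] * 4
dmac = DMAController()
dmac.program(address=1, word_count=2, control="write")
dmac.dma_request(memory, 0xAA)
dmac.dma_request(memory, 0xBB)
print(memory, dmac.address, dmac.word_count)  # [0, 170, 187, 0] 3 0
```

Once the word count reaches zero, further requests are rejected until the CPU programs the controller again.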
Direct Cache Access
DCA is DMA that loads data directly into the cache instead of main memory.
This eliminates the extra step, needed with plain DMA, of the CPU fetching the data from main memory into the cache.
Based on hints from the TLP (the hints contain information about which blocks might be needed in the cache), the CPU prefetches those blocks into the cache for faster access.
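A toy comparison of the two delivery paths, assuming a cache and memory modelled as plain dictionaries. All function names are illustrative, and the model ignores coherence details, cache capacity and the hint mechanism itself; it only shows where the extra memory fetch disappears.

```python
def cpu_read(block, cache, memory):
    """Return (value, fetched_from_memory) for one CPU read of `block`."""
    if block in cache:
        return cache[block], False
    cache[block] = memory[block]  # miss: extra fetch from main memory
    return cache[block], True


def dma_deliver(block, value, cache, memory):
    """Plain DMA: the data lands in main memory only."""
    memory[block] = value
    cache.pop(block, None)        # drop any stale cached copy


def dca_deliver(block, value, cache):
    """DCA (simplified): the data lands directly in the cache."""
    cache[block] = value


cache, memory = {}, {}
dma_deliver(0, "pkt", cache, memory)
dma_result = cpu_read(0, cache, memory)

cache, memory = {}, {}
dca_deliver(0, "pkt", cache)
dca_result = cpu_read(0, cache, memory)

print(dma_result)  # ('pkt', True)
print(dca_result)  # ('pkt', False)
```

With plain DMA the first CPU read misses and goes to main memory; with DCA the same read hits in the cache.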