Short background: MMIO regions are typically mapped as uncacheable / device memory, so the CPU must not treat device registers like normal cacheable DRAM. I'm asking about the microarchitectural routing and buffering behavior when a load/store to an MMIO address is executed.
When a load/store to MMIO is executed, which of the following best describes the usual routing?

1. **Same path:** the request follows the same core → uncore / interconnect path as ordinary memory requests (same ports/queues) but simply bypasses cache tag lookup and cache-fill logic.
2. **Separate path:** it is routed via a separate logical/physical path (sideband, APB, or a dedicated port) with different buffers and routing rules.
3. **Hybrid:** control-plane MMIO goes via a sideband, while bulk/device data goes via the main fabric or DMA.
Also, in the gem5 models I've seen, MMIO accesses are implemented by letting the request traverse the cache hierarchy but marking it non-cacheable so it is never stored. Is that an accurate microarchitectural model of real CPUs, or primarily an implementation/simulation convenience? If it is accurate, what hardware behavior (buffers, write-combining, uncore forwarding) is it modeling?
I've looked through Intel CPU datasheets and the SDM, Arm architecture documentation, and other public references, but couldn't find a clear statement about whether MMIO requests use the same path as normal memory requests (with cache bypass) or a separate port/bus. Maybe I missed it, so I'd appreciate any concrete references if they exist.