Short background: MMIO regions are typically mapped as uncacheable / device memory, so the CPU must not treat device registers like normal cacheable DRAM. I'm asking about the microarchitectural routing and buffering behavior when a load/store to an MMIO address is executed.
When a load/store to MMIO is executed, which of the following best describes the usual routing?

1. **Same path:** the request follows the same core → uncore / interconnect path as ordinary memory requests (same ports/queues) but simply bypasses cache tag lookup and cache-fill logic.
2. **Separate path:** it is routed via a separate logical/physical path (sideband, APB, or a dedicated port) with different buffers and routing rules.
3. **Hybrid:** control-plane MMIO goes via a sideband, while bulk/device data goes via the main fabric or DMA.
Also, in the gem5 models I've seen, MMIO accesses are implemented by letting the request traverse the cache hierarchy but marking it non-cacheable so it is never stored. Is that an accurate microarchitectural model of real CPUs, or primarily an implementation/simulation convenience? If it is accurate, what hardware behavior (buffers, write-combining, uncore forwarding) is it modeling?
I've looked through Intel CPU datasheets and the SDM, Arm architecture documentation, and other public references, but couldn't find a clear statement about whether MMIO requests use the same path as normal memory requests (with cache bypass) or a separate port/bus. Maybe I missed it, so I'd appreciate any concrete references if they exist.