From what I understand, every stage in a pipelined CPU takes 1 cycle. But instructions have to be fetched from memory, and a memory access can take around 150 cycles. The CPU fetches most instructions from the L1 cache, but I've read that even an L1 access takes around 4 cycles. By that logic, a new instruction could only start every 4th cycle, which obviously can't be right.
This diagram shows that a new instruction starts every cycle.
So, how does the CPU fetch a new instruction every cycle, if it takes more than one cycle to even fetch an instruction?
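To make the mismatch concrete, here is a rough Python sketch of the two pictures I have in mind. It is only an illustration of my reasoning, not a model of any real CPU: the function names are made up, and `FETCH_LATENCY = 4` is just the approximate L1 figure quoted above.

```python
FETCH_LATENCY = 4  # assumed L1 hit latency in cycles (the "around 4" figure above)


def start_cycles_my_logic(n, latency=FETCH_LATENCY):
    """Start cycle of each of n instructions if every fetch must
    finish before the next fetch can begin (my assumption)."""
    return [i * latency for i in range(n)]


def start_cycles_diagram(n):
    """Start cycle of each of n instructions as drawn in the
    diagram: one new instruction per cycle."""
    return list(range(n))


print(start_cycles_my_logic(5))  # [0, 4, 8, 12, 16]
print(start_cycles_diagram(5))   # [0, 1, 2, 3, 4]
```

The first list is what I would expect from a 4-cycle fetch, and the second is what the diagram shows, so I am clearly missing something about how the fetch stage works.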