In the PMMX, the penalty for misprediction of a conditional jump is 4 clocks in the U-pipe,
and 5 clocks if it is executed in the V-pipe. For all other control transfer instructions it is 4
clocks.
In the PPro, P2 and P3, the misprediction penalty is higher due to the long pipeline. A
misprediction usually costs between 10 and 20 clock cycles.
Branch mispredictions are much more expensive on the P4 and P4E than on previous
generations of microprocessors. The time it takes to recover from a misprediction is rarely
less than 24 clock cycles, and typically around 45 µops. Apparently, the microprocessor
cannot cancel a bogus µop before it has reached the retirement stage. This means that if
you have a lot of µops with long latency or poor throughput, then the penalty for a
misprediction may be as high as 100 clock cycles or more.
The misprediction penalty is approximately 13 clock cycles in the PM and 15 clock cycles in
the Core2.
Nehalem: The misprediction penalty is longer than on Core2 due to a longer pipeline. The measured
misprediction penalty is at least 17 clock cycles.
Sandy Bridge: The misprediction penalty is often shorter than on the Nehalem thanks to the µop cache
(see page 94 below). The misprediction penalty was measured at 15 clock cycles or more
for branches inside the µop cache and slightly more for branches in the level-1 code cache.
Atom: The penalty for mispredicting a branch is up to 13 clock cycles.
VIA Nano: The misprediction penalty is typically 16 clock cycles, max. 20.
K10: AMD manuals say that the branch misprediction penalty is 10 clock cycles if the code
segment base is zero and 12 clocks if the code segment base is nonzero. In my
measurements, I have found a minimum branch misprediction penalty of 12 and 13 clock
cycles, respectively.
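
To put these figures in perspective, the total cost of mispredictions in a piece of code can be roughly estimated as the number of mispredicted branches multiplied by the penalty. The following C code is only a minimal sketch of that arithmetic. The constant and function names are hypothetical and chosen for illustration. Where the text above gives a range or a minimum, the constant uses the lower bound, so the result is an optimistic estimate rather than a measurement.

```c
#include <assert.h>
#include <stdint.h>

/* Approximate misprediction penalties in clock cycles, taken from the
   figures quoted in the text. Lower bounds are used where the text gives
   a range or a minimum. These names are illustrative only. */
enum {
    PENALTY_PMMX_U  = 4,   /* PMMX, conditional jump in the U-pipe */
    PENALTY_PM      = 13,  /* PM, approximate */
    PENALTY_CORE2   = 15,  /* Core2, approximate */
    PENALTY_NEHALEM = 17,  /* Nehalem, measured minimum */
    PENALTY_P4_MIN  = 24   /* P4/P4E, "rarely less than" */
};

/* Hypothetical helper: number of mispredicted branches, given the total
   number of executed branches and a misprediction rate in whole percent. */
uint64_t mispredicted_branches(uint64_t branches, unsigned miss_percent)
{
    assert(miss_percent <= 100);
    return branches * miss_percent / 100;
}

/* Hypothetical helper: estimated clock cycles lost to mispredictions. */
uint64_t mispredict_cost(uint64_t mispredicted, unsigned penalty_cycles)
{
    return mispredicted * (uint64_t)penalty_cycles;
}
```

For example, under these assumptions a loop that executes one million branches with a 5% misprediction rate loses about 750 000 clock cycles on the Core2 and at least 1 200 000 on the P4. The real cost depends on what else is in the pipeline, especially on the P4, where the penalty may be much higher than the minimum.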