The rumor mill is churning, and of course the first guesses are already out there [Ace's Hardware]:
Prescott: let us talk performance (INTEL)
By Johan
Tuesday, January 6, 2004 4:37 PM EST
Our previous newspost about Prescott's power consumption generated some brilliant and enlightening posts. The Ace's crew would like to thank you all for raising the quality of our messageboard to an excellent level again.
Now that the Prescott launch is getting closer and we have a pretty good idea what the architectural improvements are, we can make some decent predictions. To refresh your memory let me reiterate a post that I published in September 2003. Prescott is basically a Northwood Pentium 4 with the following improvements:
Bigger D-L1 cache (16 KB instead of 8 KB) and L2 cache (1 MB instead of 512 KB). No comment necessary.
4x Improved Clock Distribution (compared to Northwood) for better Frequency Scaling
Automated design of the functional blocks for better clockscaling
Improved imul latency: Northwood/Willamette do their integer multiplications on the FPU, and the big latency is due to routing the data between the integer and FP datapaths. Prescott has a dedicated integer multiplier. (Thanks go to Heikki Kultala.)
Prescott New Instructions (SSE-3), which will not improve performance at the launch (needs optimized software).
Additional WC (write-combining) buffers. Instead of sending small pieces of data to the AGP videocard one at a time, these pieces are collected in buffers and sent through in one big burst. This preserves FSB bandwidth, because the bus is used more efficiently (one big burst carries less overhead than many small transfers).
Improved Pre-Fetcher Branch Predictor. I did not get much info on this, but it seems that the buffers have been made bigger so the branch predictor will cope better with more than one thread.
Improved Hyperthreading: two new instructions, MONITOR and MWAIT, which will only improve performance on recompiled software.
To this list we can add two more items, one a fact and the other a rumor:
longer pipeline (fact)
Higher L2-cache latency? (rumor)
Now if you just look at the first group you would assume Prescott is a super Pentium 4: better clock scalability and more IPC. However, SSE-3 software will not exist at the launch of Prescott, and the additional WC buffers are not going to make much difference; that small difference shrinks further because the 800 MHz FSB already provides a very decent amount of bandwidth. The improved imul latency could help, but the real bottleneck of integer code lies in branches. Not even a 6.8 GHz ALU is going to help there. The impact of the 16 KB L1 should not be overestimated: 16 KB might have seemed like a lot back in the days of the 486DX4, but the critical loops of today's software require much more.
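The write-combining argument comes down to per-transfer overhead, and a rough sketch makes that concrete. The function name `bus_efficiency`, the overhead figure and the transfer sizes below are all hypothetical values chosen for illustration; they are not measured FSB or AGP numbers, and expressing overhead as "bytes" is a simplification.

```python
def bus_efficiency(payload_bytes, overhead_bytes):
    """Fraction of bus traffic that is useful payload for one transfer.

    Overhead is modelled as a fixed cost per transfer, expressed in
    byte-equivalents. This is a simplification, not a bus specification.
    """
    return payload_bytes / (payload_bytes + overhead_bytes)


# Hypothetical fixed cost per transfer (assumption, not a real FSB figure).
OVERHEAD = 16

# Many small writes, each paying the overhead separately.
small = bus_efficiency(8, OVERHEAD)

# The same data combined in a buffer and sent as one 64-byte burst.
burst = bus_efficiency(64, OVERHEAD)

print(f"small writes: {small:.0%} useful, combined burst: {burst:.0%} useful")
```

Under these assumed numbers the burst uses the bus far more efficiently, but the gain only matters when the bus is actually the bottleneck, which is the point made above about the 800 MHz FSB.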
So we are left with an L1 cache and a fast L2 cache that are each twice as big, and a slightly improved branch predictor whose effect is probably negated entirely by the higher branch misprediction penalty (a consequence of the longer pipeline). This means that some software will not run faster, clock for clock, on the Prescott than on the Northwood P4. So apart from SSE-3 optimized software and software that benefits from hyperthreading, a 3.4 GHz Prescott will, IMHO, perform like a 3.4 GHz Northwood.
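The trade-off between a better branch predictor and a costlier misprediction can be put into a back-of-the-envelope model. This is only a sketch: the branch fraction, the misprediction rates and the penalties of 20 and 31 cycles are assumed placeholder values standing in for a shorter and a longer pipeline, and the function names are made up for this example.

```python
def branch_stall_cpi(branch_fraction, mispredict_rate, penalty_cycles):
    """Average cycles lost per instruction to branch mispredictions."""
    return branch_fraction * mispredict_rate * penalty_cycles


def break_even_rate(old_rate, old_penalty, new_penalty):
    """Misprediction rate the new core needs just to match the old one."""
    return old_rate * old_penalty / new_penalty


# All values below are illustrative assumptions, not measurements.
BRANCH_FRACTION = 0.20  # assumed share of instructions that are branches

shorter_pipe = branch_stall_cpi(BRANCH_FRACTION, mispredict_rate=0.050,
                                penalty_cycles=20)
longer_pipe = branch_stall_cpi(BRANCH_FRACTION, mispredict_rate=0.040,
                               penalty_cycles=31)
needed = break_even_rate(0.050, old_penalty=20, new_penalty=31)

print(f"stall cycles per instruction: {shorter_pipe:.3f} vs {longer_pipe:.3f}")
print(f"misprediction rate needed to break even: {needed:.2%}")
```

With these assumed inputs, a predictor that mispredicts a fifth less often still loses more cycles per instruction once the penalty grows, which is why a slightly improved predictor may not be enough to offset a longer pipeline.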
Basically, I expect that most games will run on it like on a 3.4 GHz Northwood. In fact, many games are already using the CPU more and more for AI (Battlefield 1942 uses up to 25% of the CPU's clock cycles). Software where Intel is already doing well, such as Lightwave, Cinema 4D and 3DSMax, will show the Prescott being faster clock for clock than Northwood. With two or more threads, the extra L2-cache space will be put to good use. Intel's main objective with Prescott is getting higher clockspeeds out of Netburst without lowering the IPC.
More data in February...