Ever since I first heard it mentioned, I have been a little bit (okay maybe more than a little) fascinated by the DX/50, which has led to me reading nearly every article and piece of information on the subject that is web-accessible. If you want to know the story behind this CPU and how it ruled the roost, before fading into insignificance (this actually didn't happen until the DX4 came out, 3 years after its introduction, amazingly), head over to my (almost) definitive article on it.
There is one question I've always wanted to know the answer to, which is something that people talk about a lot: is the DX2/66 actually faster than the DX/50? There are a huge number of people saying that the DX/50 just isn't worth bothering with because of 'stability' and 'overheating' issues. While there is some truth in this (Intel themselves called the DX/50's design "problematic") it's not the full story. I've been running my system for over a year now in various forms and it's never overheated once, managing with just a heatsink. I've also had no problems at all running VLB cards at the full 50MHz, so I think you need to start out with a really good motherboard, chipset, cache and RAM to have any chance of success. Ultimately this was a business CPU, so I think it's important that the rest of the system is server-grade. I don't think a lot of people do this.
The short answer is: yes, the DX2/66 is quicker in cases where you would notice (e.g. the Doom frame rate, Windows performance). The long answer is a lot more complicated because it includes "why?", and that's what we're going to be looking at.
Frequency = Speed?
When the 486 was introduced, it initially followed the trend established by Intel's previous CPUs of upping processing power through upping the frequency (or 'speed' as people like to call it) of the CPU. The RAM, external (aka level 2) cache and CPU were all connected via a system bus that ran at the same speed. In addition to this, there were some established peripheral buses - ISA and EISA running at 8.33MHz, and MCA running at 10MHz - and they continued to run at this frequency regardless of the CPU's clock. Obviously in the case of the 486, which was introduced at 20MHz, this creates a significant gap, so the cards on those buses - such as hard drive controllers and graphics cards - were limited, no matter how fast the CPU was. There is also a performance gap between the CPU and the RAM, however: dynamic random-access memory, which is what most computers from the era used, tended to have an 80- or 70-nanosecond access time, which equates to about 12.5MHz or 14MHz respectively. 'Wait states' are used, so that the CPU pauses for a period of time until data from the RAM is ready to be processed. The speed of the more expensive external cache should match the CPU frequency, which means 20ns at 50MHz (higher performance systems were usually equipped with 60ns RAM and 15ns cache). As the gap between CPU and bus increased, it became increasingly expensive and difficult to design systems.
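The nanosecond-to-MHz figures above are just the reciprocal of the access time. Here is a minimal sketch of that conversion; the function name is my own, and it treats access time as if it were the full cycle time, which is a simplification.

```python
def ns_to_mhz(access_time_ns: float) -> float:
    """Convert an access time in nanoseconds to an equivalent frequency in MHz."""
    # A period of 1ns corresponds to 1000MHz, so f = 1000 / t.
    # Assumption: access time is treated as the whole cycle time.
    return 1000.0 / access_time_ns


# The RAM and cache speeds mentioned in the text.
for ns in (80, 70, 60, 20, 15):
    print(f"{ns}ns is roughly {ns_to_mhz(ns):.1f}MHz")
```

Read the other way around, 20ns cache is the slowest part that can keep pace with a 50MHz bus without wait states, at least on paper.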
Double the Fun
For this reason, Intel's next step was a change in strategy: they introduced clock-doubling. 50MHz was causing problems (explained in detail in this article), so this new technique was a way of increasing the processing power without increasing the bus frequency. Clock-doubled CPUs weren't just sold in systems straight from manufacturers like their predecessors, though - Intel started selling retail models, using the name 'Overdrive'. These were expensive, drop-in replacements for an existing, slower 486 that ran at the same bus speed but processed instructions more quickly. The DX2/50 was released first, then the DX2/66. This was the first CPU to exceed the frequency of the DX/50, though it ran a more conservative 33MHz bus. What we hear from people is that the increase in CPU frequency compensates sufficiently for the drop in bus speed so that the DX2/66 turns out to be quicker overall than the DX/50 and is less of a pain to build a system around. Either way, when you have multiple frequencies in a system, the relationship between frequency and speed is no longer linear, so that terminology shouldn't be used really. What is interesting is that if you calculate the average of 33MHz and 66MHz, you get 49.5MHz.
Catch the Bus
Things get even more interesting when you throw a local bus in there. Although the EISA bus provided the 32-bit bandwidth of 386 and 486 CPUs (and early Pentiums), it still couldn't handle the speed. Proprietary local buses emerged during 1992 and allowed certain peripherals (usually graphics cards) to run at the same frequency as the memory bus. During 1993, the VESA Local Bus was introduced and was adopted industry-wide. It was now possible to run controller and graphics cards at 50MHz, so did this give the DX/50 an edge over the DX2/66? You will hear people saying that it didn't in real terms. 50MHz is actually out of spec for VLB - it maxed out at 40MHz according to the specification - but motherboards existed that possessed VLB and a 50MHz bus, and cards were made that were 50MHz friendly if you can find them. As a result, people don't often include detailed I/O or bus performance measurements in benchmarks, instead just running the usual thing like Superscape, Speedsys and Doom, for example. I've pulled together a stupid number of benchmarks, including Windows, in order to leave no stone unturned and find out every angle of the competition between these two CPUs. Let's get to it!
Like any good scientific experiment, we should have some predictions. It's relatively easy to work out what should happen in theory, but that doesn't mean it will happen in practice. Although the actual measurements taken from each test will be reported, they will also be expressed as a percentage change. This should make it easier to compare the significance of each test.
Maths is an area I'm not great at, so I'm going to try and keep it simple. The data I'll be reporting on is the relative speed between the DX and the DX2. For example, where CPU tests are concerned, how much faster is 66MHz compared to 50MHz? In absolute terms it's 16MHz, but to find out what that means relatively, we divide 66 by 50 and multiply by 100 to get 132%. Thus:
66MHz is 132% of 50MHz (32% higher).
50MHz is 76% of 66MHz (24% lower).
50MHz is 152% of 33MHz (52% higher).
33MHz is 66% of 50MHz (34% lower).
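For anyone who wants to check the sums, here is a small sketch of the divide-and-multiply-by-100 method described above. The function name is made up for illustration, and it uses the nominal clock figures (66 and 33) rather than the true 66.67MHz and 33.33MHz.

```python
def relative(a_mhz: float, b_mhz: float) -> tuple[int, int]:
    """Return (a as a percentage of b, percentage change going from b to a)."""
    pct_of = a_mhz / b_mhz * 100
    # Both values are rounded to whole numbers, as in the text.
    return round(pct_of), round(pct_of - 100)


for a, b in ((66, 50), (50, 66), (50, 33), (33, 50)):
    pct_of, change = relative(a, b)
    print(f"{a}MHz is {pct_of}% of {b}MHz ({change:+d}%)")
```

Note that "32% higher" and "24% lower" describe the same pair of clocks: the percentage depends on which value you treat as the baseline. With the true 66.67MHz clock, 50MHz works out at exactly 75%.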
- For pure CPU tests, we might expect measurements to be proportional to the internal CPU clock i.e. the DX2/66 should score 32% higher than the DX/50. If it doesn't, then we know there's another factor at play and we can look at what this might be.
- Similarly, for RAM-based tests, we might expect the DX/50 to be 52% faster than the 33MHz bus of the DX2/66, but I doubt it will be that simple.
- Where raw graphics performance is measured, again this should theoretically be proportional to the bus speed i.e. a 50MHz bus should outperform a 33MHz bus by 52%. This may not translate into programs like games, however, where a combination of factors are involved.
- Hard drive performance is a lot harder to call, especially considering the variety of configurations I will be using. The VLB controller performance could be affected by bus speed in a similar way to graphics, though the maximum performance of the IDE hard drive itself may become a limiting factor. Predicting the performance of an EISA controller is a lot more complex. Part of the reason for this is that EISA's frequency should be the same as the ISA bus (8.33MHz), but I currently have no way of measuring this. If the motherboard is using a divider, this could vary slightly. EISA is also 32-bit, however, resulting in a theoretical bandwidth of 33MB/s (though this is apparently closer to 20MB/s in reality) and I have no idea what effect a caching controller will have on performance. In theory the raw processing power of the DX2 should win out if the bus frequency remains constant.
- Performance under a GUI is highly dependent on drivers and overall system performance, so I would expect the DX2 to be faster in a configuration that is otherwise identical, particularly because its L1 cache will be faster.
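The 33MB/s figure quoted for EISA comes from multiplying the bus clock by the bus width. As a rough sketch, assuming one full-width transfer on every clock (which real buses rarely manage, hence the lower real-world figure), the same sum can be applied to a 32-bit local bus at 33MHz and 50MHz. The function name is hypothetical.

```python
def peak_bandwidth_mb_s(clock_mhz: float, width_bits: int) -> float:
    """Theoretical peak in MB/s, assuming one full-width transfer per clock."""
    # MHz * bytes per transfer = millions of bytes per second.
    return clock_mhz * width_bits / 8


print(f"EISA, 8.33MHz x 32-bit: {peak_bandwidth_mb_s(8.33, 32):.1f}MB/s")
print(f"Local bus, 33MHz x 32-bit: {peak_bandwidth_mb_s(33, 32):.1f}MB/s")
print(f"Local bus, 50MHz x 32-bit: {peak_bandwidth_mb_s(50, 32):.1f}MB/s")
```

These are upper bounds only: wait states, bus arbitration and the speed of the device at the other end all eat into them, which is why the predictions above are hedged.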
Overall, I'm expecting the DX2 to edge it, though not by much. I don't think it will stomp all over the DX like I have seen people say. It's fair to say that, at the moment, I'm a bit defensive of the DX/50 as I have formed something of a fondness for it over the last few years. This may change, but at least that will be based on actual evidence rather than a mere perception.
I'm using the system from my 486 web server to test the CPUs. There is a video about it here, but these are the specs anyway:
Motherboard: TMC PET48PN rev 1.0, EISA and VLB, OPTi 82C682 chipset (rated for 50MHz).
BIOS: AMI 07/07/91
CPU: see below.
RAM: 8MB, 70ns, 72-pin SIMMs (with parity).
Cache: 512KB, 20ns SRAM, 15ns tag chips.*
Graphics Card: 2theMax Tseng ET4000 W32i, 2MB 45ns DRAM
- DPT SmartRAID III EISA SCSI-2 controller (PM2122 with CM4000 cache module and DM4000 RAID module) with 16MB, 60ns SIMM RAM (with parity).
- QDI QD6580 V3.0 VLB IDE controller.
- Promise DC-4030VL rev.2 VLB controller with 16MB cache SIMMs.
- Tekram DC600T ISA controller with 16MB cache SIMMs.
- Seagate ST3290A (IDE): 260MB, 64K cache.
- 2x Seagate ST32171N (SCSI): 2.16GB, 512K cache, 7200rpm.
*A brief note on the L2 cache setup. Ideally I would have chips with matching speeds, but it has proven impossible to achieve this. The tag chips came with the board and are apparently impossible to acquire, so I can't get 20ns versions of them. The cache chips are currently only available in one place in the world and they don't have 15ns versions of them, so I have to use what is available. This hasn't been a problem in all but one configuration.
Rather than just have a straight head-to-head between two CPUs, I'm going to use a selection. The main reason for this is to spot whether there are any anomalous results. Seeing consistency between CPUs of a similar type at the same clock speed should help to ensure I haven't made any glaring errors in my tests, because there are a lot of numbers to go through. I'm using the following CPUs:
Intel 486DX/50 (A80486DX-50): I have two models of this chip: the SX546 and the later SX710. The reason for this is that I had a lot of problems with the SX546 when level 2 cache was enabled - TopBench refused to run at all, for example. I sourced the SX710 and found improved stability with that one. I have a couple of theories on why that is but am yet to prove either. Theory one is that the DX/50 is very sensitive to voltage fluctuations and that they made some improvements to this in later die revisions. Theory two is that the cache chips aren't fast enough. This has been somewhat disproven by other chips running fine at 50MHz. Either way, I was able to use the SX546 for my own internal comparisons, but I'm using the SX710 for the tests published here.
AMD 486DX/40 at 50MHz (A80486DX-40): I had one of these a while back, but smoked it when I inserted it into the CPU socket incorrectly. First time (and last time, hopefully) for everything. I was troubleshooting at the time and was getting a bit frustrated, so I rushed. Still, I wanted an additional direct comparison with the DX/50 because of the issues I was having and, although AMD never released a DX/50 themselves, I wondered about overclocking a DX/40. To my surprise it works perfectly, allowing me to make like-for-like comparisons with the scores of the slightly sus Intel model. One other thing I could have done was underclock the Intel DX/50 and compare that to the DX/40 for an additional comparison, but I only just thought of it and I'm not doing all that again.
AMD 486DX2/50 at 66MHz (A80486DX2-50): although not quite as direct a comparison, this chip is just another way of ensuring that weird results don't skew things.
Intel 486DX2/66 (DX20DPR66): this is an Overdrive, which is essentially a retail CPU. It's the same as the OEM DX2, but with a heatsink glued to the die. Not to be confused with the later 'enhanced' model, which has write-back cache and performs even better.
SGS-Thomson 486DX2/66: This is a licensed Cyrix chip, manufactured by ST, and was originally only here to provide a further level of comparison. Turns out it's a bit weird.
For more information on any of the tools used in these tests, check out my article on Benchmark Programs for Older Systems.
TopBench 0.38: No clear winner
This is a quick and easy way to measure the effect of a component change when all other things are equal. What we see is that there is negligible difference between all the CPUs in this test. The fastest result - 212 for the Intel DX2/66 - is only 1.4% faster than the DX/50's 209, while the slowest - 206 for both the ST DX2/66 and the overclocked AMD DX2/50 - is 1.4% slower. These are margins that a user would not be able to perceive, so there are no winners from this test.
I also performed this test without L2 cache to see what difference it makes.
Well, what do you know? It seems the overall performance of the DX2 is much more dependent on the system having L2 cache installed. The DX is 17% slower without cache, while it's 31% less with the DX2. That's a significant difference between the two. Overall, the DX is 21% faster than the DX2 in this configuration. I wonder how this will play out in the other tests.
Landmark System Speed Test 6.0: DX2 win (32%)
This only measures the frequency of the CPU, though it's weirdly relative to an AT in its metric, rather than absolute.
The performance difference is much more visible here, when only the CPU is being tested. Again we have perfect parity between the overclocked AMD DX/40 and Intel's DX/50. This is really interesting because the AMD chip performs flawlessly when overclocked. The Intel DX2/66 comes out on top, its CPU score being 32% higher than the DX/50. This is perfectly in line with expectations, with ST's chip only 1% behind. AMD's overclocked DX2/50 can't quite reach the same heights, though.
FPU scores are a little more interesting, with ST's chip leading the way, 39% faster than the DX/50. Intel's DX2 produces a less surprising 32% improvement. Does this mean the ST FPU is stronger than Intel's? Maybe it depends on the specific operations, but I might dig deeper into this at some point.
Without L2 cache, the performance of the DX is completely unaffected, while the DX2 is about 7% slower. Not quite the effect we saw under TopBench, but it's there. It does reduce the difference in performance over the DX to 23%.
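The 23% figure follows from combining the two percentages rather than subtracting them. A quick sketch, using the Landmark numbers above (32% lead, 7% drop for the DX2, no drop for the DX); the function and parameter names are my own.

```python
def remaining_lead(lead_pct: float, fast_loss_pct: float, slow_loss_pct: float = 0.0) -> float:
    """Lead of the faster chip (in %) after each chip loses some performance."""
    # Work in ratios relative to the slower chip's original score of 1.0.
    fast = (1 + lead_pct / 100) * (1 - fast_loss_pct / 100)
    slow = 1 - slow_loss_pct / 100
    return (fast / slow - 1) * 100


# DX2 starts 32% ahead, then loses 7% without L2 cache; the DX loses nothing.
print(f"DX2 lead without L2 cache: {remaining_lead(32, 7):.0f}%")
```

The same approach works for any of the cache-on versus cache-off comparisons, as long as both percentages are measured against the same baseline.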
Norton System Information 8.0: DX2 win (13%)
I have no idea how SysInfo comes up with its 'Computing Index' Score, but it seems to have produced some more varied results.
The DX/50 scores lowest on this occasion, marginally beaten by AMD's overclocked DX/40, but 1.3% is within tolerance and it's possible that an average of three tests each might balance these two out. Intel's DX2 is the winner, but only 13% ahead of the DX/50 this time. Surprisingly, ST's DX2 is pretty far back by comparison, scoring only 3.4% more than the DX/50. Maybe we can find out why.
Running without L2 cache, it becomes apparent how big a factor it is in the SI test. The performance difference with the DX is 30%, which seems fair, but it's a massive 133% with the DX2! I ran this test multiple times to make sure and it's consistent, but I think it does say more about the benchmark than the CPU. I guess we will learn more as the other results come in. It also means that the cacheless DX beats the DX2 by an enormous 52%!
Chips & Technologies MIPS 1.2: DX2 win (28%)
There is an issue with this program and it occurs right about the speed of these CPUs, unfortunately. All the tests run fine on the DX/50s but the 'Reg to Reg', 'Reg to Mem' and 'Overall' scores are unreadable on the DX2s, so they can't be used. That means I have less than half the results I would like, and the ST DX2 is too fast even for the 'Integer' and 'Mem to Mem' tests, leaving only the 'General' score.
As you can see, there is almost perfect parity between the o.c. DX/40 and the DX/50, as shown previously. What we also see, however, is a similar relationship between the o.c. DX2/50 and the DX2/66. They perform 28% faster than the DX/50 (using an average of the three results), which is obviously a noticeable margin. The higher score of the ST chip makes me very curious to find out just how much faster it is than the others, though we only have one result to show for it here. That shows a 55% improvement over the DX/50, so there's something interesting going on.
Disabling L2 cache produces something interesting. Although the DX, again, is unaffected, the DX2 is nearly 9% faster with it enabled. Given that this program was designed to work on CPUs before any kind of cache was commonplace, this clearly demonstrates how well the DX2's architecture has been optimised to utilise L2 cache when present - it still performs well without it, but it makes a real difference. The DX couldn't care either way. This diminishes the DX2's superiority to 21%.
CABT: DX2 win (10%)
So this is supposedly a non-synthetic benchmark, but I'm not sure how useful it is. Let's see.
Because the result is in seconds (well, less than a second), lower is better. Once again we have parity where we would expect to see it and, once again, the ST is doing something weird. It's actually 2% slower than the 50MHz CPUs. More interestingly, the 66MHz CPUs are only 10% faster than the 50MHz ones, rather than the theoretical ~33%. I guess this demonstrates a more rounded comparison, rather than it being all about raw speed.
Disabling the cache reverses the result. Here we see that it actually does make a big difference to the speed of the DX but, again, the gap is greater on the DX2. As a result, the DX is quicker overall. Not by much (7%) but, in a first-past-the-post contest, it's a win.
CacheChk 4.0: mixed results
This is where we really get down to it. Being able to see the performance differences at all three levels of memory will be really interesting.
Again, each chip at each respective speed performs almost identically (except for the ST which continues to be an oddity so I'll look at that separately). In terms of level 1 cache, the MB/s pretty much aligns with the frequency of the CPU, which is amusing. The DX2s outperform the DXs by 28% here, but why not the full 33%? One possible explanation is the actual measurable frequency of the CPUs. I used CPUCHK and found that the 66MHz CPUs were actually running at 65.4MHz. The 50MHz ones measured at 50.3MHz. The difference between those two speeds is actually 30%, so that goes some way to explaining it I suppose.
The L2 scores show what you might expect - because the external cache is connected to the CPU via the system bus, 50MHz beats 33MHz, but only by about 8%. I get the feeling this should be higher, so something is going on here. It does demonstrate in part why the DX2 manages to outperform the DX so consistently thus far. Although the L1 cache is small, it still seems to be the greatest influence over CPU performance, especially when it's running at 66MHz.
It's a different story with RAM transfers, however, with the 50MHz bus outscoring 33MHz by 29%. There's only one scenario I can think of where this superior score would actually benefit the DX/50 consistently and that's in a datacentre, where large amounts of data are coming in from storage and cache hits are reduced. Most people would not be able to perceive such a difference when gaming, for example, because so many other factors would effectively balance it out.
So let's talk about the ST because there's some weird shit going on there. Its L1 score of 54 is barely higher than the 50MHz CPUs, which accounts for the clear discrepancies in its scores for TopBench, SysInfo and CABT. What's really interesting is that its L2 score is the best one here, 23% higher than the DX/50 and 33% higher than its Intel counterpart. This can't even be explained by it having a better memory controller because, as far as I know, this was all handled by the chipset.
We will finish here by averaging out all three scores for the two main CPUs in question. That gives us 35.6MB/s for the DX and 38MB/s for the DX2, which is a difference of only 7%. This actually masks the significant performance differences between these two chips in some tests, however. For applications where L1 performance is more important, the DX2 wins out, but for applications where L2 and RAM performance is required, the DX wins. Interesting stuff. We can dive deeper into this with the Speedsys tests later on.
The differences in the results after disabling L2 cache were negligible.
Navrátil Software System Information 0.60.45: DX2 win (13%)
I was really hoping for a more noticeable difference in these tests. The Dhrystones test shows a CPU's ability to perform integer operations, while the Whetstones test is for floating point. We have a similar pattern represented here as we've seen in the other synthetic tests: the DX2s are faster, but not as fast as you would think. Again we're not seeing the outright performance gain we might expect from a CPU running calculations 32% faster - on average they only manage a 13% improvement for integer math, while it's a little higher than that for floating point. It once again shows that a 66MHz CPU is not 32% faster than a 50MHz CPU. I don't really get it, because this test is supposed to be pure CPU - if RAM isn't involved once it gets going, why isn't the gap larger? And if RAM is involved, it's not a pure CPU test.
Side note: later developments of the Dhrystone test, such as SPECint, were designed to provide more of an overview of system performance when it was found that the original code loop could fit into cache, but they only run on Linux and I can't be bothered with that. Yet.
RAM might not be involved in the test, but L2 cache certainly is. Scores are heavily affected by the presence of cache: with it disabled, the DX is 16% faster than the DX2. It's interesting that the Whetstones seem mostly unaffected by the presence of L2 cache, while Dhrystones are all over the place. The performance gain on the DX2 by having the cache there is near enough 50%, which is a crazy degree of influence really.
System Speed Test 4.78 - RAM Bandwidth & CPU Score: DX win (50%) & DX2 win (33%)
Aka Speedsys, this will allow us to break down the cache and memory scores into read/write/move, giving us more granularity from the results. I've chosen to do this in multiple charts, though, for presentation reasons.
Apologies for this chart: the units on the left are different for both series, but the metric is similar. The left bar for each CPU is the RAM bandwidth in MB/s, while the right bar is the CPU score. It was either this or two charts. As you can see the 50MHz bus kicks the 33MHz bus's butt, to the tune of 50%, almost the full speed benefit of the bus as you would expect.
The CPU score of 24.6 for the Intel DX2/66 is a full 33% higher than the DX/50's 18.4 and I think that's the first time we've seen that borne out in any of the tests. It's almost like the tests Speedsys does are contrived to show the 'correct' score for common CPUs. What disproves this is that the pretender of that group, the o.c. DX2/50, only scores 29% faster than the DX, while the ST randomly scores 47% higher! Again, if the CPU test is only testing the CPU and not other components that are tested elsewhere, why would the Cyrix core, running at the same frequency as the Intel part, outperform it so significantly? It would be nice if we understood more about the code that these tests are running.
Disabling L2 cache didn't affect the memory bandwidth and while it had little effect on the score of the DX, there was a noticeable drop in the DX2's score, confirming what we have already seen. No chart required for this.
System Speed Test 4.78 - L1 Cache: DX2 win (10% avg)
Moving onto L1 cache, let's start with the left column of each CPU, which is cache reads. The Intel DX2/66 is the clear winner here, outperforming the DX/50 by the magic 33%. What astounds me most about all these tests is the parity between Intel's DX and AMD's. They have been indistinguishable all the way along. AMD's DX2 has also been keeping pace with Intel's, though the latter does seem to have the edge. Despite running at the higher internal clock speed, the ST DX2 scores much closer to the DX on writes, which is odd, but interesting.
The next two columns are cache writes and moves, and we see results flip the other way with the DX beating the DX2 by 51%. If we compare the averages of all three (column 4), the DX comes out of it with an 11% win overall.
One other thing that's clear from the charts is that, while cache reads are faster with the DX2, the gap between reads and writes is significantly larger as well. If we look at the 16.9MB/s of the Intel DX2, that's 73% lower than the read score of 62.3MB/s. Compare that with 25.4MB/s vs 47.3MB/s for the DX, which is only 46% lower. Did Intel work out that prioritising cache reads was the way to go?
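The read/write gap can be expressed either as "writes are X% of reads" or "writes are Y% lower than reads", and it's easy to mix the two up. A short sketch using the four L1 figures quoted above; the helper name is purely illustrative.

```python
def write_vs_read(write_mb_s: float, read_mb_s: float) -> tuple[float, float]:
    """Return (writes as a % of reads, how many % lower writes are than reads)."""
    pct_of = write_mb_s / read_mb_s * 100
    return pct_of, 100 - pct_of


# (write, read) L1 cache scores in MB/s, taken from the text.
scores = {"Intel DX2/66": (16.9, 62.3), "Intel DX/50": (25.4, 47.3)}
for cpu, (write, read) in scores.items():
    pct_of, lower = write_vs_read(write, read)
    print(f"{cpu}: writes are {pct_of:.0f}% of reads ({lower:.0f}% lower)")
```

Whichever way it's expressed, the DX2's writes fall much further behind its reads than the DX's do.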
What we can see illustrated here is the relationship between L1 and L2 cache. Read and write operations are largely unaffected across both CPUs but, if we look at moves, we see a big shift: a 135% improvement to be precise in both cases when L2 is present. On average, the performance of the DX is faster than the DX2 with L2 cache disabled, albeit by 4%.
System Speed Test 4.78: L2 Cache: DX win (26% avg)
Looking at L2 cache now, you can see the 50MHz CPUs come out on top across the board here, aside from an anomaly or two. Starting on the left column with reads, the DX is only 5% faster so that's a draw. Moving along to writes, we see much the same result as with L1 cache, being 51% in the favour of the DX, while moves are 34% in its favour. As a result, the average scores give the DX a 26% win.
The ST DX2 is the stand out CPU here, as we saw under Cachechk. Although writes and moves are in-line with the other DX2s, reads are really quick, 29% quicker in fact. Does this explain how it came out on top in the MIPS tests? The only way that would make sense is if the instructions being run are too large for the internal 8KB cache, requiring frequent reads of the external cache. Weird.
Similarly to the L1 cache, we see a greater disparity (44%) between reads and writes on the DX2 than we do with the DX (19%). If this was purely down to the chipset it would be constant, so it does have something to do with the CPU.
Obviously there are no results for this test with L2 cache disabled.
System Speed Test 4.78 - RAM: DX win (39% avg)
As expected, 50MHz wins again in each test: 26% better for reads, 48% for writes and 46% for moves. That's 39% faster on average. The read/write disparity continues, but inverted so that writes are faster than reads. The gap on the DX is 39%, while it's only 18% on the DX2. Somehow they are identical on the ST, another unusual showing.
Without L2 cache, each CPU is marginally quicker than when it's enabled, but not worth displaying here.
Graphics (Frames per Second): no clear winner
So let's stop faffing about with these memory tests and look at some actual, you know, applications. Here we have five benchmarks that produce an FPS score, giving us a much more real-world view of things.
The first column is 3D Benchmark VGA 1.0 aka Superscape. The DX2's 41.6 does beat the DX's 40, but not by much: a difference of 1.6 frames per second is only a 4% improvement, so no one's going to notice that.
Next we come to the most useless benchmark of them all, PC Player Benchmark 1.0. Given that this program was released at some point in late 1996, it's about as appropriate a benchmark for 486 CPUs as Quake is, but it doesn't stop people from running it routinely. Obviously I have, for comparison's sake, but this will be the last time I use it on this generation of CPUs. The difference of 0.4 in the scores also equates to a 4% difference, which is again not something the naked eye would be able to perceive.
Much more relevant is Doom. Interestingly, both the 50MHz CPUs struggled with this, crashing frequently at various points in the benchmark by either bombing to the DOS prompt, hanging the system completely, or producing an error message. Disabling L2 cache stabilised the system, but those results can't be used in a like-for-like comparison. (Ironically, the DX2 produced similar results when L2 cache was disabled!) Anyway, the DX produced a score of 22.7fps, which surprisingly beats the ST DX2 by 1.2. The Intel equivalent takes the highest score, however, with 25.3, breaching that magic 25fps boundary. Still, with only a 2.6fps improvement, it's not far ahead at all and equates to about an 11% improvement.
Chris's 3D bench produces no surprises, with the ST DX2 again only able to match the performance of the DXs. The others aren't much better, with the Intel DX2 only managing a 3fps improvement over the Intel DX. That's 12%.
Finally we use an older graphics engine to compare the chips with Wolfenstein 3D. There's really very little variation at all in the results, enough to suggest that the engine can't run any faster than that. Not so, as an 80MHz DX2 can exceed 80 FPS, so the small variation is legitimate. As you can see, the DX/50 comes out on top with 65.7, while the DX2 languishes 2.5 FPS behind. Not quite as big a gap as we saw with Doom and not really noticeable to the naked eye, but the DX is 4% faster.
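To put the frame rates on the same footing, here is a small sketch that turns the FPS scores quoted above into a percentage relative to the DX/50. Only the three benchmarks where both scores are given in the text are included; the Wolfenstein 3D figure for the DX2 is derived as 65.7 minus 2.5, and the function name is my own.

```python
def pct_vs_baseline(score: float, baseline: float) -> float:
    """Percentage by which score is above (+) or below (-) the baseline."""
    return (score / baseline - 1) * 100


# (DX/50 fps, Intel DX2/66 fps) as quoted in the text.
results = {
    "Superscape": (40.0, 41.6),
    "Doom": (22.7, 25.3),
    "Wolfenstein 3D": (65.7, 65.7 - 2.5),
}
for name, (dx, dx2) in results.items():
    print(f"{name}: DX2 is {pct_vs_baseline(dx2, dx):+.1f}% vs the DX/50")
```

A positive number means the DX2 is ahead; a negative one means the DX/50 wins that test.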
Once again we see that disabling L2 cache reverses the result - the DX wins at everything. So, it seems that the various strengths and weaknesses of each CPU balance out when it comes to gaming.
Graphics: Head to Head
This is a good point to bring in some results from some other people: HighTreason and Ancient Electronics have done videos comparing the DX/50 to the DX2/66 so we can compare all three sets of results and make sure I haven't fucked anything up in a big way. The only tests that all three of us have published are Superscape (left column) and Doom (right column).
Nice: it seems we have parity. My DX/50 runs marginally quickest out of the three. That's probably down to the graphics card. When it comes to the DX2, however, my motherboard appears to be doing it a disservice. HighTreason's score for Superscape is 2.9 FPS ahead, while Ancient Electronics' is 4.6 FPS ahead. I can't think what could cause this if the CPU is the only thing that has changed. Thankfully, Doom is a lot closer, with HighTreason's result only marginally ahead of the others.
I know for a fact that the DX2/66 can run a bit faster on my system with tighter memory timings and zero wait states because of its 33MHz bus - the DX/50 won't boot with these settings. Either way, the improvement is still very small so I don't know what's going on there. Could be a chipset limitation.
Morph 3D: DX2 win (25%)
This is a handy little graphical benchmark, written in assembler so it's very quick. It doesn't make big demands on the graphics subsystem, so I would expect it to be mostly relative to CPU speed. It also crashes with a divide by zero error a lot on the 50MHz bus. This is interesting, because it suggests something is having trouble keeping up. I don't think it's the graphics card, because we know it can handle the bus and it's not a complex program, so that suggests something else, either the RAM, the cache, or the CPU. Given that the CPU is meant to run at this speed and given that it works fine when L2 cache is disabled, I think it's the cache. I just don't know why.
All I can really say is, WTF? The AMD and Intel DX/50s are closely matched, as are the AMD and Intel DX2/66s. But the ST? It's 73% faster than the Intel DX2! At first you will think that I messed something up. Nope, I checked and triple checked - the ST actually does run the program that fast. There is nothing in our other results to explain this so, without access to the source code, I'm at a loss.
With L2 cache disabled, there is a negligible difference between the results of both the DX and the DX2 - the difference is still 25% - so there's nothing to see really. That suggests the code can fit into L1 cache.
VidSpeed 3.1: DX/50 win (54%)
Originally I used the results from the Landmark video test and the VESA memory bandwidth test from Speedsys, but there is a better test that serves the same purpose and enhances it: VidSpeed 3.1. There is another program, called VMax256, which does a very similar thing - the charts look exactly the same, which is encouraging.
The three columns are video memory transfers at 8-bit, 16-bit and 32-bit widths respectively, measured in MB/s. The first time I made this chart it didn't look right at all. Results had obviously been skewed somehow so I had to re-run a bunch of them to make sure. As suspected, there had been some inconsistencies in the settings I'd used when testing, specifically to do with the graphics card wait states. I have run so many tests and there are so many bloody jumpers on this system that all it takes is to forget one and it ruins everything until it gets noticed. I knew testing with multiple CPUs of the same speed was a good idea!
Anyway, the important info here is that the 50MHz bus is 54% faster than the 33MHz bus.
Disabling L2 cache has a dramatic impact on video memory performance, it seems. A cache-less DX still outperforms the full-speed DX2, however, and look at how pathetic its scores are without cache - that's a 25% drop, which is actually in-line with the performance of an SX/25!
In both charts there is a clear difference between the two bus speeds, demonstrating the benefits of the VESA local bus. The DX running at the full 50MHz is streets ahead of the DX2's 33MHz bus, 48% in fact. This also demonstrates the abilities of that Tseng ET4000 W32i. Actually, let's just take a moment to appreciate the raw performance of this card.
As you can see, it wipes the floor with the other two VLB cards I have and is nearly twice as fast in places. The performance of the S3 is unsurprising, given that it's essentially an ISA card with a VLB connector, but the Cirrus Logic offering doesn't do much better, apparently being incapable of 32-bit wide transfers. I think the overall performance difference comes from the 45ns RAM, plus the interleaving, which effectively doubles the speed when compared to a card without it. The particular model of ET4000 I have is by 2the Max and has a number of jumpers for configuring wait states from 0 to 3. As an aside of this aside, we can take a look at the effect of these here.
This clearly shows there is a tangible benefit from running at 1WS if stable, with gains in the region of 23% on average.
Hard Drive Tests (EISA & SCSI): DX win (22% avg)
From this point it's a straight-up comparison between the DX and the DX2. I didn't mind testing the performance of multiple CPUs for the other benchmarks, as I wanted to do a definitive comparison anyway for another project (I actually did 25 separate benchmarks for 12 different CPUs in a total of 28 configurations - that's another article), but there was no point doing multiple hard drive tests with all of them.
I used a few programs to test hard drive performance with the EISA caching controller configured for RAID-1 and then RAID-5, mostly because I wanted to see the difference between the two, and because I wanted to learn what the programs were measuring. I should mention that, much as I would have liked to use period-correct drives for these tests, those familiar with SCSI gear will know that drives from 1993 aren't particularly easy to come by and, when you do, they rarely work. As such, the drives used here are from about 1998, because that's what I could get my hands on.
The first thing I noticed is that, if you're only using Speedsys to measure your drive interface performance, you're not getting the whole picture. I don't know why the buffered Speedsys score is so much lower than the other cached results, but it's consistently about 20% less. I also don't know why 4Speed reads data twice as quick as the linear Speedsys test. Either way, it demonstrates not only that the results are consistent and reliable, but also that there is always a benefit from running multiple programs. Although I've included the results from the linear tests in the chart, they are not included in the overall results because they are bottlenecked by the drives, so the bus makes little or no difference.
The second thing is that there is a clear performance benefit to be had from the higher bus speed when caching is used: about 22% on average. RAID-5 at 50MHz is the fastest configuration overall managing well over 20MB/s, with RAID-1 at 50MHz closely resembling the score for RAID-5 at 33MHz. RAID-1 at 33MHz is 'slowest' at about 15MB/s. I have to admit I'm surprised that RAID-5 is faster - I would have thought that mirroring would be quicker because there's no parity involved. Technically both configurations should perform the same, because the bus sees the cache instead of the drives.
Clearly there was a massive benefit from using a caching controller with hard drive technology of the time. QDIMark and CoreTest are clearly reading data straight from the cache, because over 20MB/s for ~1992 is insane. I'm pretty sure that's saturating the bus.
If you're wondering where the 'write' scores are, these programs don't provide that kind of test. If I wanted, I could find or make a method that does, but you may be unsurprised to hear that I've honestly had enough by this point.
Hard Drive Tests (IDE): DX win (15% avg)
I was really interested to see the performance comparison between caching controllers on each bus, especially when compared to a more bog-standard controller of the time. Luckily I have a period-correct drive for these tests, so the real-world comparison was as accurate as possible. It's from 1993, which apparently performs at PIO 0 rates (pre-standard, I guess). It also has only 64K of cache on-board, because SRAM prices.
As you can see, the performance difference is pretty stunning, not so much between the two CPUs, but between the interfaces. SpeedSys isn't included this time because its tests wouldn't work on any controller with the IDE drive I'm using (they ran, but produced zero data), and their usefulness has been proven as limited anyway.
First let's compare the difference between the CPUs. For basic ISA, the DX2 scored 1.36MB/s on average, while the DX/50 scored 1.73MB/s. That's 27% faster, which is definitely noticeable. Interestingly, the difference when using a caching ISA controller is diminished: the 50MHz bus produces only a 15% improvement.
The difference is even less marked using VLB, however. The DX averages 2.74MB/s, versus 2.6MB/s for the DX2, which is a gap of only 5% and no one is going to feel that. I'm guessing the hard drive bottlenecks the results here.
Using the caching VLB controller, the DX is fastest again by about 16%. Part of the reason for the diminished impact of the bus is that the controller wouldn't initialise at 50MHz, so I had to enable wait states. It could be argued that this isn't a like-for-like comparison, but it accurately demonstrates one of the main reasons why the 50MHz bus was phased out, and why 33MHz succeeded. It may not be like-for-like, but it is real-world.
If we average out the results for all the tests, we get 4.33MB/s for the DX2 and 4.99MB/s for the DX - a win of 15%.
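The percentages in this section all come from the same simple sum: the gap between the two scores, divided by the slower one. As a minimal sketch (the function name `pct_faster` is made up for illustration, and the inputs are just the averages quoted above):

```python
def pct_faster(dx_mbps: float, dx2_mbps: float) -> float:
    """How far ahead the DX is, as a percentage of the DX2 figure."""
    return (dx_mbps - dx2_mbps) / dx2_mbps * 100


# Averages quoted in the text, in MB/s: (DX/50, DX2/66)
results = {
    "Plain ISA": (1.73, 1.36),
    "Plain VLB": (2.74, 2.60),
    "All IDE tests": (4.99, 4.33),
}

for name, (dx, dx2) in results.items():
    print(f"{name}: DX ahead by {pct_faster(dx, dx2):.0f}%")
```

This reproduces the 27%, 5% and 15% figures. Note that the baseline is always the slower CPU's score, so "27% faster" means the DX/50 moves 1.27 times as much data in the same time.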
Anyway, the other thing I'm interested in is the performance differential between the interfaces. Using a caching ISA controller gets you a significant 54% bump over plain ISA with the DX2/66, but only 39% with the DX/50. Using a standard VLB controller produces more of a benefit, but is only 30% faster than ISA. One thing that's really important to note here is that you only get this bump if the driver is loaded for the interface, otherwise it will perform almost exactly the same as ISA. Clearly VLB is more than 30% faster than ISA, but the hard drive must again be the bottleneck.
If we compare the standard VLB controller to the caching version, however, the difference is huge: over 800%, in fact, which really shows how badly VLB was needed back then. If we compare the caching controllers against each other, VLB is nearly 500% faster using the DX2 compared to non-cached. Considering that VLB is about 400% faster than ISA in clock speeds alone, it shows how handy these caching controllers were.
There's only one question I still have here: EISA vs VLB.
Well, there you have it. In its fastest configuration, my caching SCSI controller on the EISA bus is nearly twice as fast as the VLB equivalent. I say equivalent because I would love to have a VLB SCSI controller, but I don't. I'm a bit confused by this, to be honest, as both buses are 32-bits wide, so VLB should be faster purely based on the clock difference. Given that RAID-5 was faster than RAID-1, we know that drive configuration has an effect on transfer rates, even when cache is involved. It's very likely that the drive is, again, the limiting factor here and that if I were to use a later IDE drive it would ramp up the score significantly. This would also be unrealistic for the time, but I like to push things to their limit. That's a test for another day!
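For context on what each bus could manage on paper, here is a rough sketch of the theoretical ceilings. It assumes one full-width transfer on every bus clock with no wait states, which real controllers never achieve, and the helper name `peak_mb_per_s` is purely illustrative:

```python
def peak_mb_per_s(clock_mhz: float, width_bits: int) -> float:
    """Theoretical ceiling: one full-width transfer per clock, zero wait states."""
    return clock_mhz * width_bits / 8


# (bus clock in MHz, data width in bits) - clocks as quoted in this article
buses = {
    "EISA (8.33MHz, 32-bit)": (8.33, 32),
    "VLB (33MHz, 32-bit)": (33.0, 32),
    "VLB (50MHz, 32-bit)": (50.0, 32),
}

for name, (clock, width) in buses.items():
    print(f"{name}: {peak_mb_per_s(clock, width):.0f} MB/s peak")
```

On those assumptions EISA tops out at around 33MB/s, so the 20MB/s-plus cached SCSI results are already a large chunk of what that bus can carry, while the VLB IDE results sit far below the VLB ceiling. That fits the idea that something other than the bus, most likely the drive or the controller's wait states, is holding VLB back here.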
I could be done with this article by now, but no - I felt that including Windows tests was important because it's the OS that most people interested in productivity would have been using back then. There are some extensive tests out there by publishers of PC Magazine but, fuck me, they take a long time and I cannot be bothered with all that. I did try some out and encountered some parity errors, crashes and the like so I'm not getting into troubleshooting all that shit.
I used a fresh installation of Windows for Workgroups 3.11 and installed the most recent driver for the ET4000 W32 graphics card (there are a few, and they all perform differently). I also installed the 'enhanced driver' for the hard drive controller and it halved the performance, so I removed it and decided it was an unnecessary complication at this stage. 32-bit disk and file access were enabled. Windows refused to run on the DX/50, however, while L2 cache was enabled, so I also performed tests on the DX2/66 without L2 out of curiosity.
Windsock 3.3: DX2 win (27%)
Well then. Once again we can see how significantly the DX2's performance is enhanced by the L2 cache. If we compare the overall scores, it's about a 27% improvement over both the DX and the cacheless DX2. Graphics performance seems to be most strongly affected, so I'm going to go out on a limb and say that GUI performance is very cache-sensitive. Considering we saw in previous tests that the DX/50's performance is barely affected by L2 cache, I would predict that, even if we could get it working with L2 enabled, it wouldn't affect the numbers all that much. Not being able to run the current Microsoft OS in optimal state is a large strike against the DX, though.
In the like-for-like cacheless tests, RAM and HD performance is stronger on the DX, but the full-speed DX2 is so much faster in the important areas that it doesn't come close to compensating.
WinTach 1.2: DX2 win (42%)
This is a purely graphical benchmark and is particularly useful when testing drivers or comparing different graphics cards' performance. All tests were performed at 800x600 resolution with 256 colours.
While the DX does indeed out-perform the DX2 on equal footing (by 27% overall), the performance of the DX2 with cache enabled once again blows everything else away (42% compared to the DX). The greatest margin of 51% is actually with CAD drawings, which is no surprise given the superior maths performance of the clock-doubled CPU.
WinTune 2.0: DX2 win (22%)
Windows Magazine released this program for readers to diagnose common performance issues with their computers. It performs some benchmarks and recommends improvements that could be made to the system configuration. The first couple of tests are Dhrystone and Whetstone.
Both CPUs without cache are mostly similar, with the DX/50 nudging ahead in MIPS, but the DX2 stamps its superiority with a 32% improvement in the Dhrystone and 34% in the Whetstone tests respectively with cache enabled. That's what we call a 'no-brainer'.
In non-graphical tests, we can see that the DX/50 isn't actually that far off the DX2/66 but it's definitely behind, by 11% on average. The thing about Windows in particular is its reliance on SmartDrive, its software-based caching system. This gives the superior processing speed of the DX2 the performance edge in disk performance, which is the opposite of what we saw in DOS. In the uncached disk test, the 50MHz bus is 13% ahead, but it loses out in all other areas.
Here's all the results in one place so they can be easily browsed at speed. I have included which CPU 'won' each test, so we can produce some kind of overall score, and colour-coded the DX as green and the DX2 as red. If any CPU wins a test by 5% or less, this is considered a draw. It's an arbitrary amount, but I don't think anyone would disagree that differences in performance are almost impossible to perceive at such a small margin. Some could argue that a 10% difference would also be hard to perceive but that's more open to debate.
TopBench 0.38: draw (no L2: DX 21%)
Landmark: DX2 32% (no L2: DX2 23%)
SystemInfo: DX2 13% (no L2: DX 52%)
MIPS: DX2 28% (no L2: DX2 21%)
CABT: DX2 10% (no L2: DX 7%)
- L1: DX2 28%
- L2: DX 8%
- RAM: DX 29%
- Avg: DX2 7%
NSSI: DX2 13% (no L2: DX 16%)
- RAM Bandwidth: DX 50%
- CPU Score: DX2 33%
- L1 reads: DX2 33%
- L1 writes: DX 51%
- L1 moves: DX 51%
- L1 avg: DX 11%
- L2 reads: draw
- L2 writes: DX 51%
- L2 moves: DX 34%
- L2 avg: DX 26%
- RAM reads: DX 26%
- RAM writes: DX 48%
- RAM moves: DX 46%
- RAM avg: DX 39%
- Superscape: draw (no L2: DX 16%)
- PC Player: draw (no L2: DX 12%)
- Doom: DX2 11% (no L2: DX 11%)
- C3DBench: DX2 12% (no L2: DX 6%)
- Wolf3D: DX 10% (no L2: DX 23%)
Morph 3D: DX2 25% (no L2: DX2 26%)
VidSpeed 3.1: DX 54% (no L2: DX 48%)
Hard Drive Tests (EISA & SCSI):
- RAID-1 (cached): DX 26%
- RAID-5 (cached): DX 18%
- Uncached: draw
Hard Drive Tests (IDE):
- ISA: DX 27%
- ISA (cached): DX 15%
- VLB: draw
- VLB (cached): DX 16%
Windsock 3.3:
- CPU: DX2 29%
- Video: DX2 43%
- Disk: DX2 26%
- RAM: DX 43%
- Overall: DX2 26%
WinTach 1.2:
- Word: DX2 44%
- CAD: DX2 51%
- Spreadsheet: DX2 37%
- Paint: DX2 45%
- Overall: DX2 42%
WinTune 2.0:
- Dhrystone: DX2 32%
- Whetstone: DX2 34%
- RAM: DX2 8%
- Video: DX2 31%
- Disk (avg): DX2 13%
- Disk (cached): DX2 17%
- Disk (uncached): DX 13%
It's like a game of two halves: a sea of green under DOS shows that the DX won most of the tests, 20-13 in fact. 5 of those tests were drawn. But when we come to Windows, the DX2 cleans up 2-15. Overall that's 22 wins for the DX/50 and 28 for the DX2.
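The scoring is simple enough to write down. This is only a sketch of the rules described above (a margin of 5% or less is a draw), with the function name `verdict` invented for illustration and the tallies taken straight from the text:

```python
DRAW_THRESHOLD = 5.0  # percent; a win by this much or less counts as a draw


def verdict(winner: str, margin_pct: float) -> str:
    """Apply the draw rule to a single test result."""
    return "draw" if margin_pct <= DRAW_THRESHOLD else winner


# Win counts quoted in the text: (DX/50 wins, DX2/66 wins)
tallies = {"DOS": (20, 13), "Windows": (2, 15)}

dx_total = sum(dx for dx, _ in tallies.values())
dx2_total = sum(dx2 for _, dx2 in tallies.values())

print(verdict("DX", 5.0))   # plain VLB in the IDE tests
print(verdict("DX", 27.0))  # plain ISA in the IDE tests
print(f"Overall: DX/50 {dx_total}, DX2/66 {dx2_total}")
```

The plain VLB result comes out as a draw and plain ISA as a DX win, matching the list, and the totals add up to 22 and 28. Raising the threshold to 10% would turn a few more results into draws, which is why the choice of cut-off is worth stating.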
The DX/50, released in 1991, arrived between the releases of Windows 3.0 and 3.1. It was a time when GUI-based operating systems really weren't established on the PC, and performance graphics cards weren't a thing most people could get their hands on either. VESA Local Bus came along over a year later and changed all that, of course, but the ISA bus was what most people had to put up with. By the time the DX2/66 was released in November 1992 everything had changed: local bus graphics was becoming widespread and Windows 3.1 was well established - it makes sense that Intel's engineers will have optimised the DX2's design to suit such an operating system. DOS was very much still in use, particularly for gaming, but productivity under Windows was where it was at for home and business users. The raw bus performance of the DX/50 will definitely have made a difference in a server context, especially in multi-CPU systems, but the DX2 could perform most calculations faster and it shows.
Crucially, it would seem, systems with larger amounts of L2 cache became more prevalent as SRAM costs came down. It was not uncommon for 486 systems to have no L2 cache at all when the DX/50 was released, whereas 256KB was common on DX2/66 systems. The tests here show that the DX/50 (and thus other earlier 486 models) was not at all optimised to take advantage of larger amounts of L2 cache. They also clearly show that a DX2/66 coupled with L2 cache was a killer combo, and it seems to have been something of a watershed moment for the development of the 486. In this sense, the DX2 is almost a next-generation CPU compared to the DX.
I am still very fond of the DX/50 because it's quirky, daring and weird. It's also unstable in a number of settings, but these issues can be overcome. With the DX2/66, it just works and it works really fucking quickly. To me that makes it a bit boring, but that says more about me.
Still, I think it's unfair of people to say that the DX/50 isn't worth bothering with. For me the whole point of this silly hobby is taking the opportunity to experiment with hardware that was completely unobtainable at the time it was released. I was 11 when the DX/50 was released and systems sporting that CPU cost upwards of $8,000 - there is no way I ever could have experienced such a thing back then. Sure, if you want to build a relatively early 486 system using original hardware, the DX2/66 is your best bet - it's widely available, performs great and won't give you any hassle. But where's the fun in that? I've had an absolute nightmare running all these tests, getting that obscure RAID card working and all the other issues I encountered, but I loved every minute of it. Nothing else out there can give you the experience a 486DX/50 can when coupled with weird L2 cache, EISA, VLB and a multitude of jumpers.
I realise that no one asked for this article to be written. In fact this is probably of interest to about 10 people out there, but I hope that someone has learned something from these efforts. As ever, I do my absolute best to only say something if it's true and verifiable, but I would love to know if I've messed something up - feel free to leave a comment, or find me on Twitter, where I'm also active. I hope that if you have enjoyed reading this you might consider buying me a coffee via Ko-Fi. I promise to give 20% of every donation to the Internet Archive, without which many of my articles wouldn't be possible. Thanks for reading :)
30/4/23 - First version published.