1). I promise you, Steamroller (Steamroller B would be most likely to refer to a stepping, not a new architecture) is directly derived from Bulldozer and Piledriver. Clock-for-clock, Bulldozer is substantially less efficient than the old K10 Thuban architecture in most workloads. There are a handful of exceptions. For AMD to regain Thuban-level single-thread performance and scaling would be a significant improvement.
2). It does not contain eight threads. I have seen the chip. It doesn't. No Hyper-Threading, no octal-threading.
3). http://techreport.com/re...50-processor-reviewed/14 is a better comparison for value. An AMD eight-core performs like a quad-core from Intel.
4). Expect evolutionary performance gains, nothing more.
I'm sorry you continue to believe the situation is something other than what it is. Expect Kaveri to be an evolution of Piledriver.
And 5). I agree with you regarding the importance of AMD and general misconduct of Intel. Unfortunately, all the misconduct in the world does not equate to a performance advantage for AMD.
When you measure total processing power in FLOPS, AMD's HSA-enabled APUs will destroy Intel's best processors.
The author and virtually every current benchmark are CPU-biased.
When graphics-intensive software is coded to take advantage of HSA innovations:
1) shared memory
2) heterogeneous queuing (hQ)
Kaveri tests well against Intel's top of the line chips.
Kaveri will be a $150 or less chip compared to a $650 Iris Pro...
There is NO question which is the better value.
Even more important is that AMD's APU roadmap greatly increases processing power through GPGPU, using HSA innovations.
MSFT is supporting HSA!
AMD has quietly informed some journalists, and will be more publicly elaborating on MSFT support soon....
Did anyone watch this Video from a top engineer at Amazon Web Services:
Cloud providers are working closely with AMD to replace Intel's overpriced solutions.
Cloud providers do NOT need Windows servers.
WHY would a top engineer at AWS be publicly supporting AMD?
It's because AWS has been working closely with AMD in the creation of semi-custom 64-bit ARM chips that will be used in SeaMicro server installations.
Google, Facebook, MSFT, are in the same boat.
The public Cloud is the biggest change in IT since the Internet became a public assess communication medium 20 years ago.
WHY do you suppose SAMSUNG, QUALCOMM, IT, ARM, have joined AMD as founding members of the HSA?
It's because HSA innovations will be the architecture of the future!
People who blindly focus on OLD benchmarks will be left eating dust!
I'm typing this up on my Nexus so I apologize if it reads sloppy. How did you get a review/marketing sample? I thought AMD wasn't sending those out until after CES and as far as I know one wasn't on display at APU13? They just showed the video. Can you post a screen shot of the task manager or CPUZ?
I agree with you about being the underdog not automatically granting you some level of artificial success. You disputed my claims about Linux and I posted the real world benchmarks from Phoronix backing them up. I also included several other articles that did the same. Those are legitimate real world benchmarks however and not theorized synthetic ones that you and I know are useless outside of trying to convince the uninitiated to buy this over that.
I don't agree about the single core performance between Piledriver and Deneb being in favor of the latter. That's an old claim from the hysteria that went viral when Bulldozer came out. Bulldozer sucked, let's not argue that point. In almost every real world benchmark I see or test the Vishera chip wins hands down on multi and single threading...finally. For non-gaming workloads I also disagree with the idea that the 8 core FX is on par with an i5. Did you check out the links I provided?
For gaming performance right now 2 threads is almost all you need and Intel leads in single threaded performance. Except Tek Syndicate did show the FX chip beating the Ivy i5 and checking in just behind the i7-3770- I posted that link above (Logan is an Intel Fanboy who openly admits it). But for work oriented tasks like those benchmarks I shared the FX chip's 8 physical integer cores are a force to be reckoned with. Under professional grade software like Blender, Sony Vegas, or even Adobe those 8 physical integer cores give Intel's (Ivy) 8 virtual cores a run for their money since those programs are optimized to use many threads and Hyper Threading isn't that efficient. Some developers purposely disable HT in the BIOS for performance reasons. For virtual machines those 8 physical cores also shine out against the i7s (Ivy) virtual ones. You can actually pass those FX cores on to the VM. Now yes, my 3930K handily beats the FX everywhere but I paid $169 for the FX and $569 for the 3930K.
At the end of the day going back to the APU and the article AMD is saying that even with the 10% clock reduction SteamrollerB is still 20% more efficient than Piledriver and the same goes for the iGPU with a 30% increase, after a clock reduction, over the A10's 8670 "Devastator" which was pre-GCN anyways. Kaveri will not be an answer to Haswell. Some sites are claiming it'll compete with a Haswell i5, but I'm very skeptical. I do think it will beat an Ivy Bridge i5 and perhaps give Ivy's i7 a run based on those increases over Piledriver and the inclusion of, again, OpenCL, HSA, & Mantle. You can't keep throwing up the "single core speed" banner when you have variables like those in play.
A large part of the reason why Intel is faster is because compilers, Windows, and even some benchmarking software are purposefully optimized to favor Intel's architecture. The lesson that AMD has needed to learn is that it's not necessarily the hardware, but the software that makes the chip great. If AMD could get the same software optimization advantage that Intel has, then the performance differences between the two would shrink. This is what we see under Linux since it's a community driven neutral OS, and behold Intel's i7-3770(K) has barely any lead on the FX Piledriver if any at all. This is what AMD is doing in partnering with the HSA Foundation and Khronos with their OpenCL standard for computation. For gaming Mantle makes the CPU cores almost irrelevant. True Audio is also interesting as is the ARM co-processor.
Opinions aside I appreciate the forum we have going on here. I will confess though that you're right about the threads it seems. WCCF updated the article and redacted the "4/8" they had on the slide chart. If I could find a way to upload an image to this thread I'll happily post a screenshot of the original article they posted showing the 8 threads just to back up my sanity. I think WCCF is about to be dropped from my feed...So for that point I stand corrected.
Whoa Ken...where'd you come from?! Nice addition to the discussion. I had no idea about AWS jumping ship. I deal with them quite a bit and they're a pretty big player in the arena. Many developers from what I'm seeing are hyped about HSA and the advantages it holds for future processing. Kaveri's computational power with HSA enabled is impressive. I think you meant public "access" though. It read pretty funny the first time through though regarding their "public assess". [;)]
Yeah, you have a point Neil, perhaps we take things too far [H] and it is simply a price/performance issue. Especially since users can't tell the difference for most normal everyday use anyways. It is what it is, but according to another's post: I like taking my super-charged economy car with an Edelbrock manifold and street race it with Ferraris on the weekend. So simple and normal may have already flown out the Windows......at least since "8" anyways [;)]
Oh and I apparently like pixie dust and magical Linux-Sutra to justify my hardware habits! How magical...[:D]
I was at APU13 and had time enough with some of the test beds to check their basic stats. No screenshots or CPU-Z data, but I got a look at clock speeds and core counts. The second-gen engineering samples were running slightly slower than the 3.7GHz / 4GHz model that's been forecast as the top-end part, but they were quad-core, quad-threaded chips.
If the FX-8350 competes against Intel chips in Linux, I'm not familiar enough with the Linux environment to challenge that. I'll leave that to the Phoronix people, who do it very well.
AMD has priced the FX-8350 at $169 at NewEgg. The cheapest Intel quad-core is $179. That's a reasonably good comparison for multi-threaded workloads, by which I mean I'd expect the eight-core AMD chip to perform approximately like the four-core + HT Intel CPU. As for why I compare against Thuban, let's use Cinebench 11.5 as a good example. I choose it because it scales well, it's readily available, and the figures are widespread.
Scores drawn from http://anandtech.com/bench/product/203?vs=697
Cinebench 11.5 Single-Thread:
X6 1100T (3.6GHz): 1.10
FX-8350 (4.2GHz): 1.11
Now, divide the CB score by the clock speed in GHz to get the efficiency of the processor in this particular test.
X6 1100T = 0.305
FX-8350 = 0.264
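For anyone who wants to redo the division, here is a minimal Python sketch of the single-thread calculation. The scores and clock speeds are the ones quoted above; the dictionary layout and the `per_ghz` helper name are just illustrative choices, not anything from the AnandTech data.

```python
# Cinebench 11.5 single-thread scores and the clock speeds quoted above.
chips = {
    "X6 1100T": {"score": 1.10, "ghz": 3.6},
    "FX-8350": {"score": 1.11, "ghz": 4.2},
}


def per_ghz(score, ghz):
    """Return the benchmark score per GHz of clock speed."""
    return score / ghz


# Per-GHz figure for each chip in this one test.
single_thread = {name: per_ghz(c["score"], c["ghz"]) for name, c in chips.items()}

for name, value in single_thread.items():
    print(f"{name}: {value:.3f} points per GHz")
```

The printed figures may differ from the ones above in the last digit, depending on whether you round or truncate.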
Let's check multi-threading:
X6 1100T: 5.90
We can perform the same calculation using the total GHz speed of all the cores. For the Thuban, that's 6x3.3 = 19.8; for Piledriver it's 8x4.0 = 32.
X6 1100T efficiency: 0.2979
FX-8350 efficiency: 0.215
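The multi-threaded version of the same division can be sketched like this. The Thuban score and both core/clock products come from the figures above; the FX-8350 score is not listed in this post, so the sketch only backs out the score implied by the quoted 0.215 figure rather than asserting a measured number. The function name is hypothetical.

```python
def per_aggregate_ghz(score, cores, ghz):
    """Return the score divided by total GHz across all cores (cores x clock)."""
    return score / (cores * ghz)


# X6 1100T: Cinebench 11.5 multi-thread score of 5.90 on 6 cores at 3.3GHz.
thuban = per_aggregate_ghz(5.90, cores=6, ghz=3.3)
print(f"X6 1100T: {thuban:.4f} points per core-GHz")

# FX-8350: score not quoted above, so work backwards from the stated
# efficiency (0.215) and the 8 x 4.0 aggregate clock. This is an implied
# value, not a benchmark result.
implied_fx_score = 0.215 * 8 * 4.0
print(f"FX-8350 implied multi-thread score: {implied_fx_score:.2f}")
```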
We can perform this calculation with other tests. Compare the x264 encode tests, which the FX-8350 wins. Divide the frame rates by the clock speed of the chip, and the result is as follows: X6 1100T: 3.87. For Piledriver: 2.8.
Check the 7zip benchmark, which Piledriver also wins. Thuban's 18,416 divided by 19,800 (6 cores x 3,300MHz) = 0.930. Piledriver's figure is 0.731.
Once we normalize for clock speed and core count, Thuban is more efficient than Piledriver in the vast majority of tests. The situation could be considered analogous to the P3 / P4 days, when the P3 was far faster than the P4 clock-for-clock, but the P4 eventually pulled ahead thanks to clock frequency and Hyper-Threading. Nonetheless, if the P4 had regained P3 *efficiency* at any point, the result would have been a far faster chip.
Piledriver is crippled by two things:
1). Poor scaling. This was an *inevitable* consequence of sharing resources. If you run four threads across four modules (1 thread per module) and then run 4 threads on two modules (two threads per module), Piledriver is 15-20% faster in the first configuration than the second. That means an eight-core Piledriver is more like a six-core Thuban, *period*, in almost every workload.
2). It's not as efficient. This is borne out amply in single-threaded tests, where a 4.2GHz Piledriver matches a 3.6GHz Thuban.
Therefore: If Kaveri is as efficient as Thuban in single-threaded tests and scales like K10 in multi-threaded tests, the result will be a substantially faster processor. Even when the FX-8350 is faster than the X6 1100T, it's *not* as fast as an eight-core, 4.2GHz K10 would have been.
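As a rough illustration of that last point, the sketch below projects what an eight-core, 4.2GHz K10 might score in Cinebench 11.5 multi-thread. It assumes, loudly, that K10 would scale perfectly linearly with both core count and clock speed, which no real chip does, so treat the result as an upper-bound thought experiment and not a prediction. The helper name is made up for this example.

```python
# Figures quoted earlier in the thread for the X6 1100T (Thuban).
THUBAN_MT_SCORE = 5.90  # Cinebench 11.5 multi-thread
THUBAN_CORES = 6
THUBAN_GHZ = 3.3


def project_score(points_per_core_ghz, cores, ghz):
    """Scale a per-core-GHz figure to a different core count and clock.

    ASSUMPTION: perfectly linear scaling in both cores and frequency.
    """
    return points_per_core_ghz * cores * ghz


thuban_eff = THUBAN_MT_SCORE / (THUBAN_CORES * THUBAN_GHZ)

# Hypothetical chip: eight K10 cores at 4.2GHz.
hypothetical_k10 = project_score(thuban_eff, cores=8, ghz=4.2)
print(f"Projected eight-core 4.2GHz K10: {hypothetical_k10:.2f}")
```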
To sum it allllll up:
If Kaveri increases single-threaded performance and multi-threaded scaling by 15-20%, it will match Thuban on both counts clock-for-clock.
Don't be ridiculous. The point of clock-normalized comparisons is to compare the efficiency of any two chips in the same workload. I can compare Intel against AMD or the original Pentium against Haswell. Deriving numbers in this fashion does not tell me why performance looks as it does, but it's a standard method of gauging the efficiency of two processors.
If I took the derived efficiency and divided it by power consumption multiplied by time-to-execute, I could calculate CPU efficiency per watt.
Your point regarding efficiency is well made. Maybe I misunderstood you however; I thought you were talking about the overall performance between Deneb and Piledriver. Your clock-for-clock comparison only illustrates one aspect, namely efficiency, of the overall performance between the two architectures. At the end of the day, as your data even shows, Piledriver is still more powerful "overall" than Deneb, but yes, it has its own quirks since it’s a different architecture. Yes though, I completely agree with what you said about P3/P4 comparing it to BD and the K10. It's simply not as efficient clock for clock. Deneb also had better FPUs from what I can tell. Assuming the clock efficiency will only be on par with K10, the inclusion of HSA, OpenCL, and Mantle for gaming, will theoretically solve those serial/parallel processing efficiencies. What I think many don't understand is AMD isn't competing with Intel solely based on x86 raw core performance, which is essentially what Rory Read said. They believe the innovation they need to break through Murphy's ceiling, and keep up with a much more resource laden Intel, is to find a way to integrate and leverage multiple system resources to work more closely in unison. Hence the much talked about buzzword "HSA". Intel is doing something similar with Crystal Well and their 128MB eDram L4 cache. Personally I find Intel’s solution very interesting and I'd like to see it compare against Kaveri with HT disabled.
Many people even within the tech industry don’t use Linux; which is too bad and I think they’re missing out. I natively boot Ubuntu & Arch, and I run Windows just for development, and maybe a game here and there, in virtualization. Linux, in my opinion, is a much more optimized OS and by far the most advanced. I admire Mac OS and AQUA's GUI layer, but at the end of the day while it has many novelties it just doesn’t offer what Linux does...and I just can't justify the cost of a Mac anymore. I am happy that it’s allowed BSD-UNIX to return to the spotlight however. I'm not much of a Windows fan though.
Off subject - I didn't realize you were Joel Hruska from ExtremeTech and [censored]. I read your article regarding the 9590 vs. Ivy-E which was well written. The 9590 is a bit ridiculous in my opinion though. I’d still go for the Ivy-E at their price points ($500 for the AMD and $550 for the Intel?). What are your thoughts on the speculation surrounding the decline of x86 and the possible ARM saturation in the low power server market? The latter seems reasonable, but I’d imagine there would need to be many changes made that would cost quite a bit of money.
Thanks for the information Joel, especially the explanation regarding efficiencies which I had never considered.
Again, I wrote this at work so I apologize if it’s choppy.