Archive for 'niagara'
First results from the Niagara Benchmarking
In order to make sure that the real benchmarks are as efficient as they can be, we’ve repeated our usual procedure using dd to determine the most efficient buffer size on the platform. More details about that procedure can be found in my earlier mis-titled blog post on scheduler benchmarking.
For the sake of comparison, I’ll repeat the results from our dual Itanium, which has 32GB of memory;

The important information to derive from the graph is the smoothness of the lines (which is a function of how well the scheduler and VM perform) and the absolute value of the bytes/sec number. The Itanium box can push about 3.5 x 10^9 bytes per second, or 3.5 Gigabytes/sec, which is 28 Gigabits/sec. Now bear in mind that the procedure involved is not multi-threaded or even multi-process, so we can very generously guess that the dual-CPU system could push about 56 Gigabits/sec of pure I/O throughput, completely ignoring the overhead implicit in multi-CPU I/O scheduling.
The benchmarking process relies on the presence of a version of dd with the dd-performance-counter patch from Debian, which the Sun box doesn’t have. Luckily, however, SUN now have much of the source for Solaris online, so I popped over to the OpenSolaris code browser, grabbed a copy of dd.c and, based on the Debian patch, came up with the following patch. I also modified our dder.sh script a little to cope with the lack of seq;
#!/usr/bin/bash

STARTNUM="1"
ENDNUM="102400"

# create a 100 MB file
./dd bs=1024 count=102400 if=/dev/zero of=local.tmp

# Clear the record
rm -f record

# Find the most efficient size
i=$STARTNUM
while test $i -le $ENDNUM; do
    ./dd bs=$i if=local.tmp of=/dev/null 2>> record
    i=$(( $i + 1 ))
done

# get rid of junk
grep "transf" record | awk '{ print $7 }' | cut -b 2- | cat -n | \
while read number result ; do
    echo -n $(( $number + $STARTNUM - 1 ))
    echo " " $result
done > record.sane
and after a few hours of running, here is the result;
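The same record.sane file can also be summarised without plotting it. The following is only a minimal sketch: best_block_size is a hypothetical helper that is not part of dder.sh, and it assumes each line of record.sane holds a block size followed by a bytes/sec figure, as the script above is meant to produce.

```shell
#!/usr/bin/env bash
# Sketch: report the block size with the highest throughput from a
# "blocksize bytes_per_sec" file such as record.sane.
# best_block_size is a hypothetical helper, not part of dder.sh.
# Note: needs a POSIX awk (on Solaris, nawk or /usr/xpg4/bin/awk).
best_block_size() {
    awk '
        # Skip lines whose second field is not a plain number.
        $2 !~ /^[0-9.]+([eE][+-]?[0-9]+)?$/ { next }
        # Compare numerically, but keep the original text for printing.
        ($2 + 0) > max { max = $2 + 0; best = $2; bs = $1 }
        END { if (bs != "") print bs, best }
    ' "$1"
}

# Only run against record.sane if a benchmark run has produced it.
if [ -f record.sane ]; then
    best_block_size record.sane
fi
```

This only picks out the single peak; it says nothing about the smoothness of the curve, which still needs the graph.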

So, we have a graph which is very similar in shape to the dual-Itanium box, except it’s a whole order of magnitude lower in raw throughput terms. As we’ve seen above, a process could push up to 3.5 Gigabytes/sec on the Itanium box; on the Niagara box that becomes 0.34 Gigabytes/sec, or about 2.72 Gigabits/sec. Now, the Niagara box is virtualised, in that the process runs on only one logical CPU and the Niagara box has 32. So if we extrapolate from there, and make the same generous guess as we did for the dual-Itanium box, we’d get 87 Gigabits/sec, again completely ignoring the multi-CPU overhead.
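The back-of-the-envelope extrapolation for both boxes can be reproduced with a few lines of shell. This is just a sketch of the arithmetic: extrapolate_gbit is a made-up helper name, and it assumes perfectly linear scaling across CPUs, which is exactly the generous assumption described above.

```shell
#!/usr/bin/env bash
# Sketch: convert a single-process bytes/sec figure into Gigabits/sec
# and scale it linearly by a CPU count (ignores all multi-CPU overhead).
# extrapolate_gbit is a hypothetical helper used for illustration only.
# Usage: extrapolate_gbit BYTES_PER_SEC CPUS
extrapolate_gbit() {
    awk -v bps="$1" -v cpus="$2" 'BEGIN {
        printf "%.2f\n", bps * 8 * cpus / 1e9
    }'
}

extrapolate_gbit 3500000000 1    # one process on the Itanium box
extrapolate_gbit 3500000000 2    # both Itanium CPUs
extrapolate_gbit 340000000 1     # one process on the Niagara box
extrapolate_gbit 340000000 32    # all 32 Niagara logical CPUs
```

The four calls print 28.00, 56.00, 2.72 and 87.04 respectively, matching the figures quoted in the text.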
Now, bearing in mind that the Itanium box would have to deal with the overhead of managing two physical CPUs, and the Niagara box would have to deal with the overhead of managing 32 logical CPUs on 8 physical cores (four hardware threads per core), there probably isn’t very much between them in reality in terms of how much raw overall I/O they can push. If I had to guess at this stage which would win, I’d say the Niagara box, but hopefully we’ll get much more meaningful information over the next few weeks.
Either way, both systems can probably comfortably saturate a 10Gigabit/sec interface and can certainly have a single process saturate a gigabit interface, which is all they ever have to be engineered for. Beyond that the number doesn’t matter a whole lot, unless you’re running a very very busy database server. But this information is still very useful. For one thing, it gives me some confidence that with a properly tuned Apache build we can blow SUN’s own benchmarking numbers away; this system looks like it’s capable of very decent I/O performance. It also confirms the architectural and engineering decisions SUN says they’ve made. This system is architected for parallelism. It’s not supposed to have super amazing performance for any single-process task; it’s designed to be able to run lots of those tasks better than anything else can, all at the same time.
They’re not lying, this is very different from hyperthreading. With hyperthreading, we don’t see anything like these results, when we run our graphs we get plots like this;

and it barely changes when we turn hyper-threading off. Hyper-threading seems like a convenient interface to enable better pipelining, and there’s nothing wrong with that, but if you’re running one process it won’t make a difference. Niagara on the other hand seems to behave just like a load of individual CPUs, with a lot less cross-subsidisation (if any). So, if you really want amazing single-process performance, or you have a requirement for a single process to be able to sustain say a 10Gigabit/sec download, then Niagara is definitely not the right platform, but then SUN don’t claim that it is. SUN have made the design choice to really build for multi-threaded systems and our benchmark here seems to validate that.
The only other information that can be gleaned from our graph is that the lines are slightly less smooth than on the Itanium box. Only a little, and frankly I’m surprised they’re as smooth as they are considering the amount of virtualisation which is going on. The graph is still vastly more smooth than any x86 plot we’ve ever performed and about the only conclusion we can make is that if you had some real-time task that was sensitive to the microsecond, Niagara probably wouldn’t be as good a choice of platform as Itanium, though still much much better than x86. Again, this is not the market SUN are aiming Niagara at, and we’re really just validating some engineering choices here.
So, armed with our new knowledge of the system’s potential, we’re going to really put this system to task and hopefully get the most useful information of all out of it: just how much real network throughput can it manage, and just how many concurrent downloads can it really handle?
Update:
O.k., so it looks like the roughness of the graph can be neatly explained by the Solaris scheduler granularity. Reading about it here with pointers from Paul reveals that by default ordinary processes have a 100 Hz scheduling period. Contrary to my statement about real-time applications, there are APIs available which expose much higher frequency scheduling, so ignore that part. Also I’m informed that the Niagara SMT is also a means to increase pipelining efficiency, but that the pipelines are shorter and the switching latency much smaller. That does seem to be borne out by the above.
Sun Goodness
Following on from my previous post about our experiences trying to get a Sun server, I got some great help from some SUN employees, not least Paul Jakma (of Quagga fame) and after filling in one form and posting a comment on Jonathan Schwartz’s blog, and telling a small fib about what country we’re in, the T2000 arrived yesterday.
Right now, I’m on study leave from work, but I’m in a few half-days and plan to steadily put the system through its paces. We went for the T2000, with 16GB of memory, and prtdiag shows me a grand total of 32 logical processors, nice!
MB/CMP0/P0 0 1000 MHz SUNW,UltraSPARC-T1 MB/CMP0/P1 1 1000 MHz SUNW,UltraSPARC-T1 MB/CMP0/P2 2 1000 MHz SUNW,UltraSPARC-T1 MB/CMP0/P3 3 1000 MHz SUNW,UltraSPARC-T1 MB/CMP0/P4 4 1000 MHz SUNW,UltraSPARC-T1 MB/CMP0/P5 5 1000 MHz SUNW,UltraSPARC-T1 MB/CMP0/P6 6 1000 MHz SUNW,UltraSPARC-T1 MB/CMP0/P7 7 1000 MHz SUNW,UltraSPARC-T1 MB/CMP0/P8 8 1000 MHz SUNW,UltraSPARC-T1 MB/CMP0/P9 9 1000 MHz SUNW,UltraSPARC-T1 MB/CMP0/P10 10 1000 MHz SUNW,UltraSPARC-T1 MB/CMP0/P11 11 1000 MHz SUNW,UltraSPARC-T1 MB/CMP0/P12 12 1000 MHz SUNW,UltraSPARC-T1 MB/CMP0/P13 13 1000 MHz SUNW,UltraSPARC-T1 MB/CMP0/P14 14 1000 MHz SUNW,UltraSPARC-T1 MB/CMP0/P15 15 1000 MHz SUNW,UltraSPARC-T1 MB/CMP0/P16 16 1000 MHz SUNW,UltraSPARC-T1 MB/CMP0/P17 17 1000 MHz SUNW,UltraSPARC-T1 MB/CMP0/P18 18 1000 MHz SUNW,UltraSPARC-T1 MB/CMP0/P19 19 1000 MHz SUNW,UltraSPARC-T1 MB/CMP0/P20 20 1000 MHz SUNW,UltraSPARC-T1 MB/CMP0/P21 21 1000 MHz SUNW,UltraSPARC-T1 MB/CMP0/P22 22 1000 MHz SUNW,UltraSPARC-T1 MB/CMP0/P23 23 1000 MHz SUNW,UltraSPARC-T1 MB/CMP0/P24 24 1000 MHz SUNW,UltraSPARC-T1 MB/CMP0/P25 25 1000 MHz SUNW,UltraSPARC-T1 MB/CMP0/P26 26 1000 MHz SUNW,UltraSPARC-T1 MB/CMP0/P27 27 1000 MHz SUNW,UltraSPARC-T1 MB/CMP0/P28 28 1000 MHz SUNW,UltraSPARC-T1 MB/CMP0/P29 29 1000 MHz SUNW,UltraSPARC-T1 MB/CMP0/P30 30 1000 MHz SUNW,UltraSPARC-T1 MB/CMP0/P31 31 1000 MHz SUNW,UltraSPARC-T1
I’m going to have to install some more software (not least Apache!) and apply some tunings before it can be benchmarked properly, but already we’re getting some useful information. We’ve plugged the system into a metered Power Distribution Unit (PDU), to get a sense of how much current it draws, and while the meter is not granular enough to tell me what each port uses, here’s the graph of before/after plugging it in;
Plugging the T2000 in is almost exactly half-way through the graph, and you see a small spike there as the system controller comes online. The next ramp then is when I issued the poweron command and the whole system came online. The power unit only measures in steps of 0.1 amps, and the step corresponds to an increase from 5.8 amps before plugging in the unit to 6.8 amps after it is fully powered on.
So, 1 amp at 220 volts is 220 Watts +/- 10% given the accuracy of the unit. That’s pretty good for a beefy server. The Niagara platform is not the subject of SUN’s famous ads comparing their power usage to Dell’s - those were x64 servers - but still it’s good to see that power usage has been kept in trim. I remember when E450s would guzzle many times that number of watts. Our Dell 2850s use about 290 Watts each +/- 10%, when they’re not busy, for the sake of comparison.
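The power estimate is simple enough to script for anyone repeating the measurement on their own PDU. A minimal sketch follows; watts_range is a hypothetical helper, the 10% tolerance is the meter accuracy quoted above, and multiplying volts by amps assumes a power factor of 1, so the real power drawn may be somewhat lower.

```shell
#!/usr/bin/env bash
# Sketch: estimate the power drawn by a newly plugged-in machine from the
# change in PDU current. watts_range is a hypothetical helper.
# Assumes a power factor of 1 (volts x amps), which overstates real power.
# Usage: watts_range AMPS_BEFORE AMPS_AFTER VOLTS TOLERANCE_PERCENT
watts_range() {
    awk -v before="$1" -v after="$2" -v volts="$3" -v tol="$4" 'BEGIN {
        watts = (after - before) * volts
        printf "%.0f W (%.0f to %.0f W)\n",
            watts, watts * (1 - tol / 100), watts * (1 + tol / 100)
    }'
}

# The T2000 readings from the graph: 5.8 amps before, 6.8 amps after.
watts_range 5.8 6.8 220 10
```

For the readings above this prints "220 W (198 to 242 W)".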
I’ll keep blogging our results from all of the testing and benchmarking as we go through it, and since I’m due to talk at the Irish System Administrators Guild this Tuesday, I’ll probably include a lot of the results there too.
Busy, Busy, Busy
I didn’t blog at all during January, and I didn’t get to code as much on Apache stuff as I would have liked either. It looks like it’s going to be like this for a while, but I’m now finally at a stage where I can explain why I’m so busy!
We’re building a new data-centre, over a pretty short period of time (it has to be occupiable - by servers - on May 1st) and believe me, this is no small amount of work. We’ve been running tender evaluations, designing cabinet layouts, working out budgets, negotiating SLAs and contracts and lots more besides. As the build progresses I’ll try to blog about it, including photos and so on. We’re doing some things in a little bit of an unusual way and I’ll try and explain our reasoning along the way. Hopefully this will prove of use to others too.
But before that, I should cover a little of what I’ve been up to in the last 6 weeks.
3 weeks ago, I went over to visit Nóirín, and we went up Zugspitze where we had a great, if somewhat cold, time spending the night in an Igloo! You can read Nóirín’s write up on it here and there’s a bunch of photos too.

While in Munich, we also caught a Jacques Loussier gig, which I thought was a bit odd to be honest, but was good to have been at nonetheless.
On the Digital Rights Ireland front, we’ve been working hard to be in a position to accept donations as well as handling some behind-the-scenes legal work. This week, along with the TCD Dublin Legal Workshop, we’ll be hosting a talk from Suw Charman on Friday. This should be great, and if you can make it at all, please do. The DRI invite and write up is here.
On the Apache front, one of the really annoying aspects of being so busy is that I haven’t been able to find the time to do much coding. I had to back out of the execd work I had started, but hopefully in a few months I’ll get a chance to get back to it. To mitigate my own sense of guilt over this, I volunteered to RM the latest httpd 2.0.x release, and I’m glad I did. It’s a lot of work, it took me at least 60 hours to get 2.0.x into a releasable state (we’re now waiting on some licensing issues to be clarified before a candidate is rolled) - but unlike coding, this work can be easily broken up. It’s possible to do 15 minutes or an hour here and there and have it all add up productively.
When I code, I need to do it in large uninterrupted blocks or I lose my concentration and start being unproductive. If you’re involved in any Open Source projects, I’d say volunteering to RM is a great way of contributing when you don’t have the space to get a load of coding done, but want to help nonetheless. Though get very very familiar with how to manage code merges!
I’m considering proposing two talks for ApacheCon Europe, but before I do, it’d be useful to hear any feedback on what people would like to hear, my current ideas are:
- Scaling Apache httpd to 50,000 concurrent users
This talk would be an update of the talk I gave last year, only now with even bigger numbers. It would include the standard tuning/benchmarking basics but also new things like the pluggable schedulers in Linux, the siege utility, the event MPM in much more detail (and how it improves performance over worker), the new graceful-stop feature and how that helps, our experiences on the Itanium platform and Itanium-specific tunings and a bit on mod_ftp thrown in for good measure.
- IPv6 at the ASF
This talk would be a few things in one: a brief introduction to IPv6 from the point of view of a typical user of ASF software (mostly server software), the common platform bugs and how to avoid them, a survey and report of IPv6 support in all ASF software (I pretty much have this part done), and then some details on IPv6 from an ASF developer point of view, what’s needed and so on, using APR as an example (we have a load of bug-workarounds in the APR IPv6 code - it’s one of the best sources of platform bug documentation).
If these interest you, or turn you off, or if you can think of anything else better, do tell!
Update:
A reader got in contact with me to ask how the trial of the latest Sun kit went. Like I blogged last December, Sun announced a free trial of their Niagara boxes for people to determine how good they are and to consider buying some. As far as I can tell, this trial is vapourware. We never heard back, despite filling in the form again and mailing just to be sure. A few other people we’ve talked to attested to a similar experience. I guess Sun still suck.
Update 2:
A look at Sun’s revised form for the trial shows that Ireland isn’t on the list of selectable countries, which might explain why we never heard back. What a load of crap. Sun definitely suck.
Justin Mason is a God
… at least according to Tim Bray.

He’s also got a photo of a seemingly disinterested me;

… getting a demo of the dtrace stuff. In fairness, dtrace is really impressive, and a few of us httpd committers hope to add dtrace inspection points to Apache over the coming months; it looks really useful. Matty has already done some really useful work in this regard over here, where you’ll find mod_dtrace and some other interesting examples. They should be taken with a pinch of salt though: Matty isn’t quite getting how pools work and some of his examples misinterpret what’s going on, but still it’s excellent work all the same.
I thought Tim’s keynote was actually very good, but I found one part of it really disappointing. Their raw numbers on the Niagara system were totally bogus. He quoted a figure of 25,000 requests/second from the box, both in the presentation and the article, but it’s really not. It can do way way better. We had a close look at the system and found amongst other things that the httpd was a 32-bit build, which was a bad start, and as Tim points out the benchmarking was being done with ab from a laptop. Since our dual Itanium box can push 25,000 reqs/second in its sleep, I’m guessing their Niagara box can easily push at least 4 times that number, especially with the event MPM. It maxed at 290 Mbit/second too, which is quite low (despite what Tim might think). Just last week we shipped 1.2Gbit/sec in production without actually noticing much at all, so again I’d say the Niagara box can at least quadruple that kind of number. Of course it’s great that the numbers were wrong in the conservative direction, the opposite of the usual corporate PR.
The guys mentioned that they have some detailed Specweb stats, so it’ll be neat to see those. The platform did really “feel” fast (by which I mean responsive), not the usual sparc sense of treacle-like slowness, and dtrace really is an amazing utility. I know I’ll be looking very very seriously at the platform, and I already like their low-end X64 boxes as an alternative to Dell. So I have to agree with God, Sun really have made a gigantic sea-change, and it is kind of mind-blowing. Good stuff!
Update: I’ve applied for a free box from Sun for 60 days, to benchmark it myself, thoroughly.








