Archive for 'mirroring'

Niagara vs Showdown

Posted on March 23, 2006, under apache, general, mirroring, niagara.

So, after a week with the Niagara T2000, I’ve managed to find some time to do some more detailed benchmarks, and the results are very impressive. The T2000 is a seriously capable piece of equipment, and we may very well end up going with the platform for our mirror server. Bottom line: the T2000 was able to handle over 3 times the number of transactions per second, and about 60% more concurrent downloads, than the current machine (a dual Itanium with 32GB of memory) running identical software. Its advantage was even bigger when compared to a well-specced x86 machine. Not bad!

The Introduction

Our mirror server is one of the single busiest webservers in the world. We handle many millions of downloads per day, but unusually for a high-demand site, we do it all from one machine. This is usually a bad idea, but as a mirror server has built-in resilience (in the form of a world-wide network of mirrors), and as we can’t afford 20 terabytes of ultra-scalable, network-available storage, we use a single machine with directly attached storage, and rely on our ability to tune the machine to within an inch of its life. We regularly serve up to 1.2 Gigabit/sec, and have handled over 27,000 concurrent downloads. There’s some more detail on our previous set-up (which is mostly identical to the current one) in my paper on Apache Scalability.

Over four years ago, when I started in HEAnet, Solaris and Sparc hardware represented about 50% of our Unix systems. Now it represents less than 2%, so I’ve had less and less opportunity to tinker on Solaris in the last few years, but have kept up with it enough to know how to use dtrace, and to still understand the Solaris fundamentals. At ApacheCon US 2005, Covalent had a T2000 along as a demonstration machine. I got to play with it a little and was very impressed. Unlike prior experiences, this machine felt very responsive. There was no waiting for the output of commands, no listening to the whirring of hard disks, and the benchmarking numbers it was producing weren’t bad either.

When Jonathan Schwartz announced the “Free Niagara box for 60 Days” deal, we jumped at the opportunity to test one of these boxes, which might be ideal for our needs. It took a while for Sun to iron out some administrative problems, but they certainly held up their end of the deal, and a nice shiny T2000 arrived a little over a week ago for us to try out.

The Machines

To get a better sense of the machine’s performance in comparison to our other options, we rustled together a Dell 2850 (dual 3.2GHz Xeon with 12GB of RAM, running Debian) and our current Dell 7250 Itanium (a dual 1.5GHz with 32GB of RAM).


Throughout the benchmarking, the machine used for firing off the benchmarks (using ab, httperf and siege) was another Dell 2850, this time a dual 2.8GHz Xeon with 4GB of memory. For the concurrency and latency tests, we used additional, similarly-configured (and identical for each run) 2850s and 2650s to run yet more parallel benchmarks.

As the mirror is a live system which we can’t simply take off-air just because we want to complete some benchmarks, we ran the tests during its quietest periods of use. To be fair, we also made sure that the other two systems – when benchmarked – were loaded with a baseline of 40 requests per second, with an average concurrency of around 300. After initially determining which machines were “winning” the benchmarks, we tried to structure the load to favour the “loser”, if any decision was needed. This means that where one machine comes out on top, the margin by which it wins is actually a conservative estimate.

Ordinarily, we try to drastically reduce the number of services on a machine, to free up memory and scheduler time on the system. However, as the T2000 came with a large number of services running, and it’s not entirely easy to determine what is and isn’t actually a critical service, we shut down obvious candidates – such as the various network filesystem daemons – but left some others alone. Again, if anything, this means that our results are actually conservative for the Sun, although they probably do reflect a real-world set-up, which will have these services running.

The Preparation

As no system comes configured perfectly for such extreme tests, we did a number of things to each machine we tested, to achieve as much performance as we could manage. Since my Solaris skills are rustier than my Linux skills by a fair margin, it’s more than possible that our benchmarks under-represent the performance of the T2000.

The first thing we did after receiving the system was to get smpatch configured, and to run “smpatch update”. Getting the system completely up to date took a good 6 hours, and that still only covered critical and security updates, as we don’t have a subscription for everything else. As a Debian and Ubuntu user, I find this annoying: “apt-get update && apt-get dist-upgrade” would have done the same thing, and upgraded everything, in about 15 minutes at the very, very longest. Hopefully though, that will be improved upon.

Next, we installed the SUNWspro suite, in order to have a compiler, linker and so on – which is mighty useful for compiling Apache from source! Some reasonably trivial invocations of apachebench seem to show that this compiler produces faster binaries than gcc. Over the years, there have been claims that 64-bit binaries are actually slower than 32-bit binaries. Our testing didn’t show much of a difference, but just in case there is one, we used 32-bit builds of Apache, though with the correct largefile magic, so that we could still transfer very large files.

We didn’t apply many Solaris kernel tunings, mainly because the Solaris team seem to be working hard to get rid of them, and are putting a lot of effort into making the default behaviour ultra-scalable. Nevertheless, we upped max_nprocs various times to cope with the insane number of processes we were creating. Keeping an eye on tcp:tcp_conn_hash_size with ndd seemed to show little problem with the default values, and this is the main Solaris tunable we’ve had to tune in the past.

Apart from mounting the filesystems with the “noatime” mount-option, we did no filesystem tuning, which is something I’m keen to improve on, particularly if we can try out ZFS. Again, if anything, this means that the performance of the T2000 may be under-represented. However, as our benchmarking was restricted to just 3 files, with no directory traversals, probably not by much. If anyone has any pointers on intensive filesystem tuning on Solaris, please send them my way!

Itanium 7250
The Itanium box runs a recent Linux kernel, and our list of related sysctls looks like this;

net/ipv4/tcp_rmem="8192 87380  1747600"
net/ipv4/tcp_wmem="8192 87380  1747600"
net/ipv4/tcp_mem="8192 10000000 10000000"

We also up the txqueuelen on our interfaces to 50000, to achieve super-high throughput to our Geant users. The XFS filesystem was mounted with the “noatime” and “ihashsize=65535” mount options.

2850 Xeon
For the sake of consistency, the same kernel was installed on the Xeon box, with the same system and interface settings as the Itanium box. The ext3 filesystem used was mounted with the “noatime” mount-option.

Common to each box were the usual Apache tunings we apply. For each machine, we tried to determine the quickest MPM to use. In the case of the two Dell boxes, this was the event MPM, which was ahead of the worker MPM by about 2%. We couldn’t get the event MPM working on Solaris (more about that later), so we used the worker MPM – which was over twice as fast as prefork on the platform.

As Solaris seemed to respond better to more LWPs than to more processes, we ran with 64 threads per child – which is not at all an unreasonable number. Increasing beyond this did give us slightly better results, but the potential for 64 downloads to die at once, when there’s a problem, is just about enough real-world risk to deal with, for me. The relevant configuration stanza looks like:

<IfModule mpm_worker_module>
    ServerLimit            1563
    ThreadLimit            64
    StartServers           10
    MaxClients             100032
    MinSpareThreads        25
    MaxSpareThreads        75
    ThreadsPerChild        64
    MaxRequestsPerChild    0
</IfModule>

Note: these are stupid values for a real-world server, and will waste a lot of memory for the scoreboard. They are really only useful if you are doing some insane benchmarking and testing.
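Those numbers interact: with the worker MPM, Apache caps MaxClients at ServerLimit × ThreadsPerChild (lowering it, with a warning, if you ask for more), so the values in the stanza above have to line up exactly. A quick sanity check:

```shell
# MaxClients cannot exceed ServerLimit * ThreadsPerChild under worker
SERVER_LIMIT=1563
THREADS_PER_CHILD=64
MAX_CLIENTS=$((SERVER_LIMIT * THREADS_PER_CHILD))
echo $MAX_CLIENTS   # 100032, exactly the MaxClients we configured
```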

We naturally set “AllowOverride None”. Interestingly, although sendfile() functions flawlessly on Solaris (unlike on Linux), using it seemed to hurt performance. It did reduce the amount of memory used by Apache on the box, but it gave slower performance than plain read() and write() – so perhaps its blocking characteristics are slightly different. Thus, we set “EnableSendfile Off” and used memory-mapping instead (via “EnableMMAP On”), which seemed to be the fastest way to ship bytes.

Another hack we applied to speed up Apache was to change the default buffer size, which is buried in the bowels of APR and can only be changed at build-time. In each case, the buffer size was changed as per the most efficient value (as determined by our previous benchmarks on single-threaded I/O). Don’t try this at home kids, unless you really know what you’re doing.

So, with our tunings applied, we set about performing our benchmarks, and for the sake of sticking with the showdown theme, I’ve divided the results into good, bad and ugly. (No, there weren’t really any ugly results – it’s just a fun theme for a post).

The Good

Power Usage
As I’d previously blogged, one of the first things we were able to measure was the power usage of the machine. Much to my amazement, it remained at the original level (+/- 20%) of current draw for the duration of our tests, peaking at a mere 1.2 Amps, or about 290 Watts. This compares pretty favourably with our Dells, though I should add that the Dells both have more disks in their chassis than the T2000.

Machine      Average draw    Peak           Yearly cost
Sun T2000    1 Ampere        1.2 Amperes    €210
Dell 2850    1.6 Amperes     2 Amperes      €350
Dell 7250    1.8 Amperes     2.2 Amperes    €395

Costs are calculated on the average draw, at the Irish commercial ESB rate, and do not include cooling costs (roughly triple the number to get the overall yearly cost). The electricity supply was 240V, so multiply the Amperes by 240 to get the raw number of Watts. These results were gathered using an APC metered PDU. This is not a scientific instrument, and it’s entirely possible that the results are inaccurate. Some rough calibration did show that the unit produced consistent results, so personally I’m confident about the order in which the machines are ranked, but I wouldn’t go so far as to be certain of the raw numbers produced. We really need a good power meter to produce that kind of reliability.
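For illustration, the yearly-cost column can be reconstructed roughly as follows. The 240V supply is from the figures above; the tariff of about 10c/kWh is my assumption, chosen because it makes a 1 Amp average draw land near the quoted €210:

```shell
# Yearly running cost from average draw (240 V supply; ~EUR 0.10/kWh assumed)
watts=$(awk 'BEGIN { printf "%.0f", 1.0 * 240 }')
cost=$(awk 'BEGIN { printf "%.0f", (1.0 * 240) * 24 * 365 / 1000 * 0.10 }')
echo "${watts} W average, ~EUR ${cost}/year"   # 240 W average, ~EUR 210/year
```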

Requests per second
How many requests the machine can handle in a second is probably the most valuable statistic when talking about webserver performance: it’s a direct measure of how many user requests you can handle. Fellow ASF committer Dan Diephouse has been producing some interesting requests-per-second stats for webservices (and they are impressive), but we were more interested in how many plain-old static files the machine could really ship in a hurry. And without further ado, those numbers are;

Requests per second

Sun’s own benchmarks have quoted up to 2500 requests per second, which we didn’t find particularly impressive. Our current box – merely a dual Itanium – can do 2700 requests per-second without much trouble. I’m happy to confirm though, that the tricks we do to reduce Apache’s memory usage on Linux have as much of an effect on Solaris. Our results are averaged over 5 runs of the testing, during which the T2000 managed a very, very impressive 5718 requests per second. Not bad!

Despite the new kernel, the x86 box still struggled to push out a disappointing 982 requests per second, while our Itanium churned through a reliable 2712 requests per second.

Unfortunately, neither the siege nor apachebench utilities can cope with the levels of concurrency we test with these days, as there are simply far too many sockets involved. Tuning the client machine itself becomes a serious task in order to be able to cope with the sheer volume of outbound requests. We currently have some commercial traffic generation and scaling testers in our test-lab, but we decided not to use those either. Instead, multiple servers were thrown at the problem and we used 11 machines all-in, all running instances of siege at the same time. The instances were fired off by hand, but within a few seconds of each other, and more than enough requests (100,000) were used, to ensure that the processes were given enough time to ramp up to the level of parallelism required. Each machine was on the same LAN as the server we were benchmarking.

With those limitations in mind, the test certainly allowed us to find out the rough breaking point of each machine. On any system, sustaining over 10,000 concurrent requests would involve denying some requests outright, so the cut-off or breaking point was defined as the point when the server got to 50% availability. We used some other tricks, like assigning the server multiple IP addresses and targeting each client at a different address, to a) give the tuple-tracking code in the IP stacks an easier time and b) allow us to easily track how many clients each server was sustaining.

Also, in each case, the system was pretty much unusable by the time we were done! After killing all of the connections, the Linux boxes would take about 5 minutes before becoming responsive enough that we could get to a shell prompt. The T2000 would take about 20 minutes, although I think that if we reserved more processes for the root uid, that might change – sshd seemed responsive enough, but would block on fork() when trying to create a shell process.

Concurrent Downloads

As you can see, the T2000 was able to sustain about 83,000 concurrent downloads, and my limited dtrace skills tell me that thread-creation seemed to be the main limiting factor at that point, which is hardly surprising. For us, that number represents an upper limit on what the machine could handle when faced with a barrage of clients. Of course, no server should ever be allowed to get into that kind of insane territory, but it’s always good to know that there is plenty of headroom. More to the point, it means that availability at lower levels of concurrency is much higher. Compared to the 57,000 concurrent connections our Itanium box can handle, and the 27,000 our Xeon box can, it looks like the T2000 would be a very, very good choice of server for our load.

Latency vs concurrency
I would have liked to have been able to measure availability vs concurrency, but unfortunately our method of testing doesn’t really allow for this. Although we can sum the availabilities as seen by each client participating in the benchmark, this doesn’t always time-average correctly. In other words, if we used two client systems, and client A reported 90% availability and client B reported 80% availability, does that mean 85% availability overall, or 80%? Unfortunately, it doesn’t mean either. Simple averaging only works if the two clients’ runs overlapped perfectly in time; the less they overlap, the further the true figure can drift from the average. The real availability is somewhere between 80% and 85%, and it’s very hard to figure out where. If the client systems were identical in hardware terms, we could come close to solving the problem by firing off the benchmarks with the at command, but our systems aren’t all that close in terms of spec.
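Using the toy numbers from the paragraph above: pooling the raw request counts from the two clients gives the naive figure, while the true time-averaged availability depends on how the runs overlapped, and only lies somewhere in the [80%, 85%] range:

```shell
# Two clients, 100 requests each: A sees 90% availability, B sees 80%.
pooled=$(awk 'BEGIN { a_ok = 90; b_ok = 80; n = 100
                      printf "%.1f", 100 * (a_ok + b_ok) / (2 * n) }')
echo "naive pooled availability: ${pooled}%"   # 85.0%, the upper bound here
```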

Instead, what we can do is measure latency as it increases with concurrency, in each case taking the worst value from our benchmarking clients. Benchmarking from a single system shows that there is a very high degree of correlation between an increase in latency and a decrease in availability, so this measurement gives us a good idea of both.

Latency vs Concurrency

Overall, the T2000 performs very impressively. At very low numbers of concurrency, it actually has a higher latency than either of the Dell machines we tested, but these latencies are of the order of tens of milliseconds. In other words, the network latency makes a bigger difference in the overall scheme of things.

With no concurrency at all, the T2000 would exhibit a latency of 9 milliseconds, compared to the Itanium’s 1 millisecond (in fact, ab actually outputs 0, so it’s less than 1 millisecond), and at 1,000 concurrent requests the T2000 would have 48 milliseconds, compared to 12 milliseconds for the dual Itanium box. However, as we scaled up the concurrency, the latency numbers changed fairly rapidly in favour of the T2000. Due to the huge changes in scale, we’ve had to use a logarithmic graph, but at 50,000 concurrent downloads, our Itanium would take up to 38 seconds to respond to a client, compared to the T2000’s 26 seconds. At 83,000 downloads, which only the T2000 could manage, the latency had gone up to 57 seconds, but it still responded.

Overall, I think it’s fair to say that while the T2000 doesn’t seem to have ultra-low latency performance, it has much better scalability and provides much better availability as more and more connections are added. So again, overall, the T2000 is still the better webserver.

The Bad

I’m a bit reluctant to label these results “bad”, because they are in areas in which Sun have never claimed the machine will perform. The Niagara platform is architected for parallelism; it’s not supposed to give great performance for any single-threaded task. If you have a load which requires great performance for a single client, Sun have an array of other hardware they’d prefer to sell you instead. However, since some aspects of single-threaded performance do have a direct impact on webserver performance, I’ve included the relevant ones here.

Single-threaded I/O
As I’ve previously blogged, one of the first benchmarks we run on any machine is to determine how much I/O a single-threaded task can drive, and what the most efficient buffer size to achieve that is. There’s much more detail in the linked blog post, but the summary information can be easily graphed:

Maximum single-thread throughput

These results may be attributable in part to the relatively slow system disks that the T2000 ships with, and much better performance can probably be had with a faster disk setup. On the other hand, the performance Linux achieves is mainly due to the very aggressive VFS caching it performs. Unlike the Linux box, the T2000 produces the same throughput numbers whether it is the first or the tenth time it has read a file. Linux, on the other hand, takes much longer to serve a file the first time, but after that, it’s served from RAM.

It’s also useful to put these results in context: what they mean is that a single-threaded task, doing as pure and simple an I/O task as possible, can push 3.5 Gigabytes per second. The Niagara box comes with 4 Gigabit/sec of interfaces, so even a single-threaded task could fill them 7 times over. Still, if I were deploying a load with a large and very active database component, I would do some more extensive testing to ensure that any single-threaded I/O constraints had no overall effect.
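The headroom claim in numbers: 4 Gbit/sec of interfaces is 0.5 Gigabytes/sec, against 3.5 Gigabytes/sec of single-threaded I/O:

```shell
# 4 Gbit/s of NICs = 0.5 GB/s; one thread pushing 3.5 GB/s fills that 7 times
headroom=$(awk 'BEGIN { printf "%.0f", 3.5 / (4 / 8.0) }')
echo "${headroom}x headroom"   # 7x headroom
```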

Single-download throughput
After gathering the numbers on single-threaded I/O, and confirming that the T2000 could easily saturate its 4 Gigabit interfaces – at any level of concurrency high enough to generate that level of traffic – we decided to see if the I/O numbers carried over to a single download. To perform this benchmark we went back to basics, and used curl and wget to grab a 1 Gigabyte file repeatedly. To help the systems out, we increased the MTU to 9000 bytes and made sure the TCP window size was big enough to take the entire file straight away. We also monitored for packet loss during the tests (there was none).

Due to the way we handle the load-balancing of our network interfaces on the Linux boxes, which is per-flow, any single download is limited to 1Gigabit/second. Sure enough, wget reported a neat 123 MB/sec fairly reliably. Since the balancing was per-flow, it’s entirely possible the machine can actually ship faster downloads, and neither system seemed under any strain while doing this. With the T2000 on the other hand, we could push no more than 48 MB/sec, which is still a very respectable 384Mbit/sec.
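Converting those figures to line rates makes the comparison clearer:

```shell
# bytes/sec to bits/sec: the Linux boxes filled a gigabit, the T2000 didn't
wget_mbit=$((123 * 8))    # 984 Mbit/s -- effectively a full gigabit interface
t2000_mbit=$((48 * 8))    # 384 Mbit/s
echo "$wget_mbit vs $t2000_mbit Mbit/s"
```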

Single download performance

Apart from increasing the MTU and window size, we didn’t apply any Solaris-specific tunings to improve these numbers, so again, it’s possible that they under-represent the true possible performance. And once again, we really have to put these numbers into context. As a whole, the T2000 has no problem saturating its 4 Gigabit/sec of connectivity, and that’s what it’s designed for – parallelism. All our numbers mean is that if you wanted truly incredible performance for a single download, this probably isn’t the right architecture. Outside of where I work, and other high-speed research networks, I’m not aware of any place where high-speed, single-flow statistics really matter a whole lot, especially for HTTP. The network is usually the limiting factor anyway. I mean, how many people have jumboframe-capable multi-gig WANs?

The Ugly

Ok, so ugly is a bad choice of word. But like I said, this is a “showdown”. While testing the event MPM, we did manage to upset the Solaris kernel to the extent that it actually crashed;

panic[cpu21]/thread=300024a7020: BAD TRAP: type=31 rp=2a102c87720 addr=0 mmu_fsr=0
occurred in module "genunix" due to a NULL pointer dereference

httpd: trap type = 0x31
pid=652, pc=0x10fb4dc, sp=0x2a102c86fc1, tstate=0x4400001607, context=0x514
g1-g7: 0, 0, 12, 38, 0, 0, 300024a7020

Nice! I haven’t looked into this in detail yet, but it’s likely due to the unusual synchronisation semantics the event MPM features right now. The event MPM is marked as experimental, and if you’re not an Apache developer, you probably shouldn’t be running it. Still, the thread-handling code within the MPM all runs as a non-root user, so it really shouldn’t be able to cause the kernel to crash. Then again, it was handling about 30,000 requests at the time, with no accept mutex, which isn’t exactly within the normal range of expected behaviour for a userland application. Since switching to the worker MPM, we’ve had flawless performance and not a single crash.

The Conclusion
The T2000 is one very impressive piece of kit, and at a list price of around €15,000 ($16,995), it costs less than half the price of the dual Itanium we’ve been benchmarking it against (it’s also less than I can price up a comparable x86 box for – it seems to be the memory that does it). We may very well go with the platform for the next iteration of our mirror server.

The benchmarks we’ve run were all run with our own load in mind, but hopefully they’re still of some use to others. If you’re thinking about giving the platform a try, do run your own benchmarks though, don’t take our word for it. It’s always better to have these things validated and improved upon.

The Future
We’re not finished benchmarking just yet – we still have more planned! The Niagara box has some impressive SSL-offload features, and if we get a chance, we’d like to test those capabilities. We just need to get the hacked-up engine3-supporting versions of openssl and flood onto the box, which will involve a bit of research. Some of the Apache SpamAssassin guys may try running some SpamAssassin benchmarks on the machine too, which should be impressive, as they lend themselves to parallelisation very well. We’re also going to try and improve on our above tests, and I’ll keep blogging about the results as we manage to do that.

Rather tantalisingly, there’s a comment on Dan Kegel’s C10K page saying that “Doug Royer noted that he’d gotten 100,000 connections on Solaris 2.6 while he was working on the Sun calendar server”, but it doesn’t give any details of the hardware involved. Still, 100,000 connections, on 2.6! It gives me hope that with more tuning, the T2000 might be capable of scaling beyond the 83,000 we reached.

If I develop some more free time, I also hope to use the machine to instrument Apache httpd (and maybe apr) for dtrace. Do check out Matty’s mod_dtrace though, for a cool module which instruments all of the handlers.

In the meantime, you can check out all of my blog posts about the Niagara box through my new Niagara category. Mads is also keeping tabs on other benchmarks taking place within the ASF community.

The Cheeky Part!

I don’t know what the current status of Jonathan’s offer – that testers might be allowed to keep a server, at the discretion of the Niagara team – is, but we might as well give it a try.

Although we’re seriously considering the platform for the future, HEAnet doesn’t have a use for a Niagara box right now – but the other participants in our benchmarking efforts (whose results we hope to blog soon enough too) do: DCU’s networking society, RedBrick. RedBrick just celebrated 10 years as a networking society, and 5 years ago Sun donated a massive E450 to the society, on which we ran our 2,000-user shell server for 2 years.

We even pulled out all of the stops at the time, and had the Taoiseach (the Irish Prime Minister) turn out to launch the machine. I’m hoping we can convince Sun to donate the Niagara box to RedBrick, where they can use it for even more testing and benchmarking, as it really is an ideal machine for a shell environment: lots and lots of low-memory parallel tasks.

So if you thought this round-up was of any use, digg it, link to it, or mail it to your local Sun Niagara team member, and we’ll see if we can be useful enough to merit a donation!

Getting rid of errant HTTP requests

Posted on November 24, 2005, under apache, general, mirroring.

When you run a busy website, you’re bound to pick up a lot of wacky requests, and some downright broken clients. Annoying behaviour can range from repeated nuisance requests to a full-scale Denial of Service attack. Competent mail server administrators will be very familiar with protocol-level techniques to try and hinder these requests, but you don’t hear much about them within HTTP.

This is partly because, luckily, abuse is rarer in HTTP, and because not many people actually read their logs all that closely. On our mirror server, though, we get many, many millions of requests per day, and the bad boys all add up. We see broken browsers, broken mirror scripts, huge “wget -r” or lftp grabs of massive portions of our tree, and paths on our server hardcoded into some applications – which consider us the most reliable place to fetch an XSL file or check for the latest version of a perl package.

And we’re not alone: Dave Malone had to deal with a ridiculous NTP client which was using his webserver as a time source (yep, NTP via HTTP Date headers!), and it wasn’t even being polite enough to use a HEAD request – it was actually using “GET / HTTP/1.0\r\n\r\n”. We had to patch Apache to get around that one.

Over time, we’ve developed a few tactics for defeating these annoying requests and getting them off of our server as quickly as possible. The first trick, of course, is identifying them. Any decent log analysis package, or even just getting a regular “feel” for traffic through mod_status, will quickly identify any odd requests. If you see 250,000 requests for an XSL file, you know something is up. Likewise, if you observe that a particular host is constantly connected to you, it’s possible there’s something that needs looking at.

The next thing to look at is whether these requests are really a problem or not. In our case we can tolerate 250,000 requests for an XSL file; after all, we’re not short of bandwidth, which is the main resource being used. But it’s not something we would want to leave unchecked – we’re there to serve all sorts of content, not just XSL files. Huge “wget -r”s, or clients which poll far too often, are a concern for us though, because we optimise the server for the long downloads that make up most of our traffic. We don’t want lots and lots of small requests, and all of the context switching that entails: they slow down the responsiveness of the system.

Unfortunately, it’s pretty rare that these illegitimate requests come from single, fixed, IP addresses, and when they do more often than not that address is a proxy or a NAT box and serves many other legitimate requests, so just applying an ACL doesn’t always suit. Instead we use mod_setenvif and mod_rewrite to classify requests based on the more exact nature of the requests.
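As a sketch of what that classification can look like in httpd.conf (the User-Agent string here is hypothetical, and the paths are the illustrative ones used below, not rules we actually run):

```apache
# Tag requests from a known-broken client, then divert only that client
SetEnvIf User-Agent "SomeBrokenBot/1\.0" nuisance
RewriteEngine On
RewriteCond %{ENV:nuisance} =1
RewriteRule ^/foo/bar/blah\.xsl$ /special/broken.xsl [L]
```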

Once we’ve done that, how we deal with them falls into 4 different categories;

  1. Malformed output

    The first thing we tried was simply returning malformed data at the URL they expect.

    So if a client was persistently querying, say, /foo/bar/blah.xsl, we would return an XSL file that was crafted to be utterly broken and to contain lots of comments explaining why (though of course only to this client; other users get the original file). This is the same tactic Dave Malone employed to combat the bold NTP clients. We patched Apache so that a Date header set with mod_headers would work (ordinarily Apache doesn’t let anything else set a Date header) and returned Dec 31st 1999 to every such client.

  2. Teergrubbing

    For a lot of cases, malformed output works pretty well. But for others – typically automated processes long forgotten by their owners – it does little. For those, we next tried a variation on the “teergrubbing” used in some SMTP-level anti-spam defences. We just redirect to a CGI that does only a very little more than something like;

    #!/bin/sh
    echo "Content-type: text/plain"
    echo ""
    while true; do
        echo please contact
        sleep 10
    done

    That worked pretty well, and caught a lot of brokenness including people who had hard-coded us as a dtd source. Still left us with some annoying stragglers, though.

  3. CPU exhaustion

    The next trick we try – and one which is really pretty dirty – is to mount a Denial of Service attack of our own on the HTTP client. Typically these clients aren’t well written, and if they even have any loop-prevention, it’s basic in the extreme. We exploit this by trying to cause the client to loop and loop between successive HTTP requests. Now, it’s relatively easy for a client to detect a URI that redirects to itself, even with one or two levels of indirection, so instead we do;

    echo "Location: /thiscgi?$RANDOM"

    Now that’s what I call mean. Our system can easily take the load and bandwidth this causes; theirs cannot, and it can pretty quickly wear them out – soon enough, we see the requests die.

  4. Memory exhaustion

    Something else I’ve been playing with lately is using the Content-Length header, or the features of chunked encoding, to try and exhaust memory on these clients. Many of the dumb clients seem to allocate a single buffer for the entire response, especially when using chunked encoding. By trying to make the client allocate several Gigabytes of memory, they can occasionally be stopped dead in their tracks.
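The memory-exhaustion idea can be sketched as a tiny CGI (my illustration, not our production module; it assumes the broken client trusts Content-Length and preallocates a buffer for the whole body):

```shell
#!/bin/sh
# Advertise a huge body, then send almost nothing of it.
claimed=$((8 * 1024 * 1024 * 1024))   # claim an 8 Gigabyte response
echo "Content-type: application/octet-stream"
echo "Content-Length: $claimed"
echo ""
# ...then trickle out a few bytes and stall, as with the teergrube CGI above.
```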

In reality, we actually implement the above with tiny Apache modules rather than CGIs, for the sake of efficiency. Of course, all of these tactics are really only appropriate when you’re dealing with remote tasks that someone is inappropriately running, or with woefully outdated, harmful software; only if the requests are easily categorisable; and even then only if the CPU hit on the server results in a net drop in these requests over time.