I couldn't think of a clever title about a hydra. This is a dorky programming post. 

This weekend, I was tooling around with a couple of scripts to do some web benchmarking. Really simply, I just wanted to throw a bunch of requests at a site, see its performance, make a few changes, and do it again. I wrote it out in perl the first time, just shelling out a curl request (not even bothering to use LWP). It was the quick and dirty solution:

for ( 1..10 ) {
    # New host
    my $time = `/usr/bin/curl -o /dev/null -s -w %{time_total} $site/`;
    push( @curr_time, $time );

    # Old host
    $time = `/usr/bin/curl -o /dev/null -s -w %{time_total}  -H'host: $host' $ip{$dc}/$url/`;
    push( @old_time, $time)
}

The whole script (parsing the incoming data, running the curl requests, creating some aggregate data) took about 65 lines with comments.

It was definitely the quick solution to write. But it wasn’t the quick solution to run, as it took the better part of a day to finish running.

On a lark, I remembered that Ruby had a library called Typhoeus that was pretty much designed to do the exact task I was trying to do. And it does it with some concurrency using a queue called Hydra, to help me solve the “quick” side of things. I could have done this in perl with something like AnyEvent, but I figured why not give it a whirl in Ruby.

Took me a bit to get my head back around Ruby, but I was able to crank out the script in about 90 lines (with comments). My first pass through running it (with a 10 threads) it took a little over an hour to rip through the list of sites. In looking at the data, there were a bunch of requests that got back no result, which leads me to believe that maybe I was pushing the concurrency too high. So I dropped that down, and added a retry.

Remember, this is super quick and dirty. If this script proves fruitful, I’ll likely turn it into something reusable, but for now, the guts look like this:

hydra = Typhoeus::Hydra.new(max_concurrency: 5)
50.times do
    request = Typhoeus::Request::new(url)

    request.on_complete do |response|
        if response.code == 0
            puts "Got a 0 response for #{url}"
            unless retried[url] == 1
                puts "Retrying request"
                request = Typhoeus::Request::new(url)
                hydra.queue request
                retried[url] = 1
            end
        elsif response.code == 200
            results_file.write(data[site]["user"] + "," + site + 
                       response.total_time.to_s + "," + response.code.to_s + "\n")      
        end
    end

    puts "queueing request"
    hydra.queue request
end
hydra.run

Basically, we create 50 requests to the same url, queue them up, execute them (up to 5 at a time), and then I write the results out to a file. If I get back a response.code of 0 from the request, I retry it (but only once). If I get back a 200 response, I log it. Otherwise, I just move on.

I’m expecting with retries and the concurrency turned down to 5, that it’ll take a few hours to finish. Typhoeus, with its Hydra queue, is a pretty nifty little framework.