Digital Shakedowns

(This likely needs to be edited. Forgive any poor grammar, punctuation, or lapses in logic …)

If you’ve been paying attention to the tech press, you may have noticed an uptick in stories about DDoS (Distributed Denial of Service) attacks. A DDoS, in a nutshell, is an attack in which someone sends you more traffic/requests than your server or bandwidth provider can handle. It generally results in your servers going down, or your provider taking you down for the good of their other customers.

You should familiarize yourself with the landscape. Read a couple of articles and get a feel for the new wild, wild west.

DDoSes are being used as the digital equivalent of the old-school shakedown. “Hey, I wouldn’t want anything bad to happen to your site, so you should, you know, pay me to make sure nothing untoward happens.”

In the physical world, shakedowns are less common (I think … I don’t have hard facts for that) than they were in the old days. The risk is greater for the extorter. The person or business being extorted generally has technology available to record the attempt, law enforcement and the courts don’t look kindly on extortionists, and since you have to be physically present to commit the shakedown, it’s a lot easier to catch someone in the act.

(I’m not claiming extortion and shakedowns don’t happen any more. I just think they’re probably less common—in the US—than they used to be.)

That’s not true on the internet. With cheaply available botnets, ISPs turning a blind eye in exchange for the marginal dollars, and a global internet that includes countries law enforcement can’t easily reach into, the internet has become a goldmine for extortion.

There is a solution to this. It comes in two parts, and both of those parts will cost large internet providers money. But, like the music industry with MP3s, these ISPs are going to have to embrace the new cost of doing business, or they’ll slowly watch their networks turn into a barren ghetto where no legitimate business will want to put its servers.

Step 1 of the solution is an ISP crackdown. The vast majority of the computers used in attacks are compromised PCs (often in China, where they’re running a pirated or hacked version of Windows). ISPs need to drop or throttle service the moment someone’s computer shows signs of being used in an attack. This will hurt ISPs. They will get more support calls, deal with angry customers, and have to help customers get cleaned up. But if they don’t do it, they’re going to run the risk of getting blocked by other service providers. If your ISP is a constant source of outbound attacks, other providers will drop your packets, and then you’ll have lots of angry customers calling to find out why they can’t get to netflix.com or espn.com.

Beyond just normal ISPs, VPS/colo/dedicated/cloud providers need to crack down on their customers. The biggest spam networks in the world are all server providers who aren’t cracking down on their outbound traffic—because it makes them money. It’s not just spam, though; these same servers and networks are often used for DNS or SNMP amplification attacks. Like home ISPs, network providers should simply start dropping traffic coming from these providers until they clean up their acts. Nothing will speak louder here than money.

There’s a downside to this tactic: internet users in places like China, where the internet is one of the few tools citizens have to fight for democracy, are going to be disproportionately impacted. But, while I’m not a free marketer, this is a place where the free market could win. If a good ISP in China or Africa or some other impacted area were to provide a well-regulated internet (not from a content perspective, but from an outbound-attack perspective), they wouldn’t be blocked, and customers would flock to them as the only provider through which you could still see Twitter or YouTube.

Step 2 is for transit providers (the folks providing bandwidth to your favorite site on the internet, in essence) to stop looking at DDoS mitigation as a profit center and start looking at it as a cost of doing business. If a provider is simply charging customers for DDoS mitigation, or worse, not offering it at all, they are rapidly going to be at a competitive disadvantage. Small businesses, especially nascent or aspirational ones, cannot afford to pay for a big DDoS mitigation solution. Some provider is going to offer DDoS mitigation as a feature of their service, and they are going to suck up a good bit of the market. Once that happens, transit providers will have to offer a minimum level of mitigation as part of their standard service.

There is one last thing that has to happen to make these digital shakedowns a thing of the past (or at least closer to a thing of the past). Someone is going to need to go to jail. It might end up being this 17-year-old kid. And, honestly, it should be. He caused a huge disruption to the internet, potentially disrupted a significant amount of ecommerce, and wasted many, many days of other people’s work. It’ll only take a few examples before people realize there’s a much bigger risk in DDoSing someone.

And then we’ll be able to move digital shakedowns into the same category as physical shakedowns. Something from the “good old days.”

Unintentionally Eating Some Delicious Cookies

I use a cool little web app called ThinkUp to keep track of the stuff I post to Twitter and Facebook. I use the self-hosted version on my own server, and it’s been running for a year or so without a problem.

This weekend, I went to log in to see if ThinkUp would show me anything interesting. Except I couldn’t log in. Every time I tried, it would just kick me back to the login screen. Clearly something had gone wrong. I watched the login requests via Developer Tools in Safari and Chrome and noticed that I was not getting a PHP session cookie back. That’s certainly odd—setting a session cookie is pretty straightforward, and I’ve never seen it fail.

As is typical in this sort of issue, I debugged it ass backwards. I spent an hour or so writing test scripts, changing permissions on session directories, and changing session settings before realizing I was debugging things entirely wrong.

My stack looks something like this:

nginx -> varnish -> apache2

I realized that I should start by checking whether requests to Apache2 were getting the cookie headers back. I ran a quick curl command and, sure enough, the cookie headers were there when talking directly to Apache2. Logically, I then ran the same curl command, changing it to talk to Varnish. Sure enough, the cookie headers were gone.
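The check itself was nothing fancy, roughly along these lines (the ports and the login URL here are placeholders, not my actual setup):

# Hit Apache2 directly and dump the response headers, looking for the session cookie
curl -s -D - -o /dev/null http://localhost:8080/session/login | grep -i set-cookie

# Same request through Varnish; if the header vanishes here, Varnish is eating it
curl -s -D - -o /dev/null http://localhost:6081/session/login | grep -i set-cookie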

Finally, I’d figured out where my cookies were getting eaten (haw haw).

Diving into the Varnish config, it was pretty quickly obvious what had happened. When adding Varnish caching to support this here blog, I added this line,

unset beresp.http.set-cookie;

which basically says “get rid of the cookie header we’re sending back to the user,” which allows us to cache more stuff. Of course, that rule was being applied far too liberally, dropping the PHP session cookie and making it so I couldn’t log in. A couple of tweaks (roughly the shape sketched below) and a restart later, and all was well.
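The tweak was to make the cookie-stripping conditional instead of blanket. Something like this, in Varnish 3 style VCL, where the path and the cookie-name check are stand-ins rather than my exact config:

sub vcl_fetch {
    # Only strip cookies for the stuff we actually want cached (the blog),
    # and leave PHP session cookies alone for the apps that need to log in
    if (req.url !~ "^/thinkup" && beresp.http.set-cookie !~ "PHPSESSID") {
        unset beresp.http.set-cookie;
    }
}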

This sort of thing happens to me somewhat frequently. I muck around with some settings on my server, and everything works great for my blog or my static site, but I forget about the other things I have running. A few weeks later, I notice they’re broken, and by then I have no idea why. It’s a pretty good case for using a tool like doing to log the things I do (that aren’t necessarily driven by my OmniFocus to-do list), so that I don’t spend hours debugging my self-inflicted problems.

I couldn’t think of a clever title about a hydra. This is a dorky programming post.

This weekend, I was tooling around with a couple of scripts to do some web benchmarking. Really simply, I just wanted to throw a bunch of requests at a site, see its performance, make a few changes, and do it again. I wrote it out in perl the first time, just shelling out to curl (not even bothering to use LWP). It was the quick and dirty solution:

for ( 1..10 ) {
    # New host
    my $time = `/usr/bin/curl -o /dev/null -s -w %{time_total} $site/`;
    push( @curr_time, $time );

    # Old host
    $time = `/usr/bin/curl -o /dev/null -s -w %{time_total} -H'host: $host' $ip{$dc}/$url/`;
    push( @old_time, $time );
}

The whole script (parsing the incoming data, running the curl requests, creating some aggregate data) took about 65 lines with comments.

It was definitely the quick solution to write. But it wasn’t the quick solution to run, as it took the better part of a day to finish running.

On a lark, I remembered that Ruby has a library called Typhoeus that is pretty much designed to do exactly what I was trying to do. And it does it with some concurrency, using a queue called Hydra, which helps with the “quick to run” side of things. I could have done this in perl with something like AnyEvent, but I figured, why not give it a whirl in Ruby.

It took me a bit to get my head back around Ruby, but I was able to crank out the script in about 90 lines (with comments). My first pass at running it (with 10 concurrent requests) took a little over an hour to rip through the list of sites. Looking at the data, there were a bunch of requests that got back no result, which leads me to believe I was pushing the concurrency too high. So I dropped it down and added a retry.

Remember, this is super quick and dirty. If this script proves fruitful, I’ll likely turn it into something reusable, but for now, the guts look like this:

hydra = Typhoeus::Hydra.new(max_concurrency: 5)
50.times do
    request = Typhoeus::Request::new(url)

    request.on_complete do |response|
        if response.code == 0
            puts "Got a 0 response for #{url}"
            unless retried[url] == 1
                puts "Retrying request"
                request = Typhoeus::Request::new(url)
                hydra.queue request
                retried[url] = 1
            end
        elsif response.code == 200
            results_file.write(data[site]["user"] + "," + site + "," +
                       response.total_time.to_s + "," + response.code.to_s + "\n")
        end
    end

    puts "queueing request"
    hydra.queue request
end
hydra.run

Basically, we create 50 requests to the same URL, queue them up, execute them (up to 5 at a time), and write the results out to a file. If I get back a response.code of 0 from a request, I retry it (but only once). If I get back a 200 response, I log it. Otherwise, I just move on.

I’m expecting that, with retries and the concurrency turned down to 5, it’ll take a few hours to finish. Typhoeus, with its Hydra queue, is a pretty nifty little framework.

Don’t Look a Gift Gazelle in the Mouth

The iPhone 5S came out at the perfect time, as I had recently dropped my 4S and cracked the screen. I was well past my upgrade date, so the upgrade would be reasonably inexpensive (as iPhones go). And, to top it off, Gazelle had given me an estimate of $80 for my iPhone 4S with a cracked screen.

I went and bought my 5S, got my little Gazelle box in the mail, and, as instructed (really, it’s right in the instructions …), I dropped the box off at the nearest post box. It’s right at the end of my street—pretty convenient.

And that was the last that was ever seen or heard from my iPhone.

After a couple of weeks had gone by, I contacted Gazelle support to ask if they had any news. They pointed me to my local post office. My local post office had no record of ever receiving the package, so they told me to wait a while longer, then file an insurance claim (since Gazelle’s packages are insured!).

So I did that. I’ve talked to Gazelle, and the national USPS, and my local USPS, and round and round.

Yesterday, I found out my insurance claim was rejected by the government.
Today, I found out Gazelle can’t help me because there’s no record of the tracking.

Well, shit.

At this point, I don’t care. The insurance claim was $50, and honestly, I’ve spent more time than it’s worth trying to get my $50 back to prove a point.

I give up, you win, forces of corporate and government inertia. And you win, especially you, dishonest postal service employee who stole my broken phone. [1]

What are the lessons here?

  • Gazelle shouldn’t make their boxes so conspicuous. Had the box not been so clearly a Gazelle shipping container, it probably would have been ignored.
  • Gazelle absolutely shouldn’t tell you to drop your package in a post box when the insurance won’t really apply until it’s tracked by the USPS. Hand it to a postal service agent in person.
  • I’ll defend the USPS in most cases, but jesus, it’s incredibly obvious what happened and they just don’t seem to give a shit.

In the end, I’m not sure if I’ll use Gazelle again. Maybe I will, but I feel like I got a bit of the runaround in this process. They must have dealt with a lost or stolen package in this scenario before. And the USPS, well, I like my local USPS, but I’ll be handing them things in person and double-checking tracking from now on, because there’s a scoundrel in their midst.

I do like my new iPhone 5S, though …


  1. Because that’s clearly what happened. I dropped it in the post box. The next day, some postal employee picked it up, saw it was from Gazelle, and just pocketed the box. That’s it. That, or it’s still sitting there, at the bottom of the post box.


(Un)Mistaken Identity

As the computer dork in the house, I was given the case of “I just bought this book on Amazon, but it’s not showing up in my Kindle app” by Katie. I grabbed her computer and poked around in her account. Even though she had purchased books before, none of those showed up in her order history. Her new book was in her account, but it would not show up on her Kindle.

So, I did what any self-respecting dork would do. I logged out and logged back in.

Lo! and behold, the new book happily downloaded to the Kindle.

But the old books were now gone.

If you’re smarter than me, you may have figured out what’s going on by now [1]. But I was still stumped. So I got on a chat with Amazon support. After a couple of minutes with two helpful chat agents, we figured out what it was.

Somehow, Katie had two Amazon accounts with the same email address, that differed only by password.

Well, hell. With some help from the Amazon folks, we got the books all onto one account, got everything set up, and all was back to normal.

But, dear god, in what world would you let the same email address have multiple accounts? How much bad stuff can happen because of that? For one, the thing we just ran into: content getting split across multiple accounts.

What happens if I want to change my password? Which account does it change? Accounts on Amazon seem to be unique by email plus password, which is such a weird thing. Turns out, this is a pretty common problem.

From a technical/product perspective, this becomes one of those interesting questions: “At what point is the small number of users taking advantage of this feature outweighed by the customer confusion it causes?” My guess is that there cannot be enough people intentionally relying on this behavior to make up for the negative impact it causes.

Same email, different accounts, poor use of email as identity.


  1. If you guessed “they’re on two separate accounts”, you win. Bonus points if you realized it was two accounts with the same email address.


When Caching Bites Back

We have an application on our site that was rewritten a few years back by a developer who is no longer with the company. He attempted to do some “smart” caching things to make it fast, but I think he had a fundamental lack of understanding of how caching, or at least memcached, works.

Memcached is a really nifty, stable, memory-based key-value store. The most common use for it is caching the results of expensive operations. Let’s say you have some data you pull from a database that doesn’t change frequently. You’d cache it in memcached for some period of time so that you don’t have to hit the database on every request.

A couple of things to note about memcached. Most folks run it on a number of boxes on the network, so you still have to go across the network to get the data. [1] Memcached also, by default, has a 1MB limit on the objects/data you store in it. [2] Store lots of stuff in it, keep it in smaller objects (that you don’t mind throwing across the network), and you’ll see a pretty nice performance boost.
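The happy path looks something like this (a sketch using the dalli Ruby gem; the server names, keys, and fake slow lookup are all made up for illustration):

require 'dalli'

# memcached boxes out on the network (hostnames here are made up)
cache = Dalli::Client.new(['cache1.example.com:11211', 'cache2.example.com:11211'])

# Stand-in for the expensive database query we'd rather not repeat
def expensive_lookup(id)
  sleep 2
  { 'id' => id, 'name' => "item #{id}" }
end

def fetch_item(cache, id)
  key = "item:#{id}"            # one small key per item
  item = cache.get(key)
  if item.nil?                  # cache miss: do the slow thing once
    item = expensive_lookup(id)
    cache.set(key, item, 300)   # and cache it for five minutes
  end
  item
end

puts fetch_item(cache, 42).inspect   # slow the first time, fast after that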

Unless … someone decides to not cache little things. And instead caches a big thing.

We started to notice some degradation in performance over the past few months. It finally got bad enough that I had to take a look. It only took a little bit of debugging to determine that the way the caching was implemented wasn’t helping us: it was actively hurting us. Rather than caching entries individually, it was loading up an entire set of the data and trying to cache one massive chunk. Which, since it was larger than the 1MB limit, would fail.

You’d end up with something like this:

  • Hey, do I have this item in the cache?
  • Nope, let’s generate the giant object so we can cache it
  • Send it to the server to cache it
  • Nope, it’s too big, can’t cache it
  • Oh well, onto the next item … do I have it in the cache?

Turns out, this wasn’t just impacting performance. It was hammering our network.

[Screenshot: network traffic graph, December 12, 2013]

The top of that graph is about 400Mb/s. The drop-off is when we rolled out the change to fix the caching (caching individual elements rather than the entire object). It was, nearly instantaneously, a 250Mb/s drop in network traffic.
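In rough terms, the change looked like this (a sketch with stand-in data and the dalli gem, not our actual application code):

require 'dalli'

cache = Dalli::Client.new('localhost:11211')
rows  = (1..1_000).map { |i| { 'id' => i, 'name' => "row #{i}" } }   # stand-in for the real data set

# Before: cache the whole set as one giant value. With the real data, the
# serialized blob was bigger than memcached's default 1MB item limit, so
# the server rejected it and nothing ever got cached.
cache.set('all_rows', rows)

# After: one small value per row, each comfortably under the limit.
rows.each { |row| cache.set("row:#{row['id']}", row, 300) }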

The lesson here? Know how to use your cache layer.


  1. You can run it locally. It’s super fast if you do. But, if you run it locally, you can’t share the data across servers. It all depends on your use case.


  2. That 1MB limit is changeable.


More Detail on Dropbox Backup

As mentioned in yesterday’s post, I’ve taken to backing up the important guts of my server via Dropbox. It turned out to be very easy, and gives me the added benefit of not having to do any sort of versioning: I get that for free with Dropbox. Plus, it seamlessly integrates with my existing backup routines.

So, how do I do it? It’s honestly very simple.

First, I generate some backups. I run these out of cron every morning.

mysqldump -u root wordpress | gzip -c > /path/to/backup/blogdb.sql.gz
tar czf /path/to/backup/apache2.tgz /etc/apache2/*

I do this for my web databases, any important config files, crontabs, etc. (The actual sites are already backed up, since I develop them locally.)
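For what it’s worth, the cron side of this is just a few entries along these lines (the times are arbitrary, and dropbox_backup.rb is a made-up name for the upload script described next):

# m h dom mon dow  command
30 4 * * * mysqldump -u root wordpress | gzip -c > /path/to/backup/blogdb.sql.gz
35 4 * * * tar czf /path/to/backup/apache2.tgz /etc/apache2/*
45 4 * * * /usr/bin/ruby /path/to/backup/dropbox_backup.rb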

Once they’re dropped off in the backup location, it’s just a matter of having the script come along and copy them to Dropbox. I chose to write it in Ruby. Honestly, my code is 90% of what you find in Dropbox’s example. Here it is, in all of its glory:

require 'dropbox_sdk'

# Dropbox app key, secret, token
APP_KEY = 'get your own'
APP_SECRET = 'get your own'
ACCESS_TOKEN = 'get your own'
BACKUP_DIR = '/path/to/backup/'

# Open our Dropbox Client
client = DropboxClient.new(ACCESS_TOKEN)

# Our hardcoded list of files
files = ['list of files to backup']

# For each file, let's upload it
files.each do |filename|
    file = File.new(BACKUP_DIR + filename)

    # Send file to dropbox -- last param tells us to overwrite the previous one
    response = client.put_file('/' + filename, file, true) 
end

That’s it. I don’t really do any error checking (I should, and probably will some day, but I don’t today). I should probably store the key/secret/token somewhere else, but since my app was created with access to only one Dropbox folder, and I can revoke that access at any time, it’s not too much of a risk. Eventually, when I get really ambitious, I’ll have the files list be dynamic instead of static. But for now, it’s less than 20 lines of code to back up my important files.
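When that day comes, the dynamic version is probably a one-liner along these lines (a sketch; it assumes everything sitting in the backup directory should be uploaded):

# Upload whatever is in the backup directory instead of a hardcoded list
files = Dir.glob(File.join(BACKUP_DIR, '*')).map { |path| File.basename(path) }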

That’s good enough for me.