I’ve meant to use Skitch for a while, and finally got around to using it today. It’s pretty cool. For instance, you can do a search for “phpbb is a piece” and find some fun links:

google results

Then, if you want, you can do some fun stuff to it. Like add some comments:

google results with some color

Tada!

Awesome. Takes about 2 seconds.

 

I haven’t been posting very much recently because I’ve sadly been working my tail off. I very much enjoy what I do, but there are just weeks (months … years …) where it’s just a non-stop grind to get everything done. Recently, it’s been working on launching a new VPS platform, but that was interrupted by a breakdown of our customer MySQL infrastructure.

Our setup is a bit different than most. Since we don’t run a typical box-by-box web hosting architecture, we don’t simply have a thousand boxes with each one running Apache and MySQL. Instead, we have a really robust pooled architecture for everything except MySQL, which just isn’t something that’s very poolable. For MySQL, we’ve got some big boxes with a bunch of memory and some fast disks that handle our MySQL load. But, slowly over time, performance had degraded.

When you’d hop onto a box and look at the transactions per second or number of queries, nothing looked terribly out of the ordinary. Yet the load would be huge, and performance would be pretty bad. Our team brought up some new boxes, shuffled customers between them to even the load out, moved our backup processing onto the hot spare replicated boxes (to reduce even more load on the disks) and things were better.

But they weren’t better enough. (I know, awesome English, eh?)

We started just watching the processlist, looking for the culprit. And after about 5 minutes, it was obvious.

Motherfrakking phpBB spam.

phpBB is written in a really shitty way. Not the forum part, necessarily, which works when it’s not being exploited. But the search part is awful. For every word in every post (unless you’ve got a smart list of words to ignore), it throws entries in some big tables so that when you search for “foobar”, it can tell you every post that contains that word. That’s a fine design for a small board with a tiny amount of traffic. But as your board grows, even legitimately, that table can become hundreds of thousands of rows long (or more!) and inserts and selects can become extremely slow.
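Here’s a rough sketch of that word-per-post idea, just to show why the table balloons. This is not phpBB’s actual code: `index_post`, `word_rows` and the whitespace tokenizing are all made up for illustration.

```python
def index_post(word_rows, post_id, text, stop_words=frozenset()):
    """Append one (word, post_id) row per distinct, non-ignored word in a post.

    word_rows stands in for the big search table (hypothetical, not phpBB's schema).
    Returns how many rows this one post added.
    """
    words = {w for w in text.lower().split() if w not in stop_words}
    word_rows.extend((w, post_id) for w in sorted(words))
    return len(words)


rows = []
index_post(rows, 1, "foobar is a foobar", stop_words={"is", "a"})
# A spam post stuffed with a wordlist adds one row per distinct word in it.
index_post(rows, 2, " ".join(f"spamword{i}" for i in range(1000)))
```

One honest post adds a handful of rows. One wordlist-stuffed spam post adds a thousand, and every one of those is an insert into the same table that searches have to read.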

It’s ten times worse when the only thing putting content into it is spammers who are just flooding it with huge wordlists multiple times per second. Now, all of a sudden, you’ve got this single board showing up in your processlist five times, with each entry running for 30, 40, 50 seconds. One of those boards can cause some extra load on a server.

When you’ve got ten or twenty, it can bring the server to a halt. Literally. I popped onto a server where the load was near 10. I turned off 40 phpBB boards getting spammed. The load dropped to less than 1 and stayed there.

After some quick thinking, we identified a bunch of boards that were getting spammed and turned them off. One of our engineers built a brilliant little monitoring script that can identify phpBB boards in the processlist and shut them off if they show up at a high enough frequency with those awful queries (you know them when you see them, believe me). All told, we’ve turned off maybe 12k boards in the past 2 weeks, and haven’t heard a single complaint.
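I can’t share the real script, but the detection half might look something like this minimal sketch. Assumptions: the rows are dicts shaped like `SHOW FULL PROCESSLIST` output (the `db`, `Time` and `Info` columns), the boards use phpBB’s default search table names, and the function name and thresholds are invented for the example. Actually shutting a board off is left out.

```python
from collections import Counter

# Assumption: phpBB's default search tables. The "phpbb_" prefix is
# configurable per install, so only the suffix is matched.
SEARCH_TABLES = ("search_wordlist", "search_wordmatch")


def find_spammed_boards(processlist, min_seconds=30, min_hits=3):
    """Return databases with at least min_hits long-running search-table queries."""
    hits = Counter()
    for row in processlist:
        query = (row.get("Info") or "").lower()  # Info is NULL for idle threads
        seconds = row.get("Time") or 0
        if seconds >= min_seconds and any(t in query for t in SEARCH_TABLES):
            hits[row["db"]] += 1
    return sorted(db for db, count in hits.items() if count >= min_hits)
```

Run it against a fresh processlist snapshot every so often and feed the result to whatever disables a site. The thresholds here are guesses; tune them against what your own processlist looks like on a bad day.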

Why? Because these boards were set up by users who then forgot about them. And there they sat, for months or years, collecting spam, draining resources. Basic negligence on the part of users caused a huge server load, which then caused those same customers to call in and complain.

It feels like we’ve got this mostly under control, except for the user side. We need to figure out a way to get people to realize that the things they install on their site can be exploited and lead to security issues (on their site), performance issues (for everyone), and can suck up the resources they pay for.

But yeah, it sucks when you work about an 80 hour week because people forget about their phpBB install, and the folks who wrote phpBB decided that they’d build the most stupidly designed search setup of all time.

So, when on April 8th my Twitter looked like this:
php is feces

now you know why.

 

I figured I’d log what I did when I upgraded my blog to WordPress 2.5.

First, I disabled some plugins I figured I wouldn’t necessarily need post-upgrade. The two I disabled were Kramer, which grabs Technorati links back to the blog (newly built into the WordPress dashboard) and SpotMilk, a customized dashboard (which I wasn’t even sure would work).

Then I upgraded.

So far, so good.

Poking around the settings, I decided to turn on the global Gravatar usage, rather than using the Gravatar plugin. That’s a great idea, except my theme doesn’t come with Gravatar support, so I’ll need to use the built-in functions.

Then my MacBook crashed for the second time today (I think it’s Twitterrific, but we’ll see). Awesomely, MarsEdit earned its keep by having autosaved my work. So back to it.

After poking around, I got the built-in functionality to work, but since it returns an entire image tag, and not just the URL to the avatar image, it’s actually less useful to me than the plugin is. I turned the plugin back on. Good enough.

Next, I noticed the Mowser plugin had a new version. Perfect chance to try the new built-in plugin updating. Clicked the link and that was pretty much it: the plugin was up to date. Nifty. You can see the Mowser-fied version of my site here. Not perfect, but pretty good work from a two-person company.

Took this as the opportunity to clean up my plugins page. Gone are the aforementioned Kramer and SpotMilk, along with the Hello Dolly and WP-flv plugins I’d installed a while ago.

Now, I wanted to turn some of my hard-coded plugin links into widget usage, to make switching themes easier. I started adding widgets to my left sidebar, expecting that I’d need to go disable them in the code. Nope! Nice, it must use a different bit of sidebar code when you use widgets. Very cool. This allows me to dump a couple more plugins (MyNetflix and a Last.fm one).

Also, turn off WP-Cache when you’re testing, or you’ll be annoyed out of your mind.

One missing widget: I was previously using the Google Shared Items widget, but now I’ll just use the RSS feed for it. Let’s see how that looks … ugly. But, good enough for now. Maybe there’s a WordPress widget for it. Wow, I’m digging the widgets. They make my life a whole lot easier. I should have tried this a long time ago. Even added a little About Me text widget.

Turning WP-Cache back on.

Finally, testing to see if MarsEdit can still post … huzzah! Success. And with that, I’m done.

 

I’ve been meaning for a while to sort of document how I get stuff done at work. It was just over a year ago that I bought my MacBook Pro. Within a week or so, I started using it at work. Probably within the first month, I’d completely moved to my MacBook as my sole work machine. After a year, and particularly since the upgrade to Leopard, I’ve kind of worked out how I get stuff done.

Let’s start with my environment.

IMG_0337
The front wall of my office with pictures taken by a former co-worker. And, of course, the famous “Dwight” flasher flyer from “The Office”

IMG_0334
The shelf behind me containing random stuff I’ve gotten from eating kid’s meals and ice cream sundaes. And some Yankee Swap gifts. Oh, and I have some windows. That makes my life nicer.

IMG_0332
My white board and busted ass bookshelf. And my cool VT light switch cover from Matt, and some random stuff I’ve collected and hung up.

IMG_0333
The view of where I sit. I used to use that big ass monitor to do a dual-monitor display, but since I’ve been moving around so much each day, now it’s just there to keep people from having a good look at me.

IMG_0335
Finally, the MacBook Pro, my Motorola Q, my 30GB 5G iPod, my noise-canceling headphones, and my phone that I don’t ever answer or use. And yeah, that’s Win2k running in Parallels. More on that in a bit.

So that’s where I do my work.

My Mac is set up in a very particular way. The upgrade to Leopard with Spaces has made my life considerably easier. It’s probably easiest to roll through how my Spaces are set up.

Space 1
This is where I use my browser, which is currently Firefox 3 Beta 3, and sometimes Safari 3.

Also running on this space are my “chat” clients. We use Jabber at work, which works nicely with Adium. I’ve also got Twitterrific running on this space.

Space 2
Here’s where my Terminal lives, which is just the default Leopard terminal. Tabbed Terminals make me happy, particularly once I set the tab-switching hot keys to Command+Left and Command+Right.

It’s all command line and vim and mysql. Good times.

Space 3
Space 3 is where iCal and Mail live. Mail is just downloading my mail from Gmail. iCal is doing some cool stuff. I have most of my life in Google Calendar. iCal subscribes to my calendar feeds from GCal (including my work Outlook calendar–more on that in a second).

With all of my stuff in iCal, I then use the Missing Sync for Windows Mobile to sync my calendars to the previously mentioned Motorola Q (which also connects to my work Exchange server, so it’s almost as good as a Blackberry).

Space 4
It’s the Windows space! I’ve got Win2k (don’t ask, I had a license lying around) running in Parallels, in Full Screen mode. Parallels runs pretty much just so I can run Outlook (for work email and calendaring) and so I can occasionally test stuff in IE6.

My Outlook runs a plugin called SyncMyCal to sync my Outlook calendar off to Google Calendar (which then gets sync’d down to iCal, as previously described).

Other software that occasionally comes in handy:

  • NeoOffice (though it’s slow and bulky and I’d switch if there was a viable alternative)
  • iTunes (obvs)
  • MarsEdit (for doing this sort of stuff)

That’s how I get my work done. Anything else I should be using?

 

While I sit here and wait for myisamchk to finish and tell me that the table that various folks have spent the better part of the day trying to restore is either healthy or once again dead (how’s that for a run-on sentence), I wanted to dump out some of the things we’ve done to try to make our MySQL backend scale. It’s not been pretty, but given that it’s strung together with some Perl, some MySQL, and a bunch of paper clips, I think the folks around me have proven themselves brilliant (I just sit around and pretend to know what’s going on).

Oh, and this is all without a cluster. That’s probably the next step. And “Oh: part 2,” myisamchk finished checking 28 million rows. All is good. I’ve copied the data over to the main server and brought it up and everything is happy. Back to MySQL scalability …

First issue: We’ve got too much data

This one was easy to solve. We got rid of it. Sort of. We started archiving off data by age once we no longer needed it. Old support incidents, logging, anything that had a timestamp that we aren’t looking at gets sliced off into an archive so that the main tables can be as tidy and fast as we can get them. Which for us is like 8GB of data and not fast at all. But it’s better than 12GB of data.
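The slicing itself is nothing fancy. Here’s a minimal sketch of the copy-then-delete idea, using SQLite from Python’s standard library so it runs anywhere. Ours is MySQL, and the function, table and column names below are invented for the example.

```python
import sqlite3


def archive_old_rows(conn, table, archive_table, ts_column, cutoff):
    """Move rows with ts_column older than cutoff into archive_table.

    Table and column names are interpolated straight into the SQL, so they
    must come from trusted code, never from user input.
    Returns the number of rows removed from the main table.
    """
    with conn:  # one transaction: the copy and the delete commit together
        conn.execute(
            f"INSERT INTO {archive_table} SELECT * FROM {table} WHERE {ts_column} < ?",
            (cutoff,),
        )
        cur = conn.execute(f"DELETE FROM {table} WHERE {ts_column} < ?", (cutoff,))
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE incidents (id INTEGER, created TEXT)")
conn.execute("CREATE TABLE incidents_archive (id INTEGER, created TEXT)")
conn.executemany(
    "INSERT INTO incidents VALUES (?, ?)",
    [(1, "2006-01-01"), (2, "2007-06-01"), (3, "2008-03-01")],
)
moved = archive_old_rows(conn, "incidents", "incidents_archive", "created", "2008-01-01")
```

The archive table has to have the same columns as the main one for the `SELECT *` copy to work. On an engine without transactions you’d want to verify the copy before running the delete.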

Second Issue: We’ve got too many connections

When you’re small(ish), it makes sense to throw a bunch of dbs on the same server. As you grow, those connections start to swamp MySQL. MySQL starts to get all panicked, and it doesn’t know how to handle all of the people asking for data, so it starts to get sloppy about closing old handles. Then it’s basically like thermal runaway in a transistor. The server can’t close old connections, new ones open up, adding more overhead, and all of a sudden your nice server has 5000 open connections and is hosed. Again, this was a pretty easy one to fix. Bring up a new box, move some databases to it, and hope that you’ve built your code layer to make that swap pretty easy (ours was). Presto. Now both of your servers are happy.

Third Issue: We lock up the damn tables all the time

We’ve got a lot of customers who are constantly accessing their sites. We’ve got nearly 1000 support agents across the globe who are using our tools to look at customer configuration to make sure there are no issues. This puts a whole chunk of load and repetitive queries on the database. That’s easily handled.

Except when you add in a bunch of data updates. Agents, customers, new signups adding and editing data in the database. All of a sudden those hundred pending SELECT statements are stuck because an UPDATE came in while one big SELECT was holding the table, and everything behind it has to wait. Now you’ve got a bunch of web users who think your stuff is slow and/or broken. We’ve tried to attack this in a few ways:

  1. Fix your queries: We watch our slow queries and try to make them faster. We look at our most often called queries and try to make them faster. Sounds simple, but bad queries are the biggest cause of problems.
  2. Add indexes: This goes with “fixing” queries. Add indexes and make sure your queries use them.
  3. Perform fewer queries: Can you cache your data? Can you make fewer queries and do more in your language of choice? Can you make your users smarter (maybe without them knowing) about when they need to request data? Do it. The fewer queries you have, the more likely you won’t lock things up.
  4. Split your reads and your writes: If you can split your reads and writes at the code layer, then you can shuttle reads off to one (or more) boxes and writes off to the primary box, and you should lock up a lot less. We accomplished this by having a couple of boxes replicate the main database, and having one of our smart engineers subclass Perl’s DBI to look for SELECT statements and swap the database handle over to the read replica. It helps more than you might think (but it’s not a silver bullet).
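The read/write split in the last item is simple enough to sketch. This is not our DBI subclass, just the routing idea in Python with made-up names (`SplitRouter`, `connection_for`), and it assumes you already have connection objects for the primary and each replica.

```python
class SplitRouter:
    """Hand out a replica for SELECT statements and the primary for everything else."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = list(replicas)
        self._next = 0  # round-robin position across replicas

    def connection_for(self, sql):
        words = sql.lstrip().split(None, 1)
        is_read = bool(words) and words[0].upper() == "SELECT"
        # Anything that is not clearly a read goes to the primary, as does
        # everything when no replicas are configured.
        if not is_read or not self.replicas:
            return self.primary
        conn = self.replicas[self._next % len(self.replicas)]
        self._next += 1
        return conn
```

Looking only at the first word keeps it dumb on purpose: when in doubt, the query goes to the primary. Replicas lag behind the primary, so a read that has to see a write you just made should skip the router and go to the primary directly.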

Most of this sounds like common sense. It is. But it still matters. We’re trying to do a lot with a little, and every ounce of performance you can squeeze out matters, when your users are super demanding and will use any slowness as an excuse.

There are some other things we should and probably will try:

  • Denormalization to bring data back together and cut down on costly joins
  • Sharding to split our data up into smaller chunks and thus cut down on long table scans and huge indexes
  • A real MySQL cluster to optimize reads and writes and spread traffic out to many nodes
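We haven’t built any of this yet, but for sharding the core of it is just picking a box from a key. A minimal sketch, assuming you shard by customer ID and using a made-up `shard_for` helper:

```python
import zlib


def shard_for(customer_id, shard_count):
    """Map a customer ID to a shard number in the range [0, shard_count)."""
    # crc32 gives the same answer on every machine and every run, unlike
    # Python's built-in hash() for strings.
    return zlib.crc32(str(customer_id).encode("utf-8")) % shard_count
```

The catch with plain modulo is that changing `shard_count` moves most customers to a different shard, so you’d want to pick the count carefully or keep a lookup table instead.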

I wish I knew more. I’m still barely up the curve compared to some of the engineers and admins I work with. Thankfully, they’ve been able to keep our many million row tables (and many GB data and index files) humming along with few interruptions.

© 2011 That Not So Fresh Feeling