Archive for the ‘Technology’ Category

I’m taking a couple days off from work and heading up to the beach for the day with some folks. Should be a nice break from the ColdFusion-rich days I’ve been spending at work. We’re working on a project to take ColdFusion users from being spread out across our Windows servers and move them to their own Windows servers, where poorly written code can’t take down other, non-ColdFusion pages.

I’m not a big fan of ColdFusion. I can understand why people would use it if they’re not particularly skilled developers, but once you know enough to use ColdFusion well, it seems like you’d want to use ASP, Perl, Python, Ruby, PHP … something … anything else. ColdFusion runs through Java, so it tends to be slow when you’re running it through IIS. Making things worse, we’ve discovered that ColdFusion’s default JDBC-ODBC bridge is pretty much crap. When you get 12 concurrent database queries, the ColdFusion ODBC service dies. And not gracefully: it gets stuck in a state where it can’t be stopped or started. The box has to be rebooted.

Why does this suck royally? First, because ColdFusion users (at least those using our service) tend to write really crappy code, leaving queries unclosed and sometimes running queries within queries, so they can hit that limit of 12 concurrent database queries pretty quickly. But the bigger gotcha, the bigger kick in the junk, is that ColdFusion is installed as a wildcard script map in IIS. That is, every single page request, ColdFusion or not, goes through ColdFusion for ColdFusion to decide whether or not it wants to handle it. So when ColdFusion dies, NO PAGES GET SERVED FROM THAT BOX AT ALL. It’s really quite annoying.

Yes, there are some things we could do to mitigate it. The logical one would be to remove the wildcard script map, but that actually breaks ColdFusion (some wonderful work you’ve done there, Macromedia/Adobe).

So, we’ve actually decided to segment ColdFusion users and use a native JDBC driver rather than the ColdFusion JDBC-ODBC bridge. I’ve spent the last few weeks of my life on this, moving customer sites, testing, writing Perl code to automate the process. It’s been fun taking existing ColdFusion DSNs and recreating them in the new format. It’s been more fun finding out the various little things that the new JDBC Microsoft Access driver doesn’t support that the traditional ODBC driver does. (Hey, for some ridiculous reason you’ve got an Access replicated table? Fantastic, the JDBC driver won’t read it. Hey, you’re using RND in your query? Fantastic, the JDBC driver doesn’t support it. Hey, you’re using a raw ‘Yes’ in your query to match a checkbox column. Fantastic, change your query to != 0 to pick up the positive values.)

It’s been a long few weeks, but it’s gone pretty smoothly, all things considered. But I hate ColdFusion, and I hate Access.

Thus, I’m off to the beach. Where there will be no ColdFusion. And no Access. Just my iPod, newly loaded up with all of the episodes of the Band in Boston podcast, as pointed out by Bostonist. I’ll be back soon feeling refreshed and ready to deal with more ColdFusion fun.

The company that hosts my site, DreamHost, has been experiencing some serious problems recently. Hardware dying, power outages, and just general instability have caused this site to blip up and down a few times, and have caused more outages for The House That Dewey Built. DreamHost does a nifty thing and has a little bit of transparency into the process through their status blog, where they attempt to keep folks up to date with what’s happening on the hosting side. It’s a really great idea, assuming you have the stomach to expose yourself to the world.

Having worked in the shared web hosting industry for about 10 months now (not at DreamHost), I can relate to the troubles they’re having. No matter what you do, hardware dies in unforeseeable ways, and when that happens, occasionally backup hardware goes down too. Or one problem masks another, and you peel back the layers until you get down to the main issue.

In a dedicated hosting environment, this usually means you take down a server or two and have a couple of expensive customers mad at you. It’s usually a manageable situation.

In a shared hosting environment, each problem means potentially thousands of customers have an issue, and it means they come looking to you for answers.

This DreamHost Blog post is a pretty good example of what can happen when you expose yourself to your customers. You get the good:

The thing that Dallas didn’t say, is how smooth DreamHost is taking it, I had my doubts about the company before I signed up, but this week has proven that I made no mistake, and that the admins know what they are doing.

Good job DreamHost, lets hope next week brings better luck.

And you get the bad:

My clients are frying about the recent reliability issues. I am loath to move them (over 100), but clearly DH needs to spend more of its profit margin on hardware!

Can we all say, together, “R E D U N D A N C Y” ! !

And, if your landlord can’t seem to get its power systems stable maybe they need to be sued for non-performance.

We’re using DH for business, not for play. It’s got to be more reliable

Maybe Dreamhost will be nice enough to cover all of the ad sales I’m losing off my website. Now it’s JUST my ad server database that’s down… they’re really sticking it to me…

Again, I love transparency, and I think this is just some of what you have to deal with when things go wrong. I wish the company I worked for was more transparent sometimes (and, the reason I’m not being transparent about where I work is because, as a corporation, I don’t think they’re ready to be transparent just yet, which is unfortunate).

And, again, you’d think that I’d be right there griping with the rest of the masses, given that this has affected *my* site.

But I’m not.

Because I’ve seen the other side. I’ve *been* on the other side. I’ve been the guy trying to explain to a customer why their site was down, or why it’s still down, or why it may not be back up right away. The customer never sees the planning side–they just assume when something goes wrong that it was preventable and someone’s fault. Sometimes it is.

Sometimes, however, it’s the customer’s fault. Let me explain …

DreamHost is a shared web hosting provider, and a low cost one at that. They spread a bunch of sites out across servers, to better utilize the resources of the server, and they make money off of the folks who barely use their sites. The 5% of users who completely pillage the servers are money losers, but they’re the price you pay for bringing in all of the less active users. It’s all about economics and economies of scale.

Here’s where customers are to blame. They expect that the slice of the server they’re renting for $7 a month is *theirs* and they should be able to do whatever they want to it. If they want to run a CGI script that uses 100% of the processor, why not? If they want to move all of their binary data into their MySQL database so that they can pretend that they’re not really using any web space, who are you to argue that they’re killing the database server? They’re the customer, they’ve paid for their chunk of the server.

Well, yes they have. But it doesn’t give them the right to kill things for the rest of the folks on that box. Unfortunately, most shared web hosting customers don’t think about things in those terms, in terms of a community. They just think about the site they own.

This is what leads to problems. The 5% of customers complain that they want to use cron, and 5% of a big number (total customers) is a big number, so you give them cron. Then they bog down the server, so you spread them out to other servers, which they then bog down. Replace cron with anything (unlimited databases, unlimited bandwidth, multiple websites on the same account, etc.). All propositions that lose the host money, but are done with hopes of attracting a large enough clientele to make it a winner.

But all of the additions occasionally come with the cost of downtime, or instability, or whatever. It certainly sucks, and it’s not excusable, but it’s shared web hosting.

So I say to the fellow complaining about his ad revenue: why the HELL are you on shared hosting if it’s so important to you? Spend the money and get a dedicated host.

If your site is so important that it can’t afford downtime, you should be paying for a dedicated host. It’s that simple. The 5% of customers who cause server problems are almost inevitably the same ones who should be on dedicated hosting. Remove them from the mix and things would be far more manageable.

The flipside is that you can’t drive them away. You can’t afford to. They’re the advocates. They’re the folks who pimp the hosting (and get their affiliate kickback) and bring in the other 95% who make the money. It’s a double-edged sword, and one that’s made all that much tougher when a company like DreamHost takes an admirable stance and tries to be transparent. It’s the same thing that Jason Calacanis has been discussing with his offer to pay DIGGers to post stories on the new Netscape DIGG-alike–your most active users tend to be partially responsible for the success of your product (even if they often cost the most).

Shared web hosting is a fun, exciting, and often stressful biz. Doing it naked can sometimes bring the pain, and I can relate to what the folks at DreamHost are experiencing.

Now please make sure my site stays up.

So, Working Stiff just keeps on plugging along. Greg Joyce’s attempt to get his movie some visibility online is working.

10,000 views of the trailer. That’s a lot. You can make it 10,001.

I’m interested in this for a few reasons:

  1. Greg’s an awesome guy.
  2. Working Stiff is a funny movie (really! I have it on DVD and have even seen it in a theater. And I actually did some advertising for it …. which I’ll link to someday).
  3. I’m utterly interested in the idea of turning conventional media on its head by leveraging the internet. Here’s a movie that played a few times in the Boston area and I think once out in LA. It’s a small, independent film without a single name attached. It’s not uber-trendy, nor shocking. It’s a straight up romantic/workplace comedy. It’s better than 90% of the Nora Ephron-esque tripe that gets released. But, it simply will never get seen via conventional means. But now, via the ‘net, it’s viewable online. You can subscribe and get chunks of it automatically downloaded to your computer or iPod to watch when you want. I have no idea how many people are actually viewing the movie via the Podcast, but anyone who’s subscribed is getting a chance to see a movie they never would have otherwise. That’s pretty cool.

I just finished watching Season One of The Adventures of Pete and Pete which arrived from Netflix a week or so ago. If you’ve never seen Pete and Pete, it’s a show that aired on Nickelodeon in the early-to-mid 90s about two brothers named … Pete. The show started out as some 60-second shorts, which were popular, so Nick said “here’s more money, make some 30-minute specials,” which were more popular, which led to Nick saying “just make us lots of shows.” And they did, and it rocked.

It rocked because it was this surrealist, absurdist kids’ show, teaching a moral in each episode, but doing it in a style that was edgy for the time (and holds up surprisingly well 10 years later). Topping it off, the creators/writers (who’ve gone on to stuff like Newsradio, Shrek, King of the Hill, and Buffy) worked in as many pop culture references and jokes as they could. What other show would have Juliana Hatfield as a cafeteria worker, Steve Buscemi as a nerdy dad, and Iggy Pop with a recurring role as a dad who acts remarkably like Iggy Pop? It’s the type of show where the family finds a car buried in the sand at the beach, uncovers it, and drives it home … like it’s completely normal.

Watching it now, it reminds me a lot of Scrubs. So much so that I don’t think it’s possible to say that Scrubs wasn’t at least partially influenced by Pete and Pete. Both shows are about a nerdy character who narrates the show, with a dizzying array of transitions into fantasy/surreal situations that play as if they’re completely commonplace. Both shows feature a soundtrack of the “indie” rock sound of the time, and play basically with the single camera format.

All of this made me think about how cool it is that a show like this can survive and live on in DVD format. Poking around this weekend, I found that there are two really cool video podcasts on iTunes that, a few times a week, send out an old cartoon that has entered the public domain. The coolest one is ReFrederator. A few times a week you download a 5-10 minute cartoon of Bugs or Daffy or Mighty Mouse. It’s insanely cool and a wonderful way to keep those old cartoons fresh. The same idea is done by Vintage Tooncast, though they seem to be focused more on showing things that you wouldn’t see today (because of the racial and cultural stereotypes that were so pervasive). It’s an ingenious use of syndication technology.

It also made me think about how cool it would be if networks did this with more content. Sure, the big networks are putting their shows on iTunes for 99 cents a pop. And Fox has talked about putting shows online with ads for free. All fantastic stuff. However, wouldn’t it be great if networks (especially networks that own most of their own content) put up old shows on iTunes? NBC has done this with some stuff, but I’d love it if Nickelodeon let me grab an episode of Welcome Freshmen or Disney let me grab an episode of Duck Tales at my leisure. Pay them $30 and get a weekly podcast of shows automagically downloaded to your computer until they ran out of shows. Or pay the 99 cents to get the ones you want.

Outside of content clearances and figuring out how royalties and whatnot are paid out, there’s not a legitimate reason not to do this. Well, other than fracturing an already fragile television landscape. The first network to really embrace this is going to make lots of money (assuming they do it right).

For the past year or so, I’ve been dorking around with some of my friends with the idea of a basketball statistic that attempts to measure what a player brings a team. You know, take his points, his assists, rebounds, blocks, etc., and throw them all into one big number. It’s been a fun diversion, and an excuse to think about math and some web programming again.

The idea is based on the work done at sonicscentral.com towards something that’s been called Points Created (an attempt to parallel Bill James’ baseball stat “Runs Created”). It’s not perfect, but I’ve had some dorky fun, and, quite frankly, the ratings have come out moderately ok.

Recently, when I realized I could dynamically update this from the web rather than doing it via Excel, I set out to create a Perl script that would enable me to run it, have it grab the latest stats from the invaluable dougstats.com, and then generate the stats for everybody in the NBA.

A few hours later, I had something working.

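#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;   # get() fetches a URL and returns its body
use CGI;           # parses query-string parameters

my $query = CGI->new;
print $query->header;   # emit the Content-Type header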

The basics: the hash-bang, and includes for LWP (to get the data over the web) and CGI (so I can pass in parameters).

my %Data;
my @row;
my @PlayerStats;
my $PlayerName;
my $PlayerStatsString;
my $url = "http://www.dougstats.com/05-06RD.txt";

my $PointsCreated = 0;
my $PCperG = 0;
my $PCper48 = 0;

Here we set up all of the variables. A hash to contain the player data. Arrays for handling a row of data and a row of player statistics. Scalars for the player name, the string of text representing the data, the URL to get the data, and then some internal values for calculating statistics that aren’t in the downloaded data.

my $sort = $query->param('sort');

my $stats = get($url);
die "Couldn't get data" unless defined $stats;

@row = split(/\n/, $stats);

shift @row;

Here we get the data. We grab the sort parameter (so I can determine which value to sort on — more on that later). We go out and get the data (or die, if we can’t get it). We split the data on new lines into rows of data in the array—each array element is a full row of text data. Finally, we shift off the top row, since it’s the category text and we don’t want that in our stats.
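
The split-and-shift step is easy to try without hitting the network. Here is a minimal sketch that runs the same steps on a made-up sample; the sample text, its column layout, and the player names are invented for illustration and are not real dougstats.com data.

```perl
use strict;
use warnings;

# Hypothetical sample standing in for the text that get($url) returns.
# The header row and the numbers are invented for illustration.
my $stats = "Player Team GP Min\n"
          . "doe,john bos 10 300\n"
          . "roe,rick lal 12 250\n";

# One array element per line of text.
my @row = split(/\n/, $stats);

# Drop the header row so only player rows remain.
my $header = shift @row;

foreach my $line (@row) {
    # Limit of 2: the name first, then everything else as one string.
    my ($name, $rest) = split(/\s+/, $line, 2);
    print "$name => $rest\n";
}
```

The limit argument of 2 on the inner split is what keeps the stats together as a single string, ready to be split again into individual columns.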

foreach (@row) {
    # First field is the player name; the rest of the line is the stats.
    ($PlayerName, $PlayerStatsString) = split(/\s+/, $_, 2);
    @PlayerStats = split(/\s+/, $PlayerStatsString);

    # Defensive rebounds aren't in the data, so derive them.
    my $DefRebs = $PlayerStats[11] - $PlayerStats[10];

    $PointsCreated = $PlayerStats[18] + (0.75 * $PlayerStats[12])
        + (1.03 * ((0.75 * $PlayerStats[10]) + (0.25 * $DefRebs)
        + $PlayerStats[13] + (0.5 * $PlayerStats[15]) - $PlayerStats[14]
        - (0.71 * ($PlayerStats[5] - $PlayerStats[4]))));
    $PCperG = $PointsCreated / $PlayerStats[2];
    $PCper48 = ($PointsCreated / $PlayerStats[3]) * 48;

    # Store a copy of the stats plus the derived values, keyed by player name.
    $Data{$PlayerName} = [@PlayerStats, $DefRebs, $PointsCreated, $PCperG, $PCper48];
}

delete $Data{"Player"};

Ok – here’s where some of the magic happens. I iterate through each row of data, and split the row into components: the player name and then the combined player stats. Then I split the player stats into individual stat buckets. I build some of the intermediate stats that aren’t in the dataset—defensive rebounds, and then the Points Created and Points Created per Game and per 48 minutes.

Toss everything into a big hash, with the hash key set as the player name, and just make sure there’s not an element that is the row of column headers (the delete line). I could probably toss this last line …

I won’t get into the details of the Points Created formula right now, but there’s some (limited) intelligence behind those coefficients. Basically, it’s an attempt to quantify how many possessions a player creates or loses, turn that into points, and then add in the points the player actually scored to come up with a final total. I’ve been working on a more refined version with some other folks that better integrates assists and the fact that not all hoops are created equal.
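
For anyone who finds the numbered indexes hard to read, here is a rough sketch of the same arithmetic with named stats. The coefficients are copied from the script; the stat names (pts, ast, oreb and so on) are my assumed labels for the columns the script indexes by number, and the stat line is invented.

```perl
use strict;
use warnings;

# Points Created with the same coefficients as the script.
# The hash keys are assumed labels, not names from the data file.
sub points_created {
    my (%stat) = @_;
    my $dreb   = $stat{treb} - $stat{oreb};   # defensive = total - offensive
    my $missed = $stat{fga} - $stat{fgm};     # missed field goals
    return $stat{pts} + 0.75 * $stat{ast}
         + 1.03 * ( 0.75 * $stat{oreb} + 0.25 * $dreb
                  + $stat{stl} + 0.5 * $stat{blk} - $stat{to}
                  - 0.71 * $missed );
}

# Invented stat line, for illustration only.
my %line = (
    pts => 1000, ast => 200, oreb => 40, treb => 200,
    stl => 50,   blk => 20,  to   => 100,
    fga => 800,  fgm => 400, gp   => 50, min => 1800,
);

my $pc    = points_created(%line);
my $pc_g  = $pc / $line{gp};            # per game
my $pc_48 = ($pc / $line{min}) * 48;    # per 48 minutes
printf "PC %.2f, PC/G %.2f, PC/48 %.2f\n", $pc, $pc_g, $pc_48;
```

Read this way, the formula is points scored, plus credit for assists, plus a weighted tally of possessions gained (rebounds, steals, blocks) minus possessions lost (turnovers, missed shots).
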
Quite frankly, that’s about it. The rest of the script is just output, dumping the data in a simple table to the screen, and throwing in some links to allow some basic sorting. If you check out the Points Created display (or the possibly improved adjusted Points Created), you can see the results of the work.
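
The output code isn't shown here, but the sorting part could look something like the sketch below. It assumes the sort parameter carries a numeric column index, which is a guess on my part, and it uses invented data in the same shape as %Data.

```perl
use strict;
use warnings;

# Invented data in the same shape as %Data: name => [stat values].
my %Data = (
    'alpha' => [ 10, 3.5 ],
    'bravo' => [ 30, 1.2 ],
    'carol' => [ 20, 9.9 ],
);

# Hypothetical: the column index to sort on, as might arrive via ?sort=N.
my $sort = 0;

# Fall back to column 0 if the parameter is missing or not a number.
$sort = 0 unless defined $sort && $sort =~ /^\d+$/;

# Highest value first.
my @ordered = sort { $Data{$b}[$sort] <=> $Data{$a}[$sort] } keys %Data;

print "$_\t$Data{$_}[$sort]\n" for @ordered;
```

Checking the parameter before using it as an array index matters in a CGI script, since anyone can put anything in the query string.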

The basics: both metrics say that LeBron James has created the most overall points this season. The adjusted method has Allen Iverson edging out James for PC/G, whereas the original has James edging out Iverson. The adjusted method likes point guards a lot more than the original method. Part of me thinks it likes them too much, but what do I know.

In summary: I’m a dork, but not a big enough dork to do this stuff as anything more than a part-time hobby.