Archive for March, 2006

Week in Progress

Wednesday, March 29th, 2006

Monday: Saw Metric
One of the better live shows I’ve been to. The opening bands (The Islands, End of Fashion) were more than just bearable, which was quite a surprise to me after some of the openers I’ve endured in the past. On the topic of openers, I really enjoyed The Islands and will have to pick up one of their discs.

I long suspected Metric would sound fantastic live since their sound isn’t over-produced. Almost every sound you hear on their albums can be easily reproduced live with the four people on stage. I hope they don’t change that very important quality as they grow in popularity.

Tuesday: Spicy Chicken Goes Straight for the Damned Eye
We didn’t want to bother with cooking dinner, so Emily and I went to the local Whole Foods Market (organic grocer) for their excellent hot/cold soup & salad bar.

I’d just dished some red spicy chicken into a box and was buttoning it up when one of the flaps slipped and catapulted an ant-sized chunk of concentrated hellspice straight under my glasses lens and into my favorite eye.

It’s moments like these when the world around you dissolves, you drop all of your inhibitions (and spicy chicken), and madly paw at your eye with a napkin trying to sop up the burning red pepper oil before it reaches your brain.

Thankfully, I walked away from that situation with both eyes intact, but I count myself among the lucky. Also, that spicy chicken can burn in hell. Last night I ate beef.

Also Tuesday: House Cleaners Aren’t PC Savvy
We hired some cleaners to do a nice deep spring cleaning on our home. When I returned home, I found my computer’s mousepad with built-in wrist rest rotated backwards, 180° from normal.

My amusement was furthered when I noticed that there wasn’t a mouse on the pad, but a stapler. And a razor blade.

After righting the pad and putting away the other stuff, it occurred to me that the mouse was nowhere to be found. It took me a minute to locate it on the bookshelf.

I suppose the cleaning people weren’t so PC savvy. They were, however, very handy with the sponge and broom, so I’m not complaining.

Protect Your Bandwidth From Leeches

Sunday, March 26th, 2006

For those with concerns about limited bandwidth and images that others might frequently link to, you will benefit from my anti-leech script generator.

This easy to use page will quickly generate a mod_rewrite script that you store in an .htaccess file in the root of your website or the directory you wish to protect. I’ve added images.google.com to the allowed domains by default so you don’t block innocent searchers.
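
For the curious, the rules a generator like this produces look roughly like the sketch below. This is not the generator's actual output. The function name write_antileech, the example.com domain, and the list of image extensions are all assumptions made for illustration.

```sh
#!/bin/sh
# Sketch: append hotlink-protection rules to an .htaccess file.
# write_antileech is a hypothetical helper, not part of the generator.
write_antileech() {
    site_domain=$1   # your own domain, e.g. example.com (placeholder)
    htaccess=$2      # path to the .htaccess file to append to

    # Escape dots so the domain is matched literally inside the regex.
    domain_re=$(printf '%s' "$site_domain" | sed 's/\./\\./g')

    cat >> "$htaccess" <<EOF
RewriteEngine On
# Allow requests that send no referer at all.
RewriteCond %{HTTP_REFERER} !^\$
# Allow your own site.
RewriteCond %{HTTP_REFERER} !^https?://(www\\.)?${domain_re}(/|\$) [NC]
# Allow Google image search so innocent searchers are not blocked.
RewriteCond %{HTTP_REFERER} !^https?://images\\.google\\.com(/|\$) [NC]
# Everyone else gets a 403 for image files.
RewriteRule \\.(gif|jpe?g|png)\$ - [F,NC]
EOF
}
```

Calling write_antileech example.com ./.htaccess from the directory you want to protect would append the rules to the file there.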

Convert Your Website to Subversion

Thursday, March 23rd, 2006


This is a simple step-by-step guide to help you convert your currently live website to a handy-dandy revision-control system called Subversion. It also contains information on how to create a discrete development environment where you can modify your site and test your changes without affecting live traffic.

If you:
…want a place to beta test your code
…want to keep a revision history of your code
…have ever modified or deleted a file on your website and wished you hadn’t done that
…realize that CVS sucks cat ass
You should try subversion!

Assumptions:
•You are hosted on a *nix machine*.
•You have SVN installed or can install it.
•You have SSH access to your site.
•Your web host allows for the creation of subdomains (if you want a dev environment).
•Your HTTP document root is not the root of your home directory. For example, when you log in to SSH, you see a list of system directories but one of them contains all of your web content.

Notes:
•The command syntaxes that I provide are those used by FreeBSD. If you’re savvy on another system and you notice a syntax difference, please speak up.
•Throughout this text, I will refer to your web documents directory’s absolute path as docroot, the repository as repos, and your home path (the full path from the drive root to your home directory) as homepath.
•The paths described above do not contain trailing slashes. Do not add a trailing slash to paths unless specifically indicated.
•Always always always back up your data, especially before attempting something like this!
Subversion official documentation

Preparation:

You backed up, right?

1) First, start by running svn --version to verify you have Subversion installed. If not, you or your sysadmin should install it for you!

2) In your home directory (~), create a directory named svn_import.

3) Within svn_import, create the following directories: trunk, branches, and tags. **

4) As a security precaution, add the following line to a file called .htaccess in your document root: RewriteRule (^|/)\.svn - [F]

5a) If you wish to add your entire site to revision control, use the command cp -R -p -v docroot/* ~/svn_import/trunk/ to copy your entire site into the import directory.
Note: The -p flag on the copy command is important, as some files on your site may have nonstandard permissions that you wish to preserve.

5b) If you only wish to add parts of your site, copy only those files and directories into ~/svn_import/trunk/
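
Steps 2 through 5a can be strung together in a small script. The sketch below is one way to do it, not a required one. The function name prepare_import is made up for illustration, and the RewriteEngine On line is added on the assumption that mod_rewrite is not already switched on elsewhere in your .htaccess.

```sh
#!/bin/sh
# Sketch of steps 2 through 5a. prepare_import is a hypothetical helper.
prepare_import() {
    docroot=$1      # absolute path to your web documents, no trailing slash
    import_dir=$2   # e.g. "$HOME/svn_import"

    # Steps 2 and 3: the import directory and its standard layout.
    mkdir -p "$import_dir/trunk" "$import_dir/branches" "$import_dir/tags"

    # Step 4: block web access to .svn directories.
    # Assumption: RewriteEngine is not already enabled in this file.
    printf '%s\n' 'RewriteEngine On' 'RewriteRule (^|/)\.svn - [F]' \
        >> "$docroot/.htaccess"

    # Step 5a: copy the site, preserving permissions (-p).
    cp -R -p "$docroot"/* "$import_dir/trunk/"
}
```

One thing to be aware of: in most shells the * glob skips names that start with a dot, so files such as .htaccess are not copied into trunk by step 5a. Copy them by hand if you want them under revision control.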

Creating and importing:

6) Create the repository: svnadmin create ~/.svnrepos

7) Import your site: svn import ~/svn_import file:///homepath/.svnrepos -m "Initial import"
Note: Yes, file:/// is correct. Make sure you’re using the full system path to your home directory for homepath
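
If the triple slash looks suspicious, it may help to see the URL being built. The sketch below prints the two commands instead of running them, so you can check the paths first. The run and create_and_import helpers and the DRY_RUN variable are assumptions made for illustration, and /home/you is a placeholder for your real homepath.

```sh
#!/bin/sh
# Sketch of steps 6 and 7. With DRY_RUN=1 (the default here) the commands
# are only printed; set DRY_RUN=0 to actually run them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, else execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

create_and_import() {
    homepath=$1   # full path to your home directory, e.g. /home/you

    run svnadmin create "$homepath/.svnrepos"
    # "file://" followed by a path that starts with "/" is what
    # produces the three slashes.
    run svn import "$homepath/svn_import" "file://$homepath/.svnrepos" \
        -m "Initial import"
}
```

Running create_and_import /home/you in dry-run mode prints a URL of file:///home/you/.svnrepos. The printed form drops the quotes around the commit message, so treat it as a preview rather than something to paste back in.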

Creating your development area:
(optional, but highly recommended)

8) Using whatever means you use to manage your web site, create a subdomain for your web site called dev.yourdomain.com.

9) HTTP password protect your new subdomain, so people don’t go poking around your work in progress. After all, you’re creating this dev area for a little privacy while you’re working.

10) Check out the repository into the beta site document root: svn checkout file:///homepath/.svnrepos/trunk dev_docroot
When this process completes, you should see the message Checked out revision 1.

11) Look over your new dev subdomain. All of your pages *should* work on this new site, but if you hard-coded any full system paths, these ought to be changed to relative or dynamic.
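
A quick way to hunt for hard-coded paths is to search the dev checkout for the live docroot string. This is only a sketch. The function name find_hardcoded_paths is made up, and it assumes the only thing worth finding is the literal docroot path.

```sh
#!/bin/sh
# Sketch: list files in the dev checkout that mention the live docroot path.
# find_hardcoded_paths is a hypothetical helper.
find_hardcoded_paths() {
    live_docroot=$1   # absolute path of the live document root
    dev_docroot=$2    # absolute path of the dev checkout

    # -r recurse, -l print file names only, -F treat the path as a fixed
    # string. Matches inside .svn directories are filtered out because
    # those are Subversion's own bookkeeping copies.
    grep -rlF -- "$live_docroot" "$dev_docroot" | grep -v '/\.svn/'
}
```

Each file it lists is a candidate for switching to a relative or dynamically built path.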

Converting your live site to SVN

12) Create a temporary checkout directory: mkdir ~/svn_stage

13) Check out the repository to the temp dir: svn checkout file:///homepath/.svnrepos/trunk ~/svn_stage

14) If you got this far without backing up your site content, you should do so right now and then kick yourself in the ass for not paying attention to important instructions.

15) Move your repository onto your live content: mv ~/svn_stage/* docroot/
This previous step may seem odd, but the SVN repository you just moved onto the live site contains lots of .svn directories that are required to make your site act like a repository. It also helps ensure the shift doesn’t interrupt your site traffic, and if you opted for #5b, it doesn’t harm the files you didn’t want to add into SVN but still want to keep.

16) Test your fancy new revision control-enhanced web site!
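
A caveat on step 15: in most shells the * glob skips names beginning with a dot, so the top-level .svn directory can get left behind in ~/svn_stage, and mv will typically refuse to move a directory onto an existing non-empty one. If you hit either problem, a copy-based variant is one option. This is a sketch of an alternative, not the command from step 15, and merge_working_copy is a made-up name.

```sh
#!/bin/sh
# Sketch: merge the staged checkout into the live docroot by copying.
# merge_working_copy is a hypothetical helper.
merge_working_copy() {
    stage=$1     # temporary checkout, e.g. "$HOME/svn_stage"
    docroot=$2   # live document root

    # "stage/." means the contents of stage, dotfiles included.
    # cp -R merges into directories that already exist, and -p keeps
    # permissions and timestamps. Files that exist only in docroot
    # are left alone.
    cp -R -p "$stage"/. "$docroot"/
}
```

Because this copies rather than moves, ~/svn_stage is still there afterwards and gets cleaned up in step 18 as usual.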

Post-Conversion

17) Remove ~/svn_import

18) Remove ~/svn_stage

19) Edit the file ~/.subversion/config and uncomment the global-ignores line. You may wish to add space-delimited files that you want SVN to ignore. If you chose to follow step 5b, you should add the files and directories you didn’t want to import here.

Usage
•Now that your site is under this wonderful new revision control system, you can FTP all of your file changes into the dev_docroot path and test them on your dev subdomain. Whenever you make a change you want to store to SVN, you should commit your changes with the following command:
svn commit dev_docroot
•If you don’t provide a message during a commit using the -m flag, you will be prompted for notes about the commit. Add a brief description of the changes you made so you can revisit that version later on.
•Once you have checked in your fancy changes, you can pull them into your live site by running the following: svn update docroot
•Both your regular and beta document root are now repositories. This way, you can technically make and commit changes from either location. Whether or not you choose to do that is up to you, but the entire point of a dev area is that you don’t ever program in the live path again. It is my suggestion that you don’t directly modify SVN-controlled files in your live path.
•Now might be a good time to remind you to RTFM. This manual is well written and contains gobs of good information on how to use SVN.
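
The commit-then-update routine above can be wrapped into one helper. The sketch below prints the commands by default so you can preview them. The publish and run helpers and the DRY_RUN variable are assumptions made for illustration, and the paths in the example are placeholders.

```sh
#!/bin/sh
# Sketch: commit changes from the dev checkout, then update the live site.
# With DRY_RUN=1 (the default here) the commands are only printed.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, else execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

publish() {
    dev_docroot=$1   # path to the dev checkout
    docroot=$2       # path to the live document root
    message=$3       # commit message

    # Show what has changed before committing.
    run svn status "$dev_docroot"
    # Only update the live site if the commit succeeded.
    run svn commit "$dev_docroot" -m "$message" && run svn update "$docroot"
}
```

For example, publish /home/you/dev /home/you/www "Fix typo on contact page". Brand-new files show up in svn status with a ? and need an svn add before a commit will pick them up.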

If you’re used to CVS
First, congrats. You’re on the road to saner and happier revision control.
Next, you should read this.
Perhaps the biggest shift that you should know about is that revisions in SVN are commit-based, not file-based. It is not accurate to say “file X is version 56.” You should say “file X was modified in version 43.” The key difference here is that any number of files and directories can be changed in any revision.

As an interesting aside to all of this, I like to keep a repository of every binary I download and install (applications and drivers). So I don’t have to keep the entire collection on my laptop or desktop, I have a media server that is also running SVN for Windows. My laptop and desktop use the SVN server to store all of the binaries I’ve downloaded. I can delete all but the most recent versions of the apps so they don’t appear on my other computers, but I’ll always have a backup copy of old versions within SVN.

* This document was written based on FreeBSD commands. If you are running Red Hat or some other Linux, the command flags may differ slightly. Refer to your help documentation to verify command flag differences.

**Click here for an explanation of the trunk, branches, and tags directories. If you don’t care about branching or tagging, you really don’t need to worry your little head about this.

Gmail Invite Spooler Post-Mortem

Thursday, March 23rd, 2006

Nine months after closing down the Gmail Invite Spooler, the page remains one of the most popular landing pages on my site. Over the past several months, this page has averaged around 2,500 unique visitors a day. I’ll explain the arc of this wonderful service, but first I’d like to make one thing very clear:

Sorry, I do not have any Gmail invites. Please don’t ask me for Gmail invites. I am truly sorry that I cannot provide you with any. Please go to Google Mail for more information on how you may get your own account.

You may obtain an account without an invite these days. All you need is a cell phone.

Background
In 2004, Gmail was a very hot commodity. Since April 1st of that year, people were clamoring to get in on the exclusive beta of pre-IPO Google’s hottest new offering. In late summer, by the time I decided to write the spooler, Gmail invites were no longer selling for $100 or more on eBay, but there was a large amount of clutter on the internet with people asking for or offering invites.

The forums and blogs that I visited were littered with chatter about trading invites, but the givers and seekers didn’t seem to be coming together efficiently. It wasn’t uncommon to see multiple posts back to back asking for and offering invites.

My wife may not always like it, but when I see a problem, my mind immediately gets to work on a solution. This was one problem that I knew I could make a simple fix for in a matter of hours.

The First Incarnation
Throughout its lifetime, the basic workings of the page remained the same: People with gmail invites would send them to a specific email address. The spooler would then read those emails and store the invites in a database. Site visitors could come and claim available invites on a first-come, first-served basis. There was no backordering of invite requests. When demand exceeded supply, one had to wait until someone else donated some.

Originally the spooler was made solely for the use of the people on the forums I noticed suffered most from inefficient offers and requests. It was a very simple system that was only workable on a small scale, but I assumed it would only ever see a few hundred hits.

On the first day that I had the spooler open, I received 2,592 Gmail invites. The second day saw 4,574 more coming in. By the end of the second day, I had over 3,000 unclaimed invites.

It didn’t take long for word of the “magical free Gmail site” to leak out to the general internet. Within a few days, demand exceeded supply and I had to implement controls on the page to prevent people from refreshing constantly while waiting for a new invite to come in. I also got the first of many lessons in writing code with scaling in mind as I divorced the mailbox checking from page loading.

Ups and Downs
After a month of running the service, the average inbound invites per day dropped below 1,000 for the first time. It seemed that most of the people who had extra invites on hand had heard about the service and donated all they were willing; Google was not giving out new invites on a regular basis at that time. The inbound invites continued to decline through most of December 2004 until they hit a low around 50.

All this time, demand for invites remained strong. I recorded as many as 100,000 visitors and over a million hits per day. On December 20th, the drought was over as Google started to give Gmail users about five fresh invites each day. The average day saw around 2,500 new invites, but they were still being snapped up as soon as they came in. I implemented more restraints to prevent abuse and further streamlined my code in order to keep my server load at a reasonable level. During this time, my web statistics began to break down because Webalizer couldn’t process all of the data without choking.

Way, Way Up
On February 2nd, 2005, Google decided to open the flood gates. They began giving out around 100 new invites per day to Gmail users. My service experienced demand increases like I’d never seen before. For the first time, I was forced to benchmark my code and decide which methods to use based on how many milliseconds they took.

For only the second time, supply was greater than demand. Anyone wanting a Gmail invite could get one through my service without any delay. Unique visitors increased, but hits dropped way down since users had no need to refresh frequently to see if new invites had arrived.

Way, Way Down
Monday June 6th, 2005 was the day I received an email from Stephanie Hannon, Gmail’s Product Manager. Later that day, I had a conference call with Stephanie and her superior regarding my service. They felt that services like mine had become a threat to the quality of Gmail. Their reasons for making the service invite-only were many:

• Limit new subscribers
• Heighten demand and curiosity
• Limit accessibility of accounts to potential abusers

The last reason was the one that made them care about my site. Spammers and abusers have a higher threshold of entry without the spooler. Despite the fact that I think Google should do more on their part to prevent automated account creation and duplication, they do have more random people gaining access to invites through a service like mine.

In short, Google felt as if too many spammers and abusers were getting invites that they obtained from me and saw this as a threat.

Why I Pulled the Plug
I’ve received a few thousand emails asking for invites, complaining about how “unfair” this is, or asking for source code. In the early days after pulling the plug, I would respond to every request with an individually written response explaining the situation. This generated many replies suggesting I just re-open the system in defiance.

Aside from the fact that I really don’t wish to burn any bridges with Google (heck, maybe they’d forget all this and hire me if I ever applied), I have good technical reasons for not re-opening the spooler: My service relied on people with Gmail accounts constantly inviting the now blocked email address gmail@isnoop.net.

Google is no dummy. They know full well that they must track the email addresses that the invites are sent to. They can (and did) automatically invalidate every invite sent to my site. All 1,240,162 invites I had left over the day I shut the service down instantly became duds. To continue the service, I would have to change the method of catching new invites to one substantially more inconvenient for the donor.

In the end, insistence on keeping the spooler open would have certainly summoned the massive lawyering machine deep within the “don’t be evil” company and I don’t think any reasonable person wants that fight.

Fast Forward to Today
The former Gmail Invite Spooler page is now a brief testament to what was once the most popular Gmail invite spooler on the internet.

The bulk of the current 2,500 visitors per day come from non-English speaking blog sites that haven’t yet gotten the message that the page is closed. While the rest of isnoop.net has a 66% US visitor rate, the spooler is only 17% US traffic; it holds the #1 slot by less than one percent.

Almost all of the dozens of emails and stray blog posts requesting invites ask the same thing (in broken English). I saw the need for folks who didn’t speak my native language to get the full story, so I wrote a simple script to help them out. This has helped reduce the confused request flow, but it has also crimped the last of my dwindling AdSense revenue. Oh well. I ran this site before it ever brought me a penny and I’ll continue to do so for as long as I have the energy.

I turn down all requests for the source code for the spooler. If Google doesn’t want me starting fires in their back yard, I’m certainly not going to give away my matches to all of the other neighborhood kids.

I have considered revamping the spooler for use with other invite-only services, but I’ve yet to see one of great enough popularity and of proper nature to justify the effort. I refuse to open up such a thing for a community-based website on the principle that it breaks the “six-degrees” network they’re trying to build up by bringing in random people with no association to the inviter.

Media Coverage
Aside from a number of blogs and forums that mentioned the service, these are the print media references I am aware of:

Book: Google Search & Rescue for Dummies – 2005
Text

Book: Google Hacks – 2005
Text

Popular Science Magazine – June 2005
Close-up
Full page

The Mercury News (San Jose) – May 23, 2005
Online version

PC World – April 13, 2005
Online version

Sydney Morning Herald – April 9, 2005
Close-up
Full page
Online version

Are you seriously still reading this?
This post covers almost all of the points I regularly discuss with folks who have questions about the service. I hope this overly long post has satisfied your curiosity.

Whatever you do, don’t click my ads!

Wednesday, March 22nd, 2006
Sponsored by:


If you are an AdSense user, you may have seen this email:

Google AdSense Policy Enforcement
Hello,

While reviewing your account, we noticed that you are currently displaying Google ads in a manner that is not compliant with our policies. For instance, we found violations of AdSense policies on pages such as http://isnoop.net/gmail/

Publishers are not permitted to encourage users to click on Google ads or bring excessive attention to ad units. For example, your site cannot contain phrases such as “click the ads,” “support our sponsors,” “visit these recommended links,” or other similar language that could apply to the Google ads on your site. Publishers may not use arrows or other symbols to direct attention to the ads on their sites, and publishers may not label the Google ads with text other than “sponsored links” or “advertisements.”

Please make any necessary changes to your web pages in the next 72 hours. (truncated…)

It’s nice of them not to bring the hammer down on me for having text that said “Please patronize our fine sponsors,” but it’s even more interesting to see where different ad companies draw the line.

The previous text was officially approved for use on my site when I was serving up AdBrite ads. In fact, AdBrite called me on the phone one morning to ask me to change it from the original text which read something like “Please support this service by visiting the sites below.” I assumed that sort of direct phrasing was frowned upon, but I wasn’t sure and ignorance is bliss.

I assume that Google would disapprove of me posting “Whatever you do, don’t click my ads!” above my AdSense, so that’s why I’m not going to do it. Instead I’ll just publish this blurb about making nice for the kind folks who might just pay me a few dollars towards the costs of running this dedicated server.

Whatever you do, don’t click my ads.

On Geocoding

Thursday, March 9th, 2006

I learned some valuable lessons in high traffic geocoding this week. All this because Google doesn’t offer geocoding services for Google Maps, so you must send them latitude/longitude numbers for any point you want to plot.

This begs the question: How do I quickly come up with the lat/lon coordinates for Shanghai, Anchorage, Indianapolis, Portland, and Seattle? Google does provide a handy link to a Google search for “free geocoder” in their maps API documentation, but none I’ve found have a decent API, work for free, or can sustain the amount of traffic I might request. I’d greatly prefer owning a database and performing the lookups under my own processing power.

The answer I came up with for the low-demand Seattle Emergency Events Map was to screen scrape Google’s own mapping service to see what coordinates they come up with for a given location. It wasn’t pretty, and it wasn’t mine, but I was already using Google so what the heck.

That solution worked beautifully until I got 20,000 visitors to my Maps + RSS package tracking page on Monday of this week. Apparently, Google doesn’t appreciate being hit that much. They temporarily shut down access from my server’s IP to the page I was scraping with a message indicating they’d detected excessive automated behavior. They said something about my tools maybe being a virus. They also kicked my mom in the shin.

When I was notified Google-scraping geocoding wasn’t working anymore (never screen scrape without setting a failure mechanism), I pulled the code and provided a nice message for my site’s visitors. Google dropped the block shortly thereafter, and I hear they gave my mom flowers and apologized for that regrettable shin thing.

I checked out various solutions, trying to find a geocoding database that suited my needs. The US Census TIGER database was far too in-depth and only dealt with US locations. I ended up deploying a commercial IP-to-location database that contains the coordinates for any city that has an IP range associated with it.

Google employees, please skip the following paragraph.

My current geocoding solution involves a lookup in the ip2location tables. If I cannot find a position from there, I check a database cache of locations obtained from Google. If that fails, I scrape the location from Google Maps and cache it for future reference. If that fails for any reason, I go back to the ip2location database and make a darned good guess as to where to point. This typically means centering on a state or even entire country, but it’s better than nothing. This method results in very low traffic to Google, but my goal is zero external reliance.
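
The lookup order described above boils down to a chain of fallbacks. The sketch below shows the shape of it only. Every function in it is a hypothetical stub standing in for a real database query or HTTP request, and the coordinates are rough illustrative values, not data from my tables.

```sh
#!/bin/sh
# Sketch of the fallback chain. All functions here are stand-in stubs.
# Each prints "lat,lon" and returns 0 on a hit, or returns 1 on a miss.

lookup_ip2location() {   # stub for the ip2location table lookup
    case $1 in "Seattle, WA") echo "47.61,-122.33" ;; *) return 1 ;; esac
}
lookup_cache() {         # stub for the cache of earlier scraped results
    case $1 in "Anchorage, AK") echo "61.22,-149.90" ;; *) return 1 ;; esac
}
scrape_and_cache() {     # stub: pretend the scrape failed or was blocked
    return 1
}
guess_region() {         # stub: fall back to a state-level guess
    case $1 in *", OR") echo "44.00,-120.50" ;; *) return 1 ;; esac
}

geocode() {
    place=$1
    # Try each source in order and stop at the first one that answers.
    lookup_ip2location "$place" && return 0
    lookup_cache "$place" && return 0
    scrape_and_cache "$place" && return 0
    guess_region "$place"
}
```

The useful property is that the expensive, external step sits third in line, so it only runs when both local lookups miss, and a failure there still degrades to a rough guess instead of an error.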

This geocoding method shouldn’t be long-lived. I plan on converting a copy of the TIGER database for US addresses and purchasing a listing of a few million world locations. I’m always in favor of saving money, so if anyone knows of a free world cities geocoding database, or already has the TIGER database converted to a query-able format, please let me know.

Once I’ve got a satisfactory geocoding system built up, I’d like to open the access and make a public API. That’s down the road a little way, but keep your eyes open for that.

Package Tracking With Google Maps

Sunday, March 5th, 2006

I’ve just published an update to my universal package tracking tool that now enables you to view a map of your package’s progress as it travels across the country.

This new mapping addition builds on the original features of being able to track UPS, FedEx, USPS, and Airborne/DHL packages all in one place and having that tracking information published into a personalized RSS feed. The system automatically detects which company your tracking number belongs to and loads the package data for you.

A nice side benefit of this new addition is that I’m developing a pretty robust Google mapping class, helping my other map projects to evolve.