Log Everything (And I’m Not Talking About Weblogs)

Posted on November 12, 2007
Filed Under Web Programming

Heart Chart - Monitor the health of your site

More specifically, log anything that could impact your visitors’ experience if it all goes horribly wrong… and make sure you are keeping sufficient detail.

The most obvious case is the error codes in your basic access log. If many people are getting a 404 on a particular page, you know something is wrong. Perhaps you moved the page and your visitors have bookmarked it or are following an external link to the old address; in that case you need to redirect them to the new address. If the page has never existed, you’ve likely created an incorrect link somewhere in your site. Look in the access logs for the page your visitors were on before they requested the non-existent page (the referrer, if your log format records it), then view the source of that page and search for the incorrect link.
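A minimal sketch of that 404 hunt, assuming the server writes the Apache-style Combined Log Format (which records the referrer). The function name `count_404s` and the sample log lines are made up for illustration; adjust the pattern to whatever format your own logs use.

```python
import re
from collections import Counter

# Combined Log Format (assumed):
# host ident user [time] "METHOD path protocol" status bytes "referrer" "user-agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]*\] "\S+ (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)"'
)

def count_404s(lines):
    """Count (missing path, referrer) pairs for every 404 in the log lines."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("status") == "404":
            counts[(match.group("path"), match.group("referrer"))] += 1
    return counts

# Hypothetical sample lines, not taken from a real log.
sample = [
    '203.0.113.5 - - [12/Nov/2007:10:00:00 +0000] "GET /old-page.html HTTP/1.1" 404 512 "http://example.com/links.html" "Mozilla/5.0"',
    '203.0.113.9 - - [12/Nov/2007:10:05:00 +0000] "GET /old-page.html HTTP/1.1" 404 512 "http://example.com/links.html" "Mozilla/5.0"',
    '203.0.113.7 - - [12/Nov/2007:10:06:00 +0000] "GET /index.html HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
]

# Most frequent broken links first, with the page that linked to them.
for (path, referrer), hits in count_404s(sample).most_common():
    print(hits, path, "linked from", referrer)
```

A referrer of `-` usually means a bookmark or a typed address, which points to a moved page rather than a bad link on your own site.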

Remember, humans are capable of being stupid in far more creative ways than any computer (although computers can repeat it faster).

An example…

The contact and tell-a-friend forms on Website To Dos use a number of ideas to make them less attractive to spammers. One of these is a custom-coded, text-based captcha. Logging the activity on these forms really showed how creative some visitors can be.

Currently the captcha has only one test (although it has been designed to make it easy to add new types). It asks visitors to concatenate characters (though using smaller words than that).

About the only concession it made in its first incarnation was to treat lowercase and uppercase letters as equivalent. Initially the most prevalent error in answering the captcha was mixing up certain letters and numbers: ‘1’, ‘l’ and ‘I’, for example, or ‘5’ and ‘S’. This error was virtually eliminated by restricting any given captcha to only letters or only numbers. Given the difficulty even a simple captcha seemed to present, I also shortened the length of the captchas generated.
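As a rough illustration of the letters-only-or-digits-only idea (not the code running on the site), a generator might look like the sketch below. The name `make_captcha`, the default length of 4 and the choice of uppercase letters are all assumptions made for the example.

```python
import random
import string

def make_captcha(length=4, rng=random):
    """Return (characters, expected_answer) using letters only or digits only.

    Picking a single alphabet per captcha means a visitor never has to
    decide whether a glyph is the digit 1 or the letter l within one test.
    """
    alphabet = rng.choice([string.ascii_uppercase, string.digits])
    chars = [rng.choice(alphabet) for _ in range(length)]
    return chars, "".join(chars)

chars, expected = make_captcha()
print("Please concatenate:", ", ".join(chars))
```

Passing a seeded `random.Random` instance as `rng` makes the output repeatable, which is handy when testing the form.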

With that problem fixed, the next most common error was revealed: punctuation and whitespace. Some people would type the quotes around the characters; others would separate the characters with commas or whitespace. When I first noticed this, as a quick fix I added an instruction to the captcha not to type any quotes that might surround the characters. This made virtually no difference. Clearly, people do not read the instructions. A captcha has to be as “discoverable” as possible. A more effective fix was to have the code remove all non-alphanumeric characters before checking against the expected answer. That was 100% effective against this issue.
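The clean-up step can be sketched in a few lines. This is an illustration under my own naming (`normalize` and `is_correct` are hypothetical), combining the case-insensitive comparison with the removal of everything that is not a letter or digit.

```python
import re

def normalize(answer):
    """Lowercase the answer and drop everything except a-z and 0-9."""
    return re.sub(r"[^a-z0-9]", "", answer.lower())

def is_correct(given, expected):
    """Compare the visitor's answer with the expected one after clean-up."""
    return normalize(given) == normalize(expected)

# Quotes, commas and spaces no longer cause a failure.
print(is_correct('"a", "b", "c"', "ABC"))
print(is_correct("2 5 7 9", "2579"))
```

Note that this only forgives punctuation, whitespace and case; an answer such as 25sevennine still fails, because the words are not converted back to digits.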

Without logging I would never have known how many people were finding my forms too complicated to use. Without full details in the logging, including the question, the expected answer and the given answer, I would never have known the common ways in which they were failing my custom captcha.
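For what it is worth, here is one way such a log entry could be written using Python's standard `logging` module. The logger name, the field layout and the helper names `format_attempt` and `log_attempt` are assumptions for the sketch, not a description of the site's actual logging.

```python
import logging

# Send INFO-level records to stderr with a timestamp; a real site would
# more likely configure a file handler instead.
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
logger = logging.getLogger("captcha")

def format_attempt(question, expected, given, passed):
    """Build one log line holding everything needed to diagnose a failure."""
    result = "pass" if passed else "fail"
    return "captcha result=%s question=%r expected=%r given=%r" % (
        result, question, expected, given)

def log_attempt(question, expected, given, passed):
    """Record a single captcha attempt, successful or not."""
    logger.info(format_attempt(question, expected, given, passed))

log_attempt("2, 5, 7, 9", "2579", "25sevennine", False)
```

Using `%r` keeps empty answers and stray whitespace visible in the log, which matters because those are exactly the cases worth spotting.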

Now I still get many bots a day banging their collective heads against the forms with, in most cases, an empty captcha answer, but I no longer get many failures from real visitors.

I still get some though. What should I do about the type of creative visitor who will merrily type in 25sevennine when asked to concatenate 2, 5, 7, 9 (yes, they were all presented to the visitor as digits in that case)?

Sometimes you’ve just got to say enough is enough.

And on that note…
