new webloggers confirmed
Yet Another Explainable Problem (YAEP)
saturday morning bug
url escapes
crawler confusion
i'm hot on the trail
one million mark
it appears that the folks at nupedia.com are using the address blogdex.com to direct traffic to their service. i just wanted to mention that we are in no way associated with the nupedia project, nor are we quite sure why they're using the domain name 'blogdex.com'.
after being away for over a week, the number of unconfirmed, newly added webloggers had grown to nearly 700. confirming them all was a daunting task for our crew, but we tackled it on friday. we should be back on a tight schedule (~ 1 day) again for site additions.
so last night the crawler failed to complete AGAIN, thanks to the fact that I ran out of disk space :) I guess I didn't really need a copy of the database from EVERY DAY SINCE BLOGDEX BEGAN. got rid of those and now we're good to go.
oops.
somehow, the weirdest bugs always seem to create themselves on saturday morning. i was at home, relaxing with my morning tea, watching a bit of the ballers cribs, when for some reason the index page stopped displaying any links. after getting to work, i still don't know what happened, but it should be working again for the time being.
the statistics were fine, as the other index pages and rss would reveal, but the main index was broken.
some of the information pages were not being found due to unescaped url characters. i'm not sure how this one got by me for so long, but it's fixed nonetheless. unfortunately, now the urls are twice as long as they were before.
for brevity's sake, i did not escape "/" and "\" since these don't seem to affect anything. i was also forced to escape the non-standard "%" and "+" because of the double-escaping problem: some urls are stored in the database already escaped, and i need to prevent perl from unescaping them.
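to make the scheme above concrete, here is a minimal sketch in python (not the perl that blogdex actually runs). the names `escape_link` and `unescape_link` and the example url are made up for illustration; the only things taken from the description above are that "/" and "\" are left alone and that "%" and "+" get escaped, so a url stored in already-escaped form survives one round of unescaping.

```python
from urllib.parse import quote, unquote

# Assumption: "/" and "\" stay as-is; every other character outside the
# unreserved set (letters, digits, "_.-~") is percent-escaped,
# including "%" and "+".
SAFE = "/\\"


def escape_link(url: str) -> str:
    """Escape a stored url so it can be embedded in a page address (hypothetical helper)."""
    return quote(url, safe=SAFE)


def unescape_link(escaped: str) -> str:
    """Undo exactly one level of escaping (hypothetical helper)."""
    return unquote(escaped)


# A url that is already escaped in the database: "%20" is part of the stored key.
stored = "http://example.com/a%20b?q=c+d"
escaped = escape_link(stored)

# "%" became "%25", so the stored "%20" is now "%2520" and is not lost
# when the web layer unescapes the address once.
print(escaped)
print(unescape_link(escaped) == stored)
```

the cost is the one mentioned above: every escaped character triples in length, so the addresses get noticeably longer.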
anywho, let me know if you experience any issues, or think there is a better solution . . .
for some reason the crawler didn't complete successfully last night. it's running right now, and the new statistics should be up shortly. all of these hiccups should finally be resolved when we get the new crawler in place, probably within the next week or so.
sorry for the garbage interface, but i'm hot on the trail of the new problems that i've created by tweaking a few of the underlying systems.
we just broke 1 million observed links. and we're not stopping. in honor of this joyous event, we'll be testing out our new crawler over the next couple of days. it should provide strikingly up-to-the-minute results and fantabulous efficiency. plus, it should enable us to try all kinds of new experiments that were impossible under our old model. more to come...


