source browsing fixed

the most recent redesign (to allow for configuration) used an old version of browseSource.asp, which introduced an error in the listing. i just fixed it, so things should be back to normal. i also changed the default listing to 25 items per page, as opposed to the original 10.

Posted by cameron on October 31, 2001 at 02:46 PM
configuration has arrived

the long-awaited set of configurations that i promised has finally arrived! you can now edit your preferences so that blogdex looks appealing to you.

i'm planning on adding more color schemes sometime in the near future (the design is pretty extensible). thanks to some persistence by people, i've also added a checkbox to open links in a new window. as soon as i get some time, i'll engineer it so that this too is a configurable preference. likewise, i'll get around to switching this weblog over to the new design system as soon as i get some time.

if you can think of any other simple preferences that would make your viewing habits easier, please don't hesitate to ask.

Posted by cameron on October 30, 2001 at 02:03 AM
xml feed fixed

thanks to a tip from eric, i've fixed a "bug" that might have affected anyone trying to use xml::xpath to parse the blogdex feed. the problem resulted from some leading whitespace -- i'm not sure if there is a standard regarding what line the xml header needs to appear on, but it's fixed nonetheless.
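
for what it's worth, the xml 1.0 spec does require the xml declaration, when present, to be the very first thing in the document, so a strict parser is entitled to reject leading whitespace. a minimal python sketch of the effect (the feed content below is made up, and python's built-in parser stands in for xml::xpath):

```python
import xml.etree.ElementTree as ET

# made-up stand-in for the feed; the real feed's elements are not shown in this post
FEED = '<?xml version="1.0"?><links><link>http://example.com/</link></links>'

def parses(text):
    """Return True if the text parses as xml, False if the parser rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(parses(FEED))                    # declaration at the very start
print(parses("\n" + FEED))             # a blank line before the declaration
print(parses(("\n" + FEED).lstrip()))  # whitespace stripped again
```

the middle case is the one that fails here; removing whatever comes before the declaration is enough to make the document parse again.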

Posted by cameron on October 22, 2001 at 12:06 PM
announcing (finally) an rss feed!

after a bit of thinking on the issue of feedback (with help from paul nakada and dan chan), blogdex finally offers its first rss feed:

http://blogdex.media.mit.edu/xml/recent.asp

this offers the current top 10 links on blogdex. if you're interested in presenting more than the top 10, you can specify the count:

http://blogdex.media.mit.edu/xml/recent.asp?c=50

will get you the top 50. there is a hard maximum of 1000 links, but any given day only has ~500 links in the range that blogdex notices. this feed is now available on newsisfree.
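
a minimal python sketch of building the feed url with a count. only the url, the `c` parameter and the 1000 limit come from this post; the helper name and the client-side clamping are assumptions (how the server treats a larger count isn't stated here):

```python
from urllib.parse import urlencode

FEED_URL = "http://blogdex.media.mit.edu/xml/recent.asp"
MAX_LINKS = 1000  # the hard maximum mentioned above

def feed_url(count=None):
    """Build the feed url; with no count, the feed gives its default top 10."""
    if count is None:
        return FEED_URL
    # keep the request inside 1..MAX_LINKS rather than relying on the server
    count = max(1, min(int(count), MAX_LINKS))
    return FEED_URL + "?" + urlencode({"c": count})

print(feed_url())
print(feed_url(50))
print(feed_url(5000))
```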

in the near future i hope to add rss for all of the other pages, so that a weblog can include the last 10 people to link to them, or follow a link as it progresses through memehood. let me know if you have any suggestions. i'm also working on an appropriate button (88x31, by netscape's standards).

Posted by cameron on October 18, 2001 at 11:46 AM
blogdex button

in the process of wasting time, i made a little blogdex button that will be going out with syndicated content:

it's free to use, available at http://blogdex.media.mit.edu/blogdex-site.gif . i might make a smaller one as well, but for the time being i thought i'd just shrink the logo. i've never made a logo button before, and am not sure whether or not there are any "logo button standards" that i'm supposed to be adhering to.

Posted by cameron on October 17, 2001 at 03:12 PM
weblog in the latimes

in case you didn't catch it, there was a nice story about weblogs in the context of 9/11 last sunday. there's a bit of a mention of blogdex, in addition to an interesting quote from me: "Tech geeks, Marlow pointed out, are Trekkies at heart." the good stuff comes out under pressure.

i'm still keeping track of all of the news that has come out referencing blogdex, to describe blogdex as a sort of meta-meme. maybe i'll work on that right now.

Posted by cameron on October 16, 2001 at 10:58 PM
titles are working

i had a bug in my title crawler which was preventing it from retrieving content from foreign sites. now that it is back up and working, all of the recent titles should be crawled nightly (the older ones will come with time).

Posted by cameron on October 16, 2001 at 10:53 PM
who needs useless words?

as many of you have noticed, the new interface to blogdex includes a list of phrases used by people to describe a given site, giving a context to an otherwise meaningless piece of information. one of the downsides of this information is that sometimes people don't really add any information, but instead refer to sites as "this" or "here." in order to increase the amount of good information, i've stoplisted a few phrases:

1. contextually meaningless words (link, site, url, article, story)

2. certain demonstratives (this, that)

3. demonstratives + contextually meaningless words (this article, that site, etc.)

4. urls (http://www.thissite.com, www.thissite.com)

if you can think of anything else that's distracting, let me know and i'll recompile the descriptions. i think that it's a lot easier to read now.
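
the four rules above can be sketched roughly like this in python. this is an illustration under assumptions, not the code blogdex runs: the function names are made up, matching is case-insensitive, and only the words actually listed above are included (so "here" is not in the sets):

```python
import re

# rule 1: contextually meaningless words
MEANINGLESS = {"link", "site", "url", "article", "story"}
# rule 2: "this" and "that" on their own
THIS_THAT = {"this", "that"}
# rule 4: bare urls, with or without a scheme
URL_PATTERN = re.compile(r"^(https?://|www\.)\S+$", re.IGNORECASE)

def is_stoplisted(phrase):
    """Return True if a description carries no useful context."""
    text = phrase.strip().lower()
    if not text or URL_PATTERN.match(text):
        return True
    words = text.split()
    if len(words) == 1:
        return words[0] in MEANINGLESS or words[0] in THIS_THAT
    if len(words) == 2:
        # rule 3: "this"/"that" followed by a meaningless word
        return words[0] in THIS_THAT and words[1] in MEANINGLESS
    return False

def useful_descriptions(phrases):
    """Keep only the descriptions that survive the stoplist."""
    return [p for p in phrases if not is_stoplisted(p)]

print(useful_descriptions(
    ["this", "that site", "www.thissite.com", "a history of the bert meme"]
))
```

only whole descriptions are dropped; a longer phrase that merely contains "this" or "site" is kept, since it may still say something about the link.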

Posted by cameron on October 15, 2001 at 12:04 PM
yet another color scheme

for those of you that are frustrated with the 3LIT3 color scheme (white on black), i'm brewing up a more respectable look for you which should look something like this:

i said that i'd have a solution by friday, but the events of the past week left me broken on friday and the weekend, so i'll get around to it sometime early this week. i'm also allowing for the configuration of the font size and brevity of descriptions.

Posted by cameron on October 15, 2001 at 11:20 AM
a bit of good timing

so i had this talk to give yesterday about blogdex (which is what originally prompted the redesign that has been so controversial). to make a long story short, there is a point in the talk where i make the claim that blogdex is good at identifying memes. sometimes the top links are just links to other news stories, which tends to make people start asking questions. but not yesterday.. because the bert meme was going strong! it really helped drive the point home. it's such a weird meme that even a seasoned memester like myself didn't know how to interpret it. good stuff.

Posted by cameron on October 12, 2001 at 11:35 AM
pardon our packets

it's the middle of our sponsor week here at the media lab, and amidst all of the ideas and spittle flying around the room, there's also a huge number of webcast packets flying around our network. this is making blogdex flaky, at least from my end. if you get some weird reaction from the web server, just persist and it will work eventually.

Posted by cameron on October 10, 2001 at 04:05 PM
why do people have to be so lame?

after dealing with a huge spamming of the urls (~ 10000 sites a day), i've added an ip address field to my database so that i can pick out the culprits.

all of the spam has come from one domain, t-dialin.net, which seems to be a pretty major isp in germany. instead of blocking all ip addresses from this domain (which is a pretty big lot), i'll just censor them after they are added.
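
a rough python sketch of censoring after the fact. it assumes each stored submission already carries a hostname looked up from its ip address; the field names, the sample hostnames and the helper names are all hypothetical:

```python
SPAM_DOMAIN = "t-dialin.net"

def is_from_domain(hostname, domain=SPAM_DOMAIN):
    """True if hostname is the domain itself or any host underneath it."""
    host = hostname.strip().lower().rstrip(".")
    return host == domain or host.endswith("." + domain)

def split_submissions(submissions):
    """Split stored submissions into (kept, censored) lists."""
    kept, censored = [], []
    for entry in submissions:
        (censored if is_from_domain(entry["hostname"]) else kept).append(entry)
    return kept, censored

# made-up sample data
submissions = [
    {"url": "http://weblog.example.org/", "hostname": "host1.example.org"},
    {"url": "http://spam.example.com/", "hostname": "dialup-1.t-dialin.net"},
]
kept, censored = split_submissions(submissions)
print(len(kept), len(censored))
```

matching on the dot-separated suffix avoids catching unrelated hosts whose names merely end with the same letters.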

i really do not want to block any ip addresses, but if the traffic from this guy gets out of hand, then i'll have to. has anyone had any experience with this sort of thing before? since this person probably speaks german, there is little chance that this message will dissuade him, but i thought i would post it anyway.

if you have added your system recently, give me a little bit of time to overcome this problem. it might take a couple of days.

Posted by cameron on October 08, 2001 at 02:17 PM
optimizing mysql

i've been tweaking some of the representations that i use in mysql for better performance. despite some frustrating setbacks (first and foremost being the inability to create an index which has two columns sorted in reverse of each other), i think that things should be faster now. browsing the indexes should be much much faster.
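
one common workaround for the mixed-direction index problem, not necessarily the one used here, is to store a negated copy of the column that should sort in reverse, so that a plain ascending index covers both columns. a small python sketch, with sqlite standing in for mysql and with hypothetical table and column names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE links (day TEXT, hits INTEGER, neg_hits INTEGER, url TEXT)"
)

# made-up rows: (day, hits, url)
rows = [
    ("2001-10-06", 3, "http://a.example/"),
    ("2001-10-06", 9, "http://b.example/"),
    ("2001-10-07", 5, "http://c.example/"),
    ("2001-10-07", 1, "http://d.example/"),
]
conn.executemany(
    "INSERT INTO links VALUES (?, ?, ?, ?)",
    [(day, hits, -hits, url) for day, hits, url in rows],
)

# both columns ascending, so no descending index support is needed
conn.execute("CREATE INDEX by_day_neg_hits ON links (day, neg_hits)")

wanted = conn.execute(
    "SELECT url FROM links ORDER BY day ASC, hits DESC"
).fetchall()
via_index = conn.execute(
    "SELECT url FROM links ORDER BY day ASC, neg_hits ASC"
).fetchall()
print(wanted == via_index)
```

the cost is an extra column that has to be kept in sync with the original on every insert and update.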

Posted by cameron on October 07, 2001 at 11:35 PM
redesign completed

in preparation for the coming week here at the media lab, i have completely redesigned the site. if you were fond of the old site, there is good news: i will be implementing a bit of personalization so that you can make things look more like the old way:

1. you will be able to get a brief listing, without all of the extra information

2. the colors will be either white on dark blue OR black on white

3. the fonts will be adjustable.

if you have any general comments about the design, please let me know here. thanks!

Posted by cameron on October 07, 2001 at 08:44 PM
site redesign

i'm working on a new version of the site, which should be going up next week. for those that are curious, it can be viewed immediately here. i've taken a little bit from the design of google and daypop to provide more information in the initial interface.

the list of phrases given for each site is the references that were made to that site, ordered most popular to least popular. this way you can get a good feel for what each site is about before you visit it.
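
the ordering can be sketched in a few lines of python. the function name is made up, and lowercasing the phrases before counting them is an assumption:

```python
from collections import Counter

def ranked_phrases(references):
    """Order the phrases used to reference a site, most popular first."""
    counts = Counter(r.strip().lower() for r in references if r.strip())
    return [phrase for phrase, _ in counts.most_common()]

# made-up references to a single site
print(ranked_phrases(
    ["bert", "Bert", "evil bert", "bert is evil", "evil bert", "bert"]
))
```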

again, this site is in development, so go easy on it. many of the links do not work yet. i should have it up and running by the end of the weekend.

Posted by cameron on October 03, 2001 at 02:08 PM
we have not been subverted!

i apologize to those that came to the index today to find a majority of the websites dominated by this "freesites.net" stuff. blogdex was not, in fact, subverted, but rather fell victim to a common problem that could be avoided.

all of the weblogs that use the server "crosswinds.net" were taken down by their provider last night. in their place was a generic "this site not found, but please use crosswinds" page. this page linked a number of times to sites that blogdex has not seen before.

the system is aware of all pages that are exactly the same, but right now i am manually using a tool to delete those that actually come from the same system (such as the culprits last night). sometime soon i'll put a check in to mark all of the links with duplicate pages as "questionable," and not use them in the statistics.
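
a minimal python sketch of what such a check might look like, assuming "exactly the same" means identical page content. how blogdex actually detects duplicates isn't described here; the hashing approach, the function names and the sample pages are all assumptions:

```python
import hashlib
from collections import defaultdict

def fingerprint(html):
    """Hash the page content so identical pages share one key."""
    return hashlib.sha1(html.encode("utf-8")).hexdigest()

def questionable_sites(pages):
    """pages maps site url -> page html; return the sites whose page is
    identical to the page of at least one other site."""
    groups = defaultdict(list)
    for site, html in pages.items():
        groups[fingerprint(html)].append(site)
    return {
        site
        for sites in groups.values()
        if len(sites) > 1
        for site in sites
    }

# made-up pages: two sites serving the same placeholder, one real weblog
placeholder = "<html>this site not found</html>"
pages = {
    "http://one.example/": placeholder,
    "http://two.example/": placeholder,
    "http://three.example/": "<html>an actual weblog</html>",
}
print(sorted(questionable_sites(pages)))
```

links found on the flagged sites could then be left out of the statistics instead of being deleted by hand.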

thanks again to everyone that pointed this out to me, and i'm sorry i didn't get to it sooner.

Posted by cameron on October 03, 2001 at 02:00 PM