gnumatt

Finding the slender threads

This is something I’ve been kicking around a little bit. People’s blogs are really cool to me because people tend to share personal things about themselves in them. Not only are these personal things shared, but they are archived. I’ve long been fascinated by the notion of lateral discovery. You can go to places like Yahoo and describe your interests so others can discover you. However, interests change over time, and your fanatical devotion to the A-Team just doesn’t mean what it used to. What if Yahoo could use some sort of introspection, studying your journals to determine your interests? Or, in my case, what if I had some software that studied people’s journals and determined their interests?

At first it would be very simple, just a histogram of common words. I would have to figure out which words are meaningful as interests so I could eliminate all the “the”, “a”, “an”, etc. This also presents interesting problems, since I’m losing context. How do I separate an interest in the band Anthrax from the deadly spores? I think it would be good to track when the interests were recorded, because that might help with context. What does it mean when a whole bunch of people put the word WTC in their blogs on the same day? I don’t think I’m ready to build a Bayesian network to predict interest commonality just yet. At any rate, I think I will start crawling the blogs on the dfwblog list and archiving the words in them.
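Here is a rough sketch of what that first pass might look like in Python. Nothing here is a finished design: the stop-word list is a tiny made-up sample, and the names `interest_histogram` and `histogram_by_date` are just ones I picked for illustration. It counts words per entry, drops the filler words, and keeps a separate count for each date so same-day spikes show up.

```python
import re
from collections import Counter, defaultdict

# Assumption: a tiny hand-picked stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "i", "my"}

# Keeps hyphenated and apostrophized words such as "a-team" together.
WORD_RE = re.compile(r"[a-z0-9]+(?:[-'][a-z0-9]+)*")


def interest_histogram(text, stop_words=STOP_WORDS):
    """Count the words in one blog entry, ignoring stop words."""
    words = WORD_RE.findall(text.lower())
    return Counter(w for w in words if w not in stop_words)


def histogram_by_date(entries):
    """Build one histogram per date from (date, text) pairs."""
    by_date = defaultdict(Counter)
    for day, text in entries:
        by_date[day].update(interest_histogram(text))
    return dict(by_date)
```

Feeding it every entry from a set of blogs and then looking up one word across the dates would show when a term like WTC suddenly appears in many entries at once. It still says nothing about which Anthrax someone meant.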

You know, it could do stuff like crawl the news sites so it knows which words are news-oriented, and music sites to know which words go with music, and so forth. You might get some neat context voodoo as it tries to guess whether the blogger meant anthrax in the musical or the news context. You could use the timestamping to guess that, since anthrax has been in the news a lot recently, that’s probably what they are referring to.
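To make the context voodoo a bit more concrete, here is a minimal sketch, again in Python. It assumes the text from the news and music sites has already been crawled somehow. The function names, the scoring rule (sum of each word's relative frequency in a context), and the `recent_weight` boost are all my own guesses at one simple way to do it, not a proven method.

```python
import re
from collections import Counter

WORD_RE = re.compile(r"[a-z0-9]+(?:[-'][a-z0-9]+)*")


def tokenize(text):
    """Lowercase the text and split it into words."""
    return WORD_RE.findall(text.lower())


def build_profile(documents):
    """Word counts over all documents crawled for one context (e.g. news)."""
    profile = Counter()
    for doc in documents:
        profile.update(tokenize(doc))
    return profile


def guess_context(post, profiles, recent_weight=None):
    """Score a post against each context profile and return the best match.

    recent_weight is a hypothetical per-context multiplier, e.g. {"news": 2.0}
    when a term has been all over the news lately.
    """
    recent_weight = recent_weight or {}
    words = tokenize(post)
    scores = {}
    for name, profile in profiles.items():
        total = sum(profile.values()) or 1
        # Sum of each post word's relative frequency in this context.
        score = sum(profile[w] / total for w in words)
        scores[name] = score * recent_weight.get(name, 1.0)
    best = max(scores, key=scores.get)
    return best, scores
```

A post that mentions anthrax alongside words like guitar and album should lean toward music, while the recency multiplier can tip a borderline post toward news when the word has been in the headlines.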

I think it will be interesting to see what my program thinks is on the hearts and minds of the dfwbloggers. Maybe I can even teach it to start blogging its observations about the dfwbloggers’ blogs.