February 11, 2007 12:24 PM

Nobody cares

Over the past few years, blogs have become the biggest fad on the internet since porn. And like its slutty older sister, blogging has hit the mainstream big-time and has wormed its way into the lives of people beyond the circle of sweaty, lonely social outcasts who sustained its humble youth.

At its core, blogging is merely a public diary. What makes it "au courant" (French for "fucking stupid") is that it is the latest and most expensive way yet invented to flap your piehole at millions of anonymous rubes. The barrier to entry, as they [1] say, has been lowered to the ground and the great yawping masses have at last embraced the webpage and all it represents. The full flower of individual opinion, stated loudly and ubiquitously, has bloomed in all its shrill and stupid majesty.

No longer must you eschew hobbies and relationships and stand on a street corner yelling at passers-by, leaflets flapping in the wind, in order to get your message across to the masses. Instead, you can now sit comfortably in the living room, eating ice-cream from a bucket and tell the world exactly how Apple's latest pricing structure has affected your sex-life.

The anal bleaching of hobbies, blogging is both life wreckingly narcissistic and breath-takingly painful to watch. Like their authors, blogs offer nothing new to the world. No observation is too trite, no angry denunciation too overwrought and empty to avoid being slathered across a livejournal page. Fitting its fatuous lineage of unicorn-sticker-covered diaries and spiral notebooks covered with crudely-drawn devil skulls, the average blog contains little more than a generic set of observations about whatever suburban "scene" the author haunts, mentions of the inevitable cat (perhaps dressed as a pirate for Halloween, or impishly photographed with a $700 digital camera in a hi-larious pose) and a few limp sentences about whatever band played last weekend.

If I sound like I'm generalizing and glossing over the varied and rich splendor of blogs, forgive me [2]. But in reality, blogs are have all the rich diversity and individuality of a crate of Jello pudding cups. The same dozen topics and links circle endlessly among the listless blog authors/audience like flecks of rot in a slowly draining sink.

You're a fat goth who was bad-touched by your uncle and you cut yourself? Congratulations: there are millions of slack-jawed butterwhales in whiteface just like you, galumphing around our benighted nation's stripmalls. Delete your xanga account and shut up.

Or perhaps you're a gamer with a sense of humor and a willingness to tell it like it is? Cram it, virgin. Any observation you make about the Xbox 360 has been well expressed already by the many thousands of game publications available anywhere porn, cigarettes and Taschen books are sold.

Worse yet are political blogs: the online equivalent of that irritating guy at the local bar who won't shut up about Clinton. Like those ranting, urine-scented wrecks, political bloggers appear to believe that shuffling around second- and third-hand opinions is a fine substitute for expertise. Excuse me, madam blogger, but you haven't changed out of your pajamas in three days. You are not a beltway insider, and don't have a fucking clue about how the real world operates. (Hint: people in power don't have 60th level Rouges in World of Warcraft).

If it's not the diversity of opinion that makes blogs "interesting" to the media, perhaps its the independent, DIY-aspect of blogging? Wrong. Bloggers proudly claim that they are "breakin' the rules" of so-called "dinosaur media". As if the pomposity and college-freshman-level arrogance of this claim wasn't awful enough, the truth of the matter is that they have it all backward. Far from breaking any hidebound rules, bloggers have merely adopted all the worst aspects of journalism and writing and simultaneously eschewed the traits that make these professions valuable. Personal anecdotes passed off as profound insights? Check. Ads? Check. Ads for porn? Check, check and CHECK. A degree from a journalism school? Uh, the Starbucks manager is kicking me out of the chair near the outlet, I'll have to get back to you.

Despite their lack of qualifications, original opinions or, indeed, even basic literacy [3] many bloggers like to pontificate about how blogs, and by extension themselves, pose some sort of threat to traditional media, newspapers, television studios, and so forth. Bullshit. The only threat that blogs and blogging pose is the danger that they will actually convince someone in "old media" of their claims. This is a threat similar to the one that AOL posed to Time-Warner (or, more accurately, its stockholders) in 2000.

Journalists and authors, i.e. people who are trained and paid for writing, got where they are today not because of some secret handshake or exclusive club that only people in their 40s can join. No, wait! Actually, they DID. But that club is called "having skills and paying your dues", neither of which will happen while you sit in cafes behind your sticker-bedecked powerbook. Getting stoned and blowing a Java programmer at Burning Man is not the same as going to journalism school, and writing 1200 words about the latest "wacky" Japanese cultural trend is not the same as getting a real job.

Finally, and most recently, we have the resurgence of "social networking" sites; simulacra of real life for people who can't borrow the keys to mom's leased Escalade. First Friendster (whose pathetic, incompetent crashing and burning should be an object lesson to every Vox, Blurt, Jookster and Gazzag[4] out there), then Orkut, and now MySpace. I have nothing against people using new communication media for making friends and meeting like-minded souls. If only it stopped there. But no, the blank page of ones MySpace profile just begs to be filled with incoherent prose, like a clean field of snow beckons to the dog with a full bladder.

The blight of MySpace cannot be overstated, not only aesthetically (who decided that we missed embedded MIDI and the blink tag?) but also culturally. Endless browser-crashing pages of 20-something whores in training, white "thug" poseurs and horrible, horrible bands is what MySpace and its ilk offer and, sadly, corporations seem to be buying. My only consolation is that all the professed love to Insane Clown Posse, all the braggadocio about drug use and fumbling, unsatisfying sex, all the posing, posturing and humiliating self-aggrandizement of adolescence that we see is being cached and archived forever by the tireless robots of Google, and will come back to haunt the simpletons who offered to share it with the world in the first place. The last laugh will be a bitter one, but it will be ours.

So what's my point here? My "conclusion", if you can call it that, is the precise phrase that will be directed at me by the readers of this frothy rant: "shut the hell up." Sounds like good advice to me.

[1] Assholes.

[2] Fuck you.

[3] Real writers do not "lol" or "rofl."

[4] I am not making these up.


Posted by Jeff | Permalink | Categories: Rants

November 16, 2004 10:29 AM

Naive Bayes Classification

I've put together a quick summary of I've been playing with for the past few weeks. I have the set of all URLs (and their tags) bookmarked at delicious as of 22 January 2004. Before FOO camp, I downloaded the text from these urls (not following any links and not downloading images or flash), removed the HTML tags and cleaned the resulting text a bit (removing non-"typewriter" characters). This resulted in about 17000 different texts, varying in length from 100 bytes (my arbitrary lower limit) to 1 megabyte (a significant outlier).

I then took a median-length subset of these texts (about 2000) and cleaned their tags. This involved some semiautomatic stemming ("blogs" became "blog", etc) as well as some spelling corrections. I set aside 500 of these texts as "test" data and trained a Naive Bayes classifier on the rest (which took about 5 minutes).

The resulting classifier could assign a correct tag to the previously unseen testing texts about 80% of the time. For example, a given URL might have the three tags: "blog linux technology". The NB classifier would, 80% of the time, tag this URL "blog", "linux" or "technology." (Becuase of the data sparsity, as well as the numerical behavior of the probability calculations, the NB classifier isn't really capable of giving multiple, accurate tags in all cases.)

When I returned from FOO camp, I decided to reimplment the NB classifier, and re-run it on the entire corups of 17000 web pages. Unfortunately, the much larger size of this corups meant that I couldn't clean the tags and texts as thouroughly as before, and the results have been slightly less encouraging: about 70% accuracy. Interestingly, about 80% of the misclassifcations were pages being mis-classified as "blogs," which probably means that "blog" is an overly broad tag.

The next step for me is probably to either tweak the Naive Bayes classifier, to see if I can coax better performance out of it, or perhaps to try a different algorithm. I implemented a support vector machine classifier a while ago; I may dust it off and give that technique a shot.

There are some aspects of this data set that make it both interesting and difficult to deal with.

a) It is multilingual (and unlabelled). There are quite a few pages in Spanish, Italian and German that I have grabbed, and these add pure noise. Not only are many of the words in these documents unique (compared to the english portion of the corups), but we lose the efficiency boost of stopwords. A friend of mine (another FOO camper, Maciej Ceglowski) has written a language iddentification Perl module. If I can use that to "weed out" non english pages, that would be a help.

b) The tags are unstructured. It is common to see tags like "linux/hardware" or "design+awful", which implies a heirarchy or structure that other users do not use.

c) The taggers are heterogenous and untrained. Most text classification depends on categories that are stable, orthogonal and uniformly applied. However, delicious users tag urls inconsistently and subjectively. For example, what one delicious user might tag as "funny" is not necessarily what someone else would. Or, in a better example, one user may tag a URL as "nl," short for "neurolingusitics" while someone else may use that tag to mean "the Netherlands."

However, some of these bugs are features. For example, a given URL might be bookmarked (and tagged) by multiple users, and we should be able to leverage this information. For example, if "boingboing.com" has ten tags total and nine of them are the tag "blog" and one of them is "silly", then boingboing is clearly "blog-ier" than it is "silly." Currently, I only consider unique tags for classification.


Posted by Jeff | Categories: Project updates | Permalink

November 13, 2004 1:19 PM

Delicious Tags

A few months ago at FOO camp, I gave a small talk about machine learning, both a technical overview of some algorithms (Bayesian learning, SVM, neural nets and so forth) as well as some practical applications on text-based data. After this talk, I became interested in the idea of analyzing the del.icio.us data with these same machine learning tools. Specifically, I wanted to see if I could implement a good "URL classifier" or "URL recommendation" engine.

A "URL classifier" is some process or algorithm that, given a new URL, generates a reasonable set of tags describing it. Similarly, a "URL recommender" recommends a url that is similar to one you've already tagged.

Although the two tasks are, in some sense, sides of the same coin I decided to tacke the second task first because it seemed easier. "If we tag page A with 'apples' and 'alchemy'," the naive reasoning goes, "couldn't we just offer up another page that someone else has tagged with 'apples' and 'alchemy' as well?" This works for many simple cases, but runs into two problems:

1. Ambiguity of tags. The first example of ambiguity I ran into was during my own searches for papers on natural language parsing. Natural language is frequently abbreviated, and thus tagged, as "nl." Unfortunately, so are many pages about the Netherlands. So, the lesson here is that the meaning of a specific tag depends on the meaning of the page it is tagging (obviously), and the meaning of the tags around it.

2. Different taste in tags. In short, it is possible (even likely) that two different users will tag the same page with two completely different sets of tags.

To be incredibly simplistic, the first problem will result in false positives -- a wrong page being presented as a correct one -- and the second problem will result in false negatives -- no page being returned when there might in fact be excellent matches out there.

Thus, after some study and experiments I determined that tags, while a valuable classifier of data, are "noisy" enough that they cannot be solely depended upon for useful recommendation.

Next: The URL classifier problem, and how it encounters the same difficulties.


Posted by Jeff | Permalink | Categories: New project

September 21, 2004 3:01 PM

Tangled up in FOO

I went to FOO camp two weekends ago, and ended up having a pretty good time. I was worried, at first, that my interests and skills (computer graphics, numerical optimization, and to a less accomplished degree, robotic/new media art) would be irrelevant in the face of the various blog people and "future of SOAP" technical discussions. However, the other attendees were by and large interesting people with wide-ranging minds, and we found a lot of common ground.

Although the people there were sharp and fun to talk to, the conference sessions themselves ended up being a bit weak. They suffered from a certain lack of organization, not only in timing and place, but also in focus. About a third of the talks I went to started out interesting, but kind of wandered off into irrelevent territory. The best example of this being the "Future of Email" talk, which started out with some interesting input on "email as a message" vs. "email as instigating a validated channel of communcation", and degenerated into a room of people listlessly watching some guy in a cloak doing nslookups on his laptop. However, I think disorganization is the price you pay for an ad hoc conference and, as disorganization goes, FOO wasn't bad at all.

Not to end on a negative note, I had some very good times chatting with Tim Anderson, who co-founded Z Corporation, a 3D printer company, MIT media lab art freaks Cameron Marlow and Jonah Peretti, as well as my friends Joshua Schachter and Maciej Ceglowski.


Posted by Jeff Smith | Permalink | Categories: News

August 06, 2004 7:42 PM

Another Old Project

After a long hiatus, I've added another new project to my projects section. Traces, which I worked on during late 1998 and 1999, was an art project that explored the ideas of presence in virtual spaces. In addition to helping to develop the ideas, I did the graphics programing for the CAVE 3D display. Feel free to download the code, but please let me know if you use it.


Posted by Jeff Smith | Permalink | Categories: New project

March 06, 2004 1:36 PM

Two old projects added

I've blocked out the style of "project" entries, and added two of my more recent peices of research to the projects page. For now, the source code download links are dead -- I haven't uploaded the code to smokingrobot from my research machine yet. Also, I'm undecided whether I should give away my research code. I may want to resturn to some of these projects some day, and giving away the code kind of gives away any "advantage" I have over other researchers. Also, I don't want to get into the "I downloaded your code and it doesn't work. Help me!" hellpit that offering code for download opens.


Posted by Jeff Smith | Permalink | Categories: New project