Who The Hell Writes Wikipedia, Anyway?

The bulk of Wikipedia is written by 1400 obsessed freaks who do little else but contribute to the site, says a post racing up the Hacker News charts. The post pulls this number from an essay Aaron Swartz wrote more than two years ago, based on some comments by Jimmy Wales.

Wikipedia's growth has exploded in the past two years, so today's number would presumably be a lot higher. But Swartz conducted his own study after hearing Wales's comments, and his more detailed findings are even more interesting.

Swartz analyzed percentage-of-text instead of number of edits, and what he found was slightly different: The bulk of the original content on Wikipedia is contributed by tens of thousands of outsiders, each of whom may not make many other contributions to the site. The bulk of the changes to the original text, then, are made by a core group of heavy editors who make thousands of tiny edits (the 1400 freaks).

When you put it all together, the story becomes clear: an outsider makes one edit to add a chunk of information, then insiders make several edits tweaking and reformatting it. In addition, insiders rack up thousands of edits doing things like changing the name of a category across the entire site -- the kind of thing only insiders deeply care about. As a result, insiders account for the vast majority of the edits. But it's the outsiders who provide nearly all of the content.

This insight likely offers one explanation for why Google's Wikipedia-killer, Knol, has bombed. What users love about Wikipedia is the ability to make minor contributions (on the fly) to an existing piece of work -- they don't want to read or vote on a handful of competing "articles" and petition a single author to make changes. As long as the original bulk contributor gets the big picture right, the crowd's wisdom can then be applied to the details, improving the collective whole.

Here's an excerpt from Aaron Swartz's post explaining his findings. The whole thing is on his blog, Raw Thoughts.

I first met Jimbo Wales, the face of Wikipedia, when he came to speak at Stanford. Wales told us about Wikipedia's history, technology, and culture, but one thing he said stands out. "The idea that a lot of people have of Wikipedia," he noted, "is that it's some emergent phenomenon -- the wisdom of mobs, swarm intelligence, that sort of thing -- thousands and thousands of individual users each adding a little bit of content and out of this emerges a coherent body of work." But, he insisted, the truth was rather different: Wikipedia was actually written by "a community ... a dedicated group of a few hundred volunteers" where "I know all of them and they all know each other". Really, "it's much like any traditional organization."

The difference, of course, is crucial. Not just for the public, who wants to know how a grand thing like Wikipedia actually gets written, but also for Wales, who wants to know how to run the site. "For me this is really important, because I spend a lot of time listening to those four or five hundred and if ... those people were just a bunch of people talking ... maybe I can just safely ignore them when setting policy" and instead worry about "the million people writing a sentence each".

So did the Gang of 500 actually write Wikipedia? Wales decided to run a simple study to find out: he counted who made the most edits to the site. "I expected to find something like an 80-20 rule: 80% of the work being done by 20% of the users, just because that seems to come up a lot. But it's actually much, much tighter than that: it turns out over 50% of all the edits are done by just .7% of the users ... 524 people. ... And in fact the most active 2%, which is 1400 people, have done 73.4% of all the edits." The remaining 25% of edits, he said, were from "people who [are] contributing ... a minor change of a fact or a minor spelling fix ... or something like that."
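To make the arithmetic behind numbers like these concrete, here is a minimal Python sketch of an edit-count concentration measure. It is not Wales's actual analysis, which the talk doesn't detail; the function name `top_share` and the toy list of edit authors are hypothetical.

```python
from collections import Counter
import math


def top_share(edit_authors, fraction):
    """Return the share of all edits made by the most active `fraction` of users.

    `edit_authors` is a list with one entry (a user name) per edit.
    This is an illustrative helper, not Wales's method.
    """
    # Edits per user, busiest users first.
    counts = sorted(Counter(edit_authors).values(), reverse=True)
    # Size of the "top" group, rounded up and never smaller than one user.
    top_n = max(1, math.ceil(len(counts) * fraction))
    return sum(counts[:top_n]) / sum(counts)


# Toy data: four users, ten edits, one very busy editor.
edits = ["a"] * 6 + ["b"] * 2 + ["c", "d"]
print(top_share(edits, 0.25))  # share of edits made by the top quarter of users
```

Note what this measures: clicks on the save button, not how much text anyone wrote.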

Stanford wasn't the only place he's made such a claim; it's part of the standard talk he gives all over the world. "This is the group of around a thousand people who really matter", he told us at Stanford. "There is this tight community that is actually doing the bulk of all the editing", he explained at the Oxford Internet Institute. "It's a group of around a thousand to two thousand people," he informed the crowd at GEL 2005. These are just the three talks I watched, but Wales has given hundreds more like them.

At Stanford the students were skeptical. Wales was just counting the number of edits -- the number of times a user changed something and clicked save. Wouldn't things be different if he counted the amount of text each user contributed? Wales said he planned to do that in "the next revision", but was sure "my results are going to be even stronger", because he'd no longer be counting vandalism and other changes that later got removed.

Wales presents these claims as comforting. Don't worry, he tells the world, Wikipedia isn't as shocking as you think. In fact, it's just like any other project: a small group of colleagues working together toward a common goal. But if you think about it, Wales's view of things is actually much more shocking: around a thousand people wrote the world's largest encyclopedia in four years for free? Could this really be true?

Curious and skeptical, I decided to investigate. I picked an article at random ("Alan Alda") to see how it was written. Today the Alan Alda page is a pretty standard Wikipedia page: it has a couple photos, several pages of facts and background, and a handful of links. But when it was first created, it was just two sentences: "Alan Alda is a male actor most famous for his role of Hawkeye Pierce in the television series MASH. Or recent work, he plays sensitive male characters in drama movies." How did it get from there to here?

Edit by edit, I watched the page evolve. The changes I saw largely fell into three groups. A tiny handful -- probably around 5 out of nearly 400 -- were "vandalism": confused or malicious people adding things that simply didn't fit, followed by someone undoing their change. The vast majority, by far, were small changes: people fixing typos, formatting, links, categories, and so on, making the article a little nicer but not adding much in the way of substance. Finally, a much smaller amount were genuine additions: a couple sentences or even paragraphs of new information added to the page.

Wales seems to think that the vast majority of users are just doing the first two (vandalizing or contributing small fixes) while the core group of Wikipedians writes the actual bulk of the article. But that's not at all what I found. Almost every time I saw a substantive edit, I found the user who had contributed it was not an active user of the site. They generally had made fewer than 50 edits (typically around 10), usually on related pages. Most never even bothered to create an account.

To investigate more formally, I purchased some time on a computer cluster and downloaded a copy of the Wikipedia archives. I wrote a little program to go through each edit and count how much of it remained in the latest version. Instead of counting edits, as Wales did, I counted the number of letters a user actually contributed to the present article.
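For readers who want a feel for what such a program might look like, here is a minimal Python sketch. It is not Swartz's code, which the essay doesn't show: the function name `surviving_chars`, the `(author, text)` input format, and the use of the standard library's `difflib` to decide which characters an edit added are all assumptions made for illustration.

```python
from collections import Counter
from difflib import SequenceMatcher


def surviving_chars(revisions):
    """Count, per author, the characters that survive into the last revision.

    `revisions` is a list of (author, text) pairs, oldest first.
    Hypothetical helper: one possible way to count letters, not edits.
    """
    owners = []    # owners[i] = author credited with character i of the current text
    previous = ""  # text of the revision before the one being processed
    for author, text in revisions:
        matcher = SequenceMatcher(None, previous, text, autojunk=False)
        new_owners = []
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # Unchanged text keeps whoever was credited with it before.
                new_owners.extend(owners[i1:i2])
            else:
                # Inserted or replacement text is credited to this editor.
                # For a pure deletion j1 == j2, so nothing is added.
                new_owners.extend([author] * (j2 - j1))
        owners, previous = new_owners, text
    return Counter(owners)


history = [
    ("outsider", "Alan Alda is an actor."),
    ("insider", "Alan Alda is an American actor."),
]
print(surviving_chars(history))
```

One known weakness of this sketch: text that is merely moved shows up as a deletion plus an insertion, so the person who moved it gets the credit. A real analysis would have to handle that, along with reverted vandalism.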

If you just count edits, it appears the biggest contributors to the Alan Alda article (7 of the top 10) are registered users who (all but 2) have made thousands of edits to the site. Indeed, #4 has made over 7,000 edits while #7 has over 25,000. In other words, if you use Wales's methods, you get Wales's results: most of the content seems to be written by heavy editors.

But when you count letters, the picture dramatically changes: few of the contributors (2 out of the top 10) are even registered and most (6 out of the top 10) have made fewer than 25 edits to the entire site. In fact, #9 has made exactly one edit -- this one! With the more reasonable metric -- indeed, the one Wales himself said he planned to use in the next revision of his study -- the result completely reverses.

I don't have the resources to run this calculation across all of Wikipedia (there are over 60 million edits!), but I ran it on several more randomly-selected articles and the results were much the same. For example, the largest portion of the Anaconda article was written by a user who only made 2 edits to it (and only 100 on the entire site). By contrast, the largest number of edits were made by a user who appears to have contributed no text to the final article (the edits were all deleting things and moving things around).

When you put it all together, the story becomes clear: an outsider makes one edit to add a chunk of information, then insiders make several edits tweaking and reformatting it. In addition, insiders rack up thousands of edits doing things like changing the name of a category across the entire site -- the kind of thing only insiders deeply care about. As a result, insiders account for the vast majority of the edits. But it's the outsiders who provide nearly all of the content.

And when you think about it, this makes perfect sense. Writing an encyclopedia is hard. To do anywhere near a decent job, you have to know a great deal of information about an incredibly wide variety of subjects. Writing so much text is difficult, but doing all the background research seems impossible.

On the other hand, everyone has a bunch of obscure things that, for one reason or another, they've come to know well. So they share them, clicking the edit link and adding a paragraph or two to Wikipedia. At the same time, a small number of people have become particularly involved in Wikipedia itself, learning its policies and special syntax, and spending their time tweaking the contributions of everybody else.

Other encyclopedias work similarly, just on a much smaller scale: a large group of people write articles on topics they know well, while a small staff formats them into a single work. This second group is clearly very important -- it's thanks to them encyclopedias have a consistent look and tone -- but it's a severe exaggeration to say that they wrote the encyclopedia. One imagines the people running Britannica worry more about their contributors than their formatters.

And Wikipedia should too. Even if all the formatters quit the project tomorrow, Wikipedia would still be immensely valuable. For the most part, people read Wikipedia because it has the information they need, not because it has a consistent look. It certainly wouldn't be as nice without one, but the people who (like me) care about such things would probably step up to take the place of those who had left. The formatters aid the contributors, not the other way around.

