The woes of Wikipedia

Robert McHenry has been a vocal critic of Wikipedia, the open source online encyclopedia. In his latest article, he points out several problems with the Wikipedia concept.

The first is that people are malicious, but the normal social disincentives against acting out of malice do not function with the present Wikipedia setup. This is because the authors and editors of Wikipedia articles effectively enjoy complete anonymity. McHenry describes the Seigenthaler libel as the two-by-four that ought to have gotten the donkey's attention, but Wikipedia's response was astonishingly tepid, consisting only of removal of the libelous article and institution of a new requirement that authors of new articles register with Wikipedia. This registration process is almost entirely cosmetic, providing no guarantee that the registrant can actually be identified, and there are still no controls at all over who edits existing articles.

I am not a lawyer, but I don't think you have to be one to understand the potential legal liability that Wikipedia is setting itself up for. Up to now, the courts have been hesitant to hold Internet service providers and the sponsors of bulletin boards, blogs, and other discussion forums responsible for libelous comments posted by visitors. The sponsors of Wikipedia have tried to claim the same protection, but I find it difficult to believe that the courts will shrug off repeated libels at a site claiming to be the premier free online encyclopedia if the best defense offered is the very weak claim that authors and editors of articles are "visitors." Wikipedia's head-in-the-sand attitude, if not changed, almost guarantees a devastating legal verdict against the site sooner or later. If repeated libels prove impossible to redress because those responsible are impossible to identify, then the court may well rule Wikipedia a public nuisance and order the site shut down.

Another problem with Wikipedia is that people are stupid. As McHenry points out, there are large numbers of people who are often wrong, but never in doubt:

Not everyone who believes he knows something about Topic X actually does; and not everyone who believes he can explain Topic X clearly, can. People who believe things that are not the case are no less confident in their beliefs than those who happen to believe true things.

This challenges the entire premise behind Wikipedia, which is that everyone who wanders by has some bit of wisdom to offer that can only improve the value of the encyclopedia. Commentator M.Murcek at Outside the Beltway put it best: "If you put a teaspoon of wine in a barrel of sewage, you get sewage. If you put a teaspoon of sewage in a barrel of wine, you get sewage..."

There is a lot of good stuff in Wikipedia. I have found that most of the science and mathematics articles are very good, even excellent. The problem is that I have encountered a few that are not, such as the article on degenerate matter. If I'm ignorant enough about a subject to be looking it up in an encyclopedia, then there is a good chance that I will be unable to determine whether the article I am looking at is reliable.

What is particularly distressing is that none of the problems with Wikipedia are obvious to the ordinary user. Readers of ordinary bulletin boards, blogs, and other discussion sites recognize that those posting comments are visitors to the site, whose veracity and good faith are open to question. I think many of us realize that the Letters to the Editor sections of most newspapers and magazines have become freak shows, and we extend the same skepticism to the analogous sections of Internet discussion sites. The problem with Wikipedia is that the barriers to posting and editing articles are little greater than those of a commentator at a blog, but this is not at once obvious to the casual user of Wikipedia, who assumes -- with the encouragement of the Wikipedia project itself -- that the articles are authoritative.

McHenry asserts that he is not hostile to the idea of a great Internet encyclopedia. One must conclude that what he is hostile to is the open source model.

I am deeply impressed with the success of open source in the software business. My entire department at LANL uses Linux as our workstation operating system. Linux is also used on some of our massively parallel supercomputers. Open source obviously works well in certain contexts.

I am much less impressed with Wikipedia. I find articles that are incorrect, misleading, biased, or grossly incomplete on a fairly regular basis. (That Wikipedia marks many of its grossly incomplete articles as "stubs," and invites readers to complete these articles, does not alter in any way the fact that they are grossly incomplete.) This has not been my experience with conventional encyclopedias.

So why should open source work well for software, but not for encyclopedias?

I am reminded of a Dilbert strip in which the evil pointy-haired boss asks Dilbert and his colleagues why a potato chip company can slice, oil, and fry a chip in less than ten seconds when it takes months for Dilbert's company to turn out a new gadget. The shtick, of course, is that the boss is too clueless to understand that potato chips and high-tech gadgets are completely different products. I think that this is the kind of false analogy that the sponsors of Wikipedia have fallen into. So let me outline some of the differences.

The open source model seems to work best when the consumers of a product and the producers of a product are the same people. I believe that Linux has been so successful in the server and workstation environments because the consumers of operating systems for these machines -- LAN managers, computer scientists, software developers, and sophisticated users -- are the very people who have the skills and inclination to develop Linux software. Thus, there is little conflict of interest between producers and consumers. I find it noteworthy that the areas where Linux has made the fewest inroads -- home and small business computing -- are precisely the areas where the consumer community does not significantly overlap the producer community.

In the case of Wikipedia, I believe that there is a clear distinction between producers and consumers. This distinction will only widen if Wikipedia succeeds in becoming what it claims to want to become -- the premier free online encyclopedia. Since Google is only too happy to list Wikipedia hits high in its search results, one suspects that a huge fraction of consumers of Wikipedia are persons with neither the talent nor inclination to write or edit Wikipedia articles. At the same time, the producers of Wikipedia articles have little reason to use their own product. If I develop a new utility or a useful patch for Linux, the odds are that I will make heavy use of the new utility or patched service myself. I am unlikely to make heavy use of an encyclopedia article that I have written myself. The distinction between consumer and producer cuts both ways, and I believe it undercuts the premises of open source.

I find it notable that the areas where I have found Wikipedia the most useful and reliable -- highly specialized and technical articles on math and theoretical physics -- are areas in which the distinction between producer and consumer is unusually blurred. An article that explains the difference between algebraic and transcendental numbers, or gives the definition of an NP-complete problem, is most likely to be looked up by someone who has already heard of transcendental numbers or NP-completeness. This means he already has considerable mathematical or computer science training himself. The highly technical nature of these subjects may also reduce the likelihood of a person falling into McHenry's class of individuals who are ignorant of their own ignorance, though I am far less certain of this.

Why is the distinction, or lack of it, between consumers and producers important to the open source model? When the producers and consumers are the same people, there can be no conflict of interest between them. However, when producers and consumers do not overlap significantly, such conflicts of interest are not only possible, but highly likely. The usual way that our economy reconciles conflicts of interest between producers and consumers is through market processes. But the whole concept of open source is a repudiation of the market.

I like my grocer, Boyd, quite a lot. But I have no illusions that he would regularly provide me with steaks and milk and breakfast cereal if I did not give him money to supply me with these things. As Thomas Sowell pointed out, you don't make money by doing what you want to do; you make money by doing what someone else wants you to do. This is how the market reconciles the interests of producers and consumers. And it doesn't apply to Wikipedia. The authors and editors of Wikipedia articles write what pleases them. If their work also pleases me, it is by happy coincidence rather than because of any clear incentives built into the system.

Another important distinction between operating systems and encyclopedias is the nature of the source code. Linux source code is written in a mixture of assembly language and C that must be compiled and linked for execution. Wikipedia articles are written in a custom markup language that includes a TeX-like formula typesetting system, but which otherwise hardly merits being described as source code. I am not quibbling over semantics. Source code for operating systems must conform to a strict grammar and set of semantic rules whose violation is immediately disclosed when the code is compiled and linked. Acquiring the ability to write such code is a significant barrier to playing in the Linux game. By contrast, anyone who is literate enough to reach the Wikipedia site is capable of writing or editing articles, at least if he is unconcerned with links or formulas. A person of reasonable intelligence can figure out links and other formatting from examples within a few minutes. Formula typesetting is considerably more difficult -- but I have already pointed out that mathematical articles tend to be much better than average, and I believe that this is another part of the explanation for it.

But it is neither the conflicts of interest between consumers and producers, nor the relative lack of entry barriers to writing "source code," that is the most important difference between Linux and Wikipedia. The most important and damaging distinction is the lack of a decent validation procedure for Wikipedia.

I make my living as a computational physicist, which means that much of my time is spent developing software that is supposed to perform useful scientific calculations for me or my customers. Because I work at a federally funded research and development center (LANL), I am somewhat insulated from market forces, though less than you might think. I cannot maintain my competitive advantage as a computational physicist unless I am able to quickly detect and repair defects in my software. Our group has adopted a number of practices to assure software quality, including Design By Contract, peer reviews, and adoption of coding standards; but one of the most important practices is levelized design and unit testing.

When I implement new features in our software, I am expected to break these features into discrete units that fit into a level hierarchy. These units can call upon services provided at lower levels of the hierarchy, but not on services from the same level or higher. This means that each unit can be tested independently of other units at the same level or higher. These unit tests are meant to tell us at once whether a unit of software is doing what it is supposed to be doing. They also serve as regression tests: If we modify a unit of software, we can immediately double check that the software still correctly does all the things it correctly did last week. We do this by invoking a script that runs the unit test suite.
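To make the idea concrete, here is a minimal sketch in Python of what levelized units and their tests might look like. It is only an illustration under my own assumptions, not our group's actual code: the functions `dot` and `norm`, the test classes, and the `run_suite` helper are all hypothetical names invented for this example. The point is that the level 1 unit calls only the level 0 unit, each has its own test, and a single script runs the whole suite as a regression check.

```python
import math
import unittest


# Level 0 (hypothetical unit): depends on nothing else in this module.
def dot(a, b):
    """Return the dot product of two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


# Level 1 (hypothetical unit): may call level 0 services,
# but nothing at its own level or higher.
def norm(a):
    """Return the Euclidean length of a vector."""
    return math.sqrt(dot(a, a))


class DotTest(unittest.TestCase):
    """Tests the level 0 unit in isolation."""

    def test_orthogonal_vectors(self):
        self.assertEqual(dot([1, 0], [0, 1]), 0)

    def test_length_mismatch_is_rejected(self):
        with self.assertRaises(ValueError):
            dot([1], [1, 2])


class NormTest(unittest.TestCase):
    """Tests the level 1 unit; it relies only on the tested level below."""

    def test_three_four_five_triangle(self):
        self.assertAlmostEqual(norm([3, 4]), 5.0)


def run_suite():
    """Run every unit test, lowest level first; return True if all pass."""
    loader = unittest.TestLoader()
    suite = unittest.TestSuite()
    for case in (DotTest, NormTest):
        suite.addTests(loader.loadTestsFromTestCase(case))
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()


if __name__ == "__main__":
    # Exit status 0 means nothing that worked last week has broken.
    raise SystemExit(0 if run_suite() else 1)
```

If someone later modifies `dot`, rerunning the script shows at once whether `norm` and everything else built on top of it still behaves as before.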

I am almost certain that the major Linux distributors use a similar model. Yes, their source is open -- you can get a copy and do what you like with it -- but they are under no obligation to take back your modifications, and will do so only if your modifications appear to be an improvement and if they pass all the existing unit tests.

Now contrast this with Wikipedia. How do you validate an article in Wikipedia? How do you perform automatic regression testing to ensure that articles only improve with time? As a former editor with Encyclopedia Britannica, McHenry is familiar with the tedious validation process used by publishers of conventional encyclopedias:

I was once an encyclopedia editor, but I wasn't one just because I said so. It's not like being an artist, after all. When I began I first learned to proofread, then to fiddle about with galleys and page proofs, then to fact-check, then to write clearly and concisely, and so on; at length I learned (so we agreed to say) editorial judgment. Late in my days I took a hand in training others. There really is something to the job -- skills, knowledge, experience, and maybe even a touch of talent.

The editorial process I learned was very elaborate. Nothing was published that had not been seen by four or more sets of eyes. Why? Because it was believed that our continued employment depended upon our reputation, which in turn rested on the reliability of what we published. There came a time in the 1980s when, like so many other firms at the time, our company was invaded by swarms of consultants. Though our business was explaining and we were, by most accounts, pretty good at it, we found we could not explain to the satisfaction of someone laboring under the burden of a business-school education just why we would proofread an article three times, or fact-check something written by an expert. I finally went out and got an MBA myself just so I could oblige these folks to listen. But they had been taught that they needn't know anything so specific about the businesses they "advised"; theirs was the grand general view, and in that view we spent too much time and money in needless busywork.

There is nothing to suggest that Wikipedia has any analogous process at all.

My feelings about Wikipedia are about the same as my feelings about Java: Neat idea, poor implementation.

The most likely outcome of my posting is that a few people will read it, none of whom have much to do with Wikipedia, and it will have no effect on the Wikipedia controversy. At most, I might be "honored" with a Wikipedia article mentioning my part in the debate, with a link to my blog followed by a link to a snarky ad hominem response. After all, that's pretty much what happened to Robert McHenry.

A vastly less likely outcome is that my posting here will contribute to the collapse of the Wikipedia enterprise. In light of all that I have just written, it may surprise you to learn that this outcome is less desirable to me than having my posting ignored.

The ideal outcome -- which is unlikely, but one can always hope -- is that my criticisms, and the suggestions I am about to make, will be taken to heart by the Wikipedia community, and will lead to a better free online encyclopedia.

My first suggestion is that Wikipedia put an end to anonymous writing and editing. My feeling is that, if Wikipedia refuses to do this on its own now, it will be legally compelled to do so later. The former is preferable. With the advances in electronic signature technology that have been made in the last two decades, there isn't any excuse for not requiring contributors to unambiguously identify themselves and present their credentials. It is not even necessary for the contributors to identify themselves to the public; it is sufficient for Wikipedia to gain enough control over contributions to be able to identify to a court of law (under subpoena) those responsible for libelous articles, and to be able to permanently shut off Wikipedia "contributions" from vandals, ideologues, and posters of thinly veiled advertisements or other vanity articles.

My second suggestion is that Wikipedia end the practice of permitting instant editing of articles. This practice eliminates any real possibility of meaningfully validating contributions. May I suggest a different model, one that is in keeping with the spirit of open source as practiced by the Linux community: ownership of articles.

Under this model, the original author of a Wikipedia article retains "ownership" of the article, not in any copyright sense, but in the sense that he retains control of its content at the principal Wikipedia site. Any edits to the article must have his approval before they are published. Presumably the author is in a position to know whether an edit has improved his article. If he is acting in good faith, he will acknowledge such improvements and permit them to be published.

Of course, one of the problems with the current model is that people are evil and ignorant, and the ignorant are often ignorant of their own ignorance. The author might refuse to acknowledge that an improvement is, in fact, an improvement. Under the ownership model, the solution to this problem is an appeal process. If an author rejects an edit to his article, the editor can appeal the author's decision. One or more recognized experts would then be assigned to evaluate the merits of the appeal. They could choose any course of action from reassigning ownership of the article to the editor to banning the editor from any further postings to Wikipedia.

Who chooses the experts? Why, Jimmy Wales, of course, or those to whom he delegates this authority. After all, what's a constitutional monarch for?
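For the programmers in the audience, the ownership and appeal workflow might be sketched roughly as follows. This is purely a thought experiment in Python, not a description of any software Wikipedia actually runs: every class, method, and status name here (`Article`, `Edit`, `Ruling`, and so on) is my own invention, and the list of possible rulings is just one way to fill in the range between reassigning ownership and banning the editor. For brevity the ban list lives on the article; a real site would presumably keep one site-wide list.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PENDING = "pending"      # waiting for the owner's decision
    PUBLISHED = "published"  # accepted and visible on the main site
    REJECTED = "rejected"    # refused by the owner; may be appealed
    APPEALED = "appealed"    # waiting for the experts' ruling
    DENIED = "denied"        # appeal lost; the matter is closed


class Ruling(Enum):
    # Hypothetical outcomes an expert panel might choose from.
    UPHOLD_OWNER = "uphold_owner"
    PUBLISH_EDIT = "publish_edit"
    REASSIGN_OWNERSHIP = "reassign_ownership"
    BAN_EDITOR = "ban_editor"


@dataclass
class Edit:
    editor: str
    new_text: str
    status: Status = Status.PENDING


@dataclass
class Article:
    owner: str
    text: str
    banned: set = field(default_factory=set)  # assumption: per-article list

    def submit(self, editor, new_text):
        """Propose an edit; nothing is published yet."""
        if editor in self.banned:
            raise PermissionError(f"{editor} is banned from contributing")
        return Edit(editor, new_text)

    def review(self, edit, reviewer, approve):
        """Only the owner decides whether a pending edit is published."""
        if reviewer != self.owner:
            raise PermissionError("only the owner may review edits")
        if edit.status is not Status.PENDING:
            raise ValueError("edit has already been reviewed")
        if approve:
            self.text = edit.new_text
            edit.status = Status.PUBLISHED
        else:
            edit.status = Status.REJECTED

    def appeal(self, edit):
        """The editor contests the owner's rejection."""
        if edit.status is not Status.REJECTED:
            raise ValueError("only rejected edits can be appealed")
        edit.status = Status.APPEALED

    def rule(self, edit, ruling):
        """Apply the experts' ruling on an appealed edit."""
        if edit.status is not Status.APPEALED:
            raise ValueError("no appeal is pending for this edit")
        if ruling in (Ruling.PUBLISH_EDIT, Ruling.REASSIGN_OWNERSHIP):
            self.text = edit.new_text
            edit.status = Status.PUBLISHED
            if ruling is Ruling.REASSIGN_OWNERSHIP:
                self.owner = edit.editor
        else:
            edit.status = Status.DENIED
            if ruling is Ruling.BAN_EDITOR:
                self.banned.add(edit.editor)
```

The design choice worth noticing is that no path leads from `submit` to published text without passing through either the owner or the experts, which is exactly what instant editing lacks.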

Meanwhile, I still find the Wikipedia concept appealing, even if the present implementation is seriously flawed. Maybe I'll wander by later and try to improve that article on degenerate matter ...