Limits of self–organization: Peer production and “laws of quality”
by Paul Duguid
First Monday



Abstract
People often implicitly ascribe the quality of peer–production projects such as Project Gutenberg or Wikipedia to what I call “laws” of quality. These are drawn from Open Source software development and it is not clear how applicable they are outside the realm of software. I look at examples from peer production projects to ask whether faith in these laws does not so much guarantee quality as hide the need for improvement.

Contents

Laws of quality
Gracenote
Project Gutenberg
Wikipedia
Modularity and granularity
Conclusion

 


 

Open Source software has shown that networked communication can build individual contributions into collective, synergistic projects without intervention from formal institutions or dependence on conventional expertise. Not surprisingly, people have sought to repeat this success beyond the coding of software. Jimmy Wales, the founder of Wikipedia, has claimed that “the software that really runs the Internet ... it’s all written by volunteers ... and it’s really good quality stuff,” before going on to argue that peer production, the social system behind Open Source, “could be expanded to other types of area” (Wales, 2005). Wales thus raises the question of what “other types of area” are suitable. In a speech to an Open Source conference, Paul Graham (2005) suggested that business is one viable area, arguing that firms can acquire much more than free code from Open Source. They should learn, he argued, “not about Linux or Firefox, but about the forces that produced them. Ultimately these will affect a lot more than what software you use.” In The Wealth of Networks, Yochai Benkler (2006) looks well beyond the boundaries of the firm to “every domain of information ... from peer production of encyclopedias to news and commentary, to immersive entertainment.” [1] Benkler has a broad view not only of where peer production will work, but also of what it might achieve, concluding that it may lead to “qualitative improvement in the condition of individual freedom.” [2]

Claims so far reaching deserve serious consideration, but their very reach makes them hard to assess. One way to approach assessment, I suggest, is to take up the question of quality introduced by Wales and Benkler and discussed elsewhere in Graham’s essay. The appropriate question to ask may not be “Are there high quality products from peer production?” Answers to that are likely to depend very much on the projects and examples chosen. More useful answers might emerge by asking, “What is it about peer production processes that assures quality?” I argue that two “laws” of quality, also borrowed from Open Source programming, explicitly or implicitly back up quality claims for peer production. It is important to understand the applicability, the strengths, and the limits of these laws.

While I hope this focus on quality will allow discussion to rise above the partiality of particular examples, it nonetheless requires examining specific cases drawn from a selection of projects, a task which I am quite aware is difficult, tendentious, and usually tedious. The task is difficult for several reasons. First, peer production projects constantly change. What is flawed today may be flawless tomorrow. Unless we can guarantee improvement, however, we should not use this ideal of perfectibility as an excuse not to scrutinize the quality of projects as they exist now [3]. Second, the more successful projects are very large. The English Wikipedia claims more than one million entries from some 50,000 contributors. Thus projects that are more easily assessed because comparatively small may also be comparatively weak, and conversely projects that are strong will be difficult to assess [4]. Third, following on from this last point, from its beginning, analysis of the Internet has attended primarily to quantity, not quality. When Lyman and Varian (2003) estimated annual production of information, they concluded that five exabytes or about “37,000 new libraries the size of the Library of Congress” were produced each year. Comparisons between quantities of this sort are not easy, but they are easier than comparisons between the quality of the Internet or domains within it and the Library of Congress. Nonetheless, the Internet, often casually compared to a library, hums with qualitative comparisons [5] — Linux is better than Windows, blogs are better than newspapers, Apple is better than everything, and so forth. Of course, some of these are of the same type as claims that the Red Sox are better than the Yankees or Barca than Liverpool, more grounds for a good argument than a search for a defensible answer, but for more serious questions we need more defensible answers. Benkler shows understandable hesitance when noting, “anecdotally Wikipedia appears to be a reasonable substitute for most commercial encyclopedias.” [6] Such hedging becomes problematic when the quality of Wikipedia is offered not as grounds for discussion, but grounds for claims about peer production, human freedom, or the need for a new intellectual property system [7].

Evaluating quality is also, as the examples just given suggest, tendentious, quickly dissolving into partisanship supported by anecdote or assertion. Wikipedia’s “Replies to our critics” reflects the binary divisiveness of many debates: “Some [people] are nearly instantly hooked, and love the idea; others think the idea is so absurd as not to require any serious consideration.” [8] Summing attitudes into polar opposites may be fine for discussions of the designated hitter or 13–player vs. 15–player rugby. For peer–production projects like Wikipedia, however, it excludes the large middle ground, which is where most ordinary users might be assumed to stand. As with dogmatic religions, where to embrace one you must renounce the other, these sorts of arguments expect people to vote up or down — Google or the library, Britannica or Wikipedia. For most of us, however, judgments of quality require finer tuning: When does an inconclusive answer on Google indicate an ill–formed question, and when a “dark area” of Google or the Internet? Where is Wikipedia or Britannica likely to be strong or weak? Questions of quality, that is, are less about what single source to trust for everything than about when to trust a particular source for the question at hand. When is AltaVista likely to be more helpful than Google? When is it wiser to turn to the Oxford Dictionary of National Biography than to Wikipedia? When is a Penguin paperback a better source than Project Gutenberg? When might Instapundit be more insightful than The New York Times? And, for all these questions, when not? These choices confront the complex complementarity of different resources. Non–partisans, people who can entertain both enthusiasm and skepticism at the same time, face such choices every day.

Benkler’s argument that peer production will lead to a “more critical and self–reflective culture” [9] may thus put the cart before the horse. To fulfill the potential of peer production, we must first become more reflective or self–critical. We need a better understanding of the connection between the means of production and the quality of the outcome, to be aware of the likely strengths and possible weaknesses of different approaches, to consider why a method works when it does, and to become constructively critical of systemic weaknesses when it does not.

In the spirit of constructive critique, I begin by suggesting that many claims about the quality of peer production rely on two notions generalized from software which I call the “laws” of quality. The writ of these laws may perhaps run within the bounds of software production; we should be cautious before assuming they run beyond [10]. With these laws and their limits in mind, and with the quality of peer production in question, I go on to look at three peer–production processes: Gracenote, Project Gutenberg, and Wikipedia [11]. I have chosen emphatically cultural areas of Benkler’s “cultural production system,” aware that these are not areas where peer production is known to be strong. But these are areas where Benkler and others [12] believe peer production works [13].

Many of the problems I highlight can be fixed in minutes. Some already have been. What is significant for this argument is the extent to which particular problems throw light on systemic issues with a process or project. It is one thing to arrest a politician for corruption. It is another to understand what it is about the political system that fosters corruption. With that analogy in mind, we should consider whether faith in the possibility of continuous fixes leaves general problems in place. (Captain Renault, after all, ordered a round–up of “the usual suspects” exactly to ensure the root cause went undetected.) My examples also suffer from a level of minutiae that might at best be described as tedious. That unfortunately is often the level at which the projects I look at are assembled and their quality assessed. Indeed, having discussed these examples, I go on to discuss the levels of both contribution and analysis, arguing that what Benkler calls the “modularity” and the “granularity” of distributed projects affect the capacity of laws of quality to generalize from software to other domains. In closing, I offer some broad suggestions for improving the sorts of project I have been discussing.

 

++++++++++

Laws of quality

Two ideas are often invoked, either directly or indirectly, to defend the quality of peer production. The first is “Linus’s Law” (Raymond, 1998). This holds that “given enough eyeballs, all bugs are shallow.” (The name is a tribute to Linus Torvalds, who initiated the Linux project). The idea that any problem is ultimately trivial comes from software development where, according to this law, the number of people contributing to a project provides a useful indication of its quality. Hence Linus’s Law neatly bridges the gap between the quantitative assessments the Internet facilitates and the qualitative judgments people tend to make.

Raymond (1998) formalizes his aphorism to argue that “given a large enough beta–tester and co–developer base, almost every problem will be characterized quickly and the fix obvious to someone.” This justification introduces some important limits to the law. First, it suggests a qualitative threshold for participation. The eyeballs of software beta–testers and co–developers are pre–selected for competence and often separated one from the other by the complexity of the systems they are working on [14]. Development is not a task for ordinary users. The law is further protected by another aspect of software coding: solutions must compile and run. Hence, while Open Source software has relied heavily on peer production and to a lesser extent on peer review, for quality it relies as heavily, though perhaps less obviously, on the chip and the compiler as ultimate arbiters. These two both identify problems with the code and reject inadequate solutions [15]. In the absence of such a stern gatekeeper, we have to ask what in other forms of peer production enforces Linus’s Law. What might it mean to “compile” a Project Gutenberg submission? How might a Wikipedia entry be said to run? Or to crash? Finally, we should note that Linus’s Law is primarily about debugging. It says little about building [16].
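
To make the question concrete, here is a minimal sketch in Python of what a mechanical gatekeeper for an etext might look like. It is my illustration, not anything in Project Gutenberg’s actual process, and the checks and their names are invented:

# Hypothetical sketch: mechanical checks that reject an etext the way
# a compiler rejects code. Nothing like this is part of Project
# Gutenberg's process; the rules below are illustrative assumptions.

def compile_etext(text: str) -> list[str]:
    """Return a list of 'compile errors' found in an etext submission."""
    errors = []

    # Unbalanced parentheses are the textual analogue of a syntax error.
    depth = 0
    for offset, char in enumerate(text):
        if char == "(":
            depth += 1
        elif char == ")":
            if depth == 0:
                errors.append(f"unmatched ')' at offset {offset}")
            else:
                depth -= 1
    if depth:
        errors.append(f"{depth} unclosed '(' at end of text")

    # A non-ASCII character signals content that will be lost or mangled
    # in transcription (Greek passages, accented names, and the like).
    for offset, char in enumerate(text):
        if ord(char) > 127:
            errors.append(f"non-ASCII character {char!r} at offset {offset}")
            break

    return errors

sample = "A work by Laurence Sterne\n(Ταράσσει τοὺς ἀνθρώπους ..."
for error in compile_etext(sample):
    print("error:", error)

Even a check as crude as the second would have caught the mangled “Bjˆrkman” in the Pan text quoted below.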

The second implied law of quality comes from Paul Graham who claims that “The method of ensuring quality” in peer production is “Darwinian ... People just produce whatever they want; the good stuff spreads, and the bad gets ignored” (Graham, 2005). A surprising inversion of Gresham’s Law (that bad money drives out good), this claim deserves to be known as Graham’s Law. Like all laws, it is an assertion, not an argument. And like all such laws, it needs to be bounded. Gresham limited his law to money; Moore limited his to microprocessors. Does Graham’s Law apply to all peer production? Wikipedia implicitly invokes Graham’s Law when it urges readers to trust articles because they are subject to “potentially constant improvement over a period of months or years, by vast numbers of experts and enthusiasts, possibly updated mere minutes before you read it.” [17] The Wikipedia entry on Project Gutenberg makes a related argument, “A marked improvement in preserving such text can be seen by comparing earlier texts with newer ones.” [18]

Such assertions reflect an optimistic faith that the “truth will conquer.” While this optimism has roots in Milton’s Areopagitica, it is perhaps a particularly American, democratic belief, enshrined in the First Amendment. Such optimism no doubt makes good political principle, but it does not dictate political process. Freedom of speech is not the same as the freedom to replace others’ versions of the truth with your own. The authors of the U.S. Constitution and the Bill of Rights may have believed that open debate leads to political truth, but they did not believe that the Constitution would improve were it changed at the whim of each citizen’s changing view of truth. Consequently, the U.S. Constitution has significant built–in inertia. Committing bug fixes is intentionally a complex process. As this example may suggest, Graham’s implication that continuous tinkering only makes things better is highly suspect. It is hard to see why entropy would be indefinitely suspended by peer production. In areas of “cultural production,” in particular, progress is not necessarily linear, and neither the latest nor the earliest version of a work is always the best. As Miles Davis had occasionally to persuade John Coltrane, a good deal of art is the outcome not of ceaseless work, but of knowing when to stop and recognizing when, as Shakespeare put it (anticipating the notion of the “tipping point”), “a little more than a little is by much too much.”

Together these “laws of quality” and their subterranean presence in many discussions of peer production argue that more people making more changes only make things better — that numbers and time work in favor of quality [19]. In some — even many — cases they may. In some they appear not to. Rather than taking the laws on faith, we need to ask in which cases the laws work, in which they do not, and if they do not, why not. So we need to look at cases where the laws have failed to work and then to ask — in general, systemic rather than individual, particularistic terms — why. Without other constraints at work, these highly idealistic assumptions can be quite misleading, as I shall argue in the examples that follow, and can mask serious problems not merely with individual products but with the general process.

 

++++++++++

Gracenote

Far less well known than it is used, the Gracenote database provides the information when someone puts a music CD into a networked CD player for the first time and the details of the CD (track names, artist, album title, etc.) are displayed in, for example, an iTunes window (see Figure 1 below). Though not usually numbered among peer production projects, several features of Gracenote make it a reasonable candidate. It is widely used (leaving other music players aside, some 50 million iPods had been sold by the first quarter of 2006) and so attracts a lot of eyeballs. The owners of those eyeballs have the right to contribute, to inspect, and to make changes, and many do. Consequently, it is reasonable to argue that both Linus’s Law and Graham’s Law should apply. Further, music players offer some constraints that might be thought of as similar to the demand that code “compile” and “run.” The laws of quality should work well here.

To consider whether they do, take, for example, the 1958 recording of Don Giovanni released by London Records. In March, 2006, Gracenote provided the following data for iTunes users:

 

Figure 1: Gracenote data for Don Giovanni.

 

Although the categories (name, artist, album) could hardly seem more simple, different users had made quite different decisions about how to catalog the entries. The contributions for the first disc were clearly confused. The data for the second and third are both plausible, but both quite different. While it’s not too surprising to find different contributors making different decisions about different discs, it is disconcerting to find so much variability within a set of discs that are expected to work together. (Gracenote presents four different conventions for the nine–disc set of Arte Nova’s 1997 recording of Beethoven's String Quartets [20].) The idea that Act 2 of Don Giovanni should follow and not precede Act 1 is a reasonable requirement for any such system. With the discs cataloged this way, however, the music would not play in order. So, despite minimal conditions for coherence, numerous eyeballs, the potential for constant improvement, and a system wherein bugs should reveal themselves, the system was surprisingly buggy.
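
The failure is easy to reproduce in outline. The sketch below is my own illustration, not Gracenote’s or any player’s actual logic, and the album strings are invented in the spirit of Figure 1; it shows how a player that orders a set by its metadata will shuffle discs cataloged under different conventions:

# Illustration only: invented album strings mimicking the variation in
# Figure 1, fed to the simple sort a player might plausibly use.

tracks = [
    # (album string as one volunteer submitted it, disc, track)
    ("Don Giovanni [Disc 2]", 2, 1),
    ("Mozart: Don Giovanni (Disc 1)", 1, 1),
    ("Don Giovanni - Krips - Disc 3", 3, 1),
]

# Sorting on the album string orders the discs by whichever naming
# convention each contributor happened to choose:
for album, disc, track in sorted(tracks):
    print(f"disc {disc}: {album}")

# Prints disc 3, then disc 2, then disc 1: Act 2 before Act 1.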

The bugs help identify limits to self–organization in such peer production. For example, they show that even quite simple cataloging and ordering is a curiously uncertain task (Bowker and Star, 1999). Stray from music usually found in the Gracenote “Top 10” and apparently well–defined categories like “Name,” “Artist,” and “Album” blur. There are several plausible candidates for the “Name” of a track, while the “Artist” in Don Giovanni might reasonably be the composer, the conductor, the soloists, or the orchestra [21]. There are even different ways to identify the album [22]. Consequently, without strong coordination, quite simple data input distributed among volunteers can quickly become inconsistent. Apparently straightforward tasks defeat self–organization and often require a guiding hand to keep them on track. Benkler (2002, 2006) has pointed to NASA’s “Clickworkers” project as a good example of peer production producing high–quality results from small contributions by numerous independent volunteers. Lessig (2001) has reached back to the production of the Oxford English Dictionary in the nineteenth and early twentieth century for another example. Both were remarkably successful projects. But both, in place of a chip below, had formal organizations above, dividing the tasks, vetting the input, coordinating the results, and driving eyeballs towards neglected nether regions.

Problems with Gracenote also remind us that, though many eyeballs survey a project and many hands update it, work on the system is not necessarily distributed equally. In recognition of this, Wikipedia limits the trust it puts in “improvement” in the quotation above to “widely circulated articles” and you can probably put equal trust in the Gracenote “Top 10.” In both projects, by contrast, the far reaches of the long tail truly suffer from neglect. So, in trying to estimate the qualities of peer production systems, we should not infer the quality of its quieter backwaters from the gross numbers of users. Conversely, assessing the quality of a system overall should involve not judging the entire project as a unit, but distinguishing backwaters from the central aqueducts. Before we put faith in a particular topic, it would seem, we need to know whether it is a hot topic [23]. As I will argue later, that requirement puts an odd burden on ordinary users.
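
A toy calculation, in which every figure is invented for illustration, suggests why gross numbers say so little about the backwaters:

# Toy long-tail arithmetic; all figures are invented for illustration.
total_articles = 1_000_000
daily_views = 100_000_000

# Assume the most-visited 1% of articles absorb 90% of all views:
hot_articles = total_articles // 100
views_per_hot_article = daily_views * 0.90 / hot_articles
views_per_backwater = daily_views * 0.10 / (total_articles - hot_articles)

print(f"daily views per hot article:       {views_per_hot_article:,.0f}")
print(f"daily views per backwater article: {views_per_backwater:,.1f}")
# 9,000 views apiece for the hot articles; roughly 10 for everything else.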

 

++++++++++

Project Gutenberg

Project Gutenberg is one of the earliest and best–known examples of networked peer production. Founded in 1971 by Michael Hart, the Project has relied on volunteers to make “etexts” of out–of–copyright works available online. With a “principle of minimal regulation,” the Project is predominantly self–organizing. If the Project accepts that a submitted text is free of copyright restrictions, volunteers do the rest, preparing the edition for the database, usually by scanning. Since 2000, the Distributed Proofreading Project has provided quality assurance, distributing proofreading tasks among yet more volunteers [24]. By 2006 Project Gutenberg had some 17,000 titles and two million monthly downloads. Though it has been described as a “large, well–organized” and “comprehensive scholarly” initiative (Roberts, 1999), Hart insists that the Project is not only minimally regulated, but also minimally interested in the punctilios of scholarship. “We do not write,” Hart claims, “for the reader who cares whether a certain phrase in Shakespeare has a ‘:’ or a ‘;’ between its clauses. We put our sights on a goal to release etexts that are 99.9% accurate in the eyes of the general reader.” [25]

Yet in certain cases, Project Gutenberg’s process may have the perverse result that its texts are more useful for the scholar, whose practiced eyes can discount the problems, than for the ordinary reader [26]. Take, for example, the case of Laurence Sterne’s Tristram Shandy. The recent film A Cock and Bull Story (2006) has brought unexpected attention to this curious eighteenth–century novel, so a general reader might well turn to Project Gutenberg to find a free but reliable copy. The film, as viewers will know, is a film about making a film because the book, as readers will know, is very much a book about making a book — an exploration and exploitation, as the critic Hugh Kenner (1962) argued, of “typographic culture.” [27] The film makers took seriously the task of conveying in an alternative medium Sterne’s experiments. Actors within the film constantly agonize about the making of movies just as Sterne’s narrator, Tristram, worries about the process of writing, the restrictions of linear narrative, the oddities of books, and the limitations of paper and print. Submitting simple CD details to Gracenote is, as we have seen, more difficult than it looks. Putting such a self–interrogating, eighteenth–century book online, particularly uploading it as ASCII text, would never present itself as anything but an editorial challenge.

Project Gutenberg’s overall approach implies that editorial decisions are somebody else’s concern and that at heart texts are self–evident and well–ordered. From the first lines of Tristram Shandy, however, editorial decisions are inescapable:

The Life and Opinions of Tristram Shandy,
Gentleman.
A work by Laurence Sterne
(two lines in Greek)
[28]

Though the code did not compile — Greek text did not make it into ASCII — the program continued to run because what in this case must be called “the editor” decided to route around the problem by inserting a quick patch — parentheses to indicate a change. That is a reasonable decision, but also a problematic one because the Gutenberg editor later uses parentheses to route around different kinds of problem. Hence we get challenging passages such as:

By inspection into his horoscope, where five
planets were in coition all at once with Scorpio
(Haec mira, satisque horrenda. Planetarum coitio
sub Scorpio Asterismo in nona coeli statione,
quam Arabes religioni deputabant efficit Martinum
Lutherum sacrilegum hereticum, Christianae
religionis hostem acerrimum atque prophanum, ex
horoscopi directione ad Martis coitum,
religiosissimus obiit, ejus Anima scelestissima
ad infernos navigavit — ab Alecto, Tisiphone &
Megara flagellis igneis cruciata perenniter. —
Lucas Gaurieus in Tractatu astrologico de
praeteritis multorum hominum accidentibus per
genituras examinatis.) (in reading this my father
would always shake his head) in the ninth house
...

Here, the closing of one parenthesis (“... per genituras examinatis.)”) and the immediate opening of another (“(in reading this my father...”) are clearly oddities. In a book that is full of typographic oddities, Gutenberg’s general reader might well shrug in puzzlement and pass on. These are not Sterne’s oddities, however. They are the editor’s. What Sterne actually introduced after Scorpio was a footnote. He then continued with the parenthetical comment,

By inspection into his horoscope, where five
planets were in coition all at once with Scorpio1
(in reading this my father would always shake his
head) in the ninth house ....

Unable to deal in ASCII with footnotes any better than Greek, the editor again used parentheses [29]. General readers, if they catch on to the convention (they are given no warning), might wonder whether what we saw on the title page was a footnote (it was not — though later readers confront the odd phrase “(footnote in Greek Philo.)”). As the book proceeds, readers are likely to get yet more confused. For Sterne himself not only uses parentheses, as this passage makes clear, but also footnotes within parentheses [30] and parentheses within footnotes [31].
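
The difficulty can be put mechanically: one delimiter has been overloaded to carry three meanings, and nothing in the text distinguishes them. In the sketch below the bracketed tagging convention is my invention, not Project Gutenberg’s; the point is only that a distinct, documented marker would keep editorial insertions apart from the author’s own text:

import re

# With a single delimiter, Sterne's own parenthesis and the editor's
# placeholder for the missing Greek parse identically:
ambiguous = "(two lines in Greek) (in reading this my father would ...)"
print(re.findall(r"\(([^)]*)\)", ambiguous))
# ['two lines in Greek', 'in reading this my father would ...']

# A distinct marker (invented here for illustration) separates
# editorial insertions from the author's text:
tagged = ("[Transcriber: two lines of Greek omitted] "
          "(in reading this my father would ...)")
print(re.findall(r"\[Transcriber: ([^\]]+)\]", tagged))
# ['two lines of Greek omitted']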

Tristram Shandy has yet more challenges. For example, one of the most famous pages, as a page, in all literature comes after the death of the vicar, Yorick, when Sterne inserts two black pages into the book as if in mourning. Presenting a page, as a page, in ASCII is undoubtedly difficult. Making it black even more so. But first, in the spirit of Linus’s Law, the bug has to be recognized before it can be classified and fixed. Later in the book when Sterne inserts marbled pages, the editor recognizes this and inserts another editorial interpolation between parentheses to mark their absence (“(two marble plates)”). Here, it may be hard for Gutenberg’s general reader again to recognize that this is not one of Sterne’s own parentheses, footnotes, or editorial asides, but rather a Project Gutenberg editorial intervention. In the case of the black page, however, there is nothing. The editor chose to overlook it altogether. In this case, we might assume that an HTML version could deal with the problem. There is an HTML version of Tristram Shandy on Project Gutenberg’s site. It also fails the test. It does insert blanks, but they are white not black (see Figure 2).

 

Figure 2: Tristram Shandy à la Project Gutenberg.

 

The appearance of a white page rather than a black one may seem an isolated problem, but it is a decision with ramifications, for Sterne also inserts blank pages and even blank chapters in his book.

In the end, however, it is not pages but chapters and volumes that present the biggest challenges to a general reader — even though (unlike Greek text and footnotes) such divisions present no inherent problems for ASCII. The novel was originally published as nine volumes with two– to three–dozen chapters numbered sequentially within each volume. The Project Gutenberg version appears to come from a four–volume edition that ignored the original volume divisions and chapter numbering. It bundles into the four new divisions the epigrams, dedications, mottoes, etc., whose relevance depended on their placement in the original nine volumes. The new divisions create unnecessary puzzles. For example, when Sterne writes “if I thought you was able to form the least judgment or probable conjecture to yourself, of what was to come in the next page, — I would tear it out,” he is making a slight joke. This occurs on the last page of the first volume and so theoretically there is no “next page.” A modern reader with a competent text would at least see an editorial insertion along the lines of “end of the first volume in the original edition” and, even with new volume and chapter numbering, get the joke. For the Project Gutenberg reader, the joke, slight as it is, is lost without warning between Chapter 1.XXV and Chapter 1.XXVI. Meanwhile, the simple statement at the beginning of the next book “I have begun a new book” now makes no sense at all, or at best appears to be another inscrutable Sterne joke.

While the Gutenberg version is a victim of the four–volume edition used, it is probably the Gutenberg editor who compounded the problem by deciding to call each of these new divisions a “chapter.” Thus the Project Gutenberg ASCII version opens with Chapter 1.I, where Sterne had Volume 1, Chapter 1. And it ends with Chapter 4.XCII, where Sterne had Volume 9, Chapter 33. As Sterne repeatedly refers to other chapters in the book and has (or plans) chapters on holes, on sleep, on sash windows, and even a “chapter on chapters,” this decision only adds to the mayhem.

In sum, the Gutenberg Shandy will challenge all but the most determined general reader, who unfortunately is likely to blame the difficulties on Sterne. It’s not hard to guess how some difficulties arose. The editor couldn’t use a twentieth–century edition as that would pose copyright problems and, no doubt reluctant to damage a valuable book on a scanner, the editor wouldn’t use an eighteenth–century edition. Instead, a safe, nineteenth–century, out–of–copyright, four–volume version appears to have been chosen. Unfortunately, the “best” edition from these practical viewpoints was one of the worst from the point of view of Gutenberg’s general reader. The critic R.C. Bald noted that the majority of errors in Tristram Shandy “originate in some popular nineteenth–century editions.” Since then, a century of careful work by editors, publishers, bookshops and libraries has pushed many misbegotten, four–volume texts to the margin. Until now, only the intrepid or the unlucky general reader would stumble upon this text in a library [32]. At a stroke, Project Gutenberg has undone this effort, honored Gresham’s Law, made one of the most egregious editions in the 240 years of the book into one of the most readily available, and added flaws of its own. In a discussion of plagiarism, Sterne wrote in Tristram Shandy, “Shall we for ever make new books, as apothecaries make new mixtures, by pouring only out of one vessel into another?” [33] With the Project Gutenberg texts, we seem in danger of doing exactly that, with little thought to the quality of what’s in the first vessel or the challenges of transferring it to the second. The editor deserves sympathy for confronting problems of contemporary IP law, the cost of early editions, and the peculiarities of Sterne’s text. But IP law alone does not drive people to bad editions when there are dozens to choose from.

Some but not all of these problems may be limited to books as odd as Sterne’s. Elsewhere in Project Gutenberg, poor–quality texts, better forgotten by all but scholars, are put before the general reader. Take Pan, a novel by the Nobel laureate Knut Hamsun [34]. Unlike the Shandy edition, this text comes with a note revealing the source:

Translated from the Norwegian of
Knut Hamsun
by W. W. Worster
With an Introduction by Edwin Bjˆrkman
[sic]
New York
Alfred A. Knopf
1927
Published July, 1921

A comparison with the translator’s note at the head of the current Penguin edition suggests problems with using this text:

While the translation history of Hamsun’s Pan (1894) is not as deplorable as that of Hunger (1890), the first version by W.W. Worster (Knopf, 1921), was bowdlerized, all the expressly erotic elements, however innocuous, having been deleted. [35]

So where the Project Gutenberg text has

And when she comes, my heart knows all, and no longer beats like a
heart, but rings as a bell. I lay my hand on her.
 
“Tie my shoe–string,” she says, with flushed cheeks. ...
 
The sun dips down into the sea and rises again, red and refreshed, as if it had been to drink. [36]

the Penguin edition has

And when she comes my heart understands all, and no longer beats, it peals. And she is naked under her dress from head to foot, and I lay my hand upon her.

Tie my shoelace, she says with flaming cheeks. And a little later she whispers directly against my mouth, against my lips, Oh, you’re not tying my shoelace, sweetheart, you’re not tying ... not tying my ...

But the sun dips its disk into the sea and then rises again, red, renewed, as if it had been down to drink. [37]

Most of the cuts in the Worster–Gutenberg editions remove passages far more innocuous than this. So, in an age that flatters itself on its sophistication and with a technology that is admired for its ability to “route around censorship,” we find cutting–edge peer production systems presenting the general reader, to the detriment of the author, with the prudery and paternalism of the past. And readers are given no indication that they are being cheated. St. Clair (2005) argues that the imposition of new copyright laws in the nineteenth century forced back into circulation corrupt and bowdlerized texts. Something similar is happening here, but faith in the laws of quality makes us blind to it. Certainly, copyright laws make developing a commons absurdly difficult. But it doesn’t help to answer one absurdity with another and to suck back from oblivion texts that, for almost every purpose but quite abstruse research, are better forgotten. It is also unwise for critics like Benkler (2006), who are highly sensitive to copyright laws and their effects, to applaud Project Gutenberg unreservedly without more reflection on the limitations on quality presented by IP law and masked too easily by faith in the laws of quality [38].

I do not want the arguments above to suggest that Gracenote is worthless or Project Gutenberg useless. Far from it. Both are immensely useful. Nonetheless, both suffer from problems of quality that are not addressed by what I have called the laws of quality — the general faith that popular sites that are open to improvement iron out problems and continuously improve. In the case of Gracenote, it may be that only users with minority tastes suffer and they should be prepared to look after themselves. In the case of Project Gutenberg, by contrast, the Project does the greatest disservice to those it most seeks to serve, the general reader who may not know enough about the texts he or she is reading to be able to distinguish nonsense from complexity, editorial misjudgment from authorial teasing, bowdlerization from Nordic prudery. In both cases, whether to guide users better or to improve the system, these limitations need to be recognized.

 

++++++++++

Wikipedia

Wikipedia also relies on the laws of quality, assuming, vandalism aside, that numerous eyeballs and changes continuously raise quality. Nevertheless, it faces some problems quite similar to those of Project Gutenberg but here, because it is less a victim of restrictive copyright laws and more a beneficiary of “fair use,” these seem less excusable [39]. Consider, for instance, these extracts from the entry for Laurence Sterne, Tristram Shandy’s author.

Sterne lived in Sutton for twenty years, during which time he kept up an intimacy which had begun at Cambridge with John Hall Stevenson, a witty and accomplished bon vivant, owner of Skelton Hall in the Cleveland district of Yorkshire. Without Stevenson, Sterne may have been a more decorous parish priest, but might never have written Tristram Shandy.

Sterne, who used his wife very ill, was one day talking to Garrick in a fine sentimental manner, in praise of conjugal love and fidelity. “The husband,” said Sterne, “who behaves unkindly to his wife, deserves to have his house burnt over his head.” “If you think so,” said Garrick, “I hope your house is insured.”

Sentences in which people keep up intimacies and talk in “a fine sentimental manner” have an odd ring for a twenty–first century encyclopedia. A search of the Web reveals that the first comes, with incidental changes, from the Encyclopedia Britannica of 1911; the second from The Mirror for Literature of 1828 [40]. Given the disdain some Wikipedians have for the current Britannica, it seems especially strange to be raiding its predecessors without acknowledgement. And for a project that puts great faith in the latest correction, it is bizarre to lift undistinguished text from the 1911 edition, let alone from an obscure periodical almost a century older, as if nothing had changed in between [41].

As another example, take the page for Daniel Defoe as it existed in a fairly stable form in October of 2005 [42]:

Daniel Defoe (1660–April 24, 1731) was an English writer and journalist, who first gained fame for his novel Robinson Crusoe. Defoe is also notable for being arguably the earliest constant practitioner of the novel form.

Born Daniel Foe, the son of James Foe, a butcher in Stoke Newington, London. He later added the aristocratic sounding “De” to his name as a nom de plume. His gravestone gives his name as DANIEL DE–FOE. He became a famous pamphleteer, journalist and novelist at a time of the birth of the novel in the English language, and thus fairly ranks as one of its progenitors.

Most reliable sources hold that the date of Defoe’s birth is uncertain and may have fallen in 1659 or 1661. The day of his death is also uncertain. (This version of the Wikipedia article itself later records that Defoe died on April 21.) His father did not live in Stoke Newington at the time of Defoe’s birth, nor was he a butcher. Defoe probably didn’t add the “De” to his name as a nom de plume. (Not only did he come to use it in all spheres of life, not just writing, but at the time he changed his name most of his writing was unsigned and some was signed “D.F.”) His gravestone provides little evidence as it was erected 150 years after his death.

Several of these slips reveal the almost invisible tripwires that surround anyone trying to write a biographical entry. The remarks about his father’s profession, the nom de plume, and gravestones are reasonable guesses, but also unfortunate ones. Getting such points right is tricky. Most revealing of the trickiness of encyclopedia entries is, perhaps, the tension between the first and last sentence over how and when Defoe became famous. The opening sentence claims it was only with Robinson Crusoe (published in 1719), while the last sentence suggests that it was pamphleteering, which he began in the previous century. The confusion turns on quite what claim is being made. An entry on the Defoe discussion page argues,

I strongly suspect that most English speakers would only recognize Defoe through RC; the Esperanto article mentions two or three translations of Robinson Crusoe, but no other works of Defoe, and a quasi–random sampling of the Library of Congress catalog turned up a number of translations of RC into French, German and Japanese, but no obvious translations of any of Defoe’s other works [43].

Much as Defoe’s gravestone provides evidence for what people called him in the nineteenth century, but not for what he called himself in the seventeenth, so Defoe’s fame today is not evidence for how he “first gained fame.” The point is small and reflects the minutiae that go into a good biographical encyclopedia entry. It also reveals the way that Wikipedia, despite its creed of continuous improvement, can defy Graham’s Law. In its early days, the Defoe entry read

He is most famous for his novel Robinson Crusoe [July 3, 2002] [44].

The claim is perfectly reasonable. It was later changed to read

[Defoe] gained fame for his novel Robinson Crusoe [January 30, 2004] [45].

This comes closer to being misleading and slips quietly over the edge of reasonableness when it is changed to argue that Defoe

first gained fame for his novel Robinson Crusoe [September, 2004]

This version endured until the following October and was vigorously defended [46].

The example, though slight, highlights a few concerns about Wikipedia. First, like cataloging for Gracenote or editing for Project Gutenberg, writing encyclopedia entries is more difficult than it may first appear. Casual additions, particularly if they pay no attention to what is written elsewhere in the article, can make life quite confusing for those who consult an entry. Did Defoe die on the day given at the beginning or at the end of the 2004 entry? Was it pamphleteering or novel writing that made him famous? Is his posthumous fame based on the same works as his early fame? Small changes in one part of the article make unnoticed trouble for other parts. Second, as with Gracenote, the quieter reaches of Wikipedia sustain remarkable problems despite the aggregate number of eyeballs that the site draws. And third, neither Linus’s nor Graham’s Law holds unequivocally. In the example of Defoe’s fame, once again Gresham’s rather than Graham’s Law seems more apt. A simple entry went steadily downhill, the good being lost and the bad enduring, and, despite Linus’s Law, the bad entry stabilized even as the attention (and so the eyeballs) given the site increased from 2004.

The Defoe entry makes a further point about Wikipedia as an encyclopedia. The small facts of an entry like this are, all things being equal, not particularly difficult to correct. The balance, proportion, and trajectory of an article, particularly for a figure like Defoe, who may have produced 500 publications, are very difficult. Take again the claims about Defoe’s fame. Rightly, it is said to rest on his pamphlets and novels, in particular Robinson Crusoe, even if the sequence is unclear. Anyone trying to understand these claims would expect to find evidence in the rest of the article. There is no discussion in the article of his pamphlets or journalism. The discussion of Robinson Crusoe can at best be described as sparse:

Defoe’s famous novel Robinson Crusoe (1719), tells of a man’s shipwreck on a desert island and his subsequent adventures. The author may have based his narrative on the true story of the shipwreck of the Scottish sailor Alexander Selkirk.

Instead, as it currently stands, readers would learn primarily about his role in the Anglo–Scottish Union of 1707. This, however, is referred to neither in the introduction nor the biographical section of the entry. In the spirit of Linus’s Law, people have worked on parts of the entry, but they have done so with little concern for the demands their contributions make on the article as a whole, a point I will return to in the next section [47].

But first a couple of general points about peer production, the nether reaches of large projects, and the peculiarities of encyclopedias. The entries for Defoe and Sterne do appear to fall into a backwater of Wikipedia. Thus it may seem unfair to choose these as examples to illustrate aspects of the whole [48]. I suggested earlier, however, that judging overall quality from the less– rather than the more–frequented parts, the weak rather than the strong links, is not a bad idea. After all, how is the ordinary user to know when he or she has landed in a backwater? [49] With Linus’s Law in mind, we should acknowledge that the eyeballs that consult encyclopedia entries are, in the default case, quite unlike those beta testing or developing code and quite unsuited to recognizing or characterizing any but the most obvious errors. To use an Open Source program is in itself often an acknowledgment of a certain level of skill. To turn to the encyclopedia is, by contrast, more likely a confession of ignorance. If I want or need to find out about Defoe, then I’m not likely to be in a position to critique an entry on him. And even if I were, that would not indicate that I was capable of fixing it (or should be trusted to fix it, even if I thought I were). As Samuel Johnson once argued, you don’t have to be a carpenter to know a good chair.

Before extending the laws of quality from Open Source software to highly democratic projects such as Wikipedia, we need to acknowledge the difficulty Open Source has had developing software for ordinary users. Understandably, developers have found it easiest to develop for people like themselves, and ordinary users are quite unlike developers. The gap is significant and problematic. Encyclopedias have traditionally made a firm separation between contributors and users, to the benefit, most would claim, of the latter. Wikipedia seeks to erase this distinction. It is a bold venture, but it may have the worrying result that, as Open Source’s inclination towards experts has tended to keep ordinary users at bay, so Wikipedia’s inclination towards ordinary users may keep experts at bay, though these are the exact individuals it needs. If this is so, neither of the laws of quality will easily transfer from Open Source software to Wikipedia.

 

++++++++++

Modularity and granularity

As I noted above, it is problematic to try to make general assessments of quality based on individual mistakes within large projects, but it is also necessary. Defenders of both Wikipedia and conventional encyclopedias have a tendency to point to errors in the other while belittling criticism of small problems within their own projects [50]. Yet it seems no more adequate to defend Wikipedia by pointing to holes in the Britannica than it does to invoke implicitly the laws of quality. Wikipedia relies on a different process to get the job done and it is important to examine that process, particularly if we are interested in improving it.

One way to understand the difficulties of all three projects outlined above is to return to the underlying parallel between projects like Wikipedia and Open Source software. As Benkler (2002) points out, Open Source projects are modular — made up of quasi–autonomous pieces on which people can work more or less independently [51]. All being well, the compiler then gathers these together into a coherent program. But, as Benkler also notes, ideally these modules are granular, too. That is, these quasi–independent parts should be small. All three projects outlined here are modular to a degree. Gracenote is made up of tracks for individual albums; Project Gutenberg, of individual books; and Wikipedia, of individual entries. Yet each has problems with granularity. As we saw with Gracenote, choosing the wrong level of granularity creates problems of interoperability. You can catalog each disc as a separate module. If the disc belongs to a set, however, there’s a good chance that granular decisions will prevent the set from running in sequence.

For Project Gutenberg, at one level the entire book is the module. The Distributed Proofreading Project found a way to grind books into finer granules for proofreading. Editorial decisions, however, do not break down so easily; consequently, decisions about how to handle Greek text on the first page or footnotes on the nth have implications that ramify throughout the entire work. Thus for a certain kind of bug, the granule is as large as the text as a whole, yet the error–correction mechanism works with smaller chunks and there is no process in place to reconcile the difference.

Wikipedia seems particularly beset by issues of granularity. With many Open Source projects, contributors can work on small subroutines independently, then slot these together into larger entities. There is, inevitably, a limit to the level of granularity at which this works for software, as there is for encyclopedia entries, but for software, compilers force a reconciliation of many difficulties. With Wikipedia, though granularity has been reduced below the level of the entry as a whole to individual sentences or phrases and even words and numbers, there is no process of reconciliation. With prose pieces, unless they are completely incoherent, inconsistencies don’t prevent contributions from either compiling or running. Hence small changes can easily run away with the coherence of the entry as a whole. With granularity set so low, individual contributors need have no overview of the piece, no awareness of where it begins and ends and how it gets from one to the other, and no sense of or obligation to the overall balance. A view of the 2004 Defoe entry — unbalanced, misleading, and self–contradictory — shows what can happen when something modular but not granular is treated as if it were both. The laws of quality, which presume both modularity and granularity, will not come to the rescue. Indeed, the more eyeballs Linus’s Law brings to microproblems, the more they may undermine the optimism of Graham’s Law.
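
A schematic sketch may make the missing step concrete. In the toy model below, the field names are invented, though the clashing dates echo the 2004 Defoe entry; two contributors each edit their own granule, and nothing checks the granules against one another. The final function is precisely the reconciliation that a compiler supplies for software and a wiki lacks:

# Toy model of granular editing with no reconciliation step. The field
# names are invented; the contradictory dates echo the 2004 Defoe entry.

entry = {
    "intro": "died April 24, 1731",
    "biography": "died April 24, 1731",
}

# Each contributor edits one granule, seeing only that granule:
entry["biography"] = "died April 21, 1731"  # one contributor's "fix"

# The step a wiki lacks and a compiler supplies: checking the whole.
def check_entry(e: dict[str, str]) -> list[str]:
    dates = set(e.values())
    if len(dates) > 1:
        return ["entry contradicts itself: " + " vs. ".join(sorted(dates))]
    return []

for problem in check_entry(entry):
    print(problem)
# entry contradicts itself: died April 21, 1731 vs. died April 24, 1731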

 

++++++++++

Conclusion

Given the bulk of these projects (52 million tracks in the Gracenote database, one million entries on the English Wikipedia site, 17,000 books on Project Gutenberg), sampling for quality is, as I said at the opening, both difficult and tendentious. Clearly, mine is not a scientific survey [52]. Nor was my intention simply to find flaws. Rather, I have used these examples to try, however inadequately, to raise questions about the transferability of Open Source quality assurance to other domains. My underlying argument is that the social processes of Open Source software production may transfer to other fields of peer production, but, with regard to quality, software production remains a special case. As Weber (2004) has argued, Open Source software development itself is not the self–organizing system it is sometimes imagined to be. Not only is it controlled from below by the chip on which code must run, but projects are also organized from above by developers and maintainers whose control and authority is important to the quality of the outcome. Thus, for software, Linus’s Law and Graham’s Law exist with other, significant constraints that do not necessarily obtain elsewhere. If we are to rely on peer production in multiple different spheres of information production, as Benkler (2006) suggests and I hope, we need to look for other ways to assure quality.

Having gone on so long, this is not the place for me to turn to the question of producing quality in any depth, but let me offer some suggestions in outline. First, protagonists of the sorts of peer production projects discussed here should reflect on the extent to which, explicitly or implicitly, they rely on the laws of quality. If they don’t, they should ask themselves what they do rely on.

Second, projects should be mature enough now for participants to admit their limitations. Project Gutenberg and Wikipedia are tremendous achievements. That does not entitle them to a free pass. Both, because free, tend to get some of the condescending praise given a bake sale, where it’s deemed inappropriate to criticize the cakes that didn’t rise.

Third, they should draw closer to their roots in Open Source software. Software projects do not generally let anyone contribute code at random. Many have an open process for bug submission, but most are wisely more cautious about code. Making a distinction between the two (diagnosis and cure) is important because it would suggest that defensive energies might be misplaced. Project Gutenberg has built defenses against violations of copyright. It needs to be sensitive to violations of good sense. Editing is a hard task and needs to attract people prepared to think through the salient issues. Wikipedia is very sensitive to malice. It needs to be as sensitive to ineptitude. Compiling correct and coherent encyclopedia entries is hard work. Allowing anyone to make changes to the text without discussion is unlikely to attract people willing to work hard on an entry. Thus, pace Linus’s Law, more eyeballs may ultimately lead to a downward, not upward, spiral.

Finally, both projects would benefit from more competition. Were Wikipedia, for its part, to force Britannica out from behind its subscription wall, the shock would probably be as profound for each. One would lose its business model; the other, the complacency that comes more from the ease of its links than the uniform quality of its entries. Should Project Gutenberg persuade all the libraries with balkanized scanning projects to build a single, ASCII database from all their work (even were it under the aegis of Project Gutenberg), everyone would gain, because within libraries lie many of the skills that Gutenberg lacks, while Project Gutenberg (like Wikipedia) has an ease of search and linking that is quite unfamiliar to most libraries. All users would gain from the competition because different ideas of what quality is, how it is produced, and how it is maintained would confront one another in the open. There would be blushes all round, but we could begin to get beyond the easy platitudes and recriminations of both fans and foes to organize for quality. End of article

 

About the author

Paul Duguid is adjunct professor in the School of Information at the University of California, Berkeley; professorial research fellow at Queen Mary, University of London, where he was an ESRC–SSRC Visiting Fellow in the spring of 2005; and, a research fellow at the Center for Science, Technology, and Society at Santa Clara University. He is also an honorary fellow of the Institute for Entrepreneurship and Enterprise Development at Lancaster University School of Management.
Web: http://socrates.berkeley.edu/~duguid/
E–mail: duguid [at] sims [dot] berkeley [dot] edu

 

Acknowledgements

My thanks to John Seely Brown, José Afonso Furtado, Joe Hall, and Karen Christensen who patiently read and kindly commented on earlier and even less readable versions of this essay. To Ed Valauskas for patient editorial work. And to students in the 2005 “Quality of Information” class in the School of Information and the “Advanced Legal Research” class in Boalt Hall, U.C. Berkeley, as well as audiences, both sympathetic and unsympathetic, at various other venues where I have discussed these ideas.

 

Notes

1. Benkler, 2006, p. 5.

2. Benkler, 2006, p. 137.

3. An article on NewsForge (Willis, 2006) calls a similar problem in the world of software the “CVS cop–out.”

4. For example, Wiktionary is both small and weak in comparison to Wikipedia.

5. Comparisons of the Net to a library are legion and commonplace. For examples from opposing perspectives see Benkler (2006, p. xii) and Steiner (2001, p. 300). I say “opposing” because Benkler sees the Net as a positive force for cultural production, where Steiner seems to see it negatively.

6. Benkler, 2006, p. 71.

7. Benkler becomes more assertive with time. Later in the book he asserts that “proprietary online encyclopedias are not better than Wikipedia along any observable dimension” (p. 174), though there’s still a bit of a hedge in that “observable.”

8. http://en.wikipedia.org/wiki/Wikipedia:Our_Replies_to_Our_Critics (visited 17 April 2006). Because Wikipedia is constantly changing, unless I note otherwise, all Wikipedia links will be from this date. While it is also problematic to attribute institutional pronouncements to peer production, as pages like this one discuss “our” critics, it seems reasonable to follow this lead.

9. Benkler, 2006, p. 2.

10. Even here there are often unnoticed limits; see, for example, Bezroukov (1999).

11. The last two are cited by, among others, Benkler (2002, 2006); the first, while not a canonical case of peer production, fits most of the requirements.

12. See, for example, Lessig (2001).

13. For a recent example of an alternative where an Open Source structure works well, see Jones and Mitnick (2006).

14. Most projects have quite elaborate committal procedures that further select for competence.

15. Moody (2001) notes that Torvalds prepared himself for a life with Linux by studying the ins and outs of the 386 chip as a teenager.

16. Jørgensen (2001) suggests that in the development phase of Open Source projects, problems do not dissolve so easily. When they do not, projects abandon self–organization for a little central planning.

17. http://en.wikipedia.org/wiki/Wikipedia:Our_Replies_to_Our_Critics. The extreme sensitivity of Wikipedians about vandalism, the conservative attitudes shown in consequence to perturbation in the network, and the desire to “fix” exemplary articles all indicate some caution about Darwinian assumptions and militate implicitly against this law, though explicitly the project seems to rely on it.

18. http://en.wikipedia.org/wiki/Project_Gutenberg.

19. The two laws also bring to mind Smith’s notion of the “invisible hand.” I’m grateful to J.A. Furtado for this observation.

20. For similar problems, see for instance the Gracenote entries for EMI’s 1978 live recording of the English National Opera’s performance of Wagner’s Twilight of the Gods, where an errant comma condemns the first disc to be played last; the CBS 1978 “Masterworks” edition of Puccini’s La Bohème; or RCA’s 1992 “Red Seal” version of Verdi’s Requiem.

21. The Gracenote catalogue has been changed since the screenshots in Figure 1 were made in March, 2006. The uniform version now available for all three discs gives the conductor (Krips) and Don Giovanni (Siepi) twice, under both “Artist” and “Album”. It fails, however, to give the name of the composer (Mozart).

22. While library catalogues can sometimes seem perversely proud of the number of finely separated categories that make up a catalogue record, these are crude in comparison to the capacity of the database the Borders brothers designed, which, Daniel Raff (2000) reveals, allowed up to 10,000 descriptors per book. By either standard, Gracenote is elementary, though nonetheless more demanding than casual inspection might suggest or than self–organization can control.

23. In an intriguing exchange on the “Crooked Timber” blog (http://crookedtimber.org/2006/04/24/wikipedian-utterances-of-the-gawping-soul/), one contributor suggests that Wikipedia may be weakest where controversy is strongest; another replies that it is likely to be weakest where controversy is weakest. The latter is probably true, but how is the innocent user to know?

24. See http://www.gutenberg.org/about/; http://www.pgdp.net/c/default.php.

25. See http://www.gutenberg.org/about/history.php. It is clearly odd that Project Gutenberg claims it “write[s]” books. The claim may reflect unwillingness to acknowledge that volunteers edit texts, excising front matter and other copy judged irrelevant to the reader’s experience and, where necessary, making intricate textual decisions, as I explain in what follows.

26. This perverse outcome was described by Robert Merton as the “Matthew Effect,” whereby new resources benefit primarily those who already have most advantages. I’m grateful to J.A. Furtado for this observation.

27. Kenner, 1962, p. 48.

28. Tristram Shandy is EBook #1079 on Project Gutenberg at http://www.gutenberg.org/etext/1079. It was first posted on 25 October 1997 and last updated on 23 October 2003. There is no indication of the changes made in between. The Greek epigraph on the title page of print editions, which says “it is not things themselves, but people’s opinions about things that upset people,” is perhaps well suited to this article (which recapitulates an earlier discussion of Project Gutenberg [Duguid, 2004]).

29. When Thackeray’s (1964/1853) History of Henry Esmond presents the double dilemma of a footnote that is all Greek text (Book II, Chapter 3), the Project Gutenberg edition (http://www.gutenberg.org/etext/2511) solves the problem by omitting the footnote completely. Newman (2006) points out that footnotes are very much a feature of print culture and make problems for recordings as well as ASCII texts.

30.

‘I wish, Dr. Slop,’ quoth my uncle Toby (repeating
his wish for Dr. Slop a second time, and with a degree of
more zeal and earnestness in his manner of wishing, than he
had wished at first (Vide.))

Here the “(Vide.)” is the editor’s attempt to indicate a footnote. Unfortunately, it is an incomplete one. Paper editions follow it with a page number. As ASCII streams have no pagination, “Vide” is left blind.

31.

Had my mother, Madam, been a Papist, that
consequence did not follow. (The Romish Rituals direct the
baptizing of the child, in cases of danger, before it is
born; — but upon this proviso, That some part or other of
the child’s body be seen by the baptizer: — But the Doctors
of the Sorbonne, by a deliberation held amongst them, April
10, 1733, — have enlarged the powers of the midwives, by
determining, That though no part of the child’s body should
appear, — that baptism shall, nevertheless, be administered
to it by injection, — par le moyen d’une petite canulle,—
Anglice a squirt. — ’Tis very strange that St. Thomas
Aquinas, who had so good a mechanical head, both for tying
and untying the knots of school–divinity,— should, after so
much pains bestowed upon this,— give up the point at last,
as a second La chose impossible,— ‘Infantes in maternis
uteris existentes (quoth St. Thomas!) baptizari possunt
nullo modo.’ — O Thomas! Thomas! If the reader has the
curiosity to see the question upon baptism by injection, as
presented to the Doctors of the Sorbonne, with their
consultation thereupon, it is as follows.)

Here all but the first sentence is a footnote in Sterne’s text.

32. In discussing the Gutenberg Shandy with librarians, I have occasionally been told that such an edition should not be “blamed” on Project Gutenberg, since general readers might equally stumble upon it on library shelves. It is therefore worth noting that I have not managed to identify this edition in my searches of libraries or of catalogues worldwide.

33. This is another of Sterne’s jokes as he plagiarized the line from Thomas Browne.

34. http://www.gutenberg.org/etext/7214. This has its own minor typographic foibles. “Æsop,” for example, becomes “_ sop” presumably because a scanner could not deal with the ligature in the original and no one noticed.

35. Hamsun, 1998, p. xxvii.

36. All the ellipses in these quotations are in the original.

37. Hamsun, 1998, p. 20.

38. J. Bradford DeLong championed Project Gutenberg in Wired (DeLong, 2003) before transferring his enthusiasm to Google Print in his blog (http://delong.typepad.com/sdj/2005/04/wheres_my_acces.html). His confidence may be misplaced. If you search Google Books for Pan or for Tristram Shandy, you may find near the top editions of each published by Kessinger Press, which claims to “publish and digitally preserve rare books.” Its editions are quite expensive. General readers might think they would get something more for their money. In fact, the Kessinger editions are merely a dump of the Project Gutenberg texts with all their flaws and thus only add to the velocity with which better forgotten and wisely buried texts are disinterred and circulated.

39. I prefer to avoid recent Wikipedia scandals such as the Seigenthaler obituary, self–promoting U.S. Congressional entries, Nature’s comparison, or the discontent with Wales’s decisions about in– and exclusion, though all inevitably reflect to a degree on quality.

40. See http://www.gutenberg.org/files/11362/11362.txt. In judging the adequacy of the content, let alone the style, of this example, we should note that Hall–Stevenson’s name is conventionally hyphenated, his home was more usually called Skelton Castle, and today the “Cleveland district” of Yorkshire would be described as the Redcar and Cleveland district of North Yorkshire. While overplaying Hall–Stevenson’s contribution to Tristram Shandy, the Wikipedia entry fails to mention Eugenius, the character Sterne modeled on Hall–Stevenson.

41. Reflecting its reliance on the 1911 Britannica, only one of the eleven books cited in the bibliography of the Sterne entry in Wikipedia was published after 1912. Such borrowings violate a principle of Open Source policy in software: borrowings should carry with them the identification of their author or origin. Linking, a critical feature of Wikipedia, is also a source of the trust it engenders (Berners–Lee, 2005). Here it is passed over in silence, while the reader is left to assume that the censorious judgments of a bygone age have been made by current Wikipedia users and are still valid.

42. I went to the Daniel Defoe entry as part of an exercise for a class I was teaching at U.C. Berkeley in fall 2005. As I made changes to that page, I shall refer to the state of the Defoe page at that time (http://en.wikipedia.org/w/index.php?title=Daniel_Defoe&oldid=25308696) unless I note otherwise.

43. http://en.wikipedia.org/wiki/Talk:Daniel_Defoe.

44. http://en.wikipedia.org/w/index.php?title=Daniel_Defoe&oldid=246810.

45. http://en.wikipedia.org/w/index.php?title=Daniel_Defoe&oldid=2281681.

46. I tried to correct this and various other statements. These changes met with resistance: “1660” was reinstated, as was the evidence of the nineteenth–century gravestone; the “first gained fame” was defended; and the comment that Defoe was a spy was rejected, although it was discussed later in the article. The history page and the discussion page document some of this.

47. While I do consider the balance of particular entries, I have set aside for this article the balance of encyclopedias as a whole. A little thought reveals encyclopedias, like dictionaries and the modern library, to be odd products of both the Enlightenment and print culture. Print encyclopedias are, for example, both necessarily and deliberately bounded. Consequently, both inclusion and exclusion, as well as relative length, are informative. Recent fights involving Jimmy Wales over who should or should not be included reflect problems of inclusion; see http://yro.slashdot.org/article.pl?sid=06/04/16/1656208 and Orlowski (2006). Meanwhile, users are entitled to wonder what significance should be read into the relative lengths of the articles on Seinfeld (c. 11,000 words), Shakespeare (c. 5,000 words), Barbie (c. 4,500 words), and Defoe (c. 2,300 words). An open–ended project like Wikipedia, which too easily fails to achieve balance within articles, is quite unable to achieve such balance between articles and the encyclopedia as a whole. Similarly, while Project Gutenberg is often compared to a library, its open–ended character makes it quite unlike most libraries, which are usually careful selections — rather than mere collections — of books.

48. The sorts of problems found in the Defoe article are not limited to backwaters. Wikipedia has, for example, some “featured” articles that would seem to draw all benefits possible from the laws of quality. As the site itself says of them, such an article “exemplifies our very best work ... is well written, comprehensive, factually accurate, neutral, and stable.” The article on James Joyce is one such. As of May 2006, the opening paragraphs contain the following:

In 1891, James wrote a poem, “Et Tu Healy,” on the death of Charles Stewart Parnell. His father had it printed and even sent a copy to the Vatican Library. In November of that same year, John Joyce was entered in Stubbs Gazette (an official register of bankruptcies) and suspended from work. ...

James Joyce was initially educated at Clongowes Wood College, a boarding school in County Kildare, which he entered in 1888 but had to leave in 1892 when his father could no longer pay the fees. ....

He enrolled at the recently established University College Dublin in 1898. He studied modern languages, specifically English, French, and Italian. He also became active in theatrical and literary circles in the city. His review of Ibsen’s New Drama was published in 1900 and resulted in a letter of thanks from the Norwegian ...

After graduating from UCD in 1903, Joyce left for Paris; ostensibly to study medicine, but in reality he squandered money his family could ill afford. He returned to Ireland after a few months, when his mother was diagnosed with cancer.

I chose this article only because I had a copy of Ellmann’s (1959) James Joyce on my shelves. According to Ellmann, the poem is usually written “Et tu, Healy.” John Joyce reported that he sent it to the Pope — a much more reasonable destination for a proud father’s letter. John was gazetted in November of 1892. James, however, left Clongowes in 1891, withdrawn because of illness, though his father’s poverty did prevent him returning. The celebrated review was not of Ibsen’s New Drama, but called “Ibsen’s New Drama.” (It was of Ibsen’s play, When We Dead Awaken.) Joyce graduated from University College, Dublin, and left for Paris in 1902. He also returned in 1902 and not because his mother was ill. See Ellmann (1959), pp. 33n5, 34, 76, 109, 113, 120. On the highly populated areas of Wikipedia, see also http://www.roughtype.com/archives/2005/10/the_amorality_o.php.

49. The demand that ordinary users scrutinize and compare change logs seems unreasonable.

50. See, for example, McHenry (2004) or Giles (2005) and the responses these attacks on Wikipedia and Britannica elicited.

51. The question of the ideal granularity and the autonomy of modules in Open Source development is complex. Since its inception, Linux has given rise to periodic, fierce debate about the strengths and limits of microkernels and their internal modularity. Discussion has recently flared again; see Tanenbaum, et al. (2006).

52. Nor was it malicious. Of the examples shown here, only Tristram Shandy was the result of a deliberate hunch. Anyone who knew the book would know it would be difficult to transform into ASCII. The rest turned up serendipitously.

 

References

Yochai Benkler, 2006. The wealth of networks: How social production transforms markets and freedom. New Haven, Conn.: Yale University Press.

Yochai Benkler, 2002. “Coase’s Penguin, or Linux and the Nature of the Firm,” Yale Law Journal, volume 112, and at http://www.benkler.org/CoasesPenguin.html.

Tim Berners–Lee, 2005. “Berners–Lee on the Read/Write Web,” BBC News (9 August), at http://news.bbc.co.uk/2/hi/technology/4132752.stm.

Nikolai Bezroukov, 1999. “Open Source Software Development as a Special Type of Academic Research (Critique of Vulgar Raymondism),” First Monday volume 4, number 10 (October), at http://www.firstmonday.org/issues/issue4_10/bezroukov/. http://dx.doi.org/10.5210/fm.v4i10.696

Geoffrey C. Bowker and Susan Leigh Star, 1999. Sorting Things Out: Classification and Its Consequences. Cambridge, Mass.: MIT Press.

J. Bradford DeLong, 2003. “Any Text. Anytime. Anywhere. (Any Volunteers?) The Mechanics of a Universal Library are Simple. The Tricky Part: Harnessing the Free Labor,” Wired, volume 11, number 2 (February), at http://www.wired.com/wired/archive/11.02/view.html?pg=5.

Paul Duguid, 2004. “PG Tips,” Times Literary Supplement (11 June), p. 13.

Richard Ellmann, 1959. James Joyce. New York: Oxford University Press.

Jim Giles, 2005. “Internet Encyclopaedias Go Head to Head,” Nature, volume 438 (15 December), pp. 900–901, and at http://www.nature.com/nature/journal/v438/n7070/full/438900a.html.

Paul Graham, 2005. “What Business Can Learn from Open Source,” Essay derived from a talk given at OSCON (O’Reilly Open Source Convention) 2005 (1–5 August, Portland, Ore.), at http://www.paulgraham.com/opensource.html.

Knut Hamsun, 1998. Pan: From the Papers of Lieutenant Thomas Glahn. Translated with introduction by Sverre Lyngstad. New York: Penguin Books.

Calvert Jones and Sarai Mitnick, 2006. “Open Source Disaster Recovery: Case Studies of Networked Collaboration,” First Monday, volume 11, number 5 (May), at http://www.firstmonday.org/issues/issue11_5/jones/. http://dx.doi.org/10.5210/fm.v11i5.1325

Niels Jørgensen, 2001. “Putting It All in the Trunk: Incremental Software Development in the FreeBSD Open Source Project,” Information Systems Journal, volume 11, number 4, pp. 321–336. http://dx.doi.org/10.1046/j.1365-2575.2001.00113.x

Hugh Kenner, 1962. Flaubert, Joyce, and Beckett: The Stoic Comedians. Boston: Beacon Press.

Lawrence Lessig, 2001. The Future of Ideas: The Fate of the Commons in a Connected World. New York: Random House.

Peter Lyman and Hal R. Varian, 2003. “How Much Information,” at http://www.sims.berkeley.edu/how-much-info.

Robert McHenry, 2004. “The Faith–Based Encyclopedia,” TCS Daily (15 November), at http://www.tcsdaily.com/article.aspx?id=111504A.

Glynn Moody, 2001. Rebel Code: The Inside Story of Linux and the Open Source Revolution. New York: Perseus.

Andrew Adam Newman, 2006. “How Should a Book Sound? And What about Footnotes?” New York Times (20 January), Section E, column 3, p. 33.

Andrew Orlowski, 2006. “A Thirst for Knowledge,” Guardian (13 April), at http://technology.guardian.co.uk/weekly/story/0,,1752257,00.html.

Daniel Raff, 2000. “Superstores and the Evolution of Firm Capabilities in American Bookselling,” Strategic Management Journal, volume 21, pp. 1043–1059, and at http://www-management.wharton.upenn.edu/raff/documents/SMJ_Superstores_article.pdf. http://dx.doi.org/10.1002/1097-0266(200010/11)21:10/11<1043::AID-SMJ137>3.0.CO;2-7

Eric S. Raymond, 1998. “The Cathedral and the Bazaar,” First Monday, volume 3, number 3 (March), at http://www.firstmonday.org/issues/issue3_3/raymond/.

Peter Roberts, 1999. “Scholarly Publishing, Peer Review and the Internet,” First Monday, volume 4, number 4 (April), at http://www.firstmonday.org/issues/issue4_4/proberts/. http://dx.doi.org/10.5210/fm.v4i4.661

William St. Clair, 2004. The Reading Nation in the Romantic Period. Cambridge: Cambridge University Press.

George Steiner, 2001. Grammars of Creation: Originating in the Gifford Lectures for 1990. New Haven, Conn.: Yale University Press.

Laurence Sterne, 1967. The Life and Opinions of Tristram Shandy, Gentleman. Edited by Graham Petrie, with an introduction by Christopher Ricks. Harmondsworth, Middlesex: Penguin Books.

William Makepeace Thackeray, 1964. The History of Henry Esmond, Esq.: A Colonel in the Service of Her Majesty Queen Anne, Written by Himself. With an afterword by Walter Allen. New York: New American Library.

Andrew S. Tanenbaum, Jorrit N. Herder, and Herbert Bos, 2006. “Can We Make Operating Systems Reliable and Secure?” Computer, volume 39, number 5 (May), pp. 44–51, and at http://www.computer.org/portal/site/computer/menuitem.5d61c1d591162e4b0ef1bd108bcd45f3/index.jsp?&pName=computer_level1_article&TheCat=1005&path=computer/homepage/0506&file=cover1.xml&xsl=article.xsl&. http://dx.doi.org/10.1109/MC.2006.156

Jimmy Wales, 2005. “Interview with Brian Lamb, host, Q&A,” C–SPAN (25 September), transcript at http://qanda.org/Transcript/?ProgramID=1042.

Steven Weber, 2004. The Success of Open Source. Cambridge, Mass.: Harvard University Press.

Nathan Willis, 2006. “The CVS cop–out and the stranded user,” Newsforge (20 May), at http://programming.newsforge.com/programming/06/05/09/2011212.shtml?tid=25.


Editorial history

Paper received 21 May 2006; revised 11 September 2006; accepted 20 September 2006.



Copyright ©2006, First Monday.

Copyright ©2006, Paul Duguid.

Limits of self–organization: Peer production and “laws of quality” by Paul Duguid
First Monday, volume 11, number 10 (October 2006),
URL: http://firstmonday.org/issues/issue11_10/duguid/index.html




