Do Open-Source Books Work?

Ben Crowell has written an excellent article on Open Source Books.

How will the internet change book publishing? This article examines a new crop of math and science textbooks that are available for free over the internet, and discusses what they have to tell us about whether the open-source software model can be translated into book publishing.

Do Open-Source Books Work?

by Ben Crowell      
http://www.lightandmatter.com/article/article.html

 

 

This article is copyright 2000 by Benjamin Crowell,
and is open-content licensed under the OPL license, http://opencontent.org.

This article was discussed on the Slashdot forum on 26 Sep 2000.

Ben Franklin[1]
figured out that information wants to be free, so in 1731 he
invented the lending library.
It was no Napster: this eighteenth-century
information superhighway was meant for such serious purposes as
education and fomenting revolution. Franklin wrote,
"These libraries have improved the general
conversation of the Americans, made the common tradesmen and farmers as intelligent
as most gentlemen from other countries, and perhaps have contributed
in some degree to the stand so generally made throughout the colonies in
defense of their privileges." Words mattered. In the golden age of
ink and wood pulp, Uncle Tom's Cabin and Zola's J'Accuse letter[2] were data
that packed a punch.

We take the information revolution seriously, but how serious are we about
serious information? Can we really free our minds if we power on, dial up,
and log in?
You wouldn't think so based on any changes in the U.S. education system.
A young relative of mine brought home
his grade-school science textbook, and one of its main modules
was a detailed discussion of dinosaurs, yet it never mentioned evolution.
Bad textbooks are the rule, not the exception. A recent critical survey of
American history textbooks[3]
is dedicated "to all American history teachers who
teach against their textbooks," but the author might as well have included the
rest of the curriculum. Poor textbooks were probably already inspiring complaints
back when they were scratched on clay tablets with a pointed stick, but I'll
argue below that books are actually getting worse, and that both the problem
and the possible solution have to do with technology and economics.

The Problem is Economics

Many e-businesses have found out that technology can make you broke as easily
as it can make you rich; in publishing, it seems that technology has driven
the profit out of textbooks. Color printing has been getting cheaper, and
full color, though still fantastically expensive to set up for production,
is now considered mandatory for high school and introductory college textbooks.
At the same time, desktop publishing software and the increasing digitization of
printing have made it possible to prepare new editions more rapidly. The confluence
of these technologies has created a vicious circle. Rising production costs drive
up bookstore prices, which makes more students buy used books, which reduces sales.
To kill off the used book market, publishers bring out a new edition every few
years, with just enough changes to make it impractical to use it side by side in the
same classroom with the previous edition. To compensate for the added cost of
tooling up for so many new editions, publishers raise their prices, which starts
the whole cycle over again. After decades of merely keeping pace with inflation, textbook
prices have recently headed through the roof.[5]

In this climate of vanishingly thin margins, the most successful textbook
is little more than a loss leader, and one with more modest sales is a disaster.
Every book has to be a home run. K-12 biology books often don't mention evolution for fear
of losing sales in socially conservative school districts. History books avoid
controversy by propagating
the myth that John Brown was insane, or by failing to mention that the Vietnam
war began as a war of independence in a French colony.[3]
The home-run syndrome's
most consistent effect is to inflate the list of topics, so that no book will
be rejected by anyone for leaving out a specific item. In my field, physics,
it is commonly observed that each edition is worse than the previous one,
as the pressure for more topics squeezes out the room for honest explanations,
resulting in a cookbook of formulas.

Free Books, But No Open-Source Books

If bad books result from higher prices, free books would seem to be the solution.
Textbooks, besides their intrinsic importance as gateways to industrial-strength
information, are a good test bed for evaluating innovations in how books are
written and distributed. The authors of math and science textbooks in particular
are unlikely to be intimidated by technology, but their goals and methods are
more representative of the practical approach of authors in general than are
those of the authors of computer manuals and computer science textbooks,[4]
who may be willing to put up with a great deal of pain to be on the
bleeding edge of information technology. When I set out to write my own free physics textbook, I found that it was
quite hard to get any information on how a free book could be done in practice,
and this article is the result of my attempt at a (completely unscientific)
survey of how free textbooks have actually been done.
Quite a few free math and science textbooks are on the web now,[6],[7],[8],[9],[10]
but interestingly,
none of them seem to have followed the successful, highly publicized, and legally
solid open-source software approach.[11]
In fact, the most highly publicized digital textbooks are based on a model that
is to open source as antimatter is to matter: a dental school[12] has required its
students to buy all their books on a single DVD, which
expires and stops working if the students don't pay a hefty annual fee!

Does the neglect of the open-source book concept outside the computer arena mean that
there is something intrinsically wrong with the idea of an open-source book? Or does
the rest of the world just not "get it" yet? As we'll see, the reality is more
complicated than either extreme point of view.

Among the free books I've studied, the one that comes closest to the collaborative
spirit of the open source movement
is the Biophysics Textbook On-Line (BTOL),[8]
in which each chapter has been written by a different
author. The most important reason why the open-source software movement emphasizes
collaboration-building is that the projects they tackle are often simply too big
for the lone-wolf approach. Likewise, the BTOL was written because it
had become apparent that the field was getting so large that the previously standard
text was never going to be updated. When I wrote one of the authors, Lou DeFelice,
to ask how the BTOL folks had been so successful in their community-building, he replied,
"The BTOL is tied to a Society that already has an established community,
regular meetings, newsletters, etc. We tap into all of this structure.
For example, when a new article is posted we announce it in the Biophysical
Society Newsletter. I would think that other fields might benefit from
endorsement by an established society that already serves the field."

The most surprising result of my survey, however, was that there were no books
that were really open source in the sense in which the term is used in the
open-source movement.
The BTOL is collaborative but closed-source.
Some authors have made their source code available,[7],[10]
but none of the source-available
books are collaborations, and they do not
have licensing agreements of the type developed to make sure free
software stays free.

Do We Need Open Source?

Maybe that sounds like a criticism, but I don't intend it that way. My own
book, although free, isn't even source-available, much less open-source. (This is
mainly because of certain technical and economic issues discussed below.)
But the open-source software model is designed to solve some real problems.
For example, open-software licenses and culture are designed to prevent the
problems that can arise when different people's software has to be put together
in one package, e.g. to make sure that Linux can\’t be stopped dead in its tracks
because some critical part of it turns out to be patented.
The BTOL, on the other hand, might be difficult to publish as a single, bound
book, because the individual authors own the copyrights on their own chapters,
and there is no licensing agreement. An important insight of the inventors of
open source was that copyrighted information with a carefully designed licensing
agreement (a "copyleft") is in some sense more free than either
copyrighted information or uncopyrighted information.[11]

Do authors even want other people to be able to modify what they wrote?
Although software and books are not perfectly analogous, I feel that this particular
concern about applying open-source methods to books is based on a misunderstanding
of what open source is. While open-source software licenses do guarantee anyone the right
to modify the program, they do not guarantee that those modifications will become
standard or widespread. I could, for example, fiddle around with the delicate inner
workings of my own copy of the Linux kernel, most likely breaking it due to my deficient
programming skills. But I simply would not be allowed to tinker with the version everyone
else depends on until I had proven my transcendent programming talents to a very critical
cadre of the world's most fanatical software geeks. Nobody was ever able to force
Linus Torvalds to take his Linux project in a direction he didn't want, because he
owned the copyrights to its vital parts. The open-source approach allows the
project's originator to exert whatever degree of control she/he deems appropriate.
If I want to limit other people's contributions to my book severely, so that they can
only report errata and provide supplements and add-ons, I can do that (although an
approach that strict would probably not inspire very many people to participate).
When it comes to sharing the pen, "if" and "how much" are up to the author, but
a more interesting question is "how?" What legal and cultural framework will work?
Are open-source software methods directly applicable?
The BTOL collaboration, for instance, has an original take on this. Writes Victor
Bloomfield,
"It is important, of course, to maintain the integrity of each author's chapter (closed source). However, the volume editor can choose to include more than one treatment of the same material (semi-open source)."

It's also not hard to imagine creative projects that would be
impossible with a closed-source model.
In my field, for example, the phenomenon of textbook bloat is particularly out of
control when it comes to the number of homework problems at the end of each chapter.
One of the main things that deterred me from shopping my book around to the
traditional-style publishers was knowing that I would be expected to crank out roughly a thousand
more homework problems on top of the few hundred I'd already written.
Writing homework problems is an activity that can be done in parallel by many people,
and a stockpile of problems on the web would be a valuable resource for every teacher
in the field. In fact, quite a few physics teachers already have their own individual
collections on the web. A more general collection would also fit well with the collaborative
approach used in open-source software, since there is no need to maintain a consistent
authorial voice, and the bug-finding philosophy of the open-source software movement
is applicable: homework problems can have bugs, people can usually agree on what
constitutes a bug, bugs are hard to find, and bug-finding can be done in parallel by
many people. (Incidentally, when publishers kill off the used book market by bringing
out gratuitous new editions, one of their standard techniques for creating
incompatibility between editions is to fiddle with the homework problems. Having
a public collection on the web might help to eliminate this particular dysfunctional
behavior.)

Another possible application of the open-source paradigm to textbooks would be the
creation of sets of notes on applications. In physics, for example, ideas about torque
and angular momentum can be applied to martial arts and gymnastics, but I simply don't
have the expertise to write anything interesting on these topics. The availability of
such a set of resources online would help to reduce textbook bloat, and would also allow
students to read about applications that truly interest them. Likewise, scientists
who lament the sparseness of applications in math textbooks could be invited to
contribute applications themselves.

Do Technical Problems Prevent Open-Source Books?

Unfortunately, going open-source isn't as simple as just adopting an open-source license.
As I toyed with the idea of open-sourcing my own book, and
then began to study how other people were doing things, it became clear that
there were some serious technical hurdles. Imagine that Linus Torvalds was
trying to get the Linux collaboration off the ground, but none of the prospective
partners used the same computer language. This is pretty much the situation with
desktop publishing software. QuarkXPress and PageMaker are the most popular packages
for laying out books, at least among professionals, but they are very expensive and
not fully interoperable. Quite a few physicists and mathematicians know LaTeX, but
it's far from being a universal standard, and it does
not allow the kind of control that is necessary for a book with
a complex layout and lots of illustrations. (To be fair, many LaTeX users would consider
this a feature, not a bug, since it results from the philosophy of separating
form from content.) The true lingua francas are word-processor
formats. Victor Bloomfield of the BTOL project writes, "Authors typically send
word-processing (most commonly Word, but others as well) and graphics files.
It is indeed a hassle…" The sheer amount of work involved in getting a book
ready for open-sourcing has also deterred authors like Jim Hefferon and me.

A more subtle problem is that except for LaTeX, none of these formats lend themselves
to communal editing. The open-source software community uses a program called CVS
(Concurrent Versions System) to allow people within a trusted community to change
and edit the files from a large software project, and to resolve conflicts that
occur when two people are simultaneously working on the same file. CVS can be used
for any kind of plain-text, human-editable files, not just computer programs, but
it can't be used with files from any of the
popular word processors or desktop publishing programs, since they're all in binary
formats.
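
The reason plain text matters can be illustrated with a small sketch. It is not CVS's actual merge algorithm; it only assumes, for illustration, that a file is a list of lines and that two contributors' edits conflict when they touch the same lines of a shared base version. The function names and the sample homework problem are hypothetical. A binary word-processor file offers no such line structure to compare.

```python
import difflib


def changed_base_lines(base, edited):
    """Return the indices of base lines that an edit replaces or deletes.

    A pure insertion is recorded at the base index it is inserted before.
    """
    matcher = difflib.SequenceMatcher(a=base, b=edited, autojunk=False)
    touched = set()
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        if i1 == i2:
            # Insertion: no base line is consumed, so mark the insertion point.
            touched.add(i1)
        else:
            touched.update(range(i1, i2))
    return touched


def edits_conflict(base, mine, theirs):
    """True if both edits touch at least one common line of the base text."""
    return bool(changed_base_lines(base, mine) & changed_base_lines(base, theirs))


# Hypothetical homework problem stored as plain text, one sentence per line.
base = [
    "A ball is dropped from a height of 10 m.",
    "Find the time it takes to hit the ground.",
    "Ignore air resistance.",
]
mine = ["A ball is dropped from a height of 20 m."] + base[1:]
theirs = base[:2] + ["Neglect air resistance."]
other = ["A rock is dropped from a height of 10 m."] + base[1:]

print(edits_conflict(base, mine, theirs))  # edits to different lines
print(edits_conflict(base, mine, other))   # both edits rewrite the first line
```

In this simplified picture, edits to different lines can be combined automatically, while two edits to the same line have to be reconciled by a person.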

No Paper, No Problem?

Nearly all the books I surveyed are distributed purely digitally.
Author Warren Siegel[7] says,
"…I'm trying to discourage printing as much as possible…
I see a lot of printing/publishing as more habit than convenience,
with dead trees rotting in people's offices rather than in the forests."
A few authors (e.g. Jim Hefferon[10]) distribute bound, printed books to their own students
and encourage instructors at other schools who use the book to do the same, but
this may have the effect of discouraging adoption of the book, since professors may not
want the hassle.

Students do want printed, bound books, and are willing to pay for them.
I now have my own self-publishing business, but
I originally distributed my book to students through
print-to-order sales at Kinko's. Although Kinko's was expensive,
roughly 90% of my students bought the books
from Kinko's rather than downloading and printing them, which, after all, results
in single-sided, unbound output.
(I explained to them that I didn't get any royalties from Kinko's, so there was no
personal motivation to buy the books rather than downloading them.)
I have never had a student
forgo dead-tree format completely and read the entire book from a computer monitor.

For my own book[6]
I'm now using free digital distribution side by side with
commercial distribution of printed copies by wholesale.
The issue here is that
printing has high startup costs, and running a business is, frankly, a lot less fun
than teaching and writing. The big investment required to self-publish a book is
also in conflict with openness; giving up the monopoly on selling printed copies
would make it even more scary to try to make back my money.

Big booksellers such as Amazon.com and the bricks-and-mortar chains offer various options
that let authors avoid the hassles and risks of setting up their own
cottage industries,
but their systems are
not particularly attractive in my opinion. Amazon, for instance, offers a service
in which they handle the retail ordering side of things while the author simply
sends them wholesale shipments as needed. The problem is money. Amazon says they
pay a "royalty" of 45%, which sounds generous, but is misleading. The author is
responsible for production, so the 45% "royalty" is really an 82% retail markup,
expressed as a percentage of the author's net. Considering how expensive short-run
printing is, it's hard to imagine bringing a textbook to market at a reasonable
price via this service. Other services handle both production and marketing,
but are not able to do illustrated books.
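
The markup arithmetic in this paragraph can be checked with a short sketch. The function name is hypothetical, and which party's share the 45% refers to is treated here as an assumption: a markup of about 82% on the author's net corresponds to the author netting 55% of the retail price and the retailer keeping 45%, whereas an author share of 45% would work out to about 122%.

```python
def markup_on_author_net(author_share: float) -> float:
    """Retailer's cut divided by the author's cut, both as fractions of retail price."""
    if not 0.0 < author_share <= 1.0:
        raise ValueError("author_share must be in (0, 1]")
    retailer_share = 1.0 - author_share
    return retailer_share / author_share


# Compare the two readings of the split (both are illustrative assumptions).
for share in (0.55, 0.45):
    print(f"author keeps {share:.0%} of retail -> "
          f"markup on author's net: {markup_on_author_net(share):.0%}")
```

Either way, the retailer's cut looks much larger when measured against what the author actually receives than when quoted as a share of the cover price.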

What Next?

The solution to the difficulties of paper distribution is probably to limp along
with the variety of approaches we've already been using, and wait for printing
technology to solve the problem. The increasing digitization of the printing process
and the emergence of efficient print-to-order systems are gradually making short-run
print distribution cheaper and easier.

I don't see any general solution on the horizon to the technical problems involved
in true open-source books. However, some of the interesting projects that require
an open source approach might be doable with HTML format, which can be used with
CVS. Although HTML is not printer-friendly enough to be suitable for a complete
book, it might be fine for some of the more limited, modular applications such
as homework sets and application notes.

I would like to form an e-mail discussion list for free book authors, which would
allow people to share their information about the technical issues and could also
serve as a platform for community building. If you're interested in participating in such a
list, please e-mail me at [email protected]. If this leads to the creation
of a viable community, I'd be willing to open-source the HTML version of the
first book in my textbook series, Newtonian Physics, as an experiment,
using all the classic
machinery of the open-source approach (copyleft licensing, CVS, etc.).

References

[1] The Autobiography of Benjamin Franklin.

[2] See this online article about Zola and the Dreyfus affair.

[3] James W. Loewen, Lies My Teacher Told Me, The New Press, New York.

[4] Two good examples are Open Docs Publishing and Bruce Eckel.

[5] For more about the recent steep increases in textbook prices, see this article.

[6] Ben Crowell, Light and Matter.

[7] Warren Siegel, Fields.

[8] Biophysics Textbook On-Line.

[9] Frank Firk, Essential Physics.

[10] Jim Hefferon, Linear Algebra.

[11] The basic idea of open-source software is that the program is
copyrighted by its authors, but it comes with a licensing agreement that preserves
everyone's right to obtain the source code and modify it if they wish. ("Source code"
refers to the instructions as entered by the programmer, as opposed to the binary
form in which proprietary software is supplied, which is unintelligible to humans and
virtually impossible to modify.) The press has sometimes not done a good job of
distinguishing the open-source movement (I made it, now you can have it for free) from
piracy (you made it, now I'll copy it whether you like it or not), although some
open-source idealists are in sympathy with piracy, believing that all forms of
information should be free.
The classic exposition of the open-source software
philosophy is Eric Raymond's essay, The Cathedral and the Bazaar. The Free
Software Foundation argues for maximum freedom of information on moral grounds.
Two well-known licenses for applying the open-source concept to other forms of
expression besides computer code are the OPL and the FDL.

[12] For more information about antibooks in dental schools, see this Slashdot article.

History of Revisions

2000 Sep 27 Added a reference to the FDL license in footnote 11.