Google Books Mutilates the Printed Past

Article in The Chronicle of Higher Education

In its frenzy to digitize the holdings of its partner collections, in this case those of the Stanford University Libraries, Google Books has pursued a "good enough" scanning strategy. The books' pages were hurriedly reproduced: No apparent quality control was employed, either during or after scanning. The result is that 29 percent of the pages in Volume 1 and 38 percent of the pages in Volume 2 are either skewed, blurred, swooshed, folded back, misplaced, or just plain missing. A few images even contain the fingers of the human page-turner.

Comments

I know!

Why couldn't they just wait until the book dissolves into nothing? Wouldn't THAT be a honey of a moral victory!?

Too much h8n on Google.

Too much h8n on Google indeed.

Of all the librarian arguments I hear regularly, "It's not perfect!" is the one that bugs me the most (yes, the author isn't a librarian, he's a medievalist; close enough). A "good enough" scanning strategy is GOOD ENOUGH for damn near everyone most of the time. I don't stop driving my car because I might get in an accident or it might break down. I know it's not perfect; it's good enough to get me where I'm going most of the time.

Good enough

If you could read the text with a little extra effort, I would say that is good enough. Entire pages and sections going missing, I don't think, is good enough.

Is some scanned material better than no scanned material? Sure. But if someone says that you can get rid of a book in your collection because it is available on Google Books you better check over that scanned version.

If you really read the story carefully...

You get to the point where the author identifies himself as (part of) a publisher and talks about "turning copyright on its head"--which has zero to do with the quality of Google's scans (frequently crappy as it is) and much to do with special pleading.

The strong case: Libraries who get rid of print books because "they're on Google" are not only self-destructive, they're socially destructive. Politicians and administrators who urge them to do so are misinformed.

Those are cases that need to be made. And the second para. in Bibliofuture's comment is right on the money. The article, though, deliberately confuses a range of issues (gee, not like that's ever happened before where Google Book Search is involved!).

Amens

I am tentatively giving amens to Walt & bibliofuture on this. The quality and structure of Google Books can leave me wanting at times. For example, I cannot use Google Books to read the Encyclopedia of the Stone Campbell Movement. This was a great work by Doug Foster at Abilene that came out a couple years ago. Except for very limited previews, I have to have the hard copy to learn anything. Leroy Garrett's text "A Lover's Quarrel" is found and has a title page showing that they scanned from Indiana University. When I added that text to the collection of Freed-Hardeman University while interning, I got to handle the monograph that remains today effectively inaccessible online.

Saying that something is good enough poses a problem. What audience is "good enough" sufficient to? For the casual reader, that may be okay if they're just looking for intellectual curiosities online. Beyond that, though, I am not sure Google Books is presenting a product that meets a need on my end.
________________________
Stephen Michael Kellat, Interim Coordinator, LISNews Netcast Network
PGP KeyID: 899C131F

The quality and structure of Google Books can leave me wanting..

Of course it can, but like I said, your car will leave you wanting too, are you going to start walking everywhere?

"What audience is "good enough" sufficient to?"
Almost everyone, most of the time, and that's more than good enough for me. It's far more powerful than just finding intellectual curiosities online.

What a Waste

Of time, energy & money... What is wrong with these people that they just don't care?

>^..^<

As an additional finding tool, GBS has its merits

The name of the service says what it's best for: Google Book Search--a way to find books you wouldn't have found otherwise. Sloppy scanning & OCR make the results less than they might otherwise be, but still worthwhile...as a way to find books, that you then proceed to find in the real world.

It's when things go beyond search that we get into trouble. All kinds of trouble.

Merits

Exactly. And that's a pretty damn huge merit. The glass is half full, and it would be 0% full if this was ever left up to librarians to do it. We'd still be in meetings talking about file formats and naming conventions.

0% full

Numerous libraries have their own scanning projects. If left up to libraries, the glass would not be 0% full. It would not be as full as having Google also doing scanning, but there would be something.

Here is an article about one scanning project the LOC did of 25,000 items.

Not only does the article show that libraries are doing something to fill the glass, it also has this paragraph that relates to this discussion: "To preserve book knowledge and book culture means preserving every word of every sentence in the right sequence of pages in the right edition, within the appropriate historical, scholarly and bibliographical context. You must respect what you scan and treat it as an organic whole, not just raw bits of slapdash data."

Mediocrity is never "good enough."

If you bring a poorly-scanned, or in many cases, incompletely-scanned book to Google's attention, their response is always condescending and obfuscatory. They don't care. They claim to care, they claim they're performing a service for the good of humanity, but the slapdash manner in which they've handled the scanning process, and the response they give when informed of problems, indicates that their only motive is profit.

And no, it's not "good enough." In fact, if you think "good enough" is good enough--regarding books, music, film, art, or indeed anything in regards to the preservation of human intellectual output--then you're a dreadful librarian, and you're part of why our culture is sliding into the cesspit of mediocrity.

"good enough" IS good enough

Because perfection is NOT an option. EVERYTHING is "good enough" and dismissing anything because of details that don't matter to most people is just nitpicking. I may be a dreadful librarian, but it has nothing to do with my views on what's "good enough."

And Google being condescending and obfuscatory has nothing to do with the usefulness of the book thing. Of course they're condescending and obfuscatory, that's how they are.

Missing pages

I would say that missing pages are not "details that don't matter to most people". There was an online book that I was looking at, and the very last page was missing. Only one missing page, but it was the last one. I don't think it is nitpicking to want to read the last page of a book.

Blake, you used cars as an example previously as something we use even though they are not perfect and can break down. When was the last time your car broke down? I bet if it broke down and left you by the side of the road or stuck at the store 4 or 5 times a year, the car would not be "good enough".

ZOMG

Perfection is not an option, but are excellence and conscientiousness too much to ask for?!

I've just found two books

I've just found two books that were scanned sloppily, and thought that it was just because of "boring" subject matter, building model locomotives. The first book was horrible, with the half-turned pages everyone else is describing here.

The second book was just frustrating, 279 pages of wonderfully in-depth theory, and building notes about the NINE plates of plans in the back of the book. Plates which the scanner operator took a nice picture of, folded up and tucked neatly in the pocket at the back of the book!
They scanned the card full of check-out dates, and all of the blank pages inside the front and back cover, but skipped the most important part, rendering the book nearly useless.

Yuck.
