The LexisNexis data collection service has introduced CopyGuard, a program aimed at exposing plagiarists and spotting copyright infringement. According to John Barrie, chief executive of iParadigms, developer of the program with LexisNexis, the program can generate a report that calculates the percentage of material suspected of not being original, all within seconds.
“It should be that these institutions want to deter their problem before it happens,” Mr. Barrie said. He said that CopyGuard would have caught plagiarism by Jayson Blair, a former reporter for The New York Times, and disputed passages in works by historians Stephen Ambrose and Doris Kearns Goodwin. (The NY Times, where this story appeared, cooperated in the development of the software but is not currently a customer, according to a Times spokesman, Toby Usnik.)
False positives
In the future, won’t such programs be biased against people who write on the same topics they did in past lives?
Seriously though, I wonder how accurate these kinds of things are, even given a large enough base of text to sample (no small feat with the current state of copyright laws, although I suppose publishers would want to collaborate on such a thing) and the computing power to do the best calculations with it.
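I have no idea what CopyGuard actually does under the hood, but just to make the "percentage of material suspected of not being original" idea concrete, here is a toy sketch of one crude way such a number could be computed: chop the text into overlapping three-word runs and count how many also turn up in known sources. The function names (`shingles`, `percent_suspect`) and the three-word window are my own assumptions, not anything from the product.

```python
import re


def shingles(text, n=3):
    """Return the set of overlapping n-word runs in text (lowercased)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def percent_suspect(submission, sources, n=3):
    """Percentage of the submission's n-word runs that also occur in any source.

    A hypothetical scoring rule for illustration only.
    """
    submitted = shingles(submission, n)
    if not submitted:
        return 0.0  # too short to say anything
    known = set()
    for source in sources:
        known |= shingles(source, n)
    return 100.0 * len(submitted & known) / len(submitted)


if __name__ == "__main__":
    archive = ["the quick brown fox"]
    essay = "the quick brown fox jumps over the lazy dog"
    print(round(percent_suspect(essay, archive), 1))
```

Even in this toy version you can see where false positives would come from: stock phrases and properly attributed quotations count against you just as much as lifted paragraphs do.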
There was a script out a couple of years ago (The Gender Genie) claiming it could predict an author’s gender using simple statistical analysis (for example, apparently men use “I” more and women use “you” more; no word on the royal “we”). That sounds trivially easy to trick. And even if you’re not trying to beat the machine, there’s got to be a point at which the computer simply doesn’t know, and could therefore end up biased against those who follow the particular styles of other authors.
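To show what I mean by "trivially easy to trick", here is a toy pronoun counter. This is emphatically not the Gender Genie's actual method, which I haven't seen. It is a made-up sketch that uses only the "I" versus "you" cue mentioned above, and the function name and labels are hypothetical.

```python
import re
from collections import Counter

# Hypothetical cue words, taken only from the "I" vs. "you" example above.
# A real tool would presumably use many more features; this is an assumption.
CUES_FIRST = {"i"}
CUES_SECOND = {"you"}


def guess_by_pronouns(text):
    """Label text by whichever cue word is more frequent, or 'unknown' on a tie."""
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    first = sum(counts[w] for w in CUES_FIRST)
    second = sum(counts[w] for w in CUES_SECOND)
    if first == second:
        return "unknown"
    return "more-I" if first > second else "more-you"


if __name__ == "__main__":
    sample = "I think I know what happened."
    print(guess_by_pronouns(sample))
    # Tacking on a few extra cue words is enough to flip the guess.
    print(guess_by_pronouns(sample + " You know, you really do, you see."))
```

Three extra words of padding and the guess flips, and anything with no cue words at all (or an even split) lands in "unknown", which is exactly the point where a careless tool would have to guess anyway.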
Should those writers be blacklisted? It’s like the (currently hypothetical, IIRC) health insurance that won’t cover people with particular genes because they have a higher chance of getting certain diseases. What’s next, automatic grade generation based on reading ease scores? Crazy days.