August 07, 2012

Mining User Data: E-Books & E-Journals

I've been meaning to blog about the recent Wall Street Journal article "Your E-Book Is Reading You" and now there's a companion post to write: "Mendeley Injects Some Pace into Academia with Fast, Big Data" (reporting by GigaOM).

Both talk about mining user data generated from use of a product. Alexandra Alter reported in the June 29, 2012 print edition of the Wall Street Journal (online July 19, 2012) that e-book vendors (specifically Nook and Kindle) have data "revealing not only how many people buy particular books, but how intensely they read them." The data "focuses on groups of readers, not individuals," and lets Amazon identify popular passages of books (by looking at the most underlined sentences in books downloaded to its Kindle devices). This is moderately interesting: the most underlined is a passage from the Hunger Games trilogy, followed by the first sentence of Pride & Prejudice (see for yourself on Project Gutenberg).

E-book vendors are starting to share data with publishers "to help them create books that better hold people's attention," according to Alter's interview with Jim Hilt, Barnes & Noble's vice president of e-books. ACK! Writers may start to use metrics to determine the outcome of their novels, or to shape their nonfiction. As a fiction reader, I would much rather that my authors construct the entire novel from their imagination instead of relying on a reader, or worse, the lowest common denominator of readers, to help guide the novel's conclusion. That's why I read fiction: because I want to inhabit the writer's world. Not the writer's world heavily influenced by my fellow readers' opinions.

Further, as a librarian, I'm very wary of the assertion that the data "focuses on groups of readers, not individuals." That may be true today, but will it be ever thus? Can I opt out of having an e-book reader report back what I am reading? Apparently not. I still read my fiction the old-fashioned way, so no one knows what I read. In fact, since most of my fiction is borrowed from the library, the only one who tracks what I read is me (via Goodreads). Most libraries actively do not keep data on what books patrons read, because we believe so strongly in a reader's right to privacy.  Alter quotes security expert Bruce Schneier, who "worries that readers may steer clear of digital books on sensitive subjects such as health, sexuality and security—including his own works—out of fear that their reading is being tracked."

I'm definitely not a fan of e-book vendors tracking my reading habits on a Nook, Kindle, or any other device.

And yet, I cheer at the prospect of "reference manager and PDF organizer" Mendeley offering me data on journals faculty are reading or not reading. TheNextWeb reports that "Users can gain insight into how academic research is consumed, discussed and annotated with social metrics in granular detail" through Mendeley Institutional Edition ("powered by Swets").  Dutch library subscriptions agent Swets says this would offer "real-time visibility into the usage of your library content," but it is not clear how this data would be shared, or at what level.  For instance, would we see only a list of the most and least popular journals? The most and least popular journal articles? Would we see this by discipline? By university? By university and discipline? The more granular the data goes, of course, the greater the chance for veering into user privacy issues noted above.

  • Then again, if I as a librarian who pays a lot of money for academic journals could see which articles or which journals are most and least popular with journalism faculty, or neuro-marketing researchers, I could make better financial decisions about journal subscriptions.
  • Then again, if I ceased to purchase journals because they were not popular, I might hasten a journal's demise by not making it available ... which veers towards the idea that the way e-books are consumed might influence the way fiction is written.
  • Then again, this seems to offer a viable alternative to the slow-moving and proprietary journal assessment tool offered by ISI's Journal Impact Factor.

I'm definitely conflicted on Mendeley's Institutional Edition, but I look forward to hearing more. I'm not conflicted about e-book vendors keeping statistics on what I read, so I'll continue to use the library for my fiction fix.

CogSci Librarian said...

Here are more details from Victor Henning, co-founder of Mendeley:

Leading universities adopt Mendeley data to accelerate research analytics by 3 years | Mendeley Blog

Anonymous said...

I think the difference is that research is really collaborative and a conversation in a way that fiction is not. Fiction does not spring into the world totally original - it does engage other fictions and it interacts with the reader's own personal bookshelf (and life experiences, etc.) but crowdsourcing a novel is a bit like asking other scholars to vote on what they want the results of an experiment to be. That's just not how it works, but sharing some metrics about what articles (rather than what journals) are being read has real benefits for the scholarly conversation. Just as most of my recommendations for what fiction to read next come from readers with whom I interact (but not from publishers selling books as product).