Inside Google Book Search’s Diego Puppin, a software engineer at Google, writes about a new addition to the book summary for some of the books at Google Books: “As with the other features on the Book Overview page, the word cloud is meant to offer a new way to explore our catalog. If you are trying to learn about Italian art, a search in our index will find many good books on the Renaissance period. Use the cloud of common terms to tell what each book is about.”
Word frequency is an interesting way to present the content of a particular work and could help people searching for specific information determine whether a book is worth their time. But many of the most useful nuggets of information are found in isolation, such as an odd connection or a quote offered by a writer, so the word count could also mislead readers into thinking that a book must be largely about a topic in order to be useful. Summaries cut both ways.
As a tool for understanding differences between books and their times, however, word frequency is extremely interesting, as Puppin notes. In the 1960s, Mario Alinei did groundbreaking work analyzing the transformation of Italian over decades. There are also interesting tools for surfing word frequency generally, across an entire language or within any text. For example, WriteWords, a British site, is one of many that let visitors paste in text and return a word frequency analysis of it. A long-time favorite of mine is WordCount, a nicely executed navigable presentation of how often words occur in English, based on an analysis of the British National Corpus, a 100-million-word collection of written and spoken examples of English.
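For the curious, the basic idea behind a word frequency analysis is easy to sketch in a few lines of Python. This is only an illustration of the general technique, not how Google, WriteWords, or WordCount actually do it; the function name `word_frequencies` and the simple rule for what counts as a word are my own assumptions, and a real word cloud would likely also filter out common words like "the" and "of".

```python
import re
from collections import Counter


def word_frequencies(text, top_n=10):
    """Return the top_n most common words in text as (word, count) pairs.

    Hypothetical helper for illustration: a "word" here is a run of
    ASCII letters, optionally containing straight apostrophes
    (so "book's" counts as one word). Case is ignored.
    """
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    return Counter(words).most_common(top_n)


sample = "The art of the Renaissance. The art of Italy."
for word, count in word_frequencies(sample, top_n=3):
    print(word, count)
```

Paste a chapter of any book into `sample` and the most frequent terms give a rough sense of what it is about, with the same caveat as above: a topic mentioned only once will never show up.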