Gathering knowledge: Esoteric e-book formatting thought problems apropos of something

Last week’s announcement that the IDPF (International Digital Publishing Forum) has opened its ePub maintenance process is tremendously important to the future of books and publishing, regardless of whether you believe books, the artifact made with ink and paper, or publishing, the process of assembling, producing and distributing books for a profit, have bright futures or are destined for the trash heap. Everyone concerned about books and e-books should be paying close attention to the evolution of ePub, because it represents the current best effort at an open standard for the display of text and other information across a variety of e-reader devices.

I’ve spent the past few days studying the existing ePub components to prepare some suggestions for the IDPF. ePub is made up of three components, the Open Publication Structure 2.0, Open Packaging Format 2.0, and Open Container Format 1.0, and is closely tied to metadata and publishing standards initiatives such as the Dublin Core Metadata Element Set 1.1 and the DAISY (Digital Accessible Information System) Consortium standards. The result is a series of postings to follow, which will offer thought problems that explore the nature of thought, reading, authoring, references, citation and conversation.

Making books useful and accessible to all, including the visually and hearing impaired, is a complex technical undertaking. The ePub and related standards efforts are predicated on the existence of texts which must be delivered to readers, which is precisely the problem one would address if distribution were still the key challenge. Unfortunately, distribution is the easy part of publishing today. In the networked world, ideas arrive in bits and pieces instead of whole units between the covers of a book or in an article from the newspaper. Words are quoted or paraphrased and the enterprising reader can explore the sources to discover what credit to give the fragments of knowledge they find assembled by writers, bloggers, news aggregators and in short messages. Therefore, citable information and the ability to assess ideas in relation to events and previously expressed ideas (in short, whether a newly published work adds to or merely repeats previously expressed ideas) are the new hallmarks of value.

In the print era, when moving books, magazines and newspapers around in a timely fashion created value, the reader couldn’t participate, unless they wrote a letter that survived the editor’s inbox and passed through an editorial gauntlet that resulted in the reader’s ideas being inserted into the publishing process at a point controlled by the publisher, the famous “gatekeeper” in scarcity-based markets. The new configuration of the publishing market must treat the reader not as an end point (i.e., in the Open Packaging Format, the “reader” is “A person who reads a publication”), but as another participant in the production process, even if they don’t choose to add any value. There is no barrier to adding to the conversation; the barriers today relate to orienting ideas to one another within the network.

My initial assessment, after spending a lot of time thinking about how to improve the ePub format, is that it is ill-fitted to the nature of information in a network. We get ideas in dribs and drabs, often with tons of noise because a multiplicity of sources has commented on an event or issue. So, we do not look for documents about the topics we encounter, but begin to build a kind of private intellectual dossier on the idea/information by exploring discussions.

For example, when was the last time you ran across a quote you liked, something that intrigued you enough to find out who wrote it and to what it was related? Let’s say someone referred to the “incompatibility of aquacity with the erratic originality of genius” and you thought, “Oh, I like that.” You wrote it down or, as I did on my Kindle, highlighted and saved it from within James Joyce’s Ulysses. I got it from the source in this case, but say you heard it in conversation and searched for it in Google. The result would be 160 hits, shown here, partially obscured by the fact that “aquacity” is not a recognized word, except that many words used by Joyce weren’t words before he got round to them.

The first thought problem, then, is to examine how we acquire information. Does it arrive in documents or in shards, like the fractured light in a kaleidoscope, out of which we begin to make sense through our judgments and assemble what we think of it, as part of a process? Do we get ideas with a syntax that defines their use, as an XML-based document does when it arrives on our computer while we surf the semantic Web, or do we assemble a syntax into which the idea fits? If we default to the received syntax, or commonly held beliefs about the idea, do we really understand the idea?

Consider that many people think “begging a question” means asking one question that leads to another question instead of the correct meaning, that one makes a statement that demands the issue in question be conceded without argument, or arguing from “a proposition which requires proof without proof.” Most people abuse the phrase “begging the question” but believe they understand it.

Back to my quote from Ulysses: Taken alone, this quote could be misconstrued in many ways, as a statement about the relationship of water to genius, for example. It’s actually the reason Leopold Bloom decides, in his own literary way, to rationalize that he can’t tell Stephen Dedalus to take a bath. I don’t have the referential structure in place when I receive the quote out of its original context to judge it and, if I got the quote via a poorly written article about Michael Phelps, the world champion swimmer, which mangled its meaning, I’d be deeply misinformed about Joyce and Phelps and water. Yet, that is how most information comes to us in conversation, when every point of the compass provides access to numerous “published” ideas. For example, if I read quickly the Google results for my search on the quote in question, the third link is to the quote “Posted by Nietzsche’s Wife @ 9:29 AM.” Extrapolating from the Dublin Core Metadata rules, this statement could be parsed into an XML element that appeared to attribute the quote from James Joyce to the wife of Friedrich Nietzsche (Source), who was never married (so, the source doesn’t exist) and died 22 years before Ulysses was published, yet somehow managed to publish the quote in June of 2008 (date), 60+ years after Joyce died.
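
To make the misattribution concrete, here is a minimal sketch in Python of how a naive tool might turn that byline into Dublin Core elements. The function name naive_dublin_core, the regular expression, the choice of dc:description to hold the quote and the "metadata" wrapper element are my own assumptions for illustration; only the Dublin Core 1.1 namespace and the element names dc:creator and dc:date come from the standard itself.

```python
import re
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def naive_dublin_core(quote: str, byline: str, date: str) -> ET.Element:
    """Hypothetical helper: map a blog byline straight onto Dublin Core.

    It trusts the byline completely, which is exactly the mistake.
    """
    match = re.match(r"Posted by (?P<name>.+?) @ (?P<time>.+)", byline)
    record = ET.Element("metadata")  # wrapper element name is an assumption
    creator = ET.SubElement(record, f"{{{DC_NS}}}creator")
    creator.text = match.group("name") if match else "unknown"
    ET.SubElement(record, f"{{{DC_NS}}}description").text = quote
    ET.SubElement(record, f"{{{DC_NS}}}date").text = date
    return record


record = naive_dublin_core(
    "incompatibility of aquacity with the erratic originality of genius",
    "Posted by Nietzsche’s Wife @ 9:29 AM",
    "2008-06",
)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

The resulting record is well-formed and would pass for valid metadata, yet it names a person who never existed as the creator of Joyce’s words. Nothing in the syntax signals the error; only a reader who already has the context can catch it.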

No, we don’t accumulate knowledge by the book. We may read a book, but we digest it in bits and pieces. We usually discover those bits mashed into something new and not necessarily correctly applied, so we rely increasingly on authority to judge the value or meaning of information. In networked non-hierarchical conversation, we attribute authority to friends and proven sources, though they may be wrong. A publisher—who could be a single blogger or HarperCollins—has the opportunity to create documents that are not closed and finished, but open and extensible by recognizing that the initial idea acquired by the reader isn’t the final product. It means, however, acknowledging that authors are in competition with idiots and empty-headed wits, many of whom begin as competitors of the publisher though they are in fact potential customers of the publisher.

That leads to what we expect of information in the era of networked reading (which is not begging the question): A form of citation and connections to sources, but one that does not fill in all the possible gaps without allowing us to review and authorize related information in the process of creating our own understanding, since we could end up being wrong. One might argue that simply by providing a link to a book we provide that opportunity to “trust but verify”, and that can be good marketing, but so can providing links to many potential sources and allowing the reader to fill in their own blanks. This demands that, for all intellectual intents and purposes, we not fill in the voids in our knowledge with prefabricated information but actually acknowledge greater voids in individual knowledge that define what each of us doesn’t know, so that readers can continue to measure their progress toward understanding.

A newly acquired bit of knowledge, such as a quote or the title of a book someone suggests we read, is not a fully formed document that can be defined by the publisher for the reader. It is, instead, a newly identified lack of knowledge which we begin to fill with answers, context and so forth, well before it can be defined in the limited, though powerful, schema of an XML document. Making these decisions may require opening and comparing many documents, not serial reading of documents, because many sources may prove to be of no worth to our evolving understanding. Into the gaps go our judgments about friends’ thoughts on the data/issue, the sources we’ve decided to trust and documents relating to the idea in question, as well as, potentially, excerpts from a book or many books.

Building from the datum up, by facilitating readers’ gathering the wisdom of the crowd and the authority of experts, among other sources of information that may be presented within the gaps in their knowledge, facilitates a more natural and critically valuable literacy, not to mention many opportunities for marketing of knowledge, including samples of books, articles and other sources (how about a lecture by an acknowledged expert on Ulysses?) that could drive revenue. It simply requires beginning from the needs of the reader-as-participant (peer in the network), not trying to deliver finished pabulum to a consumer of content (endpoint in the network).

