35 Thousand recognised books… and counting

Beyond V3s – What’s Next?

We are already part way through creating V4s of books in the Ultrapedia Library. Perhaps we would feel like we were making a bit more progress than we actually are if we had started at the letter ‘Z’ instead of ‘A’ like we did. The January 2008 edition of the Ultrapedia Library has nearly 1400 titles listed under the ‘A’ category.

Before I say more about the specifics of V4s I would first like to thank Bruji for introducing me to the concept of the Silent Update. I do this because I expect the introduction of the forthcoming Ultrapedia V4s to be silent in true Bruji tradition. There is something almost preternatural about using Bookpedia. It feels as if Conor and Nora pick one or two feature requests every week from the forums, and then implement these new features to see if anyone notices.

It is my vain hope that updating the Ultrapedia Library to V4 will be just as silent.

Creating V4s is a manual task that simply involves extracting each book’s Index and Table of Contents. You might well ask why it took us so long to realize that indexing an index was not such a hot idea… If I ever find out the answer, I’ll be sure to let you know…

In the meantime, however, the new and improved V4s will be introduced first on Ultrapedia Search.

Not Only… but Also…

Hand in hand with creating V4s, we have also created V5s of the first 1000 or so books in the Ultrapedia Library. We started making V5s when we started to see a growing number of ‘Mutilated Pages’ or ‘Gross Errors’ creeping into the V3 collections. I made a quick reference to this problem in my blog entry Turning the Tables.

We actually create the ‘V5s’ from ‘V1s’. To create a V5 we delete everything except ‘Plate’ type images from each book, so a typical V5 page consists of two discrete parts: the plate image itself and the textual ‘Legend’, the bit of descriptive text under the picture.

We have decided to use the fantastic JALBUM to display the V5s in a ‘Gallery Format’, but these galleries have not been opened to the public yet. You can keep track of our progress in creating V5s by joining the ‘V5’ project in the forums.

Is that a V8 in your Pocket?

I think now would be a good time to recap on the V-numbers we have used so far – so here goes.

V0 – The book is not suitable for OCR
V1 – The book is a good candidate for OCR
V2 – Only used in-house
V3 – The book has been OCR’d and published on the website
V4 – Presently used in-house only – see above
V5 – Presently used in-house only – see above
V6 – There is no V6 yet
V7 – There is no V7 yet
V8 – Presently used in-house only for ‘Proof-Reading’
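For anyone who wants to keep these straight in a script, the list above can be written as a small lookup table. This is only a sketch: the names `V_NUMBERS` and `describe` are hypothetical and do not come from any Ultrapedia tooling, and the descriptions are copied from the list rather than from a real schema.

```python
# Hypothetical lookup table built from the V-number list above.
# None of these names exist in Ultrapedia itself; they are illustrative only.
V_NUMBERS = {
    0: "The book is not suitable for OCR",
    1: "The book is a good candidate for OCR",
    2: "Only used in-house",
    3: "The book has been OCR'd and published on the website",
    4: "Presently used in-house only",
    5: "Presently used in-house only",
    8: "Presently used in-house only for proof-reading",
}


def describe(v_number):
    """Return the description for a V-number, or a note that it is unassigned."""
    return V_NUMBERS.get(v_number, "There is no V%d yet" % v_number)


# Print the whole range, including the unassigned V6 and V7.
for n in range(9):
    print("V%d: %s" % (n, describe(n)))
```

Leaving V6 and V7 out of the table, instead of storing placeholder entries, means a newly assigned number only needs one added line.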

We have only ever produced a few dozen V8s, none of which are currently available online. We created the V8s initially in response to the looks of bewilderment on the faces of friends, family and colleagues whom we subjected to many a long-winded demonstration of Ultrapedia and its precursors.

The V8 is a classic case of a picture being worth a thousand words. A V8 is made by ‘collating’ the V1 and V3 versions of a book into a single PDF that displays the V1 page alongside the V3 page. In fact it’s such an effective demonstration of the benefits of OCR that I now feel compelled to create a few more V8s to help prove my point.
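The collating step can be pictured as pairing pages by position. The sketch below is an assumption about how that pairing might look, not a description of how the V8s are actually built: the function name `collate` and the page labels are made up, pages are assumed to line up one-to-one, and laying each pair out in a PDF is left out entirely.

```python
from itertools import zip_longest


def collate(v1_pages, v3_pages, blank=None):
    """Pair each V1 page with the V3 page at the same position.

    If one version has more pages than the other, the missing side is
    filled with `blank` so that no page is silently dropped.
    """
    return list(zip_longest(v1_pages, v3_pages, fillvalue=blank))


# Hypothetical page labels standing in for scanned and recognised pages.
spreads = collate(["scan-001", "scan-002"], ["text-001", "text-002"])
for left, right in spreads:
    print(left, "|", right)
```

Each pair would then become one side-by-side spread, with the scan on the left and the recognised text on the right.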

Displaying the unrecognized (V1) and recognized (V3) pages side by side on a suitably large screen, like the Apple 30-inch Cinema Displays we use in-house, actually generated more than a few ‘WOW’s, and at least once a jaw motion changed from a yawn to a drop.

Self-ingratiating comments aside, even now I think I would have been better off just including a couple of V8 screenshots. I will track down a few V8s and post them later on.


Comments on: "Beyond V3s – What’s Next?" (2)

  1. […] Library. Book titles beginning with letter ‘A’ is now complete. See the earlier posting Beyond V3s – What’s Next? for more […]

  2. […] As promised in my December posting the first roll-out of V4s are now available for full text search and retrieval, via our Google Mini search interface. In this first batch we’ve added over 7400 V4s to the library. […]
