E-Book Enlightenment

genCollectionInterface (gCI)

Introduction:

genCollectionInterface (gCI) is a set of templates and HTML generation tools written in Python which produce a web browser-based interface to a book collection. These tools were created during the summer and fall of 2009 as part of the Rural Design Collective's Summer Mentoring Program [1]. The goal of the project was to enable and/or enhance access to the Children's Book Collection of the Internet Archive [2] on the OLPC/XO laptop platform. The chosen solution was based on consideration of the state of the OLPC platform and XO hardware, as well as usability and "fun factor" for the end users (presumably children aged 5-15).

Interface design:

The collection interface is designed to be very accessible to children at low reading levels while also providing a feature-rich capability for readers. Books are organized and presented in topical "categories" so that it is easy to find material on a topic of interest. The tools support any number of categories, and the initial page of the interface displays them (see Fig.1):

FIG.1 - The Top-Level Page
categories.png

NOTE: the initial category list page is NOT generated by the tools; it is hand-edited to reflect what the tool generates. However, the tool DOES output a category list that references the generated pages and is a good starting point for creating the category page. The icons displayed with each category are part of the template design and must be developed for each new category added. The RDC design team spent considerable effort crafting the icons to be intuitive as well as to emulate the "look and feel" of the XO/Sugar OS interface so that it would be familiar and friendly to the users. The sidebar widget is also hand-coded to reflect the tool output.

When a category is browsed, the titles are presented as icons derived from a scan of the book cover or title page (see Fig.2).

FIG.2 - A Category Page
no_frame.png

This presentation reflects the way children are attracted to books by colorful covers and illustrations. The titles are displayed below the icons, and the author, date, description and other metadata are displayed as a "tool tip" when hovering over the icon. Clicking the icon opens the book for reading, by various means depending on whether the laptop is being used in an Internet-connected or stand-alone environment (see "Interface adaptability" below). While browsing a category, the user can select another category using a navigation widget at the upper right of the page.

Interface adaptability:

The interface is adaptable depending on whether the laptop accessing the collection is connected to the Internet. In the case of a connected laptop, the user may read a book by either a) using an embedded book reader which displays the book by downloading one JPG image of a page at a time (see Fig.3), or b) downloading a DJVU copy of the book in its entirety, which is then stored on the system for reading at any later time, including when the laptop is not connected. With option b) the Read activity displays the book.

FIG.3 - The Embeddable Bookreader
book_frame.png

When an Internet connection is not available, an alternative interface is provided which allows the books to be stored on an attached device such as a memory drive. In this case the interface allows the books to be copied from the attached device and stored on the system as DJVU files, which are read with the Read activity. Additional tools are supplied to download the book and cover files for the collection, after which they are copied to the attached storage device. See "Generating the attached storage interface" below for details on how to create the attached storage solution.

NOTE: DJVU file support on the XO platform and in the Read activity is not fully tested, and problems were observed with versions of Sugar earlier than 0.84. The attached storage solution was lightly tested using Sugar 0.84 running in a virtual machine in Sun VirtualBox and worked correctly.

The adaptable interface ensures that the collection is fully accessible on any platform with a web browser. Furthermore, if a DJVU reader such as Evince is installed, the books can be read offline on any platform as well. Finally, although the current version requires some minor code modification to support it, any book format may be easily supported for download by changing filenames and links either at generation time or afterward.

The Public Domain Icon Set:

The icons for our collection were designed specifically for the Sugar user interface and dedicated to the public domain. The goal of the design was to develop a pictograph for each category that could be understood without the aid of text, much like the icons in the Sugar OS. This eliminates the need for translation of the user interface, and ideally provides a universal visual language. The first set of icons is in beta and will be refined based on user feedback; users can also submit revisions as a patch. You can download the icons here [link].

JavaScript Libraries:

The user interface uses a suite of JavaScript libraries, each with its own documentation:

  • prototype.js - Core JavaScript framework.
  • prototip.js - Prototip2 tooltip generator (requires Prototype).
  • scriptaculous.js - Library of visual and Ajax effects (requires Prototype).
  • effects.js - An effects module in the script.aculo.us Library.
  • side-bar.js - Sliding tab for category sidebar (requires script.aculo.us).
  • reset.js - Resets icons when category page is loaded.

gCI Design and Architecture:

gCI is designed as a single Python program which uses a category description file, a CSV file obtained from a search on an Internet Archive collection, and an ID mapping file, along with several templates, to generate an HTML and JavaScript interface file for each category. These may be accessed from a server over the Internet or locally, depending on connectivity and/or attached storage availability. The templates and data files are described below.

Template Design

The template consists of several HTML fragment files and included JavaScript libraries. There are three "permanent" parts to the template: headtmpl.html.tmpl, divtmpl.html.tmpl, and foottmpl.html.tmpl. These files must be present in the directory where the tool is run.

Header Template File

The header template file consists of the HTML code for the final output, including the collapsible panel containing the book icons. The necessary JavaScript libraries and CSS files are all specified in this section. When the interface is installed on a server the libraries and CSS files must all be in the correct location. If the interface is installed on an attached storage device, the libraries are installed on the same relative path structure. When the interface is generated, the $header keyword in the template is replaced by the proper category name.

Div Template File

The divtmpl.html.tmpl file consists of a single div which will contain the icon, the link to the book or bookreader, the title, and the JavaScript code to invoke the tooltip when the icon is hovered over. For each book in the category the keywords in the template are replaced with the appropriate data which is taken from the input CSV file. These are as follows:

  • $idstr: the book id, used to complete the link to the book
  • $iaclid: the IACL id, used as the DOM id for the link and to attach the tooltip
  • $coverstr: the cover file name, used to display the image of the cover
  • $title: the book title, displayed below the icon
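
The keyword replacement described above can be sketched with Python's standard string.Template class, which uses the same $keyword syntax. This is only an illustration: the HTML fragment, the "books/" link prefix, and the render_book_div helper are made up for the example, and the tool itself may perform the substitution differently.

```python
from html import escape
from string import Template

# Hypothetical stand-in for divtmpl.html.tmpl; the real fragment ships with the tool.
DIV_TEMPLATE = Template(
    '<div class="book">\n'
    '  <a id="$iaclid" href="books/$idstr">\n'
    '    <img src="$coverstr" alt="$title" />\n'
    '  </a>\n'
    '  <p>$title</p>\n'
    '</div>\n'
)

def render_book_div(idstr, iaclid, coverstr, title):
    """Fill the four template keywords for a single book."""
    return DIV_TEMPLATE.substitute(
        idstr=idstr,
        iaclid=iaclid,
        coverstr=coverstr,
        title=escape(title),  # keep characters such as & and < from breaking the HTML
    )

html = render_book_div(
    "examplebook00", "iacl_0001", "covers/examplebook00.jpg", "An Example Book"
)
print(html)
```

Template.substitute raises KeyError when a keyword has no value, which makes a missing CSV field visible at generation time instead of leaving a stray $keyword in the page.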

Footer Template File

The footer template contains the code for the sidebar, and terminating tags for the document.

CSV metadata search result file

The input CSV file supplies the metadata used for text replacement of keywords in the div template to create links to the books and the cover files, and also to sort the search results into categories for the interface. In addition, tooltips describing the books are built using the CSV information and JavaScript.

The CSV file may be produced with the Advanced Search provided by the Internet Archive. The search results used for the IACL collection are included with the generator, but any other search results may be used: we searched for "collection=iacl", and a different search yields a different CSV file and therefore a different collection interface. When you set up a search, you specify which fields to include in the output. The following fields should be included IN THE ORDER SPECIFIED (unused fields should be present but may be blank in the CSV). The order may seem strange, but it simply reflects how the advanced search utility wrote out the fields of our search. Make sure you get at least the following fields, or there will be problems when the interface is generated:

  1. Id (field 6): this field is assigned as the div id and is used to attach the tooltip. It must be unique.
  2. Title (field 8): this field is used as the title. It is displayed in full in the tooltip and truncated to eight words in the category display.
  3. Author (field 2): this field is displayed only in the tooltip. When setting up the IA search, specify "creator".
  4. Description (field 4): this field is displayed as "details" in the tooltip only.
  5. Subjects (field 7): this field is displayed in the tooltip. It is also searched for a match to an entry in the categories.txt input file during interface generation, and if it matches, the book is entered in that category's output file.
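
As a rough sketch of how the five fields above might be pulled out of a row, the following reads the "(field N)" notes as 1-based column positions, which is an assumption. The sample row, the parse_row and short_title helpers, and the absence of a header row are all invented for the example and are not taken from the tool.

```python
import csv
import io

# Assumed 1-based column positions, taken from the "(field N)" notes above.
FIELD_POSITIONS = {"author": 2, "description": 4, "id": 6, "subjects": 7, "title": 8}

def short_title(title, max_words=8):
    """Truncate a title to its first max_words words for the category display."""
    return " ".join(title.split()[:max_words])

def parse_row(row):
    """Pick the five needed fields out of one CSV row; missing columns become ''."""
    record = {}
    for name, position in FIELD_POSITIONS.items():
        index = position - 1
        record[name] = row[index].strip() if index < len(row) else ""
    record["short_title"] = short_title(record["title"])
    return record

# A made-up row with eight columns; unused columns are present but blank.
sample = io.StringIO(
    ',Jane Doe,,A made-up description,,examplebook00,Animals;Fairy tales,'
    '"The Long and Winding Tale of a Very Small Brown Mouse"\n'
)
records = [parse_row(row) for row in csv.reader(sample)]
print(records[0]["short_title"])
```

Using the csv module rather than splitting on commas matters here because titles and descriptions often contain commas inside quoted fields.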

Our search used more fields, and they came out in the order below, but you could easily play with the result in a spreadsheet program and remove unneeded fields or add them in.

Categories file

The category description file contains the categories to be generated as HTML files for the collection interface. It is up to the user to determine the categories list and create the categories.txt file. You must have a categories file even if it is empty. The format is one category per line. We created ours by selecting categories from the IACL that had a fair number of books, or that seemed most appealing to our target audience. We further refined these categories throughout the course of the project, and added categories when additional books were contributed to our program (these books are represented by gray icons in our user interface, and will be added in the future). Anything not matched from categories.txt will go into a catchall file "other.html", so there will always be at least one category. For every category, an icon must be created to display in the sidebar widget (if a suitable one is not found in our public domain collection), and another to display on the opening page list of categories.
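
The sorting of books into categories can be pictured roughly as follows. The matching rule shown (a case-insensitive substring test of each category against the Subjects field) is an assumption, as are the load_categories and categories_for names; the tool's actual rule is not documented here.

```python
def load_categories(lines):
    """Read one category per line, skipping blank lines."""
    return [line.strip() for line in lines if line.strip()]

def categories_for(subjects, categories, catchall="Other"):
    """Return every category named in the subjects string, else the catchall."""
    lowered = subjects.lower()
    matches = [c for c in categories if c.lower() in lowered]
    return matches or [catchall]

# In practice the lines would come from open("categories.txt").
categories = load_categories(["Animals\n", "\n", "Adventure and Adventurers\n"])
print(categories_for("Animals; Fiction", categories))
print(categories_for("Poetry", categories))
```

With an empty categories file the list of categories is empty, so every book falls through to the catchall, which matches the behavior described above.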

ID Mapping file

The ID mapping file contains a mapping from the Internet Archive identifier (IAID) to the Open Library identifier (OLID). The mapping is used to fetch the covers shown in the book display, which are served through the Open Library covers API. A link to a cover is created by mapping the IA identifier contained in the search result CSV file to the OLID, then requesting the cover for that OLID. The IACL collection mapping was supplied to the project through prior involvement with another open source project [3]. To determine how to create such a mapping, refer to the Open Library API documentation. The mapping file is formatted as follows: <OLID field> <IAID field>, one entry per line. This file should be named iaclBookList.txt and is included in the run directory.
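
A minimal sketch of the lookup, assuming the two fields on each line are separated by whitespace. The identifiers below are invented, the helper names are hypothetical, and the URL follows the form documented for the Open Library Covers API (key "olid", size S, M, or L); check the current API documentation before relying on it.

```python
def load_id_mapping(lines):
    """Parse '<OLID> <IAID>' lines into a dict keyed by IA identifier."""
    mapping = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        olid, iaid = parts
        mapping[iaid] = olid
    return mapping

def cover_url(olid, size="M"):
    """Build a cover URL in the Open Library Covers API form."""
    return "https://covers.openlibrary.org/b/olid/%s-%s.jpg" % (olid, size)

# In practice the lines would come from open("iaclBookList.txt").
mapping = load_id_mapping(["OL0000001M examplebook00\n", "\n"])
print(cover_url(mapping["examplebook00"]))
```

Keying the dictionary by IA identifier fits the direction of the lookup, since the CSV rows carry the IA identifier and the cover request needs the OLID.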

Installation

Install the tools by downloading and unzipping the distribution archive. It's recommended you do this in a dedicated directory to avoid overwriting files.

Usage

As packaged, the tool will generate the collection interface for web-based access. To generate an attached storage-based collection interface, see "Generating the attached storage interface" below. Before each run, delete the *.html and *.js files in the run directory, because the tool appends to the output files if they exist. The usage is:

python genCollectionInterface.py <CSVfile> <categoryfile>
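
Because the tool appends to existing output, the cleanup before each run could be scripted along these lines. The remove_stale_output helper is hypothetical and not part of the distribution. Note that it deletes every *.html and *.js file in the directory it is given, including any hand-edited pages or JavaScript libraries kept there, so point it only at a directory that holds nothing but generated output. The *.html.tmpl template files are not matched by these patterns.

```python
import glob
import os

def remove_stale_output(run_dir="."):
    """Delete *.html and *.js files in run_dir and return the removed paths."""
    removed = []
    for pattern in ("*.html", "*.js"):
        for path in glob.glob(os.path.join(run_dir, pattern)):
            os.remove(path)
            removed.append(path)
    return sorted(removed)

# Example (not executed here): remove_stale_output("/path/to/run/directory")
```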

To ensure the tool is working, you can make a run using our data as follows:

python genCollectionInterface.py search.CSV categories.txt

You will see numerous messages as the files are parsed and output is generated. When the process is completed, there will be a directory full of HTML and JavaScript files. The HTML files used for the interface are named <Category>Category.html, for example "Adventure and AdventurersCategory.html", and the JavaScript files are named <Category>ToolTip.js, for example "Adventure and AdventurersToolTip.js". There is also a "catchall" category, "Other", where anything not matched in categories.txt will go. Lastly, there is an "index.html", which is simply a list of links to the other files and may be used as a starting point for building a "category list" page, as that page is not autogenerated.
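
When hand-building the category list page, the naming convention above can be reproduced with a few lines of Python. This is a sketch only: output_names and category_links are hypothetical helpers, and the list markup is invented rather than copied from the generated index.html. Category names may contain spaces, so the file name is percent-encoded when used in a link.

```python
from html import escape
from urllib.parse import quote

def output_names(category):
    """Return the (HTML, JavaScript) file names generated for one category."""
    return category + "Category.html", category + "ToolTip.js"

def category_links(categories):
    """Build a simple HTML list linking to each category page."""
    items = []
    for category in categories:
        html_name, _ = output_names(category)
        items.append(
            '<li><a href="%s">%s</a></li>' % (quote(html_name), escape(category))
        )
    return "<ul>\n" + "\n".join(items) + "\n</ul>"

print(category_links(["Adventure and Adventurers", "Animals"]))
```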

Generating the attached storage interface

To generate the attached storage solution so that the collection may be accessed without a web connection, run the tool with the "-attached-storage" option. This will generate output files with the links set to read the books from the attached storage, which must be mounted on /media/LIB. On the XO, this means that you are using a USB flash drive for the attached storage and named the volume LIB when you created it. The DJVU files should be in a /djvu subdirectory on the attached storage. There is a downloader tool that will download all the books in the collection, bookdownload.py, and another that will download all the covers, coverdownload.py. Once the downloads are completed, copy all the books to /media/LIB/djvu on the attached storage, and the covers to /media/LIB/covers.

This solution would probably work on many installations of Linux, but very likely you'd have to change the code to reflect other storage naming conventions on other platforms, especially Windows where the paths have drive letters.
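
One way to picture the change needed for other platforms is to treat the mount point as a parameter rather than a fixed string. The sketch below is not how the tool is written; the attached_paths helper and the <id>.djvu and <id>.jpg file names are assumptions made for illustration, and the covers directory is assumed to sit under the same mount point as the djvu directory.

```python
import posixpath

def attached_paths(book_id, mount_point="/media/LIB"):
    """Return (book path, cover path) under the attached-storage layout."""
    book = posixpath.join(mount_point, "djvu", book_id + ".djvu")
    cover = posixpath.join(mount_point, "covers", book_id + ".jpg")
    return book, cover

print(attached_paths("examplebook00"))
print(attached_paths("examplebook00", mount_point="/mnt/usb"))
```

posixpath is used so that the sketch always produces forward-slash paths; a Windows variant would need drive letters and a different way of forming links.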

About The Collection

The Internet Archive Children's Library is a digital repository of over 3,300 digital public domain books for children from around the world. The Rural Design Collective selected a subset of these books and created a child-friendly user interface for the OLPC XO as part of their 2009 Summer Mentoring Program.