E-Book Enlightenment

Free E-Book Formats


For the purposes of this book I consider an e-book to be a file that can be downloaded to the computer and read when the computer is not connected to the network.  There are many websites where you can read a book online, but I don't consider websites to be e-books.

I'm also going to limit the list to formats that can be read on a computer without dealing with Digital Rights Management (DRM).  Free e-books are likely to be the only ones without DRM.

Plain Text

This is the oldest format and the simplest.  A plain text file contains only letters, numbers, punctuation, and spaces.  There may be a newline character (the character produced when you press Enter to start a new line) at the end of each line, or newlines may be used only to separate paragraphs.  There are no changes in font, no bold, no italics, no underlines.  By convention a word is considered bold if it has asterisks (*) before and after it, and italicized if it has underscore characters (_) before and after it.
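The asterisk and underscore conventions are simple enough that a reader program could turn them back into real formatting.  The following is a minimal Python sketch, not taken from any particular reader; the function name `plain_text_to_html` is hypothetical, and it assumes the markers surround whole words or phrases and are never nested.

```python
import html
import re

# *word* -> bold; the lookarounds stop matches inside text such as 2*3*4.
BOLD = re.compile(r"(?<!\w)\*([^*\s](?:[^*]*[^*\s])?)\*(?!\w)")
# _word_ -> italics; the lookarounds leave identifiers like snake_case_name alone.
ITALIC = re.compile(r"(?<!\w)_([^_\s](?:[^_]*[^_\s])?)_(?!\w)")


def plain_text_to_html(text: str) -> str:
    """Convert the plain text *bold* and _italic_ conventions to HTML tags."""
    escaped = html.escape(text)  # escape HTML special characters; * and _ are untouched
    escaped = BOLD.sub(r"<b>\1</b>", escaped)
    return ITALIC.sub(r"<i>\1</i>", escaped)


print(plain_text_to_html("It was *not* a _small_ matter."))
# It was <b>not</b> a <i>small</i> matter.
```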

Reading a Plain Text E-Book 

Advantages

Plain text produces by far the smallest files.  It is the simplest format to write a reader for, so it is supported on the most devices.  While all the text is displayed in the same font, you can make that font as large or as small as you need and the text will wrap itself to fit the available space, which makes plain text a good choice for readers who benefit from a larger font.  Because plain text is so simple to support, a reader program may offer features for it that it does not offer for other formats.  In Sugar, for example, plain text files are the only ones (so far) with support for text to speech with highlighting.
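Re-wrapping is what lets plain text work at any font size.  Here is a minimal Python sketch of the idea using the standard `textwrap` module.  It assumes paragraphs are separated by a blank line, as in files that end every line with a newline, and the function name `rewrap` is hypothetical.  A real reader program would compute `width` from the window size and the chosen font size.

```python
import re
import textwrap


def rewrap(text: str, width: int) -> str:
    """Re-flow hard-wrapped plain text so each line fits in `width` characters."""
    paragraphs = re.split(r"\n\s*\n", text.strip())  # blank lines separate paragraphs
    # Join each paragraph's lines into one, then wrap it again at the new width.
    return "\n\n".join(
        textwrap.fill(" ".join(p.split()), width=width) for p in paragraphs
    )


print(rewrap("The quick brown\nfox jumps.\n\nSecond para.", 10))
```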

Disadvantages

No illustrations.  This makes it a poor format for children's books.

Portable Document Format (PDF)

This is one of the most popular formats.  It is derived from PostScript, the page description language used to format pages for printers, and what you see on the screen looks exactly like the printed page.


Advantages

This is an attractive format that can include illustrations.

Disadvantages

A PDF is designed to show exactly what a printed page will look like, and not every printed page works on the screen.  Multiple columns, tiny fonts and landscape page orientation can make a PDF unusable on the screen.

Another issue with a PDF is that the text cannot be reformatted.  You can zoom in on a PDF, but unlike plain text you can't make the text larger and have it re-wrap to fit the screen.

Image Container PDFs

Image Container PDF is a term used by the Internet Archive to describe a PDF composed entirely of images of book pages.  This format gives the reader an experience as close as possible to reading the original book.  PDFs created this way can have a "text layer" created by Optical Character Recognition (OCR), which makes these e-books searchable.


Advantages

An excellent format for children's books, which often have pictures and other decorations on every page.

Disadvantages

PDFs composed of images have huge file sizes (20 megabytes or more is common for Internet Archive PDFs, and 50 megabytes and up is common for PDFs like this that you create yourself).  Highly decorated books can also use a lot of memory to read, in extreme cases causing out of memory errors.

Comic Book Zip (CBZ)

A CBZ file is simply a set of sequentially named images stored in a Zip archive file.  Generally the file extension is changed from .zip to .cbz.
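Because a CBZ is just a Zip archive, any Zip tool can build one.  This minimal Python sketch uses the standard `zipfile` module.  The function name `make_cbz` and the `page_001.jpg` naming scheme are hypothetical, and it assumes the source image file names already sort into reading order (for example scan01.jpg, scan02.jpg).

```python
import zipfile
from pathlib import Path


def make_cbz(image_paths, cbz_path):
    """Store page images in a Zip archive under sequential names."""
    images = sorted(Path(p) for p in image_paths)  # assumes names sort in page order
    digits = max(3, len(str(len(images))))  # zero-pad so archive names sort correctly
    # ZIP_STORED skips compression: JPEG and PNG data is already compressed.
    with zipfile.ZipFile(cbz_path, "w", compression=zipfile.ZIP_STORED) as cbz:
        for number, image in enumerate(images, start=1):
            cbz.write(image, arcname=f"page_{number:0{digits}d}{image.suffix.lower()}")
    return cbz_path
```

For example, `make_cbz(["scan01.jpg", "scan02.jpg"], "book.cbz")` writes the archive straight to a .cbz name.  Storing the images uncompressed costs little space, since compressing already-compressed image data rarely makes it smaller.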

There is a related format, Comic Book RAR (CBR), which is used more often than CBZ.  It uses a RAR archive file rather than a Zip file, so you need a commercial program to create one.  This may give a slightly smaller file size than a CBZ, but in my opinion not enough to make it preferable to CBZ.


Advantages

Smaller file size than a PDF created with the same images.  Very easy to create.

Disadvantages

No support for a text layer, so unlike a PDF the pages can't be searched.

DjVu

DjVu is an alternative to PDFs made from book page images.  It uses a method of compressing these images that is optimized for documents and book pages.  As a result .djvu files are smaller than the equivalent PDF and can take less memory to read.
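A DjVu file can be recognized from its first 16 bytes: the marker `AT&T`, then an IFF `FORM` chunk header whose form type is `DJVU` for a single-page document or `DJVM` for a multi-page one.  This is a minimal Python sketch of that check, not part of any DjVu library; the function names are hypothetical.

```python
from typing import Optional

DJVU_FORM_TYPES = {b"DJVU": "single-page", b"DJVM": "multi-page"}


def djvu_kind(header: bytes) -> Optional[str]:
    """Classify a DjVu header, or return None if it isn't one."""
    # Bytes 0-7: "AT&T" magic + "FORM"; bytes 8-11: chunk length; 12-15: form type.
    if len(header) < 16 or header[:8] != b"AT&TFORM":
        return None
    return DJVU_FORM_TYPES.get(header[12:16])


def sniff_djvu(path: str) -> Optional[str]:
    """Read just enough of a file to classify it as DjVu."""
    with open(path, "rb") as f:
        return djvu_kind(f.read(16))
```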


Advantages

Noticeably smaller file size than PDFs composed of page images, and also smaller than CBZ files.

Disadvantages

Only supported by later versions of the Read Activity, which require a version of Sugar newer than .82.  Most XO laptops run .82 or older.

Rich Text Format (RTF)

This is a file format invented by Microsoft to simplify sharing documents between different brands of word processor.  Most word processors can read and write this format as well as their own format.

It may seem like a stretch to consider RTF an e-book format, but in fact there are e-books that use it.  Of all the e-book formats distributed by the Baen Free Library website, only RTF is usable in Sugar .82.  (If you have a version of Sugar that supports EPUB, the Baen Free Library now offers that format.)

The RTF format in a word processor

Advantages

I can't think of any.

Disadvantages

In practice there are only two ways to use an RTF file as an e-book: load it into a word processor, convert it to a PDF, and read that file; or use an e-book reader like Read Etexts, which converts the RTF to a plain text file when it first loads it.
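The kind of conversion Read Etexts performs can be approximated by dropping RTF's control words and keeping the text.  The following Python sketch is not Read Etexts' actual code.  The function name and the list of skipped groups are assumptions, hex-escaped characters are assumed to be Windows-1252, and it does not handle Unicode escapes (`\uN`) or tables.

```python
import re

# Groups holding metadata rather than body text (an assumed, non-exhaustive list).
SKIP_DESTINATIONS = {"fonttbl", "colortbl", "stylesheet", "info", "pict"}

TOKEN = re.compile(
    r"\\([a-z]+)(-?\d+)? ?"     # control word, optional number, one delimiting space
    r"|\\'([0-9a-fA-F]{2})"     # hex-escaped byte such as \'e9
    r"|\\(.)"                   # control symbol such as \{ \} \\ \~ \*
    r"|([{}])"                  # group open or close
    r"|[\r\n]+"                 # raw line breaks are not text in RTF
    r"|([^\\{}\r\n]+)",         # a run of ordinary text
    re.DOTALL,
)


def rtf_to_text(rtf: str) -> str:
    """Strip RTF markup, keeping paragraph breaks and tabs."""
    out = []
    stack = []          # "skipping" state saved at each {
    skipping = False    # True while inside a group we ignore
    for m in TOKEN.finditer(rtf):
        word, _param, hexcode, symbol, brace, text = m.groups()
        if brace == "{":
            stack.append(skipping)
        elif brace == "}":
            skipping = stack.pop() if stack else False
        elif skipping:
            continue
        elif word:
            if word in SKIP_DESTINATIONS:
                skipping = True
            elif word in ("par", "line"):
                out.append("\n")
            elif word == "tab":
                out.append("\t")
            # Every other control word (fonts, sizes, bold...) is ignored.
        elif hexcode:
            out.append(bytes([int(hexcode, 16)]).decode("cp1252", errors="replace"))
        elif symbol:
            if symbol == "*":
                skipping = True  # \* marks an optional group a reader may ignore
            elif symbol in "\\{}":
                out.append(symbol)
            elif symbol == "~":
                out.append("\u00a0")  # non-breaking space
            elif symbol in "\r\n":
                out.append("\n")  # a backslash before a line break means \par
        elif text:
            out.append(text)
    return "".join(out)
```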

EPUB

EPUB is a format specifically meant for e-books, unlike all the other formats discussed so far.  It is based on XHTML and Cascading Style Sheets, like a web page, and can include image files, but all of these files are stored in a single Zip archive file.  A special XML file called an NCX provides a table of contents for the document.
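To show where the NCX fits in, here is a minimal Python sketch that opens an EPUB as the Zip archive it is and lists its table of contents.  It uses only the standard library.  The function names are hypothetical, and for brevity it takes the first .ncx file in the archive instead of following the package (OPF) file's reference to it.

```python
import zipfile
import xml.etree.ElementTree as ET

NCX = {"ncx": "http://www.daisy.org/z3986/2005/ncx/"}  # NCX XML namespace


def ncx_toc(ncx_xml):
    """Return (depth, title, target) for each navPoint, in document order."""
    entries = []

    def walk(parent, depth):
        # navPoints nest, so sub-sections sit inside their chapter's navPoint.
        for point in parent.findall("ncx:navPoint", NCX):
            title = point.findtext("ncx:navLabel/ncx:text", default="", namespaces=NCX)
            content = point.find("ncx:content", NCX)
            target = content.get("src", "") if content is not None else ""
            entries.append((depth, title.strip(), target))
            walk(point, depth + 1)

    nav_map = ET.fromstring(ncx_xml).find("ncx:navMap", NCX)
    if nav_map is not None:
        walk(nav_map, 0)
    return entries


def epub_toc(epub_file):
    """Read the table of contents from an EPUB (a path or binary file object)."""
    with zipfile.ZipFile(epub_file) as epub:
        # Simplification: take the first .ncx instead of following the OPF file.
        name = next((n for n in epub.namelist() if n.lower().endswith(".ncx")), None)
        return ncx_toc(epub.read(name)) if name else []
```

For example, `for depth, title, target in epub_toc("book.epub"): print("  " * depth + title)` prints an indented outline of the book.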

This is The Big Book of Aviation for Boys as an EPUB with illustrations.  I created the EPUB for this book.

Read Activity With EPUB 

Advantages

Like a PDF, an EPUB can contain formatted text and illustrations.

As with a plain text file, the text can be made larger or smaller and it will re-wrap to fit the visible space.

The file size is small.

The format is supported on many devices as well as on computers.  It may become the most popular e-book format.

Disadvantages

Like DjVu, it is only supported by the latest versions of the Read Activity, which will not run on Sugar .82.

While many free e-books are available in the EPUB format, few make full use of what the format has to offer.  Project Gutenberg EPUBs may or may not have illustrations, and EPUBs from the Internet Archive are made from OCR'd text that often has not been proofread and corrected.

This is Pride and Prejudice from Project Gutenberg as an EPUB, without illustrations:

Pride and Prejudice ePub

Here is the same book from the Internet Archive, with illustrations but badly needing proofreading:

Pride and Prejudice from the Internet Archive