Publishing is Coding: Change My Mind

Nika Zhenya's entries from Mariana Eguaras' blog in broken English.

Digital Publishing as Publishing from Scratch

October 3, 2018 | Methodology | MD / EPUB / MOBI / spanish version

Thanks to Mariana Eguaras we are going to blog about digital publishing: its characteristics, benefits and challenges. We are also going to talk about its relationship with print publishing and how these issues directly affect the processes behind any kind of publishing.

We have already planned what to write about in the first entries, but any suggestions are welcome. As far as possible the writing won't be technical. We will try to be friendly to the general public and to publishers.

However, keep in mind that some technicalities are necessary for publishing. The jargon of typography, printing or design is common knowledge for publishers. In the same way, the jargon of web and software developers is becoming part of our cultural background.

The entries were originally written in Spanish. Some of them are now kind of old: on some things I have a different opinion or approach. And, obviously, English is not my first language. Therefore, you are going to find a lot of grammar mistakes and typos, and I will only translate (in a very loose way) the entries that I still consider relevant. So when you find this kind of box, it means that it is an addendum only for this broken English version.

Do you want to improve this mess? You can always help through GitLab or GitHub.

In this first entry we will make a general comparison between some of the most common methods for developing a standardized ebook in EPUB format. Some other time we will go deeper into the history of EPUB.

First off, among the different ebook formats available, EPUB was created from the beginning as a file type for ebooks. EPUB stands out because of its versatility, its light weight and its respect for web standards. This ensures code uniformity and complete control over text editing.

With these features, an EPUB is easily converted into proprietary formats such as the ones used by Amazon or Apple. That means we can save resources and time when we develop a digital publication.

This flexibility also allows the development of software intended to ease the creation of EPUBs. With just a couple of clicks in a word processor (Writer or Word, e.g.) or a desktop publishing program (like InDesign) we instantly have an EPUB.

At first glance this is a huge advantage for indie authors or publishers that don't want to invest in “additional efforts.” However, there are at least two disadvantages in doing things this way:

  1. The quality of the code, the design and the text editing tends to be lower than with other methods.

  2. It is often forgotten that the most important thing about the digital revolution is not the ebook.

The ebook is the most common product of digital publishing, but it is just the tip of the iceberg. In order to go deeper we have to become familiar with what goes on behind the scenes of ebook development.

In Spanish I insist that digital publishing isn't the same as digital editing. In Spanish it is common to use the word “edition” and its derivatives for things concerning publishing. But as far as I can see, “edition” has a more general meaning in the English-speaking world.

By “digital editing” I mean the publishing process that involves the use of a computer (practically the whole publishing industry nowadays). “Digital publishing” is the product of that process. In these translations I will use the terms interchangeably. Only when I think it is relevant will I say “digital editing” or “digital text editing.”

Some people are skeptical about the need for publishing “from scratch.” Most people prefer to use converters to create EPUB files automatically.

Why do we have to learn markup languages such as HTML or Markdown? Why should we worry about style sheets like CSS or SCSS? Why must we think about programming languages (JavaScript, Python, Ruby or C++, e.g.) and how they could create new reading experiences or improve the quality of text editing?

Regardless of whether you want a print or a digital book, if we start to pay attention to methodologies, little by little we will see their importance.

Exercise's peculiarities

To show the advantages and disadvantages of converters compared to “from scratch publishing,” we will develop the same book with each method.

We are going to do this exercise as realistically as possible. That is why we are going to use Project Gutenberg's Spanish edition of Don Quixote. For uniformity, our starting points are the text in HTML format and the same CSS style sheet.

You could wonder:

Production time chart: the effectiveness of the “from scratch” method

Production time chart in minutes.

One of the biggest myths about “from scratch publishing” is that it requires a lot of time. But “from scratch” doesn't mean we have to code it all by hand. As we will see in other entries, scripts can take over all the monotonous work involved in EPUB development.

By “from scratch publishing” I mean a method where we don't have a publishing environment. Instead we use a plain text editor or a source code editor and a command-line interface.

This method may sound very complex and time consuming. While “from scratch publishing” has its own challenges, anyone with a computer can overcome these difficulties.
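As an illustration of the kind of monotonous work a script can take over, here is a minimal Python sketch that writes the manifest entries of an EPUB package file from the files in a directory. This is not how Pecas does it; the function name `manifest_items`, the `item1`, `item2` id scheme and the short list of file types are assumptions made only for this example.

```python
from pathlib import Path
from xml.sax.saxutils import quoteattr

# Media types for the file kinds this sketch knows about (not a complete list).
MEDIA_TYPES = {
    ".xhtml": "application/xhtml+xml",
    ".css": "text/css",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".ncx": "application/x-dtbncx+xml",
}


def manifest_items(content_dir):
    """Return one OPF <item> line per known file found under content_dir."""
    root = Path(content_dir)
    items = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        media_type = MEDIA_TYPES.get(path.suffix.lower())
        if media_type is None:
            continue  # skip files this sketch does not know how to declare
        # hrefs inside an EPUB always use forward slashes
        href = path.relative_to(root).as_posix()
        item_id = f"item{len(items) + 1}"  # hypothetical id scheme
        items.append(
            f'<item id="{item_id}" href={quoteattr(href)} media-type="{media_type}"/>'
        )
    return items
```

Calling `manifest_items` on the folder that holds the book's XHTML, CSS and images returns the lines to paste inside the `<manifest>` element. In EPUB 3 the item for the navigation document also needs `properties="nav"`, which this sketch leaves out.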

If we ignore the time needed to format the text, the chart shows that the “from scratch” method is the most effective.

With InDesign and Jutoh we have to link each CSS style to a paragraph or character style. InDesign is way more intuitive than Jutoh. With Sigil or “from scratch publishing” we don't have this need, because we can automatically link the CSS to the book. But the “from scratch” method has the added advantage that we don't have to recreate the directory tree or import files.

EPUB's size chart: the impact of images and “junk” code

EPUB's size chart in KBs.

Two factors affect an EPUB's size: 1) embedded images and 2) “junk” code.

Most EPUBs embed at least one image, the cover, and sometimes also a back cover and an author's photo. Even if there are just a couple of them, images are the heaviest files in an EPUB if we have one or more of these setups:

Neither of these conditions affects our exercise because we are using the same 204 KB image.

The difference comes from “junk” code. Some converters add extra lines of code. Most of the time it is because they inject their credits. We also get extra code if we work with paragraph or character styles instead of CSS styles.

These extra lines of code don't improve the reading experience of our EPUB; that is why we call them “junk” code.

When we allow converters to create the CSS, they use their own naming conventions, which has two downsides:

  1. A needless increase in file size.

  2. CSS names that can be hard to understand or edit.
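To see where the bytes of a given EPUB actually go, whether to images or to markup and styles, the compressed size of its entries can be added up per file extension. The following is only a rough sketch using Python's standard `zipfile` module; the function name `size_by_extension` is made up for this example and it is not how the chart in this entry was produced.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def size_by_extension(epub_path):
    """Sum the compressed size in bytes of an EPUB's entries, per file extension."""
    totals = Counter()
    with zipfile.ZipFile(epub_path) as epub:
        for info in epub.infolist():
            if info.is_dir():
                continue
            # Entries without an extension (like "mimetype") are grouped together.
            ext = PurePosixPath(info.filename).suffix.lower() or "(none)"
            totals[ext] += info.compress_size
    return dict(totals)
```

Running it on two EPUBs of the same book made with different methods shows whether the extra weight sits in the `.xhtml` and `.css` entries or in the images.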

InDesign's and Jutoh's EPUBs are bigger because of “junk” code. The size difference between Sigil and “from scratch publishing,” however, comes from the ebook's structure.

Since EPUB3 we have two files for the table of contents (TOC). The NCX is the legacy file, while the new file follows an XHTML structure.

Because of that, the EPUB developed with “from scratch publishing” has two TOCs. This adds 11 KB, resulting in a difference of only 5 KB between the Sigil and “from scratch publishing” books.

This means that by default Sigil doesn't create the new required TOC format. That could affect the reading experience on newer devices.
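Which TOC files an EPUB declares can be checked by reading its package file. The sketch below assumes a standard container layout: it follows `META-INF/container.xml` to the OPF file and looks in the manifest for the NCX media type and for an item marked with the `nav` property. The function name `toc_formats` is an invention for this example.

```python
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def toc_formats(epub_path):
    """Report whether the manifest declares an NCX file and an XHTML nav document."""
    with zipfile.ZipFile(epub_path) as epub:
        # container.xml points to the package (OPF) file.
        container = ET.fromstring(epub.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        package = ET.fromstring(epub.read(rootfile.get("full-path")))
    found = {"ncx": False, "nav": False}
    for item in package.findall("opf:manifest/opf:item", OPF_NS):
        if item.get("media-type") == "application/x-dtbncx+xml":
            found["ncx"] = True  # legacy TOC
        if "nav" in (item.get("properties") or "").split():
            found["nav"] = True  # EPUB3 navigation document
    return found
```

An EPUB with both TOCs gives `True` for both keys; one that only carries the legacy file gives `True` for `ncx` alone.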

Errors and warnings chart: EPUB validation

Errors and warnings chart.

One of the main advantages of not developing an EPUB with the “from scratch” method is that we don't have to know HTML, CSS or the EPUB structure. Usually we also have a graphical interface, which means a short learning curve.

However, ebooks not only require good text editing and design quality, they also need coherent structures, i.e. we have to care about technical issues. EPUBs must not have errors or warnings caused by bad quality HTML or CSS code, insufficient metadata or image issues.

For these reasons we need EPUB validators. The official tool for EPUB validation is EpubCheck. You can use it online or download it.
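The downloaded EpubCheck is a Java program that runs from the command line, so a script can call it. The sketch below assumes that Java is installed and that the jar sits at the hypothetical path `epubcheck.jar`; it also treats a zero exit status as a pass, which is an assumption to check against the EpubCheck documentation for your version. The function names are made up for this example.

```python
import subprocess


def epubcheck_command(epub_path, jar_path="epubcheck.jar"):
    """Build the command line for EpubCheck (jar_path is a hypothetical location)."""
    return ["java", "-jar", jar_path, epub_path]


def validate(epub_path, jar_path="epubcheck.jar"):
    """Run EpubCheck and return (passed, report text)."""
    result = subprocess.run(
        epubcheck_command(epub_path, jar_path),
        capture_output=True,
        text=True,
    )
    # Assumption: exit status 0 means the validator reported no errors.
    passed = result.returncode == 0
    return passed, result.stdout + result.stderr
```

Something like `passed, report = validate("book.epub")` would then let a build script stop when the validator complains.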

Generally we use a second validator so we can double-check. For this exercise we also used BlueGriffon. This software isn't free, but some clients require it.

The chart above only shows BlueGriffon's validation because EpubCheck didn't find any errors or warnings. We had only a few issues because we used the same HTML and CSS files. Besides, each method created its metadata independently. (For “from scratch publishing” we used Pecas, a suite of publishing scripts.)

In InDesign the issue is caused by an incorrect image compression. For Sigil and Jutoh, BlueGriffon considers that they use obsolete metadata elements.

Actually, it isn't hard to solve these issues. Nevertheless, it can be very frustrating if you don't know what is inside an EPUB file. To solve them we must decompress the EPUB, modify the problematic files and, finally, compress the files again.
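The decompress, modify and compress cycle can also be scripted. This is a minimal sketch with Python's standard `zipfile` module, assuming the fix can be expressed as a function over the bytes of one file; `patch_epub`, `target_name` and `fix` are names made up for this example. The one EPUB-specific detail is that the `mimetype` entry has to stay first in the archive and uncompressed.

```python
import zipfile


def patch_epub(src_path, dst_path, target_name, fix):
    """Copy an EPUB entry by entry, passing the bytes of target_name through fix()."""
    with zipfile.ZipFile(src_path) as src, zipfile.ZipFile(dst_path, "w") as dst:
        # The mimetype entry must come first and must not be compressed.
        dst.writestr("mimetype", src.read("mimetype"),
                     compress_type=zipfile.ZIP_STORED)
        for info in src.infolist():
            if info.filename == "mimetype" or info.is_dir():
                continue
            data = src.read(info.filename)
            if info.filename == target_name:
                data = fix(data)  # modify only the problematic file
            dst.writestr(info.filename, data,
                         compress_type=zipfile.ZIP_DEFLATED)
```

For instance, `patch_epub("in.epub", "out.epub", "OPS/content.opf", fix_metadata)` would rewrite only the package file, where `fix_metadata` is whatever correction the validator asked for. Writing to a new file keeps the original EPUB untouched in case the fix goes wrong.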

Implicit production costs: proprietary vs free software

We don't need to buy software in order to develop EPUBs.

However, half of the methods seen here use proprietary software and therefore have some additional costs. For InDesign and Jutoh we have to purchase software licenses. Sigil and “from scratch publishing” only use free software.

A common myth among users of non-free software is that free tools have lower quality. At least in the publishing environment this isn't true. As we saw in this exercise, Sigil and “from scratch publishing” had better results.

However, most publishers only use Adobe products, so in specific circumstances it is more convenient to develop ebooks that way.

If you really care about the quality of your EPUBs, think twice before buying proprietary software. The free and open source software communities have great alternatives that could fulfill your needs.

Conclusion: “from scratch publishing” wins the match

As this exercise showed, “from scratch publishing” had better results. Most readers may think that this method requires complex knowledge and a long learning curve.

I can say that within a 24-hour workshop anybody can develop their first ebook “from scratch.” Usually most learners don't have a technical background such as knowledge of HTML, CSS or command-line tools.

If you are going to use software exclusively for ebooks, my recommendation is that it be free or open source software. That way you avoid extra costs and at the same time you can get free help from its communities.

You can download the graphics and the data :)