# Digital Publishing as Publishing from Scratch

@meta['2018-10-12','Methodology','A general comparison between the most common methods for developing EPUBs: InDesign, Sigil, Jutoh and “from scratch publishing.”','https://marianaeguaras.com/edicion-digital-como-edicion-desde-cero']

Thanks to [Mariana Eguaras](http://marianaeguaras.com/)
we are going to blog about __digital publishing__: its
__characteristics, benefits and challenges__. We are
also going to talk about __its relationship with print
publishing__ and how these issues directly affect the
processes behind any kind of publishing.

We have already planned what we are going to write about
in the first entries, but any suggestions are welcome. As
far as possible, the writing won't be technical. We
are going to try to be friendly to the general public
and to publishers.

However, keep in mind that some technicalities
are necessary for publishing. The jargon of typography,
printing and design is common knowledge for publishers. In
the same way, the jargon of web and software developers is
starting to become part of our cultural background.

> The entries were originally written in Spanish. Some of
> them are now kind of old: on some things I have a
> different opinion or approach. And as is obvious,
> English is not my first language. Therefore, you are
> going to find a lot of grammar mistakes or typos,
> and I will only translate (in a very loose way) the
> entries that I still consider relevant. So when you find
> this kind of box, it means that it is an _addendum_ only
> for this broken English version. {.addenda}

> Do you want to improve this mess? You can always help
> through [GitLab](https://gitlab.com/NikaZhenya/publishing-is-coding)
> or [GitHub](https://github.com/NikaZhenya/publishing-is-coding). {.addenda}

In this first entry we will make a __general comparison
between some of the most common methods for developing a
standardized ebook in +++EPUB+++ format__. Some other time
we will go deeper into the history of +++EPUB+++.

First off, we should say that, among the different ebook
formats available, +++EPUB+++ was created from the beginning
as a file type for _ebooks_. +++EPUB+++
stands out because of its __versatility, light weight and
respect for web standards__. This ensures code uniformity
and __complete control over text editing__.

With these features, an +++EPUB+++ is easily converted
into proprietary formats such as the ones used by Amazon or
Apple. That means that we can save resources and time when
we develop a digital publication.

This flexibility also allows the development of software
that aims to make creating +++EPUB+++s easier. With just
a couple of clicks in a word processor (Writer or Word,
e.g.) or a desktop publishing program (like InDesign) we
instantly have an +++EPUB+++.

At first glance this is a huge advantage for indie authors
or publishers that don't want to invest in “additional
efforts.” However, there are at least __two disadvantages__
in doing things this way:

1. The quality of the code, design and text editing tends
   to be lower than with other methods.
2. It is often forgotten that the most important thing about
   the digital revolution is not the ebook.

The ebook is the most common product of digital publishing,
but it is just the tip of the iceberg. In order to go
deeper we will have to become familiar with __what goes on
behind the scenes of ebook development__.

> In Spanish I insist that digital publishing isn't the same
> as digital editing. In Spanish it is common to use the
> word “edition” and its derivatives for things concerning
> publishing. But as far as I can see, “edition” has
> a more general meaning in the English-speaking world. {.addenda}

> By “digital editing” I mean _the process_ of publishing
> that involves the use of a computer (practically
> the whole publishing industry nowadays). “Digital publishing”
> is _the product_ of such a process. In these translations I
> will use the terms interchangeably. Only when I find it
> relevant will I say “digital editing” or “digital text
> editing.” {.addenda}

Some people are skeptical about the need for publishing
“from scratch.” Most people prefer to use converters to
automatically create +++EPUB+++ files.

Why do we have to learn markup languages such as [+++HTML+++](https://en.wikipedia.org/wiki/HTML)
or [Markdown](https://en.wikipedia.org/wiki/Markdown)? Why
should we worry about style sheets like [+++CSS+++](https://en.wikipedia.org/wiki/Cascading_Style_Sheets)
or [+++SCSS+++](https://en.wikipedia.org/wiki/Sass_(stylesheet_language))?
Why must we think about programming languages ([JavaScript](https://en.wikipedia.org/wiki/JavaScript),
[Python](https://en.wikipedia.org/wiki/Python_(programming_language)), [Ruby](https://en.wikipedia.org/wiki/Ruby_(programming_language))
or [C++](https://en.wikipedia.org/wiki/C%2B%2B), e.g.) and
how they could create new reading experiences or improve the
quality of text editing?

Regardless of whether we want a print or a digital book, if
we start to pay attention to methodologies, little by little
we will see their importance.

## Particulars of the exercise

To show the advantages and disadvantages of converters
compared to “from scratch publishing,” we will develop the
same book with each method.

We are going to do this exercise as realistically as
possible. That is why we are going to use [Project Gutenberg's Spanish edition of
_Don Quixote_](http://www.gutenberg.org/ebooks/2000). For
uniformity, our starting points are the text in +++HTML+++
format and the same +++CSS+++ style sheet.

You may wonder:

* __Why will we use Project Gutenberg's edition if there are
  better editions online?__ Because it is in the public domain.
  Unlike [Wikisource's edition](https://es.wikisource.org/wiki/El_ingenioso_hidalgo_Don_Quijote_de_la_Mancha),
  it is easy to download as a single file.
* __Why will we use an already formatted text and not the
  direct source?__ I found some typos and similar issues;
  plus, formatting text can be a nightmare, which I
  prefer to discuss another time.
* __Why will we use the same style sheet instead of
  redesigning the book for each method?__ Design can take
  a lot of time and resources. Also, I want to show the
  relevance and flexibility of web style sheets in
  publishing, even though I am going to talk about that in
  another entry.
* __Which methods will we apply in this exercise?__ We will
  see [InDesign's](https://www.adobe.com/products/indesign.html)
  way of doing things because it is the most common among
  publishers and designers. We will use [Jutoh](http://jutoh.com/)
  as an example of proprietary software for ebook
  publishing. Also, we will employ [Sigil](https://github.com/Sigil-Ebook/Sigil)
  as open software for ebook publishing. Finally, we will
  show how “from scratch publishing” can be a good
  candidate for digital publishing.

## Production time chart: the effectiveness of the “from scratch” method

![Production time chart in minutes.](../img/e001_01.jpg)

One of the biggest myths about “from scratch publishing”
is that it requires a lot of time. But “from scratch”
doesn't mean we have to code it all by hand. As we will
see in other entries, with [scripts](https://en.wikipedia.org/wiki/Scripting_language)
we can handle all the monotonous work involved in +++EPUB+++
development.

By “from scratch publishing” I mean a method where we
don't have a publishing environment. Instead we use
a [plain text editor](https://en.wikipedia.org/wiki/Text_editor)
or a [source code editor](https://en.wikipedia.org/wiki/Source_code_editor)
and a [command-line interface](https://en.wikipedia.org/wiki/Command-line_interface).

This method may sound very complex and time-consuming.
While “from scratch publishing” has its own challenges,
anyone with a computer can overcome these difficulties.

If we ignore the time needed to format the text, in the
chart above we can see that __the “from scratch” method is
the most effective__.

With InDesign and Jutoh we have to link each +++CSS+++ style
to a paragraph or character style. InDesign is way more
intuitive than Jutoh. With Sigil or “from scratch
publishing” we don't need to do this, because we can
automatically link the +++CSS+++ to the book. But the “from
scratch” method has the advantage that we don't have to
recreate the directory tree or import files.
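
To give an idea of what automatically linking a style sheet
can look like in a script, here is a minimal Python sketch.
It is only an illustration: the function name
`link_stylesheet`, the sample page and the path
`../css/styles.css` are invented for this example, and
neither Sigil nor Pecas necessarily works this way.

```python
def link_stylesheet(xhtml: str, css_href: str) -> str:
    """Return xhtml with a stylesheet <link> added just before </head>.

    Hypothetical helper for illustration. It assumes a lowercase </head>
    tag and leaves the page untouched if this exact link is already there.
    """
    link = f'<link rel="stylesheet" type="text/css" href="{css_href}"/>'
    if link in xhtml or "</head>" not in xhtml:
        return xhtml
    # Insert the link once, right before the first closing head tag.
    return xhtml.replace("</head>", f"  {link}\n</head>", 1)


# A made-up page standing in for one chapter of the book.
page = "<html>\n<head>\n<title>Don Quijote</title>\n</head>\n<body/>\n</html>"
linked = link_stylesheet(page, "../css/styles.css")
print(linked)
```

Running the same function over every +++XHTML+++ file of a
book is the kind of monotonous work a script can take over.
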

## +++EPUB+++ size chart: the impact of images and “junk” code

![+++EPUB+++ size chart in +++KB+++.](../img/e001_02.jpg)

There are two factors that affect an +++EPUB+++'s
size: __1)__ embedded images and __2)__ “junk” code.

Most +++EPUB+++s embed at least one image, the cover, and
sometimes also a back cover and an author's photo. Even
if there are just a couple of them, images
are __the heaviest files in an +++EPUB+++__ if we have
one or more of these situations:

* The book is short.
* The images are bigger than we need.
* The images lack good compression.
* The images are in an inconvenient format.

None of these conditions affects our exercise because we
are using the same 204 +++KB+++ image with every method.
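
If you want to see where the weight of an +++EPUB+++ comes
from, a short script can add it up, because an +++EPUB+++ is
a ZIP container. The following Python sketch is an
illustration under assumptions, not the tool used for the
chart: `size_by_kind`, the list of image extensions and the
tiny in-memory archive are all made up for the example.

```python
import io
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath

# Assumed list of image extensions; extend it as needed.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".svg"}


def size_by_kind(epub_file):
    """Sum the compressed sizes (in bytes) of images versus everything else.

    epub_file can be a path or a file-like object.
    """
    totals = defaultdict(int)
    with zipfile.ZipFile(epub_file) as epub:
        for info in epub.infolist():
            suffix = PurePosixPath(info.filename).suffix.lower()
            kind = "images" if suffix in IMAGE_EXTENSIONS else "other"
            totals[kind] += info.compress_size
    return dict(totals)


# Build a tiny stand-in archive in memory so the sketch runs on its own.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("OEBPS/img/cover.jpg", bytes(range(256)) * 40)
    archive.writestr("OEBPS/ch01.xhtml", "<p>En un lugar de la Mancha</p>" * 50)

print(size_by_kind(demo))
```

With a real book you would pass the path of the +++EPUB+++
file instead of the in-memory archive.
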

__The difference comes from “junk” code__. Some converters
add extra lines of code. Most of the time this is because
they inject their own credits. We also get extra code if we
work with paragraph or character styles instead of
+++CSS+++ styles.

> These extra lines of code don't improve the reading
> experience of our +++EPUB+++; that is why we call them
> “junk” code. {.addenda}

When we allow converters to create the +++CSS+++, they
use their own naming conventions, which brings __two
downsides__:

1. A needless increase in file size.
2. +++CSS+++ naming conventions that can make the style
   sheet hard to understand or edit.
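
One possible symptom of this kind of leftover is a style
sheet that defines classes the +++HTML+++ never uses. The
Python sketch below looks for them in a rough way, based on
regular expressions. The function `unused_classes` and the
class names in the sample are invented for illustration and
are not taken from the files of this exercise.

```python
import re


def unused_classes(css: str, html: str) -> set:
    """Return class names found in CSS selectors but never used in the HTML.

    Rough sketch: it ignores at-rules, comments and other corner cases.
    """
    # Drop declaration blocks so values such as "1.5em" are not
    # mistaken for class selectors.
    selectors = re.sub(r"\{[^}]*\}", " ", css)
    defined = set(re.findall(r"\.([A-Za-z_][\w-]*)", selectors))
    used = set()
    for attribute in re.findall(r'class="([^"]*)"', html):
        used.update(attribute.split())
    return defined - used


# Made-up sample: two generated-looking classes and one that is in use.
css = """
.ParaOverride-1 { margin: 0 0 1.5em 0; }
.epigraph { font-style: italic; }
span.CharOverride-3 { color: #000; }
"""
html = '<p class="epigraph">En un lugar de la Mancha</p>'
print(sorted(unused_classes(css, html)))
```
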

InDesign's and Jutoh's +++EPUB+++s are bigger because of
“junk” code. The size difference between Sigil and
“from scratch publishing,” however, comes from the ebook's
structure.

Since +++EPUB+++3 we have two files for the table of
contents (+++TOC+++). +++NCX+++ is the legacy file, while the
new file follows an [+++XHTML+++](https://en.wikipedia.org/wiki/XHTML)
structure.

Because of that, __the +++EPUB+++ developed with “from
scratch publishing” has two +++TOC+++s__. This adds 11
+++KB+++, resulting in a difference of only 5 +++KB+++
between the Sigil and “from scratch publishing” books.
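
To check which kind of +++TOC+++ an +++EPUB+++ declares, we
can read the manifest of its +++OPF+++ package file. In
+++EPUB+++3 the +++XHTML+++ navigation document is marked
with `properties="nav"`, while the +++NCX+++ has the media
type `application/x-dtbncx+xml`. The following Python sketch
works on a hand-written sample manifest; `toc_kinds` and the
file names in the sample are hypothetical.

```python
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def toc_kinds(opf_xml: str) -> dict:
    """Report whether an OPF manifest declares an NCX and/or an XHTML nav."""
    root = ET.fromstring(opf_xml)
    found = {"ncx": False, "nav": False}
    for item in root.findall("opf:manifest/opf:item", OPF_NS):
        if item.get("media-type") == "application/x-dtbncx+xml":
            found["ncx"] = True
        # "properties" is a space-separated list of values.
        if "nav" in item.get("properties", "").split():
            found["nav"] = True
    return found


# Made-up manifest with both tables of contents.
sample = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <manifest>
    <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
  </manifest>
</package>"""
print(toc_kinds(sample))
```
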

> This means that by default Sigil doesn't create the new
> required +++TOC+++ format. That could affect the reading
> experience on newer devices. {.addenda}

## Errors and warnings chart: +++EPUB+++ validation

![Errors and warnings chart.](../img/e001_03.jpg)

One of the main advantages of not developing an +++EPUB+++
with the “from scratch” method is that we don't have to know
+++HTML+++, +++CSS+++ and +++EPUB+++ structures. Usually we
can also count on a graphical interface, which means a short
learning curve.

However, __ebooks not only require good text editing and
design quality, they also need coherent structures__,
i.e. we have to care about technical issues. +++EPUB+++s
must not have errors or warnings caused by poor-quality
+++HTML+++ or +++CSS+++ code, insufficient metadata or
image issues.

For these reasons we need __+++EPUB+++ validators__. The
official tool for +++EPUB+++ validation is EpubCheck. You
can use it [online](http://validator.idpf.org/) or
[download it](https://github.com/IDPF/epubcheck/releases).
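
EpubCheck is distributed as a Java program, so the
downloaded version is normally run from the command line
with `java -jar`. As a sketch, and assuming Java is
installed and the jar sits at a path you supply, it could be
wrapped in Python like this. The function names are made up,
and treating a non-zero exit status as “problems found” is
an assumption of this example.

```python
import subprocess


def epubcheck_command(jar_path, epub_path):
    """Build the command line for a locally downloaded EpubCheck jar."""
    return ["java", "-jar", str(jar_path), str(epub_path)]


def validate(jar_path, epub_path):
    """Run EpubCheck and return (passed, report_text).

    Assumes Java is on the PATH and that a non-zero exit status
    means EpubCheck reported problems.
    """
    result = subprocess.run(
        epubcheck_command(jar_path, epub_path),
        capture_output=True,
        text=True,
    )
    return result.returncode == 0, result.stdout + result.stderr
```

For example, `validate("epubcheck.jar", "quijote.epub")`
would return a pass or fail flag together with the text
report. Both file names are placeholders.
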

Generally we use a second validator so we can double-check.
For this exercise we also used [BlueGriffon](http://www.bluegriffon-epubedition.com/BGEV.html).
This software isn't free, but some clients require it.

The chart above only shows BlueGriffon's validation because
EpubCheck didn't find any errors or warnings. We had only a
few issues because we used the same +++HTML+++ and +++CSS+++
files. Besides, each method created its metadata independently.
(For “from scratch publishing” we used [Pecas](https://pecas.cliteratu.re/),
a suite of publishing scripts.)

In InDesign the issue is caused by incorrect image
compression. For Sigil and Jutoh, BlueGriffon considers that
they are using obsolete metadata elements.

Actually, __it isn't hard to solve these issues__.
Nevertheless, it can be very frustrating to solve them if
you don't know what is inside an +++EPUB+++ file. In order to
solve them we must decompress the +++EPUB+++, then
modify the problematic files and, finally, compress the
files again.
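
These three steps can be sketched with Python's standard
`zipfile` module. The one detail that is easy to miss is
that the `mimetype` file has to be the first entry of the
archive and must be stored without compression. The function
names `unpack_epub` and `repack_epub` are invented for this
example, and the sketch assumes the unpacked folder has a
`mimetype` file at its top level.

```python
import os
import zipfile


def unpack_epub(epub_path, target_dir):
    """Extract every file of the EPUB into target_dir."""
    with zipfile.ZipFile(epub_path) as epub:
        epub.extractall(target_dir)


def repack_epub(source_dir, epub_path):
    """Zip source_dir back into an EPUB.

    The mimetype file goes first and uncompressed, as the EPUB
    container format expects; everything else is deflated.
    """
    with zipfile.ZipFile(epub_path, "w") as epub:
        epub.write(
            os.path.join(source_dir, "mimetype"),
            "mimetype",
            compress_type=zipfile.ZIP_STORED,
        )
        for folder, _dirs, files in os.walk(source_dir):
            for name in sorted(files):
                full = os.path.join(folder, name)
                # Archive names always use forward slashes.
                rel = os.path.relpath(full, source_dir).replace(os.sep, "/")
                if rel != "mimetype":
                    epub.write(full, rel, compress_type=zipfile.ZIP_DEFLATED)
```

Between the two calls you edit the problematic files with
any text editor, and afterwards it is worth validating the
repacked file again.
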

## Implicit production costs: proprietary _vs_ free software

We don't need to buy software in order to develop
+++EPUB+++s.

However, half of the methods seen here use proprietary
software and, therefore, carry some additional costs.
For InDesign and Jutoh we have to purchase software
licenses. Sigil and “from scratch publishing” only use free
software.

A common myth among users of non-free software is that free
tools have lower quality. At least in the publishing
environment this isn't true. As we could see in this
exercise, __Sigil and “from scratch publishing” had better
results__.

However, most publishers only use Adobe products, so in
specific circumstances it is more convenient to develop
ebooks that way.

If you really care about the quality of your +++EPUB+++s,
think twice before buying proprietary software. The free and
open source software communities have great alternatives
that could fulfill your needs.

## Conclusion: “from scratch publishing” wins the match

As this exercise showed, __“from scratch
publishing”__ had better results. Most readers may think
that this method requires complex knowledge and a
long learning curve.

I can say that within a 24-hour workshop anybody can
develop their first ebook “from scratch.” Usually most
learners don't have a technical background such as
knowing +++HTML+++, +++CSS+++ or command-line tools.

If you are going to use software exclusively for ebooks, the
recommendation is that it be free or open source
software. This way you avoid the extra costs and, at
the same time, can get free help from their
communities.

You can download the [graphics](http://git.cliteratu.re/publishing-is-coding/blob/master/src/entry001/graphics.ods)
and the [data](http://git.cliteratu.re/publishing-is-coding/raw/master/src/entry001/data.txt) :)