# Digital Publishing as Publishing from Scratch
@meta['2018-10-03','Methodology','A general comparison between the most common methods for developing EPUBs: InDesign, Sigil, Jutoh and “from scratch publishing.”','https://marianaeguaras.com/edicion-digital-como-edicion-desde-cero']

Thanks to [Mariana Eguaras](http://marianaeguaras.com/)
we are going to blog about __digital publishing__,
its __characteristics, benefits and challenges__. We are
also going to talk about __its relation with print
publishing__ and how these issues directly affect the
processes behind any kind of publishing.

We have already planned what we are going to write about
in the first entries, but any suggestions are welcome. As
much as possible, the writing won't be technical. We
are going to try to be friendlier to the general public
or publishers.

However, you have to consider that some technicalities
are necessary for publishing. Typography, printing and
design jargon is common knowledge for publishers. In the
same way, the jargon of web and software developers is
starting to be part of our cultural background.

> The entries were originally written in Spanish. Some of
> them are now kind of old: on some things I have a
> different opinion or approach. And, as is obvious,
> English is not my first language. Therefore, you are
> going to find a lot of grammar mistakes or typos,
> and I will only translate (in a very loose way) the
> entries that I still consider relevant. So when you find
> this kind of box, it means that it is an _addendum_ only
> for this broken English version. {.addenda}

> Do you want to improve this mess? You can always help
> through [GitLab](https://gitlab.com/NikaZhenya/publishing-is-coding)
> or [GitHub](https://github.com/NikaZhenya/publishing-is-coding). {.addenda}

In this first entry we will make a __general comparison
between some of the most common methods for developing a
standardized ebook in +++EPUB+++ format__. Some other time
we will go deeper into the history of +++EPUB+++.

First off, we should say that, among the different ebook
formats available, +++EPUB+++ was created from the
beginning as a file type for _ebooks_. +++EPUB+++
stands out because of its __versatility, lightness and
respect for web standards__. This ensures code uniformity
and __complete control over text editing__.

With these features, an +++EPUB+++ is easily converted
into proprietary formats such as the ones used by Amazon or
Apple. That means we can save resources and time when we
develop a digital publication.

This flexibility also allows the development of software
intended to facilitate the creation of +++EPUB+++s. With just
a couple of clicks in a word processor (Writer or Word,
e.g.) or a desktop publishing program (like InDesign) we
instantly have an +++EPUB+++.

At first glance this is a huge advantage for indie authors
or publishers that don't want to invest in “additional
efforts.” However, there are at least __two disadvantages__ in
doing things this way:

1. The quality of the code, design and text editing tends to
   be lower compared to other methods.
2. It forgets that the most important thing about the
   digital revolution is not the ebook.

The ebook is the most common product of digital publishing,
but it is just the tip of the iceberg. In order to go
deeper we will have to become familiar with what happens
__behind the scenes of ebook development__.

> In Spanish I insist that digital publishing isn't the
> same as digital editing. In Spanish it is common to use
> the word “edition” and its derivatives for things related
> to publishing. But as far as I can see, “edition” has
> a more general meaning in the English-speaking world. {.addenda}

> By “digital editing” I mean _the process_ of publishing
> that involves the use of a computer (practically
> the whole publishing industry nowadays). “Digital publishing”
> is _the product_ of such a process. In these translations I
> will use the terms interchangeably. Only when I see it as
> relevant will I say “digital editing” or “digital text
> editing.” {.addenda}

Some people are skeptical about the need for publishing “from
scratch.” Most people prefer to use converters to
automatically create +++EPUB+++ files.

Why do we have to learn markup languages such as [+++HTML+++](https://en.wikipedia.org/wiki/HTML)
or [Markdown](https://en.wikipedia.org/wiki/Markdown)? Why
should we worry about style sheets like [+++CSS+++](https://en.wikipedia.org/wiki/Cascading_Style_Sheets)
or [+++SCSS+++](https://en.wikipedia.org/wiki/Sass_(stylesheet_language))?
Why must we think about programming languages ([JavaScript](https://en.wikipedia.org/wiki/JavaScript),
[Python](https://en.wikipedia.org/wiki/Python_(programming_language)), [Ruby](https://en.wikipedia.org/wiki/Ruby_(programming_language))
or [C++](https://en.wikipedia.org/wiki/C%2B%2B), e.g.) and
how they could create new reading experiences or improve the
quality of text editing?

Whether you want a print or a digital book, if we start to
pay attention to methodologies, little by little we will see
their importance.

## Peculiarities of the exercise

To show the advantages and disadvantages of converters
compared to “from scratch publishing,” we will develop the
same book with each method.

We are going to make this exercise as real as possible. That
is why we are going to use [Project Gutenberg's Spanish edition of
_Don Quixote_](http://www.gutenberg.org/ebooks/2000). For
uniformity, our starting points are the text in +++HTML+++
format and the same +++CSS+++ style sheet.

You may wonder:

* __Why will we use Project Gutenberg's edition if there are
  better editions online?__ Because it is in the public domain.
  Unlike [Wikisource's edition](https://es.wikisource.org/wiki/El_ingenioso_hidalgo_Don_Quijote_de_la_Mancha),
  it is easy to download in a single file.
* __Why will we use an already formatted text and not the
  direct source?__ Because I found some typos and similar
  issues; plus, formatting text can be a nightmare that I
  prefer to discuss some other time.
* __Why will we use the same style sheet instead of
  redesigning the book in each method?__ Design can take
  a lot of time and resources. Also, I want to show the
  relevance and flexibility of web style sheets in
  publishing, even though I am going to talk about that in
  another entry.
* __Which methods will we apply in this exercise?__ We will
  see [InDesign's](https://www.adobe.com/products/indesign.html)
  way of doing things because it is the most common among
  publishers and designers. We will use [Jutoh](http://jutoh.com/)
  as an example of proprietary software for ebook
  publishing. Also, we will employ [Sigil](https://github.com/Sigil-Ebook/Sigil)
  as open software for ebook publishing. Finally, we will
  show how “from scratch publishing” could be a good
  candidate for digital publishing.

## Production time chart: the effectiveness of the “from scratch” method

![Production time chart in minutes. “Desde cero” is equal to “from scratch”.](../img/e001_01.jpg)

One of the biggest myths about “from scratch publishing”
is that it requires a lot of time. But “from scratch”
doesn't mean we have to write all the code by hand. As we will
see in other entries, with [scripts](https://en.wikipedia.org/wiki/Scripting_language)
we can automate all the monotonous work involved in +++EPUB+++
development.

By “from scratch publishing” I mean a method where we
don't have a publishing environment. Instead, we use
a [plain text editor](https://en.wikipedia.org/wiki/Text_editor)
or a [source code editor](https://en.wikipedia.org/wiki/Source_code_editor)
and a [command-line interface](https://en.wikipedia.org/wiki/Command-line_interface).

This method could sound very complex and time-consuming.
While “from scratch publishing” has its own challenges,
anyone with a computer can overcome these difficulties.

If we ignore the time needed to format the text, the
chart shows that __the “from scratch” method is
the most effective__.

With InDesign and Jutoh we have to link each +++CSS+++ style
to a paragraph or character style. InDesign is way more
intuitive than Jutoh. With Sigil or “from scratch
publishing” we don't have this need, because we can
automatically link the +++CSS+++ to the book. But the “from
scratch” method has the advantage that we don't have to
recreate the directory tree or import files.

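
As a rough illustration of what “automatically linking the +++CSS+++” can mean in a scripted workflow, here is a minimal Python sketch. It is not how Sigil or Pecas do it internally; the function name `link_stylesheet` and the `../css/styles.css` path are made up for this example, and it assumes each page has a closing `</head>` tag.

```python
def link_stylesheet(xhtml, href):
    """Return the page with a <link> to the style sheet added to its <head>.

    Hypothetical helper for illustration; it works on plain strings and
    assumes the page contains a closing </head> tag.
    """
    tag = f'<link rel="stylesheet" type="text/css" href="{href}" />'
    # Leave the page untouched if it is already linked or has no <head>.
    if tag in xhtml or "</head>" not in xhtml:
        return xhtml
    return xhtml.replace("</head>", f"  {tag}\n</head>", 1)


# A tiny stand-in for one of the book's pages.
page = "<html>\n<head>\n<title>Don Quijote</title>\n</head>\n<body></body>\n</html>"
linked = link_stylesheet(page, "../css/styles.css")
print(linked)
```

Looping a function like this over every page of the book is the kind of monotonous step that a script can take over.
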
## +++EPUB+++ size chart: the impact of images and “junk” code

![+++EPUB+++ size chart in +++KB+++.](../img/e001_02.jpg)

There are two factors that impact an +++EPUB+++'s size: __1)__
embedded images and __2)__ “junk” code.

Most +++EPUB+++s embed at least one image, the cover, and
sometimes also a back cover and an author's photo. Even if
there are just a couple of them, images are
__the heaviest files in an +++EPUB+++__ in one
or more of these situations:

* The book is short.
* The images are larger than we need.
* The images lack good compression.
* The images are in an unsuitable format.

None of these conditions affects our exercise because we
are using the same 204 +++KB+++ image.

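
To check whether images are what makes a given +++EPUB+++ heavy, one option is to list its heaviest files. An +++EPUB+++ is a +++ZIP+++ archive, so Python's standard `zipfile` module can read it. The function name `heaviest_files` and the file name `quijote.epub` are assumptions made for this sketch.

```python
import zipfile


def heaviest_files(epub, top=5):
    """Return (name, compressed size in bytes) for the heaviest files."""
    with zipfile.ZipFile(epub) as archive:
        sizes = [(info.filename, info.compress_size) for info in archive.infolist()]
    # Biggest first, so oversized images show up at the top.
    return sorted(sizes, key=lambda pair: pair[1], reverse=True)[:top]


# Usage, with a hypothetical file name:
# for name, size in heaviest_files("quijote.epub"):
#     print(f"{size / 1024:8.1f} KB  {name}")
```

If the cover or another image leads the list by a wide margin, resizing or recompressing it is probably the first thing to try.
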
__The difference comes from “junk” code__. Some converters
add extra lines of code. Most of the time it is because they
inject their credits. We also get extra code if we work with
paragraph or character styles instead of +++CSS+++ styles.

> These extra lines of code don't improve the reading
> experience of our +++EPUB+++; that is why we call them
> “junk” code. {.addenda}

When we allow converters to create the +++CSS+++, they
use their own naming conventions, which has __two
downsides__:

1. A needless increase in file size.
2. +++CSS+++ naming conventions that can be hard to
   understand or edit.

InDesign's and Jutoh's +++EPUB+++s are bigger because of “junk”
code. However, the size difference between Sigil and
“from scratch publishing” comes from the ebook's structure.

Since +++EPUB+++3 we have two files for the table of
contents (+++TOC+++). +++NCX+++ is the legacy file while the
new file follows an [+++XHTML+++](https://en.wikipedia.org/wiki/XHTML)
structure.

Because of that, __the +++EPUB+++ developed with “from
scratch publishing” has two +++TOC+++s__. This adds 11
+++KB+++, resulting in a difference of only 5 +++KB+++
between the Sigil and “from scratch publishing” books.

> This means that by default Sigil doesn't create the new
> required +++TOC+++ format. That could affect the reading
> experience on newer devices. {.addenda}

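
One way to see which +++TOC+++ formats an +++EPUB+++ declares is to read its package file (the +++OPF+++). The legacy +++NCX+++ is listed in the manifest with the media type `application/x-dtbncx+xml`, and the +++EPUB+++3 navigation document carries `properties="nav"`. The sketch below assumes a well-formed +++EPUB+++; the function name `toc_formats` is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def toc_formats(epub):
    """Return a set with "ncx" and/or "nav", as declared in the manifest."""
    with zipfile.ZipFile(epub) as archive:
        # container.xml points to the package file (OPF).
        container = ET.fromstring(archive.read("META-INF/container.xml"))
        opf_path = container.find(".//c:rootfile", CONTAINER_NS).get("full-path")
        package = ET.fromstring(archive.read(opf_path))
    found = set()
    for item in package.iterfind(".//opf:manifest/opf:item", OPF_NS):
        if item.get("media-type") == "application/x-dtbncx+xml":
            found.add("ncx")  # legacy TOC
        if "nav" in item.get("properties", "").split():
            found.add("nav")  # EPUB3 XHTML TOC
    return found
```

A book with both tables of contents would return both names, while a book with only the legacy file would return just `"ncx"`.
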
## Errors and warnings chart: +++EPUB+++ validation

![Errors and warnings chart. Errors displayed in red and warnings in yellow.](../img/e001_03.jpg)

One of the main advantages of not developing an +++EPUB+++
with the “from scratch” method is that we don't have to know
+++HTML+++, +++CSS+++ and +++EPUB+++ structures. Usually we
can also count on a graphical interface, which implies a short
learning curve.

However, __ebooks not only require good text editing and
design quality, they also need coherent structures__,
i.e. we have to care about the technical issues. +++EPUB+++s
must not have errors or warnings caused by bad quality
+++HTML+++ or +++CSS+++ code, insufficient metadata or
image issues.

For these reasons we need __+++EPUB+++ validators__. The official
tool for +++EPUB+++ validation is EpubCheck. You can use it
[online](http://validator.idpf.org/) or [download it](https://github.com/IDPF/epubcheck/releases).
Generally we use another validator so we can double-check.
For this exercise we also used [BlueGriffon](http://www.bluegriffon-epubedition.com/BGEV.html).
This software isn't free, but some clients demand it.

The chart above only shows BlueGriffon's validation because
EpubCheck didn't find any errors or warnings. We had only a
few issues because we used the same +++HTML+++ and +++CSS+++
files. Besides, each method created metadata independently.
(For “from scratch publishing” we used [Pecas](https://pecas.cliteratu.re/),
a suite of publishing scripts.)

In InDesign the issue is due to incorrect image
compression. For Sigil and Jutoh, BlueGriffon considers that
they are using obsolete metadata elements.

Actually, __it isn't hard to solve these issues__.
Nevertheless, it could be very frustrating to solve them if
you don't know what is inside an +++EPUB+++ file. In order to
solve them we must decompress the +++EPUB+++, then
modify the problematic files and, finally, compress the
files again.

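
The three steps (decompress, modify, compress again) can be scripted. A minimal sketch with Python's standard library follows; the function and file names are made up for this example. The one +++EPUB+++-specific detail is that the `mimetype` file has to be the first entry of the archive and must be stored without compression.

```python
import zipfile
from pathlib import Path


def unpack_epub(epub, folder):
    """Decompress the EPUB into a folder so its files can be edited."""
    with zipfile.ZipFile(epub) as archive:
        archive.extractall(folder)


def pack_epub(folder, epub):
    """Compress the folder back into an EPUB file."""
    folder = Path(folder)
    mimetype = folder / "mimetype"
    with zipfile.ZipFile(epub, "w") as archive:
        # The mimetype file goes first and uncompressed.
        archive.write(mimetype, "mimetype", compress_type=zipfile.ZIP_STORED)
        for path in sorted(folder.rglob("*")):
            if path.is_file() and path != mimetype:
                name = path.relative_to(folder).as_posix()
                archive.write(path, name, compress_type=zipfile.ZIP_DEFLATED)


# Usage, with hypothetical names:
# unpack_epub("quijote.epub", "quijote")
# ... fix the problematic files inside the "quijote" folder ...
# pack_epub("quijote", "quijote_fixed.epub")
```

After repacking, it is worth running the validators again to confirm that the fix didn't introduce new errors.
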
## Implicit production costs: proprietary _vs_ free software

We don't need to buy software in order to develop
+++EPUB+++s.

However, half of the methods seen here use proprietary
software and, therefore, have some additional costs.
For InDesign and Jutoh we have to purchase software
licenses. Sigil and “from scratch publishing” only use free
software.

A common myth among non-free software users is that this
kind of tool has lower quality. At least in the publishing
environment this is false. As we could see in this exercise,
__Sigil and “from scratch publishing” had better results__.

Nevertheless, most publishers only use Adobe products, so in
specific circumstances it is more convenient to develop
ebooks that way.

If you really care about the quality of your +++EPUB+++s,
think twice before buying proprietary software. The free and
open source software communities have great alternatives
that could fulfill your needs.

## Conclusion: “from scratch publishing” wins the match

As was shown in this exercise, __“from scratch
publishing”__ had better results. Most readers might think
that this method requires certain complex knowledge and a
long learning curve.

I can say that within a 24-hour workshop anybody can develop
their first ebook “from scratch.” Usually most attendees don't
have a technical background such as knowing +++HTML+++,
+++CSS+++ or command line tools.

If you are going to use software exclusively for ebooks, the
recommendation is that it be free or open source
software. With this you can avoid increased costs and,
at the same time, get free help from its
communities.

You can download the exercise's files [here](http://clientes.cliteratu.re/eguaras/epubs.zip).
Just consider that they are in Spanish :)