Tuesday, August 2, 2011

Better Git-scribe Bullet Lists on Kindle


Today, I continue my efforts to improve the mobi-formatted version of SPDY Book (which you should totally buy, it's awesome). One of the problems in the mobi is that, on the kindle, bullet lists are all out of whack.

On the mobi readers that I have on Ubuntu, the bullet lists look fine. On the kindle, however, the text starts on the line below the bullet. Inspecting the element in Calibre, I see:



Some of that is certainly added by Calibre. Still, I suspect that the problem on the kindle is a <p> tag inside a <li> tag. Hopefully I can remove it. If nothing else, I will run it through sed. To the git-scribe source!

In generate.rb, the do_mobi method reads:
    def do_mobi
      do_html
      info "GENERATING MOBI"
      generate_toc_files
      # generate book.opf
      cmd = "kindlegen -verbose book.opf -o book.mobi"
      if ex(cmd)
        'book.mobi'
      end
    end
The first place I will look, then, is the book.opf file that serves as the input to kindlegen. If it has <p> tags inside <li> tags, then I can concentrate my efforts there. If not, I will have to investigate kindlegen itself.

Hrm... book.opf is fairly empty. It has a top-level <package> tag that contains four children: <metadata>, <manifest>, <spine>, and <guide>. Both <manifest> and <guide> reference book.html, so I check there next.
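Rather than eyeballing each generated file, the "does it have <p> inside <li>?" check could be scripted. The following is only a rough sketch: count_p_wrapped_items is a made-up helper name (it is not part of git-scribe), and the sample markup is shortened from the kind of list asciidoc emits.

```ruby
# Hypothetical helper, not part of git-scribe: counts <li> elements whose
# first child is a <p> tag. A regex scan is only a rough heuristic.
def count_p_wrapped_items(html)
  # <li\b avoids matching <link>; <p[\s>] avoids matching <pre>
  html.scan(/<li\b[^>]*>\s*<p[\s>]/).size
end

sample = <<~HTML
  <div class="ulist"><ul>
  <li>
  <p>
  2011-06-30: First beta release.
  </p>
  </li>
  <li>
  2011-07-05: Formatting changes.
  </li>
  </ul></div>
HTML

puts count_p_wrapped_items(sample)
```

Against a real file, that would be something like count_p_wrapped_items(File.read('book.html')).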

The book.html file is pretty cool. It not only has the bullet list that is causing me problems, but includes CSS and JS(!) as well. I will be keen to see the JavaScript in action at some point (it seems to do footnotes somehow) and I definitely need to fiddle with the CSS. But, for now, I am only interested in the lists, which look like:
<div class="ulist"><ul>
<li>
<p>
2011-06-30: First beta release.
</p>
</li>
<li>
<p>
2011-07-05: Formatting changes. Copy edits from Ashish Dixit.
</p>
</li>
Yup. Stinking <p> tags. To verify that this is the source of my bullet list woes, I manually remove the <p> tags, manually regenerate the mobi (using the same kindlegen command from generate.rb), and copy the file to my kindle:
➜  output git:(master) ✗ kindlegen -verbose book.opf -o book.mobi
➜ output git:(master) ✗ cp book.mobi /media/Kindle/documents/spdybook.mobi
And that fixes it! But I am not going to manually edit book.html each time I edit SPDY Book.

Before trying to alter the output of asciidoc, I try a pure CSS solution. Git-scribe uses the following to style <p> tags and lists:
p, ul, ol, dl {
  margin:10px 0
}
I get nowhere with that, however. Making p display inline, removing the margin, and adding special styles for p tags inside li tags all have no effect.

So I resort to stripping the extra p tags in code, right after asciidoc generates the HTML:
    def do_html
      return true if @done['html']
      info "GENERATING HTML"
      # TODO: look for custom stylesheets
      styledir = local('stylesheets')
      cmd = "asciidoc -a stylesdir=#{styledir} -a theme=scribe #{BOOK_FILE}"
      if ex(cmd)
        remove_p_from_li('book.html')
        # assignment, not comparison, so the early return above can kick in
        @done['html'] = true
        'book.html'
      end
    end
My sed and awk-fu are not powerful enough to work on multi-line text, so I have to resort to slurping the entire book into a buffer and gsub'ing on it:
    def remove_p_from_li(file)
      content = File.read(file)
      File.open(file, 'w') do |f|
        f.write content.gsub(%r"<li>\s*<p>(.+?)</p>\s*</li>"m, '<li>\1</li>')
      end
    end
There is simply no way that is going to work for all books. But that will do for now as it successfully removes the p tags from my output:
<h2 id="_history">History</h2>
<div class="sectionbody">
<div class="ulist"><ul>
<li>
2011-06-30: First beta release.
</li>
<li>
2011-07-05: Formatting changes. Copy edits from Ashish Dixit.
</li>
The resulting mobi looks good on the kindle, so I check that off my TODO list. But I also add another, future TODO item: come up with a more robust solution.
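One possible direction for that more robust solution, offered only as a sketch: tighten the pattern so that it unwraps a list item only when its sole content is a single <p>. The lazy (.+?) in remove_p_from_li would happily match across a list item holding two paragraphs and leave unbalanced tags behind. The names SINGLE_P_ITEM and unwrap_single_p_items are hypothetical, and this version works on a string rather than a file to keep the example self-contained.

```ruby
# Hypothetical stricter variant of remove_p_from_li; not part of git-scribe.
# Matches an <li> whose only content is a single <p>...</p>, where the
# paragraph body contains no further <p> or <li> tags.
SINGLE_P_ITEM = %r{<li>\s*<p>((?:(?!</?(?:p|li)\b).)*)</p>\s*</li>}m

def unwrap_single_p_items(html)
  # Block form avoids backslash-escaping surprises in the replacement string
  html.gsub(SINGLE_P_ITEM) { "<li>#{$1}</li>" }
end

simple = "<li>\n<p>\n2011-06-30: First beta release.\n</p>\n</li>"
multi  = "<li><p>First paragraph.</p><p>Second paragraph.</p></li>"

puts unwrap_single_p_items(simple)          # the <p> wrapper is removed
puts unwrap_single_p_items(multi) == multi  # multi-paragraph item left alone
```

This is still regex-on-HTML, so it remains fragile (for example, it ignores <li> or <p> tags that carry attributes). Running the generated file through a real HTML parser would likely be sturdier, at the cost of a new dependency.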


Day #101
