pettarin / export-kobo

A Python tool to export annotations and highlights from a Kobo SQLite file.

Getting the annotations ordered by page?

Th0masL opened this issue

Hello,
Thanks for this useful tool :)
I'm having a small problem with the annotations:
Most of the time, I read a book two or three times, and I annotate something every time I read it.
The problem is that when I export the annotations, they are ordered by date and not by page or line number, so when I read them afterwards the order looks random and doesn't follow the book's order or chapters.
Is it possible to add an option that orders the annotations by their position in the book instead of by date?
Thanks :)

Hi :)

I kind of forgot about exporting Notes from my Kobo reader, but I decided to give it one more try today.

I've been browsing the contents of KoboReader.sqlite, and it seems that the column "Bookmark.StartContainerPath" does contain a value precise enough to sort the results in the order in which the highlights appear in the book.

I tried it with a couple of books, and this query returns ALMOST the right order:

SELECT Bookmark.Text FROM Bookmark
WHERE Bookmark.VolumeID='<Insert_Book_VolumeID_Here>'
ORDER BY Bookmark.StartContainerPath
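
A minimal sketch of running the query above from Python with the standard sqlite3 module, binding the volume ID as a parameter instead of pasting it into the SQL text. The function name highlights_for_volume is hypothetical (it is not part of export-kobo), and it assumes the caller has already opened KoboReader.sqlite. The ORDER BY is still the plain text comparison discussed in this thread.

```python
import sqlite3


def highlights_for_volume(con, volume_id):
    """Return (StartContainerPath, Text) rows for one book, in SQL text order.

    `con` is an open sqlite3.Connection to KoboReader.sqlite (assumption).
    """
    # The "?" placeholder lets sqlite3 bind volume_id safely.
    cursor = con.execute(
        "SELECT Bookmark.StartContainerPath, Bookmark.Text FROM Bookmark "
        "WHERE Bookmark.VolumeID = ? "
        "ORDER BY Bookmark.StartContainerPath",
        (volume_id,),
    )
    return cursor.fetchall()
```

Usage, assuming a copy of the database in the current directory: `con = sqlite3.connect("KoboReader.sqlite")`, then `highlights_for_volume(con, volume_id)`.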

I wrote "ALMOST" because the "ORDER BY" clause is a bit dumb: it compares the values as text, so it puts the row
index_split_018.xhtml#point(/1/4/115:1)
before
index_split_018.xhtml#point(/1/4/81:1)
This happens because the "1" of "115" comes before the "8" of "81" in lexicographic order.

But I'm pretty sure it's quite easy to fix that with Python (my Python skills are bad, though, so I'm going to ask a coworker tomorrow :) )
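
One way to do that Python fix is a "natural sort" key: split each StartContainerPath into alternating text and digit runs and compare the digit runs as integers. This is a generic sketch (natural_key is a hypothetical helper, not export-kobo code); it fixes the 115-vs-81 case, but it cannot know the book's reading order across different files.

```python
import re


def natural_key(path):
    """Sort key that compares digit runs numerically and the rest as text.

    "index_split_018.xhtml#point(/1/4/115:1)" becomes
    ["index_split_", 18, ".xhtml#point(/", 1, "/", 4, "/", 115, ":", 1, ")"].
    """
    # With a capturing group, re.split puts the digit runs at odd indexes,
    # so strings and integers always line up position by position.
    parts = re.split(r"(\d+)", path)
    return [int(part) if i % 2 else part for i, part in enumerate(parts)]


paths = [
    "index_split_018.xhtml#point(/1/4/115:1)",
    "index_split_018.xhtml#point(/1/4/81:1)",
]
print(sorted(paths, key=natural_key))
# ['index_split_018.xhtml#point(/1/4/81:1)', 'index_split_018.xhtml#point(/1/4/115:1)']
```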

So yeah, maybe it won't work for some books, but so far it's been enough to get the notes in the right order for pretty much every ebook I've tried :)

I'll keep you posted :)

If Bookmark.StartContainerPath reflects the right order, we can just add a numeric sort in the read_items function.

I'm happy to contribute a pull request for this; however, I'm not completely sure the ordering is reliable, so I'd need the author's confirmation.

Well, again, nobody except Kobo knows for sure; we can only observe the values their code writes to the SQLite file. If you want to provide a PR with this functionality, I will merge it. But I am afraid there is no simple way to sort Bookmark.StartContainerPath correctly just by looking at its values (even when parsing it to distinguish numeric from lexicographic parts), since an EPUB might contain a file "page2.xhtml" that appears before "page1.xhtml" in the TOC order. Or, to take another example, if you simply sort Bookmark.StartContainerPath, you get "acknowledgments.xhtml" before "title.xhtml".
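
The objection can be made concrete: file names say nothing about reading order, so a correct key needs the book's own order of content files, which in an EPUB is given by the spine of the OPF package document. The sketch below assumes that list is already available (reading it from the EPUB is not shown), that its entries match the file part of StartContainerPath exactly, and that paths have the "index_split_018.xhtml#point(/1/4/81:1)" shape seen in this thread. make_reading_order_key is a hypothetical name, not export-kobo code.

```python
import re

# Matches the "file#point(/1/4/81:1)" shape seen in this thread (assumption).
POINT_RE = re.compile(r"^(?P<file>[^#]*)#point\((?P<steps>[^)]*)\)")


def make_reading_order_key(spine):
    """Return a sort key that orders paths by spine position, then by location.

    `spine` is a list of content file names in reading order, assumed to be
    taken from the EPUB's OPF spine; obtaining it is not shown here.
    """
    position = {name: i for i, name in enumerate(spine)}
    unknown = len(spine)  # index used for files not listed in the spine

    def key(path):
        match = POINT_RE.match(path)
        if match is None:
            # Unrecognised format: keep it sortable, after all spine files.
            return (unknown, (), path)
        file_index = position.get(match.group("file"), unknown)
        # "/1/4/81:1" -> (1, 4, 81, 1), compared numerically step by step.
        steps = tuple(int(n) for n in re.findall(r"\d+", match.group("steps")))
        return (file_index, steps, path)

    return key


spine = ["title.xhtml", "acknowledgments.xhtml"]
paths = [
    "acknowledgments.xhtml#point(/1/4/2:0)",
    "title.xhtml#point(/1/4/10:0)",
    "title.xhtml#point(/1/4/9:0)",
]
print(sorted(paths))  # plain text sort: acknowledgments.xhtml comes first
print(sorted(paths, key=make_reading_order_key(spine)))  # title.xhtml first
```

With a key like this, "page2.xhtml" sorts before "page1.xhtml" whenever the spine lists it first, which parsing the path alone cannot achieve.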

Unfortunately, I have no time (and no longer own a Kobo!) to investigate the issue further.