by Harshvardhan J. Pandit
is part of: klip - Kindle Annotations
Project Source: klip - Github
How I became the maintainer of klip
I was not the original maintainer of klip. The project was started by GitHub user emre, with a few contributions by berkerpeksag. I came across it while searching PyPI (the Python Package Index, where third-party Python libraries are hosted) for a package related to Kindle annotations.
I opened an issue, and emre said that I could have the project as he wasn't really maintaining it anymore. This was the first time I was given responsibility for someone else's project, and it made me quite excited. The transfer was smooth, and I was even given the project on PyPI. So here I was, with someone else's code, doing what I wanted to do.
The first thing I did was check whether the code worked with my Kindle as it was. It did, so I did not need any immediate modifications. With such lazy thoughts, the project stagnated for quite a few months. Recently, I took it upon myself to update the documentation and to maintain the project as much as I can.
Structure of project
There are two files:
devices.py, which contains a class for each Kindle that has a
different annotation format; and
parser.py, which extracts the annotations.
Each device is a subclass of an abstract base class called KindleBase,
which contains the fields and properties used in each annotation.
class KindleBase(object):
    noises = None
    title = None
    author_in_title = None
    type_info = None
    time_format = None
    clip_type = None
    page = None
    location = None
    added_on = None
    content = None
Classes that inherit this base class define these attributes. The project has classes that handle annotations for:
- Kindle 1-3
- Kindle 4
- Kindle Paperwhite
- Kindle Touch
As and when I come across a new kind of Kindle (or annotation format), I will create a new class for it and add it to the devices. This keeps the parser free to do its job, which is to parse stuff.
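To make the idea concrete, here is a minimal sketch of what a device class could look like. ExampleKindle, its regular expressions, the sample metadata line, and the meaning assigned to each attribute are all assumptions made for illustration; they are not copied from klip's devices.py.

```python
import re


class KindleBase(object):
    """Base class: every attribute is a placeholder for subclasses to fill."""
    noises = None
    title = None
    author_in_title = None
    type_info = None
    time_format = None
    clip_type = None
    page = None
    location = None
    added_on = None
    content = None


class ExampleKindle(KindleBase):
    """Hypothetical device; the patterns below are illustrative only."""
    # Assumed meaning: regex capturing the kind of clipping.
    clip_type = re.compile(r"Your (Highlight|Note|Bookmark)")
    # Assumed meaning: regex capturing the location or location range.
    location = re.compile(r"Location (\d+(?:-\d+)?)")
    # Assumed meaning: strptime format of the "Added on" timestamp.
    time_format = "%A, %B %d, %Y %I:%M:%S %p"


# Made-up metadata line in the general shape of a Kindle clipping.
meta = ("- Your Highlight on Page 12 | Location 180-182 | "
        "Added on Monday, March 3, 2014 10:15:30 PM")

device = ExampleKindle()
print(device.clip_type.search(meta).group(1))
print(device.location.search(meta).group(1))
```

Attributes the subclass does not override stay None, so a parser can tell which fields a given device format does not provide.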
ClippingLoader contains the parsing code in various functions.
The module contains two functions for parsing. The first
takes data in the form of a string (read from a file, for example) and
parses it. The second,
load_from_file, takes a file path and
parses the contents of the file.
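The relationship between the two functions can be sketched as a thin file wrapper around a string parser. In this sketch parse_string is a hypothetical name with a stand-in body, and the encoding parameter is an assumption; neither reflects klip's actual signatures.

```python
import os
import tempfile


def parse_string(data):
    """Stand-in for the string-based parser (hypothetical name and body)."""
    # Only collects non-empty lines, so the example stays self-contained.
    return [line for line in data.splitlines() if line.strip()]


def load_from_file(path, encoding="utf-8-sig"):
    """Read the file at `path` and hand its text to the string parser."""
    # "utf-8-sig" drops a leading byte-order mark if present (assumed default).
    with open(path, encoding=encoding) as handle:
        return parse_string(handle.read())


# Usage: write a tiny sample file, then parse it through the wrapper.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("Some Book (Some Author)\n\nSome highlighted text\n")
    sample_path = tmp.name

result = load_from_file(sample_path)
os.remove(sample_path)
print(result)
```

Keeping the file handling separate means the string parser can also be fed text that never touched the disk, such as the body of a web request.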
The annotations are separated by a series of = characters.
The first task is to create chunks of annotations that can then be
handled individually. Python offers a handy mechanism to break text
on a separator: the split method of strings.
ENTRY_SEPERATOR = '=' * 10
chunks = data.split(ENTRY_SEPERATOR)
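A small worked example shows what the split produces. The sample text below is made up to resemble a clippings file; the filtering step at the end is an addition for this sketch and not necessarily what klip does.

```python
ENTRY_SEPERATOR = '=' * 10

# Made-up sample data with two entries, each followed by the separator.
data = (
    "Book One (Author A)\n"
    "- Your Highlight on Page 3 | Location 40-41 | "
    "Added on Monday, March 3, 2014 10:15:30 PM\n"
    "\n"
    "First highlighted passage.\n"
    "==========\n"
    "Book Two (Author B)\n"
    "- Your Note on Page 7 | Location 99 | "
    "Added on Tuesday, March 4, 2014 8:00:00 AM\n"
    "\n"
    "A short note.\n"
    "==========\n"
)

chunks = data.split(ENTRY_SEPERATOR)

# The text after the final separator is just a newline, so the split
# yields one more chunk than there are entries.
print(len(chunks))

# Dropping whitespace-only chunks leaves the real entries.
entries = [chunk.strip() for chunk in chunks if chunk.strip()]
print(len(entries))
```

Because the separator also ends the last entry, the final chunk is empty apart from a newline, which is one reason a later sanity check on each chunk is useful.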
Each chunk has at least 5 lines:
- Title and Author
- Annotation type, location, timestamp
- blank line
- Text of annotation
If there are fewer than 5 lines, the chunk is not the kind of annotation we need to handle. To extract each item, we pass the entire chunk to helper functions, which use regular expressions to extract the relevant bits and return them.
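The check and the helper functions described above might look roughly like this. The function names, the regular expressions, and the returned dictionary are hypothetical; in klip the patterns come from the device classes. In this sketch the line count is taken from split("\n"), so it includes the empty string after the final newline.

```python
import re

MIN_LINES = 5  # threshold described in the text

# Hypothetical patterns for illustration only.
TITLE_RE = re.compile(r"^(?P<title>.*?)\s*\((?P<author>[^()]*)\)$")
TYPE_RE = re.compile(r"Your (?P<type>Highlight|Note|Bookmark)")
LOCATION_RE = re.compile(r"Location (?P<location>\d+(?:-\d+)?)")


def extract_title(line):
    """Return (title, author); author is None if no '(Author)' suffix."""
    match = TITLE_RE.match(line.strip())
    if match is None:
        return line.strip(), None
    return match.group("title"), match.group("author")


def extract_meta(line):
    """Return (clip_type, location), using None for anything not found."""
    clip_type = TYPE_RE.search(line)
    location = LOCATION_RE.search(line)
    return (
        clip_type.group("type") if clip_type else None,
        location.group("location") if location else None,
    )


def parse_chunk(chunk):
    """Parse one chunk into a dict, or return None if it is too short."""
    lines = chunk.split("\n")
    if len(lines) < MIN_LINES:
        return None
    # Drop blank lines left over from the preceding separator.
    while lines and not lines[0].strip():
        lines.pop(0)
    if len(lines) < 4:
        return None
    title, author = extract_title(lines[0])
    clip_type, location = extract_meta(lines[1])
    # lines[2] is the blank line; everything after it is the annotation text.
    content = "\n".join(lines[3:]).strip()
    return {
        "title": title,
        "author": author,
        "type": clip_type,
        "location": location,
        "content": content,
    }


sample = (
    "\nBook Two (Author B)\n"
    "- Your Note on Page 7 | Location 99 | "
    "Added on Tuesday, March 4, 2014 8:00:00 AM\n"
    "\n"
    "A short note.\n"
)
print(parse_chunk(sample))
print(parse_chunk("\n"))
```

Returning None for short chunks lets the caller skip them with a simple filter instead of handling exceptions.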
- auto-detect the Kindle model by matching all relevant regexes
- turn the annotation parser into a kindle.js library that can be run in browsers
- use the above script in a Heroku web app for online clipping parsing