Inheriting klip
by Harshvardhan J. Pandit
is part of: klip - Kindle Annotations
kindle python
Project Source: klip - Github
How I became the maintainer of klip
I was not the original maintainer of klip. The project was started by GitHub user emre, with a few contributions by berkerpeksag. I came across it while searching PyPI (the Python Package Index, where third-party Python libraries are hosted) for a package related to Kindle annotations.
I opened an issue, and emre said that I could have the project as he wasn't really maintaining it anymore. This was the first time I had been given responsibility for someone else's project, and it made me quite excited. The transfer was smooth, and I was even given the project on PyPI. So here I was, with someone else's code, doing what I wanted to do.
Moving forward
The first thing I did was check whether the code worked with my Kindle as it was. It did, so I did not need to make any immediate modifications. With such lazy thoughts, the project stagnated for quite a few months. Recently, I took it upon myself to update the documentation and to maintain the project as much as I can.
Structure of project
There are two files: devices.py contains a class for each Kindle that has a different annotation format, and parser.py extracts the annotations.
devices.py
Each device is represented by a subclass of a base class called KindleBase, which declares the fields and properties used in each annotation.
class KindleBase(object):
    noises = None
    title = None
    author_in_title = None
    type_info = None
    time_format = None
    clip_type = None
    page = None
    location = None
    added_on = None
    content = None
Classes that inherit from this base class define these attributes. The project has classes that handle annotations for:
- Kindle 1-3 (KindleOldGen)
- Kindle 4 (Kindle4)
- Kindle Paperwhite (KindlePaperwhite)
- Kindle Touch (KindleTouch)
As and when I come across a new Kindle model (or annotation format), I will create a new class for it and add it to the devices. This keeps the parser free to do its job, which is to parse stuff.
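To make the idea concrete, here is a minimal sketch of what a device class could look like. KindleExample, its regexes, and the sample lines are all assumptions made up for illustration; they are not the patterns klip actually uses, and the base class is trimmed to three fields.

```python
import re


class KindleBase(object):
    # Trimmed copy of the base class; every field starts out undefined.
    title = None
    clip_type = None
    location = None


class KindleExample(KindleBase):
    # Hypothetical device class. These regexes are illustrative
    # assumptions, not the patterns that klip ships with.
    title = re.compile(r"^(?P<title>.+) \((?P<author>[^()]+)\)$")
    clip_type = re.compile(r"Your (?P<clip_type>Highlight|Note|Bookmark)")
    location = re.compile(r"Location (?P<location>\d+(?:-\d+)?)")


# Made-up sample lines in the general shape of a clippings entry.
title_line = "Example Book (Example Author)"
info_line = "- Your Highlight on page 12 | Location 180-182 | Added on Monday"

title_match = KindleExample.title.match(title_line)
clip_match = KindleExample.clip_type.search(info_line)
location_match = KindleExample.location.search(info_line)

print(title_match.group("title"), "by", title_match.group("author"))
print(clip_match.group("clip_type"), "at", location_match.group("location"))
```

With this layout, supporting another annotation format only means adding another subclass with its own patterns; the code that applies the patterns does not change.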
parser.py
ClippingLoader contains the parsing code in various functions.
The module contains two functions for parsing. The first, load, takes data in the form of a string (read from a file, for example) and parses it. The second, load_from_file, takes a file path and parses the contents of the file.
Parsing Logic
Separating chunks
As explained in the previous post, the annotations are separated by a series of = characters. The first task is to create chunks of annotations that can then be handled individually. Python offers a handy mechanism to break text on a separator string: the split method.
ENTRY_SEPERATOR = '=' * 10
chunks = data.split(ENTRY_SEPERATOR)
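Here is a small sketch of the split on some made-up clippings text. The sample data is invented for illustration, and the filtering step at the end is my own addition to show why empty pieces appear.

```python
ENTRY_SEPERATOR = '=' * 10  # spelling kept as in the snippet above

# Made-up sample data with two annotations.
data = (
    "Example Book (Example Author)\n"
    "- Your Highlight on page 12 | Location 180-182 | Added on Monday\n"
    "\n"
    "An example highlight.\n"
    "==========\n"
    "Example Book (Example Author)\n"
    "- Your Note on page 13 | Location 190 | Added on Monday\n"
    "\n"
    "An example note.\n"
    "==========\n"
)

chunks = data.split(ENTRY_SEPERATOR)

# The text after the last separator is only a newline, so split() yields
# one more piece than there are annotations. Drop the empty pieces.
entries = [chunk for chunk in chunks if chunk.strip()]

print(len(chunks), len(entries))
```

Note that every chunk after the first starts with the newline left over from the separator line, which is why the separator shows up as a line of its own when a chunk is later split into lines.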
Parsing chunks
ClippingLoader._parse
Each chunk has at least 5 lines:
- separator
- Title and Author
- Annotation type, location, timestamp
- blank line
- Text of annotation
If there are fewer than 5 lines, then the chunk is not the kind of annotation we need to handle. To extract each item, we pass the entire chunk to helper functions, which use regexes to extract the relevant bits and return them.
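The steps above can be sketched as a single function. This is a simplified illustration, not the code in ClippingLoader._parse: parse_chunk, TITLE, INFO, and the sample chunk are hypothetical names and data that I made up, and the real project keeps its patterns on the device classes.

```python
import re

# Hypothetical patterns for the title line and the info line.
TITLE = re.compile(r"^(?P<title>.+) \((?P<author>[^()]+)\)$")
INFO = re.compile(
    r"Your (?P<clip_type>\w+) .*?Location (?P<location>[\d-]+)"
    r" \| Added on (?P<added_on>.+)$"
)


def parse_chunk(chunk):
    """Return a dict for one annotation, or None if it cannot be parsed."""
    lines = chunk.split("\n")
    if len(lines) < 5:
        return None  # too short to be an annotation
    # Drop the leftover of the separator line, if there is one.
    if not lines[0].strip():
        lines = lines[1:]
    title_match = TITLE.match(lines[0].strip())
    info_match = INFO.search(lines[1].strip())
    if not (title_match and info_match):
        return None
    result = title_match.groupdict()
    result.update(info_match.groupdict())
    # Everything after the blank line is the text of the annotation.
    result["content"] = "\n".join(lines[3:]).strip()
    return result


# Made-up chunk, as it would look after splitting on the separator.
chunk = (
    "\nExample Book (Example Author)\n"
    "- Your Highlight on page 12 | Location 180-182"
    " | Added on Monday, 1 January 2018 10:00:00\n"
    "\n"
    "An example highlight.\n"
)
parsed = parse_chunk(chunk)
too_short = parse_chunk("\nonly two\nlines")
```

Returning None for chunks that are too short or do not match lets the caller skip them without special cases.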
ToDo
- auto-detect the Kindle model by matching all relevant regexes
- turn the annotation parser into a kindle.js library that can be run in browsers
- use the above script in a Heroku webapp for online clipping parsing