Inheriting klip

Inheriting the klip project, cleaning it up, and documenting it

Project Source: klip - GitHub

How I became the maintainer of klip

I was not the original maintainer of klip. The project was started by GitHub user emre, with a few contributions from berkerpeksag. I came across it while searching PyPI (the Python Package Index, where third-party Python libraries are hosted) for a package related to Kindle annotations.

I opened an issue, and emre said that I could have the project, as he wasn't really maintaining it anymore. This was the first time I had been given responsibility for someone else's project, and it made me quite excited. The transfer was smooth, and I was even given the project on PyPI. So here I was, with someone else's code, doing what I wanted to do.

Moving forward

The first thing I did was check whether the code worked with my Kindle as it was. It did, so I did not need to make any immediate modifications. With such lazy thoughts, the project stagnated for quite a few months. Recently, I took it upon myself to update the documentation and to maintain the project as much as I can.

Structure of project

There are two files: devices.py, which contains a class for each Kindle that has a different annotation format, and parser.py, which extracts the annotations.

devices.py

Each device is a subclass of a base class called KindleBase, which declares the fields and properties used in each annotation.

class KindleBase(object):
    noises = None
    title = None
    author_in_title = None
    type_info = None
    time_format = None
    clip_type = None
    page = None
    location = None
    added_on = None
    content = None

Classes that inherit from this base class define these attributes. The project has classes that handle annotations for:

  • Kindle 1-3 (KindleOldGen)
  • Kindle 4 (Kindle4)
  • Kindle Paperwhite (KindlePaperwhite)
  • Kindle Touch (KindleTouch)

Whenever I come across a new Kindle model (or annotation format), I will create a new class for it and add it to devices.py. This keeps the parser free to do its job, which is to parse stuff.
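To make the idea concrete, here is a minimal sketch of what a device class could look like. ExampleKindle, its regular expressions, its time format, and the sample info line are all made up for illustration; they are assumptions and not the patterns klip actually ships.

```python
import re
from datetime import datetime


class KindleBase(object):
    # Trimmed copy of the base class above, so the sketch runs on its own.
    time_format = None
    clip_type = None
    location = None
    added_on = None


class ExampleKindle(KindleBase):
    """Hypothetical device; these patterns are illustrative, not klip's own."""
    clip_type = re.compile(r"Your (\w+)")
    location = re.compile(r"Location (\d+(?:-\d+)?)")
    added_on = re.compile(r"Added on (.+)$")
    time_format = "%A, %B %d, %Y %I:%M:%S %p"


# Made-up info line in the format the patterns above assume.
info = "- Your Highlight on Location 120-122 | Added on Monday, March 3, 2014 10:15:00 PM"

device = ExampleKindle
kind = device.clip_type.search(info).group(1)
where = device.location.search(info).group(1)
when = datetime.strptime(device.added_on.search(info).group(1), device.time_format)
```

Because the format-specific details live in class attributes, a parser written this way only needs to be told which device class to use.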

parser.py

ClippingLoader contains the parsing code, split across several functions. The module exposes two functions for parsing. The first, load, takes the data as a string (for example, read from a file) and parses it. The second, load_from_file, takes a file path and parses the contents of that file.
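The relationship between the two entry points can be sketched as follows. This is not klip's code: the function bodies are stand-ins written for this post, and the real signatures may differ. The point is only that the file-based function reads the text and delegates to the string-based one.

```python
import io
import os
import tempfile


def load(data):
    """Stand-in for the string-based entry point; here it only drops blank lines."""
    return [line for line in data.splitlines() if line.strip()]


def load_from_file(path):
    """Read the file at `path` and hand its text to load()."""
    # utf-8-sig decodes plain UTF-8 and also drops a leading byte-order mark.
    with io.open(path, encoding="utf-8-sig") as handle:
        return load(handle.read())


# Usage: both entry points should agree on the same text.
text = u"Some Book (Some Author)\n\nA highlighted sentence.\n"
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(text.encode("utf-8"))
try:
    from_file = load_from_file(tmp.name)
finally:
    os.remove(tmp.name)
from_string = load(text)
```

Keeping the file handling in a thin wrapper means the parsing itself can be tested with plain strings.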

Parsing Logic

Separating chunks

As explained in the previous post, the annotations are separated by a line of = characters. The first task is to create chunks of annotations that can then be handled individually. Python's str.split method offers a handy way to break text apart on a fixed delimiter.

ENTRY_SEPERATOR = '=' * 10
chunks = data.split(ENTRY_SEPERATOR)
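Here is the same split applied to a small, made-up clippings string. The sample entries are invented for illustration, and the extra filtering step is my own addition to show that the split leaves an empty piece after the final separator.

```python
ENTRY_SEPERATOR = '=' * 10

# Two invented entries, each terminated by a separator line.
data = (
    "First Book (Author One)\n"
    "- Your Highlight on Location 10-11 | Added on Monday, March 3, 2014 10:15:00 PM\n"
    "\n"
    "First highlighted passage.\n"
    "==========\n"
    "Second Book (Author Two)\n"
    "- Your Note on Location 42 | Added on Tuesday, March 4, 2014 8:00:00 AM\n"
    "\n"
    "A short note.\n"
    "==========\n"
)

chunks = data.split(ENTRY_SEPERATOR)

# The text after the last separator is just a newline, so drop blank pieces.
entries = [chunk.strip() for chunk in chunks if chunk.strip()]
```

With this input, chunks has three pieces and entries keeps the two that actually hold an annotation.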

Parsing chunks

ClippingLoader._parse

Each chunk has at least 5 lines:

  1. separator
  2. Title and Author
  3. Annotation type, location, timestamp
  4. blank line
  5. Text of annotation

If there are fewer than 5 lines, then it is not the kind of annotation we need to handle. To extract each item, we pass the entire chunk to helper functions, which use regular expressions to extract the relevant bits and return them.
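A minimal sketch of this step, assuming the five-line layout listed above with the separator still on the first line. The function parse_chunk, the two patterns, and the sample chunk are hypothetical and written for this post; klip's real helpers and regular expressions may look different.

```python
import re

# Hypothetical patterns for illustration only.
TITLE_RE = re.compile(r"^(?P<title>.+?)\s*\((?P<author>[^()]+)\)\s*$")
INFO_RE = re.compile(
    r"Your (?P<type>\w+) on Location (?P<location>[\d-]+)"
    r" \| Added on (?P<added_on>.+)$"
)


def parse_chunk(chunk):
    """Return a dict for one chunk, or None if it does not fit the layout."""
    lines = chunk.splitlines()
    if len(lines) < 5:
        return None  # too short to be a full annotation
    title_match = TITLE_RE.match(lines[1].strip())
    info_match = INFO_RE.search(lines[2])
    if not (title_match and info_match):
        return None
    entry = title_match.groupdict()
    entry.update(info_match.groupdict())
    # Everything after the blank line is the annotation text.
    entry["content"] = "\n".join(lines[4:]).strip()
    return entry


sample = (
    "==========\n"
    "Second Book (Author Two)\n"
    "- Your Highlight on Location 120-122 | Added on Monday, March 3, 2014 10:15:00 PM\n"
    "\n"
    "A short highlighted passage."
)
entry = parse_chunk(sample)
```

Returning None for anything that does not match keeps malformed or unsupported chunks out of the results without raising an error.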

ToDo

  • auto-detect the Kindle model by matching all relevant regexes
  • turn the annotation parser into a kindle.js library that can be run in browsers
  • use that script in a Heroku web app for online clipping parsing