In line with the Unix philosophy I wanted to build a small system which is as user-friendly as possible (even though I might be the only user). I chose a system which would help me with a personal goal namely learning Russian.
Recently I scraped example phrases for the most common Russian words. Based on Zipf’s law it makes sense to really focus on the most common words since the frequency of any word is inversely proportional to its rank in the frequency table. Just learning vocabulary is kind of sterile so I spice it up by learning some basic phrases.
Learning those phrases involves studying them which means:
Since each of the above has to be done a lot of times it makes sense to create a small CLI to interact with the phrases. For example the first phrase of our list is:
Мальчик и девочка играют. A boy and a girl are playing.
We have some issues with the word девочка
often confusing it with дедушка
which means grandfather. Wouldn’t it be nice to be able to type:
similar девочка дедушка
translation дедушка grandfather
Let’s say the next time we go over the phrases we want to train all the similar words:
similar
Which gives a list of similar words. These are just some useful ideas for commands.
Based on the above I want the system to have the following:
Setup as explained in cli-russian README repository.
There are plenty of things we can do with the phrases but the most important thing is to be able to:
You quickly realize annotating a word isn’t as straightforward as it seems. For each type of word (noun, verb, adjective, …) the information we want to add is different. For all of them we’ll want to add the Russian word but for example gender does make sense for nouns but doesn’t for verbs.
After mulling it over I ended up with multiple annotation commands. They’re grouped by type of word (noun, verb, …):
annotate conjunction <russian> <english>
annotate preposition <russian> <english> <related case>
annotate particle <russian> <english>
annotate pronoun <russian> <root> <english> <case>
annotate verb <russian> <root> <english> <time> <person> <number>
annotate noun <russian> <root> <english> <case> <gender> <number>
annotate adjective <russian> <root> <english> <case> <gender> <number>
By article words are meant which don’t change form (like и),
I used Click to design the command line interface.
Having more or less defined where we want to end up with an initial prototype we create a first implementation. If we want to implement our commands we need:
Both are explained excellently in the Click quickstart. Click is entirely based on decorators. There’s a lot to it but in essence decorators are just functions wrapping other functions to change how these functions behave.
So this is how things look like for our first annotate-noun
command:
@click.command()
@click.argument('russian')
@click.argument('root')
@click.argument('english')
@click.argument('case')
@click.argument('gender')
@click.argument('number')
def annotate_noun(russian, root, english, case, gender, number):
click.echo('Russian noun is %s' % russian)
click.echo('Based on root noun %s' % root)
click.echo('English translation of this noun is %s' % english)
click.echo('Case is %s' % case)
click.echo('Gender is %s' % gender)
click.echo('Number is %s' % number)
@click.command
makes the annotate-noun
function a command. @click.argument
helps us pass arguments to annotate-noun
.
If we call annotate-noun
with python annotate.py человеки человек person nom m sg
we get:
Russian noun is человеки
Based on root noun человек
English translation of this noun is person
Case is nom
Gender is m
Number is sg
But say we forgot we had to specify the gender and call with python annotate.py человеки человек person nom pl
. We get:
Error: Missing argument "NUMBER".
Because all arguments are positional the error is complaining about the number while it should have complained about the gender. That’s why, before moving on to the implementation itself, we’ll make our command a bit more robust to user errors.
We can improve the usability by:
Arguments and options are very much alike. However, arguments are positional and so they have to be specified by definition. Well, options are optional, no surprises there. Since I noticed prompts are only available for options there’s no other choice then to use options instead of arguments.
Validation is done with choice options. You pass a list of valid values and if the input by user doesn’t match any of these options an error is displayed. With some hassle it’s possible to have custom error messages but I won’t go into that in this post.
The help page is built from:
click.option
By specifying --help
e.g. annotate_noun --help
you get access to the help page.
This is the result of making the code more user-friendly:
@click.command()
@click.option('--russian', prompt='Give the original Russian noun', help='the declined form of the noun')
@click.option('--root', prompt='Give the root', help='nominative singular form of the noun')
@click.option('--english', prompt='Give the English translation of the noun', help='English translation of the noun')
@click.option('--case', type=click.Choice(['nom', 'gen', 'dat', 'acc', 'ins', 'pre']), prompt='Specify the case of the noun', help="case can be either nom(iniative), gen(initive), dat(ive), acc(usative), ins(trumental) or pre(positional) ")
@click.option('--gender', type=click.Choice(['m', 'f', 'n']), prompt='Specify the gender of the noun', help="gender can be either m(asculine), f(eminine) or n(euter)")
@click.option('--number', type=click.Choice(['sg', 'pl']), prompt="Specify the number of the noun", help="number can be either sg (singular) or pl (plural)")
def annotate_noun(russian, root, english, case, gender, number):
"""Adds annotation for the Russian noun."""
...
click.Choice
takes care of the validation, the """..."""
docstring is used in the help and the prompt
argument triggers a prompt.
Now we have everything setup to get a clear interface, we can focus on the persistence.
This part is as simple as:
with open('nouns.tsv', 'a') as out: # tsvs use tabs as separators
tsv_writer = csv.writer(out, delimiter='\t')
# write noun as row
tsv_writer.writerow([russian, root, english, case, gender, number])
with
is a Python function helping with opening and closing the file so we don’t have to do it ourselves. The writer object which has a writerow
function.
We can now type annonate_noun
in the command line and are prompted to fill in the required fields for annotating a noun:
$ annotate_noun
Give the original Russian noun: человека
Give the root: человек
Give the English translation of the noun: person
Specify the case of the noun (nom, gen, dat, acc, ins, pre): gen
Specify the gender of the noun (m, f, n): m
Specify the number of the noun (sg, pl): sg
Annotation has been added to the nouns.tsv file.
If we want help we can get it as well:
$ annotate_noun --help
Usage: annotate_noun [OPTIONS]
Adds annotation for the Russian noun.
Options:
--russian TEXT the declined form of the noun
--root TEXT nominative singular form of the noun
--english TEXT English translation of the noun
--case [nom|gen|dat|acc|ins|pre]
case can be either nom(iniative),
gen(initive), dat(ive), acc(usative),
ins(trumental) or pre(positional)
--gender [m|f|n] gender can be either m(asculine), f(eminine)
or n(euter)
--number [sg|pl] number can be either sg (singular) or pl
(plural)
--help Show this message and exit.
This was just an example with a single command. However, nothing stops us from scaling it up with all the commands designed. The next part will be all about that.