taipo translit

> python -m taipo translit

  Commands to generate transliterations.

Options:
  --help  Show this message and exit.

Commands:
  augment   Applies translitertion to an NLU file and saves it to disk.
  generate  Generate train/validation data with/without translitertion.

These tools transliterate text to and from the Latin alphabet. They use the transliterate library as a backend and support the following language codes: ru, mn, sr, bg, ka, uk, el, mk, l1, hy.
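To make "to and from" concrete, here is a minimal sketch of a character-level mapping between Latin and Greek letters. It is only an illustration: the table below is hand-written and deliberately incomplete, the names `to_greek` and `to_latin` are hypothetical, and the real transliterate backend ships its own, larger language packs that may map some letters differently.

```python
# Illustrative mapping only (an assumption for this sketch), not the table
# used by the transliterate library. Both strings must have equal length.
LATIN = "abgdezhiklmnxoprstyfw"
GREEK = "αβγδεζηικλμνξοπρστυφω"

TO_GREEK = str.maketrans(LATIN, GREEK)
TO_LATIN = str.maketrans(GREEK, LATIN)


def to_greek(text: str) -> str:
    """Map Latin letters to Greek ones; unmapped characters pass through."""
    return text.translate(TO_GREEK)


def to_latin(text: str) -> str:
    """Map Greek letters back to Latin ones; unmapped characters pass through."""
    return text.translate(TO_LATIN)


greek = to_greek("kalimera")
print(greek)            # καλιμερα
print(to_latin(greek))  # kalimera
```

In the commands below, `--target el` corresponds to the first direction and `--source el` to the second.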

taipo translit augment

Transliterates a single NLU file to or from the Latin alphabet.

> python -m taipo translit augment --help

  Applies translitertion to an NLU file and saves it to disk.

Arguments:
  FILE  The original nlu.yml file  [required]
  OUT   Path to write misspelled file to  [required]

Options:
  --target TEXT  Alphabet to map to.  [default: latin]
  --source TEXT  Alphabet to map from.  [default: latin]
  --lang TEXT    Language for keyboard layout  [default: en]
  --help         Show this message and exit.

Example Usage

This example generates a new greek-nlu.yml file from nlu.yml.

python -m taipo translit augment data/nlu.yml data/greek-nlu.yml --target el

This example works the other way around. It assumes the Greek alphabet as a starting point and transliterates it to the Latin alphabet.

python -m taipo translit augment data/greek-nlu.yml data/latin-nlu.yml --source el

taipo translit generate

The generate command takes a single NLU file and populates your data and tests folders with the files needed to run benchmarks. It also performs the train/validation split.

> python -m taipo translit generate --help

  Generate train/validation data with/without translitertion.

  Will also generate files for the `/test` directory.

Arguments:
  FILE  The original nlu.yml file  [required]

Options:
  --seed INTEGER       The seed value to split the data  [default: 42]
  --test-size INTEGER  Percentage of data to keep as test data  [default: 33]
  --prefix TEXT        Prefix to add to all the files  [default: translit]
  --target TEXT        Alphabet to map to.  [default: latin]
  --source TEXT        Alphabet to map from.  [default: latin]
  --lang TEXT          Language for keyboard layout  [default: en]
  --help               Show this message and exit.

Example Usage

This command takes the original nlu-orig.yml file and uses it to populate the data and tests folders. In this case it transliterates the examples into the Greek alphabet.

> python -m taipo translit generate data/nlu-orig.yml --prefix greek --target el

The following files will now be on disk.

📂 rasa-project
┣━━ 📂 data
┃   ┣━━ 📄 nlu-train.yml                ( 667 items)
┃   ┗━━ 📄 greek-nlu-train.yml          ( 667 items)
┣━━ 📂 tests
┃   ┣━━ 📄 nlu-valid.yml                ( 333 items)
┃   ┗━━ 📄 greek-nlu-valid.yml          ( 333 items)
┗━━ 📄 nlu-orig.yml                     (1000 items)
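
The `--seed` and `--test-size` options describe a seeded percentage split. The sketch below shows one way such a split can work, using only the Python standard library. It is not taipo's implementation: the function name `split_examples` is hypothetical, and the rounding rule is an assumption, so the counts it produces need not match the listing above exactly.

```python
import random


def split_examples(examples, test_size=33, seed=42):
    """Shuffle with a fixed seed and hold out `test_size` percent of the items.

    Returns a (train, valid) pair of lists. The rounding rule used here is an
    assumption made for this sketch.
    """
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # same seed gives the same order
    n_valid = round(len(shuffled) * test_size / 100)
    return shuffled[n_valid:], shuffled[:n_valid]


examples = [f"example {i}" for i in range(1000)]
train, valid = split_examples(examples)
print(len(train), len(valid))
```

Because the shuffle is seeded, running the split again with the same `--seed` value gives the same train and validation sets, which keeps benchmark runs comparable.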