taipo keyboard
> python -m taipo keyboard
Commands to simulate keyboard typos.
Options:
--help Show this message and exit.
Commands:
augment Applies typos to an NLU file and saves it to disk.
generate Generate train/validation data with/without misspelling.
These tools simulate keyboard typos. They use nlpaug as a backend and support keyboard layouts for 10 languages (de, en, es, fr, he, it, nl, pl, th, uk). For more details on the mapping see here.
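To make the idea concrete, here is a minimal sketch of how keyboard typos can be simulated by swapping characters for neighbouring keys. It is not taipo's or nlpaug's implementation: the `NEIGHBOURS` map and the `add_typos` function are hypothetical, and the map is a tiny hand-written subset of a QWERTY layout. The `char_max`, `word_max` and `seed` parameters only mirror the spirit of the `--char-max`, `--word-max` and `--seed-aug` options.

```python
import random

# Hypothetical, heavily truncated QWERTY neighbour map (illustration only;
# the real backend ships its own, much larger mappings per language).
NEIGHBOURS = {
    "a": "qwsz",
    "e": "wsdr",
    "i": "ujko",
    "o": "iklp",
    "s": "awedxz",
    "t": "rfgy",
}


def add_typos(line, char_max=3, word_max=3, seed=None):
    """Swap at most `char_max` characters in at most `word_max` words."""
    rng = random.Random(seed)
    words = line.split(" ")
    # Only words containing at least one mapped character can be changed.
    candidates = [
        i for i, word in enumerate(words) if any(c in NEIGHBOURS for c in word)
    ]
    for i in rng.sample(candidates, min(word_max, len(candidates))):
        chars = list(words[i])
        positions = [p for p, c in enumerate(chars) if c in NEIGHBOURS]
        for p in rng.sample(positions, min(char_max, len(positions))):
            # Replace the character with a randomly chosen neighbouring key.
            chars[p] = rng.choice(NEIGHBOURS[chars[p]])
        words[i] = "".join(chars)
    return " ".join(words)


print(add_typos("what is the weather today", seed=0))
```

Passing the same seed twice yields the same misspelled line, which is the kind of reproducibility a seed option is meant to give.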
taipo keyboard augment
The augment command generates a single misspelled NLU file.
python -m taipo keyboard augment --help
Usage: keyboard augment [OPTIONS] FILE OUT
Applies typos to an NLU file and saves it to disk.
Arguments:
FILE The original nlu.yml file [required]
OUT Path to write misspelled file to [required]
Options:
--char-max INTEGER Max number of chars to change per line [default: 3]
--word-max INTEGER Max number of words to change per line [default: 3]
--lang TEXT Language for keyboard layout [default: en]
--seed-aug INTEGER The seed value to augment the data
--help Show this message and exit.
Example Usage
This example generates a new bad-spelling-nlu.yml file from nlu.yml.
python -m taipo keyboard augment data/nlu.yml data/bad-spelling-nlu.yml
This example does the same thing but assumes a Dutch keyboard layout.
python -m taipo keyboard augment data/nlu.yml data/bad-spelling-nlu.yml --lang nl
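If you want one misspelled file per keyboard layout, the command can be scripted. The sketch below only assembles the command lines from the options listed in the help text above; `augment_command`, the `RUN` switch and the output file names are hypothetical helpers for illustration, and actually running the commands assumes taipo is installed in the current environment.

```python
import subprocess
import sys

RUN = False  # flip to True to execute the commands (requires taipo)


def augment_command(src, out, lang="en", char_max=3, word_max=3, seed_aug=None):
    """Build the argument list for `python -m taipo keyboard augment`."""
    cmd = [
        sys.executable, "-m", "taipo", "keyboard", "augment", src, out,
        "--lang", lang,
        "--char-max", str(char_max),
        "--word-max", str(word_max),
    ]
    if seed_aug is not None:
        cmd += ["--seed-aug", str(seed_aug)]
    return cmd


for lang in ("en", "nl", "de"):
    # Hypothetical naming scheme: one output file per layout.
    cmd = augment_command(
        "data/nlu.yml", f"data/bad-spelling-nlu-{lang}.yml", lang=lang, seed_aug=1
    )
    print(" ".join(cmd))
    if RUN:
        subprocess.run(cmd, check=True)
```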
taipo keyboard generate
The generate command takes a single NLU file and populates your data/test folders with the files needed to run benchmarks. It also performs the train/validation split.
> python -m taipo keyboard generate --help
Usage: keyboard generate [OPTIONS] FILE
Generate train/validation data with/without misspelling.
Will also generate files for the `/test` directory.
Arguments:
FILE The original nlu.yml file [required]
Options:
--seed-split INTEGER The seed value to split the data [default: 42]
--seed-aug INTEGER The seed value to augment the data
--test-size INTEGER Percentage of data to keep as test data [default: 33]
--prefix TEXT Prefix to add to all the files [default: misspelled]
--char-max INTEGER Max number of chars to change per line [default: 3]
--word-max INTEGER Max number of words to change per line [default: 3]
--lang TEXT Language for keyboard layout [default: en]
--help Show this message and exit.
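The splitting step controlled by `--seed-split` and `--test-size` can be pictured as a seeded shuffle followed by a percentage cut. The sketch below is an assumption about the general technique, not taipo's actual code: `split_examples` is a hypothetical function, and the rounding taipo applies to the held-out share may differ from the `round` call used here.

```python
import random


def split_examples(examples, test_size=33, seed_split=42):
    """Shuffle with a fixed seed, then hold out `test_size` percent."""
    shuffled = list(examples)
    random.Random(seed_split).shuffle(shuffled)
    # Assumed rounding rule; the real tool may round differently.
    n_valid = round(len(shuffled) * test_size / 100)
    return shuffled[n_valid:], shuffled[:n_valid]


examples = [f"utterance {i}" for i in range(12)]
train, valid = split_examples(examples)
print(len(train), len(valid))
```

Because the shuffle is seeded, rerunning with the same seed gives the same train and validation sets, so benchmark results stay comparable between runs.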
Example Usage
This command takes the original nlu-orig.yml file and uses it to populate the /test and /data folders.
> python -m taipo keyboard generate data/nlu-orig.yml
The current disk state is now:
📂 rasa-project
┣━━ 📂 data
┃ ┣━━ 📄 nlu-train.yml ( 667 items)
┃ ┗━━ 📄 misspelled-nlu-train.yml ( 667 items)
┣━━ 📂 tests
┃ ┣━━ 📄 nlu-valid.yml ( 333 items)
┃ ┗━━ 📄 misspelled-nlu-valid.yml ( 333 items)
┗━━ 📄 nlu-orig.yml (1000 items)
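As a quick way to check that a run produced the files shown in the tree, the expected paths can be listed programmatically. This is a sketch under assumptions: `expected_outputs` is a hypothetical helper, the folder and file names are copied from the tree above, and it assumes that `--prefix` simply replaces `misspelled` in the file names.

```python
from pathlib import Path


def expected_outputs(project_dir, prefix="misspelled"):
    """Return the files the tree above shows after `keyboard generate`."""
    root = Path(project_dir)
    return [
        root / "data" / "nlu-train.yml",
        root / "data" / f"{prefix}-nlu-train.yml",
        root / "tests" / "nlu-valid.yml",
        root / "tests" / f"{prefix}-nlu-valid.yml",
    ]


for path in expected_outputs("rasa-project"):
    # Report whether each expected file is present on disk.
    status = "found" if path.exists() else "missing"
    print(f"{path.as_posix()} ({status})")
```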