Rasa NLU Examples¶
This repository contains some example components meant for educational and inspirational purposes. These are components that we open source to encourage experimentation but these are components that are not officially supported. There will be some tests and some documentation but this is a community project, not something that is part of core Rasa.
The goal of these tools will be to be compatible with the most recent version of rasa only. You may need to point to an older release of the project if you want it to be compatible with an older version of Rasa.
To use these tools locally you need to install via git.
python -m pip install "rasa_nlu_examples @ git+https://github.com/RasaHQ/rasa-nlu-examples.git"
Note that if you want to install optional dependencies as well that you'll need to run:
python -m pip install "rasa_nlu_examples[flashtext] @ git+https://github.com/RasaHQ/rasa-nlu-examples.git" python -m pip install "rasa_nlu_examples[stanza] @ git+https://github.com/RasaHQ/rasa-nlu-examples.git" python -m pip install "rasa_nlu_examples[thai] @ git+https://github.com/RasaHQ/rasa-nlu-examples.git" python -m pip install "rasa_nlu_examples[fasttext] @ git+https://github.com/RasaHQ/rasa-nlu-examples.git" python -m pip install "rasa_nlu_examples[all] @ git+https://github.com/RasaHQ/rasa-nlu-examples.git"
If you're using any models that depend on spaCy you'll need to install the Rasa dependencies for spaCy first.
python -m pip install rasa[spacy]
Tokenizers can split up the input text into tokens. Depending on the Tokenizer that you pick you can also choose to apply lemmatization. For languages that have rich grammatical features this might help reduce the size of all the possible tokens.
Dense featurizers attach dense numeric features per token as well as to the entire utterance. These features are picked up by intent classifiers and entity detectors later in the pipeline.
Intent classifiers are models that predict an intent from a given user message text. The default intent classifier in Rasa NLU is the DIET model which can be fairly computationally expensive, especially if you do not need to detect entities. We provide some examples of alternative intent classifiers here.
The components listed here won't effect the NLU pipeline but they might instead cause extra logs to appear to help with debugging.
Language models in spaCy are typically trained on Western news datasets. That means that the reported benchmarks might not apply to your use-case. For example; detecting names in texts from France is not the same thing as detecting names in Madagascar. Even thought French is used actively in both countries, the names of it's citizens might be so different that you cannot assume that the benchmarks apply universally.
To remedy this we've started collecting name lists. These can be used as a lookup table which can be picked up by Rasa's RegexEntityExtractor or our FlashTextEntityExtractor. It won't be 100% perfect but it should give a reasonable starting point.
You can find the namelists here. We currently offer namelists for the United States, Germany as well as common Arabic names. Feel free to submit PRs for more languages. We're also eager to receive feedback.
You can find the contribution guide here.