If you want to use this component, be sure to either install flashtext manually or use our convenience installer.
python -m pip install "rasa_nlu_examples[flashtext] @ git+https://github.com/RasaHQ/rasa-nlu-examples.git"
This entity extractor uses the flashtext library to extract entities.
This is similar to RegexEntityExtractor, but different in a few ways:
FlashTextEntityExtractoruses token-matching to find entities, not regex patterns
FlashTextEntityExtractormatches using whitespace word boundaries. You cannot set it to match words regardless of boundaries.
FlashTextEntityExtractoris much faster than
RegexEntityExtractor. This is especially true for large lookup tables.
Also note that anything other than
[A-Za-z0-9_] is considered a word boundary. To add more non-word boundaries
use the parameter
- path: the path to the lookup text file
- entity_name: the name of the entity to attach to the message
- case_sensitive: whether to consider case when matching entities.
- non_word_boundaries: characters which shouldn't be considered word boundaries.
The configuration below is an example of how you might use
language: en pipeline: - name: WhitespaceTokenizer - name: CountVectorsFeaturizer - name: CountVectorsFeaturizer analyzer: char_wb min_ngram: 1 max_ngram: 4 - name: rasa_nlu_examples.extractors.FlashTextEntityExtractor case_sensitive: False path: path/to/file.txt entity_name: country - name: DIETClassifier epochs: 100
You must include a plain text file that contains the tokens to detect. Such a file might look like:
Afghanistan Albania ... Zambia Zimbabwe
In this example, anytime a user's utterance contains an exact match for a country,
FlashTextEntityExtractor will extract this as an entity with type
country. You should include a few examples with
this entity in your intent data, like so:
- intent: inform_home_country examples: | - I am from [Afghanistan](country) - My family is from [Albania](country)