If you want to use this component, be sure to either install flashtext manually or use our convenience installer.
python -m pip install "rasa_nlu_examples[flashtext] @ git+https://github.com/RasaHQ/rasa-nlu-examples.git"
This component is similar to RegexEntityExtractor, but differs in a few ways:
- FlashTextEntityExtractor uses lookups, not regex patterns.
- FlashTextEntityExtractor matches using whitespace word boundaries. You cannot set it to match words regardless of boundaries.
- FlashTextEntityExtractor is much faster than RegexEntityExtractor. This is especially true for large lookup tables.
Also note that anything other than [A-Za-z0-9_] is considered a word boundary. To treat additional characters as non-word boundaries, use the non_word_boundaries parameter.

The following parameters can be configured:
- case_sensitive: whether to consider case when matching entities.
- non_word_boundaries: characters which shouldn't be considered word boundaries.
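
The boundary rule and the two parameters can be illustrated with a small sketch in plain Python. This is only an approximation for explanation: it does not use flashtext and does not reflect how the library is implemented internally. The function name `find_lookup_matches` and the argument `extra_non_word_boundaries` are hypothetical names chosen for this example, and the sketch assumes ASCII text.

```python
import string

# Characters that count as part of a word; everything else is a word boundary.
DEFAULT_NON_WORD_BOUNDARIES = set(string.ascii_letters + string.digits + "_")


def find_lookup_matches(text, lookup, case_sensitive=False,
                        extra_non_word_boundaries=""):
    """Return (entry, start, end) for lookup entries found on word boundaries.

    Hypothetical helper for illustration only, not part of flashtext or Rasa.
    """
    word_chars = DEFAULT_NON_WORD_BOUNDARIES | set(extra_non_word_boundaries)
    haystack = text if case_sensitive else text.lower()
    matches = []
    for entry in lookup:
        needle = entry if case_sensitive else entry.lower()
        if not needle:
            continue  # skip empty lookup entries
        start = haystack.find(needle)
        while start != -1:
            end = start + len(needle)
            # A hit only counts if it is not glued to other word characters.
            before_ok = start == 0 or haystack[start - 1] not in word_chars
            after_ok = end == len(haystack) or haystack[end] not in word_chars
            if before_ok and after_ok:
                matches.append((entry, start, end))
            start = haystack.find(needle, start + 1)
    return sorted(matches, key=lambda m: m[1])


countries = ["Albania"]
print(find_lookup_matches("I am from Albania.", countries))    # [('Albania', 10, 17)]
print(find_lookup_matches("I like Albanian food", countries))  # []
```

In this sketch, "Albanian" is not matched because the character after "Albania" is a word character. Passing `extra_non_word_boundaries="-"` would likewise stop "Albania-based" from matching, since "-" would then no longer be treated as a boundary.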
The configuration below is an example of how you might use FlashTextEntityExtractor.

    language: en
    pipeline:
    - name: WhitespaceTokenizer
    - name: CountVectorsFeaturizer
    - name: CountVectorsFeaturizer
      analyzer: char_wb
      min_ngram: 1
      max_ngram: 4
    - name: rasa_nlu_examples.extractors.FlashTextEntityExtractor
      case_sensitive: True
      non_word_boundaries:
        - "_"
        - ","
    - name: DIETClassifier
      epochs: 100

You must include lookup tables in your NLU data. This might look like:

    nlu:
    - lookup: country
      examples: |
        - Afghanistan
        - Albania
        - ...
        - Zambia
        - Zimbabwe

In this example, anytime a user's utterance contains an exact match for a country from the lookup table above,
FlashTextEntityExtractor will extract this as an entity with type
country. You should include a few examples with
this entity in your intent data, like so:

    - intent: inform_home_country
      examples: |
        - I am from [Afghanistan](country)
        - My family is from [Albania](country)
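
To make the annotation syntax in these examples concrete, here is a rough sketch of how a `[text](entity)` annotation maps to plain text plus an entity span. It is a simplified illustration and not Rasa's own training-data parser. The helper name `parse_annotated_example` is hypothetical, and only the simple `[text](entity)` form is handled.

```python
import re

# Matches the simple [text](entity_type) annotation form.
ANNOTATION = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")


def parse_annotated_example(example):
    """Return the plain text and the entity spans annotated in `example`.

    Hypothetical helper for illustration only, not part of Rasa.
    """
    plain_parts = []
    entities = []
    cursor = 0      # position in the annotated string
    plain_len = 0   # length of the plain text built so far
    for match in ANNOTATION.finditer(example):
        prefix = example[cursor:match.start()]
        plain_parts.append(prefix)
        plain_len += len(prefix)
        value, entity_type = match.group(1), match.group(2)
        entities.append({
            "entity": entity_type,
            "value": value,
            "start": plain_len,
            "end": plain_len + len(value),
        })
        plain_parts.append(value)
        plain_len += len(value)
        cursor = match.end()
    plain_parts.append(example[cursor:])
    return "".join(plain_parts), entities


text, entities = parse_annotated_example("I am from [Afghanistan](country)")
print(text)      # I am from Afghanistan
print(entities)  # [{'entity': 'country', 'value': 'Afghanistan', 'start': 10, 'end': 21}]
```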