FlashTextEntityExtractor¶
Note
If you want to use this component, be sure to either install flashtext manually or use our convenience installer.
python -m pip install "rasa_nlu_examples[flashtext] @ git+https://github.com/RasaHQ/rasa-nlu-examples.git"
This entity extractor uses the flashtext library to extract entities.
This is similar to RegexEntityExtractor, but different in a few ways:
- FlashTextEntityExtractor uses token-matching to find entities, not regex patterns.
- FlashTextEntityExtractor matches using whitespace word boundaries. You cannot set it to match words regardless of boundaries.
- FlashTextEntityExtractor is much faster than RegexEntityExtractor. This is especially true for large lookup tables.
Also note that anything other than [A-Za-z0-9_] is considered a word boundary. To treat additional characters as part of a word (that is, as non-word boundaries), use the parameter non_word_boundaries.
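The boundary rule above can be illustrated with a small standard-library sketch. This is not flashtext's implementation; the function `find_keyword` and the constant `DEFAULT_NON_WORD_BOUNDARIES` are hypothetical names introduced only to show how a match is accepted or rejected depending on the characters on either side of it.

```python
import string

# Characters treated as part of a word; anything else counts as a boundary.
# This mirrors the [A-Za-z0-9_] rule described above (illustrative only).
DEFAULT_NON_WORD_BOUNDARIES = set(string.ascii_letters + string.digits + "_")


def find_keyword(text, keyword,
                 non_word_boundaries=DEFAULT_NON_WORD_BOUNDARIES,
                 case_sensitive=False):
    """Return (start, end) spans where `keyword` is delimited by boundaries."""
    haystack = text if case_sensitive else text.lower()
    needle = keyword if case_sensitive else keyword.lower()
    spans = []
    start = haystack.find(needle)
    while start != -1:
        end = start + len(needle)
        # A match only counts if the neighbouring characters are boundaries
        # (or the match touches the start or end of the text).
        left_ok = start == 0 or haystack[start - 1] not in non_word_boundaries
        right_ok = end == len(haystack) or haystack[end] not in non_word_boundaries
        if left_ok and right_ok:
            spans.append((start, end))
        start = haystack.find(needle, start + 1)
    return spans


print(find_keyword("I am from Zambia.", "Zambia"))   # "." is a boundary
print(find_keyword("Zambian food", "Zambia"))        # "n" is a word character
print(find_keyword("New-Zambia", "Zambia"))          # "-" is a boundary by default
print(find_keyword("New-Zambia", "Zambia",
                   non_word_boundaries=DEFAULT_NON_WORD_BOUNDARIES | {"-"}))
```

In the last call, adding "-" to the non-word boundaries makes "New-Zambia" behave like a single word, so "Zambia" is no longer matched inside it.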
Configurable Variables¶
- path: the path to the lookup text file
- entity_name: the name of the entity to attach to the message
- case_sensitive: whether to consider case when matching entities. False by default.
- non_word_boundaries: characters which shouldn't be considered word boundaries.
Base Usage¶
The configuration below is an example of how you might use FlashTextEntityExtractor.
language: en
pipeline:
- name: WhitespaceTokenizer
- name: CountVectorsFeaturizer
- name: CountVectorsFeaturizer
  analyzer: char_wb
  min_ngram: 1
  max_ngram: 4
- name: rasa_nlu_examples.extractors.FlashTextEntityExtractor
  case_sensitive: False
  path: path/to/file.txt
  entity_name: country
- name: DIETClassifier
  epochs: 100
You must include a plain text file that contains the tokens to detect. Such a file might look like:
Afghanistan
Albania
...
Zambia
Zimbabwe
In this example, anytime a user's utterance contains an exact match for a country, FlashTextEntityExtractor will extract this as an entity with type country. You should include a few examples with this entity in your intent data, like so:
- intent: inform_home_country
  examples: |
    - I am from [Afghanistan](country)
    - My family is from [Albania](country)
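To make the flow from lookup file to extracted entity more concrete, here is a rough standard-library sketch. It is not the component's code: `load_lookup` and `extract_entities` are hypothetical helpers, regex lookarounds are used as a simple stand-in for flashtext's matching, and the dictionary keys (`entity`, `start`, `end`, `value`) follow the usual shape of Rasa entities rather than anything confirmed for this component's exact output.

```python
import re
import tempfile
from pathlib import Path


def load_lookup(path):
    """Read one term per line, skipping blank lines (assumed file layout)."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def extract_entities(text, terms, entity_name, case_sensitive=False):
    """Return Rasa-style entity dicts for every boundary-delimited match."""
    flags = 0 if case_sensitive else re.IGNORECASE
    entities = []
    for term in terms:
        # Lookarounds stand in for the [A-Za-z0-9_] word-boundary rule.
        pattern = r"(?<![A-Za-z0-9_])" + re.escape(term) + r"(?![A-Za-z0-9_])"
        for match in re.finditer(pattern, text, flags):
            entities.append({
                "entity": entity_name,
                "start": match.start(),
                "end": match.end(),
                "value": match.group(0),
            })
    return sorted(entities, key=lambda e: e["start"])


# Write a tiny lookup file to a temporary directory, then load it back.
with tempfile.TemporaryDirectory() as tmp:
    lookup_path = Path(tmp) / "countries.txt"
    lookup_path.write_text("Afghanistan\nAlbania\n\nZambia\n", encoding="utf-8")
    terms = load_lookup(lookup_path)

entities = extract_entities("My family is from albania", terms, "country")
print(entities)
```

With case_sensitive left at False, the lowercase "albania" in the utterance still matches the "Albania" entry from the lookup file.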