Skip to content

DateparserEntityExtractor

Note

If you want to use this component, be sure to either install flashtext manually or use our convenience installer.

python -m pip install "rasa_nlu_examples[dateparser] @ git+https://github.com/RasaHQ/rasa-nlu-examples.git"

What does it do?

This entity extractor uses the dateparser to extract entities that resemble dates. You can get a demo by running the code below.

from rasa.shared.nlu.training_data.message import Message
from rasa_nlu_examples.extractors.dateparser_extractor import DateparserEntityExtractor
from rich import print

msg = Message.build("hello tomorrow, goodbye yesterday",)
extractor = DateparserEntityExtractor({})
extractor.process(msg)
print(msg.as_dict_nlu())

This will parse the following information.

{
    'text': 'hello tomorrow, goodbye yesterday',
    'entities': [
        {
            'entity': 'DATETIME_REFERENCE',
            'start': 6,
            'end': 14,
            'value': 'tomorrow',
            'parsed_date': '2021-06-05 11:50:10.502082',
            'confidence': 1.0,
            'extractor': 'DateparserEntityExtractor'
        },
        {
            'entity': 'DATETIME_REFERENCE',
            'start': 24,
            'end': 33,
            'value': 'yesterday',
            'parsed_date': '2021-06-03 11:50:10.503160',
            'confidence': 1.0,
            'extractor': 'DateparserEntityExtractor'
        }
    ]
}

Note that we add an extra parsed_date key to the entity dictionary here. Another benefit of dateparser is that it also contains rules for Non-English languages. Here is a Dutch example.

{
    'text': 'ik wil een pizza bestellen voor morgen',
    'entities': [
        {
            'entity': 'DATETIME_REFERENCE',
            'start': 32,
            'end': 38,
            'value': 'morgen',
            'parsed_date': '2021-06-05 11:50:10.708588',
            'confidence': 1.0,
            'extractor': 'DateparserEntityExtractor'
        }
    ]
}

It's also possible to configure the DateparserEntityExtractor to prefer dates in the future or in the past. That way, if somebody talks about Thursday can be picked up as next Thursday, allowing us to still parse out a date.

"Future" Results

This ran on Friday the 4th of June, 2021.

{
    'text': 'i want a pizza thursday',
    'entities': [
        {
            'entity': 'DATETIME_REFERENCE',
            'start': 15,
            'end': 23,
            'value': 'thursday',
            'parsed_date': '2021-06-10 00:00:00',
            'confidence': 1.0,
            'extractor': 'DateparserEntityExtractor'
        }
    ]
}

"Past" Results

This ran on Friday the 4th of June, 2021.

{
    'text': 'i want to buy a pizza thursday',
    'entities': [
        {
            'entity': 'DATETIME_REFERENCE',
            'start': 22,
            'end': 30,
            'value': 'thursday',
            'parsed_date': '2021-06-03 00:00:00',
            'confidence': 1.0,
            'extractor': 'DateparserEntityExtractor'
        }
    ]
}

Configurable Variables

  • languages: pass a list of languages that you want the parser to focus on, can be None but this setting is likely to overfit on English assumptions
  • prefer_dates_from: can be either "future", "past" or None
  • relative_base: can be a datestring that represents a reference date, this is useful when a user mentions "tomorrow", default None points to todays date

Base Usage

The configuration below is an example of how you might useFlashTextEntityExtractor.

language: en

pipeline:
- name: WhitespaceTokenizer
- name: CountVectorsFeaturizer
- name: CountVectorsFeaturizer
  analyzer: char_wb
  min_ngram: 1
  max_ngram: 4
- name: DIETClassifier
  epochs: 100
- name: rasa_nlu_examples.extractors.DateparserEntityExtractor
  languages: ["en", "nl", "es"]
  prefer_dates_from: "future"

Note that this entity extractor completely ignores the tokeniser. There might also be overlap with enities from other engines, like DIET and spaCy.

Relative Base Usage

language: en

pipeline:
- name: WhitespaceTokenizer
- name: CountVectorsFeaturizer
- name: CountVectorsFeaturizer
  analyzer: char_wb
  min_ngram: 1
  max_ngram: 4
- name: DIETClassifier
  epochs: 100
- name: rasa_nlu_examples.extractors.DateparserEntityExtractor
  languages: ["en", "nl", "es"]
  prefer_dates_from: "future"
  relative_base: "2020-01-01"