Skip to content

ThaiTokenizer

The ThaiTokenizer is a Rasa compatible tokenizer for Thai, using PyThaiNLP under the hood.

In order to use the ThaiTokenizer the language must be set to th - no other languages are supported by this tokenizer.

Note

In order to use this tool you'll need to ensure the correct dependencies are installed.

pip install "rasa_nlu_examples[thai] @ https://github.com/RasaHQ/rasa-nlu-examples.git"

Configurable Variables

None

Base Usage

The ThaiTokenizer can be used in a Rasa configuration like below:

language: th
pipeline:
  - name: rasa_nlu_examples.tokenizers.ThaiTokenizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100

If there are any issues with this tokenizer, please let us know.