The ThaiTokenizer is a Rasa-compatible tokenizer for Thai that uses PyThaiNLP under the hood.

To use the ThaiTokenizer, the language must be set to th. No other languages are supported by this tokenizer.
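
To give a feel for what word segmentation produces, here is a minimal sketch that pairs each segmented word with its character offset in the original text. It is not the ThaiTokenizer's actual implementation: the function name thai_tokens_with_offsets and its segmenter parameter are made up for illustration, and the only PyThaiNLP call assumed is pythainlp.tokenize.word_tokenize, which is imported lazily when no segmenter is passed in.

```python
from typing import Callable, List, Optional, Tuple


def thai_tokens_with_offsets(
    text: str,
    segmenter: Optional[Callable[[str], List[str]]] = None,
) -> List[Tuple[str, int]]:
    """Split text into words and pair each word with its start offset.

    Hypothetical helper for illustration only, not part of rasa_nlu_examples.
    """
    if segmenter is None:
        # Assumption: PyThaiNLP is installed; word_tokenize returns a list of strings.
        from pythainlp.tokenize import word_tokenize

        segmenter = word_tokenize

    tokens: List[Tuple[str, int]] = []
    cursor = 0
    for word in segmenter(text):
        if not word.strip():
            # Skip empty or whitespace-only segments; they are not useful tokens.
            continue
        # Search from the cursor so repeated words map to successive positions.
        start = text.index(word, cursor)
        tokens.append((word, start))
        cursor = start + len(word)
    return tokens


# A whitespace splitter stands in for the Thai segmenter here,
# so the sketch runs without PyThaiNLP installed.
print(thai_tokens_with_offsets("hello world", segmenter=str.split))
```

Calling thai_tokens_with_offsets(text) without a segmenter falls back to PyThaiNLP, which is where the Thai-specific word boundaries would come from.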


To use this tool, make sure the required dependencies are installed:

pip install "rasa_nlu_examples[thai] @"

Configurable Variables


Base Usage

The ThaiTokenizer can be used in a Rasa configuration as shown below:

language: th
pipeline:
  - name: rasa_nlu_examples.tokenizers.ThaiTokenizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
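
As a quick sanity check, the two requirements stated on this page (language set to th, and the tokenizer listed in the pipeline) can be verified against a configuration that has already been loaded into a dictionary. This is only a sketch: check_thai_config is a hypothetical helper, not something Rasa or rasa_nlu_examples provides, and the dictionary is written by hand to mirror the configuration above.

```python
from typing import Any, Dict, List

THAI_TOKENIZER = "rasa_nlu_examples.tokenizers.ThaiTokenizer"


def check_thai_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a loaded configuration (hypothetical helper)."""
    problems: List[str] = []

    # The tokenizer only supports Thai, so the language must be "th".
    if config.get("language") != "th":
        problems.append("language must be set to 'th'")

    # The tokenizer has to appear somewhere in the pipeline to be used at all.
    names = [step.get("name") for step in config.get("pipeline") or []]
    if THAI_TOKENIZER not in names:
        problems.append(f"pipeline does not include {THAI_TOKENIZER}")

    return problems


# Hand-written dictionary mirroring the configuration shown above.
config = {
    "language": "th",
    "pipeline": [
        {"name": THAI_TOKENIZER},
        {"name": "CountVectorsFeaturizer"},
        {"name": "CountVectorsFeaturizer", "analyzer": "char_wb", "min_ngram": 1, "max_ngram": 4},
        {"name": "DIETClassifier", "epochs": 100},
    ],
}

print(check_thai_config(config))
```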

If there are any issues with this tokenizer, please let us know.