Hybrid NLU

The playground uses a hybrid engine composed of a grammar-based interpreter and a machine learning interpreter to handle natural language understanding.

Grammar-based Interpreter

The grammar-based interpreter has two advantages:

  1. It doesn't require a training step, and it can integrate new changes instantly (click Start in the menu bar). This enables quick prototyping using the playground.

  2. Short expressions are usually more challenging for ML engines to deal with since they contain very little information, e.g., "what about X". The grammar-based interpreter can deal better with these and handle ambiguities gracefully.

ML-based Interpreter

On the other hand, the ML model can recognize a broader range of expressions after the training phase. To train the ML model, click the Train button. The training time depends on the size of the training data. The ML engine is built using RASA.

Models Configuration

The ML models configuration can be specified in the nlu/config.yml or nlu_config.yml files.

The structure of the NLU config file is the following:

ROOT_NAME_1:
  - name: CONFIG_NAME
    threshold: 0.8
    rasa_config:
      ...
  - ...
ROOT_NAME_2:
  - ...

The default configuration is:

main:
  # The default configuration using character embeddings
  - name: default
    threshold: 0.8
    rasa_config:
      language: "en"
      pipeline:
        - name: libraries.rasaextensions.custom_features.CustomFeaturesDecoder
        - name: WhitespaceTokenizer
        - name: libraries.rasaextensions.custom_features.CustomFeaturesFeaturizer
        - name: RegexFeaturizer
        - name: LexicalSyntacticFeaturizer
        - name: CountVectorsFeaturizer
        - name: CountVectorsFeaturizer
          analyzer: "char_wb"
          min_ngram: 1
          max_ngram: 4
        - name: DIETClassifier
          epochs: 100
        - name: EntitySynonymMapper
        - name: ResponseSelector
          epochs: 100
types:
  # The default configuration using character embeddings
  - name: types
    threshold: 0.8
    rasa_config:
      language: "en"
      pipeline:
        - name: libraries.rasaextensions.custom_features.CustomFeaturesDecoder
        - name: WhitespaceTokenizer
        - name: libraries.rasaextensions.custom_features.CustomFeaturesFeaturizer
        - name: RegexFeaturizer
        - name: LexicalSyntacticFeaturizer
        - name: CountVectorsFeaturizer
        - name: CountVectorsFeaturizer
          analyzer: "char_wb"
          min_ngram: 1
          max_ngram: 4
        - name: DIETClassifier
          epochs: 100
        - name: EntitySynonymMapper
        - name: ResponseSelector
          epochs: 100

Model Roots

The roots, which are the top-level keys of the config file, determine when the list of models contained within them will be used. There are two categories of symbols that need to be matched by the NLU model: intents and types.

Valid roots for intent matching are:

  1. main: A single model will be trained with all the defined intents.
  2. main_hierarchical: One model will be trained per level, i.e., if spaces are used in the intent names, they determine the levels.
  3. seq2seq: A single model will be trained for all intents using a seq2seq approach, i.e., the intent names are split into tokens and each token is a class in the model.
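
The difference between main_hierarchical and seq2seq is easier to see on concrete intent names. The following is a minimal sketch, not the playground's implementation: the function names and the sample intents are hypothetical, and it assumes that both levels and tokens are obtained by splitting intent names on whitespace.

```python
from collections import defaultdict


def labels_per_level(intent_names):
    """Group intent name parts by level (assumption: levels are space-separated).

    Illustrates main_hierarchical: each level would get its own model,
    trained on the labels that appear at that level.
    """
    levels = defaultdict(set)
    for name in intent_names:
        for depth, part in enumerate(name.split()):
            levels[depth].add(part)
    return {depth: sorted(parts) for depth, parts in sorted(levels.items())}


def seq2seq_classes(intent_names):
    """Collect the distinct tokens across all intent names.

    Illustrates seq2seq: a single model whose classes are the tokens
    (assumption: tokens are obtained by whitespace splitting).
    """
    return sorted({token for name in intent_names for token in name.split()})


# Hypothetical intent names, for illustration only.
intents = ["order pizza", "order drink", "greet"]

print(labels_per_level(intents))
print(seq2seq_classes(intents))
```

With main, by contrast, the three intent names above would simply be three classes of one model.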

Only one of the above roots may be specified in a config file.

There is only one valid root for type matching, types, and it trains one model for all the type symbols.
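
The root rules above can be checked before training. The following is a rough sketch that assumes the config file has already been parsed into a Python dict (for example by a YAML loader); check_roots is a hypothetical helper, not part of the playground. It treats the rule as "no more than one intent root" and does not decide whether a config with no intent root is acceptable, since the text does not say.

```python
# Root names taken from the lists above.
INTENT_ROOTS = {"main", "main_hierarchical", "seq2seq"}
TYPE_ROOTS = {"types"}


def check_roots(config):
    """Return a list of problems found in the top-level keys of a parsed config."""
    problems = []

    # Any top-level key that is neither an intent root nor the types root.
    unknown = sorted(set(config) - INTENT_ROOTS - TYPE_ROOTS)
    if unknown:
        problems.append(f"unknown roots: {', '.join(unknown)}")

    # Assumption: at most one of the intent roots may appear.
    intent_roots = sorted(set(config) & INTENT_ROOTS)
    if len(intent_roots) > 1:
        problems.append(f"more than one intent root: {', '.join(intent_roots)}")

    return problems


# A parsed config with the same roots as the default configuration.
config = {
    "main": [{"name": "default", "threshold": 0.8}],
    "types": [{"name": "types", "threshold": 0.8}],
}

print(check_roots(config))
print(check_roots({"main": [], "seq2seq": []}))
```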

Advanced usage

Custom Features

In order to use the custom features in any model configuration, we must include the CustomFeaturesDecoder and the CustomFeaturesFeaturizer components in the RASA NLU pipeline:

main:
  - name: CONFIG_NAME
    threshold: 0.8
    rasa_config:
      language: "en"
      pipeline:
        - name: libraries.rasaextensions.custom_features.CustomFeaturesDecoder
        - name: WhitespaceTokenizer
        - name: libraries.rasaextensions.custom_features.CustomFeaturesFeaturizer
        ...

Word Embeddings

In order to use pre-trained word embeddings like BERT embeddings, we must use the following configuration:

main:
  - name: bert
    threshold: 0.8
    rasa_config:
      language: "en"
      pipeline:
        - name: libraries.rasaextensions.custom_features.CustomFeaturesDecoder
        - name: WhitespaceTokenizer
        - name: libraries.rasaextensions.custom_features.CustomFeaturesFeaturizer
        - name: libraries.customrasa.featurizers.RemoteLanguageModelFeaturizer
          model_name: "bert"
          model_weights: "rasa/LaBSE"
        - name: DIETClassifier
          epochs: 100
        - name: EntitySynonymMapper
        - name: ResponseSelector
          epochs: 100

Supported embeddings include: bert, gpt, gpt2, xlnet, distilbert, roberta.