nlu.text_processing
This module preprocesses user input before further analysis.
The user utterance is broken into tokens that carry additional information about it, such as each token's position in the text, its lemma, and whether it is a stop word.
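The `Span` and `Token` classes are documented only by their constructor signatures. The sketch below shows one plausible model under stated assumptions. It assumes that `start` and `end` are character offsets into the original utterance, and that an omitted `end` defaults to `start + len(text)`. Under those assumptions, slicing the original string with a token's offsets gives back the token's text. These are hypothetical mirrors written for illustration, not the module's own code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    """Hypothetical mirror of nlu.text_processing.Span: a piece of text plus its position."""
    text: str
    start: int
    end: Optional[int] = None
    lemma: Optional[str] = None

    def __post_init__(self) -> None:
        # Assumption: when `end` is omitted it is derived from the text length,
        # so that original[start:end] == text.
        if self.end is None:
            self.end = self.start + len(self.text)


@dataclass
class Token(Span):
    """Hypothetical mirror of nlu.text_processing.Token: a Span with a stop-word flag."""
    is_stop: Optional[bool] = False


utterance = "Book a table for two"
token = Token(text="table", start=7)
print(token.end)                          # 12
print(utterance[token.start:token.end])   # table
```

Making `Token` a subclass follows the documented `Bases: Span` relationship. The only field it adds is the stop-word flag.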
Module Contents
Classes
- class nlu.text_processing.Span(text: str, start: int, end: int | None = None, lemma: str | None = None)
- class nlu.text_processing.Token(text: str, start: int, end: int | None = None, lemma: str | None = None, is_stop: bool | None = False)
Bases: Span
- class nlu.text_processing.Tokenizer(additional_stop_words: List[str] = None)
- process_text(text: str) → List[Token]
Processes the given text.
The text is split into tokens that can be mapped back to the original text.
- Parameters:
text – A piece of text.
- Returns:
List of tokens.
- remove_punctuation(text: str) → str
Removes punctuation marks from the utterance, matching them against a set of predefined patterns.
- Parameters:
text – A piece of text.
- Returns:
A piece of text without punctuation.
- lemmatize_text(text: str) → str
Returns the lemmatized form of the text.
- Parameters:
text – A piece of text.
- Returns:
Lemmatized piece of text.
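`Tokenizer.process_text` is documented only by its signature and a short description. The following stdlib-only sketch shows one way such a method could produce tokens that map back to the original text. A regular expression finds runs of word characters, and the offsets of each match become the token's `start` and `end`. The class name `SketchTokenizer`, the `DEFAULT_STOP_WORDS` set, and the `\w+` pattern are all illustrative assumptions. None of them come from the module's actual implementation.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Assumption: a small illustrative stop-word list; the real module's list is not documented.
DEFAULT_STOP_WORDS = {"a", "an", "the", "for", "to", "of"}


@dataclass
class Token:
    text: str
    start: int
    end: Optional[int] = None
    lemma: Optional[str] = None
    is_stop: Optional[bool] = False


class SketchTokenizer:
    """Hypothetical stand-in for nlu.text_processing.Tokenizer (regex-based, stdlib only)."""

    def __init__(self, additional_stop_words: Optional[List[str]] = None) -> None:
        # Extra stop words extend the defaults; comparison is case-insensitive.
        self.stop_words = set(DEFAULT_STOP_WORDS)
        self.stop_words.update(w.lower() for w in (additional_stop_words or []))

    def process_text(self, text: str) -> List[Token]:
        tokens: List[Token] = []
        # \w+ matches runs of word characters; match.start()/end() are offsets into `text`.
        for match in re.finditer(r"\w+", text):
            word = match.group()
            tokens.append(Token(text=word, start=match.start(), end=match.end(),
                                is_stop=word.lower() in self.stop_words))
        return tokens


tokenizer = SketchTokenizer(additional_stop_words=["please"])
for tok in tokenizer.process_text("Please book a table!"):
    print(tok.text, tok.start, tok.end, tok.is_stop)
```

The offsets come directly from `re.finditer`. As a result, `text[tok.start:tok.end]` returns each token's text unchanged. In this sketch, `additional_stop_words` only extends the default stop-word set.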
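`remove_punctuation` is described as working from a set of punctuation patterns, but those patterns are not listed. Here is a minimal standalone sketch. It assumes that "punctuation" means the ASCII characters in Python's `string.punctuation`, which is an assumption. The sketch compiles those characters into a single regular-expression character class and deletes every match. `PUNCTUATION_PATTERN` is a hypothetical name.

```python
import re
import string

# Assumption: "punctuation" means the ASCII characters in string.punctuation;
# the module's actual patterns are not documented.
PUNCTUATION_PATTERN = re.compile(f"[{re.escape(string.punctuation)}]")


def remove_punctuation(text: str) -> str:
    """Hypothetical sketch: delete every ASCII punctuation character from `text`."""
    return PUNCTUATION_PATTERN.sub("", text)


print(remove_punctuation("Hi! Can you book a table, please?"))
# Hi Can you book a table please
```

Characters are deleted rather than replaced. So in this sketch, positions in the returned string no longer line up with the original utterance. Any token offsets would need to be computed on the unmodified input.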
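The documentation does not say which lemmatizer `lemmatize_text` relies on. The sketch below uses a tiny hand-written lookup table (`LEMMAS`, hypothetical) to illustrate the input/output contract. Each word is replaced by its lemma, and the words are joined back into one string. A real lemmatizer would use a full morphological dictionary or a trained model. This table only covers the words in the example.

```python
# Assumption: a toy lemma table stands in for a real lemmatizer,
# since the documentation does not name the one the module uses.
LEMMAS = {"booked": "book", "tables": "table", "am": "be", "is": "be", "running": "run"}


def lemmatize_text(text: str) -> str:
    """Hypothetical sketch: replace each whitespace-separated word with its lemma."""
    # Lookup is case-insensitive; unknown words are kept as they are.
    # str.split() also collapses runs of whitespace into single spaces.
    return " ".join(LEMMAS.get(word.lower(), word) for word in text.split())


print(lemmatize_text("I booked two tables"))
# I book two table
```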