tokenizer

Tokenizer

class tokenizer.Tokenizer(lang, dir='/home/docs/.cache/diaparser', verbose=True)

Interface to Stanza tokenizers.

Args:
    lang (str): conventional language identifier.
    dir (str): directory for caching models.
    verbose (bool): print download progress.

format(sentences)

Convert sentences to CoNLL format.
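
To illustrate what a conversion to CoNLL format can look like, here is a minimal sketch. It is not the library's implementation: the function name `to_conllu` is hypothetical, and it assumes each sentence is a plain list of token strings rather than a Stanza sentence object. It emits the ten tab-separated CoNLL-U columns, filling everything after ID and FORM with the `_` placeholder.

```python
from typing import Iterable, Iterator, List


def to_conllu(sentences: Iterable[List[str]]) -> Iterator[str]:
    """Yield CoNLL-U lines for sentences given as lists of token strings.

    Hypothetical helper for illustration only; only ID and FORM are filled,
    the remaining eight columns are set to the "_" placeholder.
    """
    empty_fields = "\t_" * 8  # LEMMA .. MISC left unspecified
    for sent_id, tokens in enumerate(sentences, start=1):
        yield f"# sent_id = {sent_id}"
        yield f"# text = {' '.join(tokens)}"
        for token_id, form in enumerate(tokens, start=1):
            yield f"{token_id}\t{form}{empty_fields}"
        yield ""  # a blank line terminates each sentence


if __name__ == "__main__":
    for line in to_conllu([["Hello", "world", "."]]):
        print(line)
```

A real tokenizer would also need to handle multiword tokens, which CoNLL-U represents with ID ranges such as `1-2`; the sketch omits that case.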

reader()

Reading function that returns a generator of CoNLL-U sentences.
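
The generator pattern described here can be sketched as follows. This is an assumption-laden illustration, not the library's code: the name `read_conllu` is hypothetical, and it assumes the input is an iterable of text lines in which sentences are separated by blank lines, as in CoNLL-U.

```python
import io
from typing import Iterable, Iterator, List


def read_conllu(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield one sentence at a time as a list of its CoNLL-U lines.

    Hypothetical helper for illustration only; sentences are delimited
    by blank lines and comment lines are kept with their sentence.
    """
    sentence: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if line:
            sentence.append(line)
        elif sentence:
            # Blank line: the current sentence is complete.
            yield sentence
            sentence = []
    if sentence:
        # Input ended without a trailing blank line.
        yield sentence


if __name__ == "__main__":
    data = io.StringIO("# sent_id = 1\n1\tHello\t_\t_\t_\t_\t_\t_\t_\t_\n\n")
    for sent in read_conllu(data):
        print(len(sent), "lines")
```

Because sentences are yielded lazily, a consumer can start processing the first sentence before the rest of the input has been read.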