Parsers¶
Parser¶
- class diaparser.parsers.Parser(args, model, transform)[source]¶
- train(train, dev, test, buckets=32, batch_size=5000, lr=0.002, mu=0.9, nu=0.9, epsilon=1e-12, clip=5.0, decay=0.75, decay_steps=5000, epochs=5000, patience=100, verbose=True, **kwargs)[source]¶
- Parameters
lr (float) – learning rate of the Adam optimizer. Default: 2e-3.
mu (float) – beta1 of the Adam optimizer. Default: .9.
nu (float) – beta2 of the Adam optimizer. Default: .9.
epsilon (float) – epsilon of the Adam optimizer. Default: 1e-12.
buckets (int) – number of buckets. Default: 32.
epochs (int) – number of epochs to train. Default: 5000.
patience (int) – stop early after this many epochs without improvement. Default: 100.
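The optimizer hyperparameters above (lr, mu, nu, epsilon, decay, decay_steps) describe Adam with betas=(mu, nu) and an exponentially decayed learning rate. A minimal sketch of the schedule these defaults imply, assuming the common form lr · decay^(step/decay_steps); this illustrates the hyperparameters, not diaparser's exact training loop:

```python
# Sketch of the learning-rate schedule implied by the defaults above:
# lr decays by a factor of `decay` every `decay_steps` optimizer updates.
# Hypothetical helper for illustration, not part of the diaparser API.

def scheduled_lr(step, lr=2e-3, decay=0.75, decay_steps=5000):
    """Learning rate after `step` optimizer updates."""
    return lr * decay ** (step / decay_steps)

if __name__ == "__main__":
    for step in (0, 5000, 10000):
        print(f"step {step:>6}: lr = {scheduled_lr(step):.6f}")
```

With the defaults, the rate falls from 2e-3 to 1.5e-3 after 5000 steps and to about 1.1e-3 after 10000.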
- predict(data, pred=None, buckets=8, batch_size=5000, prob=False, **kwargs)[source]¶
Parses the data and produces a parse tree for each sentence.
- Parameters
data (str or list[list]) – the input to be parsed: either
a str, which will be tokenized first with the tokenizer for the parser language;
a path to a file to be read, either in CoNLL-U format, or in plain text if text is supplied;
a list of lists of tokens.
text (str) – optional; specifies that the input data is plain text in the given language code.
pred (str or file) – a path to a file where the parsed input will be written in CoNLL-U format.
buckets (int) – the number of buckets used to group sentences to parallelize matrix computations. Default: 8.
batch_size (int) – the maximum number of tokens in each batch. Default: 5000.
prob (bool) – whether to also return probabilities for each arc. Default: False.
- Returns
a Dataset containing the parsed sentence trees.
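The buckets parameter groups sentences of similar length so that each batch wastes little computation on padding. A self-contained sketch of the idea, using a simple size-based partition of length-sorted sentences; diaparser's actual bucketing (e.g., how boundaries are chosen) may differ:

```python
# Illustration of the `buckets` parameter: sentences are sorted by length
# and split into a fixed number of groups, so batches drawn from one
# bucket contain sentences of similar length and need little padding.
# A sketch of the concept, not diaparser's internal implementation.

def bucket_by_length(sentences, n_buckets):
    """Partition sentences into roughly n_buckets groups of similar length."""
    ordered = sorted(sentences, key=len)
    size = max(1, len(ordered) // n_buckets)
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]

sents = [["tok"] * n for n in (3, 15, 4, 14, 5, 16)]
for bucket in bucket_by_length(sents, 2):
    print([len(s) for s in bucket])  # short and long sentences end up apart
```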
- classmethod load(name_or_path='', lang='en', cache_dir='/home/docs/.cache/diaparser', **kwargs)[source]¶
Loads a parser from a pretrained model.
- Parameters
name_or_path (str) – either
the shortcut name of a pretrained parser listed in resource.json, to load from cache or download, e.g., 'en_ptb.electra-base';
a path to a directory containing a pre-trained parser, e.g., ./<path>/model.
lang (str) – a language code, used as an alternative to name_or_path to load the default model for the given language. Default: 'en'.
cache_dir (str) – directory where models are cached. Default: ~/.cache/diaparser.
kwargs (dict) – A dict holding the unconsumed arguments that can be used to update the configurations and initiate the model.
Examples
>>> parser = Parser.load('en_ewt.electra-base')
>>> parser = Parser.load(lang='en')
>>> parser = Parser.load('./ptb.biaffine.dependency.char')
BiaffineDependencyParser¶
- class diaparser.parsers.BiaffineDependencyParser(*args, **kwargs)[source]¶
The implementation of the Biaffine Dependency Parser.
References
Timothy Dozat and Christopher D. Manning. 2017. Deep Biaffine Attention for Neural Dependency Parsing.
- MODEL¶
alias of
BiaffineDependencyModel
- train(train, dev, test, buckets=32, batch_size=5000, punct=False, tree=False, proj=False, verbose=True, **kwargs)[source]¶
- Parameters
train/dev/test (list[list] or str) – Filenames of the train/dev/test datasets.
buckets (int) – The number of buckets that sentences are assigned to. Default: 32.
batch_size (int) – The number of tokens in each batch. Default: 5000.
punct (bool) – If False, ignores punctuation during evaluation. Default: False.
tree (bool) – If True, ensures that the output trees are well-formed. Default: False.
proj (bool) – If True, ensures that the output trees are projective. Default: False.
partial (bool) – If True, denotes the trees are partially annotated. Default: False.
verbose (bool) – If True, increases the output verbosity. Default: True.
kwargs (dict) – A dict holding the unconsumed arguments that can be used to update the configurations for training.
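The tree and proj flags constrain the decoder's output: a dependency tree is projective when no two arcs cross. A minimal, self-contained check of that property over a head vector (1-based head indices, 0 for the root), shown as a sketch of what the flag enforces rather than diaparser's internal decoding algorithm:

```python
# Illustration of the `proj` flag: a dependency tree is projective iff
# no two arcs cross. Heads are 1-based indices into the sentence, with
# 0 denoting the root. Hypothetical helper, not part of the diaparser API.

def is_projective(heads):
    """Return True iff the tree given by `heads` has no crossing arcs."""
    # each arc is stored as (left endpoint, right endpoint)
    arcs = [(min(dep, head), max(dep, head)) for dep, head in enumerate(heads, 1)]
    for i, (li, ri) in enumerate(arcs):
        for lj, rj in arcs[i + 1:]:
            # two arcs cross when exactly one endpoint of one arc
            # lies strictly inside the span of the other
            if li < lj < ri < rj or lj < li < rj < ri:
                return False
    return True

print(is_projective([2, 0, 2]))     # nested arcs: projective
print(is_projective([3, 4, 0, 3]))  # arcs (1,3) and (2,4) cross: not projective
```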
- evaluate(data, buckets=8, batch_size=5000, punct=False, tree=True, proj=False, partial=False, verbose=True, **kwargs)[source]¶
- Parameters
data (str) – The data for evaluation, both list of instances and filename are allowed.
buckets (int) – The number of buckets that sentences are assigned to. Default: 8.
batch_size (int) – The number of tokens in each batch. Default: 5000.
punct (bool) – If False, ignores punctuation during evaluation. Default: False.
tree (bool) – If True, ensures that the output trees are well-formed. Default: True.
proj (bool) – If True, ensures that the output trees are projective. Default: False.
partial (bool) – If True, denotes the trees are partially annotated. Default: False.
verbose (bool) – If True, increases the output verbosity. Default: True.
kwargs (dict) – A dict holding the unconsumed arguments that can be used to update the configurations for evaluation.
- Returns
The loss scalar and evaluation results.
- predict(data, pred=None, buckets=8, batch_size=5000, prob=False, tree=True, proj=False, verbose=False, **kwargs)[source]¶
- Parameters
data (list[list] or str) – The data for prediction, both a list of instances and filename are allowed.
pred (str) – If specified, the predicted results will be saved to the file. Default: None.
buckets (int) – The number of buckets that sentences are assigned to. Default: 8.
batch_size (int) – The number of tokens in each batch. Default: 5000.
prob (bool) – If True, outputs the probabilities. Default: False.
tree (bool) – If True, ensures that the output trees are well-formed. Default: True.
proj (bool) – If True, ensures that the output trees are projective. Default: False.
verbose (bool) – If True, increases the output verbosity. Default: False.
kwargs (dict) – A dict holding the unconsumed arguments that can be used to update the configurations for prediction.
- Returns
A Dataset object that stores the predicted results.
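When pred is given, the results are written in CoNLL-U format: one token per tab-separated 10-column line (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), with blank lines between sentences. A minimal, self-contained sketch of reading the HEAD/DEPREL columns back from such a file; this is a generic CoNLL-U reader for illustration, not a diaparser utility:

```python
# Sketch of reading the CoNLL-U output that `predict` writes when `pred`
# is given. Token lines carry 10 tab-separated fields; comment lines start
# with '#'; sentences are separated by blank lines. Hypothetical helper.

def read_conllu(text):
    """Yield one list of (form, head, deprel) triples per sentence."""
    sentence = []
    for line in text.splitlines():
        if not line.strip():
            if sentence:
                yield sentence
                sentence = []
        elif not line.startswith("#"):
            cols = line.split("\t")
            # FORM is column 2, HEAD column 7, DEPREL column 8 (1-based)
            sentence.append((cols[1], int(cols[6]), cols[7]))
    if sentence:
        yield sentence

sample = ("1\tShe\t_\t_\t_\t_\t2\tnsubj\t_\t_\n"
          "2\tenjoys\t_\t_\t_\t_\t0\troot\t_\t_\n")
for sent in read_conllu(sample):
    print(sent)
```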
- classmethod build(path, min_freq=2, fix_len=20, **kwargs)[source]¶
Build a brand-new Parser, including initialization of all data fields and model parameters.
- Parameters
path (str) – The path of the model to be saved.
min_freq (int) – The minimum frequency needed to include a token in the vocabulary. Default: 2.
fix_len (int) – The max length of all subword pieces. The excess part of each piece will be truncated. Required if using CharLSTM/BERT. Default: 20.
kwargs (dict) – A dict holding the unconsumed arguments.