Models

BiaffineDependencyModel

class diaparser.models.BiaffineDependencyModel(n_words, n_feats, n_rels, feat='char', n_word_embed=100, n_feat_embed=100, n_char_embed=50, bert=None, n_bert_layers=4, bert_fine_tune=False, mix_dropout=0.0, token_dropout=0.0, embed_dropout=0.33, n_lstm_hidden=400, n_lstm_layers=3, lstm_dropout=0.33, n_mlp_arc=500, n_mlp_rel=100, mask_token_id=0.0, mlp_dropout=0.33, use_hidden_states=True, use_attentions=False, attention_head=0, attention_layer=6, feat_pad_index=0, pad_index=0, unk_index=1, **kwargs)[source]

The implementation of Biaffine Dependency Parser.

References

  • Timothy Dozat and Christopher D. Manning. 2017. Deep Biaffine Attention for Neural Dependency Parsing.

Parameters
  • n_words (int) – The size of the word vocabulary.

  • n_feats (int) – The size of the feat vocabulary.

  • n_rels (int) – The number of labels in the treebank.

  • feat (str) – Specifies which type of additional feature to use: 'char' | 'bert' | 'tag'. 'char': Character-level representations extracted by CharLSTM. 'bert': BERT representations; other pretrained language models such as XLNet are also feasible. 'tag': POS tag embeddings. Default: 'char'.

  • n_word_embed (int) – The size of word embeddings. Default: 100.

  • n_feat_embed (int) – The size of feature representations. Default: 100.

  • n_char_embed (int) – The size of character embeddings serving as inputs of CharLSTM, required if feat='char'. Default: 50.

  • bert (str) – Specifies which kind of language model to use, e.g., 'bert-base-cased' or 'xlnet-base-cased'. Required if feat='bert'. The full list of available models can be found in transformers. Default: None.

  • n_bert_layers (int) – Specifies how many of the last layers to use. Required if feat='bert'. The final outputs are the weighted sum of the hidden states of these layers. Default: 4.

  • bert_fine_tune (bool) – Whether to fine-tune the BERT model. Default: False.

  • mix_dropout (float) – The dropout ratio of BERT layers. Required if feat='bert'. Default: .0.

  • token_dropout (float) – The dropout ratio of tokens. Default: .0.

  • embed_dropout (float) – The dropout ratio of input embeddings. Default: .33.

  • n_lstm_hidden (int) – The size of LSTM hidden states. Default: 400.

  • n_lstm_layers (int) – The number of LSTM layers. Default: 3.

  • lstm_dropout (float) – The dropout ratio of LSTM. Default: .33.

  • n_mlp_arc (int) – Arc MLP size. Default: 500.

  • n_mlp_rel (int) – Label MLP size. Default: 100.

  • mlp_dropout (float) – The dropout ratio of MLP layers. Default: .33.

  • use_hidden_states (bool) – Whether to use hidden states rather than outputs from BERT. Default: True.

  • use_attentions (bool) – Whether to use attention heads from BERT. Default: False.

  • attention_head (int) – Which attention head from BERT to use. Default: 0.

  • attention_layer (int) – Which attention layer from BERT to use; use all layers if 0. Default: 6.

  • feat_pad_index (int) – The index of the padding token in the feat vocabulary. Default: 0.

  • pad_index (int) – The index of the padding token in the word vocabulary. Default: 0.

  • unk_index (int) – The index of the unknown token in the word vocabulary. Default: 1.
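
The following is a minimal construction sketch; the vocabulary sizes are illustrative placeholders rather than values from a real treebank (in normal use they are derived from the fields built over the training corpus).

>>> from diaparser.models import BiaffineDependencyModel
>>> # hypothetical vocabulary sizes; real values come from the training data
>>> model = BiaffineDependencyModel(n_words=10000, n_feats=500, n_rels=40, feat='char')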

extra_repr()[source]

Sets the extra representation of the module.

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
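
For instance, a generic PyTorch sketch (not part of diaparser) of a module that reports its own hyperparameter via extra_repr():

>>> import torch.nn as nn
>>> class Scale(nn.Module):
...     def __init__(self, factor):
...         super().__init__()
...         self.factor = factor
...     def extra_repr(self):
...         # appears inside repr(Scale(2.0)) as "Scale(factor=2.0)"
...         return f'factor={self.factor}'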

forward(words: Tensor, feats: Tensor) → Tuple[Tensor, Tensor][source]
Parameters
  • words (LongTensor) – [batch_size, seq_len]. Word indices.

  • feats (LongTensor) – Feat indices. If feat is 'char' or 'bert', the size of feats should be [batch_size, seq_len, fix_len]. If 'tag', the size is [batch_size, seq_len].

Returns

The first tensor, of shape [batch_size, seq_len, seq_len], holds scores of all possible arcs. The second, of shape [batch_size, seq_len, seq_len, n_labels], holds scores of all possible labels on each arc.

Return type

Tensor, Tensor
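
A sketch of a forward pass with dummy inputs, continuing the construction example above and assuming the default feat='char' (all shapes and index ranges are illustrative):

>>> import torch
>>> batch_size, seq_len, fix_len = 2, 10, 20
>>> words = torch.randint(2, 10000, (batch_size, seq_len))         # word indices
>>> feats = torch.randint(2, 500, (batch_size, seq_len, fix_len))  # character indices
>>> s_arc, s_rel = model(words, feats)                             # [2, 10, 10] and [2, 10, 10, 40]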

loss(s_arc: Tensor, s_rel: Tensor, arcs: Tensor, rels: Tensor, mask: Tensor, partial: bool = False) → Tensor[source]

Computes the arc and relation label loss for a sequence, given the gold-standard heads and labels.

Parameters
  • s_arc (Tensor) – [batch_size, seq_len, seq_len]. Scores of all possible arcs.

  • s_rel (Tensor) – [batch_size, seq_len, seq_len, n_labels]. Scores of all possible labels on each arc.

  • arcs (LongTensor) – [batch_size, seq_len]. The tensor of gold-standard arcs.

  • rels (LongTensor) – [batch_size, seq_len]. The tensor of gold-standard labels.

  • mask (BoolTensor) – [batch_size, seq_len]. The mask for covering the unpadded tokens.

  • partial (bool) – True denotes the trees are partially annotated. Default: False.

Returns

The training loss.

Return type

Tensor
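
Continuing the sketch above with random placeholder annotations (not real gold trees), and assuming pad_index=0 so that the mask covers all generated tokens:

>>> arcs = torch.randint(0, seq_len, (batch_size, seq_len))  # gold head positions
>>> rels = torch.randint(0, 40, (batch_size, seq_len))       # gold label indices
>>> mask = words.ne(0)                                       # unpadded tokens (pad_index=0)
>>> mask[:, 0] = False                                       # the first (root) position is usually excluded
>>> loss = model.loss(s_arc, s_rel, arcs, rels, mask)
>>> loss.backward()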

decode(s_arc: Tensor, s_rel: Tensor, mask: Tensor, tree: bool = False, proj: bool = False) → Tuple[Tensor, Tensor][source]
Parameters
  • s_arc (Tensor) – [batch_size, seq_len, seq_len]. Scores of all possible arcs.

  • s_rel (Tensor) – [batch_size, seq_len, seq_len, n_labels]. Scores of all possible labels on each arc.

  • mask (BoolTensor) – [batch_size, seq_len]. The mask for covering the unpadded tokens.

  • tree (bool) – If True, ensures that the output is a well-formed tree. Default: False.

  • proj (bool) – If True, ensures that the output is a projective tree. Default: False.

Returns

Predicted arcs and labels, each of shape [batch_size, seq_len].

Return type

Tensor, Tensor
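
Continuing the sketch above, decoding the scores into trees:

>>> arc_preds, rel_preds = model.decode(s_arc, s_rel, mask, tree=True, proj=False)
>>> arc_preds.shape, rel_preds.shape  # both [batch_size, seq_len]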