Models

BiaffineDependencyModel

class diaparser.models.BiaffineDependencyModel(n_words, n_feats, n_rels, feat='char', n_word_embed=100, n_feat_embed=100, n_char_embed=50, bert=None, n_bert_layers=4, bert_fine_tune=False, mix_dropout=0.0, token_dropout=0.0, embed_dropout=0.33, n_lstm_hidden=400, n_lstm_layers=3, lstm_dropout=0.33, n_mlp_arc=500, n_mlp_rel=100, mask_token_id=0.0, mlp_dropout=0.33, use_hidden_states=True, use_attentions=False, attention_head=0, attention_layer=6, feat_pad_index=0, pad_index=0, unk_index=1, **kwargs)[source]

The implementation of Biaffine Dependency Parser.

References

  • Timothy Dozat and Christopher D. Manning. 2017. Deep Biaffine Attention for Neural Dependency Parsing.

Parameters
  • n_words (int) – The size of the word vocabulary.

  • n_feats (int) – The size of the feat vocabulary.

  • n_rels (int) – The number of labels in the treebank.

  • feat (str) – Specifies which type of additional feature to use: 'char' | 'bert' | 'tag'. 'char': Character-level representations extracted by CharLSTM. 'bert': BERT representations; other pretrained language models such as XLNet are also feasible. 'tag': POS tag embeddings. Default: 'char'.

  • n_word_embed (int) – The size of word embeddings. Default: 100.

  • n_feat_embed (int) – The size of feature representations. Default: 100.

  • n_char_embed (int) – The size of character embeddings serving as inputs of CharLSTM, required if feat='char'. Default: 50.

  • bert (str) – Specifies which kind of language model to use, e.g., 'bert-base-cased' or 'xlnet-base-cased'. Required if feat='bert'. The full list of available models can be found in transformers. Default: None.

  • n_bert_layers (int) – Specifies how many of the last layers to use. Required if feat='bert'. The final outputs are the weighted sum of the hidden states of these layers. Default: 4.

  • bert_fine_tune (bool) – Whether to fine-tune the BERT model. Default: False.

  • mix_dropout (float) – The dropout ratio of BERT layers. Required if feat='bert'. Default: .0.

  • token_dropout (float) – The dropout ratio of tokens. Default: .0.

  • embed_dropout (float) – The dropout ratio of input embeddings. Default: .33.

  • n_lstm_hidden (int) – The size of LSTM hidden states. Default: 400.

  • n_lstm_layers (int) – The number of LSTM layers. Default: 3.

  • lstm_dropout (float) – The dropout ratio of LSTM. Default: .33.

  • n_mlp_arc (int) – Arc MLP size. Default: 500.

  • n_mlp_rel (int) – Label MLP size. Default: 100.

  • mlp_dropout (float) – The dropout ratio of MLP layers. Default: .33.

  • use_hidden_states (bool) – Whether to use hidden states rather than outputs from BERT. Default: True.

  • use_attentions (bool) – Whether to use attention heads from BERT. Default: False.

  • attention_head (int) – Which attention head from BERT to use. Default: 0.

  • attention_layer (int) – Which attention layer from BERT to use; use all layers if 0. Default: 6.

  • feat_pad_index (int) – The index of the padding token in the feat vocabulary. Default: 0.

  • pad_index (int) – The index of the padding token in the word vocabulary. Default: 0.

  • unk_index (int) – The index of the unknown token in the word vocabulary. Default: 1.
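
The following is a minimal construction sketch; the vocabulary sizes are illustrative placeholders rather than values from a real treebank (in normal use they are derived from the fields built over the training corpus).

>>> from diaparser.models import BiaffineDependencyModel
>>> # hypothetical vocabulary sizes; real values come from the training data
>>> model = BiaffineDependencyModel(n_words=10000, n_feats=500, n_rels=40, feat='char')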

extra_repr()[source]

Sets the extra representation of the module.

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
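
For instance, a generic PyTorch sketch (not part of diaparser) of a module that reports its own hyperparameter via extra_repr():

>>> import torch.nn as nn
>>> class Scale(nn.Module):
...     def __init__(self, factor):
...         super().__init__()
...         self.factor = factor
...     def extra_repr(self):
...         # appears inside repr(Scale(2.0)) as "Scale(factor=2.0)"
...         return f'factor={self.factor}'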

forward(words: Tensor, feats: Tensor) → Tuple[Tensor, Tensor][source]
Parameters
  • words (LongTensor) – [batch_size, seq_len]. Word indices.

  • feats (LongTensor) – Feat indices. If feat is 'char' or 'bert', the size of feats should be [batch_size, seq_len, fix_len]. If 'tag', the size is [batch_size, seq_len].

Returns

The first tensor, of shape [batch_size, seq_len, seq_len], holds scores of all possible arcs. The second, of shape [batch_size, seq_len, seq_len, n_labels], holds scores of all possible labels on each arc.

Return type

Tensor, Tensor
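
A sketch of a forward pass with dummy inputs, continuing the construction example above and assuming the default feat='char' (all shapes and index ranges are illustrative):

>>> import torch
>>> batch_size, seq_len, fix_len = 2, 10, 20
>>> words = torch.randint(2, 10000, (batch_size, seq_len))         # word indices
>>> feats = torch.randint(2, 500, (batch_size, seq_len, fix_len))  # character indices
>>> s_arc, s_rel = model(words, feats)                             # [2, 10, 10] and [2, 10, 10, 40]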

loss(s_arc: Tensor, s_rel: Tensor, arcs: Tensor, rels: Tensor, mask: Tensor, partial: bool = False) → Tensor[source]

Computes the arc and relation label loss for a sequence, given the gold-standard heads and labels.

Parameters
  • s_arc (Tensor) – [batch_size, seq_len, seq_len]. Scores of all possible arcs.

  • s_rel (Tensor) – [batch_size, seq_len, seq_len, n_labels]. Scores of all possible labels on each arc.

  • arcs (LongTensor) – [batch_size, seq_len]. The tensor of gold-standard arcs.

  • rels (LongTensor) – [batch_size, seq_len]. The tensor of gold-standard labels.

  • mask (BoolTensor) – [batch_size, seq_len]. The mask for covering the unpadded tokens.

  • partial (bool) – True denotes the trees are partially annotated. Default: False.

Returns

The training loss.

Return type

Tensor
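
Continuing the sketch above with random placeholder annotations (not real gold trees), and assuming pad_index=0 so that the mask covers all generated tokens:

>>> arcs = torch.randint(0, seq_len, (batch_size, seq_len))  # gold head positions
>>> rels = torch.randint(0, 40, (batch_size, seq_len))       # gold label indices
>>> mask = words.ne(0)                                       # unpadded tokens (pad_index=0)
>>> mask[:, 0] = False                                       # the first (root) position is usually excluded
>>> loss = model.loss(s_arc, s_rel, arcs, rels, mask)
>>> loss.backward()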

decode(s_arc: Tensor, s_rel: Tensor, mask: Tensor, tree: bool = False, proj: bool = False) → Tuple[Tensor, Tensor][source]
Parameters
  • s_arc (Tensor) – [batch_size, seq_len, seq_len]. Scores of all possible arcs.

  • s_rel (Tensor) – [batch_size, seq_len, seq_len, n_labels]. Scores of all possible labels on each arc.

  • mask (BoolTensor) – [batch_size, seq_len]. The mask for covering the unpadded tokens.

  • tree (bool) – If True, ensures that the output is a well-formed tree. Default: False.

  • proj (bool) – If True, ensures that the output is a projective tree. Default: False.

Returns

Predicted arcs and labels, each of shape [batch_size, seq_len].

Return type

Tensor, Tensor
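
Continuing the sketch above, decoding the scores into trees:

>>> arc_preds, rel_preds = model.decode(s_arc, s_rel, mask, tree=True, proj=False)
>>> arc_preds.shape, rel_preds.shape  # both [batch_size, seq_len]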