Models¶
BiaffineDependencyModel¶
- class diaparser.models.BiaffineDependencyModel(n_words, n_feats, n_rels, feat='char', n_word_embed=100, n_feat_embed=100, n_char_embed=50, bert=None, n_bert_layers=4, bert_fine_tune=False, mix_dropout=0.0, token_dropout=0.0, embed_dropout=0.33, n_lstm_hidden=400, n_lstm_layers=3, lstm_dropout=0.33, n_mlp_arc=500, n_mlp_rel=100, mask_token_id=0.0, mlp_dropout=0.33, use_hidden_states=True, use_attentions=False, attention_head=0, attention_layer=6, feat_pad_index=0, pad_index=0, unk_index=1, **kwargs)[source]¶
The implementation of the Biaffine Dependency Parser.
References
Timothy Dozat and Christopher D. Manning. 2017. Deep Biaffine Attention for Neural Dependency Parsing.
- Parameters
n_words (int) – The size of the word vocabulary.
n_feats (int) – The size of the feat vocabulary.
n_rels (int) – The number of labels in the treebank.
feat (str) – Specifies which type of additional feature to use: 'char': character-level representations extracted by CharLSTM; 'bert': BERT representations, where other pretrained language models like XLNet are also feasible; 'tag': POS tag embeddings. Default: 'char'.
n_word_embed (int) – The size of word embeddings. Default: 100.
n_feat_embed (int) – The size of feature representations. Default: 100.
n_char_embed (int) – The size of character embeddings serving as inputs of CharLSTM. Required if feat='char'. Default: 50.
bert (str) – Specifies which kind of language model to use, e.g., 'bert-base-cased' or 'xlnet-base-cased'. Required if feat='bert'. The full list can be found in transformers. Default: None.
n_bert_layers (int) – Specifies how many last layers to use. Required if feat='bert'. The final outputs would be the weighted sum of the hidden states of these layers. Default: 4.
bert_fine_tune (bool) – Whether to fine-tune the BERT model. Default: False.
mix_dropout (float) – The dropout ratio of BERT layers. Required if feat='bert'. Default: .0.
token_dropout (float) – The dropout ratio of tokens. Default: .0.
embed_dropout (float) – The dropout ratio of input embeddings. Default: .33.
n_lstm_hidden (int) – The size of LSTM hidden states. Default: 400.
n_lstm_layers (int) – The number of LSTM layers. Default: 3.
lstm_dropout (float) – The dropout ratio of LSTM. Default: .33.
n_mlp_arc (int) – Arc MLP size. Default: 500.
n_mlp_rel (int) – Label MLP size. Default: 100.
mlp_dropout (float) – The dropout ratio of MLP layers. Default: .33.
use_hidden_states (bool) – Whether to use hidden states rather than outputs from BERT. Default: True.
use_attentions (bool) – Whether to use attention heads from BERT. Default: False.
attention_head (int) – Which attention head from BERT to use. Default: 0.
attention_layer (int) – Which attention layer from BERT to use; use all if 0. Default: 6.
feat_pad_index (int) – The index of the padding token in the feat vocabulary. Default: 0.
pad_index (int) – The index of the padding token in the word vocabulary. Default: 0.
unk_index (int) – The index of the unknown token in the word vocabulary. Default: 1.
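A minimal construction sketch follows; the vocabulary sizes and label count are illustrative assumptions, not values prescribed by the library (in practice they come from the vocabularies built over the training treebank).

```python
from diaparser.models import BiaffineDependencyModel

# Hypothetical sizes for illustration only.
model = BiaffineDependencyModel(
    n_words=10000,  # size of the word vocabulary
    n_feats=512,    # size of the feat (here: character) vocabulary
    n_rels=40,      # number of dependency labels in the treebank
    feat='char',    # use CharLSTM character-level representations
)
print(model)
```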
- extra_repr()[source]¶
Sets the extra representation of the module.
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
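This is the standard torch.nn.Module hook. A generic sketch of re-implementing it in one's own module (the MyModule class below is hypothetical):

```python
import torch.nn as nn

class MyModule(nn.Module):
    def __init__(self, hidden_size: int = 400):
        super().__init__()
        self.hidden_size = hidden_size

    def extra_repr(self) -> str:
        # The returned string is embedded in the module's repr().
        return f'hidden_size={self.hidden_size}'

print(MyModule())  # prints: MyModule(hidden_size=400)
```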
- forward(words: Tensor, feats: Tensor) → Tuple[Tensor, Tensor] [source]¶
- Parameters
words (LongTensor) – [batch_size, seq_len]. Word indices.
feats (LongTensor) – Feat indices. If feat is 'char' or 'bert', the size of feats should be [batch_size, seq_len, fix_len]; if 'tag', the size is [batch_size, seq_len].
- Returns
The first tensor of shape [batch_size, seq_len, seq_len] holds scores of all possible arcs. The second of shape [batch_size, seq_len, seq_len, n_labels] holds scores of all possible labels on each arc.
- Return type
Tuple[Tensor, Tensor]
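A forward-pass sketch, continuing from the construction example above; all shapes are illustrative assumptions with feat='char':

```python
import torch

batch_size, seq_len, fix_len = 2, 10, 20
# Hypothetical index tensors; real inputs come from the parser's field transforms.
words = torch.randint(0, 10000, (batch_size, seq_len))         # word indices
feats = torch.randint(0, 512, (batch_size, seq_len, fix_len))  # character indices

s_arc, s_rel = model(words, feats)
# s_arc: [batch_size, seq_len, seq_len]       scores of all possible arcs
# s_rel: [batch_size, seq_len, seq_len, 40]   scores of all labels on each arc
```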
- loss(s_arc: Tensor, s_rel: Tensor, arcs: Tensor, rels: Tensor, mask: Tensor, partial: bool = False) → Tensor [source]¶
Computes the arc and label loss for a sequence given gold heads and labels.
- Parameters
s_arc (Tensor) – [batch_size, seq_len, seq_len]. Scores of all possible arcs.
s_rel (Tensor) – [batch_size, seq_len, seq_len, n_labels]. Scores of all possible labels on each arc.
arcs (LongTensor) – [batch_size, seq_len]. The tensor of gold-standard arcs.
rels (LongTensor) – [batch_size, seq_len]. The tensor of gold-standard labels.
mask (BoolTensor) – [batch_size, seq_len]. The mask for covering the unpadded tokens.
partial (bool) – True denotes the trees are partially annotated. Default: False.
- Returns
The training loss.
- Return type
Tensor
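A training-step sketch, continuing from the forward-pass example above; the gold tensors are fabricated placeholders:

```python
import torch

# Placeholder gold annotations for illustration only.
arcs = torch.randint(0, seq_len, (batch_size, seq_len))  # gold head positions
rels = torch.randint(0, 40, (batch_size, seq_len))       # gold label indices
mask = words.ne(0)  # assumes pad_index=0; the root position is typically masked too

loss = model.loss(s_arc, s_rel, arcs, rels, mask)
loss.backward()
```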
- decode(s_arc: Tensor, s_rel: Tensor, mask: Tensor, tree: bool = False, proj: bool = False) → Tuple[Tensor, Tensor] [source]¶
- Parameters
s_arc (Tensor) – [batch_size, seq_len, seq_len]. Scores of all possible arcs.
s_rel (Tensor) – [batch_size, seq_len, seq_len, n_labels]. Scores of all possible labels on each arc.
mask (BoolTensor) – [batch_size, seq_len]. The mask for covering the unpadded tokens.
tree (bool) – If True, ensures to output well-formed trees. Default: False.
proj (bool) – If True, ensures to output projective trees. Default: False.
- Returns
Predicted arcs and labels of shape [batch_size, seq_len].
- Return type
Tuple[Tensor, Tensor]
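A decoding sketch, continuing from the examples above:

```python
# Enforce well-formed trees at prediction time; pass proj=True for projective ones.
arc_preds, rel_preds = model.decode(s_arc, s_rel, mask, tree=True)
# arc_preds, rel_preds: [batch_size, seq_len]
```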