Modules¶
Affine¶
- class diaparser.modules.Biaffine(n_in, n_out=1, bias_x=True, bias_y=True)[source]¶
- extra_repr()[source]¶
Set the extra representation of the module.
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
- forward(x, y)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
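The example below is a minimal usage sketch. The shape conventions ([batch_size, seq_len, n_in] inputs yielding pairwise scores, with the n_out dimension squeezed when n_out=1) are assumptions based on how biaffine attention is used in the Biaffine Parser, not guarantees of this page.
Examples
>>> import torch
>>> from diaparser.modules import Biaffine
>>> x = torch.randn(2, 10, 500)          # [batch_size, seq_len, n_in]
>>> y = torch.randn(2, 10, 500)
>>> attn = Biaffine(n_in=500, bias_x=True, bias_y=False)
>>> attn(x, y).shape                     # pairwise scores; n_out dim squeezed since n_out=1
torch.Size([2, 10, 10])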
BertEmbedding¶
- class diaparser.modules.BertEmbedding(model, n_layers, n_out, stride=5, pad_index=0, dropout=0, requires_grad=False, mask_token_id=0, token_dropout=0.0, mix_dropout=0.0, use_hidden_states=True, use_attentions=False, attention_head=0, attention_layer=8)[source]¶
A module that directly utilizes the pretrained models in transformers to produce BERT representations. While mainly tailored to provide input preparation and post-processing for the BERT model, it is also compatible with other pretrained language models such as XLNet, RoBERTa and ELECTRA.
- Parameters
model (str) – Path or name of the pretrained model registered in transformers, e.g., 'bert-base-cased'.
n_layers (int) – The number of layers from the model to use. If 0, uses all layers.
n_out (int) – The requested size of the embeddings. If 0, uses the size of the pretrained embedding model.
stride (int) – A sequence longer than the model's maximum length will be split into several small pieces with a window of size stride. Default: 5.
pad_index (int) – The index of the padding token in the BERT vocabulary. Default: 0.
dropout (float) – The dropout ratio of BERT layers, passed to the ScalarMix layer. Default: 0.
requires_grad (bool) – If True, the model parameters will be updated together with the downstream task. Default: False.
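A construction sketch follows; the forward input format, assumed here to be a [batch_size, seq_len, fix_len] tensor of subword indices per token, is based on similar parsers rather than documented on this page.
Examples
>>> embed = BertEmbedding(model='bert-base-cased', n_layers=4, n_out=100)
>>> subwords.shape                       # assumed: subword indices for each token
torch.Size([2, 10, 20])
>>> embed(subwords).shape                # one n_out-sized embedding per token
torch.Size([2, 10, 100])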
ScalarMix¶
- class diaparser.modules.ScalarMix(n_layers: int, dropout: float = 0.0)[source]¶
Computes a parameterised scalar mixture of \(N\) tensors, \(\mathrm{mixture} = \gamma \sum_{k} s_k \cdot \mathrm{tensor}_k\), where \(s = \mathrm{softmax}(w)\), with \(w\) and \(\gamma\) scalar parameters.
- Parameters
n_layers (int) – The number of layers to be mixed, i.e., \(N\).
dropout (float) – The dropout ratio of the layer weights. If dropout > 0, then for each scalar weight, adjust its softmax weight mass to 0 with the dropout probability (i.e., setting the unnormalized weight to -inf). This effectively redistributes the dropped probability mass to all other weights. Default: 0.
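For illustration, the sketch below mixes the last four hidden layers of a transformer; the list-of-tensors calling convention is an assumption based on comparable ScalarMix implementations.
Examples
>>> mix = ScalarMix(n_layers=4)
>>> layers = [torch.randn(2, 10, 768) for _ in range(4)]   # e.g. the last 4 BERT layers
>>> mix(layers).shape                                      # same shape as each input tensor
torch.Size([2, 10, 768])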
LSTM¶
- class diaparser.modules.LSTM(input_size, hidden_size, num_layers=1, bidirectional=False, dropout=0)[source]¶
LSTM is a variant of the vanilla bidirectional LSTM adopted by the Biaffine Parser; the only difference is the dropout strategy. It drops nodes in the LSTM layers (both input and recurrent connections) and applies the same dropout mask at every recurrent timestep.
The API is roughly the same as torch.nn.LSTM, except that we only allow PackedSequence as input.
References
Timothy Dozat and Christopher D. Manning. 2017. Deep Biaffine Attention for Neural Dependency Parsing.
- Parameters
input_size (int) – The number of expected features in the input.
hidden_size (int) – The number of features in the hidden state h.
num_layers (int) – The number of recurrent layers. Default: 1.
bidirectional (bool) – If True, becomes a bidirectional LSTM. Default: False.
dropout (float) – If non-zero, introduces a SharedDropout layer on the outputs of each LSTM layer except the last layer. Default: 0.
- forward(sequence, hx=None)[source]¶
- Parameters
sequence (PackedSequence) – A packed variable length sequence.
hx ((Tensor, Tensor)) – A tuple composed of two tensors h and c. h of shape [num_layers*num_directions, batch_size, hidden_size] holds the initial hidden state for each element in the batch; c of shape [num_layers*num_directions, batch_size, hidden_size] holds the initial cell state for each element in the batch. If hx is not provided, both h and c default to zero. Default: None.
- Returns
The first is a packed variable length sequence. The second is a tuple of tensors h and c. h of shape [num_layers*num_directions, batch_size, hidden_size] holds the hidden state for t=seq_len; like output, its layers can be separated using h.view(num_layers, 2, batch_size, hidden_size), and similarly for c. c of shape [num_layers*num_directions, batch_size, hidden_size] holds the cell state for t=seq_len.
- Return type
PackedSequence, (Tensor, Tensor)
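A minimal end-to-end sketch with a PackedSequence input (all dimensions illustrative):
Examples
>>> import torch
>>> from torch.nn.utils.rnn import pack_padded_sequence
>>> lstm = LSTM(input_size=100, hidden_size=300, num_layers=3, bidirectional=True, dropout=0.33)
>>> x = torch.randn(2, 10, 100)                            # [batch_size, seq_len, input_size]
>>> seq = pack_padded_sequence(x, torch.tensor([10, 6]), batch_first=True)
>>> output, (h, c) = lstm(seq)
>>> h.shape                                                # num_layers*num_directions = 3*2
torch.Size([6, 2, 300])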
CharLSTM¶
Dropout¶
- class diaparser.modules.IndependentDropout(p=0.5)[source]¶
For \(N\) tensors, a different dropout mask is applied to each of them. When \(N-M\) of the tensors are dropped, the remaining \(M\) are scaled by a factor of \(N/M\) to compensate; when all of them are dropped together, zeros are returned.
- Parameters
p (float) – The probability of an element to be zeroed. Default: 0.5.
Examples
>>> x, y = torch.ones(1, 3, 5), torch.ones(1, 3, 5)
>>> x, y = IndependentDropout()(x, y)
>>> x
tensor([[[1., 1., 1., 1., 1.],
         [0., 0., 0., 0., 0.],
         [2., 2., 2., 2., 2.]]])
>>> y
tensor([[[1., 1., 1., 1., 1.],
         [2., 2., 2., 2., 2.],
         [0., 0., 0., 0., 0.]]])
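One way to realize the \(N/M\) compensation described above is to draw a Bernoulli mask per tensor and rescale by the number of surviving masks, as in the sketch below (a hypothetical re-implementation of the idea, not necessarily diaparser's exact code):
>>> items = [torch.ones(1, 3, 5), torch.ones(1, 3, 5)]
>>> masks = [torch.bernoulli(x.new_full(x.shape[:2], 1 - 0.5)) for x in items]  # keep prob 1-p
>>> total = sum(masks)                                      # M: surviving tensors per position
>>> scale = len(items) / total.max(torch.ones_like(total))  # N/M, guarding against division by zero
>>> items = [x * (m * scale).unsqueeze(-1) for x, m in zip(items, masks)]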
- class diaparser.modules.SharedDropout(p=0.5, batch_first=True)[source]¶
SharedDropout differs from the vanilla dropout strategy in that the dropout mask is shared across one dimension.
- Parameters
p (float) – The probability of an element to be zeroed. Default: 0.5.
batch_first (bool) – If True, the input and output tensors are provided as [batch_size, seq_len, *]. Default: True.
Examples
>>> x = torch.ones(1, 3, 5)
>>> nn.Dropout()(x)
tensor([[[0., 2., 2., 0., 0.],
         [2., 2., 0., 2., 2.],
         [2., 2., 2., 2., 0.]]])
>>> SharedDropout()(x)
tensor([[[2., 0., 2., 0., 2.],
         [2., 0., 2., 0., 2.],
         [2., 0., 2., 0., 2.]]])
- forward(x)[source]¶
- Parameters
x (Tensor) – A tensor of any shape.
- Returns
The returned tensor is of the same shape as x.
MLP¶
- class diaparser.modules.MLP(n_in, n_out, dropout=0)[source]¶
Applies a linear transformation together with LeakyReLU activation to the incoming tensor: \(y = \mathrm{LeakyReLU}(x A^T + b)\).
- Parameters
n_in (int) – The size of each input feature.
n_out (int) – The size of each output feature.
dropout (float) – If non-zero, introduces a SharedDropout layer on the output with this dropout ratio. Default: 0.
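A usage sketch with illustrative sizes:
Examples
>>> mlp = MLP(n_in=800, n_out=500, dropout=0.33)
>>> x = torch.randn(2, 10, 800)
>>> mlp(x).shape                          # the last dimension is mapped from n_in to n_out
torch.Size([2, 10, 500])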