Modules

Affine

class diaparser.modules.Biaffine(n_in, n_out=1, bias_x=True, bias_y=True)[source]
extra_repr()[source]

Set the extra representation of the module.

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

forward(x, y)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
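
A minimal usage sketch (the shapes and the exact score layout below are assumptions based on a typical biaffine attention scorer, not guarantees of this API): the module scores all pairs of positions between two input sequences, e.g., head-dependent arc scores in a parser.

>>> import torch
>>> from diaparser.modules import Biaffine
>>> x = torch.randn(2, 10, 500)    # [batch_size, seq_len, n_in]
>>> y = torch.randn(2, 10, 500)    # [batch_size, seq_len, n_in]
>>> biaffine = Biaffine(n_in=500)  # n_out=1, bias_x=True, bias_y=True
>>> s = biaffine(x, y)             # pairwise scores, expected [2, 10, 10] when n_out=1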

BertEmbedding

class diaparser.modules.BertEmbedding(model, n_layers, n_out, stride=5, pad_index=0, dropout=0, requires_grad=False, mask_token_id=0, token_dropout=0.0, mix_dropout=0.0, use_hidden_states=True, use_attentions=False, attention_head=0, attention_layer=8)[source]

A module that directly utilizes the pretrained models in transformers to produce BERT representations. While mainly tailored to provide input preparation and post-processing for the BERT model, it is also compatible with other pretrained language models such as XLNet, RoBERTa and ELECTRA.

Parameters
  • model (str) – Path or name of the pretrained models registered in transformers, e.g., 'bert-base-cased'.

  • n_layers (int) – The number of layers from the model to use. If 0, uses all layers.

  • n_out (int) – The requested size of the embeddings. If 0, uses the size of the pretrained embedding model.

  • stride (int) – A sequence longer than the maximum allowed length will be split into several small pieces, with a window of size stride. Default: 5.

  • pad_index (int) – The index of the padding token in the BERT vocabulary. Default: 0.

  • dropout (float) – The dropout ratio of BERT layers. Default: 0. This value will be passed into the ScalarMix layer.

  • requires_grad (bool) – If True, the model parameters will be updated together with the downstream task. Default: False.

forward(subwords)[source]
Parameters

subwords (Tensor) – Subword indices of shape [batch_size, seq_len, fix_len].

Returns

BERT embeddings of shape [batch_size, seq_len, n_out].

Return type

Tensor
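
A minimal usage sketch. The random subword indices below are placeholders that only illustrate the expected shapes; in practice they would come from the transformers tokenizer matching the chosen model, and loading 'bert-base-cased' requires the pretrained weights to be available.

>>> import torch
>>> from diaparser.modules import BertEmbedding
>>> embed = BertEmbedding('bert-base-cased', n_layers=4, n_out=100)
>>> subwords = torch.randint(1, 1000, (2, 10, 5))  # [batch_size, seq_len, fix_len], placeholder indices
>>> h = embed(subwords)                            # expected shape: [2, 10, 100]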

class diaparser.modules.ScalarMix(n_layers: int, dropout: float = 0.0)[source]

Computes a parameterised scalar mixture of \(N\) tensors, \(\mathrm{mixture} = \gamma \sum_{k} s_k \cdot \mathrm{tensor}_k\), where \(s = \mathrm{softmax}(w)\), with \(w\) and \(\gamma\) scalar parameters.

Parameters
  • n_layers (int) – The number of layers to be mixed, i.e., \(N\).

  • dropout (float) – The dropout ratio of the layer weights. If dropout > 0, then each scalar weight's softmax mass is set to 0 with probability dropout (i.e., its unnormalized weight is set to -inf), which effectively redistributes the dropped probability mass to all other weights. Default: 0.

forward(tensors)[source]
Parameters

tensors (list[Tensor]) – \(N\) tensors to be mixed.

Returns

The mixture of \(N\) tensors.
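
A minimal usage sketch, assuming the tensors to be mixed all share the same shape (e.g., the hidden states of several BERT layers):

>>> import torch
>>> from diaparser.modules import ScalarMix
>>> mix = ScalarMix(n_layers=4)
>>> tensors = [torch.randn(2, 10, 768) for _ in range(4)]  # e.g. hidden states of 4 layers
>>> out = mix(tensors)                                     # same shape as each input: [2, 10, 768]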

LSTM

class diaparser.modules.LSTM(input_size, hidden_size, num_layers=1, bidirectional=False, dropout=0)[source]

LSTM is a variant of the vanilla bidirectional LSTM adopted by the Biaffine Parser, differing only in its dropout strategy: it drops nodes in the LSTM layers (both input and recurrent connections) and applies the same dropout mask at every recurrent timestep.

The API is roughly the same as torch.nn.LSTM, except that only PackedSequence inputs are accepted.

Parameters
  • input_size (int) – The number of expected features in the input.

  • hidden_size (int) – The number of features in the hidden state h.

  • num_layers (int) – The number of recurrent layers. Default: 1.

  • bidirectional (bool) – If True, becomes a bidirectional LSTM. Default: False.

  • dropout (float) – If non-zero, introduces a SharedDropout layer on the outputs of each LSTM layer except the last layer. Default: 0.

forward(sequence, hx=None)[source]
Parameters
  • sequence (PackedSequence) – A packed variable length sequence.

  • hx (Tensor, Tensor) – A tuple composed of two tensors h and c. h of shape [num_layers*num_directions, batch_size, hidden_size] holds the initial hidden state for each element in the batch. c of shape [num_layers*num_directions, batch_size, hidden_size] holds the initial cell state for each element in the batch. If hx is not provided, both h and c default to zero. Default: None.

Returns

The first is a packed variable-length sequence holding the output features. The second is a tuple of tensors h and c: h of shape [num_layers*num_directions, batch_size, hidden_size] holds the hidden state for t = seq_len, and c of the same shape holds the cell state for t = seq_len. As with the output, the layers can be separated using h.view(num_layers, num_directions, batch_size, hidden_size), and similarly for c.

Return type

PackedSequence, (Tensor, Tensor)
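
A minimal usage sketch, assuming the padded batch is first packed with torch.nn.utils.rnn.pack_padded_sequence (lengths in descending order) and unpacked afterwards:

>>> import torch
>>> from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence
>>> from diaparser.modules import LSTM
>>> lstm = LSTM(input_size=100, hidden_size=300, num_layers=3, bidirectional=True, dropout=0.33)
>>> x = torch.randn(2, 10, 100)                                  # [batch_size, seq_len, input_size]
>>> packed = pack_padded_sequence(x, [10, 7], batch_first=True)  # lengths sorted in descending order
>>> out, (h, c) = lstm(packed)                                   # h, c: [num_layers*2, 2, 300]
>>> seq, lens = pad_packed_sequence(out, batch_first=True)       # seq: [2, 10, 600]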

CharLSTM

class diaparser.modules.CharLSTM(n_chars, n_word_embed, n_out, pad_index=0)[source]

CharLSTM aims to generate character-level embeddings for tokens. It summarizes the information of the characters in each token into a single embedding using an LSTM layer.

Parameters
  • n_chars (int) – The number of characters.

  • n_word_embed (int) – The size of each character embedding vector used as input to the LSTM.

  • n_out (int) – The size of each output vector.

  • pad_index (int) – The index of the padding token in the vocabulary. Default: 0.

forward(x)[source]
Parameters

x (Tensor) – [batch_size, seq_len, fix_len]. Characters of all tokens. Each token holds at most fix_len characters; any excess is truncated.

Returns

The embeddings of shape [batch_size, seq_len, n_out] derived from the characters.

Return type

Tensor
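
A minimal usage sketch. Positional arguments are used below to avoid relying on exact keyword names; the random character indices only illustrate the expected shapes and stay clear of the padding index 0.

>>> import torch
>>> from diaparser.modules import CharLSTM
>>> char_lstm = CharLSTM(100, 50, 100)      # 100 characters, 50-dim char embeddings, 100-dim output
>>> x = torch.randint(1, 100, (2, 10, 20))  # [batch_size, seq_len, fix_len] character indices
>>> h = char_lstm(x)                        # expected shape: [2, 10, 100]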

Dropout

class diaparser.modules.IndependentDropout(p=0.5)[source]

For \(N\) input tensors, a different dropout mask is applied to each. When \(N-M\) of them are dropped, the remaining \(M\) are scaled by a factor of \(N/M\) to compensate; when all of them are dropped, zeros are returned.

Parameters

p (float) – The probability of an element to be zeroed. Default: 0.5.

Examples

>>> x, y = torch.ones(1, 3, 5), torch.ones(1, 3, 5)
>>> x, y = IndependentDropout()(x, y)
>>> x
tensor([[[1., 1., 1., 1., 1.],
         [0., 0., 0., 0., 0.],
         [2., 2., 2., 2., 2.]]])
>>> y
tensor([[[1., 1., 1., 1., 1.],
         [2., 2., 2., 2., 2.],
         [0., 0., 0., 0., 0.]]])
forward(*items)[source]
Parameters

items (list[Tensor]) – A list of tensors that have the same shape except for the last dimension.

Returns

The returned tensors are of the same shape as items.

class diaparser.modules.SharedDropout(p=0.5, batch_first=True)[source]

SharedDropout differs from the vanilla dropout strategy in that the dropout mask is shared across one dimension.

Parameters
  • p (float) – The probability of an element to be zeroed. Default: 0.5.

  • batch_first (bool) – If True, the input and output tensors are provided as [batch_size, seq_len, *]. Default: True.

Examples

>>> x = torch.ones(1, 3, 5)
>>> nn.Dropout()(x)
tensor([[[0., 2., 2., 0., 0.],
         [2., 2., 0., 2., 2.],
         [2., 2., 2., 2., 0.]]])
>>> SharedDropout()(x)
tensor([[[2., 0., 2., 0., 2.],
         [2., 0., 2., 0., 2.],
         [2., 0., 2., 0., 2.]]])
forward(x)[source]
Parameters

x (Tensor) – A tensor of any shape.

Returns

The returned tensor is of the same shape as x.

MLP

class diaparser.modules.MLP(n_in, n_out, dropout=0)[source]

Applies a linear transformation followed by a LeakyReLU activation to the incoming tensor: \(y = \mathrm{LeakyReLU}(x A^T + b)\)

Parameters
  • n_in (int) – The size of each input feature.

  • n_out (int) – The size of each output feature.

  • dropout (float) – If non-zero, introduce a SharedDropout layer on the output with this dropout ratio. Default: 0.

forward(x)[source]
Parameters

x (Tensor) – The input tensor, whose last dimension has size n_in.

Returns

A tensor whose last dimension has size n_out.
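
A minimal usage sketch, assuming the transformation is applied along the last dimension (as with torch.nn.Linear), so any leading dimensions are preserved:

>>> import torch
>>> from diaparser.modules import MLP
>>> mlp = MLP(n_in=500, n_out=100, dropout=0.33)
>>> x = torch.randn(2, 10, 500)   # [batch_size, seq_len, n_in]
>>> y = mlp(x)                    # expected shape: [2, 10, 100]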