Modules

Affine

class diaparser.modules.Biaffine(n_in, n_out=1, bias_x=True, bias_y=True)[source]
extra_repr()[source]

Set the extra representation of the module.

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

forward(x, y)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
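
A minimal usage sketch (the shapes and the exact score layout below are assumptions based on a typical biaffine attention scorer, not guarantees of this API): the module scores all pairs of positions between two input sequences, e.g., head-dependent arc scores in a parser.

>>> import torch
>>> from diaparser.modules import Biaffine
>>> x = torch.randn(2, 10, 500)    # [batch_size, seq_len, n_in]
>>> y = torch.randn(2, 10, 500)    # [batch_size, seq_len, n_in]
>>> biaffine = Biaffine(n_in=500)  # n_out=1, bias_x=True, bias_y=True
>>> s = biaffine(x, y)             # pairwise scores, expected [2, 10, 10] when n_out=1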

BertEmbedding

class diaparser.modules.BertEmbedding(model, n_layers, n_out, stride=5, pad_index=0, dropout=0, requires_grad=False, mask_token_id=0, token_dropout=0.0, mix_dropout=0.0, use_hidden_states=True, use_attentions=False, attention_head=0, attention_layer=8)[source]

A module that directly utilizes the pretrained models in transformers to produce BERT representations. While mainly tailored to provide input preparation and post-processing for the BERT model, it is also compatible with other pretrained language models such as XLNet, RoBERTa and ELECTRA.

Parameters
  • model (str) – Path or name of the pretrained models registered in transformers, e.g., 'bert-base-cased'.

  • n_layers (int) – The number of layers from the model to use. If 0, uses all layers.

  • n_out (int) – The requested size of the embeddings. If 0, uses the size of the pretrained embedding model.

  • stride (int) – A sequence longer than the maximum allowed length will be split into several small pieces, with a window of size stride. Default: 5.

  • pad_index (int) – The index of the padding token in the BERT vocabulary. Default: 0.

  • dropout (float) – The dropout ratio of BERT layers. Default: 0. This value will be passed into the ScalarMix layer.

  • requires_grad (bool) – If True, the model parameters will be updated together with the downstream task. Default: False.

forward(subwords)[source]
Parameters

subwords (Tensor) – Subword indices of shape [batch_size, seq_len, fix_len].

Returns

BERT embeddings of shape [batch_size, seq_len, n_out].

Return type

Tensor
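
A minimal usage sketch. The random subword indices below are placeholders that only illustrate the expected shapes; in practice they would come from the transformers tokenizer matching the chosen model, and loading 'bert-base-cased' requires the pretrained weights to be available.

>>> import torch
>>> from diaparser.modules import BertEmbedding
>>> embed = BertEmbedding('bert-base-cased', n_layers=4, n_out=100)
>>> subwords = torch.randint(1, 1000, (2, 10, 5))  # [batch_size, seq_len, fix_len], placeholder indices
>>> h = embed(subwords)                            # expected shape: [2, 10, 100]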

class diaparser.modules.ScalarMix(n_layers: int, dropout: float = 0.0)[source]

Computes a parameterised scalar mixture of \(N\) tensors, \(\mathrm{mixture} = \gamma \sum_{k} s_k \cdot \mathrm{tensor}_k\), where \(s = \mathrm{softmax}(w)\), with \(w\) and \(\gamma\) scalar parameters.

Parameters
  • n_layers (int) – The number of layers to be mixed, i.e., \(N\).

  • dropout (float) – The dropout ratio of the layer weights. If dropout > 0, then each scalar weight's softmax mass is set to 0 with probability dropout (i.e., its unnormalized weight is set to -inf), which effectively redistributes the dropped probability mass to all other weights. Default: 0.

forward(tensors)[source]
Parameters

tensors (list[Tensor]) – \(N\) tensors to be mixed.

Returns

The mixture of \(N\) tensors.
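
A minimal usage sketch, assuming the tensors to be mixed all share the same shape (e.g., the hidden states of several BERT layers):

>>> import torch
>>> from diaparser.modules import ScalarMix
>>> mix = ScalarMix(n_layers=4)
>>> tensors = [torch.randn(2, 10, 768) for _ in range(4)]  # e.g. hidden states of 4 layers
>>> out = mix(tensors)                                     # same shape as each input: [2, 10, 768]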

LSTM

class diaparser.modules.LSTM(input_size, hidden_size, num_layers=1, bidirectional=False, dropout=0)[source]

LSTM is a variant of the vanilla bidirectional LSTM adopted by the Biaffine Parser, differing only in its dropout strategy: it drops nodes in the LSTM layers (both input and recurrent connections) and applies the same dropout mask at every recurrent timestep.

The API is roughly the same as torch.nn.LSTM, except that only PackedSequence inputs are accepted.

Parameters
  • input_size (int) – The number of expected features in the input.

  • hidden_size (int) – The number of features in the hidden state h.

  • num_layers (int) – The number of recurrent layers. Default: 1.

  • bidirectional (bool) – If True, becomes a bidirectional LSTM. Default: False.

  • dropout (float) – If non-zero, introduces a SharedDropout layer on the outputs of each LSTM layer except the last layer. Default: 0.

forward(sequence, hx=None)[source]
Parameters
  • sequence (PackedSequence) – A packed variable length sequence.

  • hx (Tensor, Tensor) – A tuple composed of two tensors h and c. h of shape [num_layers*num_directions, batch_size, hidden_size] holds the initial hidden state for each element in the batch. c of shape [num_layers*num_directions, batch_size, hidden_size] holds the initial cell state for each element in the batch. If hx is not provided, both h and c default to zero. Default: None.

Returns

The first is a packed variable-length sequence holding the output features. The second is a tuple of tensors h and c: h of shape [num_layers*num_directions, batch_size, hidden_size] holds the hidden state for t = seq_len, and c of the same shape holds the cell state for t = seq_len. As with the output, the layers can be separated using h.view(num_layers, num_directions, batch_size, hidden_size), and similarly for c.

Return type

PackedSequence, (Tensor, Tensor)
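
A minimal usage sketch, assuming the padded batch is first packed with torch.nn.utils.rnn.pack_padded_sequence (lengths in descending order) and unpacked afterwards:

>>> import torch
>>> from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence
>>> from diaparser.modules import LSTM
>>> lstm = LSTM(input_size=100, hidden_size=300, num_layers=3, bidirectional=True, dropout=0.33)
>>> x = torch.randn(2, 10, 100)                                  # [batch_size, seq_len, input_size]
>>> packed = pack_padded_sequence(x, [10, 7], batch_first=True)  # lengths sorted in descending order
>>> out, (h, c) = lstm(packed)                                   # h, c: [num_layers*2, 2, 300]
>>> seq, lens = pad_packed_sequence(out, batch_first=True)       # seq: [2, 10, 600]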

CharLSTM

class diaparser.modules.CharLSTM(n_chars, n_word_embed, n_out, pad_index=0)[source]

CharLSTM aims to generate character-level embeddings for tokens. It summarizes the information of the characters in each token into a single embedding using an LSTM layer.

Parameters
  • n_chars (int) – The number of characters.

  • n_word_embed (int) – The size of each character embedding vector used as input to the LSTM.

  • n_out (int) – The size of each output vector.

  • pad_index (int) – The index of the padding token in the vocabulary. Default: 0.

forward(x)[source]
Parameters

x (Tensor) – [batch_size, seq_len, fix_len]. Characters of all tokens. Each token holds at most fix_len characters; any excess is truncated.

Returns

The embeddings of shape [batch_size, seq_len, n_out] derived from the characters.

Return type

Tensor
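
A minimal usage sketch. Positional arguments are used below to avoid relying on exact keyword names; the random character indices only illustrate the expected shapes and stay clear of the padding index 0.

>>> import torch
>>> from diaparser.modules import CharLSTM
>>> char_lstm = CharLSTM(100, 50, 100)      # 100 characters, 50-dim char embeddings, 100-dim output
>>> x = torch.randint(1, 100, (2, 10, 20))  # [batch_size, seq_len, fix_len] character indices
>>> h = char_lstm(x)                        # expected shape: [2, 10, 100]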

Dropout

class diaparser.modules.IndependentDropout(p=0.5)[source]

For \(N\) input tensors, a different dropout mask is applied to each. When \(N-M\) of them are dropped, the remaining \(M\) are scaled by a factor of \(N/M\) to compensate; when all of them are dropped, zeros are returned.

Parameters

p (float) – The probability of an element to be zeroed. Default: 0.5.

Examples

>>> x, y = torch.ones(1, 3, 5), torch.ones(1, 3, 5)
>>> x, y = IndependentDropout()(x, y)
>>> x
tensor([[[1., 1., 1., 1., 1.],
         [0., 0., 0., 0., 0.],
         [2., 2., 2., 2., 2.]]])
>>> y
tensor([[[1., 1., 1., 1., 1.],
         [2., 2., 2., 2., 2.],
         [0., 0., 0., 0., 0.]]])
forward(*items)[source]
Parameters

items (list[Tensor]) – A list of tensors that have the same shape except for the last dimension.

Returns

The returned tensors are of the same shape as items.

class diaparser.modules.SharedDropout(p=0.5, batch_first=True)[source]

SharedDropout differs from the vanilla dropout strategy in that the dropout mask is shared across one dimension.

Parameters
  • p (float) – The probability of an element to be zeroed. Default: 0.5.

  • batch_first (bool) – If True, the input and output tensors are provided as [batch_size, seq_len, *]. Default: True.

Examples

>>> x = torch.ones(1, 3, 5)
>>> nn.Dropout()(x)
tensor([[[0., 2., 2., 0., 0.],
         [2., 2., 0., 2., 2.],
         [2., 2., 2., 2., 0.]]])
>>> SharedDropout()(x)
tensor([[[2., 0., 2., 0., 2.],
         [2., 0., 2., 0., 2.],
         [2., 0., 2., 0., 2.]]])
forward(x)[source]
Parameters

x (Tensor) – A tensor of any shape.

Returns

The returned tensor is of the same shape as x.

MLP

class diaparser.modules.MLP(n_in, n_out, dropout=0)[source]

Applies a linear transformation followed by a LeakyReLU activation to the incoming tensor: \(y = \mathrm{LeakyReLU}(x A^T + b)\)

Parameters
  • n_in (int) – The size of each input feature.

  • n_out (int) – The size of each output feature.

  • dropout (float) – If non-zero, introduce a SharedDropout layer on the output with this dropout ratio. Default: 0.

forward(x)[source]
Parameters

x (Tensor) – The input tensor, whose last dimension has size n_in.

Returns

A tensor whose last dimension has size n_out.
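
A minimal usage sketch, assuming the transformation is applied along the last dimension (as with torch.nn.Linear), so any leading dimensions are preserved:

>>> import torch
>>> from diaparser.modules import MLP
>>> mlp = MLP(n_in=500, n_out=100, dropout=0.33)
>>> x = torch.randn(2, 10, 500)   # [batch_size, seq_len, n_in]
>>> y = mlp(x)                    # expected shape: [2, 10, 100]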