MIT presents a new approach for sequence-to-sequence learning with latent neural grammars


Sequence-to-sequence (seq2seq) modeling with neural networks has become the de facto standard for sequence prediction tasks such as those found in language modeling and machine translation. The basic idea is to use an encoder to transform the input sequence into a context representation, then use a decoder to generate the output sequence one token at a time, conditioned on that representation.
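For readers unfamiliar with the setup, here is a minimal encoder-decoder sketch in PyTorch; the TinySeq2Seq class, its dimensions and the random toy batch are illustrative assumptions, not anything taken from the paper.

```python
# Minimal encoder-decoder sketch (PyTorch). Hyperparameters and the toy batch
# below are illustrative only.
import torch
import torch.nn as nn

class TinySeq2Seq(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, dim=64):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True)
        self.decoder = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, tgt_vocab)

    def forward(self, src, tgt_in):
        # The encoder compresses the source into a context vector (its final hidden state).
        _, context = self.encoder(self.src_emb(src))
        # The decoder generates the target token by token, conditioned on that context.
        dec_states, _ = self.decoder(self.tgt_emb(tgt_in), context)
        return self.out(dec_states)  # next-token logits at each target position

model = TinySeq2Seq(src_vocab=100, tgt_vocab=100)
src = torch.randint(0, 100, (2, 7))     # batch of source token ids
tgt_in = torch.randint(0, 100, (2, 5))  # shifted target tokens (teacher forcing)
logits = model(src, tgt_in)
print(logits.shape)  # torch.Size([2, 5, 100])
```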

Despite their impressive power and performance, seq2seq models are often sample-inefficient. Additionally, due to their relatively weak inductive biases, these models can fail dramatically on benchmarks designed to test compositional generalization.

The new MIT CSAIL paper Sequence-to-Sequence Learning with Latent Neural Grammars proposes an alternative, hierarchical approach to seq2seq learning based on quasi-synchronous grammars, developing a neural parameterization of the grammar that enables parameter sharing over the combinatorial space of derivation rules without manual feature engineering.

The paper identifies three ways in which the proposed approach differs from previous work in this area:

  1. We model the distribution over the target sequence with a quasi-synchronous grammar, which assumes a hierarchical generative process in which each node of the target tree is transduced by nodes of the source tree.
  2. Unlike the existing line of work on incorporating (often observed) tree structures into sequence modeling with neural networks, we treat both source and target trees as fully latent and induce them during training.
  3. Whereas previous work on synchronous grammars generally used log-linear models over handcrafted or pipelined features, we use a neural parameterization of grammar rule probabilities, which allows efficient parameter sharing over the combinatorial space of derivation rules without the need for smoothing or feature engineering.

Quasi-synchronous grammars define a monolingual grammar over target strings conditioned on a source tree, where the grammar's rule set depends dynamically on the source tree. This work employs probabilistic quasi-synchronous context-free grammars (QCFGs), which generate the target tree by aligning each target tree node to a subset of source tree nodes, making them well suited to tasks where syntactic divergence between source and target is common.
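As a rough illustration of what this means for the rule space, here is a tiny, purely hypothetical Python sketch; the constituent labels, nonterminal set and rule format are invented for exposition and do not reproduce the paper's formalism. The point is that once every target-side nonterminal carries an aligned source-tree node, the number of possible binary rules multiplies out combinatorially.

```python
# Illustrative sketch: QCFG-style rules indexed by source-tree nodes.
# Node names and the rule format are hypothetical, chosen only to show
# that pairing nonterminals with source nodes blows up the rule space.
from itertools import product

source_nodes = ["NP:the-dog", "VP:chased-the-cat", "S:root"]  # toy source constituents
nonterminals = ["S", "NT"]

# Each binary rule rewrites a (nonterminal, source node) pair into two such pairs.
rules = [
    ((A, a), (B, b), (C, c))
    for (A, B, C) in product(nonterminals, repeat=3)
    for (a, b, c) in product(source_nodes, repeat=3)
]
print(len(rules))  # 2**3 * 3**3 = 216 binary rules, even for this tiny example
```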

Moreover, such a grammar does not need to implicitly capture hierarchical structure in the hidden layers of a neural network; rather, it explicitly models hierarchical structure on both the source and target sides, resulting in a more interpretable generation process.

As each source tree node typically appears in many rules and many times across the training corpus, parameter sharing is essential. While previous work on QCFGs relied on intensive manual feature engineering to share parameters across rules, this approach instead uses a neural parameterization to enable efficient parameter sharing over the combinatorial space of derivation rules.
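Below is a minimal sketch of this idea in PyTorch. The symbol inventory, embedding sizes and the particular scoring function are assumptions made for illustration, and the paper's actual parameterization differs in detail; what the sketch shows is that each candidate rule's probability is computed from embeddings of the symbols it contains, so parameters live in the shared embeddings and scorer rather than in a separate weight per rule.

```python
# Sketch of a neural parameterization of rule probabilities: rule scores are
# computed from symbol embeddings, so parameters are shared across the
# combinatorial rule space. Dimensions and the scoring function are assumed
# for illustration, not the paper's exact model.
import torch
import torch.nn as nn

num_nonterminals, num_source_nodes, dim = 8, 20, 32
nt_emb = nn.Embedding(num_nonterminals, dim)    # embeddings for target nonterminals
node_emb = nn.Embedding(num_source_nodes, dim)  # embeddings for source-tree nodes
scorer = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))

def rule_log_probs(parent_nt, parent_node, child_nts, child_nodes):
    """Score all candidate (child nonterminal, source node) expansions of one parent."""
    parent = torch.cat([nt_emb(parent_nt), node_emb(parent_node)], dim=-1)    # (2*dim,)
    children = torch.cat([nt_emb(child_nts), node_emb(child_nodes)], dim=-1)  # (K, 2*dim)
    scores = scorer(parent + children).squeeze(-1)  # one score per candidate rule
    return torch.log_softmax(scores, dim=-1)        # normalized log-probabilities

parent_nt, parent_node = torch.tensor(0), torch.tensor(3)
child_nts = torch.arange(num_nonterminals)          # enumerate candidate child symbols
child_nodes = torch.randint(0, num_source_nodes, (num_nonterminals,))
print(rule_log_probs(parent_nt, parent_node, child_nts, child_nodes).shape)  # torch.Size([8])
```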

For evaluation, the proposed approach was applied to a variety of seq2seq learning tasks: the SCAN language navigation task designed to test compositional generalization, style transfer on StylePTB (a benchmark built from the English Penn Treebank), and small-scale English-French machine translation.
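To give a sense of what SCAN looks like, here are a few command-to-action pairs in the dataset's style; the specific examples are chosen for illustration.

```python
# SCAN-style command/action pairs (illustrative examples in the dataset's format).
# Compositional-generalization splits hold out novel combinations, e.g. "jump"
# composed with modifiers that were only seen with other verbs during training.
scan_pairs = [
    ("jump", "I_JUMP"),
    ("jump twice", "I_JUMP I_JUMP"),
    ("walk left", "I_TURN_LEFT I_WALK"),
    ("walk and run", "I_WALK I_RUN"),
]
for command, actions in scan_pairs:
    print(f"{command!r} -> {actions}")
```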

In the experiments, the proposed approach performed respectably on smaller-scale datasets such as SCAN and StylePTB, but underperformed a well-tuned transformer on the machine translation task.

Overall, the study shows that the quasi-synchronous grammar formalism provides a flexible way to impose inductive biases, operationalize constraints and interface with other models. The paper suggests that future work in this area could revisit richer grammatical formalisms with contemporary parameterizations, condition on images or audio for grounded grammar induction, adapt the approach to programs and graphics, and integrate grammars and symbolic models with pretrained language models to tackle practical tasks.

The paper Sequence-to-Sequence Learning with Latent Neural Grammars is on arXiv.


Author: Hecate He | Editor: Michael Sarazen, Chain Zhang


We know you don’t want to miss any news or research breakthroughs. Subscribe to our popular newsletter Synced Global AI Weekly to get weekly AI updates.

