On Multiplicative Integration with Recurrent Neural Networks

Yuhuai Wu, Saizheng Zhang, Ying Zhang, Yoshua Bengio, Ruslan Salakhutdinov
2016

Abstract

We introduce a general and simple structural design called Multiplicative Integration (MI) to improve recurrent neural networks (RNNs). MI changes the way in which information from different sources flows and is integrated in the computational building block of an RNN, while introducing almost no extra parameters. The new structure can be easily embedded into many popular RNN models, including LSTMs and GRUs. We empirically analyze its learning behaviour and conduct evaluations on several tasks using different RNN models. Our experimental results demonstrate that Multiplicative Integration can provide a substantial performance boost over many of the existing RNN models.
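For intuition, a minimal sketch (not from the paper's code release): the vanilla MI building block swaps the additive term Wx + Uh inside an RNN update for the Hadamard product Wx ⊙ Uh, leaving the parameter count essentially unchanged. In PyTorch, with illustrative dimensions:

```python
import torch

def additive_rnn_step(x, h, W, U, b):
    # Standard RNN integration: phi(W x + U h + b)
    return torch.tanh(x @ W.T + h @ U.T + b)

def mi_rnn_step(x, h, W, U, b):
    # Multiplicative Integration (vanilla form): phi((W x) * (U h) + b),
    # where * is the elementwise (Hadamard) product. Parameter count is
    # unchanged relative to the additive cell.
    return torch.tanh((x @ W.T) * (h @ U.T) + b)

# Illustrative shapes for a quick sanity check
x = torch.randn(4, 32)    # batch 4, input dim 32
h = torch.randn(4, 64)    # batch 4, hidden dim 64
W = torch.randn(64, 32)
U = torch.randn(64, 64)
b = torch.zeros(64)
assert mi_rnn_step(x, h, W, U, b).shape == (4, 64)
```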


Code References

▶ onnx/onnx
1 file
▶ docs/proposals/NLPinONNXproposal.md
Once we move beyond the domain of standard LSTM and GRU operations, we need a more generic abstraction onto which we can map NLP architectures. A simple example is how one can implement a Multiplicative Integration LSTM (MI-LSTM, https://arxiv.org/pdf/1606.06630.pdf) in ONNX. We can expose a standard LSTMCell via the proposed Function abstraction (https://github.com/onnx/onnx/issues/481). Building on top of this, we can construct an MI-LSTM by applying the required second-order transformations to the inputs to the LSTMCell. Once we have this aggregated implementation, we can use the generic control flow operators (https://github.com/onnx/onnx/pull/436) to apply this “composite” MI-LSTM cell over a sequence.
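For a rough sense of what such a composite cell computes, here is a minimal sketch of one MI-LSTM step in plain PyTorch rather than ONNX, using the general MI form from Section 2.1 of the paper. All parameter names are illustrative (not ONNX operator names), and the Python loop stands in for ONNX's control-flow operators:

```python
import torch

def mi_lstm_cell(x, h, c, w_ih, w_hh, alpha, beta_i, beta_h, bias):
    """One MI-LSTM step (general MI form, Sec. 2.1 of the paper).

    w_ih: (4*hidden, input), w_hh: (4*hidden, hidden);
    alpha, beta_i, beta_h, bias: (4*hidden,) gating vectors.
    """
    Wx = x @ w_ih.T   # input projection for all four gates
    Uh = h @ w_hh.T   # recurrent projection for all four gates
    # MI pre-activations: alpha*(Wx . Uh) + beta_i*Wx + beta_h*Uh + b
    gates = alpha * Wx * Uh + beta_i * Wx + beta_h * Uh + bias
    i, f, g, o = gates.chunk(4, dim=1)
    c_next = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
    h_next = torch.sigmoid(o) * torch.tanh(c_next)
    return h_next, c_next

def mi_lstm_scan(xs, h, c, params):
    # Applying the "composite" cell over a sequence; ONNX would express
    # this loop with its generic control-flow operators.
    for x in xs.unbind(0):  # xs: (seq_len, batch, input)
        h, c = mi_lstm_cell(x, h, c, *params)
    return h, c
```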
▶ pytorch/pytorch
1 file
▶ benchmarks/fastrnns/cells.py
# Section 2.1 in https://arxiv.org/pdf/1606.06630.pdf