Dynamic Neural Networks: A Survey

Author and article information

Contributors

Yizeng Han

Gao Huang

Shiji Song

Le Yang

Honghui Wang

Yulin Wang

Journal

Title: IEEE Transactions on Pattern Analysis and Machine Intelligence

Abbreviated Title: IEEE Trans. Pattern Anal. Mach. Intell.

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

ISSN (Print): 0162-8828

ISSN (Electronic): 2160-9292

ISSN (Electronic): 1939-3539

Publication date (Created): November 1 2022

Publication date (Print): November 1 2022

Volume: 44

Issue: 11

Pages: 7436-7456

Affiliations

[1] Department of Automation, Tsinghua University, Beijing, China

Article

DOI: 10.1109/TPAMI.2021.3117837

PubMed ID: 34613907

SO-VID: 3ed8ded3-4a01-4b7e-84cb-6940d042329e

License:

https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html

https://doi.org/10.15223/policy-029

https://doi.org/10.15223/policy-037
