The Legendre Memory Unit (LMU) is a recurrent network that can learn temporal dependencies spanning millions of timesteps, far beyond the reach of LSTMs. LMUs scale up to transformer-sized models while requiring only O(N) compute and O(1) memory, and they need roughly 10x less training time (and data) to match a transformer's accuracy.
LMUs maintain efficient, scale-invariant representations of recent inputs and learn to solve real-world problems from those representations. They can be implemented with standard deep learning frameworks on hardware you already have, or deployed on neuromorphic hardware or neural accelerators for substantial power savings. Using LMUs, we and others have obtained the best-known results on a variety of benchmarks.
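The "scale-invariant representation of recent inputs" refers to the LMU's linear memory, which compresses a sliding window of length theta into a few Legendre-polynomial coefficients. Below is a minimal NumPy sketch of that memory, following the equations published by Voelker, Kajić & Eliasmith (2019); it is an illustration under our own assumptions (function names, simple Euler discretization), not the product's implementation.

```python
import numpy as np


def lmu_matrices(d, theta):
    """Continuous-time (A, B) of the LMU's linear memory for a sliding
    window of length theta (Voelker et al., 2019):
      A[i, j] = (2i + 1)/theta * (-1 if i < j else (-1)**(i - j + 1))
      B[i]    = (2i + 1)/theta * (-1)**i
    """
    q = np.arange(d)
    r = (2 * q + 1)[:, None] / theta
    i, j = np.meshgrid(q, q, indexing="ij")
    A = np.where(i < j, -1.0, (-1.0) ** (i - j + 1)) * r
    B = ((-1.0) ** q)[:, None] * r
    return A, B


def run_memory(u, d, theta, dt=1.0):
    """Integrate dm/dt = A m + B u over the input sequence u with a
    simple Euler step (adequate when dt << theta). Note the O(1) state:
    only the d-dimensional vector m is carried between timesteps."""
    A, B = lmu_matrices(d, theta)
    m = np.zeros((d, 1))
    states = []
    for u_t in u:
        m = m + dt * (A @ m + B * u_t)
        states.append(m.ravel().copy())
    return np.array(states)  # shape (len(u), d)


def decode_delay(states, d, delay_frac):
    """Reconstruct u(t - delay_frac * theta) from the memory state using
    shifted Legendre polynomials P_i(2x - 1) evaluated at x = delay_frac."""
    x = 2.0 * delay_frac - 1.0
    P = np.array(
        [np.polynomial.legendre.legval(x, np.eye(d)[i]) for i in range(d)]
    )
    return states @ P


# Demo: remember a sine wave and read it back delayed by theta/2 steps.
theta, d = 100.0, 12
t = np.arange(500)
u = np.sin(0.05 * t)
states = run_memory(u, d, theta)
est = decode_delay(states, d, 0.5)          # estimate of u(t - 50)
true = np.sin(0.05 * (t - 50))
err = float(np.max(np.abs(est[200:] - true[200:])))
```

Because the state is a fixed d-dimensional vector regardless of sequence length, compute per step is constant and total compute grows linearly with sequence length, which is the O(N)/O(1) scaling claimed above.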