LSTM Neural Networks: The Basic Concept

A High-Level Introduction to Long Short-Term Memory Neural Networks


Photo by Alina Grubnyak on Unsplash

Predicting the future was once a thing of speculation and mystery. Thanks to human advancement, it has become a task limited only by the amount and depth of available data.

And as we live in a society that continuously generates data at an exponential rate, this task of foresight is becoming more accessible.


The deeper you look into data-driven prediction, the more certain the term LSTM is to rear its confusing head. As with many tech concepts, it is an acronym, and it stands for Long Short-Term Memory.

Simply stated, it is a Neural Network — a machine learning system meant to emulate human learning patterns — that is able to “remember” previous data and conclusions, and use them to reach a final conclusion more accurately.

“… LSTM holds promise for any sequential processing task in which we suspect that a hierarchical decomposition may exist, but do not know in advance what this decomposition is.”
 — Felix A. Gers, et al., Learning to Forget: Continual Prediction with LSTM, 2000

LSTM is a type of Recurrent Neural Network in Deep Learning, developed specifically to handle sequence prediction problems, for example (a minimal code sketch follows the list below):

  • Weather Forecasting
  • Stock Market Prediction
  • Product Recommendation
  • Text/Image/Handwriting Generation
  • Text Translation
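To make that concrete, here is a minimal sketch of what an LSTM-based sequence predictor can look like in Keras. The layer sizes, input shape, and loss below are illustrative assumptions, not a recipe for any particular task above:

```python
# A minimal Keras sketch of an LSTM sequence predictor.
# All shapes and hyperparameters are illustrative assumptions.
from tensorflow import keras

model = keras.Sequential([
    keras.layers.LSTM(32, input_shape=(10, 1)),  # 32 memory cells; 10 timesteps, 1 feature each (assumed)
    keras.layers.Dense(1),                       # predict a single next value
])
model.compile(optimizer="adam", loss="mse")
model.summary()
```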

Need a refresher on Neural Networks as a whole?

Everything you need to know about Neural Networks
Courtesy: Kailash Ahirwar (Co-Founder & CTO, Mate Labs), hackernoon.com

“Since LSTMs are effective at capturing long-term temporal dependencies without suffering from the optimization hurdles that plague simple recurrent networks (SRNs), they have been used to advance the state of the art for many difficult problems. This includes handwriting recognition and generation, language modeling and translation, acoustic modeling of speech, speech synthesis, protein secondary structure prediction, analysis of audio, and video data among others.”
 — Klaus Greff, et al., LSTM: A Search Space Odyssey, 2015

Like other Neural Networks, LSTMs contain neurons that perform computation; in an LSTM, however, these are often referred to as memory cells, or simply cells. Each cell contains weights and gates, the gates being the distinguishing feature of LSTM models. There are 3 gates inside every cell: the input gate, the forget gate, and the output gate.

Photo Credit: Aleia Knight


— Important Variables —

Photo Credit: Aleia Knight


— LSTM Gates —

The Cell State

Photo Credit: Aleia Knight

The cell state is sort of like a conveyor belt that moves data along through the cell. While it is not technically a gate, it is crucial for carrying data through each individual cell as well as on to other cells. The data flowing through it is altered and updated according to the results of the forget and input gates, then passed to the next cell.
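In code, that conveyor-belt update is a single line. Here is a tiny NumPy sketch using the standard LSTM notation, where f_t and i_t are the forget and input gate outputs described below and g_t is the candidate information vector:

```python
import numpy as np

def update_cell_state(c_prev, f_t, i_t, g_t):
    """Conveyor-belt update: scale down old memory, mix in new memory.
    f_t, i_t are gate outputs in [0, 1]; g_t is a tanh candidate in [-1, 1]."""
    return f_t * c_prev + i_t * g_t

# Toy example with made-up values:
c_prev = np.array([0.5, -1.2])
c_t = update_cell_state(c_prev,
                        f_t=np.array([0.9, 0.1]),   # keep most of slot 1, forget slot 2
                        i_t=np.array([0.3, 0.8]),   # add a little / a lot of new info
                        g_t=np.array([0.7, -0.4]))  # candidate values
print(c_t)
```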

The Forget Gate

Photo Credit: Aleia Knight

This gate removes unneeded information before it merges with the cell state, just as humans choose not to consider events or information that are unrelated or unnecessary to the decision at hand.

It takes in 2 inputs: new information (x_t) and the previous cell’s output (h_t-1). It runs these inputs through a sigmoid function to filter out unneeded data, then merges the result with the cell state via multiplication.
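As a NumPy sketch of the standard formulation (W_f and b_f here are assumed names for this gate’s weight matrix and bias):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forget_gate(x_t, h_prev, W_f, b_f):
    """Score each cell-state element between 0 (forget) and 1 (keep)."""
    z = np.concatenate([h_prev, x_t])  # the gate's two inputs, combined
    return sigmoid(W_f @ z + b_f)

# Its output f_t then multiplies into the cell state: c = f_t * c_prev
```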

The Input Gate

Photo Credit: Aleia Knight

This gate adds information to the cell state. The human equivalent is considering newly presented information on top of the information you already have.

Similar to the forget gate, it employs a sigmoid function to determine how much of the new information should be kept. It uses the tanh function to create a vector of candidate information to be added. It then multiplies the sigmoid and tanh results together and merges the useful information into the cell state via addition.
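A matching NumPy sketch (again with assumed weight names, W_i and b_i for the sigmoid part, W_g and b_g for the tanh part):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def input_gate(x_t, h_prev, W_i, b_i, W_g, b_g):
    """Build the new information to be added onto the cell state."""
    z = np.concatenate([h_prev, x_t])
    i_t = sigmoid(W_i @ z + b_i)  # how much of each candidate value to keep
    g_t = np.tanh(W_g @ z + b_g)  # candidate information vector
    return i_t * g_t              # merged into the state: c = f_t * c_prev + i_t * g_t
```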

At this point, all information has been set up: starting information, new information, and the dropping of unneeded information. Everything is gathered and compiled, and a decision is ready to be made.

The Output Gate

Photo Credit: Aleia Knight

The last gate selects useful information based on the cell state, the previous cell’s output, and the new data. It takes the cell state, after the input and forget gates have merged into it, and runs it through a tanh function to create a vector. It then takes the new data and the previous cell’s output and runs them through a sigmoid function to determine which values should be output. The results of those 2 operations are multiplied and returned as this cell’s output.
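And the final piece, sketched the same way (W_o and b_o are assumed names for this gate’s weights):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def output_gate(x_t, h_prev, c_t, W_o, b_o):
    """Produce this cell's output h_t from the updated cell state."""
    z = np.concatenate([h_prev, x_t])
    o_t = sigmoid(W_o @ z + b_o)  # which parts of the state to reveal
    return o_t * np.tanh(c_t)     # h_t, passed on as this cell's output
```

Chaining the three gate functions together with the cell-state update gives one full step of an LSTM cell; running that step over every element of a sequence is what the diagrams above unroll.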

For more on Activation Functions (tanh and sigmoid):

Activation Function
“An activation function is a function used in artificial neural networks which outputs a small value for small inputs…” (deepai.org)

This entire process happens inside a single cell. In an actual model, though, there can be any number of cells per layer, across however many layers are added, before a final conclusion is reached.

And then that entire model is run again for however many epochs (iterations) are needed to converge on a more accurate answer. The better the accuracy, the better the prediction.
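Continuing the earlier Keras sketch, the epoch count is just a training argument; the data below is random noise purely for illustration:

```python
import numpy as np

# Reusing `model` from the earlier sketch; real use would supply actual sequences.
X = np.random.rand(100, 10, 1)  # 100 sequences of 10 timesteps, 1 feature each
y = np.random.rand(100, 1)      # one target value per sequence
model.fit(X, y, epochs=20)      # each epoch is one full pass over the data
```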


As humans, we perform this process constantly and at an amazingly fast rate. Even as far back as learning to walk, we looked back at what we did wrong, looked at what other people did right, and adjusted from that. The LSTM process, like other Neural Networks, is meant to emulate the human mind.

The difference is its computational prowess.

Humans are highly intelligent creatures; that is how we made it this far. But the very machines we have built can outpace us, especially in terms of raw mathematical and scientific speed.

LSTM models are able to look back at previous data and decisions and make new decisions from them. But they are also able to use that same process to make educated guesses, or predictions, about what can happen next. That is why this model works best when fed sequential data: it finds trends and uses those trends to predict future results.

Can we do that? Yes

But perhaps not with the same level of accuracy. Let alone taking in millions of data points as input.

So you see, we can predict the future and we don’t even need a fortune teller to do so.

Photo by Michael Dziedzic on Unsplash

Simply a powerful computer, some data, and a bit of math!


Enjoyed the read? Comment to let me know what you think on the topic, and follow to get more articles on Machine Learning, Data Science, STEM, and Career/Personal Development.
