Deep learning is one of the hottest topics in today’s digital world. Everyone wants to know just how much potential this tool has to revolutionize technology, and whether it can deliver utopian-sounding advances such as intelligent autonomous systems (chatbots, self-driving cars, etc.). Before delving into the field, however, it helps to understand how far deep learning has come, what limitations currently hold its models back, and whether the hype is worth buying into. To keep this blog post simple, I will focus only on the chronological steps behind the emergence of deep learning and elaborate on the fundamental differences introduced at each stage.
Artificial Intelligence, Machine Learning, Deep Learning: What’s the Difference?
Although these phrases are frequently used interchangeably, they do not mean the same thing. The terms form a class hierarchy: everything classified as deep learning is also machine learning, and everything classified as machine learning is also artificial intelligence. The reverse, however, is not true: some subset of artificial intelligence isn’t machine learning, and some subset of machine learning isn’t deep learning. This is best understood with the visualization in the image below:

Artificial Intelligence
The term artificial intelligence was first coined in the 1950s, an era in which pioneers wanted to explore whether or not computers could “think”. It was categorized very broadly and came to be defined as “the effort to automate intellectual tasks normally performed by humans.” Aside from machine learning and deep learning, there are many subfields of artificial intelligence that involve no learning at all. For quite a long time, people thought that feeding a machine a sufficiently large system of hardcoded rules would enable it to attain human levels of intelligence. For example, early AI pioneers fed chess programs large sets of hardcoded instructions in hopes of reaching peak human chess performance. Systems of this type are commonly referred to as symbolic AI, and they do not qualify as machine learning. Symbolic AI was the dominant paradigm from the 1950s to the 1980s, peaking during the expert systems boom of the 1980s. Eventually, pioneers realized that this approach does not scale and breaks down on more complex, fuzzier problems such as image classification and language translation. Soon enough, this realization led to the emergence of machine learning.
Machine Learning
Charles Babbage’s Analytical Engine is often called the first general-purpose mechanical computer, yet learning was never part of its design. Lady Ada Lovelace herself remarked that the machine could only carry out tasks that humans already knew how to order it to perform; in other words, she implied that the Analytical Engine was incapable of originating anything on its own. Alan Turing later referenced this remark, under the name “Lady Lovelace’s objection”, in the very paper that introduced the Turing test. The question it raises eventually became the root problem that machine learning aims to solve: can a computer go beyond the instructions we know how to give it and independently learn how to fulfill a task?
Through this question, AI pioneers experienced a shift in programming paradigms and transitioned from classical programming to machine learning. In classical programming, a machine is fed a set of rules and some data, and it produces the answers. In machine learning, the machine is instead fed the data along with the expected answers, and it produces the rules, which can then be applied to new, unseen data to predict its output. That is to say, machines running machine learning are not given hardcoded solutions; they are trained to learn statistically based rules that allow them to predict solutions. The difference between the two paradigms is shown best in the image below:

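To make the contrast concrete, here is a minimal Python sketch. The temperature-conversion task, the data, and all the names are made up purely for illustration:

```python
import numpy as np

# Classical programming: we give the machine the rule and the data;
# it hands back the answers.
def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32  # the rule, hardcoded by a human

print(celsius_to_fahrenheit(100.0))  # 212.0

# Machine learning: we give the machine the data and the answers;
# it infers the rule statistically.
celsius = np.array([-40.0, 0.0, 20.0, 37.0, 100.0])
fahrenheit = celsius * 9 / 5 + 32  # the known answers (labels)

# Fit a linear rule to the (data, answer) pairs.
slope, intercept = np.polyfit(celsius, fahrenheit, deg=1)
print(round(slope, 2), round(intercept, 2))  # ~1.8 and ~32.0, recovered from data alone
print(slope * 100.0 + intercept)             # ~212.0 for a new, unseen input
```

Real machine learning problems, of course, involve rules far too complicated to write down by hand, which is exactly why learning them from data pays off.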
Although machine learning only started to flourish in the 1990s, it has since become the most popular subfield of AI, propelled by faster processing hardware and the huge volumes of data available across the Internet. While machine learning is related to statistical and mathematical principles, it differs from classical statistical analysis in that it routinely deals with datasets so massive and complicated that such analysis would be impractical. Hence, machine learning is more of a hands-on paradigm, usually built on empirical and experimental findings more so than theories.
Data Feature Learning
To be able to differentiate between deep learning and machine learning, we need to identify precisely what machine learning does. Let’s begin by discussing the general ingredients involved in a machine learning task:
- Input Features – these can be the pixels of an image, an encoding of text, or any other representation of the input data.
- Example Labels – these are what the machine will be trying to predict. For instance, in a vehicle classification problem, this would be the type of vehicle being predicted.
- Algorithm Performance Evaluation – this mechanism measures how well the model performs on the training examples, allowing us to adjust the algorithm accordingly. This adjustment is the fundamental idea behind the learning aspect of machine learning.
Based on this definition, it is quite clear that the goal behind both machine learning and deep learning is to learn informative representations of the input data so as to better predict the output values for new, unseen inputs. But what exactly do we mean by data representations? Simply put, a representation is just one way to encode the input data such that it is compatible with a learning model’s architecture. For example, a colored image can be represented as a 2-dimensional grid of pixels where each pixel holds 3 color values (RGB), which in practice is stored as a 3-dimensional array of shape height × width × 3. Do note, however, that not all input data use the same representation; what works for one task can easily fail for another. Ultimately speaking, though, the goal behind machine learning models is the discovery of meaningful representations of the input data that make real-world tasks involving them much easier to perform.
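As a quick, hedged illustration of the image representation just described, here is a small NumPy sketch (the tiny 4×4 image is invented for the example):

```python
import numpy as np

# A tiny 4x4 "color image": a grid of pixels, each holding 3 values (R, G, B),
# stored as an array of shape (height, width, channels).
image = np.zeros((4, 4, 3), dtype=np.uint8)

image[0, 0] = [255, 0, 0]  # top-left pixel is pure red
image[3, 3] = [0, 0, 255]  # bottom-right pixel is pure blue

print(image.shape)  # (4, 4, 3)
print(image[0, 0])  # [255   0   0]

# The same data can be re-represented, e.g. flattened into one long vector
# of 48 numbers: a different representation of the very same image.
flat = image.reshape(-1)
print(flat.shape)  # (48,)
```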
Let’s consider a quick example. Imagine we have a dataset containing points in a rectangular coordinate system, where each point has a color label: black or white. We’d like to predict the color label of a point given its coordinates on the plane. Here’s what we need to be able to do just that (a code sketch of this setup follows the list):
- Input Features – the input data for our model would be the coordinates of the (x, y) points in our dataset.
- Example Labels – the output our model would be predicting is the color labels of the (x, y) points in our dataset.
- Algorithm Performance Evaluation – there are many ways our model could be evaluated, but one simple option is to check the percentage of correctly classified points (i.e. prediction accuracy).
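Here is one minimal sketch tying these three ingredients together, assuming scikit-learn is available. The synthetic dataset and the separating line y = x are invented purely for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Input features: 200 random (x, y) points on the plane.
points = rng.uniform(-1, 1, size=(200, 2))

# Example labels: 0 = "black", 1 = "white". Here the (made-up) ground truth
# is simply which side of the line y = x a point falls on.
labels = (points[:, 1] > points[:, 0]).astype(int)

# Train on one half of the data, evaluate on the other half.
train_X, test_X = points[:100], points[100:]
train_y, test_y = labels[:100], labels[100:]

model = LogisticRegression().fit(train_X, train_y)

# Algorithm performance evaluation: fraction of correctly classified points.
accuracy = (model.predict(test_X) == test_y).mean()
print(f"accuracy on unseen points: {accuracy:.2%}")
```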
Having this in mind, the end goal of our model would be to apply a transformation to the points such that they can be classified correctly. The learning step of the training algorithm searches for better and better transformations at each stage, eventually putting the input data into a representation that is useful for the task at hand. As mentioned earlier, these transformations can be anything and vary depending on the data and the task to be executed. There is very little creativity in this discovery process: machine learning algorithms only look for viable approximate solutions within their own hypothesis space, a predefined set of potential solutions, guided by the examples seen so far in the training data.
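To see what a useful transformation could look like, here is a hand-crafted one for the hypothetical dataset from the sketch above. A real model would have to search its hypothesis space to find something like this, rather than being handed it:

```python
import numpy as np

rng = np.random.default_rng(1)
points = rng.uniform(-1, 1, size=(200, 2))
labels = (points[:, 1] > points[:, 0]).astype(int)

# Transformation: a change of coordinates. The single new feature d = y - x
# measures which side of the separating line a point lies on.
d = points[:, 1] - points[:, 0]

# In this new representation the task collapses to a trivial rule.
predictions = (d > 0).astype(int)
print((predictions == labels).mean())  # 1.0, i.e. every point classified correctly
```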
So what counts as “deep” in deep learning?
The main difference between deep learning and its parent field, machine learning, is that deep learning stresses building models out of successive layers, each of which learns its own feature representation of the data. The number of layers used by a model is referred to as the model’s depth. Deep learning models, especially present-day builds, chain together a large number of layers, sometimes numbering in the hundreds. Smaller machine learning models that only use a few successive layers are often referred to as shallow learning models to highlight the difference in architectures.
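As a rough illustration, here is what a stack of successive layers might look like in Keras (the layer count, sizes, and activations are arbitrary choices, not a recommendation):

```python
from tensorflow import keras
from tensorflow.keras import layers

# A small "deep" model: a chain of successive layers, each of which learns
# its own transformation of the representation handed to it.
model = keras.Sequential([
    keras.Input(shape=(2,)),                # e.g. our (x, y) point features
    layers.Dense(64, activation="relu"),
    layers.Dense(64, activation="relu"),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # output: probability of "white"
])

# The model's depth is simply the number of chained layers.
print(len(model.layers))  # 4
```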
Neural networks are the models most often used to achieve such deep learning of successive feature representations. The name neural network is a neurobiological reference to the linked neurons that allow our human brains to function as they do. We do not know for sure whether these models replicate actual human brain behavior, and the way our brains process information could very well differ drastically from the way neural networks learn. Hence, we generally like to stick to the notion that deep learning is merely a mathematically driven tool that facilitates learning data representations.
Having said that, you might be wondering what these learned features actually look like. Is there any common pattern that continues to show up across different models? Let’s consider this from the point of view of an image classification task. In the first few layers, the model learns representations of very small components of the picture, features such as edges or simple shapes that on their own reveal almost nothing about the image as a whole. However, as the layers pile up and feed their learned representations into the layers that follow, these representations become more and more vivid and informative about the image itself. A great analogy for this is the water purification process: at first, you have a liquid mixed with many substances, but after running it through a sequence of filters, each of which strips out one substance, the end result is purified water. Similarly, a deep learning model has no clue what the input data means at first, but after running it through successive layers, it extracts more and more features until it can identify the image for what it is. Here’s a simple example to help visualize this process even better:

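If you want to poke at these intermediate representations in actual code, here is a hedged Keras sketch of one way to do it. The toy convnet, its layer names, and the 28×28 input size are all invented for illustration:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# A toy convnet for, say, 28x28 grayscale images (all sizes are arbitrary).
model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),
    layers.Conv2D(8, 3, activation="relu", name="low_level_features"),
    layers.Conv2D(16, 3, activation="relu", name="higher_level_features"),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),
])

# A helper model that also exposes the intermediate feature maps,
# so we can inspect what each successive layer "sees".
probe = keras.Model(
    inputs=model.inputs,
    outputs=[layer.output for layer in model.layers[:2]],
)

fake_image = np.random.rand(1, 28, 28, 1).astype("float32")
first, second = probe(fake_image)
print(first.shape)   # (1, 26, 26, 8): small, low-level features
print(second.shape)  # (1, 24, 24, 16): built on top of the layer before it
```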
Sources
- Chollet, François. Deep Learning with Python. Manning Publications Co., 2018.