When it comes to building neural network models, there are many factors to consider, such as hyperparameter tuning, model architecture, whether or not to use a pre-trained model, and so on. While it’s true that these are all important aspects to consider, I would argue that a proper understanding of data representations is one of the most important aspects, if not the most important one. If you cannot transform input data into a format that is meaningful to the model of your choice, then I wish you the best of luck during model evaluation, because the results aren’t going to be pretty.
The organization of input data can easily make or break a model’s performance. If the data is structured poorly, the model will be incapable of learning enough about the data (aka underfitting), which will then lead to unsatisfactory results. But let’s say that you do transform the data into a meaningful representation. Well… there’s still a risk of getting horrible results, since tweaking deep learning models is a highly experimental process, and it very often takes several trial runs to find the optimal configuration. But I guess at least there’s hope it’ll work out in your favor, amirite?
All jokes aside, it’s imperative that you understand how to efficiently store data to be able to build great models. Having said that, let’s kickstart this post with an explanation of what tensors are and the significant role they play in machine learning.
The code examples in this post are written in Python, specifically using the PyTorch deep learning library. The code examples can be found in a Jupyter Notebook on my GitHub profile here.
Generally speaking, all machine learning models require data to be stored in multidimensional arrays. These multidimensional arrays are commonly referred to as tensors. Tensors are so fundamental to the field of machine learning that Google’s TensorFlow library was named after them. That being said, what exactly is a tensor?
At its heart, a tensor is a data container, where the data being stored can have any dimension. That is, the data can be a scalar value, a vector, a matrix, or even a higher dimensional structure. Each dimension of a tensor is often referred to as an axis, and the number of axes it has is its rank.
Scalar tensors are tensors that contain a single value. They are 0-dimensional and are equivalent to a normal float value in Python, except that they are encapsulated by a Tensor class object. Here’s a quick PyTorch example of a scalar tensor:
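The original snippet isn’t reproduced here, but a minimal sketch might look like this (the value 7 is a hypothetical stand-in):

```python
import torch

# A 0-dimensional (scalar) tensor, cast to float32
scalar_tensor = torch.tensor(7).type(torch.float32)
print(scalar_tensor)  # tensor(7.)
```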
To view the number of axes a tensor has in PyTorch, you can invoke the tensor object’s dim() function. As expected, the dim() function for the above tensor returns an output of 0.
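A quick sketch of that check (reusing the same hypothetical scalar value):

```python
import torch

scalar_tensor = torch.tensor(7).type(torch.float32)
print(scalar_tensor.dim())  # 0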
A tensor that contains a list or array of values is a vector or, equivalently, a 1-dimensional tensor. That is to say that a vector tensor has only one axis. Let’s take a look at a quick example:
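The values below are hypothetical fill-ins, but the structure matches the five-element vector discussed in the next paragraphs:

```python
import torch

# A 1-dimensional tensor (vector) holding five values
vector_tensor = torch.tensor([5, 10, 15, 20, 25]).type(torch.float32)
print(vector_tensor)  # tensor([ 5., 10., 15., 20., 25.])
```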
We can verify that the above tensor has just one axis using the dim() function as follows:
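A sketch of that verification, with the vector recreated so the snippet stands on its own:

```python
import torch

vector_tensor = torch.tensor([5, 10, 15, 20, 25]).type(torch.float32)
print(vector_tensor.dim())  # 1
```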
Note, however, that the dimension of a vector is not the same as the dimension of a tensor. In the above example, we have a 1-dimensional tensor, but we also have a 5-dimensional vector. A vector has only one axis, and its dimensionality is the number of values it holds along that axis, which is five in this case. A tensor, on the other hand, can have multiple axes, each of which can hold multiple values. Thus, the term dimension can refer either to the number of axes a tensor has or to the number of values along a specific axis.
The intent behind the term dimension can sometimes be confusing, and as a result, it’s usually more appropriate to describe a tensor using the term rank (i.e. the above tensor has rank 1, even though the vector it holds is 5-dimensional). Despite the ambiguous meaning behind the term dimension, it’s still very commonly used to refer to a tensor’s axes.
A matrix is a 2-dimensional tensor that contains a list of vectors. Matrices have two axes that are usually called rows and columns. The easiest way to imagine a matrix is as a rectangular grid of values similar to the classic multiplication table found at the back of most composition notebooks.
The following is a short PyTorch snippet showing how easy it is to create a matrix tensor:
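In the sketch below, the first row and first column match the values described in the next paragraph; the remaining entries are hypothetical fill-ins:

```python
import torch

# A 2-dimensional tensor (matrix) with 3 rows and 4 columns;
# only the first row and first column come from the text
matrix_tensor = torch.tensor([[12, 31, 15, 29],
                              [16, 22, 44, 18],
                              [19, 33, 11, 27]])
print(matrix_tensor)
```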
In the matrix above, the first row contains the values [12, 31, 15, 29] and the first column contains the values [12, 16, 19]. Let’s now verify that the number of axes in this tensor is two as expected.
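A sketch of that check, with the matrix recreated so the snippet is self-contained:

```python
import torch

matrix_tensor = torch.tensor([[12, 31, 15, 29],
                              [16, 22, 44, 18],
                              [19, 33, 11, 27]])
print(matrix_tensor.dim())  # 2
```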
3+ Dimensional Tensors
If you make a list of matrices, the end result is a 3-dimensional tensor. You can visualize this by imagining a rectangular prism, where each layer of the prism is a matrix of values. 3-dimensional tensors are most commonly associated with image representations, where the first axis represents the color channels (e.g. RGB) and the remaining two axes represent the image’s pixel values. Let’s now take a look at an example of a 3-dimensional tensor.
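The values below are hypothetical, but the shape of (3, 2, 2) matches the 3-dimensional tensor referenced later in this post:

```python
import torch

# A 3-dimensional tensor: a stack of three 2x2 matrices, shape (3, 2, 2)
tensor_3d = torch.tensor([[[1, 2],
                           [3, 4]],

                          [[5, 6],
                           [7, 8]],

                          [[9, 10],
                           [11, 12]]])
print(tensor_3d)
```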
Once again, let’s quickly verify that the tensor above has three axes.
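A sketch of that verification (tensor recreated so the snippet runs on its own):

```python
import torch

tensor_3d = torch.tensor([[[1, 2],
                           [3, 4]],
                          [[5, 6],
                           [7, 8]],
                          [[9, 10],
                           [11, 12]]])
print(tensor_3d.dim())  # 3
```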
Wrapping 3-dimensional tensors in yet another list will give you a 4-dimensional tensor. Generally speaking, tensors used in deep learning models are at most 4-dimensional, unless you’re working with video representations, which require 5-dimensional tensors.
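To make that concrete, here’s a quick sketch with made-up sizes: a batch of images is naturally a 4-dimensional tensor, and a batch of video clips (which adds a frame axis) is 5-dimensional:

```python
import torch

# Hypothetical batch of 32 RGB images, 64x64 pixels each:
# axes are (batch, channels, height, width)
image_batch = torch.zeros(32, 3, 64, 64)
print(image_batch.dim())  # 4

# Hypothetical batch of 16 video clips, 24 frames each:
# axes are (batch, frames, channels, height, width)
video_batch = torch.zeros(16, 24, 3, 64, 64)
print(video_batch.dim())  # 5
```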
When it comes to describing a tensor in detail, three important features must be specified:
- Axis Count (rank)
This should be self-explanatory. As mentioned earlier, a scalar has 0 axes, a vector has 1 axis, a matrix has 2 axes, and so on and so forth. In PyTorch, this can be discovered using a tensor object’s dim() function.
- Shape
The shape of a tensor is a tuple, where each element in the tuple is the number of dimensions along one of the tensor’s axes. These elements are in the same order as the tensor’s axes. For example, the 3-dimensional tensor we created earlier has a shape of (3, 2, 2).
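In PyTorch, the shape is exposed via the tensor’s shape attribute (or the equivalent size() method). A quick sketch, reusing the hypothetical (3, 2, 2) tensor from earlier:

```python
import torch

tensor_3d = torch.tensor([[[1, 2],
                           [3, 4]],
                          [[5, 6],
                           [7, 8]],
                          [[9, 10],
                           [11, 12]]])
print(tensor_3d.shape)  # torch.Size([3, 2, 2])
```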
- Data Type
The data type of a tensor is commonly referred to as dtype in many of the Python libraries. This is the type of the values that are encapsulated by the tensor. In the first couple of examples mentioned earlier, notice how I called type(torch.float32) on the tensor objects. Doing so tells PyTorch to convert all the elements in that tensor to data type float32. In the 3-dimensional example, on the other hand, I didn’t invoke the type() function, so the numbers in the tensor remained integers (PyTorch’s default integer type, int64). There are other types to choose from, such as uint8, float64, and many others.
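A quick sketch of inspecting and converting a tensor’s dtype (the values are arbitrary):

```python
import torch

# Integer input defaults to int64 in PyTorch
int_tensor = torch.tensor([1, 2, 3])
print(int_tensor.dtype)  # torch.int64

# type() returns a converted copy with the requested dtype
float_tensor = int_tensor.type(torch.float32)
print(float_tensor.dtype)  # torch.float32
```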
One thing to keep in mind, however, is that string tensors are nonexistent, because tensors are stored in preallocated, contiguous blocks of memory to make mathematical operations efficient. That form of storage is not compatible with the string data type, since strings have variable lengths.
If you’ve made it this far in the post, I appreciate your time reading this post. It means a lot to me. Feel free to drop a comment below and share your thoughts with others reading these posts. In addition to that, you’re always welcome to join my RealDevTalk community on Twitter here. Until next time, take care 👌
- Chollet, François. Deep Learning with Python. Manning Publications Co., 2018.