Hi everyone! I recently discovered a free, live-streamed 6-week PyTorch deep learning course on YouTube and decided to commit to it in my spare time. The following is a copy of the Jupyter Notebook I wrote for the first assignment. This notebook can also be found on GitHub here. I hope you find this resource useful!
Exploring PyTorch Tensor Functions¶
This notebook gives a brief overview of some of the tensor-related functions provided by PyTorch.¶
PyTorch is an open-source machine learning library that makes it possible to build deep learning projects quickly and flexibly. It can also be seen as an extension of NumPy that adds GPU acceleration. For the remainder of this document, we will take a look at the following PyTorch tensor functions:
- torch.numel
- torch.logspace
- torch.full
- torch.cat
- torch.narrow
# Import torch and other required modules
import torch
Function 1 – torch.numel¶
The torch.numel function takes a tensor as input and returns the total number of elements that tensor contains across all of its dimensions.
# Example 1 - working
x = torch.tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
torch.numel(x)
In this example, we have a square matrix tensor of dimensions $3 \times 3$, meaning that there are $3 \times 3 = 9$ elements contained in total.
# Example 2 - working
x = torch.tensor([[[1, 2], [3, 4]], [[5, 6], [7, 8]], [[9, 10], [11, 12]]])
torch.numel(x)
Here, we’re no longer using a square matrix tensor. Now our tensor has dimensions $3 \times 2 \times 2$ (i.e. $3$ rows, each split into $2$ columns, each of which has space for $2$ values). It might help to visualize this tensor as a rectangular prism of length $3$, width $2$, and height $2$. That being said, the number of elements contained in total is the volume of the prism itself, which is $3 \times 2 \times 2 = 12$.
# Example 3 - breaking (to illustrate when it breaks)
t = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
torch.numel(t)
The failure here is caused by the fact that torch.numel only accepts a tensor as input. Although the list t constructed in this example is formatted in the style of a tensor, torch.numel still cannot accept it, because it has no way of knowing beforehand whether the list given to it is in tensor format or some other incompatible format (e.g. a list of strings). Hence, the function demands that the input data be encapsulated in a tensor object, which guarantees that the input it receives is actually a number, a numeric vector, or a numeric matrix.
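If you do have a plain Python list like the one above, a minimal fix is to wrap it in a tensor before counting its elements:
# Fix: convert the plain Python list to a tensor first
t = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
torch.numel(torch.tensor(t))  # returns 9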
The function torch.numel is used in a variety of cases. It can be used to compute the average (mean) value across a tensor, or to compute the size of a tensor that was loaded from an external source such as a file or a scraped site. Any time the number of elements contained by a tensor is needed, this function is the perfect tool for the job.
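As a small sketch (not part of the original assignment examples), here is how torch.numel could back a manual mean computation:
# Hypothetical example: computing a mean by hand with torch.numel
x = torch.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
manual_mean = x.sum() / torch.numel(x)  # 21.0 / 6 = 3.5, matching x.mean()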
Function 2 – torch.logspace¶
The torch.logspace function outputs a 1-dimensional tensor containing logarithmically spaced values starting at base$^{start}$ and ending at base$^{end}$, where (base, start, end) are parameters set by the user. The number of points in the output tensor is determined by the steps parameter.
# Example 1 - working
torch.logspace(start=0, end=5, steps=6, base=2)
This is a tensor containing values ranging from $2^{0} = 1$ to $2^{5} = 32$ inclusive. But how exactly are the intermediate values determined? The exponents are simply steps evenly spaced points between start and end, so consecutive exponents are a distance of $\cfrac{end - start}{steps - 1} = \cfrac{5 - 0}{6 - 1} = 1$ unit apart. Given this information as well as the base value of $2$, we arrive at our tensor values: $2^{0}, 2^{1}, 2^{2}, 2^{3}, 2^{4}, 2^{5}$, which are equivalent to the output shown above.
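To make the spacing rule concrete, the call above should be equivalent (up to floating point precision) to raising the base to a linearly spaced set of exponents, as in this small sketch:
# The exponents are `steps` evenly spaced points between start and end
exponents = torch.linspace(0, 5, steps=6)  # tensor([0., 1., 2., 3., 4., 5.])
2 ** exponents                             # tensor([ 1.,  2.,  4.,  8., 16., 32.])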
# Example 2 - working
torch.logspace(start=0, end=5, steps=5, base=2, dtype=torch.int32)
$\def\lf{\left\lfloor}\def\rf{\right\rfloor}$The resulting tensor this time around again contains values ranging from $2^{0} = 1$ to $2^{5} = 32$ inclusive. However, the intermediate values are arrived at differently, not only because the steps parameter has been modified, but also because we’ve now specified an output data type of $32$-bit integers, which means that the floating point results will be truncated. With steps $= 5$, the exponents are spaced $\cfrac{end - start}{steps - 1} = \cfrac{5 - 0}{5 - 1} = 1.25$ units apart. Given this information as well as the base value of $2$, we arrive at our tensor values: $\lf 2^{0} \rf, \lf 2^{1.25} \rf, \lf 2^{2.5} \rf, \lf 2^{3.75} \rf, \lf 2^{5} \rf = 1, 2, 5, 13, 32$, which are equivalent to the output shown above.
# Example 3 - breaking (to illustrate when it breaks)
x = torch.randn(4, dtype=torch.float32)
torch.logspace(start=1, end=4, steps=4, base=2, out=x, dtype=torch.int32)
We get an error here because of mismatched types. When calling torch.logspace, we specified that we wanted the output to be stored in the tensor x, which is completely fine to do. However, we also specified that the output tensor’s type should be $32$-bit integers, which conflicts with the passed-in output tensor x’s type of $32$-bit floating point values.
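A minimal fix, assuming we simply want the dtypes to agree, is to allocate the out tensor with the same dtype that we request from torch.logspace:
# Fix: make the out tensor's dtype match the requested dtype
x = torch.empty(4, dtype=torch.int32)
torch.logspace(start=1, end=4, steps=4, base=2, out=x, dtype=torch.int32)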
The torch.logspace function is often used to create frequency vectors containing values within a specified range. For example, this proves beneficial when testing multiple learning rate values to see which leads to better optimization of a machine learning algorithm. Before using this function, one needs to understand when exactly their data should be logarithmically spaced, because sometimes it might be better to use a linear spacing instead. So when would you want to use one spacing type over the other? Simply put, if you’re modeling something that relies on some internal relative change ($\textit{multiplicative}$) mechanism, then a logarithmic spacing would allow you to capture the patterns in this mechanism more accurately than a linear spacing would. Conversely, if you’re modeling something that relies on some internal absolute change ($\textit{additive}$) mechanism, then a linear spacing would allow you to capture the patterns in this mechanism more accurately than a logarithmic spacing would.
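For instance, a hypothetical learning rate sweep (names and values are illustrative, not from the original notebook) might look like this:
# Hypothetical learning rate grid spanning several orders of magnitude
learning_rates = torch.logspace(start=-4, end=-1, steps=4, base=10)  # 1e-4, 1e-3, 1e-2, 1e-1
# A linear spacing over the same range would leave the smaller orders of magnitude almost unsampled
linear_rates = torch.linspace(0.0001, 0.1, steps=4)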
Function 3 – torch.full¶
The torch.full function creates a tensor of the specified size, fills it entirely with the value given by the fill_value parameter, and returns that tensor as output.
# Example 1 - working
torch.full((2, 4), 3.0)
In this example, we can see that the function returned a matrix tensor of dimensions $2 \times 4$ ($2$ rows by $4$ columns) with all entry values set to $3.0$.
# Example 2 - working
torch.full((1, 1), -10.0)
Here, we have a simple example of a $1 \times 1$ matrix ($1$ row by $1$ column) that has its one and only entry set to a value of $-10.0$.
# Example 3 - breaking (to illustrate when it breaks)
x = list()
torch.full((2, 4), 3.7, out=x)
The out parameter in the torch.full function determines where the output tensor is to be stored. However, providing a list as the out argument instead of a tensor results in a TypeError, since PyTorch expects the out parameter to be a tensor and not a list.
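A minimal fix, assuming we still want to reuse a preallocated buffer, is to pass an actual tensor as out:
# Fix: preallocate a tensor (not a list) to receive the output
x = torch.empty(2, 4)
torch.full((2, 4), 3.7, out=x)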
The torch.full function is used whenever one needs to create a tensor that is filled with the same specified value at every position. For example, it can be used to initialize a weight tensor with an initial value that is the same for each weight. There are also certain linear algebraic operations that require creating a vector tensor containing the same value at each position and then concatenating it to another tensor in order to compute a specific output correctly.
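As a rough sketch of the second use case (the names and shapes here are hypothetical), torch.full can build a constant column that is then concatenated onto another tensor, for example to append a bias column of ones:
# Hypothetical example: append a constant bias column to a data matrix
data = torch.randn(3, 2)                           # 3 samples, 2 features each
bias_column = torch.full((3, 1), 1.0)              # a column of ones, one per sample
augmented = torch.cat((data, bias_column), dim=1)  # shape becomes (3, 3)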
Function 4 – torch.cat¶
The torch.cat function takes an input tuple of tensors as well as a dimension dim, and returns an output tensor that is the concatenation of the input tensors along the specified dimension.
# Example 1 - working
x = torch.tensor([
[3,4],
[5,6],
[7, 8]
])
torch.cat((x, x), dim=0)
The tensor x in this example has two dimensions. The first dimension (dim $= 0$) encompasses all the elements wrapped by the outer array (i.e. the $3$ inner arrays). Similarly, the second dimension (dim $= 1$) encompasses all the values contained by each of the $3$ inner arrays that are wrapped by the outer array. Hence, when we concatenate x with itself at dim $= 0$, we’re essentially duplicating the $3$ inner arrays, since those are x's contents at dim $= 0$.
# Example 2 - working
x = torch.tensor([
[3,4],
[5,6],
[7, 8]
])
torch.cat((x, x), dim=1)
As mentioned above, the tensor x in this example has two dimensions. That being said, when we concatenate x with itself at dim $= 1$, we’re essentially duplicating the values contained by each of the $3$ inner arrays (i.e. $[3, 4] \implies [3, 4, 3, 4]$), since those are x's contents at dim $= 1$.
# Example 3 - breaking (to illustrate when it breaks)
x = torch.tensor([
[3, 4],
[5, 6],
[7, 8]
])
y = torch.tensor([
[3, 4, 10],
[5, 6, 11],
[7, 8, 13]
])
torch.cat((x, y), dim=0)
The function torch.cat fails here because it expects all input tensors to have the same size in every dimension $\textbf{except}$ the dimension along which the concatenation occurs. In this example, we specified the concatenation at dim $= 0$, and thus need to make sure that x and y have the same size at dim $= 1$, which clearly isn’t the case ($2 \neq 3$) and is the reason behind the error above.
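For these particular tensors, one way to make the concatenation succeed (assuming we actually want to join them side by side) is to concatenate along dim $= 1$ instead, since x and y do share the same size at dim $= 0$:
# x and y both have 3 rows, so concatenating along dim=1 works
torch.cat((x, y), dim=1)  # result has shape (3, 5)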
The torch.cat function can be used whenever a sequence of tensors needs to be joined for a computation to proceed. An example use case is in the construction of recurrent neural networks (RNNs), which use torch.cat to repeatedly join hidden input and output states. Another example use case is in the establishment of data parallelism, which involves breaking down input data into minibatches and operating on those minibatches in parallel to improve performance.
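As a rough sketch of the RNN use case (the sizes here are hypothetical), the current input and the previous hidden state are often joined along the feature dimension before being passed to the next layer:
# Hypothetical RNN-style step: join the input and hidden state along the feature dimension
input_t = torch.randn(1, 10)                      # batch of 1, input size 10
hidden_t = torch.randn(1, 20)                     # batch of 1, hidden size 20
combined = torch.cat((input_t, hidden_t), dim=1)  # shape (1, 30)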
Function 5 – torch.narrow¶
The torch.narrow function takes an input tensor, a dimension dim, a starting position start, and a length specifying how many elements to take from the starting position. The output tensor contains only the elements along the specified dimension dim whose indices fall within the range $\lbrack$start, start + length$)$, hence the term narrowing.
# Example 1 - working
x = torch.tensor([
[1, 2, 3, 4],
[5, 6, 7, 8],
[9, 10, 11, 12]
])
torch.narrow(x, dim=0, start=1, length=2)
At dim $= 0$, x has $3$ inner vectors. Each of these vectors sits at an index along dim $= 0$ of x; hence, we have indices $0$, $1$, and $2$. For this example, we’ve specified that we would like to start at index $1$ and only include $2$ items from dim $= 0$ from there onward. This means that the torch.narrow function will only output the inner vectors at indices $1$ and $2$.
# Example 2 - working
x = torch.tensor([
[1, 2, 3, 4, 5],
[6, 7, 8, 9, 10],
[11, 12, 13, 14, 15]
])
torch.narrow(x, dim=1, start=1, length=3)
At dim $= 1$, we’re operating on each of the three inner vectors contained in dim $= 0$ of x. To narrow each of these vectors down, we select only the $3$ values in the index range $[1, 3]$ (or equivalently the second, third, and fourth values) from each vector.
# Example 3 - breaking (to illustrate when it breaks)
x = torch.tensor([
[1, 2],
[3, 4]
])
torch.narrow(x, dim=0, start=1, length = 2)
This fails because there are only two vectors at dim $= 0$ of x, and we are asking the torch.narrow function to return two vectors starting at index $1$ (i.e. the second vector of x). That would require a vector after $[3, 4]$, but there isn’t one, and thus an error is thrown.
The torch.narrow function comes in very handy when we need to perform a computation that only requires a specific chunk of some tensor. One of the especially important features of torch.narrow is that it allows us to extract a chunk from a larger tensor without making a memory copy of the original tensor. That is, the new tensor that is returned references the same underlying storage as the original tensor being narrowed. Without this function, we’d have to make deep copies of tensor data every time we wanted to reuse that data, which is very inefficient in terms of memory management. Hence, the torch.narrow function solves this exact problem.
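A quick way to see the shared storage in action (a small sketch, not one of the original examples) is to modify the narrowed tensor and observe that the original changes as well:
# The narrowed view shares storage with the original tensor
x = torch.tensor([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
y = torch.narrow(x, dim=0, start=1, length=2)
y[0, 0] = 100
x  # row 1, column 0 of x is now 100 as well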
Conclusion¶
In this notebook, we’ve explored a small sample of PyTorch’s vast collection of brilliant functions. Although I only shed light on $5$ functions in this document, I learned a great deal while researching their applications, which in turn led me to learn about other functions. If you would like to learn more about these functions and other PyTorch functions, please check out the official documentation.
Reference Links¶
- Official documentation for torch.Tensor: https://pytorch.org/docs/stable/tensors.html
- StackExchange post discussing when and why one should use a logarithmic scale: https://stats.stackexchange.com/questions/18844/when-and-why-should-you-take-the-log-of-a-distribution-of-numbers
- Official documentation for Multi-GPU Examples: http://seba1511.net/tutorials/beginner/former_torchies/parallelism_tutorial.html?highlight=concatenation
!pip install jovian --upgrade --quiet
import jovian
jovian.commit()
Sources
- Featured image can be found here.