PyTorch Internals by Edward Z. Yang
Date read: 21st September 2025
Blog post link
Key Points
Tensors
- Tensors store their data in a contiguous block of memory, with the shape kept as metadata (hence reshaping a contiguous tensor is ~free)
- Metadata:
- dtype
- size: shape of tensor
- stride: how to convert indices into an offset within the contiguous memory
- E.g. stride: (2, 1) -> [i, j] -> base address + 2i + 1j
- Indexing a tensor does not copy data; it creates a new view onto the same underlying storage (see the sketch after this list).
- Storage: a tensor holds the human-readable view metadata (size, stride, offset) and points to a Storage, which holds the actual data buffer along with details like dtype and device.
- Tensor types are fully defined by three values:
- Device
- Layout: how the data is represented in memory (strided is the default; sparse tensors, for example, use a different layout).
- Dtype
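A minimal sketch of the points above, assuming only stock PyTorch on CPU: strides map indices to storage offsets, indexing returns views that share the same storage, and (device, layout, dtype) identifies the tensor type.

```python
import torch

t = torch.arange(6).reshape(2, 3)        # contiguous storage of 6 elements
print(t.size(), t.stride())              # torch.Size([2, 3]) (3, 1)
# element [i, j] lives at storage offset 3*i + 1*j

v = t[1]                                 # indexing returns a view, not a copy
print(v.storage_offset(), v.stride())    # offset 3, stride (1,)
print(v.data_ptr() == t.data_ptr() + 3 * t.element_size())  # True: same buffer

print(t.device, t.layout, t.dtype)       # cpu torch.strided torch.int64
```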
Execution Steps
- Python argument parsing
- Variable dispatch
- Dtype and device dispatch:
- Dynamic dispatch calls the correct implementation for the device and datatype.
- The assigned kernel is then called
- Parallelisation happens inside the kernel, whether explicit on CPU (e.g. via OpenMP) or implicit in the CUDA programming model (see the sketch after this list).
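A small illustration of the dispatch step, not the dispatcher itself: the same Python-level call is routed to a different kernel depending on the inputs' dtype and device.

```python
import torch

a = torch.ones(4, dtype=torch.float32)
b = torch.ones(4, dtype=torch.float64)
print((a + a).dtype)                     # float32 CPU kernel
print((a + b).dtype)                     # promoted to float64, a different kernel

if torch.cuda.is_available():            # only if a GPU is present
    print((a.cuda() + a.cuda()).device)  # CUDA kernel instead of the CPU one
```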
Autograd
- Uses reverse-mode automatic differentiation: the backward pass replays the recorded forward computation in reverse.
- Creates a graph of the forward pass to trace backwards
- Leaf nodes of the graph are the parameters we are updating.
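A small sketch of that graph, assuming nothing beyond stock PyTorch: `grad_fn` links the output back through the recorded operations to the leaf parameters.

```python
import torch

w = torch.randn(3, requires_grad=True)   # leaf node (a parameter we update)
x = torch.randn(3)                       # input, no gradient needed
y = (w * x).sum()                        # forward pass records grad_fn nodes

print(y.grad_fn)                         # SumBackward0: last op in the graph
print(y.grad_fn.next_functions)          # links back toward the leaves

y.backward()                             # traverse the graph in reverse
print(w.grad)                            # gradient accumulated on the leaf
print(w.is_leaf, x.is_leaf)              # True, True (but x has no grad)
```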