Dominic Rigby

PyTorch Internals by Edward Z. Yang

Date read: 21st September 2025

Blog post link

Key Points

Tensors

Execution Steps

  1. Python argument parsing
  2. Variable dispatch (autograd bookkeeping)
  3. Dtype and device dispatch
    • Dynamic dispatch selects the implementation that matches the tensor's device and datatype.
  4. The selected kernel is called
    • Parallelisation happens inside the kernel: explicitly on CPU, implicitly on CUDA (where the kernel itself runs across many GPU threads).
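The four steps can be sketched as a toy pipeline in plain Python. This is a minimal sketch under stated assumptions, not PyTorch's actual dispatcher: every name (`ToyTensor`, `KERNELS`, `register_kernel`, `dispatch`, `variable_dispatch`, `add`) is hypothetical, the "autograd" step only appends to a list, there is no type promotion, and the kernels are single-threaded.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for a tensor: data plus the metadata used for dispatch.
@dataclass
class ToyTensor:
    data: List[float]
    dtype: str = "float32"
    device: str = "cpu"
    requires_grad: bool = False

# Hypothetical kernel table keyed by (op name, device, dtype).
KERNELS: Dict[Tuple[str, str, str], Callable[[ToyTensor, ToyTensor], ToyTensor]] = {}
TAPE: list = []  # toy "autograd" record of ops that need gradients

def register_kernel(op: str, device: str, dtype: str):
    """Decorator that stores a kernel in the dispatch table."""
    def decorator(fn):
        KERNELS[(op, device, dtype)] = fn
        return fn
    return decorator

# Step 4: the kernels. A real CPU kernel would split this loop across threads.
@register_kernel("add", "cpu", "float32")
def add_cpu_float32(a: ToyTensor, b: ToyTensor) -> ToyTensor:
    return ToyTensor([x + y for x, y in zip(a.data, b.data)], "float32", "cpu")

@register_kernel("add", "cpu", "int64")
def add_cpu_int64(a: ToyTensor, b: ToyTensor) -> ToyTensor:
    return ToyTensor([int(x) + int(y) for x, y in zip(a.data, b.data)], "int64", "cpu")

def dispatch(op: str, a: ToyTensor, b: ToyTensor) -> ToyTensor:
    # Step 3: look up the kernel for the inputs' device and dtype at runtime.
    if (a.device, a.dtype) != (b.device, b.dtype):
        raise TypeError(f"{op}: mismatched inputs {a.device}/{a.dtype} vs {b.device}/{b.dtype}")
    try:
        kernel = KERNELS[(op, a.device, a.dtype)]
    except KeyError:
        raise NotImplementedError(f"no {op} kernel for {a.device}/{a.dtype}") from None
    return kernel(a, b)

def variable_dispatch(op: str, a: ToyTensor, b: ToyTensor) -> ToyTensor:
    # Step 2: forward to the backend, then record the call if gradients are needed.
    out = dispatch(op, a, b)
    if a.requires_grad or b.requires_grad:
        out.requires_grad = True
        TAPE.append((op, a, b, out))
    return out

def add(input, other) -> ToyTensor:
    # Step 1: "argument parsing", i.e. validate Python arguments and normalise them.
    args = []
    for arg in (input, other):
        if isinstance(arg, ToyTensor):
            args.append(arg)
        elif isinstance(arg, list):
            args.append(ToyTensor([float(v) for v in arg]))  # plain lists become float32 CPU tensors
        else:
            raise TypeError(f"expected ToyTensor or list, got {type(arg).__name__}")
    return variable_dispatch("add", *args)

if __name__ == "__main__":
    x = ToyTensor([1.0, 2.0], requires_grad=True)
    y = add(x, [3.0, 4.0])
    print(y.data, y.requires_grad)  # [4.0, 6.0] True
```

Keying the table on `(op, device, dtype)` is what makes the dispatch dynamic: adding a new backend means registering another kernel rather than editing `add`.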

Autograd