Dominic Rigby

In-Context Reinforcement Learning for Variable Action Spaces

Date read: 1st September 2025

Introduces HeadlessAD, an in context reinforcement learning method which takes sequences of states, actions and rewards and then continues to guess the next optimal action.
HeadlessAD works on varying sized discrete action spaces
This is achieved by operating in an abstract action embedding space to make the model action size invariant.
Action embedding is randomly and orthonormally initialised for each sequence to prevent the model utilising inbuilt knowledge of the game and forcing it to learn in context.
Contrastive loss is used to train the model to predict the embeddings.
They finetune TinyLlama LLM (utilise transformer architecture)