Policy Choices
LeRobot ships three production-ready policy architectures. Choose one before you run training — you cannot switch mid-run.
ACT
Action Chunking with Transformers. Best for dexterous single-arm manipulation. Trains in 1–3 hours on a single GPU. Predictable hyperparameters. Use this.
Diffusion Policy
Higher peak accuracy on precision tasks but 3–5x slower to train and infer. Use it after you have a working ACT baseline.
SmolVLA
Language-conditioned VLA. Use when your task requires natural language instructions or multi-task generalization. Requires more data.
ACT Training Command
Replace $HF_USER/pick-place-v1 with your dataset repo ID from Unit 3.
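A minimal launch sketch. The `lerobot-train` entry point and the exact flag spellings (`--policy.type`, `--dataset.repo_id`, `--steps`, `--batch_size`, `--output_dir`) are assumptions based on recent LeRobot releases, not taken from this unit; confirm them with `lerobot-train --help` on your install.

```shell
# Sketch of an ACT training launch; flag names are assumptions.
# Verify with `lerobot-train --help` for your installed LeRobot version.
lerobot-train \
  --policy.type=act \
  --dataset.repo_id=$HF_USER/pick-place-v1 \
  --output_dir=$HOME/lerobot-policies/pick-place-v1 \
  --steps=50000 \
  --batch_size=32
```

The hyperparameters in the table below can be left at their defaults for a first run; override them on the command line only when the loss curves tell you to.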
Recommended Hyperparameters for Single-Arm Pick-and-Place
| Parameter | Recommended | Why |
|---|---|---|
| num_steps | 50000 | Sufficient for 50–100 demos of a simple pick-and-place. Increase to 80000 if the loss is still decreasing at 50000 steps. |
| batch_size | 32 | Standard for single-arm datasets. Reduce to 16 if you run out of GPU memory. |
| chunk_size | 100 | ACT plans 100 steps ahead. At 30fps this is ~3.3 seconds — a good planning horizon for pick-and-place. |
| n_action_steps | 100 | Must match chunk_size. Reduces inference frequency and smooths execution. |
| kl_weight | 10 | LeRobot default. Do not change unless L_kl stays near zero after 20k steps. |
| lr | 1e-5 | LeRobot default for ACT. Lower to 5e-6 if reconstruction loss oscillates instead of converging. |
Reading Training Logs
Training metrics print to the terminal and are also written out as TensorBoard event files in the output directory. Launch TensorBoard in a second terminal:
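For example, assuming the logs live under the output directory used elsewhere in this unit (adjust the path if you chose a different output location):

```shell
# Serve the TensorBoard UI from the training output directory.
tensorboard --logdir ~/lerobot-policies/pick-place-v1
```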
Then open http://localhost:6006 in your browser. Watch these curves:
loss/reconstruction (L_recon)
The primary training signal. Should decrease from ~2.5–3.5 to below 0.1 by 50,000 steps. A plateau above 0.15 after 40,000 steps usually means your dataset has too much variance — review Unit 3's good demo practices and consider recording more consistent demonstrations.
loss/kl (L_kl)
Rises slowly from near 0 to 5–20. This is expected behavior: the CVAE is learning a compact style embedding. If it exceeds 40, your demonstrations contain too much behavioral diversity. If it stays near 0 after 20k steps, the CVAE latent is going unused (posterior collapse); lower kl_weight so the KL penalty stops forcing the latent toward zero.
train/loss (total loss)
L_recon + kl_weight × L_kl. Dominated by L_recon in early training. Should decrease monotonically. A total loss that rises after an initial decrease usually points to a learning-rate or scheduler problem: check the scheduler config.
Checkpoint Management
Checkpoints save every 5,000 steps to ~/lerobot-policies/pick-place-v1/checkpoints/. Do not assume the final checkpoint is the best. The policy can overfit at high step counts, especially with small datasets.
After training, identify your best checkpoint: it is the step where L_recon reached its minimum before the curve plateaued. For 50 demonstrations, this typically occurs in the 35,000–50,000 step range. Save this step number — you will use it in Unit 5.
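Picking the minimum-loss step can be done by eye in TensorBoard, or mechanically; a toy sketch with invented (step, L_recon) pairs read off the loss curve:

```shell
# Sort invented (step, L_recon) pairs by loss and keep the best one.
printf '35000 0.11\n40000 0.09\n45000 0.10\n50000 0.12\n' \
  | sort -g -k2 | head -n1
# → 40000 0.09
```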
Unit 4 Complete When...
Training has completed 50,000 steps and checkpoints are saved in ~/lerobot-policies/pick-place-v1/checkpoints/. The final L_recon loss is below 0.1. You have identified your best checkpoint step based on the loss curves. You understand what L_kl is doing in your training run. You are ready to evaluate the policy in Unit 5.