Policy Choices
LeRobot ships three production-ready policy architectures. Choose one before you run training — you cannot switch mid-run.
ACT
Action Chunking with Transformers. Best for dexterous single-arm manipulation. Trains in 1–3 hours on a single GPU. Predictable hyperparameters. Use this.
Diffusion Policy
Higher peak accuracy on precision tasks but 3–5x slower to train and infer. Use it after you have a working ACT baseline.
SmolVLA
Language-conditioned VLA. Use when your task requires natural language instructions or multi-task generalization. Requires more data.
ACT Training Command
Replace $HF_USER/pick-place-v1 with your dataset repo ID from Unit 3.
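A minimal launch sketch. The `lerobot-train` entry point and the exact flag spellings (`--policy.type`, `--dataset.repo_id`, `--steps`, `--batch_size`, `--output_dir`) are assumptions based on recent LeRobot releases, not taken from this unit; confirm them with `lerobot-train --help` on your install.

```shell
# Sketch of an ACT training launch; flag names are assumptions.
# Verify with `lerobot-train --help` for your installed LeRobot version.
lerobot-train \
  --policy.type=act \
  --dataset.repo_id=$HF_USER/pick-place-v1 \
  --output_dir=$HOME/lerobot-policies/pick-place-v1 \
  --steps=50000 \
  --batch_size=32
```

The hyperparameters in the table below can be left at their defaults for a first run; override them on the command line only when the loss curves tell you to.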
Recommended Hyperparameters for Single-Arm Pick-and-Place
| Parameter | Recommended | Why |
|---|---|---|
| num_steps | 50000 | Sufficient for 50–100 demos of a simple pick-and-place. Increase to 80000 if the loss is still decreasing at 50000 steps. |
| batch_size | 32 | Standard for single-arm datasets. Reduce to 16 if you run out of GPU memory. |
| chunk_size | 100 | ACT plans 100 steps ahead. At 30fps this is ~3.3 seconds — a good planning horizon for pick-and-place. |
| n_action_steps | 100 | Must match chunk_size. Reduces inference frequency and smooths execution. |
| kl_weight | 10 | LeRobot default. Do not change unless L_kl stays near zero after 20k steps. |
| lr | 1e-5 | LeRobot default for ACT. Lower to 5e-6 if reconstruction loss oscillates instead of converging. |
Reading Training Logs
Training metrics print to the terminal and are also written out as TensorBoard event files in the output directory. Launch TensorBoard in a second terminal:
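For example, assuming the logs live under the output directory used elsewhere in this unit (adjust the path if you chose a different output location):

```shell
# Serve the TensorBoard UI from the training output directory.
tensorboard --logdir ~/lerobot-policies/pick-place-v1
```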
Then open http://localhost:6006 in your browser. Watch these curves:
loss/reconstruction (L_recon)
The primary training signal. Should decrease from ~2.5–3.5 to below 0.1 by 50,000 steps. A plateau above 0.15 after 40,000 steps usually means your dataset has too much variance — review Unit 3's good demo practices and consider recording more consistent demonstrations.
loss/kl (L_kl)
Rises slowly from near 0 to 5–20. This is expected behavior: the CVAE is learning a compact style embedding. If it exceeds 40, your demonstrations contain too much behavioral diversity. If it stays near 0 after 20k steps, the CVAE latent is going unused (posterior collapse); lower kl_weight so the KL penalty stops forcing the latent toward zero.
train/loss (total loss)
L_recon + kl_weight × L_kl. Dominated by L_recon in early training. Should decrease monotonically. A total loss that rises after an initial decrease usually points to a learning-rate or scheduler problem: check the scheduler config.
Checkpoint Management
Checkpoints save every 5,000 steps to ~/lerobot-policies/pick-place-v1/checkpoints/. Do not assume the final checkpoint is the best. The policy can overfit at high step counts, especially with small datasets.
After training, identify your best checkpoint: it is the step where L_recon reached its minimum before the curve plateaued. For 50 demonstrations, this typically occurs in the 35,000–50,000 step range. Save this step number — you will use it in Unit 5.
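Picking the minimum-loss step can be done by eye in TensorBoard, or mechanically; a toy sketch with invented (step, L_recon) pairs read off the loss curve:

```shell
# Sort invented (step, L_recon) pairs by loss and keep the best one.
printf '35000 0.11\n40000 0.09\n45000 0.10\n50000 0.12\n' \
  | sort -g -k2 | head -n1
# → 40000 0.09
```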
Unit 4 Complete When...
Training has completed 50,000 steps and checkpoints are saved in ~/lerobot-policies/pick-place-v1/checkpoints/. The final L_recon loss is below 0.1. You have identified your best checkpoint step based on the loss curves. You understand what L_kl is doing in your training run. You are ready to evaluate the policy in Unit 5.