Note: These are working notes used for a course being taught at MIT. They will be updated throughout the Spring 2024 semester. Lecture videos are available on YouTube.
PDF version of the notes
You can also download a PDF version of these notes (updated much less frequently) from here.
The PDF version of these notes is autogenerated from the HTML version. There are a few conversion/formatting artifacts that are easy to fix (please feel free to point them out). But there are also interactive elements in the HTML version that are not easy to put into the PDF. When possible, I try to provide a link. I consider the online HTML version to be the main version.
Table of Contents
- Preface
- Chapter 1: Fully-actuated vs Underactuated Systems
- Motivation
- Honda's ASIMO vs. passive dynamic walkers
- Birds vs. modern aircraft
- Manipulation
- The common theme
- Definitions
- Feedback Equivalence
- Input and State Constraints
- Nonholonomic constraints
- Underactuated robotics
- Goals for the course
- Exercises
- Chapter 2: The Simple Pendulum
- Introduction
- Nonlinear dynamics with a constant torque
- The overdamped pendulum
- The undamped pendulum with zero torque
- Orbit calculations
- The undamped pendulum with a constant torque
- The torque-limited simple pendulum
- Energy-shaping control
- Exercises
- Chapter 3: Acrobots, Cart-Poles, and Quadrotors
- The Acrobot
- Equations of motion
- The Cart-Pole system
- Equations of motion
- Quadrotors
- The Planar Quadrotor
- The Full 3D Quadrotor
- Balancing
- Linearizing the manipulator equations
- Controllability of linear systems
- The special case of non-repeated eigenvalues
- A general solution
- Controllability vs. underactuated
- Stabilizability of a linear system
- LQR feedback
- Partial feedback linearization
- PFL for the Cart-Pole System
- Collocated
- Non-collocated
- General form
- Collocated linearization
- Non-collocated linearization
- Task-space partial feedback linearization
- Swing-up control
- Energy shaping
- Cart-Pole
- Acrobot
- Discussion
- Other model systems
- Exercises
- Chapter 4: Simple Models of Walking and Running
- Limit Cycles
- Poincaré Maps
- Simple Models of Walking
- The Rimless Wheel
- Stance Dynamics
- Foot Collision
- Forward simulation
- Poincaré Map
- Fixed Points and Stability
- Stability of standing still
- The Compass Gait
- The Kneed Walker
- Curved feet
- And beyond...
- Simple Models of Running
- The Spring-Loaded Inverted Pendulum (SLIP)
- Analysis on the apex-to-apex map
- SLIP Control
- SLIP extensions
- Hopping robots from the MIT Leg Laboratory
- The 2D Hopper
- Running on four legs as though they were one
- Towards human-like running
- A simple model that can walk and run
- Juggling
- Exercises
- Chapter 5: Highly-articulated Legged Robots
- A thought experiment
- A spacecraft model
- Robots with (massless) legs
- Center of pressure (CoP) and Zero-moment point (ZMP)
- The special case of flat terrain
- An aside: Zero-moment point derivation
- A note about impact dynamics
- ZMP-based planning
- Heuristic footstep planning
- Planning trajectories for the center of mass
- The ZMP "Stability" Metric
- From a CoM plan to a whole-body plan
- Centroidal dynamics
- Spatial momentum
- Generalization to multibody
- Whole-Body Control
- Footstep planning and push recovery
- Beyond ZMP planning
- Exercises
- Chapter 6: Model Systems with Stochasticity
- The Master Equation
- Stationary Distributions
- Finite Markov Decision Processes
- Dynamics of a Markov chain
- Extended Example: The Rimless Wheel on Rough Terrain
- Randomized smoothing of contact dynamics
- Noise models for real robots/systems.
- Chapter 7: Dynamic Programming
- Formulating control design as an optimization
- Additive cost
- Optimal control as graph search
- Continuous dynamic programming
- The Hamilton-Jacobi-Bellman Equation
- Solving for the minimizing control
- Numerical solutions for $J^*$
- Value iteration with function approximation
- Linear function approximators
- Value iteration on a mesh
- Neural fitted value iteration
- Continuous-time systems
- Extensions
- Discounted and average cost formulations
- Stochastic control for finite MDPs
- Stochastic interpretation of deterministic, continuous-state value iteration
- Linear Programming Dynamic Programming
- Sums-of-Squares Dynamic Programming
- Exercises
- Chapter 8: Linear Quadratic Regulators
- Basic Derivation
- Local stabilization of nonlinear systems
- Finite-horizon formulations
- Finite-horizon LQR
- Time-varying LQR
- Local trajectory stabilization for nonlinear systems
- Linear Quadratic Optimal Tracking
- Linear Final Boundary Value Problems
- Variations and extensions
- Discrete-time Riccati Equations
- LQR with input and state constraints
- LQR on a manifold
- LQR for linear systems in implicit form
- LQR as a convex optimization
- Finite-horizon LQR via least squares
- Minimum-time LQR
- Parameterized Riccati Equations
- Exercises
- Notes
- Finite-horizon LQR derivation (general form)
- Chapter 9: Lyapunov Analysis
- Lyapunov Functions
- Global Stability
- LaSalle's Invariance Principle
- Relationship to the Hamilton-Jacobi-Bellman equations
- Lyapunov functions for estimating regions of attraction
- Robustness analysis using "common Lyapunov functions"
- Barrier functions
- Lyapunov analysis with convex optimization
- Linear systems
- Global analysis for polynomial systems
- Region of attraction estimation for polynomial systems
- The S-procedure
- Basic region of attraction formulation
- The equality-constrained formulation
- Searching for $V({\bf x})$
- Convex outer approximations
- Regions of attraction codes in Drake
- Robustness analysis using the S-procedure
- Piecewise-polynomial systems
- Rigid-body dynamics are (rational) polynomial
- Linear feedback and quadratic forms
- Alternatives for obtaining polynomial equations
- Verifying dynamics in implicit form
- Finite-time Reachability
- Time-varying dynamics and Lyapunov functions
- Finite-time reachability
- Reachability via Lyapunov functions
- Control design
- Control design via alternations
- Global stability
- Maximizing the region of attraction
- State feedback for linear systems
- Control-Lyapunov Functions
- Approximate dynamic programming with SOS
- Upper and lower bounds on cost-to-go
- Linear Programming Dynamic Programming
- Sums-of-Squares Dynamic Programming
- Alternative computational approaches
- Sampling Quotient-Ring Sum-of-Squares
- "Satisfiability modulo theories" (SMT)
- Mixed-integer programming (MIP) formulations
- Continuation methods
- Neural Lyapunov functions
- Contraction metrics
- Other variations and extensions
- Exercises
- Chapter 10: Trajectory Optimization
- Problem Formulation
- Convex Formulations for Linear Systems
- Direct Transcription
- Direct Shooting
- Computational Considerations
- Continuous Time
- Nonconvex Trajectory Optimization
- Direct Transcription and Direct Shooting
- Direct Collocation
- Pseudo-spectral Methods
- Dynamic constraints in implicit form
- Solution techniques
- Efficiently computing gradients
- The special case of direct shooting without state constraints
- Penalty methods and the Augmented Lagrangian
- Zero-order optimization
- Getting good solutions... in practice.
- Local Trajectory Feedback Design
- Finite-horizon LQR
- Model-Predictive Control
- Receding-horizon MPC
- Recursive feasibility
- MPC and Lyapunov functions
- Case Study: A glider that can land on a perch like a bird
- The Flat-Plate Glider Model
- Trajectory optimization
- Trajectory stabilization
- Trajectory funnels
- Beyond a single trajectory
- Pontryagin's Minimum Principle
- Lagrange multiplier derivation of the adjoint equations
- Necessary conditions for optimality in continuous time
- Variations and Extensions
- Differential Flatness
- Iterative LQR and Differential Dynamic Programming
- Leveraging combinatorial optimization
- Explicit model-predictive control
- Exercises
- Chapter 11: Policy Search
- Problem formulation
- Linear Quadratic Regulator
- Policy Evaluation
- A nonconvex objective in ${\bf K}$
- No local minima
- True gradient descent
- More convergence results and counter-examples
- Trajectory-based policy search
- Infinite-horizon objectives
- Search strategies for global optimization
- Policy Iteration
- Chapter 12: Sampling-based motion planning
- Large-scale Incremental Search
- Probabilistic RoadMaps (PRMs)
- Getting smooth trajectories
- Rapidly-exploring Random Trees (RRTs)
- RRTs for robots with dynamics
- Variations and extensions
- Decomposition methods
- Exercises
- Chapter 13: Robust and Stochastic Control
- Stochastic models
- Costs and constraints for stochastic systems
- Finite Markov Decision Processes
- Linear optimal control
- Stochastic LQR
- Non-i.i.d. disturbances
- Stochastic linear MPC
- Worst-case control w/ bounded uncertainty
- Common Lyapunov functions
- Polytope dynamics
- Robust MPC
- Polytopic containment
- Robust constrained LQR
- Disturbance-based feedback parameterizations
- $L_2$ gain
- Dissipation inequalities
- Small-gain theorem
- Model uncertainty as a special case.
- Robust LQR as $\mathcal{H}_\infty$
- Linear Exponential-Quadratic Gaussian (LEQG)
- Adaptive control
- Structured uncertainty
- Linear parameter-varying (LPV) control
- Trajectory optimization
- Monte-carlo trajectory optimization
- Iterative $\mathcal{H}_2$/iLQG
- Nonlinear analysis and control
- Domain randomization
- Extensions
- Alternative risk/robustness metrics
- Chapter 14: Feedback Motion Planning
- Parameterized feedback policies as "skills"
- The rules of composition
- Parameterized controllers and Lyapunov functions
- Probabilistic feedback coverage
- Online planning
- Chapter 15: Output Feedback (aka Pixels-to-Torques)
- Background
- The classical perspective
- From pixels to torques
- Static Output Feedback
- A hardness result
- Perhaps a history of observations?
- Partially-observable Markov Decision Processes (POMDPs)
- Linear systems w/ Gaussian noise
- Linear Quadratic Regulator w/ Gaussian Noise (LQG)
- Trajectory optimization with Iterative LQG
- Observer-based Feedback
- Luenberger Observer
- Disturbance-based feedback
- Optimizing dynamic policies
- Convex reparameterizations of $H_2$, $H_\infty$, and LQG
- Policy gradient for LQG
- Sums-of-squares alternations
- Teacher-student learning
- Feedback from pixels
- Chapter 16: Algorithms for Limit Cycles
- Trajectory optimization
- Lyapunov analysis
- Transverse coordinates
- Transverse linearization
- Region of attraction estimation using sums-of-squares
- Feedback design
- For underactuation degree one.
- Transverse LQR
- Orbital stabilization for non-periodic trajectories
- Chapter 17: Planning and Control through Contact
- (Autonomous) Hybrid Systems
- Hybrid trajectory optimization
- Given a fixed mode sequence
- Direct shooting
- Deriving hybrid models: minimal vs floating-base coordinates
- Discrete control (between events)
- Hybrid LQR
- Hybrid Lyapunov analysis
- Contact-implicit trajectory optimization
- Leveraging combinatorial optimization
- Exercises
- Chapter 18: System Identification
- Problem formulation
- Equation error vs simulation error
- Online optimization
- Learning models for control
- Parameter Identification for Mechanical Systems
- Kinematic parameters and calibration
- Estimating inertial parameters (and friction)
- Simultaneous kinematic and inertial identification via lumped parameters.
- Identification using energy instead of inverse dynamics.
- Residual physics models with linear function approximators
- Experiment design as a trajectory optimization
- Online estimation and adaptive control
- Identification with contact
- Identifying (time-domain) linear dynamical systems
- From state observations
- Model-based Iterative Learning Control (ILC)
- Compression using the dominant eigenmodes
- Linear dynamics in a nonlinear basis
- From input-output data (the state-realization problem)
- Adding stability constraints
- Autoregressive models
- Statistical analysis of learning linear models
- Identification of finite (PO)MDPs
- From state observations
- Identifying Hidden Markov Models (HMMs)
- Neural network models
- Generating training data
- From state observations
- State-space models from input-output data (recurrent networks)
- Input-output (autoregressive) models
- Particle-based models
- Object-centric models
- Modeling stochasticity
- Control design for neural network models
- Alternatives for nonlinear system identification
- Identification of hybrid systems
- Task-relevant models
- Exercises
- Chapter 19: State Estimation
- Chapter 20: Model-Free Policy Search
- Policy Gradient Methods
- The Likelihood Ratio Method (aka REINFORCE)
- Sample efficiency
- Stochastic Gradient Descent
- The Weight Perturbation Algorithm
- Weight Perturbation with an Estimated Baseline
- REINFORCE w/ additive Gaussian noise
- Summary
- Sample performance via the signal-to-noise ratio.
- Performance of Weight Perturbation
- Chapter 21: Imitation Learning
- Behavior cloning
- Visuomotor policies (aka control from pixels)
- Behavior cloning as sequence modeling
- Supervised learning in a feedback loop: dealing with distribution shift
- Dealing with suboptimal and multimodal demonstrations
- Architectures for visuomotor policies
- Desiderata
- Output/action decoders
- (Multi-modal) input encoders
- Diffusion Policy
- Denoising Diffusion models
- Diffusion Policy
- Diffusion Policy for Linear Policies
- State-feedback
- Output-feedback
- Action sequence prediction
- Inverse reinforcement learning
- Vistas
- Multitask / foundation models for control
- Distributed decentralized learning (aka "fleet learning")
- Be rigorous
- Appendix A: Drake
- Pydrake
- Online Jupyter Notebooks
- Running on Deepnote
- Running on Google Colab
- Enabling licensed solvers
- Running on your own machine
- Getting help
- Appendix B: Multi-Body Dynamics
- Deriving the equations of motion
- The Manipulator Equations
- Recursive Dynamics Algorithms
- Bilateral Position Constraints
- Bilateral Velocity Constraints
- Hybrid models via constraint forces
- The Dynamics of Contact
- Compliant Contact Models
- Rigid Contact with Event Detection
- Impulsive Collisions
- Putting it all together
- Time-stepping Approximations for Rigid Contact
- Complementarity formulations
- Anitescu's convex formulation
- Todorov's regularization
- The Semi-Analytic Primal (SAP) solver
- Beyond Point Contact
- Variational mechanics
- Virtual work
- D'Alembert's principle and the force of inertia
- Principle of Stationary Action
- Hamiltonian Mechanics
- Exercises
- Appendix C: Optimization and Mathematical Programming
- Optimization software
- General concepts
- Convex vs nonconvex optimization
- Constrained optimization with Lagrange multipliers
- Convex optimization
- Linear Programs/Quadratic Programs/Second-Order Cones
- Semidefinite Programming and Linear Matrix Inequalities
- Semidefinite programming relaxation of general quadratic optimization
- Sums-of-squares optimization
- Sums of squares on a Semi-Algebraic Set
- Sums of squares optimization on an Algebraic Variety
- DSOS and SDSOS
- Solution techniques
- Nonlinear programming
- Second-order methods (SQP / Interior-Point)
- First-order methods (SGD / ADMM)
- Penalty methods
- Projected Gradient Descent
- Zero-order methods (CMA)
- Example: Inverse Kinematics
- Mixed-discrete (combinatorial) and continuous optimization
- Search, SAT, First order logic, SMT solvers, LP interpretation
- Mixed-integer convex optimization
- Graphs of Convex Sets
- Shortest path problems
- Applications
- "Black-box" optimization
- Appendix D: An Optimization Playbook
- Matrices
- Ellipsoids
- Polytopes
- Perspective functions
- (Mixed-)Integer Programming
- Bilinear Matrix Inequalities (BMIs)
- Geometry (SE(3), Penetration, and Contact)
- Appendix E: Miscellaneous
You can find documentation for the source code supporting these notes here.
Preface
This book is about nonlinear dynamics and control, with a focus on mechanical systems. I've spent my career thinking about how to make robots move robustly, but also with speed, efficiency, and grace. I believe that this is best achieved through a tight coupling between mechanical design, passive dynamics, and nonlinear control synthesis. These notes contain selected material from dynamical systems theory, as well as linear and nonlinear control. But the dynamics of our robots quickly get too complex for us to handle with a pencil-and-paper approach. As a result, the primary focus of these notes is on computational approaches to control design, especially using optimization and machine learning.
When I started teaching this class, and writing these notes, the computational approach to control was far from mainstream in robotics. I had just finished my Ph.D. focused on reinforcement learning (applied to a bipedal robot), and was working on optimization-based motion planning. I remember sitting at a robotics conference dinner as a young faculty member, surrounded by people I admired, talking about optimization. One of the senior faculty said "Russ: the people that talk like you aren't the people that get real robots to work." Wow, have things changed. Now almost every advanced robot is using optimization or learning in its planning/control system.
Today, the conversations about reinforcement learning (RL) are loud and passionate enough to drown out almost every other conversation in the room. Ironically, now I am the older professor and I find myself still believing in RL, but not with the complete faith of my youth. There is so much one can understand about the structure of the equations that govern our mechanical systems; algorithms which don't make use of that structure are missing obvious opportunities for data efficiency and robustness. The dream is to make the learning algorithms discover this structure on their own; but even then it pays for you (the designer) to understand the optimization landscape the learning systems are operating on. That's why my goal for this course is to help you discover this structure, and to learn how to use this structure to develop stronger algorithms and to guide your scientific endeavors into learning-based control.
I'll go even further. I'm willing to bet that our views of intelligence in 10-20 years will look less like feedforward networks with a training mode and a test mode, and more like a system with dynamics that ebb and flow in a beautiful dance with streams of incoming data and the ever-changing dynamics of the environment. These systems will move more flexibly between perception, forward prediction / sequential decision making, storing and retrieving long-term memories, and taking action. Dynamical systems theory offers us a way to understand and harness the complexity of these systems that we are building.
Although the material in the book comes from many sources, the presentation is targeted very specifically at a handful of robotics problems. Concepts are introduced only when and if they can help advance the capabilities we are trying to develop. Many of the disciplines that I am drawing from are traditionally very rigorous, to the point where the basic ideas can be hard to penetrate for someone who is new to the field. I've made a conscious effort in these notes to keep a very informal, conversational tone even when introducing these rigorous topics, and to reference the most powerful theorems but prove them only when the proof adds particular insight without distracting from the mainstream presentation. I hope that the result is a broad but reasonably self-contained and readable manuscript that will be of use to any enthusiastic roboticist.
Organization
The material in these notes is organized into a few main parts. "Model Systems" introduces a series of increasingly complex dynamical systems and overviews some of the relevant results from the literature for each system. "Nonlinear Planning and Control" introduces quite general computational algorithms for reasoning about those dynamical systems, with optimization theory playing a central role. Many of these algorithms treat the dynamical system as known and deterministic until the last chapters in this part, which introduce stochasticity and robustness. "Estimation and Learning" follows this up with techniques from statistics and machine learning that introduce additional algorithms able to operate with fewer assumptions about knowing the model or having perfect sensors. The book closes with an "Appendix" that provides slightly more background (and references) for the main topics used in the course.
The order of the chapters was chosen to make the book valuable as a reference. When teaching the course, however, I take a spiral trajectory through the material, introducing robot dynamics and control problems one at a time, and introducing only the techniques that are required to solve that particular problem.
Software
All of the examples and algorithms in this book, plus many more, are now available as part of our open-source software project: Drake. Please see the appendix for specific instructions on using Drake along with these notes.
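For a quick taste of the Python bindings (pydrake), here is a minimal sketch that sets up and solves a small optimization problem. The example itself is illustrative rather than taken from the notes, and it assumes Drake has been installed (e.g. via `pip install drake`; see the appendix for full instructions).

```python
# A minimal pydrake sketch (illustrative; assumes `pip install drake`).
from pydrake.all import MathematicalProgram, Solve

# Pose a small unconstrained quadratic program.
prog = MathematicalProgram()
x = prog.NewContinuousVariables(2, "x")
prog.AddCost((x[0] - 1) ** 2 + (x[1] + 2) ** 2)

# Solve it and inspect the result.
result = Solve(prog)
print(result.is_success())    # True
print(result.GetSolution(x))  # approximately [ 1. -2.]
```

The same `MathematicalProgram` interface underlies many of the optimization-based examples throughout the notes.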