1. Origin
Amazon wanted to put Reinforcement Learning in the hands of developers through hands-on learning; it is also Amazon's venture into the autonomous-car segment.
DeepRacer is a 1/18th-scale autonomous car that runs Ubuntu on an Intel Atom processor, built on an RC car chassis and motor. The secret sauce is Amazon's DeepLens camera.
AWS DeepRacer includes a fully configured cloud environment that you can use to train your Reinforcement Learning models. It takes advantage of the new Reinforcement Learning feature in Amazon SageMaker and also includes a 3D simulation environment powered by AWS RoboMaker. You can train an autonomous driving model against a collection of predefined race tracks included with the simulator, then evaluate it virtually or download it to an AWS DeepRacer car and verify its performance in the real world.
2. RL — Reinforcement Learning
Types of machine learning
- Supervised learning
- Unsupervised learning
- Reinforcement learning
Reinforcement Learning terms
- Agent = the DeepRacer car
- Environment = the track
- State = what the agent observes at a given moment (here, the current camera image)
- Action (that the agent can take) = steer right or left
- Reward (given when the agent does a good thing)
- Episode = one run from start to finish, e.g. one lap around the track
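The terms above fit together as an agent-environment loop. Here is a toy sketch of that loop; `TrackEnv` and the action names are illustrative stand-ins, not the real DeepRacer interface:

```python
import random

ACTIONS = ["steer_left", "steer_right", "straight"]

class TrackEnv:
    """Toy environment: an episode ends after a fixed number of steps (one 'lap')."""
    def __init__(self, steps_per_lap=10):
        self.steps_per_lap = steps_per_lap
        self.step_count = 0

    def reset(self):
        self.step_count = 0
        return 0.0  # initial state (e.g. distance from the center line)

    def step(self, action):
        self.step_count += 1
        # Made-up reward: favor going straight, a stand-in for "staying on track".
        reward = 1.0 if action == "straight" else 0.1
        next_state = random.uniform(-1.0, 1.0)
        done = self.step_count >= self.steps_per_lap  # episode over: lap finished
        return next_state, reward, done

env = TrackEnv()
state = env.reset()
total_reward = 0.0
done = False
while not done:  # one episode
    action = random.choice(ACTIONS)  # a real agent picks actions via its policy
    state, reward, done = env.step(action)
    total_reward += reward
print("episode finished, cumulative reward:", total_reward)
```

A real agent would replace `random.choice` with a learned policy that maps the state to an action.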
REWARD FUNCTION
- It’s the core of RL.
Here it's better to have a reward function that gives the agent more reward for staying close to the center line.
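For instance, a center-line reward function in the style of the AWS sample functions could look like this (the console calls it with a `params` dictionary containing keys such as `track_width` and `distance_from_center`):

```python
def reward_function(params):
    """Give more reward the closer the car stays to the center line."""
    track_width = params['track_width']
    distance_from_center = params['distance_from_center']

    # Three bands around the center line, as fractions of the track width
    marker_1 = 0.1 * track_width
    marker_2 = 0.25 * track_width
    marker_3 = 0.5 * track_width

    if distance_from_center <= marker_1:
        reward = 1.0      # very close to center: full reward
    elif distance_from_center <= marker_2:
        reward = 0.5
    elif distance_from_center <= marker_3:
        reward = 0.1
    else:
        reward = 1e-3     # likely close to going off track

    return float(reward)
```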
As the car runs around the track, it takes 15 pictures per second. Each picture is a state, and acting on it counts as one step.
R is the reward. It is given after the agent has taken an action from a state, and summed over an episode it becomes the cumulative reward.
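The cumulative reward can be sketched as a small helper. The discount factor `gamma` is a standard RL ingredient (it weights near-term reward above far-future reward), not something specific to DeepRacer:

```python
def cumulative_reward(rewards, gamma=0.99):
    """Discounted return G = r_0 + gamma*r_1 + gamma^2*r_2 + ..."""
    g = 0.0
    # Work backwards so each step folds in the discounted future return.
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# Three steps of reward 1.0 each, undiscounted:
print(cumulative_reward([1.0, 1.0, 1.0], gamma=1.0))  # 3.0
```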
Now we use two different functions:
- VALUE FUNCTION: estimates the expected cumulative reward from a state
- POLICY FUNCTION: determines which action to take in a state
DeepRacer uses VANILLA POLICY GRADIENT and PPO (Proximal Policy Optimization, https://openai.com/blog/openai-baselines-ppo/).
These use gradient ascent, since we want to maximize the reward.
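As a minimal illustration of gradient ascent on a made-up one-parameter "reward" (real policy-gradient methods like VPG and PPO ascend an *estimated* gradient of expected reward; here the gradient is exact just to show the update rule):

```python
# Toy objective: R(theta) = -(theta - 3)^2, maximized at theta = 3.

def grad(theta):
    return -2.0 * (theta - 3.0)  # dR/dtheta

theta = 0.0
lr = 0.1
for _ in range(100):
    theta += lr * grad(theta)  # ascent: step *up* the gradient

print(round(theta, 3))  # converges toward 3.0
```

Note the `+=`: descent (minimizing a loss) would subtract the gradient instead.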
3. Virtual simulator
What AWS services are being used?
Simulation video is streamed to the console using Amazon Kinesis.
CloudWatch: to save logs.
while (training)
{
    ROBOMAKER: runs the simulation, takes photos, and passes them to SageMaker
    SAGEMAKER: does the training and saves the updated model
    The updated model is passed back to RoboMaker
}
To train and simulate, you set up:
- Track info
- Hyperparameters to play around with

Once you create your own model, there are parameters one can edit, in terms of:
ACTION INFO FUNCTION
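A discrete action space is essentially a grid of steering-angle/speed combinations. Here is a hypothetical illustration of building one; the specific angles and speeds below are made up, and the console generates a similar grid from the maximum steering angle, steering granularity, maximum speed, and speed granularity you choose:

```python
# Illustrative discrete action space (values are made up for the example).
steering_angles = [-30, -15, 0, 15, 30]  # degrees
speeds = [0.8, 1.6]                      # m/s

action_space = [
    {"index": i, "steering_angle": angle, "speed": speed}
    for i, (angle, speed) in enumerate(
        (a, s) for a in steering_angles for s in speeds
    )
]

print(len(action_space))  # 5 angles x 2 speeds = 10 discrete actions
```

A larger action space gives the agent finer control but makes training slower, since there are more actions to explore.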
REWARD FUNCTION
HOW TO TRAIN USING AWS DEEPRACER
TRAINING STARTED for DEFAULT MODEL
SOFTWARE ARCHITECTURE
I had a chance to learn about this amazing technology and participate in the DeepRacer league. Although I was a bit sad that, after reaching #1 a few times, I ended up at position #9, it was still super fun.
References:
1. Workshop
2. DeepRacer page
3. RoboMaker