Autonomous racing presents formidable challenges such as minimizing lap times under uncertain dynamics and extreme vehicle control conditions. This paper introduces a deep reinforcement learning (RL) system applied within the Gran Turismo Sport (GTS) simulator, leveraging course-progress-based proxy rewards to overcome the sparsity of traditional lap-time objectives. Utilizing Soft Actor-Critic (SAC), the authors train a neural policy network that surpasses both the built-in AI and the best human lap times across multiple racing scenarios. The approach demonstrates generalization to changes in dynamics and track layout, highlighting the potential of model-free RL in high-performance autonomous driving applications.
Trained with deep reinforcement learning, the agent consistently outperforms the GTS built-in AI and the top 1% of human drivers across three distinct racing scenarios.
A continuous reward based on track centerline progress, along with kinetic-energy-scaled wall penalties, enables efficient training despite sparse overall lap-time rewards.
The policy is trained in real time on the high-fidelity GTS simulation running on consumer-grade PlayStation 4 hardware, with observation constraints similar to those available to human drivers.
Without retraining, the agent maintains high performance under noise, inference delay, and moderate changes in track dynamics and car models.
Autonomous racing entails precise high-speed control and trajectory generation under dynamic and uncertain conditions. Traditional planning and control methods, while effective, suffer from scalability and flexibility issues. This paper pioneers the use of deep reinforcement learning in the GTS simulator, a platform used for professional driver scouting, to develop an agent capable of not only competing with but also outperforming expert human drivers. The authors circumvent common RL challenges in sparse-reward environments by introducing a carefully constructed proxy reward system and applying the SAC algorithm for policy training.
Prior studies in autonomous racing can be grouped into three categories:
🎯 Step 1: Designing the Reward Function
Create a proxy reward that approximates lap time by evaluating how much progress the car makes along the track's centerline in small intervals. This gives a dense feedback signal, helping the agent learn more effectively.
But here’s the twist: fast driving often means flirting with the edge of control. To prevent the AI from simply bouncing off track walls (a cheap but effective strategy), the reward includes a penalty for wall contact, scaled by the car’s kinetic energy. The final reward at each timestep \(t\) is:
\[r_t = r^{\text{prog}}_t - \begin{cases} c_w |\mathbf{v}_t|^2 & \text{if wall contact} \\ 0 & \text{otherwise} \end{cases}\]
This encourages the agent to be fast and smooth, just like a real racing pro.
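To make this concrete, here is a minimal Python sketch of a per-step reward in this spirit: dense centerline progress minus a kinetic-energy-scaled wall penalty. The coefficient `c_wall`, the progress units, and the function signature are illustrative assumptions, not the paper's tuned values.

```python
import numpy as np

def step_reward(progress_prev, progress_curr, velocity, wall_contact, c_wall=0.01):
    """Sketch of a progress-based proxy reward with a wall penalty.

    `c_wall` and the progress units are placeholders, not the paper's values.
    """
    # Dense progress term: arc length gained along the track centerline
    # since the previous timestep.
    r_prog = progress_curr - progress_prev

    # Penalty proportional to |v|^2 while touching a wall, which discourages
    # "wall riding" as a cheap way to carry speed through corners.
    penalty = c_wall * float(np.dot(velocity, velocity)) if wall_contact else 0.0
    return r_prog - penalty
```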
🎯 Step 2: Representing the Car’s World
To make smart decisions, the AI needs to “see” its environment. But it doesn’t get pixels or radar scans—it uses structured game state data similar to what a human might infer visually:
These state signals are concatenated into a single observation vector \(s_t\), which serves as the input to the neural network.
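Because the exact feature list is not reproduced above, the sketch below assumes a plausible set of signals (car-frame velocity and acceleration, heading angle relative to the centerline, sampled curvature of the track ahead, and a wall-contact flag) purely to illustrate how such features can be stacked into \(s_t\).

```python
import numpy as np

def build_observation(velocity, acceleration, heading_angle,
                      curvature_ahead, wall_contact):
    """Illustrative assembly of the observation vector s_t.

    The fields and their shapes are assumptions for this sketch, not the
    paper's exact state definition.
    """
    return np.concatenate([
        np.asarray(velocity, dtype=np.float32),         # car-frame linear velocity
        np.asarray(acceleration, dtype=np.float32),     # car-frame linear acceleration
        np.array([heading_angle], dtype=np.float32),    # angle w.r.t. centerline tangent
        np.asarray(curvature_ahead, dtype=np.float32),  # curvature samples of the track ahead
        np.array([1.0 if wall_contact else 0.0], dtype=np.float32),  # wall-contact flag
    ])
```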
🎯 Step 3: Defining the Action Space
Instead of separate throttle and brake controls, the authors simplify things with a single combined throttle/brake command \((\omega_t)\), alongside a steering angle \((\delta_t)\). This reflects the observation that top human drivers rarely brake and accelerate simultaneously.
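A small sketch of how such a combined command could be decoded into game inputs is shown below; the sign convention and the steering limit are assumptions for illustration, not values taken from the paper.

```python
def decode_action(omega, delta, max_steer_rad=0.3):
    """Decode a combined throttle/brake command and a normalized steering command.

    omega in [-1, 1]: positive values accelerate, negative values brake.
    delta in [-1, 1]: scaled to a physical steering angle.
    The limits and sign convention here are illustrative assumptions.
    """
    throttle = max(omega, 0.0)         # positive part drives the throttle
    brake = max(-omega, 0.0)           # negative part drives the brake
    steering = delta * max_steer_rad   # scale to radians
    return throttle, brake, steering
```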
🎯 Step 4: Training the Neural Network (Policy)
The policy is trained using the Soft Actor-Critic (SAC) algorithm, a state-of-the-art model-free RL method known for stable and sample-efficient learning.
The architecture is a compact fully connected network that maps the observation vector \(s_t\) to a stochastic distribution over actions.
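As a rough illustration of what an SAC actor looks like, here is a minimal PyTorch sketch with a tanh-squashed Gaussian head; the hidden-layer sizes and output parameterization are standard SAC defaults, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class SquashedGaussianActor(nn.Module):
    """Minimal SAC-style actor sketch (standard choices, not the paper's exact net)."""

    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mu = nn.Linear(hidden, act_dim)
        self.log_std = nn.Linear(hidden, act_dim)

    def forward(self, obs):
        h = self.body(obs)
        mu, log_std = self.mu(h), self.log_std(h).clamp(-20, 2)
        dist = torch.distributions.Normal(mu, log_std.exp())
        raw = dist.rsample()       # reparameterized sample for low-variance gradients
        action = torch.tanh(raw)   # squash into [-1, 1] for omega and delta
        # Log-probability with the tanh change-of-variables correction,
        # needed by SAC's entropy-regularized objective.
        log_prob = (dist.log_prob(raw) - torch.log(1 - action.pow(2) + 1e-6)).sum(-1)
        return action, log_prob
```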
🎯 Step 5: Real-Time Simulation and Training
The magic happens on four PlayStation 4 consoles, each running 20 cars in parallel in the GTS environment. The AI collects real driving experiences (observations, actions, rewards), stores them in a replay buffer, and samples from this buffer to update its policy.
Training occurs in real-time, constrained by the PS4 hardware and simulator frame rate (10 Hz during training, 60 Hz during evaluation). This is no synthetic setup—it’s close to how an AI would need to operate in a real-world autonomous racing system.
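The overall collect-and-update cycle can be sketched as below; `policy.act`, `env.step`, and `update_fn` are placeholders standing in for the GTS interface and the SAC gradient update, not the authors' actual training code.

```python
import random
from collections import deque

def train(policy, update_fn, envs, buffer_size=1_000_000,
          batch_size=256, steps=100_000):
    """Sketch of the off-policy collect-and-update loop.

    `envs` stands in for the parallel simulator instances (cars); the
    interfaces used here are placeholders, not the authors' actual code.
    """
    buffer = deque(maxlen=buffer_size)
    obs = [env.reset() for env in envs]

    for _ in range(steps):
        # Act in every parallel car at the simulator's control rate and
        # store the resulting transitions in the replay buffer.
        for i, env in enumerate(envs):
            action = policy.act(obs[i])
            next_obs, reward, done, _ = env.step(action)
            buffer.append((obs[i], action, reward, next_obs, done))
            obs[i] = env.reset() if done else next_obs

        # Off-policy SAC update from a random minibatch of past experience.
        if len(buffer) >= batch_size:
            update_fn(random.sample(buffer, batch_size))
```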
The learned policy achieved lap times faster than those of the best human players in all three race settings.
The policy mimics professional driving behavior such as out-in-out cornering, anticipatory braking, and maximized exit speeds. Variance in lap time was significantly lower than that of expert humans, suggesting high consistency. Robustness tests under observation noise and inference delay show graceful degradation, maintaining top-tier performance up to 50 ms delay and 9% observation noise.
The paper convincingly demonstrates that end-to-end model-free RL can match and exceed human capabilities in high-speed racing. The approach’s strength lies in its simplicity and adaptability, eliminating the need for handcrafted cost functions or model-based control. Limitations include narrow generalization across unseen cars/tracks and lack of multi-agent interaction. Future work could explore meta-learning and policy distillation for broader applicability and competitive racing environments.
This work sets a new benchmark in autonomous racing by achieving super-human performance through a combination of realistic simulation, thoughtful reward design, and scalable RL training. It validates the potential of model-free deep reinforcement learning for real-world-like vehicle control and opens pathways for its application in broader autonomous driving contexts.
This paper introduces a deep RL framework that achieves super-human racing performance in Gran Turismo Sport.
Using a novel reward function based on course progress and kinetic wall penalties, the SAC-trained policy matches expert-level driving behavior.
The approach demonstrates strong robustness and realism, paving the way for future autonomous racing systems.
Generalization to unseen cars/tracks remains limited without retraining.
Lack of interaction modeling (no multi-agent racing or obstacle avoidance).
Dependency on proprietary simulation environments may hinder reproducibility.