Super-Human Performance in Gran Turismo Sport Using Deep Reinforcement Learning

Florian Fuchs, Yunlong Song, Elia Kaufmann, Davide Scaramuzza, Peter Duerr

Abstract

Autonomous racing presents formidable challenges such as minimizing lap times under uncertain dynamics and extreme vehicle control conditions. This paper introduces a deep reinforcement learning (RL) system applied within the Gran Turismo Sport (GTS) simulator, leveraging course-progress-based proxy rewards to overcome the sparsity of traditional lap-time objectives. Utilizing Soft Actor-Critic (SAC), the authors train a neural policy network that surpasses both the built-in AI and the best human lap times across multiple racing scenarios. The approach demonstrates generalization to changes in dynamics and track layout, highlighting the potential of model-free RL in high-performance autonomous driving applications.

Key Takeaways

1. Super-Human Performance with Deep RL: The trained agent consistently outperforms the top 1% of human drivers and the GTS built-in AI across three distinct racing scenarios.

2. Course-Progress Proxy Reward: A continuous reward based on progress along the track centerline, combined with kinetic-energy-scaled wall penalties, enables efficient training despite the sparsity of the underlying lap-time objective.

3. Realism and Practical Constraints: The policy is trained in real time on the full-fidelity GTS simulator running on consumer-grade PlayStation 4 hardware, with observation constraints similar to those available to human drivers.

4. Robust Generalization: Without retraining, the agent maintains high performance under observation noise, inference delay, and moderate changes in dynamics and car models.

Introduction

Autonomous racing entails precise high-speed control and trajectory generation under dynamic and uncertain conditions. Traditional planning and control methods, while effective, suffer from scalability and flexibility issues. This paper pioneers the use of deep reinforcement learning in the GTS simulator, a platform also used for professional driver scouting, to develop an agent capable not only of competing with but of outperforming expert human drivers. The authors circumvent common RL challenges in sparse-reward environments by introducing a carefully constructed proxy reward and applying the SAC algorithm for policy training.

Prior work on autonomous racing can be grouped into three categories:

  • Trajectory planning and following: Model Predictive Control (MPC), Model Predictive Path Integral control (MPPI)
  • Supervised learning: Imitation Learning (IL), Autonomous Land Vehicle in a Neural Network (ALVINN), Convolutional Neural Network (CNN) controller
  • Reinforcement learning: Soft Actor-Critic (SAC)

Methodology

🎯 Step 1: Designing the Reward Function

Create a proxy reward that approximates lap time by evaluating how much progress the car makes along the track's centerline in small intervals. This gives a dense feedback signal, helping the agent learn more effectively.

But here’s the twist: fast driving often means flirting with the edge of control. To prevent the AI from simply bouncing off track walls (a cheap but effective strategy), the reward includes a penalty for wall contact, scaled by the car’s kinetic energy. The final reward at each timestep \(t\) is:

\[r_t = r^{\text{prog}}_t - \begin{cases} c_w \, \|\vec{v}_t\|^2 & \text{if wall contact} \\ 0 & \text{otherwise} \end{cases}\]

This encourages the agent to be fast and smooth, just like a real racing pro.
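
To make the reward concrete, here is a minimal sketch of how it could be computed each step, assuming the centerline progress is tracked externally; the function name, the progress bookkeeping, and the value of the coefficient \(c_w\) are illustrative placeholders, not the authors' implementation.

```python
import numpy as np

# Illustrative sketch of the course-progress proxy reward (not the authors' code).
# C_W is the wall-penalty coefficient; its value here is an arbitrary placeholder.
C_W = 0.01

def proxy_reward(progress_t, progress_prev, velocity, wall_contact, c_w=C_W):
    """Dense reward: centerline progress gained in this step, minus a
    kinetic-energy-scaled penalty whenever the car touches a wall."""
    r_prog = progress_t - progress_prev                     # progress along the centerline
    penalty = c_w * float(np.dot(velocity, velocity)) if wall_contact else 0.0
    return r_prog - penalty
```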

🎯 Step 2: Representing the Car’s World

To make smart decisions, the AI needs to “see” its environment. But it doesn’t get pixels or radar scans—it uses structured game state data similar to what a human might infer visually:

  • Velocity & Acceleration \((\vec{v}_t, \dot{\vec{v}}_t)\)
  • Heading angle relative to the centerline
  • Rangefinder distances over a 180° frontal arc (think of a LiDAR measuring distances to the track edges)
  • Previous steering action
  • Wall contact flag
  • Upcoming track curvature

These features are concatenated into a single observation vector \(s_t\), which serves as input to the neural network.
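
As a rough illustration of this step, the observation could be assembled as below; the feature names, ordering, and flattening are assumptions made for the sketch, since the exact layout is not spelled out here.

```python
import numpy as np

def build_observation(v, v_dot, heading_err, rangefinder, prev_steer,
                      wall_contact, curvature_ahead):
    """Concatenate the listed features into a single flat vector s_t.
    All inputs are flattened; the ordering is arbitrary but must stay fixed."""
    return np.concatenate([
        np.ravel(v).astype(np.float32),                      # velocity
        np.ravel(v_dot).astype(np.float32),                  # acceleration
        np.array([heading_err], dtype=np.float32),           # angle to the centerline
        np.ravel(rangefinder).astype(np.float32),            # 180° rangefinder distances
        np.array([prev_steer], dtype=np.float32),            # previous steering action
        np.array([float(wall_contact)], dtype=np.float32),   # wall-contact flag
        np.ravel(curvature_ahead).astype(np.float32),        # upcoming track curvature
    ])
```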

🎯 Step 3: Defining the Action Space

Instead of separate throttle and brake controls, the authors simplify things with a single combined throttle/brake command \((\omega_t)\), alongside a steering angle \((\delta_t)\). This reflects the observation that top human drivers rarely brake and accelerate simultaneously.

  • \(\delta_t \in [-\frac{\pi}{6}, \frac{\pi}{6}]\) radians
  • \(\omega_t \in [-1, 1]\), where -1 = full brake, 1 = full throttle
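
A minimal sketch of mapping a normalized policy output onto these two controls might look like the following; the clipping and scaling are assumptions rather than the authors' exact post-processing.

```python
import numpy as np

STEER_MAX = np.pi / 6  # maximum steering angle in radians

def map_action(raw_action):
    """Map a raw policy output in [-1, 1]^2 to game controls:
    a steering angle delta_t and a combined throttle/brake omega_t."""
    delta = float(np.clip(raw_action[0], -1.0, 1.0)) * STEER_MAX  # steering in [-pi/6, pi/6]
    omega = float(np.clip(raw_action[1], -1.0, 1.0))              # -1 = full brake, 1 = full throttle
    return delta, omega
```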

🎯 Step 4: Training the Neural Network (Policy)

The policy is trained using the Soft Actor-Critic (SAC) algorithm, a state-of-the-art model-free RL method known for stable and sample-efficient learning.

The architecture includes:

  • Policy network: Outputs steering and throttle/brake
  • Two Q-networks: Estimate value of actions
  • One value network: Helps stabilize training
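
The sketch below outlines these three network types in PyTorch, following the original SAC formulation; the layer sizes, observation dimension, and all names are placeholders rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn

def mlp(in_dim, out_dim, hidden=256):
    """Small fully connected network used for every component below."""
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, hidden), nn.ReLU(),
                         nn.Linear(hidden, out_dim))

class GaussianPolicy(nn.Module):
    """Outputs a tanh-squashed Gaussian over (steering, throttle/brake)."""
    def __init__(self, obs_dim, act_dim=2):
        super().__init__()
        self.net = mlp(obs_dim, 2 * act_dim)

    def forward(self, obs):
        mean, log_std = self.net(obs).chunk(2, dim=-1)
        std = log_std.clamp(-20, 2).exp()
        raw = torch.distributions.Normal(mean, std).rsample()  # reparameterized sample
        return torch.tanh(raw)                                  # squash into [-1, 1]

obs_dim = 64                                         # placeholder observation size
policy = GaussianPolicy(obs_dim)                     # outputs the two driving commands
q1, q2 = mlp(obs_dim + 2, 1), mlp(obs_dim + 2, 1)    # two Q-networks over (s, a)
value_net = mlp(obs_dim, 1)                          # value network for training stability
```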

🎯 Step 5: Real-Time Simulation and Training

The magic happens on four PlayStation 4 consoles, each running 20 cars in parallel in the GTS environment. The AI collects real driving experiences (observations, actions, rewards), stores them in a replay buffer, and samples from this buffer to update its policy.

Training occurs in real time, constrained by the PS4 hardware and the control frequency of the simulator interface (10 Hz during training, 60 Hz during evaluation). This is no synthetic setup; it is close to how an AI would need to operate in a real-world autonomous racing system.
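
Schematically, the collect-and-update loop could look like the sketch below; the environment interface, the buffer size, and the sac_update placeholder stand in for the authors' distributed setup and are not taken from the paper.

```python
import random
from collections import deque

replay_buffer = deque(maxlen=1_000_000)  # shared experience store (size assumed)

def rollout_worker(env, policy, n_steps):
    """Run one simulated car and push its transitions into the buffer."""
    obs = env.reset()
    for _ in range(n_steps):
        action = policy(obs)
        next_obs, reward, done, _ = env.step(action)
        replay_buffer.append((obs, action, reward, next_obs, done))
        obs = env.reset() if done else next_obs

def sac_update(batch):
    """Placeholder for one SAC gradient step on a mini-batch (see Step 4)."""
    pass

def train_step(batch_size=256):
    """Sample a mini-batch from the buffer and update the networks."""
    if len(replay_buffer) >= batch_size:
        sac_update(random.sample(replay_buffer, batch_size))
```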

Figure 1. Overview of the training pipeline: A distributed system collects data from 4 PlayStation 4 consoles running GTS simulations, with extracted features sent via the Gran Turismo API to a policy network. The policy is trained using Soft Actor-Critic (SAC) on a desktop PC, leveraging a replay buffer and mini-batch updates in parallel.

Results

Figure 2. The tracks and cars used as reference settings to compare our approach to human drivers.

The learned policy achieved lap times faster than the best human players in all three race settings:

  • Setting A (Audi TT Cup, Track A): 0.15 seconds faster
  • Setting B (Mazda Demio, Track A): 0.04 seconds faster
  • Setting C (Audi TT Cup, Track C): 0.62 seconds faster

The policy mimics professional driving behavior such as out-in-out cornering, anticipatory braking, and maximized exit speeds. Variance in lap time was significantly lower than that of expert humans, indicating high consistency. Robustness tests under observation noise and inference delay show graceful degradation, with the policy maintaining top-tier performance up to 50 ms of inference delay and 9% observation noise.
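
A minimal sketch of such a perturbation test is shown below, with relative Gaussian observation noise and a fixed action delay; the environment and policy interfaces, the step budget, and the way noise is injected are assumptions, and the paper's exact protocol may differ.

```python
import numpy as np
from collections import deque

def evaluate_with_perturbations(env, policy, noise_frac=0.09, delay_steps=1, n_steps=6000):
    """Roll out one episode with relative Gaussian observation noise and a
    fixed inference delay measured in control steps."""
    obs = env.reset()
    pending = deque([np.zeros(2)] * delay_steps)      # neutral actions until the delay queue fills
    total_reward = 0.0
    for _ in range(n_steps):
        noisy = obs * (1.0 + noise_frac * np.random.randn(*np.shape(obs)))
        pending.append(policy(noisy))                 # decide from the noisy observation
        obs, reward, done, _ = env.step(pending.popleft())  # apply the delayed action
        total_reward += reward
        if done:
            break
    return total_reward
```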

Table 1. Time-trial comparisons between our approach, human online competitors, and the built-in GTS AI for the three race settings.

Discussion

The paper convincingly demonstrates that end-to-end, model-free RL can match and exceed human capabilities in high-speed racing. The approach's strength lies in its simplicity and adaptability, eliminating the need for handcrafted cost functions or model-based control. Limitations include narrow generalization to unseen cars and tracks and the lack of multi-agent interaction. Future work could explore meta-learning and policy distillation for broader applicability and competitive racing environments.

Conclusion

This work sets a new benchmark in autonomous racing by achieving super-human performance through a combination of realistic simulation, thoughtful reward design, and scalable RL training. It validates the potential of model-free deep reinforcement learning for real-world-like vehicle control and opens pathways for its application in broader autonomous driving contexts.

Reviewer Notes

Key Points

1. This paper introduces a deep RL framework that achieves super-human racing performance in Gran Turismo Sport.

2. Using a novel reward function based on course progress and kinetic-energy-scaled wall penalties, the SAC-trained policy matches expert-level driving behavior.

3. The approach demonstrates strong robustness and realism, paving the way for future autonomous racing systems.

Extended Analysis

1. Generalization to unseen cars and tracks remains limited without retraining.

2. Interaction is not modeled (no multi-agent racing or obstacle avoidance).

3. Dependence on a proprietary simulation environment may hinder reproducibility.