F1 Car Semantic Segmentation

Before 2019 I couldn't have told you who Lewis Hamilton or Fernando Alonso are but after hearing about F1 on a podcast I quickly became enamored with the technological complexity and sporting achievements featured in the sport. The combination of world class athletes and multi-million dollar machines reach speeds over 350km/h and more impressively sustain 6g horizontal forces while cornering at 250km/h.

Since my introduction to the sport I have watched every race and consumed an unreasonable amount of technical and sporting analysis in between. One of the draws of the sport is the incredible depth that can be applied to analysis of any of its aspects. Even the seemingly trivial components of the sport have been heavily iterated and optimized by the teams. This obsession is matched by the fans, making videos, writing articles and long threads about anything tangentially related to performance.

There is a lot of great data made available by F1 which can provide great insights into car dynamics and driver performance. This is mainly available through the gem that is the Ergast API and the FastF1 module for Python. I have used this API in the past to have some fun with the data and I will likely continue in the future but for the best analysis of this specific data I can point to F1 Data Analysis who makes great visualizations and insights.


Despite the high quality(but tragically low frequency) telemetry data being provided by F1, the focus of this project is to try out some semantic image segmentation transfer learning to identify a leading F1 car from the perspective of the onboard camera installed on all F1 cars.

Rear Onboard Perspective Front Onboard Perspective

Semantic image segmentation can be summarized as classifying every pixel of an image to one of the known classes. Transfer learning is the machine learning concept of training a neural network on a certain task and then repurposing the model and training it on a new task. This is often applied when creating a large, complex model for a domain with a small training set. This is the situation that I find myself in as semantic image segmentation is a very complex problem and I have to construct the training set myself. I will spare any further explanation of these topics as there are great articles and courses that provide a lot more insight than I can.

TODO: Potentially link to further resources above.

Dataset Creation

Unsurprisingly I could not find any datasets for my exact case of identifying F1 cars in an image so I had to create it myself. I downloaded a couple of videos from the F1 Youtube channel(great content) and pulled out specific onboard frames that contained a single leading vehicle. I was able to find ~140 images that contained a variety of teams, tracks, and relative car positions.

The difficult part of the dataset was the manual annotation of the images. I searched far and wide for an annotation tool and I couldn't find anything free and usable until I stumbled upon CVAT. This web app provides a variety of tools and methods for annotating and exporting images. While the tool helped, the whole process still took hours just to label the handful of images.

Image Mask 1 Image Mask 2


PyTorch is my machine learning library of choice which is convenient as it includes a few built-in models and pre-trained weights. PyTorch provides a number of models for different applications but the one that I used is ResNet101 with weights pre-trained on the ImageNet dataset. The model is based on a very interesting residual connection architecture while ImageNet is a well known dataset consisting of 14M+ images often used for model training and testing.

The only change I made to the model is the replacement of the final classifer layer from 21 output classes to the 2 output classes that I need.(car and 'not car') Then I froze the weights of the entire model except for the classifier so that the training will only modify the classifier. This allows the future training to leverage the feature identification of the original network while adapting the final few layers to solve the specific task of identifying an F1 car in an image.

Transformations and Training

Minimal transformations are applied to the input during training and prediction. I simply normalize the pixel values using values pre-calculated from the training set and there is optional input resizing but performance is affected for smaller image sizes. I decided not to include any data augmentation like rotation or cropping because all predictions will be made from the same camera angle with the same perspective on the leading car. The augmentation would provide more samples but in variations that wouldn't occur in application.

Even with the small training dataset it only takes a few epochs for the network to converge on 'relatively clean' segmentations. I have tested a number of different optimizers, learning rates, and transformations with the best combination seeming to be an Adam optimizer with a learning rate varying from 1e-3 to 1e-5 over a few epochs. I used a batch size of 6 for training as I am somewhat lacking on memory and just loading the model can start to stress my personal system.


I am quite pleased with the results, providing a patchy segmentation of the unseen sample images as shown below. Analyzing more of the masks it is mainly a tyre segmentation model which makes sense as the tyres comprise most of the profile of the car from the level of the onboard camera. Additionally, the tyres are consistent in colour and shape while the body of the cars can vary in colour and contour depending on the team, resulting in less 'confident' segmentation.

Segmentation Process

An entire clip worth of images can be processed by the model and stitched back together with ffmpeg to generate the annotated clips below.

As a whole this isn't something that I would put into production as the segmentation is still quite spotty but the segmentation information can be used to create a nice bounding box or other floating tag, potentially combining it with telemetry data to show the identity and speed of the leading car. If I could speed this up to realtime analysis and sync it with the FastF1 data then this could be a fun continuation of the project.