Berkeley Deep Drive Dataset — BDD100K

Hisho Rajanathan
7 min read · Jan 9, 2021


This dataset was used for my thesis, ‘Using Deep Learning for Object Detection in Real-Time for Self-Driving Cars’, in which I compared YOLOv3, YOLOv4 and Faster R-CNN on BDD100K to determine which achieved the highest accuracy for 2D object detection.

Object Detection using YOLOv4

Introduction

The Berkeley Deep Drive Dataset (BDD100K) is a large, diverse, annotated driving dataset. BDD100K can be used for tasks such as 2D object detection, instance segmentation, lane marking detection and drivable-area segmentation.

It consists of over 100,000 video clips, or more than 1,100 hours of driving footage, recorded in a wide range of conditions¹. Unlike the KITTI dataset, BDD100K was collected with consumer dashcams, in partnership with Nexar; the footage was recorded at 720p and 30 fps (frames per second)¹. The videos come from several locations in the USA, including New York, Berkeley, San Francisco and the Bay Area. The diverse weather conditions and the variety of filming locations were the deciding factors in choosing this dataset.

The videos have not been synthesised; they were captured by volunteers using Nexar dashcams, providing realistic footage recorded by the public across the USA.

The authors of the dataset have kindly performed frame extraction on the 100,000 video clips to provide images with the bounding box coordinates of the objects they contain, which greatly reduced the pre-processing time for this project. Each video clip has a single frame extracted at the 10th second of the video, allowing us to perform object detection on still images. All images are in RGB and are 1280 x 720 pixels.

The properties of each video are given in a JSON file, including the weather, time of day, scene, objects and their respective bounding boxes, along with other fields that will not be used in this project.

The bounding box coordinates are given as x1, y1, x2 and y2, i.e. the top-left and bottom-right corners of the box.
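As a rough sketch of how these labels can be read in Python (the label file name here is an assumption; use the name of the file you downloaded):

import json

# Load the training labels (file name is an assumption; adjust to your download)
with open('bdd100k_labels_images_train.json') as f:
    labels = json.load(f)

# Each entry describes one image: its file name, attributes and object labels
first = labels[0]
print(first['name'])          # image file name
print(first['attributes'])    # {'weather': ..., 'scene': ..., 'timeofday': ...}
for obj in first['labels']:
    if 'box2d' in obj:        # lane and drivable-area labels have no box2d
        box = obj['box2d']
        print(obj['category'], box['x1'], box['y1'], box['x2'], box['y2'])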

Exploratory Data Analysis

BDD100K is diverse in the sense that it covers multiple weather conditions and multiple objects per frame, and was filmed in different parts of the USA.

Different locations

This diversity is beneficial because it gives a holistic view of what autonomous vehicles will need to detect.

Figure 1 Number of Bounding Boxes per Images

Figure 1 shows the number of bounding boxes in each image of the training dataset provided by the authors of BDD100K. The majority of images have nine to twenty-one objects, but some images have as many as 91 objects in the frame. Looking into the image with 91 bounding boxes, there are several pedestrians on the sidewalk which an autonomous vehicle would need to detect. In 2019, pedestrians accounted for 14% of all casualties², so having images with a high number of pedestrians is beneficial for this project.
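For reference, a minimal sketch of how the counts behind Figure 1 can be computed, reusing the labels list loaded in the earlier sketch:

from collections import Counter

# Number of bounding boxes per image (only labels that carry a box2d)
boxes_per_image = [sum('box2d' in obj for obj in img['labels']) for img in labels]
print(max(boxes_per_image))                   # largest number of objects in a single frame
print(Counter(boxes_per_image).most_common(5))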

Figure 2 Width vs Height of Bounding Boxes

Figure 2 shows the size of the bounding boxes per object category. All object categories have mostly very small bounding boxes, although Figure 2c shows that buses generally have larger boxes than the other categories. The original training dataset is heavily imbalanced: cars account for the most bounding boxes and trains the fewest. This is a realistic representation of what an autonomous vehicle will face on the roads of the USA.
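Similarly, a short sketch of how the box widths and heights behind Figure 2 can be derived (the use of pandas here is my own choice, not necessarily what was used for the figures):

import pandas as pd

rows = []
for img in labels:
    for obj in img['labels']:
        if 'box2d' in obj:
            b = obj['box2d']
            rows.append({'category': obj['category'],
                         'width': b['x2'] - b['x1'],
                         'height': b['y2'] - b['y1']})
boxes = pd.DataFrame(rows)
print(boxes.groupby('category')[['width', 'height']].median())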

Digging deeper into the dataset, the label file for the training dataset does not include entries for some of the supplied images: 173 images had no corresponding labels. To avoid the time spent manually creating bounding boxes for these images, they were removed from the dataset.
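A small sketch of one way to find and set aside those unlabelled images (the directory path is a placeholder):

import os

labelled = {img['name'] for img in labels}
all_images = set(os.listdir('/train'))          # placeholder path to the training images
missing = sorted(all_images - labelled)
print(len(missing))                             # 173 images without labels in my case
for name in missing:
    os.remove(os.path.join('/train', name))     # or move them elsewhere instead of deleting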

Creating new training dataset

As advised by Fisher Yu of Berkeley, one of the dataset’s authors, the validation labels should be used for testing; the original training images therefore have to be split into new training and validation datasets. The unlabelled images could have been labelled manually, but to avoid inconsistencies between manual labels and Berkeley’s own labelling, they were omitted from the dataset.

The process to create the new training and validation datasets was to list the images in the training dataset supplied by Berkeley and shuffle the file names to randomise their order. Once this was complete, a 90:10 split was applied, resulting in 62,873 images in the new training dataset and 6,990 images in the new validation dataset. The original validation dataset is now used for testing.

import os
from random import shuffle
from sklearn.model_selection import train_test_split

# List the training images supplied by Berkeley and randomise the order
file_list = os.listdir('/train')
shuffle(file_list)
# 90:10 split into new training and validation sets (file names serve as both X and y)
X_train, X_val, y_train, y_val = train_test_split(file_list, file_list, test_size=0.1, random_state=1)

The analysis below refers to the new training dataset, derived as explained above.

Weather Conditions

The videos in the BDD100K dataset cover a variety of weather conditions, such as clear weather, overcast, rain and snow. Looking at the weather labels of the training dataset, we can see that a large proportion of images were taken in clear weather.

Figure 3 Number of images with different weather conditions

Figure 3 shows that some images have an ’undefined’ weather label, as the weather cannot be determined from the extracted frames.
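The counts behind Figure 3, including the ’undefined’ bucket, can be reproduced directly from the image attributes, for example:

from collections import Counter

# Tally the weather attribute across all labelled images
weather_counts = Counter(img['attributes']['weather'] for img in labels)
print(weather_counts)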

Some discrepancies were found in the dataset, such as images tagged as snowy when the conditions look clear.

Figure 4 Incorrectly labelled images

Figure 4a looks like ’clear’ weather, yet the image has been tagged as snowy by the creators of the Berkeley dataset. Snow can be seen on the pavement (sidewalk) to the right of the image, but the weather itself is technically clear. While we can question the labelling, adding more weather labels to the dataset could complicate the object detection modelling.

Another example of a questionable tag can be seen in Figure 4b, where the label states that the image was taken in rainy conditions at dawn/dusk; however, the conditions look clear and it is most likely daytime. The shiny road surface suggests it has recently been raining, but these labels are quite subjective.

Different Scenes

The Berkeley dataset includes several scenes, which makes this dataset superior to other autonomous driving datasets such as the KITTI dataset³.

Figure 5 Number of images in different scenes

The KITTI dataset only captures videos from a single German city, Karlsruhe, in excellent weather conditions, covering rural and city scenes. The Berkeley dataset, as seen in Figure 5, has seven different scene types, including residential, city street and highway. It contains images and videos from four different areas in America: New York, Berkeley, San Francisco and the Bay Area¹. This gives enough variation between the images and should hopefully lead to better object detection results when the models are tested on unseen data.

Time of Day

Figure 6 Number of images at different times of day

Figure 6 shows that the videos were taken at all times of the day, from morning to night. This reduces the need for image augmentation, which would normally be applied to imitate night conditions. Since the dataset already includes many images taken at night, we can see how the object detection models perform in night-time or low-light conditions.

Figure 7 Number of images in different weather split by time of day

Figure 7 shows the split between time of day and weather. The dataset is imbalanced, with clear weather at night making up the largest share. Image augmentation, which will be explained later, will be applied for snowy, rainy and foggy weather to help improve object detection in these poor weather conditions.
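As a purely illustrative sketch of the idea (the actual augmentation pipeline used in the project is covered later, and the file name below is a placeholder), a frame could be darkened and blurred to roughly imitate poor visibility:

from PIL import Image, ImageEnhance, ImageFilter

# Minimal illustration only: darken and blur a frame to roughly imitate poor visibility
img = Image.open('example_frame.jpg')                    # placeholder file name
dark = ImageEnhance.Brightness(img).enhance(0.5)         # halve the brightness
foggy = dark.filter(ImageFilter.GaussianBlur(radius=2))  # soften detail, fog-like
foggy.save('example_frame_augmented.jpg')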

Object Categories

Figure 8 Number of instances of each category

This dataset is also superior to other datasets because of the number of different object categories present. Although the dataset is imbalanced, with ’car’ by far the most common object, there is still a good variety of objects. The class imbalance could be ’fixed’ by duplicating images to bring other classes closer to the number of ’car’ objects; however, all images in the training dataset contain a ’car’. Moreover, no images from outside the BDD100K dataset have been used. The dataset is realistic: it was filmed in different weather conditions and at different times of day, and contains a wide range of objects, which is exactly what autonomous vehicles need to process on the road.

One category the dataset is missing is animals, which autonomous vehicles would also need to detect in the real world. In the UK, there are over 10 million accidents with animals and 1 million with birds, which cost the UK government roughly £20 million each year⁴. This is one of the drawbacks of the dataset, as it does not account for all objects on the road, though it is hard to prepare a dataset that covers every object. If the dataset did include animal labels, it would help reduce animal accidents on the road, saving animal and human lives and, indirectly, government money.

Thank you for reading this post!

Link to my GitHub repository: https://github.com/Hishok

Follow me on Medium for more posts.
