Object Detection Pipeline using Remo
In this tutorial, we will use Remo to accelerate and improve the process of building a transfer learning pipeline for an Object Detection task.
In particular, we will:
- Use Remo to browse through our images and annotations
- Use Remo to understand the properties of the dataset and annotations by visualizing statistics.
- Create a custom train, test, valid split in-place using Remo image tags.
- Fine-tune a pre-trained Faster R-CNN model from torchvision and run inference
- Visually compare bounding box predictions with the ground truth
Along the way, we will see how the dataset visualization provided by Remo helps us gather insights to improve the dataset and the model.
Let's start by importing the relevant libraries:
from PIL import Image
import os
import glob
import random
import csv
random.seed(4)
import pandas as pd
import numpy as np
import tqdm
import torch
from torch.utils.data import DataLoader, Dataset
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator
import torchvision.transforms as transforms
import remo
remo.set_viewer('jupyter')
Adding Data to Remo
- The dataset used in this example is a subset of the Open Images Dataset.
- The directory structure of the dataset is:

├── object_detection_dataset
│   ├── images
│   │   ├── image_1.jpg
│   │   ├── image_2.jpg
│   │   ├── ...
│   ├── annotations
│   │   ├── annotations.csv
│   │   ├── model_predictions.csv
# The dataset will be extracted into a new folder
if not os.path.exists('object_detection_dataset.zip'):
    !wget https://s-3.s3-eu-west-1.amazonaws.com/object_detection_dataset.zip
    !unzip -qq object_detection_dataset.zip
else:
    print('Files already downloaded')
# The path to the folders
path_to_images = './object_detection_dataset/images/'
path_to_annotations = './object_detection_dataset/annotations/'
annotations_file_path = os.path.join(path_to_annotations, 'annotations.csv')
To visualise the labels as strings rather than IDs, we can use a dictionary mapping class names to indices.
# Mapping between Class name and Index
cat_to_index = {'Wheel' : 1,
'Car' : 2,
'Person' : 3,
'Land vehicle' : 4,
'Human body' : 5,
'Plant' : 6,
'Tire' : 7,
'Vehicle' : 8,
'Vehicle registration plate' : 9}
Train / test split
In Remo, we can use tags to organise our images. Among other things, this allows us to generate train / test splits without the need to move image files around.
To do this, we just need to pass a dictionary (mapping tags to the relevant image paths) to the function remo.generate_image_tags().
im_list = [os.path.abspath(i) for i in glob.glob(path_to_images + '/**/*.jpg', recursive=True)]
im_list = random.sample(im_list, len(im_list))
# Defining the train / test split
train_idx = round(len(im_list) * 0.4)
valid_idx = train_idx + round(len(im_list) * 0.3)
test_idx = valid_idx + round(len(im_list) * 0.3)
# Creating a dictionary with tags
tags_dict = {'train' : im_list[0:train_idx],
'valid' : im_list[train_idx:valid_idx],
'test' : im_list[valid_idx:test_idx]}
train_test_split_file_path = os.path.join(path_to_annotations, 'images_tags.csv')
remo.generate_image_tags(tags_dictionary = tags_dict,
output_file_path = train_test_split_file_path,
append_path = True)
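As a quick sanity check (a minimal optional sketch, assuming the generated tags CSV has the file_name, tag format described later in this tutorial), we can read the file back and count how many images landed in each split:

# Read back the generated tags file and count images per split
tags_df = pd.read_csv(train_test_split_file_path)
print(tags_df['tag'].value_counts())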
Create a dataset
To create a dataset we can use remo.create_dataset(), specifying the paths to the data and annotations.
The class encoding (if required) is passed via a dictionary.
For a complete list of supported formats, you can refer to the docs.
# The annotations.csv file is located in the annotations sub-folder
object_detection_dataset = remo.create_dataset(name = 'object-detection-dataset',
local_files = [path_to_images, path_to_annotations],
annotation_task = 'Object Detection')
Visualizing the dataset
To view and explore images and labels, we can use Remo directly from the notebook. We just need to call dataset.view().
object_detection_dataset.view()
Looking at the dataset, we notice some interesting points:
- There is a significant degree of overlap in bounding boxes of different classes (e.g. Wheel and Car)
- Bounding box sizes vary considerably across Wheel and Car objects
- Pictures of Cars can be taken from different angles
Dataset Statistics
Using Remo, we can quickly visualize some key dataset properties that can help with our modelling, without needing to write extra boilerplate code.
This can be done either from code, or using the visual interface.
object_detection_dataset.get_annotation_statistics()
[{'AnnotationSet ID': 347, 'AnnotationSet name': 'Object detection', 'n_images': 7, 'n_classes': 9, 'n_objects': 50, 'top_3_classes': [{'name': 'Wheel', 'count': 28}, {'name': 'Car', 'count': 9}, {'name': 'Tire', 'count': 4}], 'creation_date': None, 'last_modified_date': '2020-09-01T11:10:37.164406Z'}]
object_detection_dataset.view_annotation_stats()
Looking at the statistics, we can gain some useful insights:
- Some labels are present in the training set but missing from the test and valid sets. This means we will not get an indicative model performance for these classes (which is fine for the tutorial's sake, but in real life we would want to fix that).
- The Wheel class has more instances than any other class in the dataset. Higher reported performance on this class might be caused by this imbalance.
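We can verify the class-imbalance observation directly from the annotations file. Here is a minimal sketch with pandas, assuming the annotations CSV follows the file_name, classes, xmin, ymin, xmax, ymax format used throughout this tutorial:

# Count object instances per class to confirm the imbalance towards 'Wheel'
annotations_df = pd.read_csv(annotations_file_path)
print(annotations_df['classes'].value_counts())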
Feeding Data into PyTorch
Here we start working with PyTorch. To load the data, we will define a custom PyTorch Dataset object (as usual with PyTorch).
In order to adapt this to your dataset, the following are required:
- train_test_valid_split (Path to Tags): path to tags csv file for Train, Test, Validation split. Format: file_name, tag.
- annotations (Path to Annotations): path to the annotations CSV file. Format: file_name, classes, xmin, ymin, xmax, ymax
- mapping (Mapping): a dictionary mapping class names to class indices. Format: {'class_name' : class_index}
class ObjectDetectionDataset(Dataset):
    """
    Custom PyTorch Dataset class to facilitate loading data for the Object Detection task
    """
    def __init__(self,
                 annotations,
                 train_test_valid_split,
                 mapping=None,
                 mode='train',
                 transform=None):
        """
        Args:
            annotations: The path to the annotations CSV file. Format: file_name, classes, xmin, ymin, xmax, ymax
            train_test_valid_split: The path to the tags CSV file for train, test, valid split. Format: file_name, tag
            mapping: a dictionary mapping class names to class indices. Format: {'class_name' : class_index}. Default: None
            mode: Mode in which to instantiate the class. Default: 'train'
            transform: The transforms to be applied to the image data

        Returns:
            image: Torch Tensor, target: dict of Torch Tensors, file_name: str
        """
        self.mapping = mapping
        self.transform = transform
        self.mode = mode
        self.path_to_images = './object_detection_dataset/images/'

        # Loading the annotation file (same format as Remo's)
        my_data = pd.read_csv(annotations)

        # Here we prepend the file path to the file name.
        # If dataset.export_annotations_to_file was used to create the annotation file,
        # it would contain image file paths by default
        my_data['file_name'] = my_data['file_name'].apply(lambda x: os.path.abspath(f'{self.path_to_images}{x}'))
        my_data = my_data.set_index('file_name')

        # Loading the train/test split file (same format as Remo's),
        # aligning the tags on the file_name index
        my_data['tag'] = pd.read_csv(train_test_valid_split, index_col='file_name')['tag']
        my_data = my_data.reset_index()

        # Keep only the rows matching the requested split (train / valid / test)
        my_data = my_data.loc[my_data['tag'] == mode].reset_index(drop=True)
        self.data = my_data
        self.file_names = self.data['file_name'].unique()

    def __len__(self) -> int:
        return self.file_names.shape[0]

    def __getitem__(self, index: int):
        file_name = self.file_names[index]
        records = self.data[self.data['file_name'] == file_name].reset_index()
        image = np.array(Image.open(file_name), dtype=np.float32)
        image /= 255.0
        if self.transform:
            image = self.transform(image)

        # Here we assume we don't have labels for the test set
        if self.mode != 'test':
            boxes = records[['xmin', 'ymin', 'xmax', 'ymax']].values
            area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])
            area = torch.as_tensor(area, dtype=torch.float32)
            if self.mapping is not None:
                labels = np.zeros((records.shape[0],))
                for i in range(records.shape[0]):
                    labels[i] = self.mapping[records.loc[i, 'classes']]
                labels = torch.as_tensor(labels, dtype=torch.int64)
            else:
                labels = torch.ones((records.shape[0],), dtype=torch.int64)
            iscrowd = torch.zeros((records.shape[0],), dtype=torch.int64)

            target = {}
            target['boxes'] = torch.as_tensor(boxes, dtype=torch.float32)
            target['labels'] = labels
            target['image_id'] = torch.tensor([index])
            target['area'] = area
            target['iscrowd'] = iscrowd
            return image, target, file_name
        else:
            return image, file_name

def collate_fn(batch):
    return tuple(zip(*batch))
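The custom collate_fn is needed because each image can contain a different number of objects, so the per-image targets cannot be stacked into a single tensor; the batch is instead kept as tuples of samples. A tiny illustration with dummy data:

# collate_fn transposes a list of samples into tuples of fields
batch = [('image_a', {'boxes': 'a'}, 'file_a.jpg'),
         ('image_b', {'boxes': 'b'}, 'file_b.jpg')]
images, targets, file_names = collate_fn(batch)
print(file_names)  # ('file_a.jpg', 'file_b.jpg')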
The train and test datasets are instantiated and wrapped in a DataLoader. The valid split can be loaded in the same way, by passing mode='valid'.
tensor_transform = transforms.Compose([transforms.ToTensor()])
# Here the operations provided by Remo are integrated into a PyTorch workflow
# by using the custom ObjectDetectionDataset class.
train_dataset = ObjectDetectionDataset(annotations = annotations_file_path,
train_test_valid_split = train_test_split_file_path,
transform = tensor_transform,
mapping = cat_to_index,
mode = 'train')
test_dataset = ObjectDetectionDataset(annotations = annotations_file_path,
train_test_valid_split = train_test_split_file_path,
transform = tensor_transform,
mapping = cat_to_index,
mode = 'test')
train_data_loader = DataLoader(train_dataset, batch_size=1, shuffle=False, num_workers=0, collate_fn=collate_fn)
test_data_loader = DataLoader(test_dataset, batch_size=1, shuffle=False, num_workers=0, collate_fn=collate_fn)
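Before training, it can be worth pulling a single batch to confirm that shapes and types line up. This is a quick optional check, not part of the original pipeline:

# Fetch one batch to verify the Dataset and collate_fn work as expected
images, targets, file_names = next(iter(train_data_loader))
print(images[0].shape)            # e.g. torch.Size([3, H, W])
print(targets[0]['boxes'].shape)  # torch.Size([n_objects, 4])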
Training the Model
In this tutorial, we use a Faster R-CNN architecture with a ResNet-50 backbone, pre-trained on COCO train2017. This is loaded directly from torchvision.models.
To train the model, we specify the following details:
- Model: The edited version of the pre-trained model.
- num_classes: The number of classes for the model head, i.e. the actual number of classes in the dataset + 1 for the image background (a peculiarity of Faster R-CNN).
- Optimizer: The optimizer used for training the network
- Num_epochs: The number of epochs for which we would like to train the network.
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
num_classes = 10
loss_value = 0.0
num_epochs = 5
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
model.to(device)
params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(params, lr=0.005, momentum=0.9, weight_decay=0.0005)
# The training loop trains the model for the total number of epochs.
# (1 epoch = one complete pass over the entire dataset)
for epoch in range(num_epochs):
    model.train()
    # Wrap the loader in tqdm here rather than re-assigning it,
    # so we get a fresh progress bar each epoch
    for images, targets, image_ids in tqdm.tqdm(train_data_loader):
        images = list(image.to(device) for image in images)
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
        loss_dict = model(images, targets)
        losses = sum(loss for loss in loss_dict.values())
        loss_value = losses.item()
        optimizer.zero_grad()
        losses.backward()
        optimizer.step()
    print('\nTraining Loss : {:.5f}'.format(loss_value))
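After training, we may want to persist the fine-tuned weights so inference can be run later without retraining. This is the standard PyTorch checkpoint pattern; the file name here is an arbitrary choice:

# Save the fine-tuned weights (the file name is an arbitrary choice)
torch.save(model.state_dict(), 'fasterrcnn_oid_finetuned.pth')
# To reload later: rebuild the same architecture, then
# model.load_state_dict(torch.load('fasterrcnn_oid_finetuned.pth'))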
Visualizing Predictions
Using Remo, we can easily iterate through the images to compare the model predictions against the original labels.
To do this, we just need to upload the model predictions to a new AnnotationSet, which we call model_predictions.
# Mapping Between Predicted Index and Class Name
mapping = { value : key for (key, value) in cat_to_index.items()}
detection_threshold = 0.3
results = []
model.eval()
test_data_loader = tqdm.tqdm(test_data_loader)
with torch.no_grad():
    for images, image_ids in test_data_loader:
        images = list(image.to(device) for image in images)
        outputs = model(images)

        for i, image in enumerate(images):
            boxes = outputs[i]['boxes'].data.cpu().numpy()
            scores = outputs[i]['scores'].data.cpu().numpy()
            labels = outputs[i]['labels'].data.cpu().numpy()

            # Apply the threshold to boxes, scores and labels together,
            # so the class labels stay aligned with the filtered boxes
            keep = scores >= detection_threshold
            boxes = boxes[keep].astype(np.int32)
            labels = labels[keep]
            scores = scores[keep]
            image_id = image_ids[i]

            for box, label in zip(boxes, labels):
                results.append({'file_name' : os.path.basename(image_id),
                                'classes' : mapping[label.item()],
                                'xmin' : box[0],
                                'ymin' : box[1],
                                'xmax' : box[2],
                                'ymax' : box[3]})
model_predictions_path = os.path.join(path_to_annotations, 'model_predictions.csv')

with open(model_predictions_path, 'w') as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=['file_name', 'classes', 'xmin', 'ymin', 'xmax', 'ymax'])
    writer.writeheader()
    writer.writerows(results)
predictions = object_detection_dataset.create_annotation_set(annotation_task='Object Detection',
name = 'model-predictions-oid',
paths_to_files = [train_test_split_file_path, model_predictions_path])
object_detection_dataset.view()
By visualizing the predicted boxes against the ground truth, we can go beyond summary performance metrics, visually inspect model biases, and iterate to improve the model.
For example, we might notice in the picture below how the model confidently but incorrectly predicts the left car lamp to be a "Wheel", perhaps because of its similar shape.
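If we want to put a number on this visual comparison, the intersection over union (IoU) between a predicted box and a ground-truth box is a natural starting point. Below is a minimal sketch (not part of the original pipeline) using the same xmin, ymin, xmax, ymax convention as the annotation files:

def box_iou(box_a, box_b):
    """IoU between two boxes in (xmin, ymin, xmax, ymax) format."""
    inter_xmin = max(box_a[0], box_b[0])
    inter_ymin = max(box_a[1], box_b[1])
    inter_xmax = min(box_a[2], box_b[2])
    inter_ymax = min(box_a[3], box_b[3])
    # Intersection is zero when the boxes do not overlap
    inter = max(0, inter_xmax - inter_xmin) * max(0, inter_ymax - inter_ymin)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Example: a prediction shifted slightly off the ground truth
print(box_iou((10, 10, 50, 50), (15, 12, 55, 48)))  # ~0.71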