AI Car Damage Detection: How it works - Part 2

In our previous blog, we explored the different models we have used so far for training our damage detection model to achieve 90%-95% accuracy during vehicle inspections. 

Since we published the last blog, there has been a lot of advancement in the tech used to train AI models. In pursuit of greater efficiency for our car damage detection model, we have implemented much of this tech into our current training systems.

In this blog, we will run through the existing systems we have been using to train our AI models and the new systems we have implemented since then. Let’s dive in!

How has Inspektlabs been handling vehicle damage detection using AI?

Before we get into the details of the models we have been using, let’s first understand the capabilities of AI and the types of damages it can detect. 

Types of car damages

Today’s AI models used for vehicle damage detection (trained using various computer vision models) can mostly detect physical damages on a vehicle. 

These physical damages can be divided into three categories: 

  • Metal Damage - Damage to the metal parts of the vehicle, such as dents, scratches, and tears.
  • Glass Damage - Damage caused by an impact on a vehicle’s glass parts (windshield, back glass, windows, headlights, etc.), classified into cracks, chips, spider cracks, and large-range glass damage.
  • Miscellaneous Damage - Physical damage that is neither metal nor glass damage, such as gaps between parts, dislocations, etc.

Models used in AI car damage detection

Let’s quickly recap the different models we have been using so far to train our AI model for efficient damage detection on vehicles. 

We have been relying on two main methods to train our AI models. These include -

  • Object Detection method - This model uses deep learning as its primary algorithm to detect damage in an image fed to it. The most commonly used object detection models include YOLO (You Only Look Once), EfficientDet, and Faster R-CNN, which provide high accuracy and can detect multiple types of damage in an image.
  • Segmentation method - This model partitions or classifies images into meaningful parts or regions by assigning a label to each pixel. This is achieved using advanced models like Vision Transformers.

While both these models have their pros and cons, the best results during AI-based damage detection come from using an ensemble of the two, i.e. combining the best features of both models.

This is known as the Ensemble model: the strengths of each model cover the drawbacks of the other, resulting in fewer errors in the final prediction during AI-powered damage detection.
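As a toy illustration of how such an ensemble step might look, the sketch below keeps only the segmentation pixels that fall inside a detector's bounding boxes. The function name, box format, and 0/1 mask representation are our own assumptions for this example, not Inspektlabs' actual pipeline.

```python
def filter_mask_by_boxes(mask, boxes):
    """Keep only mask pixels that fall inside at least one detected box.

    mask  : 2D list of 0/1 values (segmentation output)
    boxes : list of (x_min, y_min, x_max, y_max) tuples (detector output)
    """
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x] and any(
                x0 <= x <= x1 and y0 <= y <= y1 for x0, y0, x1, y1 in boxes
            ):
                out[y][x] = 1
    return out

# Toy example: a 4x4 mask, with one detected box covering the top-left 2x2 area.
mask = [
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
boxes = [(0, 0, 1, 1)]
print(filter_mask_by_boxes(mask, boxes))
# -> [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
```

The stray segmentation pixels outside any detected box are suppressed, which is one simple way two models can compensate for each other's false positives.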

Updates to Inspektlabs’ AI damage detection model

The last few years have seen a significant boost in how AI models can be trained. With the emergence of self-supervised learning (SSL) models, training these models to achieve certain tasks has become a lot easier. 

What are Self-supervised learning models?

Self-supervised learning is a machine learning methodology that allows AI models to learn from unlabeled information. 

Instead of relying on humans to label each image, the model learns by creating its own "tasks" from the data. This process helps it understand patterns, features, and meaningful representations from the data provided. 

Some of the most popular AI models trained using the self-supervised learning methodology include LLMs like ChatGPT and DeepSeek, which are trained using a CLM (Causal Language Modeling) or an MLM (Masked Language Modeling) approach.

The objective of a CLM is to predict the next word based on the previous words given to it, processing input in a left-to-right manner and generating coherent, contextually accurate text.

In comparison, an MLM’s objective is to predict words that have been masked out of a sentence, making sense of the incomplete text.
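The two objectives can be sketched on toy word lists. Real LLMs operate on subword tokens from a tokenizer; the plain-word tokens and function names below are simplifications for illustration.

```python
def clm_pairs(tokens):
    """Causal LM: predict each token from the tokens to its left."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

def mlm_example(tokens, masked_positions):
    """Masked LM: hide some tokens and ask the model to recover them."""
    masked = [t if i not in masked_positions else "[MASK]"
              for i, t in enumerate(tokens)]
    targets = {i: tokens[i] for i in masked_positions}
    return masked, targets

sentence = ["the", "bumper", "has", "a", "dent"]
print(clm_pairs(sentence)[0])   # (['the'], 'bumper')
masked, targets = mlm_example(sentence, {4})
print(masked)                   # ['the', 'bumper', 'has', 'a', '[MASK]']
print(targets)                  # {4: 'dent'}
```

The CLM sees only left context at each step, while the MLM sees the whole sentence except the hidden positions.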

While both these models work great for text-based AI models like ChatGPT and DeepSeek, they are not ideal for Inspektlabs’ AI model that uses computer vision for damage detection. 

Self-supervised learning for computer vision

In the case of computer vision, the goal of the AI model is to interpret and analyse the different images fed to it and generate relevant data based on its interpretation.

While building a computer vision model, self-supervised learning comes in handy by removing the need for a human to label individual images; instead, the model is trained by creating its own “tasks” from the data, learning to identify different patterns in the images provided.

This method of training the computer vision model comes with its own set of benefits, such as - 

  • Efficient use of time and resources, i.e. no need for massive labeling efforts.
  • Easy extraction of useful information from raw data.
  • Using the self-supervised model as a starting point (pre-trained model) for tasks with labeled data, resulting in faster training and better results.

How Self-supervised learning works

Think of self-supervised learning as a process in which the model tries to predict missing information in an image or understand relationships between different parts of the data. By solving these tasks, the model learns useful features that can be applied to other tasks, such as classification, object detection, or segmentation.

Some of the most popular self-supervised learning models used for computer vision include -

  • DINO (Distillation with No Labels): Great for learning representations that work well in object detection and segmentation tasks.
  • BYOL (Bootstrap Your Own Latent): Focuses on feature extraction without needing explicit negative samples.
  • SimCLR (Simple Framework for Contrastive Learning): Learns by comparing similar and dissimilar images after augmentation.
  • MoCo (Momentum Contrast): Builds a dynamic dictionary for contrastive learning with a momentum-updated encoder.
  • MAE (Masked Autoencoders): Similar to MLM in NLP but applied to images; reconstructs masked patches of images.
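The MAE-style pretext task in the list above is easy to sketch: hide a random subset of image patches and (in a real run) train the model to reconstruct them. The toy 2D-list "image" and function below are illustrative assumptions, not a production implementation.

```python
import random

def mask_patches(image, patch_size, mask_ratio, seed=0):
    """MAE-style pretext task: zero out a random subset of patches.

    image is a 2D list of pixel values; the model would be trained
    to reconstruct the hidden patches from the visible ones.
    """
    h, w = len(image), len(image[0])
    ph, pw = h // patch_size, w // patch_size
    patches = [(r, c) for r in range(ph) for c in range(pw)]
    rng = random.Random(seed)
    hidden = set(rng.sample(patches, int(len(patches) * mask_ratio)))
    out = [row[:] for row in image]
    for r, c in hidden:
        for y in range(r * patch_size, (r + 1) * patch_size):
            for x in range(c * patch_size, (c + 1) * patch_size):
                out[y][x] = 0
    return out, hidden

image = [[1] * 4 for _ in range(4)]   # a toy 4x4 "image" of ones
masked_img, hidden = mask_patches(image, patch_size=2, mask_ratio=0.75)
print(len(hidden))                    # 3 of the 4 patches are hidden
```

MAE typically uses a high mask ratio like this (around 75%), which forces the model to learn global structure rather than copying nearby pixels.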

Here’s what a practical workflow for a self-supervised learning model in computer vision looks like - 

  1. Self-Supervised Pretraining: Train a model on millions of unlabeled images using DINO, BYOL, or SimCLR.
  2. Fine-tuning on Labeled Data: Once the model has learned useful features, fine-tune it on a smaller labeled dataset for your specific task.
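For the pretraining step, contrastive frameworks like SimCLR optimize a loss that pulls two augmented views of the same image together and pushes views of other images apart. Below is a minimal pure-Python sketch of that objective on hand-picked toy embedding vectors; a real run would compute it over encoder outputs for large batches of images.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def contrastive_loss(anchor, positive, negatives, temperature=0.5):
    """NT-Xent-style loss for one positive pair against a set of negatives."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    denom = pos + sum(math.exp(cosine(anchor, n) / temperature)
                      for n in negatives)
    return -math.log(pos / denom)

# Two augmented views of the same image should embed close together...
view_a, view_b = [1.0, 0.0], [0.9, 0.1]
# ...while a view of a different image acts as a negative.
other = [0.0, 1.0]

loss_good = contrastive_loss(view_a, view_b, [other])
loss_bad = contrastive_loss(view_a, other, [view_b])
print(loss_good < loss_bad)   # True: matched views yield a lower loss
```

Minimizing this loss is what teaches the encoder useful features without any labels; fine-tuning then adapts those features to the labeled task.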

While self-supervised learning helps unlock the hidden potential of vast amounts of unlabeled data, it might not be ideal for the use-case of damage detection. 

Challenges involved in using Self-supervised learning for damage detection

(image to be added - Infographic depicting the challenges of self-supervised learning in damage detection)

  1. Self-supervised learning creates its own labels, but they don’t always focus on damages. In our case, the images mainly show cars, with damages being less noticeable. As a result, the model tends to focus on recognizing cars rather than detecting the damages.
  2. Damage patterns are complicated, making it hard for self-supervised learning to generalize them. Detecting damages, especially small or subtle ones, is challenging because the model struggles to identify these fine details consistently.
  3. Self-supervised learning requires a pretext task to help the model recognize patterns. The model first works on a simpler task (like solving a jigsaw puzzle, predicting image rotation, or handling image augmentations) to learn useful features. It then tries to match patterns between the original image and the transformed version from the pretext task.
  4. The success of self-supervised learning depends on data quality and selection. Even though we have a large dataset, feeding it all at once doesn’t improve results; it just leads to “garbage in, garbage out”. Careful data selection is important, but with so much data, this process becomes time-consuming.

A better solution to train the AI model for improved efficiency is to use a semi-supervised model with data labeled with boxes (and not pixels).

What is semi-supervised learning? 

Semi-supervised learning in computer vision is a machine learning approach that combines a small amount of labeled data with a large amount of unlabeled data to train models. 

This method is particularly useful because obtaining labeled data for computer vision tasks (like image classification, object detection, and segmentation) can be time-consuming and expensive, while unlabeled data is often abundant.
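One common semi-supervised technique is pseudo-labeling: a model trained on the small labeled set predicts labels for the unlabeled images, and only its confident predictions are added back to the training set. The sketch below shows just the confidence-filtering step; the image ids, labels, and scores are made up for illustration.

```python
def select_pseudo_labels(predictions, threshold=0.9):
    """Keep only confident model predictions on unlabeled images,
    so they can be used as pseudo-labels for further training.

    predictions maps an image id to (predicted_label, confidence).
    """
    return {
        image_id: label
        for image_id, (label, conf) in predictions.items()
        if conf >= threshold
    }

# Model outputs on three unlabeled photos (made-up numbers).
preds = {
    "img_001": ("dent", 0.97),
    "img_002": ("scratch", 0.55),   # too uncertain, stays unlabeled
    "img_003": ("crack", 0.93),
}
print(select_pseudo_labels(preds))
# -> {'img_001': 'dent', 'img_003': 'crack'}
```

The threshold trades off dataset growth against label noise: a lower value adds more images but risks training on wrong pseudo-labels.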

Benefits of using semi-supervised learning in computer vision

  • Cost-Effective: Reduces the need for extensive manual labeling.
  • Better Performance: Leveraging unlabeled data can improve model generalization.
  • Data Efficiency: Makes use of large datasets that would otherwise go unused.

Why do we use boxes instead of pixels while training our model using semi-supervised learning? 

  1. Creating bounding boxes (BBox) is quick and straightforward. In our case, we already had a large amount of BBox data available.
  2. In our use case, we converted the BBox data into segmentation data, which resulted in rectangular segmentation areas.
  3. Training with large BBox segmentation data helped the model detect damage presence and location but lacked precise damage area detection.
  4. Next, we fine-tuned the model using actual segmentation data. Fine-tuning with actual segmentation data improved accuracy, allowing the model to capture the exact damaged region.
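Step 2 of the list above, converting BBox data into rectangular segmentation data, can be sketched as follows. The coordinate convention (inclusive pixel bounds) and function name are our own assumptions for this example.

```python
def bbox_to_mask(width, height, box):
    """Convert a bounding box into a rectangular segmentation mask.

    box is (x_min, y_min, x_max, y_max) in pixel coordinates (inclusive).
    The result is coarse -- the whole rectangle is marked as damage --
    which is why a later fine-tuning pass on true pixel-level
    segmentation data is needed to capture the exact damaged region.
    """
    x0, y0, x1, y1 = box
    return [
        [1 if x0 <= x <= x1 and y0 <= y <= y1 else 0 for x in range(width)]
        for y in range(height)
    ]

# A 4x3 image with a box covering columns 1-2 of the top two rows.
rect_mask = bbox_to_mask(4, 3, (1, 0, 2, 1))
print(rect_mask)
# -> [[0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
```

Because every box becomes a filled rectangle, this conversion teaches the model damage presence and location cheaply, leaving precise boundaries to the fine-tuning stage.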

Conclusion

Advancements in AI and machine learning have significantly enhanced Inspektlabs’ vehicle damage detection capabilities. 

By transitioning from purely supervised models to integrating self-supervised and semi-supervised learning, we’ve improved accuracy, efficiency, and scalability. 

Leveraging bounding box data for faster training and fine-tuning with segmentation data has allowed our models to pinpoint damage with greater precision. 

As technology evolves, we remain committed to refining our AI systems to deliver faster, more reliable, and cost-effective vehicle inspections.