To unlock the object detection application, visit the Application Store.
What is object detection?
Object detection is a computer vision technique that aims to identify objects of interest in an image, such as vehicles, people, or buildings. During object detection, the model is applied to an input image, and the model outputs a set of bounding boxes and class labels, indicating the location and identity of the objects in the image. The model can also estimate the confidence score of each detection, indicating how confident it is in the detection being correct.
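To make this concrete, a single detection can be thought of as a bounding box plus a class label and a confidence score. The structure below is a generic sketch of such output, not the exact format returned by any particular platform:

```python
# Each detection pairs a bounding box with a class label and a confidence score.
# Boxes are written as (x_min, y_min, x_max, y_max) in pixel coordinates.
detections = [
    {"box": (34, 50, 210, 310), "label": "person", "confidence": 0.93},
    {"box": (220, 80, 470, 260), "label": "vehicle", "confidence": 0.87},
    {"box": (500, 40, 560, 120), "label": "building", "confidence": 0.41},
]

# Keep only the detections the model is reasonably sure about.
confident = [d for d in detections if d["confidence"] >= 0.5]
print(len(confident))  # 2 detections pass the 0.5 cutoff
```

The confidence score is what later lets you trade off false positives against missed objects by raising or lowering the cutoff.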
There are several reasons why object detection is used in computer vision:
Automation: Object detection can be used to automate tasks that would otherwise require manual intervention, such as monitoring surveillance cameras, detecting and counting objects in an image, or tracking moving objects in a video.
Scene Understanding: Object detection is an important aspect of scene understanding, as it provides information about the objects present in an image and their spatial relationships.
Human-Computer Interaction: Object detection is used in computer vision for human-computer interaction, such as recognizing gestures, detecting faces and facial features, or detecting and tracking hand movements.
Robotics: Object detection is essential for robotics applications, such as autonomous navigation, object grasping, and manipulation.
Augmented Reality: Object detection is used in augmented reality to detect and track objects in real-time, to provide a more immersive and interactive experience.
- Go to the Console.
- Click on "CREATE" to display the New Project pop-up window.
- Click on "COMPUTER VISION".
- Click on "OBJECT DETECTION".
- Choose between "FOR REAL-TIME INFERENCE" and "NOT FOR REAL-TIME INFERENCE" (see below).
- Enter a project name in the dedicated field.
- Click on "CREATE".
How do I choose between real-time inference and non-real-time inference?
Inference refers to the process of using a trained model to make predictions about new data. This is done by inputting new data into the model and using the model's learned relationships between inputs and outputs to generate a prediction.
Real-time inference and non-real-time inference refer to the speed at which a machine learning model processes and outputs predictions.
Real-time inference: Real-time inference refers to the ability of a model to make predictions in real-time, or near real-time, as new data is being inputted. This means that the model must process data and generate predictions quickly enough that the output is still relevant and usable. Real-time inference is important in applications where speed is a critical factor, such as video streaming, autonomous vehicles, or gaming.
Non-real-time inference: Non-real-time inference refers to models that are not designed for real-time predictions. These models can take longer to process data and generate predictions, as the results are not required immediately. Non-real-time inference is commonly used in batch processing applications, where a large amount of data is processed in a batch, and results can be delivered at a later time.
The choice between real-time and non-real-time inference depends on the requirements of the application and the available computational resources in your plan. Real-time inference requires more computational resources, as the model must process data quickly, whereas non-real-time inference is typically more computationally efficient but may take longer to produce results.
The workflow to train your object detection model is divided into 3 stages:
Train: The training stage is the most crucial stage of the AI modeling pipeline. It is divided into 2 modules:
Preprocess: Before running the training process, the data must be carefully selected and annotated with semantic information such as object labels and bounding boxes.
Run: Running the model training will push the data batch into multiple training cycles until the model has had enough opportunities to learn the patterns in the data.
Evaluate: Evaluate the performance of the trained model on new, unseen data and make any necessary adjustments to improve its accuracy and performance.
Predict: Apply your newly trained model to make predictions based on the patterns learned from the training data.
- Click "TRAIN" at the top-left corner of the Project view to display the "1. Preprocess" module.
The "1. Preprocess" module is displayed by default when clicking on the "TRAIN" stage.
- If not, click on "1. Preprocess".
To get started with your object detection model training, you must first gather the raw data that will be annotated. This includes images from multiple sources that are related to the use case you are trying to solve.
Once the data is ready to be annotated, refer to Mastering the labeling tool to learn how to import images or datasets and label images.
Now that your data is annotated, you can run the training module.
- Still in the "TRAIN" stage of the workflow, click on the "2. Run" module to open the Inference Model panel.
- Enter the adequate value in the Epochs field (see below).
- Click "TRAIN" at the top-right corner of the interface.
- During this process, a few Processing and Loading pop-up windows will appear; you can click elsewhere to dismiss them.
- A graph illustrating the progress of the Training Score will appear and evolve with each cycle. Wait until the end of all your training cycles.
How to select the right value for Epochs?
An epoch is a single iteration through the entire training dataset in machine learning. During each epoch, the model is trained on a batch of training data and the parameters of the model are updated based on the results. The goal of each epoch is to improve the model's performance on the training data. Typically, multiple epochs are run during the training process to ensure that the model has seen the entire training dataset multiple times and has had enough opportunities to learn the patterns in the data.
More epochs are usually better for training a deep learning model, as this allows the model to see more training examples and to continue refining its weights and biases. However, there is a trade-off between the number of epochs and overfitting. If the model is trained for too many epochs, it may start to memorize the training data and become less capable of generalizing to new, unseen data.
We recommend using 100 as the general value for the Epochs field. However, this value may vary depending on multiple factors.
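The idea of an epoch, a full pass over the training data that nudges the model's parameters, can be illustrated with a deliberately tiny example. The model and data below are toy stand-ins, not the platform's actual training code:

```python
# Minimal gradient-descent sketch: fit y = w * x to toy data over many epochs.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # (x, y) pairs; the true w is 2
w = 0.0
learning_rate = 0.05

for epoch in range(100):  # one epoch = one full pass over the dataset
    for x, y in data:
        prediction = w * x
        error = prediction - y
        w -= learning_rate * error * x  # gradient of the squared error wrt w

print(round(w, 2))  # converges toward 2.0
```

Each extra epoch shrinks the remaining error, which is why more epochs usually help, up to the point where a real model starts memorizing its training set.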
How do I know if my model is learning properly?
A graph illustrating the progress of the Training Score will appear and evolve with each cycle. If the graph curve consistently goes up, this means that the model is training properly.
If the training score starts to plateau, the model is no longer improving despite additional training; it may be time to stop training to avoid overfitting.
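One simple way to spot such a plateau automatically is to stop once the score has not meaningfully improved for a few consecutive cycles. The scores, patience value, and improvement margin below are invented for illustration:

```python
# Stop training once the score fails to improve for `patience` consecutive epochs.
scores = [0.40, 0.55, 0.66, 0.72, 0.74, 0.745, 0.744, 0.746, 0.745, 0.744]

patience = 3          # how many stalled epochs to tolerate
min_improvement = 0.01  # anything smaller counts as "no real progress"
best = scores[0]
stalled = 0
stop_epoch = len(scores)

for epoch, score in enumerate(scores[1:], start=1):
    if score > best + min_improvement:
        best = score
        stalled = 0
    else:
        stalled += 1
    if stalled >= patience:
        stop_epoch = epoch
        break

print(stop_epoch)  # stops at epoch 7, once the curve has flattened
```

This "patience" pattern is a common form of early stopping; the right margin and patience depend on how noisy your score curve is.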
What is the loss score?
- Click on "Loss" to display the loss score graph.
The loss score is a key metric that measures the difference between the model's predictions and the true target values; it is used to guide the training process and to evaluate the model's performance during training, validation, and testing. The goal of training a machine learning model is to minimize the loss score, so that the model's predictions are as close as possible to the true target values.
The loss score is calculated after each training iteration or epoch, and is used to update the model's parameters (e.g., weights and biases) so that the model can learn from its mistakes and make better predictions on the next iteration.
The loss score mirrors the training score in the opposite direction: if it consistently goes down, the model is learning properly.
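To make the loss score concrete, the sketch below uses mean squared error, one common loss function among many; the prediction values are invented for illustration:

```python
# Mean squared error: the average squared difference between predictions and targets.
def mse_loss(predictions, targets):
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

targets = [1.0, 0.0, 1.0, 1.0]

# An early, poorly trained model vs. a later, better-trained one.
early = mse_loss([0.3, 0.6, 0.4, 0.5], targets)
late = mse_loss([0.9, 0.1, 0.8, 0.9], targets)
print(early > late)  # True: the later predictions sit closer to the targets
```

A falling sequence of such values over epochs is exactly what the descending loss curve on the graph represents.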
The "Configuration" panel contains different options to change the image processing settings.
Tiling is an image-processing technique that partitions large images into smaller, overlapping sections, or tiles. This division makes ultra-high-resolution images practical to process and analyze.
With tiling, the Deep Block platform can manage and analyze very large, detailed images effectively. This approach is particularly advantageous for tasks like object detection or image segmentation. The larger image is divided into a grid of rows and columns, effectively creating a mosaic of interconnected tiles.
After choosing an appropriate number of rows and columns, click on "TILE" to start the tiling process. The status will show "TILE" until the process is over.
To optimize the tiling process, it's recommended to maintain a balanced size for each tile, aiming for dimensions of approximately 1000 pixels by 1000 pixels. This guideline ensures that each tile encapsulates a substantial amount of information while remaining manageable for processing. For instance, an image of dimensions 8k x 8k pixels can be segmented into an 8 x 8 grid, providing a cohesive framework for comprehensive analysis.
- Click on the "EVALUATE" stage at the top-left corner of the Project view to display the "3. Evaluate" module.
The "3. Evaluate" module resembles the "1. Preprocess" module.
The Categories panel displays the same categories, or class labels, as the "1. Preprocess" module. To learn more about categories, refer to Mastering the labeling tool.
The Evaluate panel resembles the Train panel in the "1. Preprocess" module and works the same way. This is where you can import your images or datasets for the evaluation stage.
The evaluation dataset should be different from the training dataset so that your model capabilities can be tested on new, unseen data.
To learn more about how to import images and datasets, refer to Mastering the labeling tool.
Once your data is imported, you must annotate it (if it is not already annotated), just as in the training phase. This allows the evaluate module to compare your annotations with the model's predictions and establish a score.
Once your evaluation dataset is labeled and ready, you can launch the evaluation module.
- Click on "Evaluate" at the top-right corner of the Project view.
- Your dataset will be processed; please wait until the evaluation is over and the processing status returns to "IDLE".
After evaluation, performance scores are now available.
- Click on the "Score" tab in the bottom-left panel of the Project view.
Your model performance score is composed of 3 important metrics:
- mAP: mAP, or mean Average Precision, is a metric for measuring the average accuracy of your model. The average precision of a model is defined as the average of its precision scores for different recall values. Precision is defined as the number of true positive detections divided by the total number of detections. A high precision score means that the model is producing few false positive detections. By extension, if your mAP score is high, it means that the model is producing few false positive detections at multiple recall values.
- Recall: Recall is defined as the number of true positive detections divided by the total number of ground-truth objects in the image. A high recall score means that the model is detecting a large proportion of the ground-truth objects in the image.
- F1 Score: used to evaluate the overall performance of the model in terms of its ability to correctly identify and segment objects in images. The F1 score provides a single number by balancing precision and recall.
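The precision, recall, and F1 definitions above can be worked through with invented detection counts; none of these numbers come from an actual model:

```python
# Toy detection counts, invented purely for illustration.
true_positives = 80   # detections that match a ground-truth object
false_positives = 10  # detections that match no ground-truth object
false_negatives = 20  # ground-truth objects the model missed

precision = true_positives / (true_positives + false_positives)  # 80 / 90
recall = true_positives / (true_positives + false_negatives)     # 80 / 100
f1 = 2 * precision * recall / (precision + recall)               # harmonic mean

print(round(precision, 3), round(recall, 3), round(f1, 3))  # 0.889 0.8 0.842
```

Because F1 is the harmonic mean, it stays low unless both precision and recall are reasonably high, which is why it serves as a single balanced summary of the two.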
- Click on the "PREDICT" stage at the top-left corner of the Project view to display the "4. Predict" module.
The threshold value determines which of the model's detections appear in the final output. Each detected bounding box comes with a confidence score, and only detections whose confidence is greater than or equal to the threshold are kept; detections scoring below the threshold are discarded.
- Enter the appropriate value in the Threshold score (%) field.
The choice of threshold value in object detection is an important step, as it can significantly impact the quality of the detection results. A good threshold value retains genuine objects of interest without introducing false positives (threshold too low) or false negatives (threshold too high). The best value is often found through experimentation, but we recommend 70% for most applications.
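One common reading of such a threshold in object detection is a minimum confidence score per detection: boxes scoring below the cutoff are discarded. The sketch below assumes that interpretation, with invented detections:

```python
# Keep only detections whose confidence meets the threshold (e.g., 70%).
def apply_threshold(detections, threshold=0.70):
    return [d for d in detections if d["confidence"] >= threshold]

detections = [
    {"label": "car", "confidence": 0.95},
    {"label": "person", "confidence": 0.72},
    {"label": "car", "confidence": 0.40},  # likely a false positive
]

kept = apply_threshold(detections)
print([d["label"] for d in kept])  # ['car', 'person']
```

Lowering the threshold would admit the 0.40 detection as well, trading fewer misses for more false positives; raising it does the opposite.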
- Click on " " to add an image via your webcam.
- Click on " " to download the JSON file for the current project.
- Click on " " to import images that you wish to use.
- Click on " " to remove an image after selecting it.
- If a prediction has already been made, click on "CLEAR BOXES" to remove all bounding boxes.
Supported image file formats are: png, apng, jpg, svg, tiff, bmp, gif, ico, and jp2 (10 GB max file size).
Once your dataset is uploaded, you are ready to launch the prediction.
- Click on "PREDICT" at the top-right corner of the Project view.
- The processing will start. Depending on the number of images uploaded, this process could take several minutes. You can stop it at any time by clicking on "STOP".
Wait until the processing status returns to "IDLE". By then, the model will have created bounding boxes around the desired objects of interest.
The Statistics tab indicates, per category, the number of boxes within the project dataset or in the selected image.