The ability to quickly and accurately identify objects in aerial imagery is useful in a variety of scenarios, such as urban planning, crop surveillance, and traffic monitoring. This model is trained to detect 60 different types of objects (including buses, vehicle lots, buildings, oil tankers, etc.) using one of the largest and highest-quality public datasets of annotated high-resolution satellite imagery. Covering a total area of over 45,000 square kilometers, this dataset contains many variations of each of the 60 object classes, resulting in a robust model. This model is open source and was developed by the Modzy data science team.
23.2% 60-class mAP
The mean of the Average Precision scores across all classes.
This model was trained and validated using the public xView2 dataset, which consists of high-resolution satellite imagery annotated with building locations and damage scores before and after natural disasters. This model achieves a 60-class mean average precision (mAP) score of 0.2319, calculated using the methodology outlined by SIMRDWN. An Intersection over Union (IoU) threshold of 0.5 was used for most classes, and a threshold of 0.25 was used for comparably smaller objects, such as vehicles. The model performs best on electro-optical satellite imagery with a ground sample distance of 0.3 meters. For fast inference, this model should have access to at least one GPU.
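For reference, IoU measures the overlap between a predicted bounding box and a ground-truth box as the ratio of their intersection area to their union area; a prediction counts as a true positive only when its IoU meets the class threshold (0.5, or 0.25 for small objects). A minimal sketch of the standard computation, using `(xmin, ymin, xmax, ymax)` boxes:

```python
def iou(box_a, box_b):
    """Intersection over Union for two axis-aligned boxes.

    Boxes are (xmin, ymin, xmax, ymax) in pixel coordinates,
    with the origin at the top-left of the image.
    """
    # Coordinates of the intersection rectangle
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp to zero when the boxes do not overlap
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

For example, two 10×10 boxes that overlap by half have an IoU of 50 / 150 ≈ 0.33, which would fail the 0.5 threshold but pass the 0.25 small-object threshold.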
This model detects 60 classes of objects within overhead electro-optical (EO) satellite imagery. It utilizes an adapted version of YOLO called YOLT2, provided by the open source SIMRDWN framework. YOLT2 is a pipeline tailored to satellite imagery, in which larger convolutional filters are replaced by smaller 3×3 filters, improving the model’s ability to detect small objects from a distance. This model was trained and validated using the public xView2 dataset, and accepts a TIFF, PNG, or JPEG image as its input. It returns a JSON file containing detected bounding boxes with their corresponding object class names and confidence scores.
The training set consists of 80% of the original images and annotations, randomly sampled without replacement, from the public xView2 dataset. The images were then chipped to fit the YOLT network’s window size of 416 × 416 pixels and processed at their native resolution. Image chips that contained no objects were discarded.
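The chipping step described above can be sketched as follows. This is a simplified illustration, not the SIMRDWN preprocessing code itself: it tiles an image into non-overlapping 416 × 416 windows at native resolution and, for simplicity, discards partial chips at the right and bottom edges (the actual pipeline may use overlapping windows or padding).

```python
import numpy as np

def chip_image(img, size=416):
    """Tile an (H, W, C) image into non-overlapping size x size chips.

    Returns a list of ((x, y), chip) pairs, where (x, y) is the
    top-left pixel offset of each chip in the original image.
    Partial edge chips are discarded in this simple sketch.
    """
    h, w = img.shape[:2]
    chips = []
    for y in range(0, h - size + 1, size):
        for x in range(0, w - size + 1, size):
            chips.append(((x, y), img[y:y + size, x:x + size]))
    return chips
```

In training, chips whose annotations contain no objects would then be filtered out before being fed to the network.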
The input(s) to this model must adhere to the following specifications:
The input file should be a 3-channel electro-optical satellite image.
This model will output the following:
The output file (results.json) will contain the detected object bounding boxes. Each bounding box entry will contain the corresponding class name, confidence score, and the top-left and bottom-right x,y coordinates defining the box. This model can detect the following object classes:
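A short sketch of consuming this output downstream. The exact JSON field names below (`class`, `score`, `box`, `xmin`, etc.) are assumptions for illustration and may differ from the actual results.json schema:

```python
import json

# Hypothetical results.json content; actual field names may differ.
sample = """
[
  {"class": "Bus", "score": 0.87,
   "box": {"xmin": 120, "ymin": 45, "xmax": 180, "ymax": 90}},
  {"class": "Building", "score": 0.42,
   "box": {"xmin": 10, "ymin": 10, "xmax": 300, "ymax": 250}}
]
"""

def filter_detections(raw, min_score=0.5):
    """Keep only detections at or above a confidence threshold."""
    return [d for d in json.loads(raw) if d["score"] >= min_score]

kept = filter_detections(sample)
```

Thresholding on the confidence score like this is a common first step before visualizing boxes or feeding detections into downstream analytics.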