The ability to quickly and accurately identify objects in aerial imagery is useful in a variety of scenarios, such as urban planning, crop monitoring, and traffic surveillance. This model is trained to detect 60 different types of objects (including buses, vehicle lots, buildings, oil tankers, etc.) using one of the largest and highest-quality public datasets of annotated high-resolution satellite imagery. Covering a total area of over 45,000 square kilometers, this dataset contains many variations of each of the 60 object classes, resulting in a robust model. This model is open source and was developed by the Modzy data science team.
23.2% 60-class mAP
The mean of the per-class Average Precision (AP) scores across all 60 object classes.
This model was trained and validated using the public xView2 dataset, which consists of high-resolution satellite imagery annotated with building locations and damage scores before and after natural disasters. This model achieves a 60-class mean average precision (mAP) score of 0.2319, calculated using the methodology outlined by SIMRDWN. An Intersection over Union (IoU) threshold of 0.5 was used for most classes, and a threshold of 0.25 was used for objects that are comparably smaller, such as vehicles. The model performs best on electro-optical satellite imagery with a ground sample distance of 0.3 meters. For fast inference, this model should have access to at least one GPU.
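The class-dependent IoU matching described above can be sketched as follows. This is a minimal illustration of the scoring rule, not the SIMRDWN evaluation code itself; the box format `(x1, y1, x2, y2)` and the example class names are assumptions.

```python
# Hypothetical set of "small" classes that use the relaxed 0.25 threshold;
# the actual list used in evaluation is not specified here.
SMALL_OBJECT_CLASSES = {"small car", "bus", "truck"}

def iou(box_a, box_b):
    """Intersection over Union for boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1) - inter)
    return inter / union if union else 0.0

def is_true_positive(pred_box, gt_box, class_name):
    """Apply the 0.5 IoU threshold, relaxed to 0.25 for small objects."""
    threshold = 0.25 if class_name in SMALL_OBJECT_CLASSES else 0.5
    return iou(pred_box, gt_box) >= threshold
```

A prediction on a small, low-pixel-count object (a vehicle at 0.3 m ground sample distance spans only a handful of pixels) can miss the 0.5 overlap bar even when localization is essentially correct, which is why the relaxed threshold is used for those classes.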
This model detects 60 classes of objects within overhead electro-optical (EO) satellite imagery. It utilizes an adapted version of YOLO called YOLT2, provided by the open source SIMRDWN framework. YOLT2 is a pipeline tailored to satellite imagery, in which larger convolutional filters are replaced by smaller 3×3 filters, improving the model's ability to detect small objects from a distance. This model was trained and validated using the public xView2 dataset, and accepts a TIFF, PNG, or JPEG image as its input. It returns a JSON file containing the detected bounding boxes along with each object's class name and confidence score.
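A downstream consumer of the output might parse it as sketched below. The field names (`detections`, `class`, `confidence`, `box`) are assumptions based on the description above, not the model's documented schema.

```python
import json

# Hypothetical results.json payload; field names are illustrative only.
raw = """
{
  "detections": [
    {"class": "building", "confidence": 0.91,
     "box": {"x1": 120, "y1": 44, "x2": 310, "y2": 215}},
    {"class": "small car", "confidence": 0.37,
     "box": {"x1": 502, "y1": 98, "x2": 514, "y2": 110}}
  ]
}
"""

results = json.loads(raw)
for det in results["detections"]:
    b = det["box"]
    # e.g. "building: 0.91 (120,44)-(310,215)"
    print(f'{det["class"]}: {det["confidence"]:.2f} '
          f'({b["x1"]},{b["y1"]})-({b["x2"]},{b["y2"]})')
```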
The training set consists of 80% of the original images and annotations, randomly sampled without replacement, from the public xView2 dataset. The images were then chipped to fit the YOLT network's window size of 416 × 416 pixels and processed at their native resolution. Image chips that contained no objects were discarded.
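The chipping step can be sketched as below. This is a simplified illustration under assumed behavior: it drops edge remainders smaller than one chip, whereas the actual pipeline may pad or overlap chips instead, and the empty-chip filtering shown in the comment would use the dataset's annotations.

```python
import numpy as np

CHIP = 416  # YOLT network window size in pixels

def chip_image(image, size=CHIP):
    """Split an H x W x C array into non-overlapping size x size chips.

    Returns a list of ((x, y), chip) pairs, where (x, y) is the chip's
    top-left corner in the original image. Edge remainders smaller than
    `size` are dropped in this sketch. Chips whose annotation lists are
    empty would be discarded in a subsequent filtering pass.
    """
    h, w = image.shape[:2]
    chips = []
    for y in range(0, h - size + 1, size):
        for x in range(0, w - size + 1, size):
            chips.append(((x, y), image[y:y + size, x:x + size]))
    return chips
```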
The input(s) to this model must adhere to the following specifications:
The input file should be a 3-channel electro-optical satellite image.
This model will output the following:
The output file (results.json) will contain the detected object bounding boxes. Each bounding box entry includes the corresponding class name, confidence score, and the top-left and bottom-right (x, y) coordinates defining the box. This model can detect the following object classes:
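A common post-processing step is to keep only detections above a confidence floor, optionally restricted to classes of interest. The sketch below assumes each detection is a dict mirroring the results.json fields described above; the exact key names are assumptions.

```python
def filter_detections(detections, min_confidence=0.5, classes=None):
    """Keep detections at or above a confidence floor.

    If `classes` is given, also restrict to those class names.
    Each detection is assumed to be a dict with at least 'class'
    and 'confidence' keys.
    """
    kept = []
    for det in detections:
        if det["confidence"] < min_confidence:
            continue
        if classes is not None and det["class"] not in classes:
            continue
        kept.append(det)
    return kept
```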