This model identifies and marks the outlines of buildings within an overhead image. It accepts 900 pixel by 900 pixel RGB PNG or JPEG images and outputs a list of predicted building masks as Well-Known Text (WKT) polygons. The model can detect buildings in off-nadir images, meaning that the sensor is looking down at the surface at an angle rather than straight down.
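For context, each predicted footprint in the output list is a standard WKT polygon string; the coordinates below are made up for illustration and are not real model output:

```python
# Build a hypothetical WKT polygon for one predicted building footprint.
# Coordinates are pixel values and purely illustrative.
corners = [(10, 10), (110, 10), (110, 90), (10, 90), (10, 10)]  # closed ring
wkt = "POLYGON ((" + ", ".join(f"{x} {y}" for x, y in corners) + "))"
print(wkt)  # POLYGON ((10 10, 110 10, 110 90, 10 90, 10 10))
```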
84% F1 Score, 85% Precision, and 83% Recall
This model was trained on the SpaceNet Off-Nadir imagery dataset, which consists of WorldView-2 satellite images (chipped into 450 m by 450 m tiles) taken over Atlanta, GA on December 22, 2009, at angles ranging from 7 to 54 degrees off-nadir. The model achieves a precision of 0.85, a recall of 0.834, and an F1 score of 0.84. It performs best on input imagery with a ground sample distance (GSD) close to 0.5 m/pixel.
F1 is the harmonic mean of precision and recall, with a best value of 1. It measures the balance between the two metrics.
A higher precision score indicates that most of the labels the model predicts are correct.

A higher recall score indicates that the model finds and correctly labels most of the objects it is supposed to find.
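The relationship between the three metrics can be sketched directly; the true-positive, false-positive, and false-negative counts below are hypothetical numbers chosen to roughly reproduce the reported scores:

```python
# Illustration of how precision, recall, and F1 relate.
# The tp/fp/fn counts are hypothetical, not the model's evaluation data.
def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def f1(p, r):
    # Harmonic mean of precision and recall
    return 2 * p * r / (p + r)

p = precision(tp=85, fp=15)   # 0.85
r = recall(tp=85, fn=17)      # ~0.833
print(round(f1(p, r), 2))     # 0.84
```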
This model consists of an ensemble of two models, one with a U-Net-like architecture and the other with a Feature Pyramid Network (FPN) architecture. U-Net is a fully convolutional network originally developed for biomedical image segmentation at the University of Freiburg. It consists of a contracting path and an expanding path arranged in a U shape, with feature map concatenation at each level of the expanding path from the corresponding level of the contracting path. FPN was developed by Facebook AI Research (FAIR), Cornell University, and Cornell Tech. Its architecture is similar to U-Net's, but it differs in how the feature map concatenation is performed.
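The skip-connection concatenation that the U-Net description refers to can be sketched with NumPy arrays standing in for feature maps; the shapes and names are illustrative and do not reflect the model's actual layers:

```python
import numpy as np

# Toy feature maps in (channels, height, width) layout.
# In a U-Net, the expanding path upsamples its feature map and then
# concatenates it with the same-resolution map saved from the
# contracting path at the corresponding level.
contracting_features = np.zeros((64, 112, 112))  # saved on the way down
upsampled_features = np.zeros((64, 112, 112))    # produced on the way up

# Channel-wise concatenation: the defining U-Net skip connection.
merged = np.concatenate([contracting_features, upsampled_features], axis=0)
print(merged.shape)  # (128, 112, 112)
```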
This model was trained on the SpaceNet Off-Nadir imagery dataset, which consists of WorldView-2 satellite images taken over Atlanta, GA on December 22, 2009, at angles ranging from 7 to 54 degrees off-nadir. These images were chipped into 450 m by 450 m tiles, 80 percent of which were used for training. Data augmentation included random flips, shifts, rotations, scaling, contrast, and brightness adjustments. Training took 20 hours on one NVIDIA Tesla V100. One U-Net model and one FPN model were trained using Adam optimization with an initial learning rate of 0.001, learning rate decay after 3 consecutive epochs without improvement in loss, and early stopping after 20 epochs without improvement in loss. Under this policy, the U-Net model trained for 56 epochs and the FPN model for 40.
20 percent of the image tiles were used for validation.
The input(s) to this model must adhere to the following specifications:
This model will output the following: