Off Nadir Building Detection

Model by Modzy

This model identifies and outlines buildings in overhead imagery. It accepts 900 pixel by 900 pixel RGB PNG or JPEG images and outputs a list of predicted building masks as Well-Known Text (WKT) polygons. The model can detect buildings in off-nadir images, meaning images in which the sensor looks down at the surface at an angle rather than straight down.
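The WKT polygons returned by the model can be consumed with a full geometry library such as shapely; as a dependency-free sketch, a simple exterior-ring polygon can be parsed and measured like this (the parser below handles only the plain `POLYGON ((...))` form, not holes or multipolygons):

```python
import re

def parse_wkt_polygon(wkt):
    """Parse a simple WKT POLYGON string into a list of (x, y) vertices.

    Handles the exterior ring only; use a real WKT parser (e.g. shapely)
    for polygons with holes or for multipolygons.
    """
    match = re.match(r"POLYGON\s*\(\(([^)]*)\)\)", wkt.strip())
    if match is None:
        raise ValueError(f"not a simple WKT polygon: {wkt!r}")
    coords = []
    for pair in match.group(1).split(","):
        x, y = pair.split()
        coords.append((float(x), float(y)))
    return coords

def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula (pixels^2 here)."""
    area = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0

# Example: a 10-by-10-pixel square building footprint.
square = parse_wkt_polygon("POLYGON ((0 0, 10 0, 10 10, 0 10, 0 0))")
print(polygon_area(square))  # 100.0
```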

  • Description

    Product Description

    PERFORMANCE METRICS:

    84% F1 Score, 85% Precision, and 83% Recall

    This model was trained on the SpaceNet Off-Nadir imagery dataset, consisting of WorldView-2 satellite images (chipped into 450 m by 450 m tiles) taken over Atlanta, GA on December 22, 2009, ranging from 7 to 54 degrees off-nadir. The model achieves a precision of 0.85, a recall of 0.834, and an F1 score of 0.84. It performs best on input imagery with a ground sample distance (GSD) close to 0.5 m/pixel.

    F1 is the harmonic mean of precision and recall, with a best value of 1. It measures the balance between the two metrics.

    A higher precision score indicates that the majority of the buildings the model predicts are real buildings, i.e. the model produces few false positives.

    A higher recall score indicates that the model finds the majority of the buildings actually present in the image, i.e. the model misses few true buildings.
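The three metrics above follow directly from true-positive, false-positive, and false-negative counts; a minimal sketch of the arithmetic, which also confirms that the reported precision (0.85) and recall (0.834) yield the reported F1 of roughly 0.84:

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from counts of matched (tp) and
    unmatched (fp, fn) predictions."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Harmonic mean of the card's reported precision and recall.
p, r = 0.85, 0.834
f1 = 2 * p * r / (p + r)
print(round(f1, 2))  # 0.84
```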

    OVERVIEW:

    This model is an ensemble of two models, one with a U-Net-like architecture and the other with a Feature Pyramid Network (FPN) architecture. U-Net is a fully convolutional network originally developed for biomedical image segmentation at the University of Freiburg. It consists of a contracting path and an expanding path arranged in a U shape, with the feature map from each level of the contracting path concatenated onto the corresponding level of the expanding path. FPN was developed by Facebook AI Research (FAIR), Cornell University, and Cornell Tech. Its architecture is similar to U-Net's, but it differs in the manner in which the feature map concatenation is performed.
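The channel bookkeeping implied by U-Net's skip concatenation can be sketched as follows: at each expanding-path level, the upsampled decoder map is concatenated channel-wise with the matching contracting-path map, so each decoder block sees the sum of the two channel counts. The widths here are illustrative defaults, not the ensemble's actual configuration:

```python
def unet_decoder_in_channels(encoder_widths):
    """Channels entering each decoder block of a U-Net-style network.

    encoder_widths lists the channel count at each contracting-path level,
    shallowest first; the last entry is the bottleneck. At every expanding-
    path level the upsampled map is concatenated with the skip connection
    from the matching encoder level.
    """
    ins = []
    up = encoder_widths[-1]               # bottleneck output channels
    for skip in reversed(encoder_widths[:-1]):
        ins.append(up + skip)             # channel-wise concatenation
        up = skip                         # conv block maps channels back down
    return ins

print(unet_decoder_in_channels([64, 128, 256, 512]))  # [768, 384, 192]
```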

    TRAINING:

    This model was trained on the SpaceNet Off-Nadir imagery dataset, consisting of WorldView-2 satellite images taken over Atlanta, GA on December 22, 2009, ranging from 7 to 54 degrees off-nadir. These images were chipped into 450 m by 450 m tiles, 80 percent of which were used for training. Data augmentation included random flips, shifts, rotations, scaling, and contrast and brightness adjustments. Training took 20 hours on one NVIDIA Tesla V100 GPU. One U-Net and one FPN model were trained with the Adam optimizer at an initial learning rate of 0.001, with the learning rate decayed after 3 consecutive epochs without improvement in loss and training stopped early after 20 epochs without improvement. Under this policy, the U-Net model trained for 56 epochs and the FPN model for 40.
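The decay-and-early-stop policy described above can be sketched as a loop over per-epoch losses; the decay factor of 0.1 below is an assumption, since the card does not state it:

```python
def train_with_schedule(losses, lr=0.001, decay_factor=0.1,
                        decay_patience=3, stop_patience=20):
    """Replay a sequence of per-epoch losses through the stated policy:
    decay the learning rate after 3 consecutive epochs without loss
    improvement, stop early after 20 epochs without improvement.
    Returns (epoch, learning rate) pairs for the epochs actually run.
    """
    best = float("inf")
    epochs_since_improvement = 0
    history = []
    for epoch, loss in enumerate(losses, start=1):
        if loss < best:
            best = loss
            epochs_since_improvement = 0
        else:
            epochs_since_improvement += 1
            if epochs_since_improvement % decay_patience == 0:
                lr *= decay_factor        # reduce-on-plateau step
        history.append((epoch, lr))
        if epochs_since_improvement >= stop_patience:
            break                         # early stopping
    return history

# A loss that stalls after epoch 2 triggers one decay by epoch 5.
history = train_with_schedule([1.00, 0.90, 0.90, 0.90, 0.90])
```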

    VALIDATION:

    20 percent of the image tiles were used for validation.
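The 80/20 partition of tiles described here and under TRAINING can be sketched as a deterministic shuffled split; the actual SpaceNet partition scheme (and any geographic stratification) is not specified on this card:

```python
import random

def split_tiles(tile_ids, val_fraction=0.2, seed=0):
    """Deterministic train/validation split of image tile identifiers.

    Sorting before shuffling with a fixed seed makes the split
    reproducible regardless of the input ordering.
    """
    ids = sorted(tile_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_val = int(len(ids) * val_fraction)
    return ids[n_val:], ids[:n_val]       # (train, validation)

train, val = split_tiles([f"tile_{i:03d}" for i in range(100)])
print(len(train), len(val))  # 80 20
```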

    INPUT SPECIFICATION

    The input(s) to this model must adhere to the following specifications:

    Filename   Maximum Size   Accepted Format(s)
    image      10M            .jpg, .png
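A minimal client-side check against this spec might look as follows. Only the file name and size are verified; confirming the 900 pixel by 900 pixel RGB dimensions would additionally require decoding the image (e.g. with Pillow), and the "10M" limit is assumed here to mean mebibytes:

```python
import os

MAX_BYTES = 10 * 1024 * 1024              # "10M" limit, assumed mebibytes
ALLOWED_EXTENSIONS = {".jpg", ".png"}

def validate_input(path, size_bytes=None):
    """Check an image file against the stated input spec before submission.

    size_bytes may be passed directly (e.g. for in-memory data); otherwise
    the size is read from disk.
    """
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported format {ext!r}; use .jpg or .png")
    size = size_bytes if size_bytes is not None else os.path.getsize(path)
    if size > MAX_BYTES:
        raise ValueError(f"file is {size} bytes, over the 10M limit")
    return True
```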

    OUTPUT DETAILS

    This model will output the following:

    Filename       Maximum Size   Format
    results.json   5M             .json
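Reading the detections back out of results.json might look like the sketch below. The exact JSON schema is not documented on this card, so the top-level list of objects and the "polygon" key are assumptions; adjust them to the real schema:

```python
import json

def load_detections(results_text):
    """Extract WKT polygon strings from a results.json payload.

    Assumes (hypothetically) a top-level list of detection objects, each
    carrying its mask as a WKT string under the "polygon" key.
    """
    detections = json.loads(results_text)
    return [d["polygon"] for d in detections]

sample = '[{"polygon": "POLYGON ((0 0, 10 0, 10 10, 0 10, 0 0))"}]'
print(load_detections(sample))
```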