Emerging Pattern Detection

Model by Booz Allen

This model observes patterns in a spatiotemporal dataset and identifies time periods and locations in the dataset where the pattern changes from usual behavior. The model accepts time series data with geographic coordinates and forms a geographic grid of the observed area. The model analyzes timeframes of the gridded data and identifies timeframes where the distribution of observations in the grid varies from normal. We use a Local Outlier Factor method to determine how well a grid distribution clusters with other distributions. This model can be used to analyze spatiotemporal data in multiple ways, such as identifying days where city traffic was greatly altered from a citywide perspective down to a granular city block perspective.
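
The gridding step described above can be sketched as follows. This is a minimal illustration, not the vendor's implementation: the function names `grid_counts` and `to_distribution`, the one-day timeframe, the bounding-box format, and the 2 x 2 grid are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime


def grid_counts(records, bounds, n_rows, n_cols):
    """Bin (timestamp, lat, lon) records into one count grid per calendar day.

    The daily timeframe is an assumption; the source does not state one.
    """
    lat_min, lat_max, lon_min, lon_max = bounds
    grids = defaultdict(lambda: [[0] * n_cols for _ in range(n_rows)])
    for ts, lat, lon in records:
        if not (lat_min <= lat <= lat_max and lon_min <= lon <= lon_max):
            continue  # skip observations outside the observed area
        # Clamp so that points on the upper edge fall into the last cell.
        row = min(int((lat - lat_min) / (lat_max - lat_min) * n_rows), n_rows - 1)
        col = min(int((lon - lon_min) / (lon_max - lon_min) * n_cols), n_cols - 1)
        grids[ts.date()][row][col] += 1
    return dict(grids)


def to_distribution(grid):
    """Flatten a count grid into a vector of cell proportions."""
    flat = [count for row in grid for count in row]
    total = sum(flat)
    return [count / total for count in flat] if total else flat


# Hypothetical observations: (timestamp, latitude, longitude).
records = [
    (datetime(2024, 1, 1, 8), 0.1, 0.1),
    (datetime(2024, 1, 1, 9), 0.9, 0.9),
    (datetime(2024, 1, 1, 10), 0.9, 0.8),
    (datetime(2024, 1, 2, 8), 0.2, 0.7),
]
grids = grid_counts(records, bounds=(0.0, 1.0, 0.0, 1.0), n_rows=2, n_cols=2)
day_one = grids[datetime(2024, 1, 1).date()]
day_one_distribution = to_distribution(day_one)
```

Each timeframe then becomes one vector of cell proportions, which is the kind of per-timeframe distribution an outlier detector can compare against the others.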

  • Description

    Product Description

    PERFORMANCE METRICS:

    This model’s performance was analyzed with 10,000+ simulations. When no new pattern was introduced, false positives were detected 7.2% of the time at 3 standard deviations and 1.7% of the time at 5 standard deviations. When a new pattern was introduced, the model detected it at more than 7 standard deviations, a level at which no false positives occurred.

    OVERVIEW:

    This model uses the Local Outlier Factor (LOF) algorithm, a density-based outlier detection method built on k-nearest-neighbor distances. LOF identifies outliers as points whose local density is substantially lower than that of their neighbors. Heatmaps are scored on a scale from 1 to 12 to indicate how anomalous a heatmap observation appears to the model. A score of 12 is the maximum and indicates strong confidence that the indicated timeframe contains an emerging pattern. This model can identify changes at the granular cell level as well as changes across the entire geographic grid.
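
    To make the density comparison concrete, here is a small standard-library sketch of the textbook Local Outlier Factor computation applied to flattened heatmap vectors. It is illustrative only: the function name `local_outlier_factors`, the choice of k = 3, and the sample vectors are assumptions, and the mapping of raw scores onto the 1 to 12 scale is not described in this listing, so it is not reproduced here.

```python
import math


def local_outlier_factors(points, k):
    """Return the LOF score of every point; values well above 1 suggest outliers."""
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of points")
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Indices of the k nearest neighbours of each point, excluding itself.
    neighbors = [
        sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
        for i in range(n)
    ]
    # k-distance: distance from each point to its k-th nearest neighbour.
    k_dist = [dist[i][neighbors[i][-1]] for i in range(n)]

    def local_reachability_density(i):
        # Reachability distance smooths out very small distances.
        reach = [max(k_dist[j], dist[i][j]) for j in neighbors[i]]
        mean_reach = sum(reach) / k
        # Small floor guards against division by zero for duplicate points.
        return 1.0 / max(mean_reach, 1e-12)

    lrd = [local_reachability_density(i) for i in range(n)]
    # LOF: average neighbour density relative to the point's own density.
    return [sum(lrd[j] for j in neighbors[i]) / (k * lrd[i]) for i in range(n)]


# Hypothetical flattened heatmaps: five similar timeframes and one unusual one.
heatmaps = [
    [0.25, 0.25, 0.25, 0.25],
    [0.24, 0.26, 0.25, 0.25],
    [0.26, 0.24, 0.25, 0.25],
    [0.25, 0.25, 0.24, 0.26],
    [0.25, 0.25, 0.26, 0.24],
    [0.70, 0.10, 0.10, 0.10],
]
scores = local_outlier_factors(heatmaps, k=3)
most_anomalous = scores.index(max(scores))
```

    In this toy data the last heatmap sits far from the others, so its local density is much lower than that of its neighbors and it receives the highest score, while the five similar heatmaps score close to 1.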

    TRAINING:

    This model must be trained on historic data from the same dataset that is being analyzed; it does not generalize to different geographies or situations. Training analyzes historic trends and establishes a baseline of normal behavior, against which deviations are identified and flagged as outliers.
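
    One way to picture the baseline step is to summarise historic anomaly scores and express each new score as a number of standard deviations from that baseline, matching the units used in the performance metrics. The listing does not say how the model computes this, so the sketch below is an assumption throughout: the function names, the sample scores, and the default threshold of 3 are hypothetical.

```python
from statistics import mean, stdev


def fit_baseline(historic_scores):
    """Summarise historic anomaly scores as (mean, sample standard deviation)."""
    if len(historic_scores) < 2:
        raise ValueError("need at least two historic scores")
    return mean(historic_scores), stdev(historic_scores)


def deviations_from_baseline(score, baseline):
    """Express a score as standard deviations above the baseline mean."""
    mu, sigma = baseline
    if sigma == 0:
        raise ValueError("baseline scores have zero variance")
    return (score - mu) / sigma


def flag_outliers(new_scores, baseline, threshold=3.0):
    """Flag scores that exceed the baseline by more than `threshold` deviations."""
    return [deviations_from_baseline(s, baseline) > threshold for s in new_scores]


# Hypothetical anomaly scores from historic (normal) timeframes.
historic = [1.00, 1.05, 0.95, 1.02, 0.98, 1.01, 0.99]
baseline = fit_baseline(historic)

# Hypothetical scores for two new timeframes.
flags = flag_outliers([1.03, 1.40], baseline, threshold=3.0)
```

    Raising the threshold trades sensitivity for fewer false alarms, which is the trade-off reflected in the 3 and 5 standard deviation figures reported under PERFORMANCE METRICS.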

    VALIDATION:

    This model was validated using a synthetic database in which different traffic situations were simulated on top of a baseline level of city traffic. The model correctly identified the simulated changes.