Image-Based Geolocation

Model by Modzy

This model predicts the geo-location of an image based on its contents (e.g., architecture or a key landmark). This capability can support a variety of tasks, including geo-positioning, geo-tagging, and geo-coding. Predicting where a photo was taken without any prior knowledge is challenging because of the effectively unlimited range of possible locations and the wide variation in the conditions under which images are captured. Additionally, many images are ambiguous and offer very few visual clues about where they were taken.


    Product Description


    59.3% Top 1 Accuracy

    79.4% Top 5 Accuracy

    This model was trained and validated on a dataset created using the Flickr APIs. On the validation subset, the model achieves a top 1 accuracy score of 0.5931 and a top 5 accuracy score of 0.7944. The model tends to perform better on images that contain cues for geo-location, including key landmarks, building architecture, streets, landscape, monuments, statues, or bridges.

    Top 1 Accuracy: The fraction of input samples for which the model's single highest-ranked prediction is the correct class.

    Top 5 Accuracy: The fraction of input samples for which the correct class appears among the model's five highest-ranked predictions.
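    To make the two metrics concrete, here is a minimal Python sketch of top-k accuracy computed from per-class scores. The function name `top_k_accuracy` and the toy scores are illustrative assumptions, not part of the Modzy model or its API. Because any sample counted as a top-1 hit is also a top-5 hit, top-5 accuracy is always at least as high as top-1 accuracy.

```python
from typing import Sequence


def top_k_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int], k: int) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    if not labels or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and the same length")
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores for this sample.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if label in top_k:
            hits += 1
    return hits / len(labels)


# Toy example: 3 samples, 6 classes (the real model scores 45 classes).
scores = [
    [0.10, 0.50, 0.20, 0.05, 0.10, 0.05],  # true class 1 is ranked 1st
    [0.30, 0.25, 0.20, 0.15, 0.06, 0.04],  # true class 3 is ranked 4th
    [0.40, 0.30, 0.10, 0.08, 0.07, 0.05],  # true class 5 is ranked 6th
]
labels = [1, 3, 5]
print(round(top_k_accuracy(scores, labels, k=1), 3))  # 0.333
print(round(top_k_accuracy(scores, labels, k=5), 3))  # 0.667
```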


    This model treats geo-location as an image classification problem: the challenge is to assign each photo to one of a diverse set of cities or regions around the world. The model uses the Xception architecture to perform this classification.


    This model was trained on a dataset built with the Flickr APIs by querying photos by geo-location data and photo tags. The dataset contains roughly 10,000 images per class across 45 classes. The 45 classes (cities and regions) were chosen based on popularity, size, geographic distribution, and the number of geo-tagged photos available on Flickr. The model was trained on roughly 80% of the full dataset.
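    At roughly 10,000 images per class, the 45 classes add up to about 450,000 images, so an 80/20 split leaves about 360,000 images for training and 90,000 for validation. The description does not say how the split was made; the sketch below assumes a per-class (stratified) random split, and the helper `stratified_split` and its sample data are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Sample = Tuple[str, str]  # (image_id, class_name)


def stratified_split(samples: List[Sample], train_frac: float = 0.8, seed: int = 0):
    """Hypothetical per-class random split; the model card does not describe its method."""
    by_class: Dict[str, List[Sample]] = defaultdict(list)
    for sample in samples:
        by_class[sample[1]].append(sample)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train: List[Sample] = []
    val: List[Sample] = []
    for cls in sorted(by_class):
        items = list(by_class[cls])
        rng.shuffle(items)
        cut = round(len(items) * train_frac)  # about 80% of each class goes to training
        train.extend(items[:cut])
        val.extend(items[cut:])
    return train, val


# Back-of-envelope sizes from the description (approximate figures).
total = 45 * 10_000
train_size = total * 8 // 10
print(total, train_size, total - train_size)  # 450000 360000 90000

# Toy run: 3 classes x 10 images each.
samples = [(f"img_{c}_{i}", c) for c in ("Paris", "Rome", "Tokyo") for i in range(10)]
train, val = stratified_split(samples)
print(len(train), len(val))  # 24 6
```

    Splitting within each class keeps the 80/20 ratio per city or region, so no class is over- or under-represented in validation by chance.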


    The model was validated on 20% of the dataset compiled using the Flickr APIs. The model achieves a top 1 accuracy score of 0.5931 and a top 5 accuracy score of 0.7944.


    The input(s) to this model must adhere to the following specifications:

    Filename    Maximum Size    Accepted Format(s)
    image       1 MB            .png, .jpg, .jpeg
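    Before submitting a job, a client could check a file against these limits locally. This is a minimal sketch that assumes the size limit means 1 MiB (1,048,576 bytes) and that extensions are matched case-insensitively; neither detail is stated above, and `validate_input` and `validate_path` are hypothetical helpers, not part of any Modzy SDK.

```python
import os
from pathlib import Path

# Assumptions: the limit is read as 1 MiB, and extension matching ignores case.
MAX_INPUT_BYTES = 1024 * 1024
ACCEPTED_SUFFIXES = {".png", ".jpg", ".jpeg"}


def validate_input(filename: str, size_bytes: int) -> None:
    """Raise ValueError if the file would violate the documented input limits."""
    suffix = Path(filename).suffix.lower()
    if suffix not in ACCEPTED_SUFFIXES:
        raise ValueError(f"unsupported format {suffix!r}; expected one of {sorted(ACCEPTED_SUFFIXES)}")
    if size_bytes > MAX_INPUT_BYTES:
        raise ValueError(f"file is {size_bytes} bytes; limit is {MAX_INPUT_BYTES} bytes")


def validate_path(path: str) -> None:
    """Convenience wrapper that reads the size of a file on disk."""
    validate_input(path, os.path.getsize(path))


validate_input("colosseum.JPG", 350_000)  # accepted: allowed format, under the limit
```

    For a file on disk, `validate_path("photo.jpg")` performs the same checks using the file's actual size.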


    This model will output the following:

    Filename        Maximum Size    Format
    results.json    1 MB            .json

    The JSON output of this model will contain the top five geolocation predictions for the input image. These predictions will come from the following classes:

    Amsterdam, Athens, Bangkok, Barcelona, Beijing, Berlin, Budapest, Buenos Aires, Cairo, Cape Town, Central America, Chicago, Dallas, Delhi, Denver, Dubai, Dublin, Florence, Havana, Helsinki, Hong Kong, Indonesia, Istanbul, Lisbon, London, Los Angeles, Madrid, Mexico City, Middle East, Moscow, New York, Northwestern Africa, Paris, Quebec City, Rio, Rome, Saint Petersburg, San Francisco, Santiago, Sub-Saharan Africa, Sydney, Tokyo, Toronto, Valencia
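    Because the model is a classifier, its top-five list can be read off a per-class score vector. The sketch below turns raw scores (logits) into softmax probabilities and serializes the five highest as JSON. The `predictions`/`class`/`score` layout is a hypothetical schema for illustration only, since the actual structure of results.json is not documented here, and the demo uses six of the listed classes rather than the full set.

```python
import json
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw scores to probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top5_json(logits: Sequence[float], class_names: Sequence[str]) -> str:
    """Return the five most probable classes as JSON (hypothetical schema)."""
    if len(logits) != len(class_names):
        raise ValueError("need exactly one logit per class")
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:5]
    payload = {
        "predictions": [
            {"class": class_names[i], "score": round(probs[i], 4)} for i in ranked
        ]
    }
    return json.dumps(payload, indent=2)


# Demo with a subset of the classes and made-up scores.
classes = ["Amsterdam", "Athens", "Bangkok", "Barcelona", "Beijing", "Berlin"]
logits = [0.5, 2.0, -1.0, 1.2, 0.0, 3.1]
print(top5_json(logits, classes))  # Berlin ranks first; Bangkok is dropped
```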