NSFW Image Classification

Model by Open Source

This model classifies imagery as either Suitable For Work (SFW) or Not Suitable For Work (NSFW) based upon the presence of pornographic content in an image. It takes an image as its input and returns a JSON output with floating point scores for the model’s determination of the image’s SFW and NSFW probabilities. This model can be used forensically across an IT system to hunt for unauthorized media. The model could also be used to moderate job data flows and to segregate data when an end user’s job requires the viewing of possibly objectionable content.

  • Description

    Product Description

    PERFORMANCE METRICS:

    This model wraps Yahoo!’s NSFW deep learning neural network, which was open sourced in 2016. Metrics were not provided in the Git repo. However, given the exceedingly low recall of current manual methods and the importance of the model to current client systems, we decided to include the algorithm in the Modzy platform. Performance for a custom use case may be tuned by adjusting the minimum NSFW probability score at which the end user wishes to flag an image as NSFW. As reasonable defaults, Yahoo! recommends treating an image as NSFW when its NSFW score is above 0.8 and as SFW when its NSFW score is below 0.2.
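
    The thresholding described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the model or the Modzy platform: the function name, the parameter names, and the "REVIEW" label for scores between the two thresholds are assumptions made for this example. Only the 0.8 and 0.2 defaults come from the recommendation above.

```python
def label_from_nsfw_score(nsfw_score, nsfw_threshold=0.8, sfw_threshold=0.2):
    """Map an NSFW probability score to a coarse label.

    The default thresholds follow the recommended values quoted above.
    The "REVIEW" label for the band in between is an assumption of this
    sketch; a deployment may handle that band differently.
    """
    if not 0.0 <= nsfw_score <= 1.0:
        raise ValueError("nsfw_score must be a probability between 0 and 1")
    if nsfw_score > nsfw_threshold:
        return "NSFW"
    if nsfw_score < sfw_threshold:
        return "SFW"
    return "REVIEW"
```

    Lowering nsfw_threshold flags more images as NSFW, which would typically raise recall at the cost of more false positives; raising it has the opposite effect.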

    OVERVIEW:

    This model was created by Yahoo! Engineering. It was trained by their staff using their own internal dataset of imagery that they considered to be NSFW. Because the NSFW label is subjective and context dependent, concepts that are inappropriate in one setting may be considered appropriate in another. For example, a gory image might be considered NSFW at a children’s book publisher but commonplace and acceptable at a medical textbook publisher. This lack of an absolute standard for NSFW led the Yahoo! team to focus this model’s subject matter on the one area that is generally considered NSFW in the majority of workplace environments: pornographic imagery.

    The model uses a ResNet50 architecture.

    TRAINING:

    The model was developed by training several versions of its architecture against the 1,000-class ImageNet dataset. The version that performed best against ImageNet then underwent transfer learning against datasets that Yahoo! identified and editorially labeled. The data used for training this NSFW model consisted of a large collection of pornographic and nonpornographic imagery collected by and housed at Yahoo!. Due to the nature of the training material and copyright concerns, Yahoo! chose not to release the training data.

    VALIDATION:

    The model was validated against a holdout set of the pornographic and nonpornographic imagery housed at Yahoo!. As with the training data, it was not released.

    INPUT SPECIFICATION

    The input(s) to this model must adhere to the following specifications:

    Filename    Maximum Size    Accepted Format(s)
    image       100M            .jpg, .png, .tif
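
    A client could check a file against this specification before submitting it. The sketch below is a hypothetical helper, not a Modzy API. It assumes that "100M" means 100 MiB and that extensions are matched case-insensitively; neither detail is stated in the specification.

```python
from pathlib import PurePath

# Accepted formats, taken from the input specification above.
ACCEPTED_EXTENSIONS = {".jpg", ".png", ".tif"}

# Assumption: "100M" is read as 100 MiB. Adjust if the platform
# defines the limit differently.
MAX_INPUT_BYTES = 100 * 1024 * 1024


def input_problems(filename, size_bytes):
    """Return a list of reasons the input would not meet the specification.

    An empty list means the file name and size look acceptable.
    """
    problems = []
    # Assumption: extension matching ignores case (".JPG" is accepted).
    extension = PurePath(filename).suffix.lower()
    if extension not in ACCEPTED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or '(none)'}")
    if size_bytes > MAX_INPUT_BYTES:
        problems.append(f"file too large: {size_bytes} bytes")
    return problems
```

    For a file on disk, size_bytes can be obtained with os.path.getsize(path).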

    OUTPUT DETAILS

    This model will output the following:

    Filename        Maximum Size    Format
    results.json    1M              .json
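
    The exact layout of results.json is not documented here, so the following is only a sketch of how a consumer might read the two scores. The key names "sfw" and "nsfw" are hypothetical placeholders and are exposed as parameters so they can be replaced with the real field names. Reading "1M" as 1 MiB is likewise an assumption.

```python
import json

# Assumption: "1M" in the output table is read as 1 MiB.
MAX_RESULT_BYTES = 1 * 1024 * 1024


def read_scores(json_text, sfw_key="sfw", nsfw_key="nsfw"):
    """Extract the SFW and NSFW scores from the model's JSON output.

    The default key names are hypothetical; pass the actual field
    names used by the deployed model.
    """
    if len(json_text.encode("utf-8")) > MAX_RESULT_BYTES:
        raise ValueError("result exceeds the expected maximum size")
    data = json.loads(json_text)
    scores = {}
    for name, key in (("sfw", sfw_key), ("nsfw", nsfw_key)):
        value = float(data[key])
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{key} is not a probability: {value}")
        scores[name] = value
    return scores
```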