This model redacts user-specified objects within a given video. This capability is useful for protecting the privacy of people and other entities recorded in the video, as well as for censoring sensitive content. The model first identifies each object of interest, redacts it with a black box, and tracks it throughout the video, ensuring it remains hidden in every frame.
57% Average Accuracy – The mean of the accuracies achieved across the object classes.
35% Expected Average Overlap (EAO) – The expected overlap between the predicted bounding box and the object of interest, averaged over windows of frames.
97% Robustness – The fraction of correct predictions made by the model on a synthetically generated adversarial test dataset. This metric measures the model's resilience against adversarial attacks.
The object tracking model was trained on the YouTube-BB dataset, which contains 100,000 videos. It was tested on datasets from the Visual Object Tracking (VOT) challenges: VOT2015, VOT2016, and VOT2017, each containing 60 videos, as well as OTB2015, which contains 100 videos. The model achieves an average accuracy of 0.57, average robustness of 0.97, and an Expected Average Overlap of 0.35.
This model works in two stages. First, the objects the user specifies in the first frame are tracked throughout the video: a feature extractor (either AlexNet or ResNet) feeds a Siamese Region Proposal Network (Siamese-RPN), which tracks the objects and returns their per-frame bounding boxes. Second, the objects are redacted by setting every pixel inside their bounding boxes to black.
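The redaction stage amounts to zeroing the pixels inside each tracked box on every frame. A minimal sketch with NumPy (the `redact` helper and its box values are illustrative, not part of the model's actual API), using the [x, y, width, height] box convention described later in this document:

```python
import numpy as np

def redact(frame: np.ndarray, boxes) -> np.ndarray:
    """Black out each (x, y, width, height) box in an H x W x 3 frame.

    x counts pixels from the left edge of the frame and y counts pixels
    down from the top, so rows are indexed by y and columns by x.
    """
    out = frame.copy()
    for x, y, w, h in boxes:
        out[y:y + h, x:x + w] = 0  # set every pixel inside the box to black
    return out

# A tiny all-white 100 x 100 frame with one 30 x 40 box at (10, 20):
frame = np.full((100, 100, 3), 255, dtype=np.uint8)
redacted = redact(frame, [(10, 20, 30, 40)])
```

In the full pipeline this step would be applied per frame, with the boxes supplied by the tracker rather than hard-coded.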
The object tracking model was trained on the YouTube-BB dataset for 50 epochs using stochastic gradient descent, with a learning rate that decreased in log space from 0.01 to 0.000001 over the course of training. The redaction portion of this model is deterministic and therefore did not require any training.
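A log-space schedule means the exponent of the learning rate is interpolated linearly, so the rate shrinks by a constant factor each epoch. One way to generate such a schedule (a sketch; the per-epoch step count is an assumption, as the source only gives the endpoints):

```python
import numpy as np

epochs = 50

# Interpolate the base-10 exponent linearly from log10(0.01) = -2
# down to log10(1e-6) = -6, giving one learning rate per epoch.
lrs = np.logspace(np.log10(0.01), np.log10(1e-6), num=epochs)
```

Each consecutive pair of rates differs by the same multiplicative factor, which is what "decreased in log space" describes.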
The object tracking model was tested on the VOT2015, VOT2016, and VOT2017 datasets, each containing 60 videos. It was also validated using the OTB2015 dataset, which contains 100 videos.
The inputs to this model must adhere to the following specifications:
The input video cannot have a resolution higher than 4096×2160, and at most the first 10,000 frames will be read. The “config.json” file must contain the first-frame bounding boxes of the objects to be tracked and redacted throughout the video. The format should be as follows:
"boundingBox": [[x1, y1, width3, height1], [x2, y2, width3, height2]]
All bounding box values are pixel counts. The values “x1” and “y1” correspond to the top left bounding box corner, with “x1” being the number of pixels from the left of the frame, and “y1” being the number of pixels down from the top of the frame. For example, [10, 20, 30, 40] would signify a 30-by-40 (width-by-height) bounding box with its top left corner located 10 pixels to the right and 20 pixels down from the top left of the frame.
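Putting the format together, a “config.json” payload for two objects could be built and serialized like this (the second box's values are illustrative, not taken from the source):

```python
import json

# Hypothetical config.json contents; the "boundingBox" key and the
# [x, y, width, height] ordering follow the format described above.
config = {
    "boundingBox": [
        [10, 20, 30, 40],  # 30-by-40 box, top left corner at (10, 20)
        [50, 60, 25, 25],  # second object to redact (illustrative values)
    ]
}

config_json = json.dumps(config, indent=2)
```

The resulting string is what would be written to “config.json” alongside the input video.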
This model will output the following:
The output file will be named “output.mpg” and will contain the video with the selected objects tracked and redacted (pixels within the tracked bounding boxes set to black).