Sentiment Exploitation Algorithm

Model by Booz Allen

This model classifies English-language text by its sentiment toward an event or topic, while accounting for sentiment directionality. It outputs the predicted sentiment of the text as Positive, Neutral, or Negative. Manual labeling by analysts is laborious and time-consuming; this model categorizes text at a rate that human analysts cannot match. The model will be refreshed frequently to improve performance by training on continually expanding proprietary datasets and adopting new NLP technology.

  • Description

    Product Description

    PERFORMANCE METRICS:

    60% Recall

    Recall is the fraction of documents that truly belong to a class that the model correctly assigns to that class. A higher recall score indicates that the model finds more of the documents it is supposed to find for each class.
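    As an illustration of how recall is computed for a three-class problem, here is a minimal sketch in plain Python. The function names and the toy labels are hypothetical and are not taken from the model's evaluation code; the listing does not say whether the reported 60% is a per-class, macro-averaged, or micro-averaged figure, so the sketch shows per-class and macro-averaged recall only.

```python
from collections import Counter

LABELS = ("Positive", "Neutral", "Negative")

def per_class_recall(y_true, y_pred, labels=LABELS):
    """Return {label: recall}, where recall = correct predictions / true members."""
    true_positives = Counter()
    support = Counter()
    for truth, predicted in zip(y_true, y_pred):
        support[truth] += 1
        if truth == predicted:
            true_positives[truth] += 1
    # A class with no true members gets recall 0.0 to avoid division by zero.
    return {
        label: (true_positives[label] / support[label]) if support[label] else 0.0
        for label in labels
    }

def macro_recall(y_true, y_pred, labels=LABELS):
    """Unweighted mean of the per-class recall scores."""
    scores = per_class_recall(y_true, y_pred, labels)
    return sum(scores.values()) / len(labels)

# Toy data, invented for illustration only.
y_true = ["Positive", "Positive", "Neutral", "Negative", "Negative"]
y_pred = ["Positive", "Neutral", "Neutral", "Negative", "Positive"]

recalls = per_class_recall(y_true, y_pred)
overall = macro_recall(y_true, y_pred)
```

    For single-label classification, micro-averaged recall equals accuracy, which is one possible reading of why the listing quotes 60% for both figures.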

    This model was trained on a proprietary dataset gathered and labeled by Booz Allen Open Source Analysts. The dataset includes over 11,000 text documents from social media, blogs, and conventional news media, each labeled as having a sentiment of Positive, Neutral, or Negative. The model was trained on 8,800 of these documents and tested on the remaining 2,200 unseen documents, on which it obtains an accuracy of 60%. The model’s strength is its ability to classify the sentiment of many documents in less time than it would take human analysts. It is limited to English-language text and is intended for tagging high volumes of documents.

    OVERVIEW:

    This model uses a Recurrent Neural Network (RNN) architecture called the Long Short-Term Memory (LSTM) network, which is designed to learn from sequential data such as strings of text. The model uses bi-directional LSTMs, which read each sequence both forward and backward and can improve sequence classification. It was built in Python with the Keras deep learning library and the TensorFlow deep learning framework.
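    To make the architecture concrete, the following is a minimal sketch of a bi-directional LSTM classifier in Keras. It is not the actual Booz Allen model: the function name, vocabulary size, embedding size, number of LSTM units, optimizer, and loss are all assumptions chosen for illustration, since the listing does not disclose them. Only the three output classes and the use of a bi-directional LSTM come from the description above.

```python
NUM_CLASSES = 3  # Positive, Neutral, Negative

def build_sentiment_model(vocab_size=20000, embedding_dim=128, lstm_units=64):
    """Build an illustrative bi-directional LSTM text classifier.

    All hyperparameter defaults are assumptions, not the published model's values.
    """
    # Imported inside the function so this file can be loaded without TensorFlow.
    from tensorflow import keras
    from tensorflow.keras import layers

    model = keras.Sequential([
        # Maps integer token ids to dense vectors.
        layers.Embedding(input_dim=vocab_size, output_dim=embedding_dim),
        # Runs one LSTM forward and one backward over the sequence,
        # then concatenates their final outputs.
        layers.Bidirectional(layers.LSTM(lstm_units)),
        # One probability per sentiment class.
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",  # expects integer labels 0..2
        metrics=["accuracy"],
    )
    return model
```

    With TensorFlow installed, calling build_sentiment_model() returns a compiled model that accepts batches of integer token sequences; tokenization and padding are left out of the sketch.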

    TRAINING:

    This model was trained on a proprietary dataset gathered and labeled by Booz Allen Open Source Intelligence (OSINT) Analysts. The dataset includes over 11,000 text documents from social media, blogs, and conventional news media, each labeled as having a sentiment of Positive, Neutral, or Negative. The model was trained on 8,800 of these documents and tested on the remaining 2,200 documents, on which it obtains an accuracy of 60%. The model uses a bi-directional Long Short-Term Memory (LSTM) network.
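    The 8,800 / 2,200 figures correspond to an 80/20 train/test split. A minimal sketch of such a split is shown below; the function name, the shuffling, the fixed seed, and the placeholder documents are assumptions, since the listing does not say how documents were assigned to each set.

```python
import random

def split_documents(documents, test_fraction=0.2, seed=0):
    """Shuffle a copy of the documents and return (train, test) lists."""
    shuffled = list(documents)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Placeholder documents standing in for the proprietary dataset.
documents = [f"doc-{i}" for i in range(11000)]
train_docs, test_docs = split_documents(documents)
```

    Keeping the test documents out of training is what makes the reported test accuracy a measure of performance on unseen text.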

    VALIDATION:

    The performance of the model was tested on the holdout dataset of 2,200 documents.
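    A holdout evaluation of this kind can be sketched as follows. The helper names and the toy labels are hypothetical; the real holdout set and the model's predictions are not published.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of documents whose predicted label matches the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(truth == predicted for truth, predicted in zip(y_true, y_pred))
    return correct / len(y_true)

def confusion_counts(y_true, y_pred):
    """Count each (true label, predicted label) pair."""
    return Counter(zip(y_true, y_pred))

# Toy holdout labels, invented for illustration only.
y_true = ["Positive", "Neutral", "Negative", "Negative", "Neutral"]
y_pred = ["Positive", "Neutral", "Positive", "Negative", "Positive"]

holdout_accuracy = accuracy(y_true, y_pred)
confusion = confusion_counts(y_true, y_pred)
```

    The confusion counts show which classes are mistaken for which, detail that a single accuracy figure does not reveal.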