Text Summarization

Model by Modzy

This model takes any English text document as input and condenses it into a summary of a few sentences.

This model can be used in a range of settings including media monitoring, newsletter summary generation, book summarization, and automated content creation.

  • Description

    Product Description

    PERFORMANCE METRICS:

    This model achieves a ROUGE score of 0.33.

    This model was trained, validated, and tested on the CNN/Daily Mail dataset, and ROUGE was evaluated for both the extractive and abstractive phases of the model.

    ROUGE (Recall-Oriented Understudy for Gisting Evaluation) is a set of metrics for evaluating automatic summarization and machine translation. It compares an automatically produced summary or translation against a set of human-made references.

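    To make the ROUGE description above concrete, here is a minimal sketch of ROUGE-N for a single candidate/reference pair. It is an illustration only, not the scoring code used to produce the 0.33 figure: the function names, the lowercase whitespace tokenization, and the example sentences are all assumptions of this sketch, and production toolkits apply additional preprocessing such as stemming.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Compute ROUGE-N recall, precision, and F1 for one candidate/reference pair.

    Tokenization is a plain lowercase whitespace split (an assumption of this
    sketch); real toolkits usually do more preprocessing.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Clipped overlap: an n-gram counts only as often as it occurs in both texts.
    overlap = sum((cand & ref).values())
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(cand.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"recall": recall, "precision": precision, "f1": f1}


# Hypothetical example pair: 5 of the 6 reference unigrams are matched.
scores = rouge_n("the cat sat on the mat", "the cat was on the mat", n=1)
print(scores)
```

    With several human references, a common convention is to score the candidate against each one and keep the best result; that step is left out here for brevity.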

    OVERVIEW:

    This model solution was derived from the Fast Abstractive Summarization-RL model, whose authors report three main contributions. First, it uses a sentence-level reinforcement learning (RL) technique for abstractive summarization, which exploits the word-then-sentence hierarchical structure of a document without requiring annotated matching. Second, at the time of publication it achieved state-of-the-art results on all metrics across multiple versions of the summarization dataset, as well as on a test-only dataset, in both the extractive and abstractive modes and without loss of language fluency. Third, its parallel decoding yields a significant 10-20x speed-up over the previous best neural abstractive summarization system.
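    The Fast Abstractive Summarization-RL approach first selects salient sentences and then rewrites each one, which is what allows decoding to run in parallel. The toy sketch below shows only that pipeline shape. Every function in it is a hypothetical stand-in: the frequency-based selector replaces the RL-trained extractor, and the truncating rewriter replaces the neural abstractor, so it does not reproduce the model's behavior or quality.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_sentences(text):
    """Naive sentence splitter on periods; a placeholder for a real tokenizer."""
    return [s.strip() for s in text.split(".") if s.strip()]


def select_sentences(sentences, k=2):
    """Stand-in extractor: keep the k sentences with the highest summed word frequency.

    The real model learns this selection with reinforcement learning; the
    heuristic here exists only to keep the sketch runnable.
    """
    freq = Counter(w for s in sentences for w in s.lower().split())
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: sum(freq[w] for w in sentences[i].lower().split()),
        reverse=True,
    )
    # Restore document order among the selected sentences.
    return [sentences[i] for i in sorted(ranked[:k])]


def rewrite_sentence(sentence, max_words=8):
    """Stand-in abstractor: truncate to max_words. A neural rewriter would go here."""
    return " ".join(sentence.split()[:max_words]) + "."


def summarize(text, k=2):
    """Select k sentences, then rewrite each one independently."""
    selected = select_sentences(split_sentences(text), k)
    # The rewrites do not depend on each other, so they can run in parallel.
    with ThreadPoolExecutor() as pool:
        return " ".join(pool.map(rewrite_sentence, selected))


print(summarize("A b c. D. A b c d e f g h i j."))
```

    Because each selected sentence is rewritten on its own, the rewriting step parallelizes naturally, in contrast to a decoder that must generate one long summary token by token.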

    TRAINING:

    This model was trained on a dataset of 312,085 online news articles from CNN and Daily Mail, each paired with a corresponding summary.

    On average, the articles are 781 tokens long and the summaries are 56 tokens long.
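    The average lengths above imply how strongly the reference summaries compress their articles. The short calculation below works that out; the variable names are illustrative, and the result describes the training data, not a guaranteed output length for this model.

```python
# Average lengths reported for the training data, in tokens.
avg_article_tokens = 781
avg_summary_tokens = 56

# Fraction of the article length that a reference summary retains on average.
compression_ratio = avg_summary_tokens / avg_article_tokens
# How many times shorter the average summary is than the average article.
reduction_factor = avg_article_tokens / avg_summary_tokens

print(f"summary/article length ratio: {compression_ratio:.3f}")
print(f"reduction factor: {reduction_factor:.1f}x")
```

    In other words, a reference summary is on average about 7% of the article's length, or roughly 14 times shorter.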

    VALIDATION:

    The model's performance was evaluated on a validation set of 13,367 articles from the CNN/Daily Mail dataset.

    INPUT SPECIFICATION:

    The input(s) to this model must adhere to the following specifications:

    Filename        Maximum Size    Accepted Format(s)
    input.txt       1M              .txt
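    A file can be checked against this specification before it is submitted. The sketch below is one way to do that, under stated assumptions: `check_input_file` is a hypothetical helper, not part of any Modzy SDK, and "1M" is read here as 1 MiB (1,048,576 bytes), which the specification does not confirm.

```python
import os

# Assumption: "1M" in the specification is interpreted as 1 MiB.
MAX_INPUT_BYTES = 1024 * 1024


def check_input_file(path):
    """Return a list of problems with a candidate input file (empty if none)."""
    problems = []
    # The specification names the file input.txt, which also fixes the .txt format.
    if os.path.basename(path) != "input.txt":
        problems.append("file must be named input.txt")
    if not os.path.isfile(path):
        problems.append("file does not exist")
        return problems
    if os.path.getsize(path) > MAX_INPUT_BYTES:
        problems.append("file is larger than the assumed 1 MiB limit")
    return problems
```

    An empty list means the file passed every check in this sketch; for example, `check_input_file("input.txt")` could be called just before upload and the job skipped if any problem is reported.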

    OUTPUT DETAILS:

    This model will output the following:

    Filename        Maximum Size    Format
    results.json    1M              .json
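    Once a job has finished, the output file can be read with a standard JSON parser. The sketch below assumes only that `results.json` is valid JSON; its keys and nesting are not documented here, so `load_results` (a hypothetical helper) returns the parsed object without looking up any particular field.

```python
import json


def load_results(path="results.json"):
    """Parse the model's JSON output and return it as Python objects.

    The schema is not specified in this document, so no field names are assumed.
    """
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

    Inspecting the returned object, for instance by printing it, is a reasonable first step for finding where the summary text is stored.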