This model identifies 12 entity types in the input text: CPA, DISEASE, ENZYME, GENE, INTERACTION, MPA, NEGREG, PATHWAY, POSREG, PROTEIN, REG, and VAR. It can also extract the relations between these entities, labeled as CauseOf or ThemeOf. The model accepts English-language biomedical literature text and returns a JSON file containing the input text, an entity list, and a relation list. The entity list includes the location, label, and prediction probability of each entity. The relation list includes the locations of the two entities, the relation label, and the prediction probability of each relation.
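To make the output structure concrete, here is a minimal Python sketch of reading such a JSON result and keeping only the relations above a probability threshold. The exact schema is not given in this description, so every key name (`text`, `entities`, `relations`, `start`, `end`, `head`, `tail`, `probability`), the character-offset convention, the sample sentence, and the probabilities are illustrative assumptions, not the model's documented format.

```python
import json

# Hypothetical output: key names, offsets, and values are assumptions made
# for illustration only; the real JSON schema may differ.
RAW = """
{
  "text": "A mutation in GENE_X causes DISEASE_Y.",
  "entities": [
    {"start": 2, "end": 10, "label": "VAR", "probability": 0.91},
    {"start": 14, "end": 20, "label": "GENE", "probability": 0.97},
    {"start": 28, "end": 37, "label": "DISEASE", "probability": 0.88}
  ],
  "relations": [
    {"head": [2, 10], "tail": [14, 20], "label": "ThemeOf", "probability": 0.84},
    {"head": [2, 10], "tail": [28, 37], "label": "CauseOf", "probability": 0.42}
  ]
}
"""


def span_text(text, start, end):
    """Return the substring covered by a [start, end) character span."""
    return text[start:end]


def confident_relations(output, threshold=0.5):
    """Return (head, label, tail) triples whose probability meets the threshold."""
    text = output["text"]
    triples = []
    for rel in output["relations"]:
        if rel["probability"] >= threshold:
            head = span_text(text, *rel["head"])
            tail = span_text(text, *rel["tail"])
            triples.append((head, rel["label"], tail))
    return triples


output = json.loads(RAW)
entity_labels = [e["label"] for e in output["entities"]]
triples = confident_relations(output, threshold=0.5)
```

Filtering on the per-relation probability is one simple way to trade recall for precision when consuming the results; the 0.5 threshold here is arbitrary.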
82% Average Accuracy – The mean of the per-class accuracies.
75% F1 Score – The harmonic mean of precision and recall, where 1 (100%) is the best possible value. It measures the balance between the two metrics.
86% Precision – A higher precision score indicates that more of the labels the model predicts for the different classes are correct.
66% Recall – A higher recall score indicates that the model finds and correctly labels more of the instances it is supposed to find.
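As a quick consistency check, the reported F1 score can be reproduced from the reported precision and recall using the standard harmonic-mean formula. The sketch below assumes the figures above are overall values for the same evaluation; the function name is ours.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Reported values: 86% precision, 66% recall.
f1 = f1_score(0.86, 0.66)
print(f"F1 = {f1:.2f}")
```

The result rounds to 0.75, which agrees with the 75% F1 score listed above.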
We built a combined system around a network model based on a modified BERT. It receives the PubMed IDs of the literature as input and returns the gene-disease associations described in the unstructured abstracts. The system comprises models that perform (1) entity recognition (covering genes, diseases, and their effects) and (2) relation extraction.
This model was trained on the AGAC Task 1 and Task 2 training datasets and an in-house dataset. In total, the training set contained 2,750 texts with named entity recognition (NER) and relation (Rel) annotation labels.
The performance of the model was tested on the AGAC sample dataset, which contained 50 texts.