The number of scientific publications related to COVID-19 has grown exponentially since December 2019. On March 16, 2020, the White House issued a call to action asking the data science community to develop literature mining tools that can help the scientific community answer high-priority questions related to COVID-19. tmCOVID is an interactive web-based tool that extracts and summarizes the bioconcepts (genes, chemicals, drugs, mutations, cell lines, species, and diseases) in the COVID-19 scientific literature. Ongoing work includes adding support for the CORD-19 dataset and generating full-text summaries by detecting the most relevant sentences using network centrality methods. Automated summarization of biomedical text will enhance access to information and help identify patterns within the text. Furthermore, it will allow biomedical researchers and the general public to find information related to risk factors for COVID-19, including pregnancy, smoking, and comorbidities.
SARS-CoV-2 and COVID-19
COVID-19 and 'risk factors'
COVID-19 and smoking
COVID-19 and pregnancy
The user can query PubMed abstracts or PMC full-text articles. Additional filters include restricting the search to only journal articles or case reports.
The 'Bioconcept frequency in all documents' option generates a table with the frequency of bioconcept IDs aggregated across all documents.
The 'Bioconcept frequency in each document' option generates a table with the frequency of bioconcept IDs in each document.
Additional graphical and textual summarization options are currently under development and will be released soon.
The user can sort the results by bioconcept type and search for bioconcepts of interest in the results table.
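The two frequency tables and the sort by bioconcept type described above can be sketched in a few lines of Python. This is only an illustration, not the tool's own code: the annotation tuples, the document and concept IDs, and the function names are all hypothetical.

```python
from collections import Counter

# Hypothetical annotations: (document ID, bioconcept type, bioconcept ID).
# The IDs are placeholders, not real PubMed or ontology identifiers.
annotations = [
    ("doc1", "Disease", "concept_A"),
    ("doc1", "Disease", "concept_A"),
    ("doc1", "Gene", "concept_B"),
    ("doc2", "Disease", "concept_A"),
    ("doc2", "Species", "concept_C"),
]


def frequency_per_document(rows):
    """Count each (document, type, ID) combination."""
    return Counter(rows)


def frequency_all_documents(rows):
    """Count each (type, ID) pair, ignoring which document it came from."""
    return Counter((ctype, cid) for _doc, ctype, cid in rows)


def sorted_by_type(counts):
    """Sort an aggregated table by bioconcept type, then by descending frequency."""
    return sorted(counts.items(), key=lambda item: (item[0][0], -item[1]))


per_doc = frequency_per_document(annotations)
overall = frequency_all_documents(annotations)
table = sorted_by_type(overall)
```

Here `per_doc` corresponds to the per-document table and `overall` to the table aggregated across all documents; `table` is the aggregated view ordered by bioconcept type.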
The word cloud provides a visual summary of the 30 most frequent bioconcepts found in the documents matching the user's query. The size of each word correlates with its frequency of occurrence.
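As a rough illustration of how a word cloud input could be prepared, the sketch below keeps the 30 most frequent bioconcepts and maps each frequency to a font size. The linear scaling, the size range, and the function name `word_cloud_sizes` are assumptions made for this example; the text only states that size correlates with frequency.

```python
from collections import Counter


def word_cloud_sizes(counts, top_n=30, min_size=10.0, max_size=48.0):
    """Return (label, frequency, font size) for the top_n most frequent labels.

    Font sizes are interpolated linearly between min_size and max_size
    (an assumed scaling, chosen here for simplicity).
    """
    top = Counter(counts).most_common(top_n)
    if not top:
        return []
    highest = top[0][1]
    lowest = top[-1][1]
    span = highest - lowest
    sized = []
    for label, freq in top:
        # If every frequency is equal, give all words the maximum size.
        scale = (freq - lowest) / span if span else 1.0
        sized.append((label, freq, min_size + scale * (max_size - min_size)))
    return sized


# Hypothetical counts: 40 placeholder bioconcepts with frequencies 1..40.
counts = {f"concept_{i}": i for i in range(1, 41)}
cloud = word_cloud_sizes(counts)
```

Only the 30 highest-frequency entries survive the cut, so the ten least frequent placeholder concepts are dropped before any sizes are computed.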
tmCOVID uses NCBI Entrez to retrieve PubMed IDs based on the input query. PubTator is used to extract bioconcepts (genes, chemicals, diseases, species, and mutations) from each PubMed abstract or PMC full-text article. Summary tables are generated with the frequency of occurrence of bioconcepts at the document level or across all hits. All data is stored in an SQLite database accessed through the RSQLite package.
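The retrieval and storage steps can be outlined as follows. This is a minimal sketch under several assumptions, not the tool's implementation: tmCOVID stores its data through RSQLite in R, whereas this example uses Python's standard `sqlite3` module as a stand-in; the table layout, function names, and annotation rows are hypothetical; and the PubTator extraction step is not shown, so placeholder rows take its place. The sketch only builds the Entrez ESearch URL and does not send any request.

```python
import sqlite3
from urllib.parse import urlencode

# Public NCBI E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(query, db="pubmed", retmax=100):
    """Build an ESearch URL that would return matching IDs as JSON."""
    params = {"db": db, "term": query, "retmax": retmax, "retmode": "json"}
    return ESEARCH_URL + "?" + urlencode(params)


def store_annotations(conn, rows):
    """Store (doc_id, concept_type, concept_id) rows in a hypothetical table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS annotations ("
        "doc_id TEXT, concept_type TEXT, concept_id TEXT)"
    )
    conn.executemany("INSERT INTO annotations VALUES (?, ?, ?)", rows)
    conn.commit()


def frequency_across_hits(conn):
    """Aggregate bioconcept frequencies across all stored documents."""
    return conn.execute(
        "SELECT concept_type, concept_id, COUNT(*) AS n "
        "FROM annotations GROUP BY concept_type, concept_id "
        "ORDER BY n DESC, concept_type, concept_id"
    ).fetchall()


url = build_esearch_url("COVID-19 and smoking")

# Placeholder rows standing in for bioconcepts extracted by PubTator.
conn = sqlite3.connect(":memory:")
store_annotations(
    conn,
    [
        ("doc1", "Disease", "concept_A"),
        ("doc2", "Disease", "concept_A"),
        ("doc2", "Chemical", "concept_B"),
    ],
)
summary = frequency_across_hits(conn)
```

Setting `db="pmc"` in `build_esearch_url` would target PMC full-text articles instead of PubMed abstracts.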