How is MEDLINE indexing accomplished?
As of April 2022, all journals indexed for MEDLINE are indexed automatically, with human review and curation of results as appropriate.
MeSH indexing for MEDLINE was done completely by human indexers until 2011. At that point, while the vast majority of journals were still fully indexed by humans, NLM began experimenting with a method of indexing that, for a select set of journals, involved human curation of algorithmic results. The algorithm involved was the Medical Text Indexer (MTI), which was developed by researchers in the Lister Hill Center, and this method of indexing was termed MTIFL (MTI-First Line). In 2019, we began experimenting with the full automation of indexing for some journals, with human review of select citations. The algorithm involved is a customized version of MTI termed MTIA (MTI-Auto).
What are the benefits of automated indexing for MEDLINE?
Automated indexing provides users with timely access to MeSH indexing metadata and allows NLM to scale MeSH indexing for MEDLINE to the increasing volume of published biomedical literature. Our goal is to provide MeSH indexing within 24 hours of a citation appearing in PubMed.
What text is automated indexing for MEDLINE based on?
Automated indexing is currently based on the title and abstract of an article; future work will investigate automated indexing that is based on processing of the article’s full text (where NLM has access to that text for computational purposes).
What algorithm is used for automated indexing for MEDLINE, and what quality assurance processes are in place?
Automated indexing is currently done by MTIA (MTI-Auto), which is primarily pattern-based: it combines MeSH terms mapped from the title and abstract with MeSH terms appearing in the PubMed related records to produce a filtered, ranked list of MeSH descriptors, supplementary concept records (SCRs), and publication types. Machine learning is currently used for the application of subheadings. In the future, we will incorporate machine learning for all components of MeSH indexing (descriptors, SCRs, and publication types).
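The general idea of merging two sources of candidate terms into one filtered, ranked list can be illustrated with a toy sketch. This is not MTIA's actual scoring: the function name, the weights, the threshold, and the example scores below are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def rank_candidates(text_terms, related_terms,
                    text_weight=0.7, related_weight=0.3,
                    threshold=0.3, top_n=5):
    """Toy merge of two candidate-term sources into a ranked list.

    text_terms and related_terms map a MeSH term to a score in [0, 1].
    The weights and threshold are invented for this illustration.
    """
    scores = defaultdict(float)
    # Terms mapped from the title and abstract.
    for term, score in text_terms.items():
        scores[term] += text_weight * score
    # Terms found on related records.
    for term, score in related_terms.items():
        scores[term] += related_weight * score
    # Filter out weak candidates, then rank by score (ties broken by name).
    kept = [(term, score) for term, score in scores.items() if score >= threshold]
    kept.sort(key=lambda pair: (-pair[1], pair[0]))
    return [term for term, _ in kept[:top_n]]


# Hypothetical candidate scores for a single citation.
from_text = {"Neoplasms": 0.9, "Humans": 0.5, "Mice": 0.2}
from_related = {"Neoplasms": 0.8, "Humans": 1.0, "Apoptosis": 0.6}
ranked = rank_candidates(from_text, from_related)
```

In this example, terms supported by both sources accumulate enough weight to pass the threshold, while terms with weak support from a single source are filtered out.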
Human indexers perform quality assurance review of selected sets of automatically indexed citations, e.g., those involving genes and proteins, cases of known ambiguity, and clinical trials, and curate these citations as needed. Random sets of citations are also reviewed.
How long does it take for an article to be indexed for MEDLINE?
For FY 2021, the average time to index for articles fully indexed by humans was 145 days. This is in addition to any time required for bibliographic data review.
Article citations done by automated indexing are generally completed within 1 day of receipt in our indexing system and appear as indexed for MEDLINE in PubMed the following day. Again, this does not account for any time the citation may have spent in bibliographic data review.
How can I tell whether a MEDLINE record was indexed by a human vs. automatically?
Automatically indexed citations are identified in the XML record for a citation. The attribute “IndexingMethod” was added to the MEDLINE/PubMed DTD several years ago; for more information, please see this NLM Technical Bulletin article. An IndexingMethod value of “Automated” indicates that MeSH indexing was provided algorithmically; a value of “Curated” indicates that MeSH indexing was provided algorithmically and a human reviewed (and possibly modified) the algorithm results. The IndexingMethod attribute is only present if a value is specified; if this attribute is not present, the indexing method is fully human. For instructions on retrieving PubMed records in XML via the API, see the Entrez Programming Utilities Help. As of February 2023, users can identify citations indexed by these various methods in the PubMed interface as follows:
- Automated: search indexingmethod_automated
- Curated: search indexingmethod_curated
- Fully human indexed: search medline[sb] NOT (indexingmethod_curated OR indexingmethod_automated)
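For users working with XML records rather than the PubMed interface, the attribute check described above can be sketched in Python. This is a minimal sketch under stated assumptions: it assumes IndexingMethod appears as an attribute of the MedlineCitation element alongside a Status attribute, the sample XML and its PMIDs are made up, and the function name indexing_methods is hypothetical. Records whose Status is not “MEDLINE” are reported separately, mirroring the medline[sb] restriction in the search above.

```python
import xml.etree.ElementTree as ET

# Made-up sample shaped like a PubMed XML export; the PMIDs are not real.
SAMPLE_XML = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation Status="MEDLINE" Owner="NLM" IndexingMethod="Automated">
      <PMID Version="1">10000001</PMID>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation Status="MEDLINE" Owner="NLM" IndexingMethod="Curated">
      <PMID Version="1">10000002</PMID>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation Status="MEDLINE" Owner="NLM">
      <PMID Version="1">10000003</PMID>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation Status="PubMed-not-MEDLINE" Owner="NLM">
      <PMID Version="1">10000004</PMID>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""


def indexing_methods(xml_text):
    """Map each PMID to 'Automated', 'Curated', 'Human', or 'Not MEDLINE'."""
    root = ET.fromstring(xml_text)
    methods = {}
    for citation in root.iter("MedlineCitation"):
        pmid = citation.findtext("PMID")
        if citation.get("Status") != "MEDLINE":
            # Not indexed for MEDLINE, so no indexing method applies.
            methods[pmid] = "Not MEDLINE"
        else:
            # A missing attribute means the indexing was fully human.
            methods[pmid] = citation.get("IndexingMethod", "Human")
    return methods


result = indexing_methods(SAMPLE_XML)
```

The same function could be applied to XML retrieved through the Entrez Programming Utilities; the network call is left out here so the sketch stays self-contained.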
How can I provide feedback about MEDLINE indexing or report potential errors?
Users are encouraged to provide feedback or report problematic indexing through our Customer Service link, which is available via the “Help” link at the bottom of every webpage. This will take you to the NLM Support Center. From there, click ‘Write to the help desk’ to fill out the form and submit feedback or questions.
As an author, how can I identify MeSH terms to provide as keywords?
Since early 2013, PubMed has displayed publisher-supplied keywords in the KEYWORDS field of the abstract display. Authors who wish to supply those keywords using the MeSH vocabulary can consult the MeSH Browser. They may also use a tool called MeSH on Demand that identifies MeSH terms in text using the NLM Medical Text Indexer program. After processing the text, MeSH on Demand returns a list of MeSH terms relevant to the text that was input.
If authors wish their articles to be retrieved by their preferred terminology, they should ensure that these words appear in the title or abstract, where they will be searchable as text words.
Where can I learn more about the MeSH vocabulary used for indexing?
The MeSH homepage is a valuable resource for information related to the MeSH vocabulary.