Published April 12, 2022 | Version 2.0
Dataset Open

LivingNER terminology: NCBI Taxonomy translated to Spanish

  • 1. Barcelona Supercomputing Center

Description

The official NCBI Taxonomy FTP dump (https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/), with its terms translated to Spanish by a neural machine translation model fine-tuned for the biomedical domain.

We have added the following terms: 

2560602    Mumps orthorubulavirus    Mumps orthorubulavirus    scientific name    Paperas ortorubulavirus
2560526    Human orthorubulavirus 4    Human orthorubulavirus 4    scientific name    orthorubulavirus humano 4
2847144    hepatitis C virus genotype 1a    hepatitis C virus genotype 1a    scientific name    virus de la hepatitis C genotipo 1a
2560525    Human orthorubulavirus 2    Human orthorubulavirus 2    scientific name    orthorubulavirus humano 2
_NOCODE_    out of NCBI Taxonomy scope    out of NCBI Taxonomy scope    NA    mención no codificable a NCBI Taxonomy

The first four were added because they appear in the LivingNER corpus and are present in the browser version of NCBI Taxonomy.

The last one (_NOCODE_) was added to identify mentions in the LivingNER corpus that are not present in NCBI Taxonomy.

 

 

Format:

Tab-separated file with the following columns:

  • tax_id: the ID of the node associated with this name
  • name_txt: the name itself
  • unique name: the unique variant of this name, if the name is not unique
  • name class: the type of name (synonym, common name, scientific name, ...)
  • Spanish name: the name translated to Spanish
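
The column layout above can be read with a few lines of Python. This is a minimal sketch, not an official loader: it assumes the file has no header row and exactly five tab-separated fields per line, in the order listed. The names `TaxonomyName` and `read_terminology` are hypothetical, and the two sample rows are copied from the description rather than read from the real file.

```python
import csv
import io
from typing import Iterator, NamedTuple


class TaxonomyName(NamedTuple):
    tax_id: str        # kept as text because "_NOCODE_" is not numeric
    name_txt: str
    unique_name: str
    name_class: str
    spanish_name: str


def read_terminology(handle) -> Iterator[TaxonomyName]:
    """Yield one TaxonomyName per five-column tab-separated line."""
    # QUOTE_NONE: treat quote characters inside names as ordinary text.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 5:
            continue  # assumption: skip blank or malformed lines
        yield TaxonomyName(*(field.strip() for field in row))


# Two rows from the description, used here in place of the real file.
sample = (
    "2560602\tMumps orthorubulavirus\tMumps orthorubulavirus\t"
    "scientific name\tPaperas ortorubulavirus\n"
    "_NOCODE_\tout of NCBI Taxonomy scope\tout of NCBI Taxonomy scope\t"
    "NA\tmención no codificable a NCBI Taxonomy\n"
)
records = list(read_terminology(io.StringIO(sample)))

# Keep only entries that map to a real NCBI Taxonomy node.
coded = [r for r in records if r.tax_id != "_NOCODE_"]
```

For the downloaded file, the same function could be called on `open(path, encoding="utf-8", newline="")`, where `path` is wherever the file was saved. Because a node can have several names (synonyms, common names), grouping records by `tax_id` is likely safer than building a one-to-one dictionary.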

 

Please cite if you use this dataset:

A. Miranda-Escalada, E. Farré-Maduell, S. Lima-López, D. Estrada, L. Gascó, M. Krallinger, Mention detection, normalization & classification of species, pathogens, humans and food in clinical documents: Overview of LivingNER shared task and resources, Procesamiento del Lenguaje Natural (2022)

@article{amiranda2022nlp,
title={Mention detection, normalization \& classification of species, pathogens, humans and food in clinical documents: Overview of LivingNER shared task and resources},
author={Miranda-Escalada, Antonio and Farr{\'e}-Maduell, Eul{\`a}lia and Lima-L{\'o}pez, Salvador and Estrada, Darryl and Gasc{\'o}, Luis and Krallinger, Martin},
journal = {Procesamiento del Lenguaje Natural},
year={2022}
}

 

Resources

 

For more information visit https://temu.bsc.es/livingner/ or email us at encargo-pln-life@bsc.es

Check out the translator demo: https://textmining.bsc.es/translator

Notes

Funded by the Plan de Impulso de las Tecnologías del Lenguaje (Plan TL).

Files

One file (267.4 MB), md5:8fb3d39c479de41060f461423e21cef1