NLP Tasks

Natural Language Processing

 

 

 

Supporting Resources

The goal of the supporting resources for the BioNLP Shared Task 2016 is to provide task participants with annotations produced by state-of-the-art automated tools. This minimises the time investment needed to take part in the shared task and lets participants experiment with ways to leverage the automated analyses of existing Natural Language Processing systems. [responsibility...]


Resources and Formats

The resources available for the shared task are listed below. Each resource is a downloadable archive containing the output of one tool, run on the train and dev datasets. Each tool is briefly presented in its own section.
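Each resource is distributed as a .zip archive, so a practical first step is to inspect its contents. The following Python sketch uses only the standard-library `zipfile` module. The helper names are hypothetical, and the UTF-8 encoding is an assumption because the internal layout of the archives is not documented here.

```python
import io
import zipfile


def list_resource_files(archive: "str | io.BytesIO") -> list[str]:
    """Return the names of the regular files stored in a resource archive."""
    with zipfile.ZipFile(archive) as zf:
        # Directory entries end with "/" in ZIP archives; skip them.
        return sorted(name for name in zf.namelist() if not name.endswith("/"))


def read_resource_file(archive, member: str, encoding: str = "utf-8") -> str:
    """Decode one archive member as text (UTF-8 is an assumption, not documented)."""
    with zipfile.ZipFile(archive) as zf:
        return zf.read(member).decode(encoding)
```

For example, `list_resource_files("genia-tagger_train+dev_resources.zip")` would return the member names once that archive has been downloaded to the working directory.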


POS Tagging

Genia Tagger is a tool for part-of-speech tagging, shallow parsing, and named entity recognition in biomedical text.

  • genia-tagger_train+dev_resources.zip: the resources produced by Genia tagger on the train and dev datasets
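The layout of the files inside this archive is not specified here. The sketch below assumes they keep the Genia tagger's usual output: one token per line with five tab-separated columns (word, base form, POS tag, chunk tag, named-entity tag), and a blank line between sentences. Under that assumption, a minimal Python reader could look like this. The class and function names are hypothetical.

```python
from typing import NamedTuple


class GeniaToken(NamedTuple):
    word: str
    base: str
    pos: str
    chunk: str  # IOB chunk tag, e.g. "B-NP"
    ne: str     # IOB named-entity tag, e.g. "B-protein" or "O"


def parse_genia_output(text: str) -> list[list[GeniaToken]]:
    """Group tab-separated token lines into sentences (assumed: blank line = boundary)."""
    sentences: list[list[GeniaToken]] = []
    current: list[GeniaToken] = []
    for line in text.splitlines():
        if not line.strip():
            if current:
                sentences.append(current)
                current = []
            continue
        fields = line.split("\t")
        if len(fields) != 5:
            raise ValueError(f"expected 5 tab-separated fields, got {len(fields)}: {line!r}")
        current.append(GeniaToken(*fields))
    if current:  # last sentence may not be followed by a blank line
        sentences.append(current)
    return sentences
```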

Parsing

Stanford Parser is a statistical parser...

  • stanford-parser_train+dev_resources.zip : the resources produced by Stanford Parser on the train and dev datasets.
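The Stanford Parser can emit typed dependencies written as `relation(governor-index, dependent-index)`. This page does not confirm that the archive uses that textual layout, so the following Python helper is a sketch under that assumption. It turns each line into a tuple and returns a flat list without keeping sentence boundaries. All names are hypothetical.

```python
import re
from typing import NamedTuple

# Assumed line shape: relation(governor-index, dependent-index);
# trailing apostrophes on an index mark copy nodes and are ignored.
_DEP_RE = re.compile(r"^([\w:]+)\((.+?)-(\d+)'*, (.+)-(\d+)'*\)$")


class Dependency(NamedTuple):
    relation: str
    governor: str
    governor_index: int
    dependent: str
    dependent_index: int


def parse_stanford_dependencies(text: str) -> list[Dependency]:
    deps = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines separate sentences; boundaries are dropped here
        m = _DEP_RE.match(line)
        if m is None:
            raise ValueError(f"unrecognised dependency line: {line!r}")
        rel, gov, gov_idx, dep, dep_idx = m.groups()
        deps.append(Dependency(rel, gov, int(gov_idx), dep, int(dep_idx)))
    return deps
```

The governor is matched non-greedily and the dependent greedily, so hyphenated tokens such as `IL-2-1` still split on the last hyphen before the index.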

Enju Parser parses sentences with Enju, an HPSG-based syntactic parser for English.

  • enju-parser_train+dev_resources.zip : the resources produced by Enju Parser on the train and dev datasets

CCG Parser provides syntactic parses based on Combinatory Categorial Grammar (CCG).

  • ccg-parser_train+dev_resources.zip : the resources produced by CCG parser on the train and dev datasets.

Term Extraction

BioYatea extracts terms from the corpus using the YaTeA term extractor...

  • bioyatea_train+dev_resources.zip : the resources produced by BioYatea on the train and dev datasets

Named Entity Recognition

Stanford NER [synopsis...]

  • stanfordner_train+dev_resources.zip : the resources produced by Stanford NER on the train and dev datasets

LINNAEUS is a tool for species name recognition and normalization...

  • linnaeus_train+dev_resources.zip : the resources produced by LINNAEUS on the train and dev datasets

OrganismTagger is a hybrid rule-based/machine-learning system that extracts organism mentions from the biomedical literature, normalizes them to their scientific name, and provides grounding to the NCBI Taxonomy database...

  • organismtagger_train+dev_resources.zip : the resources produced by OrganismTagger on the train and dev datasets.

SR4GN is a tool for species recognition in support of gene normalization...

  • sr4gn_train+dev_resources.zip : the resources produced by SR4GN on the train and dev datasets

SPECIES identifies taxonomic mentions in documents and maps them to corresponding NCBI Taxonomy entries. If you make use of the SPECIES annotations, please cite: Pafilis, E., Frankild, S.P., Fanini, L., Faulwetter, S., Pavloudi, C., Vasileiadou, A., Arvanitidis, C. and Jensen, L.J. (2013). The SPECIES and ORGANISMS resources for fast and accurate identification of taxonomic names in text. PLoS One, 8(6), p.e65390.

  • SPECIES_train+dev_resources.zip : the resources produced by SPECIES on the train and dev datasets

Sentence Splitting & Tokenization

Segmentation is an internal AlvisNLP plan that generates...

  • segmentation_train+dev_resources.zip : the resources produced by segmentation on the train and dev datasets

Data Visualization

Brat is a tool for visualization of annotations...

  • brat_train+dev_resources.zip : the resources produced by Brat on the train and dev datasets.
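brat stores annotations in its standoff format. Each .txt file has a companion .ann file, where a text-bound annotation line looks like `T1<TAB>Type start end<TAB>covered text`. Discontinuous spans separate offset pairs with `;`. The sketch below assumes this archive follows that format, which this page does not state. It reads only the text-bound (T) lines and skips relations, events and notes. The helper and field names are hypothetical.

```python
from typing import NamedTuple


class TextBound(NamedTuple):
    ann_id: str
    label: str
    spans: list[tuple[int, int]]  # (start, end) character offsets; end is exclusive
    text: str


def parse_ann(content: str) -> list[TextBound]:
    """Extract text-bound (T) annotations from a brat standoff .ann file (format assumed)."""
    entities = []
    for line in content.splitlines():
        if not line.startswith("T"):
            continue  # relations (R), events (E), notes (#) etc. are skipped here
        ann_id, label_and_offsets, text = line.split("\t", 2)
        label, _, offsets = label_and_offsets.partition(" ")
        spans = []
        for fragment in offsets.split(";"):  # discontinuous spans use ';'
            start, end = fragment.split()
            spans.append((int(start), int(end)))
        entities.append(TextBound(ann_id, label, spans, text))
    return entities
```

The offsets index into the companion .txt file, so `doc_text[start:end]` should reproduce each fragment of the covered text.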