BB4 Evaluation results

Participating teams

Ten teams participated across the six sub-tasks and submitted a total of 31 runs.

AliAI: rel, rel+ner
Amrita_Cen: rel (2)
AmritaCen_healthcare: norm, norm+ner
BLAIR_GMU: kb (2), kb+ner (2), norm (2), norm+ner (2), rel (2), rel+ner (2)
BOUN-ISIK: norm (2), rel (2)
MIC-CIS: norm+ner (2)
PADIA_BacReader: norm
UTU: rel (2), rel+ner (2)
whunlp: rel
Yuhang_Wu: rel

Evaluation

The evaluation is described on the Evaluate predictions page. You can also evaluate your predictions.

You can download all the charts and tables shown below (BB-rel, BB-rel+ner, BB-norm, BB-norm+ner, BB-kb, BB-kb+ner).

Baseline

Submissions are compared to a simple baseline:

Case-insensitive string matching for NER and normalization.
All valid pairs of arguments inside a sentence for relation extraction.
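The two baseline rules above can be sketched as follows. This is a minimal illustration, not the organizers' code: the function names, the lexicon format, the entity dictionaries, and the `VALID_TYPES` table (inferred from the relation and entity names used on this page) are all assumptions.

```python
import re

def match_lexicon(text, lexicon):
    """Case-insensitive lookup of lexicon terms in `text`.

    `lexicon` maps a surface form to a concept identifier (hypothetical data).
    Returns sorted (start, end, concept_id) triples, end exclusive.
    """
    hits = []
    for term, concept_id in lexicon.items():
        for m in re.finditer(re.escape(term), text, flags=re.IGNORECASE):
            hits.append((m.start(), m.end(), concept_id))
    return sorted(hits)

# Assumed argument types for each relation.
VALID_TYPES = {
    "Lives_In": ("Microorganism", {"Habitat", "Geographical"}),
    "Exhibits": ("Microorganism", {"Phenotype"}),
}

def sentence_pairs(entities):
    """Predict a relation for every type-compatible pair in the same sentence.

    `entities` is a list of dicts with 'id', 'type' and 'sentence' keys.
    """
    relations = []
    for relation, (left_type, right_types) in VALID_TYPES.items():
        for left in entities:
            for right in entities:
                if (left["type"] == left_type
                        and right["type"] in right_types
                        and left["sentence"] == right["sentence"]):
                    relations.append((relation, left["id"], right["id"]))
    return relations
```

Predicting every valid pair inside a sentence tends to favor recall over precision, and it cannot recover relations that cross sentence boundaries.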

Confidence intervals

The confidence intervals have been obtained by bootstrap resampling (n=100).
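As an illustration of bootstrap resampling with n=100, here is a minimal percentile-interval sketch. The page does not say which unit is resampled (documents, entities, or relations) or which interval level is reported, so the `items` argument, the 95% level, and the function name are assumptions.

```python
import random

def bootstrap_interval(items, score, n=100, alpha=0.05, seed=0):
    """Percentile bootstrap interval for `score` computed over `items`.

    Each of the `n` rounds draws len(items) items with replacement and
    recomputes the score; the interval is read off the sorted scores.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(n):
        sample = [rng.choice(items) for _ in items]
        scores.append(score(sample))
    scores.sort()
    lower = scores[int((alpha / 2) * n)]
    upper = scores[min(n - 1, int((1 - alpha / 2) * n))]
    return lower, upper
```

With only 100 resamples the interval bounds are themselves fairly noisy, so small differences between overlapping intervals should be read with caution.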

BB-rel

7 teams, 11 runs.

Global results

Recall, Precision and F1 for both relation types (micro average).
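Micro-averaging pools the counts of all relation types before computing the scores, so frequent relation types weigh more. A minimal sketch, assuming per-type counts are already available (the tuple layout and function name are illustrative):

```python
def micro_prf(counts):
    """Micro-averaged recall, precision and F1.

    `counts` is an iterable of (true_positives, n_predicted, n_reference)
    tuples, one per relation type.
    """
    counts = list(counts)
    tp = sum(c[0] for c in counts)
    predicted = sum(c[1] for c in counts)
    reference = sum(c[2] for c in counts)
    precision = tp / predicted if predicted else 0.0
    recall = tp / reference if reference else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return recall, precision, f1
```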

IMG

Lives_In

Recall, Precision and F1 for Lives_In relations.

The tick on top of each bar indicates the score for relations that do not cross sentence boundaries.

IMG

Exhibits

Recall, Precision and F1 for Exhibits relations.

The tick on top of each bar indicates the score for relations that do not cross sentence boundaries.

IMG

BB-rel+ner

3 teams, 5 runs.

Slot Error Rate

The Slot Error Rate (SER) is shown instead of F1, because substitution errors are penalized both in Recall and Precision.

SER is an error rate, therefore lower values are better.
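For reference, the usual definition of the Slot Error Rate counts each substitution, deletion, and insertion once and divides by the number of reference items. The sketch below uses that standard definition; how the official tool derives the error counts from partial matches is not described here, so treat the inputs as given.

```python
def slot_error_rate(substitutions, deletions, insertions, n_reference):
    """SER = (S + D + I) / N, where N is the number of reference slots.

    Lower is better; the value can exceed 1 when there are many insertions.
    """
    if n_reference <= 0:
        raise ValueError("n_reference must be positive")
    return (substitutions + deletions + insertions) / n_reference
```

A substitution replaces a correct item with a wrong one, so it lowers both Recall (a reference item is missed) and Precision (a prediction is wrong), whereas SER counts it as a single error.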

Named entity boundaries

Named-entity boundary accuracy is measured by the Jaccard index.
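The Jaccard index of two text spans is the size of their overlap divided by the size of their union. A minimal sketch over character offsets, assuming contiguous spans given as (start, end) with an exclusive end; discontinuous entities would need a set of offsets per entity instead.

```python
def span_jaccard(predicted, reference):
    """Jaccard index of two character spans (start, end), end exclusive."""
    predicted_chars = set(range(*predicted))
    reference_chars = set(range(*reference))
    union = predicted_chars | reference_chars
    if not union:
        return 0.0
    return len(predicted_chars & reference_chars) / len(union)
```

Identical spans score 1, disjoint spans score 0, and a prediction that is slightly too long or too short receives partial credit.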

Global results

Recall, Precision and SER for both relation types (micro average).

IMG

Lives_In (Habitat)

Recall, Precision and SER for Lives_In relations where the argument is of type Habitat.

The tick on each bar indicates the gain when entity boundary accuracy is ignored.

IMG

Lives_In (Geographical)

Recall, Precision and SER for Lives_In relations where the argument is of type Geographical.

The tick on each bar indicates the gain when entity boundary accuracy is ignored.

IMG

Exhibits

Recall, Precision and SER for Exhibits relations.

The tick on each bar indicates the gain when entity boundary accuracy is ignored.

IMG

BB-norm

4 teams, 6 runs.

Global results

The result is the average distance between predicted and reference normalizations.

For Microorganism entities, strict equality is used.

For Habitat and Phenotype entities, the Wang distance is used (w=0.65).
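The Wang measure with w=0.65 can be sketched as below, following the commonly cited definition of Wang's semantic similarity over an is_a hierarchy. Whether the official evaluation tool matches this step for step is an assumption, and the `parents` mapping and concept names are hypothetical. The measure is a similarity (1 for identical concepts, lower for more distant ones), which is also how the BB-kb section of this page refers to it.

```python
def s_values(concept, parents, w=0.65):
    """Contribution of `concept` and each of its ancestors.

    `parents` maps a concept to the list of its direct parents.
    The concept itself gets 1; an ancestor gets the largest product of
    `w` factors along any path leading up to it.
    """
    values = {concept: 1.0}
    frontier = [concept]
    while frontier:
        node = frontier.pop()
        for parent in parents.get(node, []):
            candidate = w * values[node]
            if candidate > values.get(parent, 0.0):
                values[parent] = candidate
                frontier.append(parent)
    return values

def wang_similarity(a, b, parents, w=0.65):
    """Shared ancestor contributions divided by the total contributions."""
    sa = s_values(a, parents, w)
    sb = s_values(b, parents, w)
    shared = sa.keys() & sb.keys()
    numerator = sum(sa[t] + sb[t] for t in shared)
    return numerator / (sum(sa.values()) + sum(sb.values()))

# Toy hierarchy (hypothetical, not taken from OntoBiotope).
toy_parents = {"soil": ["habitat"], "water": ["habitat"], "habitat": ["root"]}
print(wang_similarity("soil", "water", toy_parents))
```

Because sibling concepts share all their ancestors, a prediction that lands near the reference concept in the ontology still earns partial credit, unlike strict equality.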

IMG

Microorganisms

Average of strict equality of normalizations for Microorganism entities.

IMG

Habitats

Average Wang distance of normalizations for Habitat entities.

Habitats (exact)

Average strict equality of normalizations for Habitat entities.

Habitats (new in test)

Average Wang distance of normalizations for Habitat entities. Only normalizations with concepts absent from the training and development sets were considered.

IMG

Phenotypes

Average Wang distance of normalizations for Phenotype entities.

Phenotypes (exact)

Average strict equality of normalizations for Phenotype entities.

Phenotypes (new in test)

Average Wang distance of normalizations for Phenotype entities. Only normalizations with concepts absent from the training and development sets were considered.

IMG

BB-norm+ner

3 teams, 5 runs.

Slot Error Rate

The Slot Error Rate (SER) is shown instead of F1, because substitution errors are penalized both in Recall and Precision.

SER is an error rate, therefore lower values are better.

Named entity boundaries

Named-entity boundary accuracy is measured by the Jaccard index.

Global results

Recall, Precision, and SER for all entities.

The score for each individual entity is the product of boundary accuracy (Jaccard) and normalization (BB-norm).
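A small worked example of the product, assuming both factors lie between 0 and 1 (the function name is illustrative):

```python
def entity_score(boundary_jaccard, normalization_score):
    """Combined score of one predicted entity paired with a reference entity."""
    return boundary_jaccard * normalization_score

# Boundaries mostly right (0.8) and a related but not identical concept (0.5).
print(entity_score(0.8, 0.5))
# Perfect boundaries but an unrelated concept: no credit at all.
print(entity_score(1.0, 0.0))
```

Because the factors are multiplied, an entity must be acceptable on both boundaries and normalization to score well; a high value on one cannot compensate for a zero on the other.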

IMG

Microorganisms

Results for Microorganism entities only (Jaccard × Equality).

IMG

Habitats

Results for Habitat entities only (Jaccard × Wang).

IMG

Phenotypes

Results for Phenotype entities only (Jaccard × Wang).

IMG

Microorganisms NER

Boundary accuracy (Jaccard) for Microorganism entities.

IMG

Habitats NER

Boundary accuracy (Jaccard) for Habitat entities.

IMG

Phenotypes NER

Boundary accuracy (Jaccard) for Phenotype entities.

IMG

BB-kb and BB-kb+ner

1 team, 2 runs.

The evaluation emulates the capacity of systems to populate databases from a corpus. The pairs of database references (NCBI and OntoBiotope) are evaluated regardless of their text-bound anchors or of their corpus redundancy.

The Mean References is the average of the Wang similarity (w=0.65) of the OntoBiotope argument.
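Ignoring text-bound anchors and corpus redundancy amounts to collapsing all extracted relations into a set of unique reference pairs before scoring. A minimal sketch, assuming each relation is a dict with hypothetical `taxon` and `concept` keys holding the NCBI and OntoBiotope identifiers:

```python
def knowledge_base_pairs(relations):
    """Collapse text-bound relations into unique (taxon, concept) pairs.

    Where in the corpus a pair was found, and how many times, is discarded.
    """
    return {(r["taxon"], r["concept"]) for r in relations}
```

Under this view, a pair mentioned in ten documents counts exactly as much as a pair mentioned once.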

BB-kb

IMG

BB-kb+ner

IMG

BB4

Bacteria Biotope 2019
