Examples of topic indexing with Maui
All examples are produced using data sets available for download under Resources and MultiplyIndexedData.
Automatic tagging with Maui
We ran a test on 180 documents from CiteULike. For each document, tags on which at least two human taggers agreed were considered correct; all other tags were considered incorrect. Maui achieved an F-Measure of nearly 50%, i.e. roughly half of the top 10 automatically extracted tags were correct.
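The agreement rule used to build this gold standard can be sketched in a few lines of Python. This is an illustration only, not Maui's own code: the function name `agreed_tags`, the `min_taggers` parameter and the sample tag sets are all hypothetical.

```python
from collections import Counter

def agreed_tags(tag_sets, min_taggers=2):
    """Return the tags used by at least `min_taggers` people, with their counts."""
    # set(tags) ensures that one person repeating a tag is counted only once.
    counts = Counter(tag for tags in tag_sets for tag in set(tags))
    return {tag: n for tag, n in counts.items() if n >= min_taggers}

# Hypothetical tag sets from three taggers for a single document.
taggers = [
    {"learning", "striatum", "pfc"},
    {"learning", "monkey"},
    {"striatum", "learning", "dynamics"},
]

gold = agreed_tags(taggers)
print(gold)  # "learning" is kept with count 3 and "striatum" with count 2
```

Tags used by a single person ("pfc", "monkey", "dynamics" in this sketch) are left out of the gold standard, which corresponds to the "One tagger" column in the examples.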
Here are three examples of different performance levels. Numbers in brackets indicate how many people agreed on this tag. Bold indicates a correct term. Italic indicates that Maui matched a tag assigned by one person.
Excellent performance (F-Measure 80%). Different time courses of learning-related activity in the prefrontal cortex and striatum
| At least two taggers | One tagger | Maui |
|---|---|---|
| learning (5), striatum (4), monkey (3), prefrontal cortex (2), reversal (2) | basalganglia, rt, dlpfc, prefrontal, caudate, pfc, stimulusresponseassociation, neurophysiology, dynamics, striatothalamocortical | **learning**, **striatum**, **monkey**, **prefrontal cortex**, *basal ganglia* |
Average performance (F-Measure 46%). Global and regional drivers of accelerating CO2 emissions
| At least two taggers | One tagger | Maui |
|---|---|---|
| co2 (3), emissions (3), carbon (2), ipcc (2), economics (2), climate (2), projections (2), regional (2) | global, scenarios, sres, china | **co2**, **emissions**, **carbon**, *global*, energy |
Poor performance (F-Measure 0%). Drought sensitivity shapes species distribution patterns in tropical forests.
| At least two taggers | One tagger | Maui |
|---|---|---|
| precipitation (2), drought (2), ecology (2), vegetation (2), climate (2) | tropical, tropic | *tropical*, tropical forests, drought sensitivity, species, regional |
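The F-Measure figures quoted for the three tagging examples can be reproduced with a short Python sketch. It assumes the balanced F-Measure (the harmonic mean of precision and recall) with exact string matching against the tags agreed on by at least two taggers. The function name `f_measure` is hypothetical, and the split of Maui's tags in the third example into five separate tags is an assumption.

```python
def f_measure(extracted, correct):
    """Balanced F-Measure between extracted tags and the correct tags."""
    extracted, correct = set(extracted), set(correct)
    matches = len(extracted & correct)
    if matches == 0:
        return 0.0
    precision = matches / len(extracted)  # share of extracted tags that are correct
    recall = matches / len(correct)       # share of correct tags that were extracted
    return 2 * precision * recall / (precision + recall)

# (Maui's tags, tags agreed on by at least two taggers), taken from the tables.
examples = {
    "excellent": (
        ["learning", "striatum", "monkey", "prefrontal cortex", "basal ganglia"],
        ["learning", "striatum", "monkey", "prefrontal cortex", "reversal"],
    ),
    "average": (
        ["co2", "emissions", "carbon", "global", "energy"],
        ["co2", "emissions", "carbon", "ipcc", "economics",
         "climate", "projections", "regional"],
    ),
    "poor": (
        # Assumed segmentation of Maui's output into five tags.
        ["tropical", "tropical forests", "drought sensitivity", "species", "regional"],
        ["precipitation", "drought", "ecology", "vegetation", "climate"],
    ),
}

scores = {name: round(100 * f_measure(maui, gold))
          for name, (maui, gold) in examples.items()}
print(scores)  # {'excellent': 80, 'average': 46, 'poor': 0}
```

In the average example, three of Maui's five tags are correct (precision 60%), but only three of the eight agreed tags are found (recall 37.5%), which gives an F-Measure of about 46%.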
Assignment of Wikipedia terms to Computer Science articles
In this experiment, 20 computer science technical reports were used. Each document received topics from 15 teams of graduate students. Because every document was indexed multiple times, the data set was used to compare Maui's consistency with the human teams against the consistency of the teams with each other. On average, Maui outperforms 8 of the 15 human teams.
Consistency analysis is used to show three different levels of Maui's performance. Numbers in brackets indicate how many human teams chose that topic.
Excellent performance. Maui: 52%, people 42%
Document Occam's razor: The cutting edge for parser technology
| Most frequent topics by 15 teams | Topics assigned by Maui |
|---|---|
| Yacc (13), Parsing (12), Compiler-compiler (9), Backus Naur form (9), Compiler (6) | Yacc (13), Parsing (12), Compiler-compiler (9), Compiler (6), Programming language (4) |
Average performance. Maui: 30%, people 53%
Document A Safe, Efficient Regression Test Selection Technique
| Most frequent topics by 15 teams | Topics assigned by Maui |
|---|---|
| Regression testing (15), Software maintenance (13), Control flow graph (10), Software testing (9), Algorithm (7) | Software maintenance (13), Algorithm (7), Test suite (2), Computer software (1), Control flow (0) |
Poor performance. Maui: 17%, people 17%
Document Cone trees in the UGA graphics system
| Most frequent topics by 15 teams | Topics assigned by Maui |
|---|---|
| Hierarchical model (7), 3D computer graphics (7), Visualization (graphic) (6), Tree (data structure) (5), Computer graphics (3) | Computer graphics (3), Visualization (2), PARC (company) (2), Visual display unit (0), Graphics (0) |
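This page does not state which formula lies behind the consistency percentages. A minimal sketch, assuming Rolling's measure (twice the number of shared topics divided by the total number of topics in both sets), which is common in indexing consistency studies, could look as follows. The function names and the team topic sets are hypothetical; only the topic names are borrowed from the first example, so the result is not expected to match the figures above.

```python
def rolling_consistency(a, b):
    """Rolling's consistency between two topic sets: 2C / (A + B)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))

def mean_consistency(candidate, others):
    """Average consistency of one indexer with each of the other indexers."""
    return sum(rolling_consistency(candidate, o) for o in others) / len(others)

# Hypothetical topic sets; the real per-team assignments are not listed here.
maui = {"Yacc", "Parsing", "Compiler"}
teams = [
    {"Yacc", "Parsing", "Compiler-compiler"},
    {"Yacc", "Backus Naur form", "Compiler"},
    {"Parsing", "Programming language", "Algorithm"},
]

maui_score = mean_consistency(maui, teams)
print(f"Maui vs. teams: {maui_score:.0%}")
```

The same `mean_consistency` call, applied to each team against the remaining teams, would give the human baseline that Maui's score is compared with.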
Assignment of Agrovoc terms to agricultural documents
These experiments resulted from a collaboration with the Food and Agriculture Organization (FAO) of the United Nations. Each document is automatically indexed with terms from the domain-specific thesaurus Agrovoc. Maui's terms are then compared to those assigned by professional indexers.
English agricultural documents
Six professional indexers at FAO indexed 30 documents each. Using a consistency measure, we computed the agreement among the indexers and their agreement with Maui. Maui's consistency is only slightly worse than the consistency of the professionals.
Here are examples of three performance levels that Maui achieved:
Excellent performance. Maui 51% vs. Indexers 40%
Document The dynamics of sanitary and technical requirement assisting the poor
| Topics by 6 professional indexers | Topics assigned by Maui |
|---|---|
| Food safety (5), Livestock (5), Standards (5), Poverty (5), Developing countries (4), Food chains (4), Phytosanitary measures (4), Animal production (2) | Food safety (5), Livestock (5), Standards (5), Developing countries (4), Food chains (4), Animal health (2), FAO (2), Risk management (2) |
Average performance. Maui 29% vs. Indexers 35%
Document Climate change and the forest sector
| Topics by 6 professional indexers | Topics assigned by Maui |
|---|---|
| Climatic change (6), International agreements (5), Forests (4), Greenhouse effect (4), Legislation (4), Forestry policies (4), Pollution control (4), Greenhouse gases (3) | Climatic change (6), Forests (4), Greenhouse gases (3), Forest management (3), Property (0), Climate (0), Land use (0), Forest products (0) |
Poor performance. Maui 7% vs. Indexers 35%
Document Phosphorus limitation of microbial processes in tropical forests
| Topics by 6 professional indexers | Topics assigned by Maui |
|---|---|
| Tropical rain forests (6), Phosphorus (6), Soil chemicophysical properties (3), Soil fertility (3), Soil microorganisms (3), Soil biology (2), Microorganisms (2), Biodegradation (2) | Carbon (2), Costa Rica (2), Tropical forests (0), Respiration (0), Rain forests (0), Primary productivity (0), Forests (0), Soil (0) |
French and Spanish agricultural documents
Maui was trained on 66 French and 46 Spanish documents. No modification to the code was needed other than a new stemmer, stopword list and encoding. Matching terms are shown in bold.
French document Relations terres-eau dans les bassins versant ruraux.
| Topics by professionals | Topics assigned by Maui |
|---|---|
| **Impact sur l'environnement**, **Ressource en eau**, **Utilisation des terres**, Aménagement de bassin versant, Conservation de l'eau, Qualité de l'eau, Utilisation de l'eau, Réglementation, Eau de ruissellement, Développement rural, Organisation socioéconomique | **Impact sur l'environnement**, **Ressource en eau**, **Utilisation des terres**, Bassin versant, Pollution de l'eau, Changement climatique, Pollution atmosphérique, Pollution par l'agriculture, Étude de cas, Métal lourd |
Spanish document Evaluación de los recursos forestales mundiales 2000.
| Topics by professionals | Topics assigned by Maui |
|---|---|
| **Ordenación forestal**, **Plantación forestal**, **Productos forestales**, **Recursos forestales**, Ciencias forestales, Reforestación, Sostenibilidad, Transferencia de tecnología | **Ordenación forestal**, **Plantación forestal**, **Productos forestales**, **Recursos forestales**, Inventarios forestales, Tierras forestales, Bosques, Biodiversidad, Utilización de la tierra, Muestra |
Indexing with Medical Subject Headings
After training on 450 documents from PubMed, Maui assigned MeSH terms like the following.
Document Determining lifestyle correlates of body mass index using multilevel analyses.
| Assigned by professionals | Assigned by Maui |
|---|---|
| Adult, Aged, Body Mass Index, Female, Humans, Life Style, Linear Models, Longitudinal Studies, Male, Middle Aged, Norway, Obesity, Questionnaires, Risk Factors, Urban Population | Aged, Body Mass Index, Cardiovascular Diseases, Humans, Longitudinal Studies, Middle Aged, Motor Activity, Norway, Risk Factors |
Subject indexing with High Energy Physics thesaurus terms
Example document from the CERN Document Server: Two-loop electroweak corrections to Higgs production at hadron colliders.
| Assigned by professionals | Assigned by Maui |
|---|---|
| quark: top Higgs mass standard model LHC-B quantum chromodynamic spontaneous symmetry breaking Higgs boson symmetry breaking intermediate boson: mass | quark: top Higgs particle standard model LHC-B quantum chromodynamics cross section Higgs particle: mass hadron: production gluon: fusion boson |