Progress in the development of neural networks that classify images of slipper orchids and Javanese butterflies. Talk to LEBEN at Leiden University's biology department, IBL, 20 September 2016.
2. Taxonomic classification [1] of digitized specimens [2] using machine learning [3]
[1] Assigning the correct taxonomic name to a specimen, or at least approximating it at a higher rank (e.g. genus, family)
[2] Photographs of biological objects, e.g. from a natural history collection and taken in a standardized setup
[3] Machine learning is the study and construction of algorithms that can learn from and make predictions on data
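The idea of approximating an identification at a higher rank can be sketched as follows. This is only an illustration: the function name, the rank order, the threshold and the confidence values are all hypothetical and are not taken from the project code.

```python
# Hypothetical rank order, from most specific to most general.
RANKS = ["species", "genus", "family"]


def best_supported_name(predictions, threshold=0.8):
    """Return (rank, name) for the most specific rank whose confidence
    reaches the threshold, or None if no rank is supported."""
    for rank in RANKS:
        name, confidence = predictions.get(rank, (None, 0.0))
        if name is not None and confidence >= threshold:
            return rank, name
    return None


# Illustrative classifier output; the confidence values are made up.
example = {
    "family": ("Orchidaceae", 0.99),
    "genus": ("Paphiopedilum", 0.93),
    "species": ("Paphiopedilum rothschildianum", 0.41),
}

# The species call is too uncertain, so the genus is reported instead.
result = best_supported_name(example)
print(result)
```

With these made-up numbers the species-level call falls below the threshold, so the sketch reports the genus rather than guessing at a species.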
3. Case study: slipper orchids
Slipper orchids
• Traded illegally
• Photographed “in the wild”
4. Case study: Javanese butterflies
Van Groenendael-Krijger collection
• Collected in the 1930s
• Photographed in standardized setup
5. Project structure overview
• Open source, freely available at: github.com/naturalis
• Designed as loosely coupled, swappable modules
• Intended for re-use for multiple cases
6. Project structure: reference images
photos [table]
    id INTEGER NOT NULL
    md5sum VARCHAR(32) NOT NULL
    path VARCHAR(255)
    title VARCHAR(100)
    description VARCHAR(255)
photos_tags [table]
    photo_id INTEGER NOT NULL
    tag_id INTEGER NOT NULL
tags [table]
    id INTEGER NOT NULL
    name VARCHAR(50) NOT NULL
photos_taxa [table]
    photo_id INTEGER NOT NULL
    taxon_id INTEGER NOT NULL
taxa [table]
    id INTEGER NOT NULL
    rank_id INTEGER NOT NULL
    name VARCHAR(50) NOT NULL
    description VARCHAR(255)
ranks [table]
    id INTEGER NOT NULL
    name VARCHAR(50) NOT NULL
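The tables above could be set up and queried roughly as follows. This is a minimal sketch using an in-memory SQLite database: the column names and types come from the slide, while the primary keys, the foreign keys, the helper function names and the example rows (including the placeholder checksum and path) are assumptions made for illustration.

```python
import sqlite3

# Column names and types follow the slide; PRIMARY KEY and REFERENCES
# constraints are assumptions, since the slide does not show them.
SCHEMA = """
CREATE TABLE ranks (
    id INTEGER NOT NULL PRIMARY KEY,
    name VARCHAR(50) NOT NULL
);
CREATE TABLE taxa (
    id INTEGER NOT NULL PRIMARY KEY,
    rank_id INTEGER NOT NULL REFERENCES ranks(id),
    name VARCHAR(50) NOT NULL,
    description VARCHAR(255)
);
CREATE TABLE tags (
    id INTEGER NOT NULL PRIMARY KEY,
    name VARCHAR(50) NOT NULL
);
CREATE TABLE photos (
    id INTEGER NOT NULL PRIMARY KEY,
    md5sum VARCHAR(32) NOT NULL,
    path VARCHAR(255),
    title VARCHAR(100),
    description VARCHAR(255)
);
CREATE TABLE photos_tags (
    photo_id INTEGER NOT NULL REFERENCES photos(id),
    tag_id INTEGER NOT NULL REFERENCES tags(id)
);
CREATE TABLE photos_taxa (
    photo_id INTEGER NOT NULL REFERENCES photos(id),
    taxon_id INTEGER NOT NULL REFERENCES taxa(id)
);
"""


def build_demo_db():
    """Create the schema in memory and add one hypothetical photo."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO ranks (id, name) VALUES (?, ?)",
                     [(1, "genus"), (2, "species")])
    conn.executemany("INSERT INTO taxa (id, rank_id, name) VALUES (?, ?, ?)",
                     [(1, 1, "Paphiopedilum"), (2, 2, "rothschildianum")])
    # Placeholder checksum and path, for illustration only.
    conn.execute("INSERT INTO photos (id, md5sum, path) VALUES (?, ?, ?)",
                 (1, "0" * 32, "photos/example.jpg"))
    conn.executemany("INSERT INTO photos_taxa (photo_id, taxon_id) VALUES (?, ?)",
                     [(1, 1), (1, 2)])
    return conn


def taxa_for_photo(conn, photo_id):
    """Return (rank name, taxon name) pairs linked to one photo."""
    return conn.execute(
        """SELECT ranks.name, taxa.name
           FROM photos_taxa
           JOIN taxa ON taxa.id = photos_taxa.taxon_id
           JOIN ranks ON ranks.id = taxa.rank_id
           WHERE photos_taxa.photo_id = ?
           ORDER BY ranks.id""",
        (photo_id,),
    ).fetchall()


conn = build_demo_db()
print(taxa_for_photo(conn, 1))
```

Linking photos to taxa through photos_taxa lets a single reference image carry one name per rank, which is what a rank-by-rank classifier would need as training labels.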
11. Results: SURF features
• PCA plots of the “speeded up robust features” (SURF) show clustering at both the genus (top) and species (bottom) level
• Some species are so dimorphic that the sexes are treated as separate species (not shown)
• Some individuals are “gynandromorphic”, though such specimens are probably over-represented because of positive collection bias
• Some taxa are much more variable than others
12. Results: k-fold cross-validation
• Split the data into k partitions (k = 2, 5, 10)
• Train on k-1 partitions, use the remaining partition as “out-of-sample” data
• Count number of correct/incorrect/unknown identifications
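A minimal sketch of the conventional k-fold scheme, in which each partition is held out once as out-of-sample data while a model is trained on the others, and the outcomes are tallied as correct, incorrect or unknown. The toy nearest-mean classifier on one-dimensional values merely stands in for the neural networks of the talk; all function names, the distance cut-off and the data are hypothetical.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train, test) index lists, holding out each fold once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for held_out, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != held_out for j in fold]
        yield train, test


def cross_validate(samples, labels, k, train_fn, predict_fn):
    """Count correct, incorrect and unknown out-of-sample predictions."""
    counts = {"correct": 0, "incorrect": 0, "unknown": 0}
    for train, test in k_fold_indices(len(samples), k):
        model = train_fn([samples[i] for i in train], [labels[i] for i in train])
        for i in test:
            predicted = predict_fn(model, samples[i])
            if predicted is None:
                counts["unknown"] += 1
            elif predicted == labels[i]:
                counts["correct"] += 1
            else:
                counts["incorrect"] += 1
    return counts


def train_means(xs, ys):
    """Toy 'training': store the mean feature value per label."""
    sums = {}
    for x, y in zip(xs, ys):
        total, n = sums.get(y, (0.0, 0))
        sums[y] = (total + x, n + 1)
    return {y: total / n for y, (total, n) in sums.items()}


def predict_nearest(means, x, max_distance=2.0):
    """Return the label with the nearest mean, or None ('unknown')
    when even the nearest mean is further away than max_distance."""
    label, mean = min(means.items(), key=lambda item: abs(item[1] - x))
    return label if abs(mean - x) <= max_distance else None


# Made-up, well-separated data for two classes.
samples = [0.0, 0.2, 0.4, 0.6, 0.8, 10.0, 10.2, 10.4, 10.6, 10.8]
labels = ["a"] * 5 + ["b"] * 5

counts = cross_validate(samples, labels, 5, train_means, predict_nearest)
print(counts)
```

Returning None for a prediction that is too far from every class is one simple way to obtain an "unknown" category next to correct and incorrect; the actual criterion used in the project may differ.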
13. Next steps
• Application of trained neural networks to the entire VGKS collection (once it is fully digitized)
• Testing other classifiers in addition to artificial neural networks (ANNs)
• Improvement of the end-user interface, possibly as a native ‘app’ or on the web
• Extension of the platform to additional cases, such as shells (snails, bivalves)
• Doing more with the image feature data: mimicry, character displacement, dimorphism
14. Acknowledgements
Naturalis sector Collection
• Max Caspers
• Luc Willemse
• Jan Moonen
• Digitization volunteers
Hogeschool Leiden
• Barbara Gravendeel
• Patrick Wijntjes
• Saskia de Vetter
LIACS
• Fons Verbeek
• Mengke Li
• Yuanhao Guo
IBL
• Wim van Tongeren
WUR
• Feia Matthijssen
Made possible by
• Naturalis internal grant for application-oriented research
• The Van Groenendael-Krijger Stichting
• Kind contributions of photos by numerous orchid breeders