Machine learning and citizen science…a winning combination!
*Sarah Huebner, who heads up the Snapshot Safari team, has written the following blog post to give all participants of Snapshot Safari projects the lowdown on the machine learning advances being introduced today.*
In the era of Big Data, when equipment allows us to collect data faster than we can assess it, researchers are always looking for ways to speed up the path from data collection to analysis. We here at Snapshot Safari are proud to have been the first camera trapping project to partner with citizen scientists on Zooniverse when we introduced Snapshot Serengeti, and to have expanded that model from one African park to dozens. Now we hope to improve the data pipeline once again by integrating machine learning to reduce the amount of volunteer effort required to classify data from our participating sites.
‘Machine learning’ refers to artificial intelligence algorithms that have been trained for a specific task or purpose. These algorithms are fed millions of images labeled with the correct species and are ‘trained’ to recognize those animals again in different settings. The resulting models generate ‘predictions’ based on the training they’ve received, along with confidence levels that tell us how sure they are that a label is correct. Because Snapshot Serengeti has been running since 2010, it has generated millions of images over the years, which make a perfect training dataset for machine learning (ML) algorithms. We are employing ML models to drastically reduce the effort required to retire empty images (no animals present) and images of common animals like wildebeest and zebra.
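For the curious, here is a minimal sketch of how a classifier's raw output can be turned into a prediction plus a confidence level. It assumes the common softmax approach used by many image classifiers; the labels and scores are made up for illustration, and this is not the actual code behind the Snapshot Safari models.

```python
import math


def softmax(scores):
    """Convert raw model scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the largest score for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(labels, scores):
    """Return the most likely label and the model's confidence in it."""
    probs = softmax(scores)
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs[best]


# Hypothetical labels and made-up scores for a single image.
labels = ["empty", "wildebeest", "zebra", "human"]
scores = [0.2, 2.5, 1.1, -1.0]
label, confidence = predict(labels, scores)
print(f"{label}: {confidence:.0%}")
```

The highest-scoring label becomes the prediction, and its share of the probability is the "how sure am I" number that the retirement rules can act on.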
First, our ML models have become quite good at telling us whether animals are present or not. This helps us to more easily spot cameras where vegetation has grown in front of the lens, resulting in hundreds of pictures of grass blowing in the wind. Pretty, but not quite what we’re after, so we can eliminate those prior to upload. Secondly, we have modified the retirement rules on Snapshot projects (implemented starting today as new seasons are launched) so that only two volunteers need to confirm the computer’s prediction of ‘empty’. This means instead of 10 or even 20 people viewing those photos, only two people will see them and can push them out of the dataset quickly.
Those of you who have been working on this project for a while know that the wildlife you’re most likely to see are zebras and wildebeest, and you all are great at identifying those! Because those are easy identifications, they too will retire with fewer views than before. In practice, this means you should see more images of rare and cryptic species like predators, and fewer blank images. We have implemented a number of retirement rules behind the scenes to make this happen, based on varying confidence levels produced by the algorithm. For example, our simulations have shown that even at only 50% confidence, the computer is right 99.6% of the time when it tells us that an image is empty. Therefore, any ‘empty’ prediction with a confidence of 50% or more will need only two human views to confirm that the computer is correct. Likewise, if the model tells us that an image shows a human with a confidence level of 80% or higher, we will retire it with just two confirmations.
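To picture how confidence-based retirement rules like these might work, here is a minimal sketch. Only the two thresholds above (‘empty’ at 50% or more, ‘human’ at 80% or more, each needing two confirmations) come from this post. Everything else is a hypothetical assumption: the `DEFAULT_VIEWS` placeholder, the choice to fall back to the default whenever a volunteer disagrees, and the omission of the wildebeest and zebra rules, whose thresholds aren't given here. The `precision_at` helper shows how a figure like "right 99.6% of the time" can be measured on labelled images, but the example data are invented and are not our simulation results.

```python
from dataclasses import dataclass

# From the post: predicted label -> (minimum confidence, confirmations needed).
FAST_TRACK_RULES = {
    "empty": (0.50, 2),
    "human": (0.80, 2),
}
# Hypothetical placeholder: the post says images were previously seen by
# "10 or even 20" people; the real default may differ per project.
DEFAULT_VIEWS = 10


@dataclass
class Prediction:
    label: str
    confidence: float


def views_needed(pred: Prediction) -> int:
    """Volunteer classifications an image needs before it can retire."""
    rule = FAST_TRACK_RULES.get(pred.label)
    if rule is not None:
        threshold, confirmations = rule
        if pred.confidence >= threshold:
            return confirmations
    return DEFAULT_VIEWS


def should_retire(pred: Prediction, votes: list) -> bool:
    """Retire once enough volunteers have classified the image.

    Assumption (not from the post): a fast-tracked image retires early only
    if every volunteer agrees with the model; any disagreement sends it back
    to the default number of views.
    """
    needed = views_needed(pred)
    if needed < DEFAULT_VIEWS and any(v != pred.label for v in votes):
        needed = DEFAULT_VIEWS
    return len(votes) >= needed


def precision_at(preds, truths, label, threshold):
    """Share of `label` predictions at or above `threshold` that were correct."""
    hits = [truth == label for pred, truth in zip(preds, truths)
            if pred.label == label and pred.confidence >= threshold]
    return sum(hits) / len(hits) if hits else float("nan")


# Usage with made-up data.
p = Prediction("empty", 0.62)
print(views_needed(p))                        # 2
print(should_retire(p, ["empty", "empty"]))   # True
print(should_retire(p, ["empty", "zebra"]))   # False: needs the default views
```

Keying the rules on both the predicted label and a per-label threshold lets each species be tuned separately, which fits the idea of adding more rules as the pipeline is refined.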
We will continue to improve the algorithm’s capabilities by using our most valuable asset: all of you! We hope that you will be as interested as we are in advancing the use of ML to make the classifying process more fun and satisfying. The algorithm is pretty good at identifying species, but now we need to improve its ability to count animals. So we will soon be introducing a special project, ‘Snapshot Focus’, featuring images in which the algorithm has marked each animal with a bounding box. We will ask you to tell us whether the ML model got it right. Stay tuned for that and other special projects!
We are launching three new sites today: Camdeboo National Park, Kgalagadi Transfrontier Park, and DeHoop Nature Reserve, all in South Africa. These three projects have the new retirement rules in place, as will Season 12 of Snapshot Serengeti, which will launch in June. As new seasons or new projects come online, they will be set up with these rules, and perhaps more as we refine the data pipeline. Let us and the moderators know how it goes. We are so thankful for your efforts and support, which help us return data to our collaborators at reserves in Africa quickly and with confidence that it is correct, thanks to the combination of citizen science and machine learning. Happy classifying!
Research Manager, Snapshot Safari
May 28, 2019
For more information about the machine learning algorithms created using Snapshot Serengeti images, see:
Willi, M., Pitman, R.T., Cardoso, A.W., Locke, C., Swanson, A., Boyer, A., Veldthuis, M. and Fortson, L. (2019) Identifying animal species in camera trap images using deep learning and citizen science. Methods in Ecology and Evolution 10(1):80-91.
Norouzzadeh, M.S., Nguyen, A., Kosmala, M., Swanson, A., Palmer, M.S., Packer, C. and Clune, J. (2018) Automatically identifying, counting, and describing wild animals in camera-trap images with deep learning. Proceedings of the National Academy of Sciences 115(25):E5716-E5725.
To read about how algorithms make decisions in comparison to humans, see:
Miao, Z., Gaynor, K.M., Wang, J., Liu, Z., Muellerklein, O., Norouzzadeh, M.S., McInturff, A., Bowie, R.C., Nathan, R., Yu, S.X. and Getz, W.M. (2018) A comparison of visual features used by humans and machines to classify wildlife. bioRxiv, p.450189.