Machine learning and citizen science…a winning combination!
*Sarah Huebner, who heads up the Snapshot Safari team, has written the following blog post to give all participants of Snapshot Safari projects the low-down on the machine learning advances that are being introduced today.*
In the era of Big Data, when equipment allows us to collect data faster than we can assess it, researchers are always looking for ways to enhance and accelerate the process between data collection and analysis. We here at Snapshot Safari are proud to have been the first camera trapping project to partner with citizen scientists on Zooniverse when we introduced Snapshot Serengeti, and to have expanded that model from one African park to dozens. Now we hope to improve the data pipeline once again by integrating machine learning to reduce the amount of volunteer effort required to classify data from our participating sites.
‘Machine learning’ refers to Artificial Intelligence algorithms that have been trained for a specific task or purpose. These algorithms are fed millions of images labeled with their correct names and are ‘trained’ to recognize those animals again in different settings. These models generate ‘predictions’ based on the training they’ve received and provide confidence levels to let us know how sure they are that each label is correct. Because Snapshot Serengeti has been running since 2010, it has generated millions of images over the years, which make a perfect training dataset for machine learning (ML) algorithms. We are employing ML models to drastically reduce the effort required to retire empty images (no animals present) and to retire images of common animals like wildebeest and zebra.
First, our ML models have become quite good at telling us whether animals are present or not. This helps us to more easily spot cameras where vegetation has grown in front of the lens, resulting in hundreds of pictures of grass blowing in the wind. Pretty, but not quite what we’re after, so we can eliminate those prior to upload. Secondly, we have modified the retirement rules on Snapshot projects (implemented starting today as new seasons are launched) so that only two volunteers need to confirm the computer’s prediction of ‘empty’. This means instead of 10 or even 20 people viewing those photos, only two people will see them and can push them out of the dataset quickly.
Those of you who have been working on this project for a while know that the wildlife you’re most likely to see are zebras and wildebeest, and you all are great at identifying those! Because those are easy identifications, they too will retire with fewer views than before. In practice, this means you should see more images of rare and cryptic species like predators and fewer blank images. We have implemented a number of retirement rules behind the scenes to make this happen, based on varying confidence levels produced by the algorithm. For example, our simulations have shown that even at only 50% confidence, the computer is right 99.6% of the time when it tells us that an image is empty. Therefore, any ‘empty’ prediction with a confidence of 50% or more will only need two human views to confirm that the computer is correct. Likewise, if the model tells us that an image shows a human with a confidence level of 80% or higher, we will retire it with just two confirmations.
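For the technically curious, the retirement rules described above can be sketched in a few lines of Python. This is a hypothetical illustration, not the actual Snapshot Safari pipeline: the 50% ‘empty’ threshold, the 80% ‘human’ threshold and the two-confirmation rule come from this post, while the names (`Prediction`, `qualifies_for_fast_track`, `should_retire`) and the fallback of 10 total views for everything else are assumptions made for the example. The post does not give the thresholds used for common species such as zebra and wildebeest, so the sketch leaves them out.

```python
from __future__ import annotations

from dataclasses import dataclass

# Thresholds from the blog post; everything else here is a hypothetical sketch.
EMPTY_CONFIDENCE = 0.50
HUMAN_CONFIDENCE = 0.80
FAST_RETIRE_VIEWS = 2       # volunteers needed to confirm a confident prediction
DEFAULT_RETIRE_VIEWS = 10   # assumed fallback: total views for all other images


@dataclass
class Prediction:
    label: str          # e.g. "empty", "human", "zebra"
    confidence: float   # model confidence between 0.0 and 1.0


def qualifies_for_fast_track(pred: Prediction) -> bool:
    """True if the model is confident enough for two-view retirement."""
    if pred.label == "empty":
        return pred.confidence >= EMPTY_CONFIDENCE
    if pred.label == "human":
        return pred.confidence >= HUMAN_CONFIDENCE
    return False


def should_retire(pred: Prediction, volunteer_labels: list[str]) -> bool:
    """Decide whether an image can leave the classification queue."""
    if qualifies_for_fast_track(pred):
        # Count volunteers who agree with the model's label.
        confirmations = sum(1 for label in volunteer_labels if label == pred.label)
        if confirmations >= FAST_RETIRE_VIEWS:
            return True
    # Otherwise fall back to the ordinary rule: enough total views.
    return len(volunteer_labels) >= DEFAULT_RETIRE_VIEWS
```

For example, `should_retire(Prediction("empty", 0.62), ["empty", "empty"])` returns `True`, while the same two votes on a prediction with 45% confidence keep the image in circulation. If the two volunteers disagree with the model, the image simply falls back to the normal number of views, so a human check always has the final say.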
We will continue to improve the algorithm’s capabilities by using our most valuable asset: all of you! We hope that you will be as interested as we are in advancing the use of ML to make the classifying process more fun and satisfying. The algorithm is pretty good at species, but now we need to improve its ability to count animals, so we will soon be introducing a special project, ‘Snapshot Focus’, featuring images in which the algorithm has marked each animal with a bounding box. We will ask you to tell us whether the ML model got it right. Stay tuned for that and other special projects!
We are launching three new sites today—Camdeboo National Park, Kgalagadi Transfrontier Park, and DeHoop Nature Reserve, all from South Africa. These three projects have the new retirement rules in place, as will Season 12 of Snapshot Serengeti, which will launch in June. As new seasons or new projects come online, they will be set up with these rules and perhaps more as we refine the data pipeline. Let us and the moderators know how it goes. We are so thankful for your efforts and support, which help us to return data to our collaborators at reserves in Africa quickly and with confidence that it is correct thanks to the combination of citizen science and machine learning. Happy classifying!
Research Manager, Snapshot Safari
May 28, 2019
For more information about the machine learning algorithms created using Snapshot Serengeti images, see:
Willi, Marco, Pitman, Ross Tyzack, Cardoso, Annabelle W., Locke, Christina, Swanson, Alexandra, Boyer, Amy, Veldthuis, Marten, and Fortson, Lucy. (2019) Identifying animal species in camera trap images using deep learning and citizen science. Methods in Ecology and Evolution 10(1):80-91.
Norouzzadeh, Mohammad Sadegh, Nguyen, Anh, Kosmala, Margaret, Swanson, Alexandra, Palmer, Meredith S., Packer, Craig, and Clune, Jeff. (2018) Automatically identifying, counting, and describing wild animals in camera-trap images with deep learning. Proceedings of the National Academy of Sciences 115(25):E5716-E5725.
To read about how algorithms make decisions in comparison to humans, see:
Miao, Z., Gaynor, K.M., Wang, J., Liu, Z., Muellerklein, O., Norouzzadeh, M.S., McInturff, A., Bowie, R.C., Nathan, R., Yu, S.X. and Getz, W.M. (2018) A comparison of visual features used by humans and machines to classify wildlife. bioRxiv, p.450189.
A Picture Worth A Thousand Words – Part 2
I have written about this before, I know, but this series of images has got me going again.
The Snapshot Serengeti images are great, and they have captured some stunning stuff over the years (a melanistic serval, oxpeckers roosting at night on a giraffe, a buffalo hunt by lions). Individually they have produced some amazing portraits, but every once in a while the old adage ‘a picture is worth a thousand words’ doesn’t quite ring true.
A while back I posted images of a male lion seemingly strolling through the savannah, musing on what he had been up to. The other day, whilst browsing through images, I found more parts to the picture: more images of him moving at different times, and of a female.
I am sure that Snapshot Serengeti followers could add to this series of images if we delved further, but I was intrigued.
What’s happening? Is it a male and female spending some days together whilst mating (as happens with lions)? Or are there other pride members around? The female looks as though she has blood staining her face, so perhaps they have been feasting on a kill, moving back and forth between the carcass and water or shade in between snacks.
It is one of those instances when you would like the camera to just swivel a bit to see if we could learn more, but technology (or at least affordable technology) has not quite reached that stage yet.
So, we will just have to sit back and enjoy what we can see, a pair of very full-looking, handsome lions, and let our imaginations do the rest. Sometimes not knowing is part of the fun.
Machine Learning Update
It has been just over a year now since Snapshot Safari was launched. Snapshot Serengeti, as the original Zooniverse citizen science camera trap project, has remained the flagship project, but there are now several other projects under the Safari banner.
One of the good things about joining forces with all these other projects is that collaboration tends to bring perks that operating on your own doesn’t. For example, it brings in areas of expertise that some of us would never have thought of, being somewhat removed from the minds of field researchers and ecologists.
I am talking about the computing side of things, ML (Machine Learning) in particular. The team behind Snapshot Safari have been working hard on this aspect, essentially to help speed up classification now that there is so much data across the different projects.
They have recently engaged a specialist in machine learning at the University of Minnesota who is helping them to develop ML for use with the Snapshot Safari projects.
The idea is to run an algorithm on the data prior to uploading it to the Zooniverse. This will identify most of the misfires and vegetation-only images, meaning we volunteers will be left with more of the good stuff: the actual animal images.
The algorithm has been trained on millions of images from Snapshot Serengeti. For these latest batches, the team have asked the ‘machine’ to predict which species it is seeing, how many animals there are and what they are doing. Don’t panic: we really are in the early stages, and the idea is to compare what the algorithm came up with against the actual results from the volunteers, who we already know are pretty darn accurate. The team doesn’t expect great things from the ML yet and doesn’t foresee computers taking over any time soon, but it will be interesting to see how it goes.
So, for now, things will continue as normal, with no changes in how you classify or what you do, but hopefully there will be fewer blank images with no animals. As the team stresses, they are a looonnng way from machines taking over.
For those of you who enjoy being part of this developmental side of the project, the Safari team will soon be launching a side project called Snapshot Focus. It is designed to look at how good the ML is at recognising animals in trickier images, especially those with multiple animals. If you feel like helping out as a break from the usual workflow, it couldn’t be simpler. All you have to do is answer yes or no as to whether the computer has managed to put a bounding box around every animal in the image.
Look out for updates from the team over the coming months to learn more about these developments, but meanwhile there are still lots of images to classify on Snapshot Serengeti.
Trials and Tribulations
Data collection is the backbone of field research and can sound glamorous and exciting to those who are office-bound, but I will let you in on a little secret: it can be exhausting, frustrating and unrewarding too.
Firstly, you have to remember that researchers often work in remote places, and whilst this is amazing, it does lead to some logistical nightmares. Take, for instance, my recent experience. My task was to visit 18 Ilchokuti, or lion guardians, from KopeLion to collect the data they had recorded during the previous month. They are spread out over 1,300 km², in itself quite a distance, but when you factor in roads that are rough at best and non-existent at worst, you begin to have an idea of the task. I would be lucky if it didn’t rain; that would only add to the woes. Another thing to remember is that, barring a few lucky people working for high-profile organisations, most researchers have to nurse their aged vehicles along, fixing things as they go. This trip wasn’t too bad, as we only seemed to suffer from failing door catches, nothing a bit of string or a Leatherman couldn’t fix. The budgets just never seem to run to decent cars.
Just as I was about to feel smug about the lack of rain hampering our journey, it dawned on me that dry conditions have their bad points too. Dust! The fine dust covering some of the landscape here is deadly. It penetrates everything, and on a three-day trip with no opportunity for a shower, boy, does it get tiring. Forget enjoying the scenery as you drive; you mostly feel as if you are inside a cloud, only instead of fluffy white it has a yellow tinge and makes it hard to breathe.
Anyway, I can’t really complain; it was a wonderful three days, and meeting up with a couple of our guys in the middle of nowhere, under a great baobab tree acting as our office for an hour or so, was something to make you smile.
My colleague, Meritho Katei, over in the Serengeti has an even harder job under similar conditions. I was simply rendezvousing with other people, collecting and issuing data sheets and downloading GPS data. Meritho is trying to pick up the lion monitoring for the Serengeti Lion Project, which has been on hold for a while.
His task is to reconnect with the lion prides previously followed and studied, and to catch up on their family histories. New members need to be identified, files made on them and changes in pride composition noted. He is working with the Snapshot Serengeti camera trap data to see where the prides are hanging out, but of course we aren’t quite up to date with the classifying, so that’s not the greatest help. Instead he is relying on a lot of kilometres of driving, following up on tourist sightings and tracking data, and a good pair of eyes to track down the prides and observe them.
So as I washed the dust out of my hair, luxuriating in a hot shower after my successful, mission-accomplished three-day trip, I had to reflect that poor Meritho was in for many months of hard slog catching up with those lions, and with the rains coming things are about to get even harder. Good luck Meritho!
A Team Effort
Snapshot Serengeti has been on the go since 2010 in one form or another, and over those years a team of dedicated people has kept it running. The base of the effort is the 225 camera-traps that have been snapping away continuously for that whole period. Of course, for that to happen there need to be researchers and assistants on the ground physically looking after the camera-traps, a scientific team who coordinate all data processing and analysis, a management team running the administration of the project, and generous funders to keep everything alive and kicking.
Snapshot Serengeti also could not work without the thousands of volunteer citizen scientists who generously give their time and energy to classifying the millions of images, ultimately helping the researchers to answer scientific questions that we hope will aid in the conservation of all that we love about the Serengeti.
Here in these blogs we have celebrated all these people but it dawned on me recently that there is one group of people that seem to have been forgotten, our moderators.
Our amazing moderator team at Snapshot Serengeti deserves a special mention. They, like our citizen scientists, are volunteers, dedicating their time and expertise for free. Contrary to what some may think, they are not part of the scientific team, insofar as they are not university students who do the job as part of their studies. No, they are a mixed bunch in terms of background and do the job plain and simply because they love the Serengeti and love the project. They spend a huge amount of time online helping other users with their classifications, guiding new users past pitfalls they know only too well, and sharing their collective knowledge through prompt responses to questions and great information posts that help those with less experience to understand the Serengeti and its wildlife. They also have to deal with the odd (luckily very infrequent) troll, which is a thankless task in diplomacy. We are privileged to have such an amazing team, and I know that they are greatly appreciated by Snapshot Serengeti’s participants.
So thank you to davidbygott, maricksu and tillydad, who have been with us since the beginning, and welcome to parsfan and nmw. You guys are the best, and Snapshot Serengeti would not be the experience it is without all your help.
Living In The Lion’s Den
Snapshot Serengeti is in the limelight again!
A new paper titled “No respect for apex carnivores: Distribution and activity patterns of honey badgers in the Serengeti” has been published by a team from the University of Wisconsin and University of Ljubljana using the Snapshot Serengeti data classified by our citizen scientists.
Honey badgers are surprisingly understudied. Although they are extremely charismatic, their large territories (up to 541 km² for an adult male in the Kalahari) and lack of clear habitat preferences make it hard to predict where to find and study them.
The Snapshot Serengeti data, of course, is a dream come true for many researchers, enabling them to ask scientific questions without having to wait potentially years to collect data themselves. The team took advantage of the open access data, courtesy of Snapshot Serengeti, to look at what they could learn about honey badgers and how they live alongside other predators. Ferocious as they are, honey badgers are killed by lions, hyenas and leopards, so the team wanted to know whether they avoided areas where these large carnivores were active.
Well, it seems that despite occasionally ending up as a victim, the honey badger is quite happy living alongside the larger carnivores, at least in the Serengeti, according to the authors. It appears that the honey badger actively seeks out the same habitats as the large carnivores. The authors modelled a variety of explanatory scenarios to see which best explained honey badger distribution across the Serengeti study area. Included were variables such as habitat preference, water availability, cover availability, lion abundance and leopard abundance. Their best models showed that the presence of all three large carnivores coincided with the presence of honey badgers, and that there was also a positive temporal correlation between leopard, hyena and honey badger activity, showing that they use the same habitat at the same time.
It’s interesting stuff. The authors do point out that although the data set was huge, there were actually very few records of honey badgers over the three-year period covered by their work, so their sample size was small. It does, however, show just how valuable the data collected by Snapshot Serengeti and the other Snapshot Safari projects can be, if nothing else by giving scientists a relatively inexpensive way to explore questions before undertaking more specific research themselves.
You can read the paper here, although it is not open access unfortunately: https://www.sciencedirect.com/science/article/pii/S1616504717302720
These two images illustrate the point nicely: you can clearly see that the same camera captured a honey badger and a spotted hyena within 13 days of each other, and interestingly, both in daylight.
Recap on Snapshot
Whilst stretching the corners of my brain for a new topic to write about in the Snapshot Serengeti blog, I am astounded to realise just how long we have been going: over 7 years now as Snapshot Serengeti, and almost 10 if you include the Serengetilive days.
It is also humbling to know how dedicated our followers are and how much support we get from them. Our fun would have been over long ago if the community had not backed us. It has occurred to me that Snapshot Serengeti’s followers follow us in differing ways. Those who follow our Facebook and Twitter pages, or WordPress fans who follow us through our blogs, may have missed what it is we are up to. So, at the risk of boring those of you who already know, I thought it was about time to reiterate what we at Snapshot Serengeti do and how it all works.
Our largest group of followers can be found at www.snapshotserengeti.org, helping us classify the millions of camera-trap images produced by around 225 camera-traps placed in a permanent grid pattern in our study zone in the Serengeti National Park. For continuity’s sake, these sites, after an initial bit of trial and error, have remained in the fixed spots first chosen by the project’s designer, Dr Ali Swanson, back in 2010.
Originally the camera-trap grid was set up to answer questions on carnivore interactions, specifically whether carnivores were avoiding one another spatially and temporally, but it soon became apparent that it could be used to pose many more scientific questions, amongst them herbivore coexistence and predator-prey relationships. The wisdom of keeping this permanent window into the lives of the Serengeti animals should lead to many future studies, and it has spawned many similar camera-trap projects around the world.
It’s not all about the animals; in fact, since teaming up with Zooniverse the project has been as much about the advancement of citizen science as anything else. Back in the Serengetilive days there were so few of us taking part that our names were posted in a sort of league table of who had classified the most images, and each classified image was labelled with the classifier’s name. Now, of course, there are far too many participants to bother with that kind of thing; besides, with multiple people having to agree on each classification, it might get messy. The work of developing a robust algorithm to deal with the uncertainty in each individual classification was so involved that it also paved the way for many more projects and several scientific papers.
So what do we ask classifiers to do? Well, first you are presented with either a run of 3 images (daytime) or 1 image (night-time). You are then asked to decide and record which animals are present, the number of each species, their behaviour and whether there are young present or not. It’s pretty straightforward, with prompts along the way. If you don’t know what the animal is, you simply guess. Yes, you read that right: you guess. One thing the developers worked out is that the whole project works better if you cannot skip images. For one thing, it avoids all the hard or boring images being left till the end. As each image has to be agreed upon by several classifiers before it is retired, this tends to smooth out any misclassifications, and research has shown we are around 97% accurate.
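To see why individual guesses wash out, here is a minimal sketch in Python of the simplest possible way to combine several volunteers’ answers: a plurality vote. It is only an illustration of the idea and is not the algorithm the Snapshot Serengeti team developed, which treats the uncertainty in each classification more carefully. The function name `plurality_consensus` and the `min_votes` retirement threshold are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Tuple


def plurality_consensus(
    votes: List[str], min_votes: int = 10
) -> Optional[Tuple[str, float]]:
    """Combine volunteer labels for one image by simple plurality vote.

    Returns (winning label, fraction of volunteers who chose it), or None
    if the image has not yet been seen by enough volunteers. min_votes is
    a hypothetical retirement threshold, not the project's real setting.
    """
    if len(votes) < min_votes:
        return None  # keep showing the image to more volunteers
    counts = Counter(votes)
    # most_common(1) gives the label with the most votes; on a tie, the
    # label that was voted for first wins.
    label, n = counts.most_common(1)[0]
    return label, n / len(votes)
```

With ten votes of which eight say zebra and two guess wildebeest, `plurality_consensus(["zebra"] * 8 + ["wildebeest"] * 2)` returns `("zebra", 0.8)`: the two wrong guesses do not change the answer, and the agreement fraction hints at how easy or tricky the image was.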
If you find something good, or something you cannot ID and are curious about, you can add the image to Talk, which is the discussion forum. There we have some very dedicated moderators who will help you with your queries.
All in all, Snapshot Serengeti is about learning and sharing, both for the researchers and for the community of classifiers, so if you have been enjoying the Facebook posts or reading the blogs but have never had a go at classifying, get yourself over to www.snapshotserengeti.org and have a go.
A few years back the Snapshot Serengeti community classified its first ever zorilla (Ictonyx striatus). It’s not an animal that most people are familiar with, and you would probably not expect to see one on a safari in Africa. It is certainly not on the Big Five list, or even on the Little Five list, but just like its bigger cousin the honey badger, it’s a gutsy little creature.
The zorilla, or African striped polecat, belongs to the Mustelidae family, which includes the well-known honey badger, otters and weasels. Although superficially similar in appearance to skunks, they are not closely related.
So how would you recognise one? First of all, they are small, only 28–38 cm long, with a bushy tail of around 25 cm. The most striking feature is the black and white stripy coat. The body is black overall, with four white stripes running from the head to the tail, which is mostly white. There are white patches on the face and head, and the small ears are often white-rimmed. The fur is quite long, lending the zorilla a slightly scruffy appearance.
Zorillas share more than their looks with skunks; they are also able to squirt noxious smelling liquid from their anal glands as a defence mechanism against predators. So if you do ever encounter one, give this little guy the respect it demands and stay well back.
These tough little creatures will eat a wide range of prey, including invertebrates, reptiles, rodents and birds, but they seem to favour rodents and insects. They are known to tackle venomous snakes and large rats by pinning them to the ground and repeatedly biting at the back of the neck to make the kill. Being nocturnal, they find most prey by smell and won’t hesitate to follow it down into burrows or use their strong claws to unearth something. Their small, elongated bodies are perfectly adapted for this task.
Zorillas are found across sub-Saharan Africa in a wide variety of habitats but seem to avoid the wetter rainforest belt through west and central Africa. They are mainly terrestrial but are known to be good climbers and happy to swim if needed. Due to their widespread presence in Africa, they are classed as Least Concern on the IUCN Red List.
So why don’t we see more on our camera-traps? Well, not much is known about zorillas; their small size and nocturnal habits make them hard to study, and in a place like the Serengeti they are just not a high-priority species. However, we do know they are present because they show up on the camera-traps from time to time. I suspect that the main reason they are not picked up more frequently is their small size coupled with their fast-paced, frenetic lifestyle. The weasel family is notoriously hard to capture on camera-trap, as they tend to shoot through the trigger zone before the camera has got itself together to take a picture. Perhaps we will see more zorillas in the future as camera trigger speeds increase.
This is a great sequence from one of our 225 camera-traps that are tirelessly snapping away in the heart of the Serengeti National Park. One of the largest and longest-running camera-trap projects, Snapshot Serengeti has been running for over 8 years without a break.
The millions of images generated by so many cameras are processed by the amazing online community of citizen scientists, without whom the team of scientists would probably still be working their way through season 2 rather than ploughing through season 10 (that’s where we are currently: November 2015 to September 2016).
Those of you who have helped out on Snapshot Serengeti will realise that a great variety of images come up; they are randomly assigned to each classifier and currently need at least 10 matching classifications before being retired. A great many are of grass or the tail end of animals as they pass by. It’s the frustration of camera-trapping: when you look at the results you just wish you could nudge the camera to the left a bit to get a better image, but of course it’s way too late for that once you have the image safely in hand.
But every once in a while we get a stunning image worthy of a professional photographer or one that shows really interesting behaviour and those are the images that get people hooked on returning day after day to help out on Snapshot Serengeti. A little fix of wildlife in its environment enjoyed from your home.
This sequence of a lion pride is great. In all likelihood the four individuals we can see here are not the only members; others could be out of frame. It looks as though the female has a pink tone to her muzzle. It could be a trick of the light, but it’s also possible that the pride have recently fed. The full-looking belly and the relaxed nature of the other members would lend weight to this possibility, but we will never know for sure (unless more images from that camera-trap reveal more proof!).
It is most definitely one of those moments when you wish you could pan the camera around to see what else is going on. Did we almost get a kill on camera? Are there a bunch of cubs lying under a bush to the right? Is there a pair of resplendent pride males slumbering to the left?
If you do discover more from this series, do let us know, but for the time being we shall have to let our imaginations ramble.
The Dung Collectors
I promised I would have some news about what the Serengeti team has been up to recently in the field. Our beloved camera-trap grid is still being cared for: cards downloaded, batteries replaced and cameras given the once-over. So all is well on that front, but what is the latest question being asked by the team?
Well, thanks to the spatial occupancy modelling of the Snapshot Serengeti camera-trap grid, we have learned a lot about how the animals share the environment. What we can’t derive from the camera trap images is what the different species are doing when they are in those spaces, and how so many large herbivores can exist together. It could be that they simply facilitate each other’s foraging, or maybe they are using different resources. Scientists have identified what is known as niche partitioning. One mechanism sees different species specialising in eating different proportions of grasses versus non-grasses, giving pure grazers, pure browsers and a sliding scale between the two. A second mechanism sees different species eating different parts of the same plant.
These two mechanisms seem to make perfect sense but it is not understood to what extent these two truly affect coexistence of large herbivores. This is where the Snapshot Serengeti team research comes in.
Led by our own Dr Michael Anderson, the team has joined forces with Dr Rob Pringle and researchers at Princeton University, using a revolutionary new analysis method known as DNA metabarcoding to see exactly what each animal is eating.
Until recently, scientists studying herbivore diet had two choices: they could watch their subjects and try to identify what they were eating, or they could use microhistology, whereby plant parts in faeces are visually identified. As you can imagine, these methods are fine for differentiating between, say, grasses and trees, but they don’t allow scientists to classify down to individual plant species. With DNA metabarcoding they now have that ability, and it should tell us a whole lot more about how the animals divide their resources in space and time.
So that’s the science, but how does the team collect the data? Well, as with microhistology, it involves dung. Our intrepid scientists are roaming the Serengeti collecting poop from as many different herbivores as they can, and then it all has to be shipped back to the labs for analysis.
If you are thinking that our team must be highly skilled detectives, able to identify a wide variety of brown pellets in the savannah grasses, then think again. That’s not to say they can’t, but this work relies on knowing with 100% certainty which species produced said dung, along with the animal’s sex and age, as well as having a sample that has not been contaminated in any way. The method of collection relies, then, on stealthy observation: waiting for an individual to lift its tail and sprinkle the ground with brown pellets before running in with your sample jar at the ready to collect the freshly deposited “clean” offerings. I have some experience with this work, and believe me, it does feel slightly odd to be observing animals in this way, willing them on to have a bowel movement so you can move on to the next species. It is also a little risky, as you can get so engrossed in watching your target animal that you forget there are predators watching and waiting. At least in this project the team are only interested in herbivores; doing the same with predators’ faeces would be a whole lot smellier.
The study is still in its early stages but the team reports they are already seeing some noteworthy things.
Spoiler alert: early results suggest that there are only two ‘pure’ grazers in the Serengeti (zebra and warthog) and a lot of variation between the wet and dry seasons.
We will bring you further updates once the team has finished the analysis and has the full results. It promises to be exciting stuff. In the meantime, you can think on the glamorous job of a field scientist whilst you stay clean at home helping with the job of classification.