Archive | Data Analysis

Progress and stuff



Some of you will have noticed that our progress bar on season 10 has not been showing any progress. Well it turns out that we have made loads of progress, it’s just the bar that was not getting anywhere.

The good folks at Zooniverse have fixed it for us and you will now see we are about halfway through season 10, which is fantastic. There are just under 700 000 images to classify this season so, thanks to you, our dedicated team of citizen scientists, we have around 350 000 left to go. That’s 350 000 chances of finding that one image you have been waiting for. I have noticed recently lots of you posting on talk that you have classified your first ‘waterbuck’ or ‘serval’. If you haven’t discovered your dream find yet there is still time and yes, there is a season 11 in the wings.

Whilst on the subject of talk I wanted to gently remind everyone of a few etiquette points.

#Hashtags, love ‘em or hate ‘em they are part of social media and they are not going away. On Snapshot Serengeti we use them for a specific reason and that is to help others to search for and find certain images.

If you have found a great image that you think others will want to see and you are certain of the species then go ahead and hashtag it, but, if you find an image that you are not sure of then please don’t hashtag it with your guess. You can still put the pictures up in talk for discussion and perhaps someone else will be along who is positive about the id and can then hashtag it. Basically, please use hashtags thoughtfully.

Which brings me to another point: if you can’t identify an image and you post it up for discussion, always give us your best guess. No one will laugh; seeing what other people make of the images when you are really stumped is what makes it fun. Many a time I have confidently shared a tricky image, almost certain that it is, for instance, a long sought-after rhino, only to have someone else’s eyes point out that if I look a bit closer it is actually a rock! Even our expert moderators get things wrong occasionally and are reluctant to make a confident call on certain images. Some of them are just so darn impossible to id. So just give it your best shot; it’s what everyone else does.

The main aim is to enjoy yourself, challenge yourself and use other people’s experience when yours fails you. The Snapshot family of classifiers and moderators is a dedicated and knowledgeable bunch and as I have said before, this project would not exist without you all. Keep up the great work one and all.

Citizen Science Conference



Meredith giving a project slam Photo: Avi Baruch

Meredith has been busy this past week attending the Citizen Science conference in St Paul, Minnesota. She reports back that it was a fantastically stimulating conference that confirms the high esteem in which citizen science is now held within the science community.


The yearly conference sees a diverse group of people, from researchers, educators and universities to the likes of NGOs and museums, get together to discuss the use and promotion of citizen science. Although we at Snapshot Serengeti have been aware of its great impact for some time, citizen science is now emerging and being recognised by many as a powerful tool in the advancement of research.


Those attending the four-day event collaborated by sharing their varied experience and ideas on a variety of topics. The collection and sharing of data and how to impact policy were discussed. There was a focus on how to use citizen science as an engaging teaching tool, how to bring citizen science to a wider audience and how to involve citizens more in research. Attendees also pooled their experience and expertise to discuss how the impact of citizen science on science could be measured and evaluated. If you want to find out more about the conference then visit this link.


We sometimes forget, when working away at classifying our stunning images on Snapshot Serengeti, that there is a lot of tech going on that enables us citizen scientists to be of use to the scientists. Meredith gave what’s known as a ‘project slam’, essentially a 5 minute presentation, about our work on Snapshot Serengeti and how it has paved the way for other camera-trap citizen science projects. A quick look around Zooniverse will show just how many there are now.


The massive amount of data produced over several seasons through Snapshot Serengeti has allowed the development of a robust, tried and tested methodology that smaller projects would have taken years longer to develop. Just contemplate the work that went into developing interfaces, protocols, pipelines and algorithms for taking millions of classifications from untrained volunteers and turning them into a dataset which has been verified to be >97% accurate.

It is awesome to see that something we all find so truly engaging can translate into such serious stuff in the field of science. I think we, the citizen scientists, and the Snapshot team can be rightly proud of our work on this brand new branch of science.

Topi versus Hartebeest

Here is another pair of antelope that are often muddled up on Snapshot Serengeti: topi and hartebeest. These two share a similar size and body shape and for those of you not familiar with them they can prove a bit tricky.

Topi and hartebeest belong to the same tribe, Alcelaphini, which also includes wildebeest. These antelope typically have an elongated face, long legs, short necks and stocky bodies. Although these antelope have reasonably large bodies their long legs mean they have retained the ability to run fast, a good adaptation for life on the open plains. It is believed that the long face developed in place of a long neck in order to reach the grasses they consume.

There are several species of both topi and hartebeest in Africa; two are found in the Serengeti. Coke’s hartebeest or kongoni (Alcelaphus cokii) are selective grazers with browse making up less than 4% of their diet. Serengeti topi (Damaliscus jimela) are 100% grazers.

In both species males are territorial but topi also form leks from which to display to passing females. Males holding territory close to the lek are more desirable to females. Dominant females will actively prevent subordinate females from mating with these males.


Topi                                                         Hartebeest

So side by side we can see that the topi is much darker coloured than the hartebeest with distinct sandy socks up to its knees and conspicuous black patches on the thighs and shoulders. In contrast the hartebeest has pale legs and underbelly with a darker upper body. The paleness forms a patch on the top of the thigh.


Topi                                                                        Hartebeest

From behind the contrast between leg colour and backside is very obvious with topi sporting dark legs with pale rump and back and hartebeest pale legs and rump with dark back.

Horn shape is also different. A topi’s horns sweep up and back whereas a hartebeest’s sweep out to the side before kinking back. The hartebeest’s horns also sit on a prominent bony ridge on the top of the head.

Hopefully this will help you tackle all the images waiting on season 10.

The Data Game


Team member checking a camera-trap


So we all know there are millions of images on Snapshot Serengeti and that it is us citizen scientists who do all the work classifying them. The scientists can then get on with the task of figuring out what’s going on out there in the animal kingdom, hopefully in time to save some of it from our own destructive nature.

But… have you spared much thought as to how the images get from over 200 individual camera-traps dotted around the Serengeti to the Zooniverse portal in a state for us to start our work?

Firstly the SD cards have to be collected from the cameras and, as this is an ongoing study, replaced with fresh SD cards. This is done about every 6 to 8 weeks. A camera trap’s batteries can actually go on performing far longer than this but as the field conditions can be tough you never know when a camera may malfunction. This time frame is a good balance between not ending up with months’ worth of gaps in the data and not spending every minute in the field changing cards.

The team are able to check about 6 to 10 sites a day, so with 225 cameras in play it takes around a month just to get to each site. Mostly the cameras are snapping away happily, but there are always some that have had encounters with elephants or hyena. Some of the most destructive critters, though, are actually bugs; they like to make nests in the camera boxes. As well as checking the cameras themselves, the team needs to clear the sites of any interfering foliage; we all know how frustrating a stray grass blade can be.
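As a quick back-of-the-envelope check on that “around a month” figure, here is a small Python sketch. It assumes one camera per site and that every day in the field is a working day, neither of which is stated above, so treat it as an illustration only.

```python
import math

CAMERAS = 225            # cameras in play, from the text above
SITES_PER_DAY = (6, 10)  # a slow day and a good day, from the text above


def days_for_full_circuit(cameras, sites_per_day):
    """Whole days needed to visit every site once (assumes one camera per site)."""
    return math.ceil(cameras / sites_per_day)


slowest = days_for_full_circuit(CAMERAS, min(SITES_PER_DAY))
fastest = days_for_full_circuit(CAMERAS, max(SITES_PER_DAY))
print(f"A full circuit takes roughly {fastest} to {slowest} days")
```

Under those assumptions a full circuit lands somewhere between a little over three weeks and a little over five, which is where “around a month” comes from.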



Snake camping out inside the camera-trap


Once all the data is on a hard drive, it has to wait for a visiting field researcher to hand carry it back to the University of Minnesota, USA. This means the data is only received every 6 months or so, but it is far safer than trusting the post. Once it is safely received, it is up to Meredith to start the painstaking work of extracting the date and time stamps. Glitches sometimes happen, and she has to fix them by figuring out when the camera went offline or when capture events got stuck together. She says it is much like detective work. The images are then assigned codes and stored on the Minnesota Supercomputing Institute (MSI) servers.

Once it is all cleaned up and backed up it is sent to the Zooniverse team who then format it for their system giving new identifiers to each image. Finally it is ready for release to all the thousands of classifiers out there to get to work on.

So as you can see it really is a team effort and a massive undertaking. It is no good collecting tonnes of data if there is no one with the time to do anything with it. I will take this opportunity again to thank you for all your help with the project. Keep up the good work.

Why we do it

Congratulations, your time classifying images on Snapshot Serengeti has resulted in yet another scientific paper. Over 70,000 of you have contributed to analysing the millions of images produced by the 225 Snapshot Serengeti cameras over the last few years. Thanks to all your effort the cameras are still rolling, creating one of the longest running camera-trap studies going. This data set is so important to scientists because of the size of the area it covers as well as the length of time it has been recording for. It allows them to ask many and varied questions about a naturally functioning healthy ecosystem and in today’s changing world it has never been so important to figure out what makes this planet tick.

The paper ‘The spatial distribution of African Savannah herbivores: species associations and habitat occupancy in a landscape context’ was published last year in Philosophical Transactions B. Visit here to read the article.

The Snapshot Serengeti team argue that if we want to predict the impact of future changes in, or losses of, large mammals, we need to have a quantitative understanding of a currently functioning ecosystem. It just so happens that the Snapshot data set is perfect for this. The Serengeti National Park is representative of the grass-dominated savannahs of East Africa, which are home to the world’s greatest diversity of ungulate (hoofed animal) grazers.

The team present a neat graphic that shows how the various elements interact to affect herbivore habitat occupancy.


Predators, herbivores, termites, fire, grasses and trees all play a role in determining where different herbivores choose to roam.

It seems that herbivore body size is also important to habitat selection. For example large herbivores survive by bulk grazing whereas small herbivores concentrate on grazing quality over quantity. Recently burned ground results in new vegetation growth. This growth is relatively high in nutrients compared with unburned patches and the same can be found on and around termite mounds. Small herbivores were found to occupy these areas but the sparse coverage does not favour large herbivores that must eat more volume.

The paper highlights the complex relationship between predators, herbivores, vegetation and disturbance and is well worth a read. Next time you are classifying images see if you agree. Do you see many herds of zebra or wildebeest on burnt areas or is it mostly Thomson’s gazelle? It’s another way to look at the images you classify.

Getting Good Data, Part II (of many)

Okay, so by now you’ve heard dozens and dozens of times that you guys produce really good data: your aggregated answers are 97% correct overall (see here and here and here). But we also know that not all images are equally easy. More specifically, not all species are equally easy. It’s a lot easier to identify a giraffe or zebra than it is to decide between an aardwolf and striped hyena.

The plot below shows the different error rates for each species. Note that error comes in two forms. You can have a “false negative” which means you miss a species given that it’s truly there. And then you can have a “false positive,” in which you report a species as being there when it really isn’t. Error is a proportion from 0 to 1.

Species specific error rates.


We calculated this by comparing the consensus data to the gold standard dataset that Margaret collated last year. Note that at the bottom of the chart there are a handful of species that don’t have any values for false negatives. That’s because, for statistical reasons, we could only calculate false negative error rates from completely randomly sampled images, and those species are so rare that they didn’t appear in the gold standard dataset. But for false positives, we could randomly sample images from any consensus classification – so I gathered a bunch of images that had been identified as these rare species and checked them to calculate false positive rates.
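For anyone who wants to see the bookkeeping, here is a minimal Python sketch of how the two error rates could be computed from images that have both an expert answer and a consensus answer. The function name, the dictionary layout and the toy image IDs are made up for illustration, and it assumes one species per image; it is not the project’s actual analysis code. The false negative rate for a species is the share of images truly showing it where the consensus said something else, and the false positive rate is taken here as the share of images labelled as that species where the expert disagreed.

```python
from collections import defaultdict


def species_error_rates(gold, consensus):
    """Per-species error rates; both arguments map image id -> species label."""
    truly_present = defaultdict(int)  # images where the expert says species X
    missed = defaultdict(int)         # ...of those, consensus said something else
    reported = defaultdict(int)       # images where the consensus says species X
    wrong = defaultdict(int)          # ...of those, the expert disagrees

    for image_id, true_species in gold.items():
        guess = consensus.get(image_id)
        if guess is None:
            continue  # no consensus answer for this image, so skip it
        truly_present[true_species] += 1
        reported[guess] += 1
        if guess != true_species:
            missed[true_species] += 1
            wrong[guess] += 1

    rates = {}
    for species in set(truly_present) | set(reported):
        n_true, n_reported = truly_present[species], reported[species]
        rates[species] = {
            # None means the rate cannot be estimated from this sample
            "false_negative": missed[species] / n_true if n_true else None,
            "false_positive": wrong[species] / n_reported if n_reported else None,
        }
    return rates


# Toy example with invented image IDs (not real Snapshot Serengeti data)
gold = {"img1": "zebra", "img2": "zebra", "img3": "aardwolf",
        "img4": "aardwolf", "img5": "wildebeest"}
consensus = {"img1": "zebra", "img2": "zebra", "img3": "aardwolf",
             "img4": "striped hyena", "img5": "wildebeest"}

rates = species_error_rates(gold, consensus)
for species, r in sorted(rates.items()):
    print(species, r)
```

In the toy data, striped hyena never appears in the expert set, so its false negative rate comes back as `None`. That is the same situation as the rare species with no false negative values at the bottom of the chart.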

Now, if a species has really low rates of false negatives and really low rates of false positives, then it’s one people are really good at identifying all the time. Note that in general, species have pretty low rates of both types of error. Furthermore, species with lower rates of false negatives have higher rates of false positives. There aren’t really any species with high rates of both types of error. Take rhinos, for example: folks often identify a rhino when it’s not actually there, but never miss a rhino if it is there.

Also: we see that rare species are just generally harder to identify correctly than common species. The plot below shows the same false negative and false positive error rates plotted against the total number of pictures for every species. Even though there is some noise, those lines reflect  significant trends: in general, the more pictures of an animal, the more often folks get it right!

Error rates vs. species commonness, measured by the total number of pictures of that species


This makes intuitive sense. It’s really hard to get a good “search image” for something you never see. But also folks are especially excited to see something rare. You can see this if you search the talk pages for “rhino” or “zorilla.” Both of these have high false positive rates, meaning people say it’s a rhino or zorilla when it’s really not. Thus, most of the images that show up tagged as these really rare creatures aren’t.

But that’s okay for the science. Because recall that we can assess how confident we are in an answer based on the evenness score, fraction support, and fraction blanks. Because such critters are so rare, we want to be really sure that those IDs are right — but because those animals are so rare, and because you have high levels of agreement on the vast majority of images, it makes it really easy to review any “uncertain” image that’s been ID’d as a rare species.

Pretty cool, huh?
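To make those three confidence measures a bit more concrete, here is a rough Python sketch for a single image. It rests on some assumptions that are not spelled out above: the evenness score is taken to be Pielou’s evenness index over the non-blank votes, fraction support is the winning species’ share of the non-blank votes, and fraction blanks is the share of all votes that said there was nothing there. The function name and the `"blank"` label are invented for the example.

```python
import math
from collections import Counter


def confidence_metrics(votes, blank_label="blank"):
    """Summarise agreement among volunteer votes for one image."""
    if not votes:
        raise ValueError("need at least one vote")

    species_votes = [v for v in votes if v != blank_label]
    fraction_blanks = (len(votes) - len(species_votes)) / len(votes)

    counts = Counter(species_votes)
    if not counts:  # everyone said the image was empty
        return {"consensus": blank_label, "fraction_support": None,
                "evenness": None, "fraction_blanks": fraction_blanks}

    consensus, top_count = counts.most_common(1)[0]
    n = len(species_votes)
    fraction_support = top_count / n

    if len(counts) == 1:
        evenness = 0.0  # unanimous: no disagreement at all
    else:
        # Shannon entropy of the votes, scaled so 1.0 means an even split
        shannon = -sum((c / n) * math.log(c / n) for c in counts.values())
        evenness = shannon / math.log(len(counts))

    return {"consensus": consensus, "fraction_support": fraction_support,
            "evenness": evenness, "fraction_blanks": fraction_blanks}


# Toy votes, invented for illustration
easy_metrics = confidence_metrics(["zebra"] * 10)
tricky_metrics = confidence_metrics(["rhino"] * 4 + ["buffalo"] * 4 + ["blank"] * 2)
print(easy_metrics)
print(tricky_metrics)
```

An image like the second one, with low support, high evenness and a few blanks, is the kind of “uncertain” rare-species ID that would be worth a second look by an expert.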

Season 8 Release!

And now, the moment you’ve all been waiting for …  Can I present to you:



I’m particularly proud of this, the first season that I’ve helped to bring all the way from the field to your computers. We’ve got a lot of data here, and I can’t wait for you guys to discover a whole host of exciting things in this new season.

This season is accompanied by IMPORTANT changes to our interface!

There are a few more bits of data we think we can pull out of the camera trap photos this time around, in addition to all the great information we already get. One thing we’re particularly interested in is the occurrence of fire. Now, fire is no fun for camera traps (because they tend to melt), but these wildfires are incredibly important to the cycle of ecosystem functioning in Serengeti. Burns refresh the soil and encourage new grass growth, which attracts herbivores and may in turn draw in the predators. We have added a fire checkbox for you to tick if things look hot. Now, because we’re looking for things other than just animals, we replaced your option to click on “nothing there” with “no animals visible”, just to avoid confusion.

Some of the more savvy creature-identifiers among you may have noticed that there are a few Serengeti animals that wander into our pictures that we didn’t have options for. For this new season, we’ve added six new animal choices: duiker, steenbok, cattle, bat, insect/spider, and vultures. Keep an eye out for the following:



This season runs all the way from September 2013 until July 2014, when I retrieved the images this summer during my first field season. Our field assistants, Norbert and Daniel, were invaluable (and inhumanly patient) in helping me learn to navigate the plains, ford dry river beds, and avoid, as much as possible, driving the truck into too many holes. Together, we set out new cameras, patched up some holes in our camera trap grid, and spent some amazing nights camped out in the bush.

Once I got the hang of the field, I spent my mornings running around to a subset of the cameras conducting a pilot playback experiment to see if I could artificially “elevate” the predation risk in an area by making it seem as though it were frequented by lions (I’m interested in the reactions of the lion’s prey, and to see whether they change their behaviors in these areas and how long it takes them to go back to normal). I’m more than a bit camera-shy (and put a lot of effort into carefully sneaking up around the cameras’ blind spots) but perhaps you’ll catch a rare glimpse of me waving my bullhorn around blaring lion roars…

Back in the lab, there’s been a multi-continental collaboration to get these data cleaned up and ready for identification. We’ve been making some changes to the way we store our data, and the restructuring, sorting, and preparing process has been possible only through the substantial efforts of Margaret, over here with me in the States, and Ali, all the way across the pond, running things from the Zooniverse itself!

But for now, our hard work on this season is over – it’s your turn! Dig in!

P.S. Our awesome developers have added some fancy code, so the site looks great even on small phone and tablet screens. Check it out!

More results!

As I’m writing up my dissertation (ahh!), I’ve been geeking out with graphs and statistics (and the beloved/hated stats program R). I thought I’d share a cool little tidbit.

Full disclosure: this is just a bit of an expansion on something I posted back in March about how well the camera traps reflect known densities. Basically, as camera traps become more popular, researchers are increasingly looking for simple analytical techniques that can allow them to rapidly process data. Using the raw number of photographs or animals counted is pretty straightforward, but is risky because not all animals are equally “detectable”: some animals behave in ways that make them more likely to be seen than other animals. There are a lot of more complex methods out there to deal with these detectability issues, and they work really well — but they are really complex and take a long time to work out. So there’s a fair amount of ongoing debate about whether or not raw capture rates should ever be used even for quick and dirty rapid assessments of an area.

Since the Serengeti has a lot of other long term monitoring, we were able to compare camera trap capture rates (# of photographs weighted by group size) to actual population sizes for 17 different herbivores. Now, it’s not perfect — the “known” population sizes reflect herbivore numbers in the whole park, and we only cover a small fraction of the park. But from the graph below, you’ll see we did pretty well.


Actual herbivore densities (as estimated from long-term monitoring) are given on the x-axis, and the # photographic captures from our camera survey are on the y-axis. Each species is in a different color (migratory animals are in gray-scale). Some of the species had multiple population estimates produced from different monitoring projects — those are represented by all the smaller dots, and connected by a line for each species. We took the average population estimate for each species (bigger dots).

We see a very strong positive relationship between our photos and actual population sizes: we get more photos for species that are more abundant. Which is good! Really good! The dashed line shows the relationship between our capture rates and actual densities for all species. We wanted to make sure, however, that this relationship wasn’t totally dependent on the huge influx of wildebeest and zebra and gazelle — so we ran the same analysis without them. The black line shows that relationship. It’s still there, it’s still strong, and it’s still statistically significant.
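If you’d like to try this kind of check yourself, here is a small Python sketch of the idea. It assumes that “weighted by group size” simply means adding up the number of animals counted in each photo, and it measures the relationship with a plain Pearson correlation on log-transformed values, which may not be the exact statistic behind the lines in the graph. The species names and numbers are placeholders invented for the example, not real Serengeti data.

```python
import math
from collections import defaultdict


def weighted_captures(photos):
    """Sum group sizes per species; photos is a list of (species, group_size)."""
    totals = defaultdict(int)
    for species, group_size in photos:
        totals[species] += group_size
    return dict(totals)


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def log_correlation(rows):
    """Correlate log10(population estimate) with log10(weighted captures)."""
    xs = [math.log10(population) for _, population, _, _ in rows]
    ys = [math.log10(captures) for _, _, captures, _ in rows]
    return pearson(xs, ys)


# (name, population estimate, weighted captures, migratory?) -- all made up
survey = [
    ("migrant_1", 1_000_000, 50_000, True),
    ("migrant_2", 200_000, 12_000, True),
    ("resident_1", 50_000, 2_000, False),
    ("resident_2", 10_000, 900, False),
    ("resident_3", 5_000, 150, False),
    ("resident_4", 1_000, 60, False),
]

r_all = log_correlation(survey)
r_residents = log_correlation([row for row in survey if not row[3]])
print(f"all species: r = {r_all:.2f}; residents only: r = {r_residents:.2f}")
```

Running the analysis twice, once with everything and once with the migratory species dropped, mirrors the dashed-line versus black-line comparison: if the correlation only held up because of a few hugely abundant migrants, the second number would collapse.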

Now, the relationship isn’t perfect. Some species fall above the line, and some below the line. For example, reedbuck and topi fall below the line – meaning that given how many topi really live in Serengeti, we should have gotten more pictures. This might be because topi mostly live in the northern and western parts of Serengeti, so we’re just capturing the edge of their range. And reedbuck? This might be a detectability issue — they tend to hide in thickets and so might not pass in front of cameras as often as animals that wander a little more actively.

Ultimately, however, we see that the cameras do a good overall job of catching more photos of more abundant species. Even though it’s not perfect, it seems that raw capture rates give us a pretty good quick look at a system.

Lions and cheetahs and dogs, oh my! (final installment)

I’ve written a handful of posts (here and here and here) about how lions are big and mean and nasty…and about how even though they are nasty enough to keep wild dog populations in check, they don’t seem to be suppressing cheetah numbers.

Well, now that research is officially out! It’s just been accepted by the Journal of Animal Ecology and is available here. Virginia Morrell over at ScienceNews did a nice summary of the story and its conservation implications here.

One dissertation chapter down, just two more to go!




What we’ve seen so far, Part IV

Last week I wrote about using really simple approaches to interpret camera trap data. Doing so makes the cameras a really powerful tool that virtually any research team around the world can use to quickly survey an ecosystem.

Existing monitoring projects in Serengeti give us a really rare opportunity to actually validate our results from Snapshot Serengeti: we can compare what we’re seeing in the cameras to what we see, say, from radio-tracking collared lions, or to the number of buffalo and elephants counted during routine flight surveys.

Ingela scanning for lions from the roof of the car.


One of the things we’ve been hoping to do with the cameras is to use them to understand where species are, and how those distributions change. As you know, I’ve struggled a bit with matching lion photographs to known lion ranging patterns. Lions like shade, and because of that, they are drawn to camera traps on lone, shady trees on the plains from miles and miles away.

But I’ve finally been able to compare camera trap captures to known distributions for other animals. Well, one other animal: giraffes. From 2008-2010, another UMN graduate student, Megan Strauss, studied Serengeti giraffes and recorded where they were. By comparing her data with camera trap data, we can see that the cameras do okay.

The graph below compares camera trap captures to known densities of giraffes and lions. Each circle represents a camera trap; the bigger the circle, the more photos of giraffes (top row) or lions (bottom row). The background colors reflect known relative densities measured from long-term monitoring: green means more giraffes or lions; tan/white means fewer. For giraffes, on the whole, we get more giraffe photos in places that have more giraffes. That’s a good sign. The scatterplot visualizes the map in a different way, showing the number of photos on the y-axis vs. the known relative densities on the x-axis.



What we see is that cameras work okay for giraffes, but not so much for lions. Again, I suspect that this has a lot to do with the fact that lions are incredibly heat stressed, and actively seek out shade (which they then sleep in for 20 hours!). But lions are pretty unique in their extreme need for shade, so cameras probably work better for most other species. We see the cameras working better for giraffes, which is a good sign.

We’ve got plans to explore this further. In fact, Season 7 will overlap with a wildebeest study that put GPS collars on a whole bunch of migratory wildebeest. For the first time, we’ll be able to compare really fine scale data on the wildebeest movements to the camera trap photos, and we can test even more precisely just how well the cameras work for tracking large-scale animal movements.  Exciting!