I hope you’ve been having fun with the new Season 5 images. I have. It’s been about a week since we went live with Season 5, and we’re making good progress. It took under two weeks to go through the first three seasons in December. (We had some media attention then and lots of people checking out the site.) It took about three weeks to finish Season 4 in January. According to my super science-y image copy-and-paste method, it may take us about two months to do Season 5:
And that’s fine. But I was curious about who’s working on Season 5. The Talk discussion boards are particularly quiet, with almost no newbie questions. So is everyone working on Season 5 a returnee? Or do we have new folks on board?
I looked at the user data from a data dump done on Sunday, so it covers the first five or so days of Season 5. In total, 2,000 volunteers had contributed 280,000 classifications by Sunday! I was actually quite amazed to see that 6% of the classifications are being done by folks not logged in. Is that because they’re new people trying out the site — or because there are some folks who like to classify without logging in? I can’t tell.
But I can compare Season 5 to Season 4. We had 8,300 logged-in volunteers working on Season 4, and 9% of all the classifications were done by not-logged-in folks. That suggests we have fewer newcomers so far for Season 5. But then we get to an intriguing statistic: of those 2,000 volunteers working on Season 5 in its first five days, 33% did not work on Season 4 at all! And those apparently new folks have contributed 50% of the (logged-in) classifications!
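For the curious, here’s a minimal sketch of how that newcomer fraction can be computed from two sets of user IDs. The IDs and numbers below are toy data invented for illustration; the real figure was the 33% above.

```python
# A toy sketch of the comparison: which Season 5 classifiers never
# worked on Season 4? The user IDs below are invented; the real
# data dump gave the 33% figure quoted above.
season4_users = {"amy", "ben", "cara", "dev", "eli", "fay"}
season5_users = {"amy", "ben", "cara", "gus", "hal", "ivy"}

newcomers = season5_users - season4_users
fraction_new = len(newcomers) / len(season5_users)
print(f"{fraction_new:.0%} of Season 5 classifiers skipped Season 4")
# -> "50% of Season 5 classifiers skipped Season 4" with this toy data
```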
So what’s going on? Maybe we’re getting these new volunteers from other Zooniverse projects that have launched since January. Maybe they’re finding us in other ways. (Have you seen that the site can be displayed in Finnish in addition to Polish now?) But in any case, welcome everyone and I hope you spot your favorite animal.
Me, I found this super cute baby elephant just the other day:
On Wednesday, I wrote about how well the simple algorithm I came up with does against the experts. The algorithm looks for species that have more than 50% of the votes in a given capture (i.e. species that have a majority). Commenter Tor suggested that I try looking at which species have the most votes, regardless of whether they cross the 50% mark (i.e. a plurality). It’s a great idea, and easy to implement, because any species that has more than 50% of the vote ALSO has the plurality. That means all I have to do is look at the handful of captures that the majority algorithm had no answer for.
You can see why it might be a good idea in this example. Say that for a particular capture, you had these votes:
You’d have 21 votes total, but the leading candidate, impala, would be just shy of the 11 needed to have a majority. It really does seem like impala is the likely candidate here, but my majority algorithm would come up with “no answer” for this capture.
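Here’s a minimal sketch of the plurality rule in Python, under my own assumptions, including a check for ties (which, as you’ll see below, do happen). Impala’s 10 votes match the example above; how the other 11 votes split between species is my invention.

```python
from collections import Counter

# A sketch of the plurality rule: take the single most-voted
# species, and report a tie if two species share the top count.
def plurality_species(votes: Counter):
    (top, best), *rest = votes.most_common()
    if rest and rest[0][1] == best:
        return None  # two species tied for the most votes
    return top

# Impala's 10 votes come from the example above; the 6/5 split of
# the remaining 11 votes is invented for illustration.
votes = Counter({"impala": 10, "grants gazelle": 6, "thomsons gazelle": 5})
print(plurality_species(votes))  # 'impala': 10 of 21, short of a majority
```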
So I tried out Tor’s plurality algorithm. The good news is that 57% of those “no answers” got the correct answer with the plurality algorithm. So that brings our correct percentage from 95.8% to 96.6%. Not bad! Here’s how that other 3.4% shakes out:
So now we have a few more errors. (About a quarter of the “no answers” were errors when the plurality algorithm was applied.) And we’ve got a new category called “Ties”. When you look for a plurality that isn’t over 50%, there can be ties. And there were. Five of them. And in every case the right answer was one of the two that tied.
And now, because it’s Friday, a few images I’ve stumbled upon so far in Season 5. What will you find?
Recently, I’ve been analyzing how good our simple algorithm is for turning volunteer classifications into authoritative species identifications. I’ve written about this algorithm before. Basically, it counts up how many “votes” each species got for every capture event (set of images). Then, species that get more than 50% of the votes are considered the “right” species.
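To make that concrete, here’s a minimal sketch of the vote-counting idea, reconstructed from the description above (my reconstruction, not the project’s actual code). I’m assuming each classification is the set of species one volunteer reported for a capture.

```python
from collections import Counter

# Count species "votes" across all classifications of one capture
# event and keep every species named by more than half of them.
def majority_species(classifications):
    votes = Counter()
    for species_set in classifications:
        votes.update(species_set)
    n = len(classifications)
    winners = sorted(s for s, v in votes.items() if v > n / 2)
    return winners or None  # None = "no answer"

# Example: 10 classifications of one capture; 7 saw a zebra,
# 6 of those also saw a wildebeest, and 3 saw only a gazelle.
caps = [{"zebra", "wildebeest"}] * 6 + [{"zebra"}] + [{"gazelle"}] * 3
print(majority_species(caps))  # ['wildebeest', 'zebra']
```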
To test how well this algorithm fares against expert classifiers (i.e. people who we know to be very good at correctly identifying animals), I asked a handful of experts to classify several thousand randomly selected captures from Season 4. I stopped everyone as soon as I knew 4,000 captures had been looked at, and we ended up with 4,149 captures. I asked the experts to note any captures that they thought were particularly tricky, and I sent these on to Ali for a final classification.
Then I ran the simple algorithm on those same 4,149 captures and compared the experts’ species identifications with the algorithm’s identifications. Here’s what I found:
For a whopping 95.8% of the captures, the simple algorithm (thanks to the great classifying of all the volunteers!) agrees with the experts. But, I wondered, what’s going on with that other 4.2%? So I had a look:
Of the captures that didn’t agree, about 30% were cases where the algorithm came up with no answer but the experts did. This is “No answer” in the pie chart. The algorithm fails to come up with an answer when the classifications vary so much that no single species (or combination, if there are multiple species in a capture) takes more than 50% of the vote. These are probably rather difficult images, though I haven’t looked at them yet.
Another small group, about 15% of the captures, was marked as “impossible” by the experts. (This was just 24 captures out of the 4,149.) And five captures were both marked as “impossible” and the algorithm failed to come up with an answer; so in some strange way, we might consider these five captures to be in agreement.
Just over a quarter of the captures didn’t agree because either the experts or the algorithm saw an extra species in a capture. This is labeled as “Subset” in the pie chart. Most of the extra animals were Other Birds, zebras in primarily wildebeest captures, or wildebeest in primarily zebra captures. The extra species really is there; it was just missed by the other party. For most of these, it’s the experts who see the extra species.
Then we have our awesome, but difficulty-causing duiker. There was no way for the algorithm to match the experts because we didn’t have “duiker” on the list of animals that volunteers could choose from. I’ve labeled this duiker as “New animal” on the pie chart.
Then the rest of the captures — just over a quarter of them — were what I’d call real errors. Grant’s gazelles mistaken for Tommies. Buffalo mistaken for wildebeest. Aardwolves mistaken for striped hyenas. That sort of thing. They account for just 1.1% of all the 4,149 captures.
I’ve given the above Non-agreement pie chart some hideous colors. The regions in purple are what scientists call Type II errors, or “false negatives.” That is, the algorithm is failing to identify a species that we know is there — either because it comes up with no answer, or because it misses extra species in a capture. I’m not too terribly worried about these Type II errors. The “Subset” ones happen mainly with very common animals (like zebra or wildebeest) or animals that we’re not directly studying (like Other Birds), so they won’t affect our analyses. The “No answers” may mean we miss some rare species, but if we’re analyzing common species, it won’t be a problem to be missing a small fraction of them.
The regions in orange are a little more concerning; these are the Type I errors, or “false positives.” These are images that should be discarded from analysis because there is no useful information in them for the research we want to do. But our algorithm identifies a species in the images anyway. These may be some of the hardest captures to deal with as we work on our algorithm.
And the red-colored errors are obviously a concern, too. The next step is to incorporate some smarts into our simple algorithm. Information about camera location, time of day, and identification of species in captures immediately before or following a capture can give us additional information to try to get that 4.2% non-agreement even smaller.
In short, delay.
In long, we’ve processed all the images and are uploading them onto the Zooniverse servers. However, it’s taking a long time. A really long time. Since Season 4, the Minnesota Supercomputer Institute (MSI) has switched over to a new system, and upload speeds from this new system are painfully slow. We’ve uploaded over 25% of the images, but that has taken a couple of days of non-stop uploading. So the best estimate is that they’ll all be uploaded by mid to late next week. We’re trying to coordinate with the staff at MSI to see if they can increase upload speeds for us, but no guarantees.
(Man, I wish we had some images of turtles or snails or sloths or something from Serengeti… Wait! I know what’s slow — stationary, actually.)
Meanwhile, you can read a guest blog post that I wrote over at Dynamic Ecology. Dynamic Ecology is read by ecologists, so my blog post introduces the concept of citizen science (and Snapshot Serengeti, of course) to professional ecologists who may not be very familiar with it. One question that comes up in the comments is: can you do citizen science if you don’t have cool, awesome animals? Like, what if you have flies or worms or plankton instead? I think the answer is yes. But feel free to give your perspectives in the comments there, too.
I’m working on an analysis that compares the classifications of volunteers at Snapshot Serengeti with the classifications of experts for several thousand images from Season 4. This analysis will do two things. First, it will give us an idea of how good (or bad) our simple vote-counting method is for figuring out species in pictures. Second, it will allow us to see if more complicated systems for combining the volunteer data work any better. (Hopefully I’ll have something interesting to say about it next week.)
Right now I’m curating the expert classifications. I’ve allowed the experts to classify an image as “impossible,” which, I know, is totally unfair, since Snapshot Serengeti volunteers don’t get that option. But we all recognize that for some images, it really isn’t possible to figure out what the species is — either because it’s too close or too far or too off the side of the image or too blurry or …. The goal is that whatever our combining method is, it should be able to figure out “impossible” images by combining the non-“impossible” classifications of volunteers. We’ll see if we can do it.
Another challenge that I’m just running into is that our data set of several thousand images contains a duiker. A what? A common duiker, also known as a bush duiker:
You’ve probably noticed that “duiker” is not on the list of animals we provide. While the common duiker is widespread, it’s not commonly seen in the Serengeti, being small and active mainly at night. So we forgot to include it on the list. (Sorry about that.)
The result is that it’s technically impossible for volunteers to properly classify this image. Which means that it’s unlikely that we’ll be able to come up with the correct species identification when we combine volunteer classifications. (Interested in what the votes were for this image? 10 reedbuck, 6 dik dik, and 1 each of bushbuck, wildebeest(!), and impala.)
The duiker is not the only animal that’s popped up unexpectedly since we put together the animal list and launched the site. I never expected we’d catch a bat on film:
Our friends over at Bat Detective tell us that the glare on the face makes it impossible to truly identify, but they did confirm that it’s a large, insect-eating bat. Anyway, how to classify it? It’s not a bird. It’s not a rodent. And we didn’t allow for an “other” category.
I also didn’t think we’d see insects or spiders.
Moths fly by, ticks appear on mammal bodies, spiders spin webs in front of the camera and even ants have been seen walking on nearby branches. Again, how should they be classified?
And here’s one more uncommon antelope that we’ve seen:
It’s a steenbok, again not commonly seen in Serengeti. And so we forgot to put it on the list. (Sorry.)
Luckily, all these animals we missed from the list are rare enough in our data that when we analyze thousands of images, the small error in species identification won’t matter much. But it’s good to know that these rarely seen animals are there. When Season 5 comes out (soon!), if you run into anything you think isn’t on our list, please comment in Talk with a hash-tag, so we can make a note of these rarities. Thanks!
So there’s good news and there’s bad news. Which would you like first? Good news?
The good news is that the pictures from Season 5 are being processed at the Minnesota Supercomputer Institute right this minute. There are about 900,000 images total, so it will take a few days to process them all. (What are we doing? We’re resizing them, extracting the place and time they were taken, and grouping those that need it into groups of 3; a rough sketch of that step is below.) Then we’ll need to upload them to Zooniverse’s servers. That might take another day or so. If everything goes without a hitch (fingers crossed), we’ll be ready to unleash Season 5 by the end of next week! (So for those of you who wanted some warning, this is your warning. Clear your schedules. Get your work done early. Set up an ‘away’ message on your email…)
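Here is a rough sketch of that kind of preprocessing step. The paths, sizes, and grouping rule are illustrative assumptions on my part, not the actual MSI pipeline.

```python
from pathlib import Path
from PIL import Image  # pip install Pillow

# Resize an image, read its timestamp, and group frames into
# threes. (Camera locations likely come from site records rather
# than the image files, so they're not shown here.)
def preprocess(path, max_width=1024):
    img = Image.open(path)
    taken_at = img.getexif().get(306)  # EXIF tag 306 = DateTime
    if img.width > max_width:  # scale down, keeping aspect ratio
        h = round(img.height * max_width / img.width)
        img = img.resize((max_width, h))
    return img, taken_at

def group_into_captures(paths, n=3):
    """Group consecutive frames into capture events of n images."""
    paths = sorted(paths)
    return [paths[i:i + n] for i in range(0, len(paths), n)]

captures = group_into_captures(Path("season5").glob("*.JPG"))
```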
The other news is bad, I’m afraid. We just found out that the grant proposal we wrote to the National Science Foundation back in January got turned down. Our grant would have funded Snapshot Serengeti and the Serengeti Lion Project for another five years, and included money for scientists to continue to analyze all the data you’ve been generating by identifying animals in the Snapshot Serengeti images.
Our proposal was independently reviewed by three scientists and then discussed by a group of scientists who had our proposal and the three reviews to look at. Our three reviews varied. One person thought that our proposal was the most exciting project s/he had read yet this year. But the others were a bit concerned about exactly how we would analyze the data. This proposal was a “pre-proposal,” meaning that we only had a few pages to explain what we wanted to do, how we would do it, why it’s important, and the broader impact we would have. I guess we didn’t manage to get in enough of the “how” for these reviewers.
We were all taken by surprise by the rejection. The Lion Research Center has been reliably funded by the National Science Foundation for decades. But things are changing. Firstly, this “pre-proposal” system is new; it’s only in its second year, and everyone — both proposal writers and proposal reviewers — is still figuring out what exactly should go in the new shorter pre-proposals. And secondly, the Sequester is still in place, so the National Science Foundation has less money to give out this coming year than usual.
In any case, we’re now regrouping to come up with a new funding plan. We’ll be able to apply again to the National Science Foundation in January 2014 to fund camera trapping starting in 2015. And we’ve got several papers that we plan to write in the next six months using Snapshot Serengeti data that we’ll be able to point to in order to show reviewers that we can properly analyze the data. Meanwhile, we’re going to try to keep the cameras rolling by looking for other funding sources to cover our year-long funding gap. Suggestions welcome.
This past spring, four seniors in the University of Minnesota’s Department of Fisheries, Wildlife, and Conservation Biology took a class called “Analysis of Populations,” taught by Professor Todd Arnold. Layne Warner, Samantha Helle, Rachel Leuthard, and Jessica Bass decided to use Snapshot Serengeti data for their major project in the course.
Their main question was whether the Snapshot Serengeti images are giving us good information about the number of animals in each picture. If you’ve been reading the blog for a while, you know that I’ve been exploring whether it’s possible to correctly identify the species in each picture, but I haven’t yet looked at how well we do with the actual number of animals. So I’m really excited about their project and their results.
Since the semester is winding up, I thought we’d try something that some other Zooniverse projects have done: a video chat*. So here I am talking with Layne, Samantha, and Rachel (Jessica couldn’t make it) about their project. And Ali just got back to Minnesota from Serengeti, so she joined in, too.
Here are examples of the four types of covariates (i.e. potential problems) that the team looked at; a rough sketch of one way to test them follows the list:
Herd: animals are hard to count because they are in groups
Distance: animals are hard to count because they are very close to or very far from the camera
Period: animals are hard to count because of the time of day
Vegetation: animals are hard to count because of surrounding vegetation
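For the statistically inclined, here’s a hedged sketch of one way covariates like these could be examined: regress whether a volunteer count was correct on the four factors. The data are simulated and the variable coding is my assumption; this is not the team’s actual analysis.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Simulate captures where each difficulty factor lowers the chance
# that the volunteer count was correct, then fit a logistic model.
rng = np.random.default_rng(0)
n = 200
herd, dist, night, veg = (rng.integers(0, 2, n) for _ in range(4))
p_correct = 0.9 - 0.15 * herd - 0.1 * dist - 0.2 * night - 0.1 * veg
correct = rng.random(n) < p_correct

X = np.column_stack([herd, dist, night, veg])
model = LogisticRegression().fit(X, correct)
for name, coef in zip(["herd", "distance", "period", "vegetation"],
                      model.coef_[0]):
    print(f"{name:>10}: {coef:+.2f}")  # negative = harder to count
```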
* This was our first foray into video, so please excuse the wobbly camera and audio problems. We’ll try to do better next time…
Today’s guest blogger is Lucy Hughes. Lucy lived and worked on a private nature reserve in South Africa for four years, carrying out field research that included a camera-trap study into the reserve’s leopard population and twice monthly bird surveys for Cape Town University’s Birds in Reserves Project (BIRP).
Arrhhh, that really hurts! A three-inch thorn had just penetrated my admittedly inadequate footwear and was stuck deep in the sole of my foot. Thorns are a serious hazard of camera-trap placement in the South African bushveld, where plants with thorns or hooks seem to make up about 90% of species.
My colleague Michelle ran back to the Landy to get a first aid kit whilst I set about extracting the thorn; there seemed to be an awful lot of blood. I watched the path eagerly for Michelle’s return, but as she got near she seemed to slow down, and as she opened her mouth to speak I knew exactly what she was going to say. “Luce, if it’s not too painful, what about spreading your blood around a bit?”
Callous as it may seem, it wasn’t a bad idea. We had been having trouble capturing clear night shots of leopards. They always seem to be in a hurry, and the shots we had were often blurry, making it impossible to ID the individuals. We needed a way to get the leopards to pause for a second or two in shot of the camera trap.
We had been advised that scent was the answer and had been experimenting with various different ones, and now it seemed human blood was to be the next test. I dutifully hobbled out in front of the camera and scraped my bleeding foot around on a nice flat rock Michelle had procured, wondering about the wisdom of using human blood as bait for a predator. My slight discomfort was all in the name of science.
In the end it didn’t work. It rained a couple of nights later and my efforts were washed away. We never did find the perfect scent. We were told that tinned sardines worked wonders, as well as catnip and perfume. We tried them all. It seems our cats were immune to these. The only thing that stopped them in their tracks was the scent of other leopards. I did learn, however, that the scent of tinned sardines was particularly interesting to giraffes, of all animals. My method was to bury a plastic cup up to its rim in sand and put a blob of sardines in the cup. Now you would have thought that giraffes would have walked on by, but as the picture below testifies, giraffes just have to take a closer look. You always learn something new!
This past week I’ve been reworking a paper about a study with Anna Mosser and Craig. The study asks the question: How did lions come to live in groups? It doesn’t seem like group-living in lions would be something you would spend much time thinking about – until you realize that lions are the only cat that regularly lives in groups. What’s special about lions?
Craig’s work over the past decades has shown that seemingly intuitive ideas about why lions form groups are wrong. Lions don’t form groups in order to hunt more efficiently. Lions don’t form groups to cooperatively nurse their young. Lions don’t form groups to protect young against aggressive outsiders. Instead, it appears that the primary purpose of lion groups is to defend territories against other groups of lions.
So territorial defense appears to be the key to group living in lions. But is territorial defense the only thing that matters? That’s what we set out to investigate. We created a computer model that simulates a bunch of lions living on a landscape. The model is a simplification of what happens in real life, but it contains some essential aspects of lion living.
First, we have complex landscapes. Previous research suggests that group territoriality is more likely in complex landscapes because there are highly desirable areas that are worth defending. If you had a landscape where everything was more or less the same, then you wouldn’t need to fight your neighbor over some small patch of it; you could just wander off and find your own patch that would be more-or-less the same quality as your neighbor’s.
Second, we have various behaviors that we can turn on or off in our simulated lions. For example, we can tell them that they can live together in a territory, but they can’t cooperate to defend it. We can also tell them whether or not they can live in a territory with their parents when they grow up. And we can tell them whether they’re allowed to make their territory bigger if they recruit more lions into their group.
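To give a flavor of what turning behaviors on or off might look like, here’s a toy sketch of such a setup. The names, values, and toggles are my illustrative assumptions, not the actual model from the paper.

```python
import random

def make_landscape(size=50, n_patches=5, seed=0):
    """A grid of habitat quality: a uniform background dotted with
    a few high-value patches worth fighting over."""
    rng = random.Random(seed)
    grid = [[1.0] * size for _ in range(size)]
    for _ in range(n_patches):
        x, y = rng.randrange(size), rng.randrange(size)
        grid[y][x] = 10.0  # high-value real estate
    return grid

class LionBehaviors:
    """Switches we can turn on or off for the simulated lions."""
    def __init__(self, defend_cooperatively=True,
                 inherit_territory=True, expand_with_group=True):
        self.defend_cooperatively = defend_cooperatively
        self.inherit_territory = inherit_territory
        self.expand_with_group = expand_with_group

# e.g. lions that share a territory but can't defend it together:
scenario = LionBehaviors(defend_cooperatively=False)
landscape = make_landscape()
```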
By manipulating the types of landscapes and the various behaviors, we explored how often our simulated lions formed groups. Our results suggest that while territorial defense is important, it’s also important to have complex landscapes with high-value real estate. If the landscape isn’t very complex, then it’s easy enough to find an area to set up a territory without fighting for it. And if the landscape is complex, but doesn’t have any areas with high value, then there’s nothing worth fighting for or defending. It’s also important that lions be able to pass their valuable territories on to their offspring, for without inheritance, the benefits of all that fighting and defending are gone in a generation.
Lions evolved on the savannas of East Africa, where the landscape is complex with patchy areas of high value (near where rivers come together, for example). Humans did too. It’s possible that the same sorts of savanna landscapes that shaped group living and territorial defense for lions did so for people, as well.
At the Zooniverse workshop last week, Philip Brohan (of Old Weather fame) showed me how to produce a cool graphic of volunteer participation. So I put together a couple graphics – one for Season 1 and one for Season 4 – to see if patterns of who does what changed over time.
In these graphics, each square represents one volunteer. And the size of the square shows how many classifications that volunteer did.
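If you’d like to try making one of these yourself, here’s a rough sketch using matplotlib and the squarify package. (The package choice is an assumption on my part, not necessarily what Philip used, and all the counts are invented.)

```python
import matplotlib.pyplot as plt
import squarify  # pip install squarify

# One square per group of volunteers, sized by classification count.
counts = {
    "not logged in": 40000,
    "under 50 classifications each": 25000,
    "volunteer A": 3000,
    "volunteer B": 1200,
    "volunteer C": 800,
}
squarify.plot(sizes=list(counts.values()),
              label=list(counts.keys()), alpha=0.8)
plt.axis("off")
plt.show()
```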
Here’s Season 1:
The big blue square is all the volunteers who didn’t create a user account; since I can’t track them without an ID, they all get lumped together. Probably most of the people in this blue square did just a few classifications at most. All together, just over 15,000 people who created an account are represented here. Those that did fewer than 50 classifications each are lumped together under the big blue square. You can see that the majority of the work was done by people who did between 50 and 1,000 classifications each. There were another 100 or so volunteers who did over 1,000 classifications in Season 1.
Now here’s Season 4:
This time, it’s the big purple square that represents all the volunteers who didn’t create an account; the square is smaller than in Season 1, which isn’t very surprising. Folks who don’t log in are generally looking at the site for the first time, and we expected more of them when Snapshot Serengeti first started than later on. All together, about 7,500 people who created an account worked on Season 4 – about half the number of Season 1. The square below the purple square shows all the volunteers who did fewer than 50 classifications. You can see that the majority of the work is being done by our thousands of dedicated fans; about half of all people who worked on Season 4 did more than 50 classifications, and these volunteers accounted for the vast majority of all classifications.
PS. The Zooniverse is launching a new project today: SpaceWarps. Go check it out, while we work on getting Season 5 ready for you.