Data from Seasons 1, 2, and 3
Last week Michael Parrish sent me all your classifications for Seasons 1, 2, and 3. At 4,374,368 classifications, it’s going to take me a while to fully analyze them. Nevertheless, I’ve taken a first look through and am happy to give you some feedback.
Snapshot Serengeti volunteers classified 512,585 capture events. (We call a set of images a “capture event,” regardless of whether it consists of 1 or 3 images.) Of these capture events, 30% were from Season 1, 40% from Season 2, and 30% from Season 3. Based on your classifications, 72% of these capture events were “nothing here,” and, not surprisingly, Season 1 had the highest share of “nothing here” images. Season 1 was when Ali was still trying to figure out how to animal-proof the cameras, and plenty of cameras got knocked off trees. I still have to double-check accuracy for these “nothing here” images, but suffice it to say that you guys classified a lot of blowing grass. Thanks for your perseverance!
And what about the Snapshot Serengeti community itself? I want to preface this by saying that in the data I get, all volunteers have been anonymized. That is, each user name has been replaced by a gibberish string of letters and numbers, so I don’t know who is who. I can tell you that we have 14,352 volunteers who created a user name. They provided us with 84% of the classifications; the rest were done by people who didn’t create – or hadn’t yet created – user names.
The median number of capture events classified by each logged-in volunteer was 63. I find that pretty awesome. In case you need a refresher on what the median is: imagine we put all 14,352 Snapshot Serengeti volunteers in a line according to how many capture events they had classified. Those who had classified just 1 capture event would be on the far left end, and those who had classified thousands of capture events would be on the far right end. Then we would find the volunteer in the very middle of this line; she would be the 7,176th volunteer from the left (7,176 is half of 14,352). And we would ask how many capture events she had classified. The answer would be 63; that is, half of all volunteers (on the left) classified fewer than 63 capture events and half (on the right) classified more than 63. Sixty-three is no small number; you’ve got to be sitting there a while to do that many, and yet over 7,000 different people did so. Wow.
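For anyone who likes to see the lineup idea in code, here is a minimal sketch. The counts below are made up for illustration; they are not the real per-volunteer numbers.

```python
# Toy illustration of the "lineup" median (made-up counts, not real data).
counts = [1, 250, 63, 8431, 12, 5, 120]  # capture events per volunteer

lineup = sorted(counts)            # volunteers lined up left to right
middle = lineup[len(lineup) // 2]  # the volunteer in the very middle
print(middle)                      # -> 63 for this toy list
```

With an odd number of volunteers the middle element is the median exactly; Python’s `statistics.median` handles the even case by averaging the two central values.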
The most capture events classified by a single volunteer? 8,431. That’s just for Seasons 1, 2, and 3, so I’m betting that number is higher now that Season 4 is underway. The 5,000 Club is pretty exclusive: 23 of you classified more than 5,000 capture events in Seasons 1 through 3. The 1,000 Club has 829 members. And an astounding 5,777 people classified more than 100 capture events.
I continue to be amazed and humbled by your dedication to this project. Thank you.
Thank you so much, Margaret, for having the creativity to think this up. I feel privileged to have such a window into the world of wild creatures that I wouldn’t otherwise have had. As much as I enjoy the prettied up National Geographic wildlife photos, I love seeing the unedited versions even more. It’s like the difference between seeing someone all dressed up for a photo shoot vs. hanging out in the house in their PJs…
This has been one of the most fun things I have ever done on the internet! Thanks to _you_ Margaret, and Ali, and everyone else who lets us get involved!
What software do you use for storing and analyzing this data?
What about this project has turned out differently than you expected?
The data is currently a simple comma-delimited text file. I’ll be importing it into a SQL database. I did the quick analysis for this blog post using JMP (www.jmp.com), but it took a *long* time to crunch through all the data. I’m planning on writing Perl scripts to do more complex analysis.
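To give a flavor of what a first-pass tally over that comma-delimited file might look like, here is a short sketch in Python. The column names (`capture_event_id`, `season`, `species`) and the sample rows are my assumptions for illustration only, not the actual export format.

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt of the comma-delimited export; the column names
# and rows here are assumptions, not the project's real file format.
raw = """capture_event_id,season,species
ASG001,1,nothing here
ASG002,1,wildebeest
ASG003,2,nothing here
ASG004,2,nothing here
ASG005,3,zebra
"""

totals = Counter()        # capture events seen per season
nothing_here = Counter()  # "nothing here" capture events per season

for row in csv.DictReader(io.StringIO(raw)):
    totals[row["season"]] += 1
    if row["species"] == "nothing here":
        nothing_here[row["season"]] += 1

# Share of "nothing here" capture events per season
for season in sorted(totals):
    print(season, f"{nothing_here[season] / totals[season]:.0%}")
```

Streaming the file row by row like this keeps memory use flat, which matters when the real file holds millions of classifications; the same tally translates naturally into a `GROUP BY season` query once the data lives in SQL.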
I was blown away by how fast all the images from Seasons 1 through 3 were classified; I had thought it would take weeks to months. I am amazed at how many images of nothing-but-grass people are willing to put up with in order to classify animals. I’m sure there will be more surprises as I dig deeper into the data.
> how many images of nothing-but-grass people are willing to put up with in order to classify animals.
I’ll bet there’s a sociology dissertation waiting in that statement too – something about how we’re getting tired of instant gratification and artistic perfection. If I just wanted nice pictures of elephants and lions, Google Images will give me all I want in seconds. Instead I’ll spend hours looking at grass and indistinct “bird (other) – moving” because the next image just might be the very tip of an impala’s horn, or enough spots to figure it’s either a cheetah or a serval but the tell-tale ears are maddeningly just outside the frame.
This is both a treat and an opportunity for us. Yes, there is a lot of grass and maybe too many wildebeests sometimes, but then there is the lion sitting there, or a zebra rolling in the dirt. It brings me back to my initial intro to African wildlife with SA friends, and then Kruger and Chobe (where we were chased by a hippo). The opportunity lies in helping you learn about the animals and perhaps find help for that poor lion with the raw neck, or at least record what has happened.
You may have created no rest for the weary for yourselves, feeding our curiosity like this.
Thanks for the update, and thank you for making it possible for us to share in your research.
Is the (anonymized) user-generated data available someplace on Zooniverse (or elsewhere)? It’d be nice to see all the data we generated.
We’re working on putting together some visual tools so that everyone can look at the generated data in a map context.