As we prepare to launch Season 7 (yes! it’s coming soon! stay tuned!), I thought I’d share with you some things we’ve seen in seasons 1-6.
Snapshot Serengeti is over a year old now, but the camera survey itself has been going on since 2010; you guys have helped us process three years of pictures to date!
First, of the >1.2 million capture events you’ve looked through, about two-thirds were empty. That’s a lot of pictures of grass!
But about 330,000 photos are of the wildlife we’re trying to study. A *lot* of those photos are of wildebeest. From all the seasons so far, wildebeest made up just over 100,000 photos! That’s nearly a third of all non-empty images altogether.
We also get a lot of zebra and gazelle – both of which hang out with the wildebeest as they migrate across the study area. We also see a lot of buffalo, hartebeest, and warthog — all of which lions love to eat.
We also get a surprising number of photos of the large carnivores. Nearly 5,000 hyena photos! And over 4,000 lion photos! (Granted, for lions, many of those photos are of them just lyin’ around.)
Curious what else? Check out the full breakdown below…
As Meredith mentioned last week, she, Craig, and I are counting down the days until we head out to sunny California for an academic conference. I am really looking forward to above-zero temperatures. I am rather less enthused about the prospect of presenting a poster. Yes, it is good networking. Yes, I get to personally advertise results from a study that are currently in review at a journal (and hopefully will be published “soon”). Yes, I get to engage with brilliant minds whose research I have read forward, backwards, and sideways. Despite all of that, I’m still not excited.
Poster-ing is perhaps the most awkward component of an academic conference. Academics are not known for their mingling skills. Add to that the inherent awkwardness of having to lurk like an ambush predator by your poster while fellow ever-so-socially-savvy scientists trudge through the narrow aisles, trying to sneak non-committal glances at figures and headings without pausing long enough for the poster-presenter to pounce with their “poster spiel.” For the browsers who do stop and study your poster, you have to stand there pretending that you aren’t breathing down their necks while they read, until they decide that a) this is really interesting and they want to talk to you, or b) phew, that was close, they almost got roped into having to talk to you about something they know/care nothing about. Most conferences have figured out that poster sessions are a lot less painful if beer is served.
Working with big, fuzzy animals means that I usually get a pretty decent-sized crowd at my posters. About half of those people want to ask me about job opportunities or to tell me about the time that they worked in a wildlife sanctuary and got to hug a lion and do I get to hug lions when I’m working? I once had a Pleistocene re-wilding advocate approach me for advice on – no joke – introducing African lions into suburban America. But they aren’t all bad. I’ve met a number of people in poster sessions who have gone on to become respected colleagues and casual friends. I’ve met faculty members whose labs I am now applying to for post-doctoral research positions. And I’ve learned how to condense a 20-page paper into a 2-minute monologue — which is a remarkably handy skill to have.
As much as I gripe and grumble about poster sessions, I know they’re good for me. At least with this one, I’ll be close to the beach!!
Below is a copy of my (draft) poster for the upcoming Gordon Research Conference that a chunk of the Snapshot Serengeti team will be at. It’s mostly on data outside of Snapshot Serengeti, but you might find it interesting nonetheless! (Minor suggestions and typo corrections welcome! I know I still have to add a legend or two…)
I have successfully survived the trials and tribulations of my first semester of graduate school! Huzzah! That being said, a student’s work is never done – you can still find me sitting in my office, plugging away at data and up to my eyeballs in pdfs and textbooks. Although it certainly helps when I know that, in a few short weeks, I’ll be showing off my preliminary data on a nice warm beach in California. Well, the Gordon Research Conference that Ali and I will both be attending will probably not be held directly ON the beach, but it’s a nice fantasy to have when your fingers are freezing off in Minnesota.
The theme of the conference is predator-prey interactions, but approached from a very interdisciplinary standpoint. Topics range from genes and the causes of childhood anxiety up through ecosystems, evolution, and Craig’s presentation on man-eating lions. It’s been over a year since I last attended a conference, and it’s going to be intimidating and inspiring to meet the Who’s Who in our field. All the papers piled up around my desk, underlined and annotated and thoroughly mulled over? Hopefully I’ll have a chance to chat with their authors in person and get these scientists’ input on the direction of my current research ideas.
My particular focus, predator intimidation (“fear”), is delightfully billed in the conference descriptions as “the persistent threat of immediate violent death.” The blurb continues on to state that “most wild animals are in peril every moment of every day of being torn limb from limb by any number of predators.” Language far more colorful than I can get away with in most of my proposals, but certainly right on point! There will be talks on fear’s impacts on evolutionary ecology and on population- and ecosystem-level processes, as well as on the effect of predators as stressors, all of which I am particularly keen to attend.
As excited as I am, I’m honestly a bit frantic trying to synthesize our Snapshot data to produce distribution graphs and other basic preliminary results. A few months ago, I couldn’t have programmed my own name into “R” – the bread-and-butter statistical program beloved (well, it’s a bittersweet relationship) by biologists. With long evenings in front of the computer and by the generous grace and goodwill of Ali, I’ve been making progress. Ideally, I would like to show up to this conference with not only an outline of my research to be picked apart by the aforementioned greatest minds in the field, but also with maps of the monthly distributions of several herbivore species in relation to the changing vegetative landscape and predator movements. No breakthroughs so far; I foresee a great deal of coffee in my future between now and January…
P.S. Congrats to Margaret for defending her PhD!!!
I’ve got to echo Margaret’s apology for our sporadic blog posts lately. Things have been a bit hectic for all of us — Dr (!!!) Margaret Kosmala is finishing up her dissertation revisions and moving on to an exciting post-doctoral position at Harvard, our latest addition, Meredith, is finishing up her first semester (finals! ah!), and I’m knee deep in analyses (and snow!).
So, please bear with us through the craziness and rest assured that we’ll pick up the blog posts again after the holidays. In the meanwhile, I’ll show you something that got me really excited last week. (Warning: this involves graphs, not cute pictures.)
Last week, I was summarizing some of the Snapshot Serengeti data to present to my committee members. (My committee is the group of faculty members that eventually decide whether my research warrants a PhD, so holding these meetings is always a little nerve-wracking.) As a quick summary, I made this graph of the total number of photographs of the top carnivores. Note that I’m currently only working with data from Seasons 1-3, since we’re having trouble with the timestamps from Seasons 4-6, so the numbers below are about half of what I’ll eventually be able to analyze.
The height of each bar represents the total number of pictures for each species. The color of the bar reflects whether a sighting is “unique” or a “repeat.” Repeated sightings happen when an animal plops down in front of the camera for a period of time, and we get lots and lots of photos of it. This most likely happens when animals seek out shade to lie in. Notice that lions have wayyyy more repeated sightings percentage-wise than other species. This makes sense — while we do occasionally see cheetahs and hyenas conked out in front of a well-shaded camera, this is a much bigger issue for lions.
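(For the curious: splitting captures into unique versus repeat sightings boils down to checking how much time has passed since the previous capture of the same species at the same camera. Here is a rough sketch of that idea in R; the 10-minute cutoff and the column names are purely illustrative, not our actual pipeline.)

```r
# Rough sketch: split captures into "unique" vs. "repeat" sightings.
# Assumes a data frame `captures` with columns: site, species, datetime (POSIXct).
# The 10-minute cutoff is an illustrative assumption, not the project's actual rule.
library(dplyr)

sightings <- captures %>%
  arrange(site, species, datetime) %>%
  group_by(site, species) %>%
  mutate(
    gap_min = as.numeric(difftime(datetime, lag(datetime), units = "mins")),
    type    = ifelse(is.na(gap_min) | gap_min > 10, "unique", "repeat")
  ) %>%
  ungroup()

# Counts per species, split into unique vs. repeat (the bars in the graph above)
table(sightings$species, sightings$type)
```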
I also dived a little deeper into the temporal patterns of activity for each species. The next graph shows the number of unique camera trap captures of each species for every hour of the day. See the huge spike in lion photos from 10am-2pm? It’s weird, right? Lions, like the other carnivores, are mostly nocturnal… so why are there so many photos of them at midday? Well, these photos are almost always lions who have wandered over to a well-shaded spot for a midday snooze. While there are a fair number of cheetahs who seem to do this too, it doesn’t seem to be as big of a deal for hyenas or leopards.
Why is this so exciting? Well, recall how I’ve repeatedly lamented about the way shade biases camera trap captures of lions? Because lions are so drawn to nice, shady trees, we get these camera trap hotspots that don’t match up with our lion radio-collar data. The map below shows lion densities, with highest densities in green, and camera traps in circles. The bigger the circle, the more lions were seen there.
The “lion hotspots” in relatively low density lion areas have been driving me mad all year. These are nice, shady trees that lions are drawn to from up to several kilometers away, and I’ve been struggling to reconcile the lion radio-collar data with the camera trapping data.
What the graphs above suggest, though, is that there is likely to be much less bias for hyenas and leopards. Lions are drawn to shade because they are big and bulky and easily overheated. We see this in the data in the form of many repeated sightings (indicating that lions like to lie down in one spot for hours) and in the midday “naptime spike” in the timing of camera trap captures, which suggests lions seeking out shade trees. Although shade remains a bit of an issue for cheetahs, using camera traps to understand hyena and leopard activity should be much less biased and much more straightforward — ultimately, much easier than it is for lions. And this is really good news for me.
Last week I posted an animated GIF of hourly carnivore sightings. To clarify, the map showed patterns of temporal activity across all days over the last 3 years — so the map at 9am shows sites where lions, leopards, cheetahs, and hyenas like to be in general at that time of day (not on any one specific day).
These maps here actually show where the carnivores are on consecutive days and months (the dates are printed across the top). [For whatever reason, the embedded .GIFs hate me; click on the map to open in a new tab and see the animation!]
Keep in mind that in the early days (June-Sept 2010) we didn’t have a whole lot of cameras on the ground, and that the cameras were taken down from Nov 2010-Feb 2011 (so that’s why those maps are empty).
The day-by-day map is pretty sparse, and in fact looks pretty random. The take-home message for this is that lions, hyenas, cheetahs, and leopards are all *around*, but the chances of them walking past a camera on any given day are kinda low. I’m still trying to find a pattern in the monthly distributions below.
So this is what I’ve been staring at in my turkey-induced post-Thanksgiving coma. Could be worse!
Truth be told, I *have* been working on data analysis from the start. It’s actually one of my favorite parts of research — piecing together the story from all the different puzzle pieces that have been collected over the years.
But right now I am knee-deep in taking a closer look at the camera trap data. Since we have *so* many cameras taking pictures every day I want to look at where the animals are not just overall, but from day to day, hour to hour. I’m not 100% sure what analytical approaches are out there, but my first step is to simply visualize the data. What does it look like?
So I’ve started making animations within the statistical programming software R. Here’s one of my first ones (stay tuned over the holidays for more). Each frame represents a different hour on the 24 hour clock: 0 is midnight, 12 is noon, 23 is 11pm, etc. Each dot is sized proportionally to the number of captures of that species at that site at that time of day. The dots are set to be a little transparent so you can see when sites are hotspots for multiple species. [*note: if the .gif isn't animating for you in the blog, try clicking on it so it opens in a new tab.]
Deep breath; I promise it will be okay.
By now, many of you have probably seen the one image that haunts your dreams: the backlit photo of the towering acacia that makes the wildebeest in front look tiny, with those two terrible words in big white print across the front — “We’re Done!” Now what are you going to do when you drink your morning coffee?? Need a break from staring at spreadsheets?? Are you in desperate need of an African animal fix?? Trust me, I know the feeling.
Deep breath. (And skip to the end if you can’t wait another minute to find out when you can ID Snapshot Serengeti animals again.)
I have to admit that as a scientist using the Snapshot Serengeti data, I’m pretty stoked that Seasons 5 and 6 are done. I’ve been anxiously watching the progress bars inch along, hoping that they’d be done in time for me to incorporate them in my dissertation analyses that I’m finally starting to hash out. Silly me for worrying. You, our Snapshot Serengeti community, have consistently awed us with how quickly you have waded through our mountains of pictures. Remember when we first launched? We put up Seasons 1-3 and thought we’d have a month or so to wait. In three days we were scrambling to put up Season 4. This is not usually the problem that scientists with big datasets have!
Now that Seasons 5 and 6 are done, we’ll download all of the classifications for every single capture event and try to make sense of them using the algorithms that Margaret’s written about here and here. We’ll also need to do a lot of data “cleaning” — fixing errors in the database. Our biggest worry is handling incorrect timestamps — and for whatever reason, when a camera trap gets injured, the timestamps are the first things to malfunction (usually shuttling back to 1970 or ahead to a futuristic 2029). It’s a big data cleaning problem for us. First, one of the things we care about is when animals are at different sites, so knowing the time is important. But also, many of the cameras are rendered non-functional for various reasons, meaning that sometimes a site isn’t taking pictures for days or even weeks. To properly analyze the data, we need to line up the number of animal captures with the record of activity, so we know that a record of 0 lions for the week really means 0 lions, and not just that the camera was face down in the mud.
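To give a flavor of what that timestamp cleaning looks like, here is a rough sketch of flagging captures whose timestamps fall outside the survey window; the column names and survey dates are made up for illustration. Those 1970 and 2029 dates are dead giveaways that a camera’s clock has reset.

```r
# Rough sketch: flag captures whose timestamps fall outside the survey window.
# Assumes a data frame `captures` with a POSIXct column `datetime`;
# the window below is illustrative, not the exact survey dates.
survey_start <- as.POSIXct("2010-06-01", tz = "UTC")
survey_end   <- as.POSIXct("2013-12-31", tz = "UTC")

captures$bad_time <- captures$datetime < survey_start |
                     captures$datetime > survey_end

# Captures stamped 1970 or 2029 land squarely in bad_time and get set aside
# for repair (e.g. by lining them up against neighboring, correctly-stamped images).
summary(captures$bad_time)
```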
So, we now have a lot of work in front of us. But what about you? First, Season 7 will be on its way soon, and we hope to have it online in early 2014. But that’s so far away! Yes, so in the meanwhile, the Zooniverse team will be “un-retiring” images like they’ve done in previous seasons. This means that we’ll be collecting more classifications on photos that have already been boxed away as “done.” Especially for the really tricky images, this can help us refine the algorithms that turn your classifications into a “correct answer.”
But there are also a whole bunch of awesome new Zooniverse projects out there that we’d encourage you to try in the meanwhile. For example, this fall, Zooniverse launched Plankton Portal, which takes you on a whole different kind of safari. Instead of identifying different gazelles by the white patches on their bums, you identify different species of plankton by their shapes. Although plankton are small, they have big impacts on the system — as the Plankton Portal scientists point out on their new site, “No plankton = No life in the ocean.”
Wherever you choose to spend your time, know that all of us on the science teams are incredibly grateful for your help. We couldn’t do this without you.
Last week, william garner asked me in the comments to my post ‘Better with experience’ how well the experts did on the about 4,000 images that I’ve been using as the expert-identified data set. How do we know that those expert-identifications are correct?
Here’s how I put together that expert data set. I asked a set of experts to classify images on snapshotserengeti.org — just like you do — but I asked them to keep track of how many they had done and any that they found particularly difficult. When I had reports back that we had 4,000 done, I told them that they could stop. Since the experts were reporting back at different times, we actually ended up doing more than 4,000. In fact, we’d done 4,149 sets of images (captures), and we had 4,428 total classifications of those 4,149 captures. This is because some experts got the same capture.
Once I had those expert classifications, I compared them with the majority algorithm. (I hadn’t yet figured out the plurality algorithm.) Then I marked (1) those captures where experts and the algorithm disagreed, and (2) those captures that experts had said were particularly tricky. For these marked captures, I went through to catch any obvious blunders. For example, in one expert-classified capture, the expert classified the otherBirds in the images, but forgot to classify the giraffe the birds were on! The rest of these marked images I sent to Ali to look at. I didn’t tell her what the expert had marked or what the algorithm said. I just asked her to give me a new classification. If Ali’s classification matched with either the algorithm or the expert, I set hers as the official classification. If it didn’t, then she, and Craig, and I examined the capture further together — there were very few of these.
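For anyone curious about the mechanics, the comparison step boils down to lining up the expert answer and the algorithm answer for each capture and pulling out the disagreements. Here is a rough sketch of that step in R; the data frames and column names are just for illustration, and real captures can contain several species, which this simplified one-row-per-capture layout glosses over.

```r
# Rough sketch: find captures where the expert and the majority algorithm disagree.
# Assumes two illustrative data frames with one row per capture:
#   experts:   capture_id, expert_species, flagged_difficult (TRUE/FALSE)
#   algorithm: capture_id, algo_species
library(dplyr)

comparison <- experts %>%
  inner_join(algorithm, by = "capture_id") %>%
  mutate(disagree = expert_species != algo_species)

# Captures to re-examine: disagreements plus anything an expert flagged as tricky
to_review <- comparison %>%
  filter(disagree | flagged_difficult)
```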
And that is how I came up with the expert data set. I went back this week to tally how the experts did on their first attempt versus the final expert data set. Out of the 4,428 classifications, 30 were marked as ‘impossible’ by Ali, 1 was the duiker (which the experts couldn’t get right by using the website), and 101 mistakes were made. That makes for a 97.7% rate of success for the experts. (If you look at last week’s graph, you can see that some of you qualify as experts too!)
Okay, and what did the experts get wrong? About 30% of the mistakes were what I call wildebeest-zebra errors. That is, there are wildebeest and zebra, but someone just marks the wildebeest. Or there are only zebra, and someone marks both wildebeest and zebra. Many of the wildebeest and zebra herd pictures are plain difficult to figure out, especially if animals are in the distance. Another 10% of the mistakes were otherBird errors — either someone marked an otherBird when there wasn’t really one there, or (more commonly) forgot to note an otherBird. About 10% of the time, experts listed an extra animal that wasn’t there. And another 10% of the time, they missed an animal that was there. Some of these were obvious blunders, like missing a giraffe or eland; other times it was more subtle, like a bird or rodent hidden in the grass.
The other 40% of the mistakes were mis-identifications of the species. I didn’t find any obvious patterns to where the mistakes were; here are the species that were mis-identified:
| Species | Times mis-identified | Mistaken for |
|---|---|---|
| wildebeest | 6 | buffalo, hartebeest, elephant, lionFemale |
| hartebeest | 5 | gazelleThomsons, impala, topi, lionFemale |
| gazelleGrants | 4 | impala, gazelleThomsons, hartebeest |
| reedbuck | 3 | dikDik, gazelleThomsons, impala |
Does experience help with identifying Snapshot Serengeti images? I’ve started an analysis to find out.
I’m using the set of about 4,000 expert-classified images for this analysis. I’ve selected all the classifications that were done by logged-in volunteers on the images that had just one species in them. (It’s easier to work with images with just one species.) And I’ve thrown out all the images that experts said were “impossible.” That leaves me with 68,535 classifications for 4,084 images done by 5,096 different logged-in volunteers.
I’ve counted the number of total classifications each volunteer has done and given them a score based on those classifications. And then I’ve averaged the scores for each group of volunteers who did the same number of classifications. And here are the results:
Here we have the number of classifications done on the bottom. Note that the scale is a log scale, which means that higher numbers get grouped closer together. We do this so we can more easily look at all the data on one graph. Also, we expect someone to improve more quickly with each additional classification at lower numbers of classifications.
On the left, we have the average score for each group of volunteers who did that many classifications. So, for example, the group of people who did just one classification in our set had an average score of 78.4% (black square on the graph). The group of people who did two classifications had an average score of 78.5%, and the group of people who did three classifications had an average score of 81.6%.
Overall, the five thousand volunteers got an average score of 88.6% correct (orange dotted line). Not bad, but it’s worth noting that it’s quite a bit lower than the 96.6% that we get if we pool individuals’ answers together with the plurality algorithm.
And we see that, indeed, volunteers who did more classifications tended to get a higher percentage of them correct (blue line). But there’s quite a lot of individual variation. You can see that despite doing 512 classifications in our set, one user had a score of only 81.4% (purple circle). This is a rate of success similar to what you might expect from someone doing just 4 classifications! Similarly, it wasn’t the most prolific volunteer who scored the best; instead, the volunteer who did just 96 classifications got 95 correct, for a score of 99.0% (blue circle).
We have to be careful, though, because this set of images was drawn randomly from Season 4, and someone who has just one classification in our set could have already classified hundreds of images before this one. Counting the number of classifications done before the ones in this set will be my task for next time. Then I’ll be able to give a better sense of how the total number of classifications done on Snapshot Serengeti is related to how correct volunteers are. And that will give us a sense of whether people learn to identify animals better as they go along.