Last week I posted an animated GIF of hourly carnivore sightings. To clarify, the map showed patterns of temporal activity across all days over the last 3 years — so the map at 9am shows sites where lions, leopards, cheetahs, and hyenas like to be in general at that time of day (not on any one specific day).
These maps here actually show where the carnivores are on consecutive days and months (the dates are printed across the top). [For whatever reason, the embedded .GIFs hate me; click on the map to open in a new tab and see the animation!]
Keep in mind that in the early days (June-Sept 2010) we didn’t have a whole lot of cameras on the ground, and that the cameras were taken down from Nov 2010-Feb 2011 (so that’s why those maps are empty).
The day-by-day map is pretty sparse, and in fact looks pretty random. The take-home message here is that lions, hyenas, cheetahs, and leopards are all *around*, but the chances of one walking past a camera on any given day are pretty low. I’m still trying to find a pattern in the monthly distributions below.
So this is what I’ve been staring at in my turkey-induced post-Thanksgiving coma. Could be worse!
Truth be told, I *have* been working on data analysis from the start. It’s actually one of my favorite parts of research — piecing together the story from all the different puzzle pieces that have been collected over the years.
But right now I am knee-deep in taking a closer look at the camera trap data. Since we have *so* many cameras taking pictures every day, I want to look at where the animals are not just overall, but from day to day and hour to hour. I’m not 100% sure what analytical approaches are out there, but my first step is to simply visualize the data. What does it look like?
So I’ve started making animations within the statistical programming software R. Here’s one of my first ones (stay tuned over the holidays for more). Each frame represents a different hour on the 24 hour clock: 0 is midnight, 12 is noon, 23 is 11pm, etc. Each dot is sized proportionally to the number of captures of that species at that site at that time of day. The dots are set to be a little transparent so you can see when sites are hotspots for multiple species. [*note: if the .gif isn't animating for you in the blog, try clicking on it so it opens in a new tab.]
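The bookkeeping behind each frame of an animation like this is simple to sketch. My animations are made in R, but here's the same idea in Python, with invented site names and capture records:

```python
from collections import Counter

# Toy stand-in for the real data: one record per capture, as
# (site, species, hour-of-day). Sites and counts are made up.
captures = [
    ("B04", "lion", 9), ("B04", "lion", 9), ("B04", "hyena", 21),
    ("C07", "leopard", 2), ("C07", "lion", 9),
]

def frame_counts(captures, hour):
    """Count captures per (site, species) for one hour-of-day frame;
    each dot in the animation is sized by these counts."""
    return Counter((site, sp) for site, sp, h in captures if h == hour)

for hour in range(24):
    counts = frame_counts(captures, hour)
    # ...plot one frame here: a semi-transparent dot per (site,
    # species), with area proportional to the count...
```

The 9am frame in this toy data, for example, would get a double-sized lion dot at site B04 and a single lion dot at C07.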
Maybe from time to time you’ve wondered: Who are these scientists running Snapshot Serengeti? How did they get where they are? (And why am I sitting here instead of traipsing across the Serengeti myself?)
Ali and I are both graduate students at the University of Minnesota. What that means is that a while ago (seven years for me!) we filled out an application and wrote some essays for admission to the University of Minnesota’s graduate school — just like you would do for college admissions. The difference is that for graduate school, you also need to identify an advisor — a faculty member who will become both your mentor and your judge — and an area of research that you want to pursue. And while the admissions materials matter, it’s very important that your future advisor want to take you on as a student and that your area of research interest meshes well with his or hers.
In the U.S., you can apply for a Master’s program or a Ph.D. program. In some places you can get a Master’s on the way to a Ph.D., but that’s not the case at Minnesota. So I applied for the Ph.D., got admitted, and started as a Ph.D. student in the fall of 2007. I’m pretty much only going to talk about Ph.D.s from here on out. And I should point out that graduate school systems vary from country to country; I’m just going to talk about how it works in the U.S. because I’m not terribly familiar with what happens in other countries.
For the first 2-3 years in our program, students spend much of their time taking classes. These are mostly higher-level classes that assume you already took college-level classes in basic biology, math, etc. I came in with a college degree in computer science, and so a bunch of the classes I took were actually more fundamental ecology and evolution classes so I could get caught up. But many classes are reserved just for graduate students or for grad students plus motivated seniors.
At the same time as taking these classes, students are expected to come up with a research plan to pursue. The first couple years are filled with a lot of anxiety about what exactly to do, and there are plenty of missteps. My first attempt at a research project involved tracking the movement of wildebeest in the Serengeti using satellites and airplane surveys. (Yes, you can see individual wildebeest in Google Earth if you hunt around!) But it turned out not to be a logistically or financially feasible project, so I discarded it — after a lot of time and energy investment.
Around the end of the second year and beginning of the third year, grad students in the U.S. take what are called “preliminary” or “comprehensive” exams. These vary from school to school and from department to department, but they usually consist of both a written and an oral component. In some places, the goal of these exams is to assess whether you know enough about the broad discipline to be allowed to proceed. In other places, the goal is to judge whether or not you’ve put together a reasonable research plan. The program Ali and I are in leans more toward the latter. It requires a written proposal about what you plan to do for research. This proposal is reviewed by several faculty who decide whether it passes or not.
If you pass your written component, you then give a public talk on your proposed research, followed by a grueling two- to three-hour interview with your committee. In our program, students choose their committee members, following a few rules about who can be on it. My committee had five people, including my two advisors. They took turns asking me questions about my proposed research: how I would collect and analyze data, and how I would deal with adversity. The committee then met without me to decide whether I passed or not. (Spoiler: I passed.)
So, assuming a student passes the preliminary exams, she or he is then considered a “Ph.D. Candidate,” which basically means that all requirements except the actual dissertation itself have been fulfilled. If you’ve ever heard the term “A.B.D.” or “All But Dissertation,” that is what this means. The student got through the first hurdles, but never got a dissertation done (or accepted).
Now it’s time for the research. With luck, persistence, motivation, and lack of confounding factors, a student can do the research and write the dissertation in about three years. Doing research at first is slow because, like learning anything new, you make mistakes. I spent a lot of time gathering data that I’m not going to end up using. Now that I’ve been doing research for a few years, I can better estimate which data is worth collecting and which is not. And so I’m more efficient. While doing research, the student is also reading other people’s related research, and often picking up a side-project or two.
Eventually, the student, together with the advisor(s) and committee members, decides that she or he has done enough research to prove that she or he is a capable professional scientist. All the research gets written up into a massive tome called the dissertation. These days, it’s not uncommon for graduate students in the sciences to write up their dissertation chapters as formal papers that then get published in scientific journals. Sometimes one or more chapters is already published by the time the dissertation is submitted.
When the writing of the dissertation is finished, it gets sent to the committee to read. The student then gives a formal, public talk on the results of the dissertation research, followed by another two to three hour interview with the committee. This time it’s called the “Dissertation Defense,” and the committee asks questions about the research results (and possibly asks the student to fight a snake). The committee then meets without the student and comes up with a decision of whether the student passes or not. There is also often a conditional part of this decision that requires some portion of the dissertation to be revised or added to. So, a decision of “pass, conditional on the following revisions:” is pretty common.
I should mention that while being a grad student has been mostly quite fun, you may not want to drop your day job and run off to academia just yet. There’s the issue of funding. On the plus side, you can acquire funding in the sciences so that you don’t have to take on debt to do your degree (which is not so true in the humanities). Ali and I have both applied for and received fellowships that have allowed us to do most of our graduate program without having to work. But many — maybe most — grad students in the sciences work essentially part-time jobs (20 hours/week) as teaching assistants for faculty. This can really slow down research progress, as well as making some types of research impossible (for example, those that require lengthy trips to the Serengeti). Whether working or on fellowship, students typically gross no more than $30,000 annually, and often less than $25,000, which can be quite reasonable (single person living in a low-cost-of-living area) or prohibitive (person supporting a family living in a high-cost-of-living area). Benefits are pretty much non-existent, with the exception of health coverage, which can range from great (thanks, Minnesota!) to really bad to non-existent.
I mention all this because I am about to defend my dissertation! In a little less than two weeks I will give a talk, sit down with my committee, and try to convince them I’m a decent scientist. Wish me luck.
Crazy week this week, so I just wanted to post a link to this fascinating and hilarious blog post about plant communication. Yup, you heard right. Animals aren’t the only things that communicate: plants do too! But instead of using sound, plants communicate via chemicals.
First, some plants respond to hungry predators (e.g. bugs, mammalian herbivores) by producing bad-tasting or toxic chemicals that stop would-be munchers in their tracks — this is called an “induced response” or “induced defense” and is pretty well documented in terrestrial plants. But what’s even cooler is that attacked plants might also release chemical signals to “talk” to neighbors — allowing un-munched-on plants to trigger pre-emptive defenses. Originally known as the “talking trees hypothesis,” this interplant communication was first described in the 1980s — though more recent research suggests that “eavesdropping” might better capture the true nature of the interaction.
More recently and perhaps even cooler: plants not only “talk” to each other, but also to animals! The blog post linked above describes an intertidal plant that basically calls in the predators of its predators. When the plant gets munched on by, say, a snail, it releases a chemical signal that attracts things that eat snails, like crabs.
Crazy, and awesome. Even though I think I’ll stick with studying big furry things, plants are pretty cool.
Most of you have probably seen this picture:
As well as the ones after it:
This series of photos was taken at site H11 along the Loyangalani river and remains, to me, one of the most amazing accomplishments of our camera trap survey to date.
First, seeing a kill is rare. In the 47 years that the Lion Project has been watching Serengeti’s lions, we’ve only seen lions with about 4,000 carcasses; of those, we’ve only actually seen them in the act of killing 1,100 animals. That might sound like a lot, but with one or two people on the ground, almost every day of the year, racking up nearly 50,000 sightings, that’s not that often.
I don’t love this series simply because this random, stationary, complacently-stuck-to-a-tree camera trap caught this rather rare event – but because it goes on to document the story that follows: A single lioness takes down a zebra much bigger than herself. Within minutes, her sister joins her (free meal!). Note how big their bellies already are though, when they begin to eat. These aren’t particularly hungry lions to begin with. About 45 minutes later, they are staring out of view of the camera, and then comes a group of hyenas. The carcass goes back and forth between them throughout the night, with a jackal darting in to sneak a nibble.
Food stealing, or kleptoparasitism, is a major part of life for Serengeti carnivores. Contrary to long-standing popular belief (reinforced by the Lion King), hyenas are not skulking scavengers living only off others’ leftovers. Hyenas are quite adept predators and scavenge only about 40% of their diet; lions scavenge at least 30% of theirs. And, in fact, lions steal a lot more food from hyenas than is apparent at first glance. More often than not, when we see hyenas lurking anxiously around a pride of lions demolishing a carcass, it’s because hyenas made the kill, and lions stole it away. Research from Kenya suggests lions might actually suppress hyena populations simply by stealing their food.
On the flip side, work from Botswana suggests that hyenas are able to steal food from lions only if they outnumber the lions by at least four to one and there are no adult male lions present. (Remember, males are half again as big as females: hyenas don’t stand a chance.) But observations that Craig and a former graduate student made in the Ngorongoro Crater further revealed that even when lions do give up a kill, they are so full they can barely move – it’s simply not worth the effort to fend off hyenas any more.
So, kleptoparasitism is a part of life if you are a Serengeti carnivore, but it’s not always as simple as the movies make it out to be. It’s a pretty cool mechanism that might be driving predator dynamics though – I just wish it weren’t so hard to test!!
Does experience help with identifying Snapshot Serengeti images? I’ve started an analysis to find out.
I’m using the set of about 4,000 expert-classified images for this analysis. I’ve selected all the classifications that were done by logged-in volunteers on the images that had just one species in them. (It’s easier to work with images with just one species.) And I’ve thrown out all the images that experts said were “impossible.” That leaves me with 68,535 classifications for 4,084 images done by 5,096 different logged-in volunteers.
I’ve counted the number of total classifications each volunteer has done and given them a score based on those classifications. And then I’ve averaged the scores for each group of volunteers who did the same number of classifications. And here are the results:
Here we have the number of classifications done on the bottom. Note that the scale is a log scale, which means that higher numbers get grouped closer together. We do this so we can more easily look at all the data on one graph. Also, we expect someone to improve more quickly with each additional classification at lower numbers of classifications.
On the left, we have the average score for each group of volunteers who did that many classifications. So, for example, the group of people who did just one classification in our set had an average score of 78.4% (black square on the graph). The group of people who did two classifications had an average score of 78.5%, and the group of people who did three classifications had an average score of 81.6%.
Overall, the five thousand volunteers got an average score of 88.6% correct (orange dotted line). Not bad, but it’s worth noting that it’s quite a bit lower than the 96.6% that we get if we pool individuals’ answers together with the plurality algorithm.
And we see that, indeed, volunteers who did more classifications tended to get a higher percentage of them correct (blue line). But there’s quite a lot of individual variation. You can see that despite doing 512 classifications in our set, one user had a score of only 81.4% (purple circle). This is a similar rate of success as you might expect for someone doing just 4 classifications! Similarly, it wasn’t the most prolific volunteer who scored the best; instead, the volunteer who did just 96 classifications got 95 correct, for a score of 99.0% (blue circle).
We have to be careful, though, because this set of images was drawn randomly from Season 4, and someone who has just one classification in our set could have already classified hundreds of images before this one. Counting the number of classifications done before the ones in this set will be my task for next time. Then I’ll be able to give a better sense of how the total number of classifications done on Snapshot Serengeti is related to how correct volunteers are. And that will give us a sense of whether people learn to identify animals better as they go along.
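For the curious, the scoring-and-grouping step described above can be sketched in a few lines. This is Python rather than the R I actually use, and the records here are invented: each is a (volunteer, matched-the-expert) pair.

```python
from collections import defaultdict

# Invented records: (volunteer_id, matched_expert) per classification.
records = [("v1", True), ("v1", False), ("v2", True), ("v2", True),
           ("v3", True), ("v3", True), ("v3", False), ("v3", True)]

# Collect each volunteer's classifications.
per_volunteer = defaultdict(list)
for vid, ok in records:
    per_volunteer[vid].append(ok)

# Score each volunteer: fraction of classifications matching the expert.
scores = {vid: sum(oks) / len(oks) for vid, oks in per_volunteer.items()}

# Group volunteers by how many classifications they did, then average
# scores within each group (one point per group on the graph).
groups = defaultdict(list)
for vid, oks in per_volunteer.items():
    groups[len(oks)].append(scores[vid])
group_means = {n: sum(ss) / len(ss) for n, ss in groups.items()}
```

In this toy data, the two volunteers who each did two classifications average out to 75% correct, matching the lone four-classification volunteer.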
### Last week Craig spoke at Cafe Scientifique about lions and shared the research that the Lion Project has been conducting for the last 45 years. Check out the video here. Peter and Faith, UMN undergrads conducting research in the Lion Lab, attended the talk and share their experiences as well. ###
Peter and Faith here! Last week we had the opportunity to attend the Bell Museum’s Cafe Scientifique. Cafe Scientifique allows scientists from all disciplines and specialties to share their research directly with the public in the form of a casual presentation given at the Bryant Lake Bowl in Minneapolis, MN. This past month’s talk was given by Snapshot Serengeti’s own Professor Craig Packer, giving a historic rundown of some of the highlights of the lion research conducted by the University of Minnesota’s Lion Research Center.
As prospective lion researchers ourselves, it was both interesting and valuable to hear the conclusions of past research from the perspective of the researcher. Not to mention having it told in a casual and humorous way, which is a refreshing break from the stack of scientific papers we are usually reading! The audience, made up of local community members, was also engaged in the talk. Even though Dr. Packer presented complex graphs and maps, he explained the research in a way that was accessible to everyone. The studies discussed during the talk included the lion’s mane study, why lions form prides, and even a bit about lion conservation and the potential use of fences to protect vulnerable populations. In addition to reviewing past research, Dr. Packer also talked about the Lion Project’s current research: Snapshot Serengeti. The audience was amazed by how fast volunteers sorted through the millions of images on Snapshot Serengeti. (To all of you who have contributed to the success of “Snapshot”, cheers to you!) By the end of the talk, the entire audience (including us!) had loads of insightful questions, and left with a piqued interest in the world of lion research.
Three weeks into graduate school and I’d have to say that it’s been an overwhelming and exciting time thus far. The coursework is intense, the lectures intriguing, and it’s certainly been interesting getting to know the diverse array of people who populate the Ecology, Evolution, and Behavior department. Accomplishment of the week, however, would have to be rounding up some IT guys to fix the lab printer so that I can finally make copies of all the papers I need to be reading! Score.
Even though I’ve just begun at UMN, I had been developing a potential research project to do in the Lion Lab over the last several months. While Ali focuses on interactions between different predator species, I will be diving into the interspecies interactions that make up the Serengeti’s predator-prey dynamics. Specifically, I want to look into how physical predation, along with the fear of potential predation, influences how and where herbivores move throughout the day. Snapshot Serengeti is essential to my research because the camera traps are collecting data on where herbivores are congregating 24/7. Most other studies have been limited to looking at large-scale distributions during the day, whereas we can pick apart fine-scale distribution patterns even during the hours of darkness.
Now the Serengeti and the creatures in it are not static; they move around and prioritize different activities throughout the day. Herbivores are active during the day (diurnal), whereas the major savanna predators are most active at twilight (crepuscular) and at night (nocturnal). To avoid predators and maximize resource intake, herbivores could be strategizing about what they do and when they do it. I want to look at where the prey herbivores are during different times of day and see if and how this changes throughout the 24-hour cycle. If it does, we can then move into examining different hypotheses and motivating factors for these particular movement patterns.
One thing I would like to do is use lion behavioral data from the Serengeti Lion Project to construct a map of predator “attack risk” – a diagram showing areas with landscape features that are known to increase predator attack success. Another kind of map I can construct is one highlighting areas of prime resources, based on information about different herbivore species’ primary diets. This would reveal where herbivores should be going if they were focused solely on resource acquisition. The camera traps provide yet another layer by showing us where the herbivores ARE actually spending their time, and we can compare this (actual) distribution to those predicted by the two models.
Studies like this would not be possible without the novel type of information being generated by the camera traps. By pulling information from the pictures and adding in data from the lion behavior projects, I have a good chance of being able to reveal something interesting about the dynamic interactions being hashed out in the Serengeti.
From last week’s post, we know that we can identify images that are particularly difficult using information about classification evenness and the fraction of “nothing here” votes cast. However, the algorithm (and really, all of you volunteers) gets the right answer even on hard images most of the time. So we don’t necessarily want to just throw out those difficult images. But can we?
Let’s think about two classes of species: (1) the common herbivores and (2) carnivores. We want to understand the relationship between the migratory and non-migratory herbivores. And Ali is researching carnivore coexistence. So these are important classes to get right.
First the herbivores. Here’s a table showing the most common herbivores and our algorithm’s results based on the expert-classified data of about 4,000 images. “Total” is the total number of images that our algorithm classified as that species, and “correct” is the number of those that our experts agreed with.
We see that we do quite well on the common herbivores. Perhaps we’d wish for Thomson’s gazelles to be a bit higher (Grant’s gazelles are most commonly mis-classified as Thomson’s), but these results look pretty good.
If we wanted to be conservative about our estimates of species ranges, we could throw out some of the images with high Pielou scores. Let’s say we threw out the 10% most questionable wildebeest images. Here’s how we would score. (Note that I didn’t do the zebra, since they’d be at 100% again, no matter how many we dropped.) The columns are the same as the above table, except this time, I’ve listed the threshold Pielou score used to throw out 10% of the images of that species.
| species | Pielou cutoff | total | correct | % correct |
| --- | --- | --- | --- | --- |
We do quite a bit better with our Thomson’s gazelle and increase the accuracy of all the other species at least a little. But do we sacrifice anything throwing out data like that? If wildebeest make up a third of our images and we have a million images, then we’re throwing away 33,000 images(!), but we still have another 300,000 left to do our analyses. One thing we will look at in the future is how much dropping the most questionable images affects estimates of species ranges. I’m guessing that for wildebeest it won’t be much.
What if we did the same thing for Thomson’s gazelle or impala? We would expect about 50,000 images of each of those per million images. Throwing out 5,000 images still leaves us with 45,000, which seems like it might be enough for many analyses.
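The thresholding itself is straightforward. Here's a minimal Python sketch with invented Pielou scores (the real analysis uses the full Season 4 data, and ties at the cutoff are all kept):

```python
# Hypothetical per-image records: (species, Pielou score). Higher
# scores mean more disagreement among volunteers.
images = [("wildebeest", 0.00), ("wildebeest", 0.12),
          ("wildebeest", 0.55), ("wildebeest", 0.80),
          ("zebra", 0.05)]

def keep_most_confident(images, species, frac_drop=0.10):
    """Drop the frac_drop most questionable (highest-Pielou) images of
    one species; return the cutoff score and the kept images."""
    scores = sorted(p for s, p in images if s == species)
    n_drop = int(round(len(scores) * frac_drop))
    cutoff = scores[len(scores) - n_drop - 1]  # highest retained score
    kept = [(s, p) for s, p in images if s == species and p <= cutoff]
    return cutoff, kept

# Toy demo: drop the worst 25% of the four wildebeest images.
cutoff, kept = keep_most_confident(images, "wildebeest", frac_drop=0.25)
```

On this toy data the cutoff lands at 0.55, and the one wildebeest image scoring above it gets dropped.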
Now let’s look at the carnivore classifications from the expert-validated data set:
Wow! You guys sure know your carnivores. The two wrong answers were the supposed bat-eared fox that was really a jackal and the supposed striped hyena that was really an aardwolf. These two wrong answers had high Pielou scores: 0.77 and 0.83 respectively.
Judging by this data set, about 2.5% of all images are carnivores, which gives us about 25,000 carnivore images for every million we collect. That’s a lot of great data on these relatively rare animals! But it’s not so much that we want to throw any of it away. Fortunately, we won’t have to. We can use the Pielou score to have an expert look at the most difficult images.
Let’s say Ali wants to be very confident of her data. She can choose the 20% most difficult carnivore images — which is only about 5,000 per million images, and she can go through them herself. Five thousand images is nothing to sneeze at, of course, but the work can be done in a single day of intense effort.
In summary, we might be able to throw out some of the more difficult images (based on Pielou score) for the common herbivores without losing much coverage in our data. Further analyses are needed, though, to see if doing so is worthwhile and whether we lose anything by throwing out so many correct answers. For carnivores, the difficult images can be narrowed down sufficiently that an expert can double-check them by hand.
Back in June, I wrote about algorithms I was working on to take the volunteer data and spit out the “correct” classification for each image. First, I made a simple majority-rules algorithm and compared its results to several thousand classifications done by experts. Then, when the algorithm came up with no answer for some of the images (because there were no answers in the majority), I tried a plurality algorithm, which just looked to see which species got the most votes, even if it didn’t get more than half the votes. It worked well, so I’m using the plurality algorithm going forward.
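Stripped to its core, the plurality rule is just a vote count. A Python sketch (the real pipeline is more involved, and species names here are illustrative):

```python
from collections import Counter

def plurality(votes):
    """Return the species with the most votes, even if that species
    falls short of a majority. Ties would need a tie-breaking rule;
    here Counter simply returns the first of the tied species."""
    return Counter(votes).most_common(1)[0][0]

plurality(["zebra", "zebra", "wildebeest"])                      # a true majority
plurality(["zebra", "zebra", "wildebeest", "buffalo", "hippo"])  # plurality only
```

The second call is the case majority-rules couldn't handle: zebra gets only 2 of 5 votes, but it's still the most-voted species, so the plurality algorithm returns an answer.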
One of the things I’ve been curious about is whether we can detect when particular images are “hard.” You know what I mean by hard: animals smack up in front of the camera lens, animals way back on the horizon, animals with just a tip of the ear or a tuft of tail peeking onto the image from one side, animals obscured by trees or the dark of night.
So how can we judge “hard”? One way is to look at the “evenness” of the volunteer votes. Luckily, in ecology, we deal with evenness a lot. We frequently want to know what species are present in a given area. But we also want to know more than that. We want to know if some species are very dominant in that area or if species are fairly evenly distributed. For example, in a famous agricultural ecology paper*, Cornell entomologist Richard Root found that insect herbivore (pest) species on collard greens were less even on collards grown in a big plot with only other collards around versus on those grown in a row surrounded by meadow plants. In other words, the insect species in the big plot were skewed toward many individuals of just a few species, whereas in the meadow rows, there were a lot more species with fewer individuals of each species.
We can adopt a species evenness metric called “Pielou’s evenness index” (which, for you information theorists, is closely related to Shannon entropy).
[An aside: I was surprised to learn that this index is named for a woman: Dr. Evelyn Chrystalla Pielou. Upon reflection, this is the first time in my 22 years of formal education (in math, computer science, and ecology) that I have come across a mathematical term named for a woman. Jacqueline Gill, who writes a great paleo-ecology blog, has a nice piece honoring Dr. Pielou and her accomplishments.]
Okay, back to the Pielou index: we can use it to judge how even the votes are. If all the votes are for the same species, we can have high confidence. But if we have 3 votes for elephant and 3 votes for rhino and 3 votes for wildebeest and 3 votes for hippo, then we have very low confidence. The way the Pielou index works out, a 0 means all the votes are for the same species (high skew, high confidence) and a 1 means there are at least two species and they all got the same number of votes (high evenness, low confidence). Numbers in between 0 and 1 are somewhere between highly skewed (e.g. 0.2) and really even (e.g. 0.9).
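Concretely, the index is the Shannon entropy of the vote proportions divided by its maximum possible value, ln(S). A quick Python sketch (my own analysis is in R; the vote lists are made up):

```python
import math
from collections import Counter

def pielou(votes):
    """Pielou's evenness J = H / ln(S), where H is the Shannon entropy
    of the vote proportions and S is the number of distinct species
    voted for. Defined as 0 here when all votes name one species."""
    counts = Counter(votes)
    S = len(counts)
    if S < 2:
        return 0.0  # unanimous: maximally skewed
    n = len(votes)
    H = -sum((c / n) * math.log(c / n) for c in counts.values())
    return H / math.log(S)

pielou(["zebra"] * 9)                                     # 0.0: unanimous
pielou(["elephant", "rhino", "wildebeest", "hippo"] * 3)  # 1.0: perfectly even
pielou(["wildebeest"] * 9 + ["buffalo"])                  # ~0.47: highly skewed
```

The three calls mirror the cases above: all votes for one species scores 0 (high confidence), a four-way tie scores 1 (low confidence), and a 9-to-1 split lands in between.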
Another way we could measure the difficulty of an image is to look at how many people click “nothing here.” I don’t like it, but I suspect that some people use “nothing here” as an “I don’t know” button. Alternatively, if animals are really far away, “nothing here” is a reasonable choice. We might assume that the percentage of “nothing here” votes correlates with the difficulty of the image.
I calculated the Pielou evenness index (after excluding “nothing here” votes) and the fraction of “nothing here” votes for the single-species images that were classified by experts. And then I plotted them. Here I have the Pielou index on the x-axis and the fraction of “nothing here” votes on the y-axis. The small pink dots are the 3,775 images that the algorithm and the experts agreed on, the big blue dots are the 84 images that the plurality algorithm got wrong, and the open circles are the 29 images that the experts marked as “impossible.” (Click to enlarge.)
And sure enough, we see that the images the algorithm got wrong had relatively high Pielou scores. And the images that were “impossible” had either high Pielou scores or a high fraction of “nothing here” votes (or both). I checked out the four anomalies over on the left with a Pielou score of zero. All four were unanimously voted as wildebeest. For the three “impossibles,” both Ali and I agree that wildebeest is a reasonable answer. But Ali contends that the image the algorithm got wrong is almost certainly a buffalo. (It IS a hard image, though — right up near the camera, and at night.)
So we do seem to be able to get an idea of which images are hardest. But note that there are a lot more correct answers with high Pielou scores and high “nothing here” fractions than errors or “impossibles”. We don’t want to throw out good data, so we can’t just ignore the high-scorers. But we can attach a measure of certainty to each of our algorithm’s answers.