Hello! I haven’t written in a while. Since defending my dissertation in December, I’ve been busy getting ready to move to the Boston area. I have now started a research position (technically called a “postdoctoral fellowship”) at Harvard University.
In this new job, I am putting together a new citizen science project. This project will help scientists better understand and forecast the effects of climate change on North American trees and plants. We have cameras up throughout the U.S. and Canada taking automatic pictures of forests, grasslands, shrublands, deserts, and even tundra. There are already several years of images recorded, so it’s a great data set to play with.
In order to understand the seasonality of trees and plants, we talk about “phenology,” which is the timing of when trees and plants go through their various life stages. You can think about a maple tree, for example, which puts out leaf buds in the spring, grows those leaves into a full green canopy, then those leaves start to change color, and eventually they all fall off the tree in the autumn. These phenology events define all sorts of processes that are important to people – ranging from how much carbon trees and plants take out of the air to the timing of seasonal pollen release (which you might care about if you have allergies).
Of course, computer algorithms can only do so much, which is where citizen science comes in. The human eye is great at looking at fine details in images and figuring out what’s going on in strange images. For example, one of my colleagues was looking at a measure of greenness in grassland images from Hawaii. This measure was calculated automatically from the images. But something seemed strange. When he went and looked at the individual images themselves, he discovered that there was a common plant that flowered yellow all at once, which changed the greenness in a surprising way.
I’m excited about this new job, but I’m still involved with Snapshot Serengeti. These past couple months, Ali and I have been training Meredith on all the behind-the-scenes image and data processing that goes on both before you see the images and after you’ve classified them. This has slowed down the release of Season 7 (sorry), but ensuring continuity means fewer problems down the line. (By the way, Meredith is a fast learner – it’s just that there’s a lot to learn!) And I’ll still be blogging here periodically.
I’ve had a couple people ask about my dissertation. It’s now published and available online. Note, though, that it doesn’t contain any Snapshot Serengeti content. I was already rather far along in writing it when Snapshot Serengeti launched, so I didn’t have time to include it. We’re working on the first Snapshot Serengeti papers now, though, and we will be sure to let you know when they’re ready to read.
In processing Seasons 5 and 6, I recently stumbled upon a bunch of video files amongst the stills. You may recall that while we have our cameras set to take still images, every once in a while a camera gets accidentally switched to video mode. Then it takes 10-second (silent) clips. Most of these are “blanks” triggered by grass waving in the wind. But every once in a while, we get ten seconds of animal footage. Here are some from Season 5.
And, what do you think this is?
Apologies for such sporadic blog posts recently. We’ve all been quite busy. I successfully defended my dissertation last week. And then I enjoyed the true spirit of Minnesota for the next couple of snowy days, getting to catch up with friends and colleagues whom I haven’t seen in quite some time. But I’m not quite done! I need to make some minor revisions to the dissertation text before submitting it, and this has been occupying much of my time this week, as I need to get it all done before the end of the month – and preferably earlier if I want to enjoy the holidays.
Ali, meanwhile, is deep in analyses of the Snapshot Serengeti data gathered to date. We’re still working on the time issues. If you’ve got crazy Python and/or SQL skills and some free time in the next few weeks, drop us a note. A little help would accelerate Ali’s research while I’m busy finishing up my dissertation work.
And Craig’s diving into the next round of National Science Foundation proposals. The preliminary proposals are due in mid-January and an accepted proposal would restart long-term funding for Snapshot Serengeti starting in 2015. The preliminary proposals are relatively short, but in some ways that makes them harder than the longer ones – we not only have to concisely describe the research, but also convince the reviewers that citizen science yields high-quality data.
While some ecologists are still skeptical of citizen science, more and more are coming to accept it as a valid and valuable way to gather and analyze scientific data. The astronomy field may be a bit ahead of ecology in this respect, but we’re glad they’re paving the way. And did you hear? The Zooniverse was awarded a $1.8 million Global Impact Award by Google that’s going to allow them to scale up their citizen science platform to host many more projects. I only wonder what citizen scientists will do in the (perhaps not too distant) future, when they have hundreds of citizen science projects to select among. How will you choose which ones to try?
Tomorrow is American Thanksgiving. Whether or not you celebrate Thanksgiving this week, I hope you’re able to spend time with family,
hang out with friends,
find some tasty food to eat,
get some rest,
and give thanks for the good things in life.
And a special thank you from us to you for all the work you’ve put in to classifying animals on Snapshot Serengeti.
Maybe from time to time you’ve wondered: Who are these scientists running Snapshot Serengeti? How did they get where they are? (And why am I sitting here instead of traipsing across the Serengeti myself?)
Ali and I are both graduate students at the University of Minnesota. What that means is that a while ago (seven years for me!) we filled out an application and wrote some essays for admission to the University of Minnesota’s graduate school — just like you would do for college admissions. The difference is that for graduate school, you also need to identify an advisor — a faculty member who will become both your mentor and your judge — and an area of research that you want to pursue. And while the admissions materials matter, it’s very important that your future advisor want to take you on as a student and that your area of research interest meshes well with theirs.
In the U.S., you can apply for a Masters program or a Ph.D. program. In some places you can get a Masters on the way to a Ph.D., but that’s not the case at Minnesota. So I applied for the Ph.D., got admitted and started as a Ph.D. student in the fall of 2007. I’m pretty much only going to talk about Ph.D.s from here on out. And I should point out that graduate school systems vary from country to country. I’m just going to talk about how it works in the U.S. because I’m not terribly familiar with what happens in other countries.
For the first 2-3 years in our program, students spend much of their time taking classes. These are mostly higher level classes that assume you already took college-level classes in basic biology, math, etc. I came in with a college degree in computer science, and so a bunch of the classes I took were actually more fundamental ecology and evolution classes so I could get caught up. But many classes are reserved for just graduate students or for grad students plus motivated seniors.
At the same time as taking these classes, students are expected to come up with a research plan to pursue. The first couple years are filled with a lot of anxiety about what exactly to do, and there are plenty of missteps. My first attempt at a research project involved tracking the movement of wildebeest in the Serengeti using satellites and airplane surveys. (Yes, you can see individual wildebeest in Google Earth if you hunt around!) But it turned out not to be a logically or financially feasible project, so I discarded it — after a lot of time and energy investment.
Around the end of the second year and beginning of the third year, grad students in the U.S. take what are called “preliminary” or “comprehensive” exams. These vary from school to school and from department to department. But they usually consist of both a written and oral component. In some places the goal of these exams is to assess whether you know enough about the broad discipline to be allowed to proceed. In other places, the goal is to judge whether or not you’ve put together a reasonable research plan. The program Ali and I are in leans more toward the latter. It requires a written proposal about what you plan to do for research. This proposal is reviewed by several faculty who decide whether it passes or not.
If you pass your written component, you then give a public talk on your proposed research followed by a grueling two to three hour interview with your committee. In our program, students choose their committee members, following a few sets of rules about who can be on it. My committee had five people, including my two advisors. They took turns asking me questions about my proposed research, how I would collect data, analyze it, how I would deal with adversity. The committee then met without me to decide whether I passed or not. (spoiler: I passed)
So, assuming a student passes the preliminary exams, she or he is then considered a “Ph.D. Candidate,” which basically means that all requirements except the actual dissertation itself have been fulfilled. If you’ve ever heard the term “A.B.D.” or “All But Dissertation,” that is what this means. The student got through the first hurdles, but never got a dissertation done (or accepted).
Now it’s time for the research. With luck, persistence, motivation, and lack of confounding factors, a student can do the research and write the dissertation in about three years. Doing research at first is slow because, like learning anything new, you make mistakes. I spent a lot of time gathering data that I’m not going to end up using. Now that I’ve been doing research for a few years, I can better estimate which data is worth collecting and which is not. And so I’m more efficient. While doing research, the student is also reading other people’s related research, and often picking up a side-project or two.
Eventually, the student, together with the advisor(s) and committee members, decides that she or he has done enough research to prove that she or he is a capable professional scientist. All the research gets written up into a massive tome called the dissertation. These days, it’s not uncommon for graduate students in the sciences to write up their dissertation chapters as formal papers that then get published in scientific journals. Sometimes one or more chapters is already published by the time the dissertation is submitted.
When the writing of the dissertation is finished, it gets sent to the committee to read. The student then gives a formal, public talk on the results of the dissertation research, followed by another two to three hour interview with the committee. This time it’s called the “Dissertation Defense,” and the committee asks questions about the research results (and possibly asks the student to fight a snake). The committee then meets without the student and comes up with a decision of whether the student passes or not. There is also often a conditional part of this decision that requires some portion of the dissertation to be revised or added to. So, a decision of “pass, conditional on the following revisions:” is pretty common.
I should mention that while being a grad student has been mostly quite fun, you may not want to drop your day job and run off to academia just yet. There’s the issue of funding. On the plus side, you can acquire funding in the sciences so that you don’t have to take on debt to do your degree (which is not so true in the humanities). Ali and I have both applied for and received fellowships that have allowed us to do most of our graduate program without having to work. But many — maybe most — grad students in the sciences work essentially part-time jobs (20 hours/week) as teaching assistants for faculty. This can really slow down research progress, as well as making some types of research impossible (for example, those that require lengthy trips to the Serengeti). Whether working or on fellowship, students typically gross no more than $30,000 annually, and often less than $25,000, which can be quite reasonable (single person living in a low-cost-of-living area) or prohibitive (person supporting a family living in a high-cost-of-living area). Benefits are pretty much non-existent, with the exception of health coverage, which can range from great (thanks, Minnesota!) to really bad to non-existent.
I mention all this because I am about to defend my dissertation! In a little less than two weeks I will give a talk, sit down with my committee, and try to convince them I’m a decent scientist. Wish me luck.
In the winter months (northern hemisphere winter, that is), we catch white storks on camera. They’re taking their winter vacation in the Serengeti — and across eastern and southern Africa.
White storks are carnivorous, eating insects, worms, reptiles, and small mammals. A flock of them like this makes me wonder about the diversity of small critters that they eat that we don’t catch on camera. Because they eat small animals, they can sometimes be seen near fires, ready to gobble up those creatures trying to escape flames and smoke.
The white stork has a favorable reputation with people in both Africa and in Europe, because it feeds on crop pests. In the spring, storks leave their wintering grounds and head north to Europe to breed. They build large nests out of sticks and are happy to do so on buildings and other structures with wide, unencumbered supports. And because they are considered useful — and sometimes good luck — people allow them to build their nests on buildings. These nests are then frequently re-used year after year.
Several years ago I went to Poland to find my grandmother’s childhood home. This was made a bit challenging because when my grandmother was a child, the area was part of the Austro-Hungarian Empire and the names of everything — towns, streets — were in German. These days, of course, all the names are in Polish. After finding a list of place name translations, I set out to see if I could locate some buildings my grandmother described in her memoirs in a small town in the countryside outside of what is now Wrocław and was then Breslau. One of these was “Grandfather’s [my great-great-grandfather's] water mill with its stork nest on the roof.” Sure enough, I found a large old building in the middle of town right by the stream. It no longer sported a water wheel, but there on the roof: a stork’s nest, complete with stork.
A few weeks ago, Snapshot Serengeti volunteers spotted a pangolin in Season 6. This is the best pangolin shot we’ve ever seen in this project.
Pangolins are rare and nocturnal, so you don’t see them often out in the field. The pangolin species we have in the Serengeti is called the ground pangolin (Manis temminckii), and it ranges from East Africa through much of southern Africa.
I once went to Kruger National Park in South Africa for a conference and went on a guided tour in my free time; the tour leader asked what we wanted to see, and I shouted out “pangolin!” The tour leader gave me a withering look and we then went out to see the elephants and giraffes and buffalo that the other tourists were eager to see. I really did want to see a pangolin, though. I’ve never seen one in real life.
Pangolins have scales all along their back and curl up into balls like pillbugs when they feel threatened. They hang out in burrows that they either dig themselves or appropriate from other animals. And they have super long tongues that they use to get to ants and termites, their primary food. Pangolins have one baby at a time, and young pangolins travel by clinging to the base of their mother’s tail.
Pangolins don’t have any close living relatives. In fact, they have an order all to themselves (Pholidota). Because of how they look, scientists used to think they were most closely related to anteaters and armadillos. But now with genetic tools they’ve discovered that pangolins are more closely related to the order Carnivora, which includes cats and dogs. It’s a bit strange to think that pangolins, which are sometimes called “scaly anteaters,” have more in common genetically with lions than with actual anteaters, but that’s what the science tells us.
Many thanks to all of you who marked the new pangolin image in the Talk forum. That lets us make sure it gets classified correctly. ‘Pangolin’ was just one of those rare animals that didn’t make it onto the list of animals you can choose from, so our algorithm will classify it as something else, which we will fix by hand.
Last week, william garner asked me in the comments to my post ‘Better with experience’ how well the experts did on the roughly 4,000 images that I’ve been using as the expert-identified data set. How do we know that those expert identifications are correct?
Here’s how I put together that expert data set. I asked a set of experts to classify images on snapshotserengeti.org — just like you do — but I asked them to keep track of how many they had done and any that they found particularly difficult. When I had reports back that we had 4,000 done, I told them that they could stop. Since the experts were reporting back at different times, we actually ended up doing more than 4,000. In fact, we’d done 4,149 sets of images (captures), and we had 4,428 total classifications of those 4,149 captures. This is because some experts classified the same captures.
Once I had those expert classifications, I compared them with the majority algorithm. (I hadn’t yet figured out the plurality algorithm.) Then I marked (1) those captures where experts and the algorithm disagreed, and (2) those captures that experts had said were particularly tricky. For these marked captures, I went through to catch any obvious blunders. For example, in one expert-classified capture, the expert classified the otherBirds in the images, but forgot to classify the giraffe the birds were on! The rest of these marked images I sent to Ali to look at. I didn’t tell her what the expert had marked or what the algorithm said. I just asked her to give me a new classification. If Ali’s classification matched with either the algorithm or the expert, I set hers as the official classification. If it didn’t, then she, and Craig, and I examined the capture further together — there were very few of these.
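As a minimal sketch of that tie-breaking step (the species labels and the helper function are mine for illustration, not our actual pipeline code):

```python
def reconcile(expert, algorithm, second_opinion):
    """Decide the official classification for one capture.

    Each argument is the set of species reported for the capture.
    Returns the official set, or None when all three disagree and
    the capture needs a joint review.
    """
    if expert == algorithm:
        return expert                 # no conflict to resolve
    if second_opinion in (expert, algorithm):
        return second_opinion         # tie-breaker matched one side
    return None                       # escalate to a group look

# Hypothetical capture: the first expert missed the giraffe under the birds.
official = reconcile({"otherBird"},
                     {"giraffe", "otherBird"},
                     {"giraffe", "otherBird"})
print(sorted(official))  # ['giraffe', 'otherBird']
```

In practice, captures where all three sources disagreed were the rare ones that Ali, Craig, and I examined together.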
And that is how I came up with the expert data set. I went back this week to tally how the experts did on their first attempt versus the final expert data set. Out of the 4,428 classifications, 30 were marked as ‘impossible’ by Ali, 1 was the duiker (which the experts couldn’t get right by using the website), and 101 mistakes were made. That makes for a 97.7% rate of success for the experts. (If you look at last week’s graph, you can see that some of you qualify as experts too!)
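To make the arithmetic behind that 97.7% explicit (assuming, as I did, that the 30 impossible captures and the one duiker are excluded from the scorable total):

```python
total_classifications = 4428
impossible = 30    # marked 'impossible' by Ali
duiker = 1         # couldn't be answered through the website
mistakes = 101

scorable = total_classifications - impossible - duiker   # 4397
success_rate = (scorable - mistakes) / scorable
print(round(100 * success_rate, 1))  # 97.7
```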
Okay, and what did the experts get wrong? About 30% of the mistakes were what I call wildebeest-zebra errors. That is, there are wildebeest and zebra, but someone just marks the wildebeest. Or there are only zebra, and someone marks both wildebeest and zebra. Many of the wildebeest and zebra herd pictures are plain difficult to figure out, especially if animals are in the distance. Another 10% of the mistakes were otherBird errors — either someone marked an otherBird when there wasn’t really one there, or (more commonly) forgot to note an otherBird. About 10% of the time, experts listed an extra animal that wasn’t there. And another 10% of the time, they missed an animal that was there. Some of these were obvious blunders, like missing a giraffe or eland; other times it was more subtle, like a bird or rodent hidden in the grass.
The other 40% of the mistakes were misidentifications of species. I didn’t find any obvious patterns to where the mistakes were; here are the species that were misidentified, along with what they were mistaken for:
wildebeest (6 mistakes): identified as buffalo, hartebeest, elephant, or lionFemale
hartebeest (5 mistakes): identified as gazelleThomsons, impala, topi, or lionFemale
gazelleGrants (4 mistakes): identified as impala, gazelleThomsons, or hartebeest
reedbuck (3 mistakes): identified as dikDik, gazelleThomsons, or impala
Does experience help with identifying Snapshot Serengeti images? I’ve started an analysis to find out.
I’m using the set of about 4,000 expert-classified images for this analysis. I’ve selected all the classifications that were done by logged-in volunteers on the images that had just one species in them. (It’s easier to work with images with just one species.) And I’ve thrown out all the images that experts said were “impossible.” That leaves me with 68,535 classifications for 4,084 images done by 5,096 different logged-in volunteers.
I’ve counted the number of total classifications each volunteer has done and given them a score based on those classifications. And then I’ve averaged the scores for each group of volunteers who did the same number of classifications. And here are the results:
Here we have the number of classifications done on the bottom. Note that the scale is a log scale, which means that higher numbers get grouped closer together. We do this so we can more easily look at all the data on one graph. Also, we expect someone to improve more quickly with each additional classification at lower numbers of classifications.
On the left, we have the average score for each group of volunteers who did that many classifications. So, for example, the group of people who did just one classification in our set had an average score of 78.4% (black square on the graph). The group of people who did two classifications had an average score of 78.5%, and the group of people who did three classifications had an average score of 81.6%.
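The grouping just described can be sketched in a few lines of Python: score each volunteer, then average the scores of everyone who did the same number of classifications. The volunteer numbers below are made up for illustration:

```python
from collections import defaultdict

# (volunteer id, classifications done, number correct) -- invented data
volunteers = [
    ("v1", 1, 1), ("v2", 1, 0),   # did one classification each
    ("v3", 3, 3), ("v4", 3, 2),   # did three each
]

by_count = defaultdict(list)
for _, n_done, n_correct in volunteers:
    by_count[n_done].append(n_correct / n_done)   # this volunteer's score

group_means = {n: sum(s) / len(s) for n, s in by_count.items()}
print({n: round(m, 3) for n, m in group_means.items()})  # {1: 0.5, 3: 0.833}
```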
Overall, the five thousand volunteers got an average score of 88.6% correct (orange dotted line). Not bad, but it’s worth noting that it’s quite a bit lower than the 96.6% that we get if we pool individuals’ answers together with the plurality algorithm.
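The pooling works by taking, for each capture, the answer named by the most volunteers. This is a much-simplified, single-species version of our plurality algorithm, with a hypothetical vote list:

```python
from collections import Counter

def plurality(votes):
    """Return the species named by the most volunteers for one capture."""
    return Counter(votes).most_common(1)[0][0]

# Seven hypothetical volunteers weigh in on one capture.
votes = ["wildebeest", "wildebeest", "zebra", "wildebeest",
         "hartebeest", "zebra", "wildebeest"]
print(plurality(votes))  # wildebeest
```

Pooling helps because individual slips (a missed animal here, a wrong species there) rarely line up across many independent volunteers.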
And we see that, indeed, volunteers who did more classifications tended to get a higher percentage of them correct (blue line). But there’s quite a lot of individual variation. You can see that despite doing 512 classifications in our set, one user had a score of only 81.4% (purple circle). This is a similar rate of success as you might expect for someone doing just 4 classifications! Similarly, it wasn’t the most prolific volunteer who scored the best; instead, the volunteer who did just 96 classifications got 95 correct, for a score of 99.0% (blue circle).
We have to be careful, though, because this set of images was drawn randomly from Season 4, and someone who has just one classification in our set could have already classified hundreds of images before this one. Counting the number of classifications done before the ones in this set will be my task for next time. Then I’ll be able to give a better sense of how the total number of classifications done on Snapshot Serengeti is related to how correct volunteers are. And that will give us a sense of whether people learn to identify animals better as they go along.