The Wrong Answers
Ever since I started looking into the results from Season 4, I’ve been interested in those classifications that are wrong. Now, when I say “wrong,” I really mean the classifications that don’t agree with the majority of volunteers’ classifications. And technically, that doesn’t mean that these classifications are wrong in an absolute sense — it’s possible that two people classified something correctly and ten people classified it wrong, but all happened to classify it wrong the same way. This distinction between disagreement with the majority and wrong in an absolute sense is important, and is something I’m continuing to explore.
But for right now, let’s just talk about those classifications that don’t agree with the majority. To first look at these “wrong” classifications, I created what’s called a heat map.
This map shows all the classifications made in Season 4 for images with just one species in them. (More details on how it’s made at the end, for those who want to know.) The species across the bottom of the map are the “right” answers for each image, and the species along the left side are all the classifications made. Each square represents the number of votes for the species along the left side in an image where the majority voted for the species across the bottom. Darker squares mean more votes.
So, for example, if you find aardvark on the bottom and look at the squares in the column above it, you’ll see that the darkest square corresponds to where there is also aardvark on the left side. This means that for all images in which the majority vote was for aardvark, the most votes went to aardvark, which isn’t any surprise at all. In fact, it’s the reason we see that strong diagonal line from top left to bottom right. But we can also see that in these majority-aardvark images, some people voted for aardwolf, bat-eared fox, dik-dik, hare, striped hyena, and reedbuck.
If we look at the heat map for dark squares other than the diagonal ones, we can see which animals are most likely confused. I’ve circled in red some of the confusions that aren’t too surprising: wildebeest vs. buffalo, Grant’s gazelle vs. Thomson’s gazelle, male lion vs. female lion (probably when only the back part of the animal can be seen), topi vs. hartebeest, hartebeest vs. impala and eland(!), and impala vs. Grant’s and Thomson’s gazelle.
In light blue, I’ve also circled a couple other interesting dark spots: other-birds being confused with buffalo and hartebeest? Unlikely. I think what’s going on here is that there is likely a bird riding along with the large mammal. Not enough people classified the bird for the image to make it into my two-species group, and so we’re left with these extra classifications for a second species.
It’s also interesting to look at the white space. If you look at the column above reptiles, you see all white except for where it matches itself on the diagonal. That means that if the image was of a reptile, everyone got it. There was no confusing reptiles for anything else. Part of this is that there are so few reptile images to get wrong. You can see that wildebeest have been misclassified as everything. I think that has more to do with there being over 17,000 wildebeest images to get wrong, rather than wildebeest being particularly difficult to identify.
What interesting things do you see in this heat map?
(Read on for the nitty gritty or stop here if you’ve had enough.)
I learned last week that some of you enjoy all the little details. So here’s what I did this week: I used last week’s analysis to pull out the 44,471 capture events containing just one species (according to the majority). I then pared this down to just the ones that had gathered enough classifications to have a consensus — something I didn’t do last week that I probably should have. I labeled each classification with the “right” species based on the majority rules I explained last week. I left out any capture events where the answer was “hard to figure out.” That left me with 447,901 classifications from 38,450 capture events. And then I just tallied everything up in a big table, counting how many classifications of species X there were for a capture event that had the majority classification of Y.
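For anyone who wants to try the tallying step themselves, here is a minimal sketch in Python. It is not the analysis I actually ran: the toy data is made up, the function names are just for illustration, and it uses a simple "most votes wins" rule as a stand-in for the majority rules from last week.

```python
from collections import Counter, defaultdict

# Made-up toy data: one (capture_event_id, species) pair per volunteer classification.
classifications = [
    ("c1", "wildebeest"), ("c1", "wildebeest"), ("c1", "buffalo"),
    ("c2", "aardvark"), ("c2", "aardvark"), ("c2", "aardwolf"),
    ("c3", "wildebeest"), ("c3", "wildebeest"), ("c3", "wildebeest"),
]

def majority_species(classifications):
    """Pick the most-voted species for each capture event.

    Assumption: plain plurality, standing in for the real majority rules.
    """
    votes = defaultdict(Counter)
    for capture_id, species in classifications:
        votes[capture_id][species] += 1
    return {cid: counts.most_common(1)[0][0] for cid, counts in votes.items()}

def tally(classifications):
    """Count classifications of species X in capture events whose majority is Y.

    Keys are (X, Y) pairs: X is the left side of the heat map, Y is the bottom.
    """
    majority = majority_species(classifications)
    table = Counter()
    for capture_id, species in classifications:
        table[(species, majority[capture_id])] += 1
    return table

table = tally(classifications)
for (voted, right), count in sorted(table.items()):
    print(f"voted {voted:<10} majority {right:<10} {count}")
```

Each entry in the table is one square of the heat map. The pairs where both names match are the diagonal, and everything else is a disagreement with the majority.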
To make the heat map, I used a spiffy little program called JMP that makes quick data analysis easy. But you could make the same sort of map using open software like R. The first time I made the map, I used the raw number of classifications as the gradient from white-to-black in the heat map. But there are so many wildebeest capture events that they completely swamped everything else out; there was just a black square for wildebeest-wildebeest and a gray one for zebra-zebra, and everything else was white. So I did what scientists often do when confronted with this sort of scale problem: I took the natural log of the number of classifications. This has the effect of making large numbers appear closer to one another and smaller numbers appear to spread out more. That did the trick, and the result is the heat map you see above.
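To see why the log helps, here is a small Python sketch comparing the two ways of turning a count into a shade of gray. The counts are invented for illustration, and the function names are mine. One assumption to flag: I use log(1 + count) here so that a square with zero classifications stays white and a square with one classification is still faintly visible. That is a choice made for this sketch, not necessarily what JMP did.

```python
import math

def linear_shade(count, max_count):
    """Shade from 0 (white) to 1 (black), proportional to the raw count."""
    return count / max_count

def log_shade(count, max_count):
    """Shade from 0 (white) to 1 (black), proportional to the natural log.

    Assumption: uses log(1 + count) so that a count of zero maps to white.
    """
    if count <= 0:
        return 0.0
    return math.log1p(count) / math.log1p(max_count)

# Invented counts: a rare confusion, a common confusion, and the biggest square.
counts = [5, 500, 150000]
biggest = max(counts)

for c in counts:
    print(f"{c:>7}  linear {linear_shade(c, biggest):.4f}  log {log_shade(c, biggest):.4f}")
```

With the raw counts, the small squares get a shade so close to zero that they look white next to the biggest square. With the log, the same small squares land well up the gray scale, which is what lets the off-diagonal confusions show up.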