Tuesday, February 19, 2008

playing a mean harmonica

(yes -- the punny blog titles will just keep getting worse until someone starts posting comments!!)

as promised: part II of my engrossing saga of the two lesser-known cousins of the arithmetic mean. previously, we considered the geometric mean, and a fairly obscure use it might be put to in (slightly) more accurately summarizing population change over time. i'd be interested to hear (comments, anyone?) about any other uses of a biological nature to which it can be put.

so, on to the harmonic mean: S&R show you how to calculate it (p. 44) and if you google "harmonic mean + use" the interwebs will tell you that it might be useful for figuring out how fast you went on average under certain very unnatural driving conditions. evidently it also has some uses in calculating electrical resistance and maybe in petroleum geology as well. but -- we're all biologists... why should we care?

as it turns out, this is a fairly important measure in conservation biology as well, used in calculating effective population size over time. a number of papers and books (including Gotelli and Ellison, 2004, referenced in my previous blog post) outline or advocate for its use in 'averaging' population sizes over time.

as a hypothetical example (modified from Gotelli and Ellison, 2004): over a decade, a population has the following sizes: 986, 1067, 95, 221, 489, 821, 961, 1017, 1039, 1126. obviously something pretty bad happened there in year #3, from which it took several years to recover.

the arithmetic mean population size for the decade is still a pretty high 782.2.


> x = c(986, 1067, 95, 221, 489, 821, 961, 1017, 1039, 1126)
> mean(x)
[1] 782.2


however, most of you will immediately recognize the scenario i've laid out above as a "bottleneck" of the type you learned about in reference to genetic drift. in terms of genetic diversity, such an event has a pronounced negative effect. the harmonic mean, not coincidentally, emphasizes the smaller values in a series, and gives them greater weight:


> 1/mean(1/x)
[1] 414.2493
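
to get a feel for how much of that drop is due to the crash years alone, here is a minimal sketch. note that harmonic_mean is just a name i made up for a helper (base R has no built-in harmonic mean), and the cutoff of 400 is an arbitrary choice that happens to exclude only the two smallest values (95 and 221):

```r
# hypothetical helper: base R has no built-in harmonic mean
harmonic_mean <- function(v) {
  stopifnot(all(v > 0))  # reciprocals only make sense for positive sizes
  1 / mean(1 / v)
}

x <- c(986, 1067, 95, 221, 489, 821, 961, 1017, 1039, 1126)

harmonic_mean(x)            # all ten years, same as 1/mean(1/x)
mean(x[x > 400])            # arithmetic mean without the two crash years
harmonic_mean(x[x > 400])   # harmonic mean without the two crash years
```

without the two crash years the arithmetic mean is 938.25 and the harmonic mean comes out at roughly 880, so the two measures nearly agree. it is the small values that pull the harmonic mean down so hard.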


most of the references that i consulted don't actually provide a citation to the original use of the harmonic mean for this purpose. however, using my amazing sleuthing skills, i managed to trace it back to at least the 1930s (Wright, 1938). i'd be curious if there were any references that pre-date this.

Reference

Wright, S. 1938. Size of population and breeding structure in relation to evolution. Science 87:430-431.

Tuesday, February 12, 2008

don't be mean :)

as promised in class today, a brief discursion into the realm of the 'alternate' means: geometric and harmonic. S&R do a fine job of explaining how to calculate these values, but as to why one might want to -- eh, not so good (imho). also of note is that -- at least according to the index -- S&R only mention geometric means once more (and even then just in passing) and don't seem to bring up harmonic means again at all. i'm not sure why (other than historical inertia) these statistics are almost always introduced, other than maybe to keep students on their toes. perhaps noteworthy is that collectively the arithmetic, geometric, and harmonic means are known as the 'Pythagorean' means.

anyway, there are a couple of very particular circumstances in which you might use one of these in a biological context, and it would be arguably superior to the arithmetic mean. a book which i think does a pretty good job of laying these out is A Primer of Ecological Statistics by N.J. Gotelli and A.M. Ellison (2004), pp. 61-63. (if you're really into this stuff, you're welcome to borrow my copy and read it for yourself).

so here i expand somewhat on one of their hypothetical examples to illustrate the use of geometric mean in summarizing population growth rates: assume an initial population of 1000 individuals and, for simplicity's sake, a growth rate of 10% the first year, increasing by 1% per year up to 20% in the eleventh year. so, after the first year, the population is (1000 * 1.10) = 1100. likewise, in the second year, the population grows by 11% to (1100 * 1.11) = 1221. by the end of the eleventh year, the population reaches 4633.07 (you'll have to forgive the biologically unrealistic fractional individuals).

now, if you wanted to summarize the growth over these eleven years, you'd be tempted to just 'average' them -- that is, take the arithmetic mean of 10%, 11%, 12% ... up to 20%, which -- as you can probably do in your head -- is exactly 15%. in other words, on average, you'd say, there was 15% growth per year for those eleven years. it makes sense, but, as it turns out, it's not quite exactly precisely right: 1000 * 1.15 = 1150 (1st year); 1150 * 1.15 = 1322.5 (2nd year) ... ending with 4652.39, which is almost 20 greater than it should be (4633.07; from previous paragraph).

so, the arithmetic mean overestimates the average growth. as it turns out, you get the right answer if you instead use the geometric mean of the eleven values (1.10, 1.11, 1.12 ... 1.20), which is only a little bit smaller: 1.149565... (as opposed to 1.15).
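
one way to see why this works (my own sketch of the reasoning; the symbols N_0 for the starting population, r_i for the yearly growth factors, and G for their geometric mean are my notation, not taken from S&R or Gotelli and Ellison): growth compounds by multiplication, so the single constant factor that reproduces the final population has to be the eleventh root of the product of the yearly factors, which is exactly the geometric mean.

```latex
N_{11} = N_0 \prod_{i=1}^{11} r_i = N_0 \, G^{11},
\qquad
G = \left( \prod_{i=1}^{11} r_i \right)^{1/11}
  = \exp\!\left( \frac{1}{11} \sum_{i=1}^{11} \ln r_i \right)
```

the last form (the exponential of the mean of the logs) is usually the handiest one to compute.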

as a side-note: "R" doesn't have a built-in function for calculating geometric means, but it's nevertheless fairly easy to do:

> Y = c(1.10, 1.11, 1.12, 1.13, 1.14, 1.15, 1.16, 1.17, 1.18, 1.19, 1.20)
> mean(Y) ## regular old arithmetic mean
[1] 1.15
> exp(mean(log(Y))) ## geometric mean using base "e"
[1] 1.149565
> 10^(mean(log(Y, base=10))) ## same answer in base 10
[1] 1.149565


this blog entry has already turned out much longer than i anticipated, so i'll leave it as an exercise to the reader (if there are any of you left by now) to work through the calculations. given that you haven't yet been introduced to R's 'looping' functions, it would probably make more sense to do the calculations using a spreadsheet. (i know, i know; i warned you away from them for statistical work, but they nevertheless have their uses for quick-and-dirty calculations).

let me know if you're interested in doing this, and i'm happy to help you get started.
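
if you'd rather stay in R than open a spreadsheet, here is one possible starting point. it's only a sketch: it sidesteps explicit loops by using cumprod() (cumulative product), and the variable names (rates, N0, actual, arith, geom) are ones i picked for illustration:

```r
rates <- 1 + (10:20) / 100   # the eleven yearly growth factors, 1.10 to 1.20
N0    <- 1000                # initial population
years <- seq_along(rates)    # 1, 2, ..., 11

# population at the end of each year under the actual, changing rates
actual <- N0 * cumprod(rates)

# population if growth were a constant 15% (the arithmetic mean)
arith <- N0 * mean(rates)^years

# population if growth were constant at the geometric mean rate
geom_rate <- exp(mean(log(rates)))
geom <- N0 * geom_rate^years

# compare the three after eleven years
round(c(actual = actual[11], arithmetic = arith[11], geometric = geom[11]), 2)
```

the actual and geometric columns should both end at the 4633.07 worked out above, while the arithmetic one ends at 4652.39. note that in the intermediate years the constant-rate trajectories do not match the actual one; the geometric mean only guarantees agreement at the end of the whole period.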

i'll pick up with harmonic means in my next entry! (i know you can hardly wait!)

Monday, January 28, 2008

let's blogroll

over on the right side of this page now resides a list ("blogroll") of the blogs of those students who've sent me their addresses so far; this list will grow as more people get on board. although you can click on the "Read More" link at the bottom to go to a page that pulls together everyone's most recent posts (an "aggregator"), it's still worth visiting particular blogs individually, both to see some of the impressive design jobs that your fellow students have done (very artistic!) and to read and participate in the commenting that follows on the various posts.

Thursday, January 24, 2008

*cough* who knew blogs could get so dusty? *wheeze*

well, i'm back. please, no applause. thank you.

hereby i shall resurrect my old blog from last year to serve as a model, inspiration, and touchstone for you, my class, whom i have again tasked with starting and keeping your own blogs, where you will comment on your readings, thinkings, analyses, and general development as statisticians. and probably crack a few corny jokes.

i have kept the links to last year's blogs (over on the right side of the page) for the time being so that -- browsing through them -- you can get a sense of what was attempted by last year's students. some were quite successful.

my goals for this project this time around are twofold: first -- to foster introspection, or, as the educational psychologists call it, metacognition. in short: if you have to think about what you're thinking about, you're likely to get more out of thinking about it. that's the idea, anyways. ymmv.

second -- i want to foster discussion. again, for pedagogical reasons, this has important benefits: it builds a sense of community (which is especially important in a challenging class such as this one), and it gives each of you the opportunity to share what you've figured out. you never really know a subject so well as when you've had to teach it to someone else.

at any rate, even if all that fails, it's still better than quizzes.

and you get to crack jokes. e.g.: "97.3% of all statistics are made up."