multiplying entities unnecessarily: February 2007

Monday, February 26, 2007

G&E CH 5

I like the framework that G&E have laid out in this chapter on the several general approaches to statistical analysis, and I do think it is all worth reading fairly closely. That said, I think the simple example that they use (ant nests in forests and fields) to illustrate the different approaches (an excellent pedagogical choice, IMHO) is telling: Their description of how one would go about implementing their "Monte Carlo" approach is clear, and I expect it would be easy (if tedious) for almost anyone at your level to implement using a spreadsheet. Their description of the standard parametric analysis is -- I think -- a reasonable compromise between overview and detail (which you'll get a little later in the semester); after reading it I think you should have some sense of what F represents in an ANOVA (although not the ability to calculate it yet). As to Bayesian analysis -- I'll keep my opinion to myself for now, but I will prompt you with the following: after reading through this section, ask yourself whether you could begin to put together the approach that you would need to follow in order to repeat the authors' analysis.
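For anyone who would rather script the shuffling than do it in a spreadsheet, here is a minimal sketch of a randomization ("Monte Carlo") test for a difference in group means. It is written in Python purely for illustration, and it is my own sketch rather than G&E's procedure. The nest counts are made-up numbers, not the data from the book, and the function name, the 10,000 shuffles, and the two-sided comparison are all assumptions.

```python
import random
from statistics import mean

def randomization_test(group_a, group_b, n_shuffles=10000, seed=0):
    """Estimate a two-sided p-value for the difference in group means
    by repeatedly shuffling the pooled observations."""
    rng = random.Random(seed)              # fixed seed so the run is repeatable
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)                # reassign observations to groups at random
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed - 1e-12:       # small tolerance for rounding
            at_least_as_extreme += 1
    return observed, at_least_as_extreme / n_shuffles

# Hypothetical nest counts per plot (NOT the values from G&E).
forest = [5, 7, 4, 6, 8, 6]
field = [11, 9, 12, 10]

observed, p_value = randomization_test(forest, field)
print("observed difference in means:", observed)
print("estimated p-value:", p_value)
```

The estimated p-value is simply the fraction of shuffles that produce a difference at least as large as the one actually observed, which is the same logic you would follow by hand in a spreadsheet.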

I do think they do a bit of a disservice to non-parametric statistics, and, given how ubiquitous those methods are, maybe should have spent a bit more time on them. We will, ultimately, come back to some of the more popular of these approaches (e.g., chi-square) in later chapters.

Wednesday, February 14, 2007

JV ch. 4

although this chapter is titled 'multivariate data', most of it is spent filling in the gaps and expanding your understanding of how R deals with data in the form of lists and data frames. [although we haven't really talked about it, you've been using data frames since you first started using attach().] also of note will be the additional practice you will get (and skills you will develop) in making plots. although it may seem insanely hard at first, once you get the hang of it, R will allow you to make some really nice plots with comparatively little effort (at least compared with some other statistical graphing packages that i'm familiar with).

as to what to focus on -- at the beginning of the chapter the author again spends some time showing you how to make various tables, which, as i've indicated before, i think may be something better left to spreadsheets. at least at the beginning.

the end of section 4.1 gives you a nice explanation of high- versus low- level plotting features, and some examples of additional plotting options.

section 4.2 is a tedious but useful (and necessary) breakdown of some of the details of data frames and lists, whereas section 4.3 is, in my opinion, a little on the tangential side. if you're reading along about xtabs(), split(), and stack(), and you're zoning out, don't worry too much. you can come back to these things when you find a problem that necessitates them.

lattice graphics (section 4.4) are pretty cool when your data are appropriate to be shown in this fashion, so this section is worth a read, whereas, possibly with the exception of 'factors', most of section 4.5 can be safely skimmed or skipped at this point (as JV himself indicates).

Monday, February 12, 2007

so i finally found my book...

after searching all day, it turns out my copy of G&E was in the back of the car which my wife had at work...

anyway, by the minute it's becoming too late for me to give you any real guidance on reading ch. 4, so i'm going to suggest something different -- we'll turn the tables and i'm going to ask you to bring a list of a dozen or so things YOU thought were most important about hypothesis testing, and we'll compare notes before we begin going through the chapter together. oh -- and try not to bunch them all up at the beginning of the chapter.

if you want to REALLY impress me, you can assemble your list in the form of questions. :)

Wednesday, February 7, 2007

jv ch. 3

here are my hopefully helpful but nevertheless random thoughts as i read back through JV chapter 3. unfortunately our two texts will begin to diverge for a bit at this point. it was a nice bit of synchronicity that JV chapter 2 and G&E chapter 3 largely overlapped in terms of summary statistics, but, for the next little while, each book takes a bit of a different tack. in a way, i think this is good, because, on their own, i think G&E would be a bit too theoretical, whereas JV would be a bit too pragmatic. i think the two balance each other nicely in this regard, although i do wish the content covered meshed a bit more consistently. looking ahead, we'll return to synchronous treatments of the fairly detailed topics of regression (which we touch on in this chapter) and ANOVA.

imho, some things are better done in spreadsheets, at least until you get the hang of the R way of doing things, so if you find yourself getting bogged down with binding vectors and adding margins in the early part of the chapter, i'd say you can safely skim it, and just be aware that you can do such things. in general, spreadsheets (such as microsoft excel) do this more easily and intuitively, and may be the tool of choice if you wish to do this for a big data set.

i do think it's pretty cool when JV shows you how to produce the side-by-side boxplots and overlapping density plots, and that skill will be useful in the future.

the q-q plots are a bit arcane, and i wouldn't spend too long on them. imho, there are better (if less visual) ways of checking the normality of your data.

scatterplots are particularly important, as are correlation and regression. be aware, though, that we'll be coming back to regression later in the semester. it does make good sense to at least introduce it here, though. i think the short bits on transformations and outliers are worth reading closely, too. though, again, we'll be coming back to them.
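to make correlation and regression a little more concrete, here is a minimal sketch of the arithmetic behind a pearson correlation coefficient and a least-squares line. it is written in python purely for illustration (it is not JV's R code), the function names are my own, and the x and y values are made up.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation: co-variation of x and y, scaled to lie in [-1, 1]."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def least_squares_line(x, y):
    """Slope and intercept of the line that minimizes squared vertical residuals."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx            # the fitted line passes through (mx, my)
    return slope, intercept

# hypothetical paired measurements, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
slope, intercept = least_squares_line(x, y)
print("r =", round(r, 3))
print("fitted line: y =", round(intercept, 2), "+", round(slope, 2), "* x")
```

note that the correlation and the slope share the same numerator; the slope is scaled by the spread in x alone, whereas r is scaled by the spread in both variables.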
