
Thursday, August 25, 2011

How is a species like a baseball player?

Biomass is to runs as species is to player, and as ecologist is to Brad Pitt.

Community ecology and major league baseball have a lot to learn from each other.

Let's back up. As a community ecologist, I think about how species assemble into communities, and the consequences for ecosystems when species disappear. I'm especially interested in using traits of species to address these issues. For the grassland plants that I often work with, the traits are morphological (for example, plant height and leaf thickness), physiological (leaf nitrogen concentration, photosynthetic rate), and life history (timing and mode of reproduction).

As a baseball fan, I spend a lot of time watching baseball. Actually, I'm watching my Red Sox now (multitasking as usual; I freely admit there's a lot of down time in between pitches). I care about how the team does, mostly in terms of beating the Yankees. I'm especially interested in how individual players are doing at any time; for fielders I care about their batting average and defensive skills, and for the pitchers I care about how few runs they allow and how many strikeouts they get.

So my vocation and avocation have some similarities. Both ecology and baseball have changed in the last decade or so to become more focused on 'granular' data at the individual level. In ecology this has been touted as a revolutionary shift in perspective, but it is really a return to the important questions of what roles organisms play in ecosystems and how ecosystems are shaped by the organisms in them. This trait-based approach has shifted the collection and sharing of data on organism morphology, physiology, and life history into warp speed, to the great benefit of quantitatively minded ecologists everywhere.

In baseball, the ability to collate and analyze data on every pitch and every play has led to an explosion of new metrics to evaluate players. One of the simplest of these new metrics, which even the traditionalists in baseball now value, is "on base plus slugging" (OPS, see all the details here). This data-intensive approach to analyzing player performance was most famously championed by the general manager of the Oakland Athletics in the late 1990s, who is played by Brad Pitt in the upcoming movie Moneyball.
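For the curious, OPS really is as simple as it sounds: on-base percentage plus slugging percentage. Here is a minimal Python sketch using the standard definitions of the two components. The function name and the stat line at the bottom are made up for illustration and do not describe any real player.

```python
def ops(singles, doubles, triples, home_runs, walks, hit_by_pitch, at_bats, sac_flies):
    """On-base plus slugging from a season's counting stats."""
    hits = singles + doubles + triples + home_runs
    # On-base percentage: times on base per plate appearance that counts toward OBP.
    obp = (hits + walks + hit_by_pitch) / (at_bats + walks + hit_by_pitch + sac_flies)
    # Slugging percentage: total bases per at bat.
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    slg = total_bases / at_bats
    return obp + slg


# Hypothetical stat line, invented for this example.
example_ops = ops(singles=100, doubles=30, triples=5, home_runs=15,
                  walks=60, hit_by_pitch=5, at_bats=500, sac_flies=5)
print(round(example_ops, 3))
```

Note that OPS adds two ratios with different denominators, which is part of why it is considered a quick-and-dirty metric rather than a rigorous one.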

There is no one ecologist in particular who can claim credit for popularizing trait-based approaches in community ecology, but for the sake of laughs let's make Owen Petchey the Brad Pitt analogue.

What can we do with this analogy? For pure nerd fun, we can think about what these two worlds can learn from each other.

What can baseball learn from community ecology?

One of the most notable trait-centric innovations in community ecology has been the use of functional diversity (FD), which represents how varied the species in a community are in terms of their functional traits. Many flavors of FD exist (one of which was authored by Owen Petchey, above), but the goal is to use one value to summarize the variation in functional traits of species in a community. A high value for a set of communities indicates greater distinctiveness among the community members, and is taken to represent greater niche complementarity.
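To make the idea of "one value per community" concrete, here is a rough Python sketch. It is not Petchey's FD, which is based on the total branch length of a dendrogram built from trait distances. As a simpler stand-in, it uses the mean pairwise distance among species after standardizing each trait across the whole species pool. The species names and trait values are invented for illustration.

```python
from itertools import combinations
from math import dist
from statistics import mean, pstdev


def standardize_pool(pool):
    """Z-score each trait across the whole species pool so units don't dominate."""
    names = list(pool)
    columns = list(zip(*(pool[name] for name in names)))
    centers = [mean(col) for col in columns]
    # A trait with no variation gets a spread of 1.0 to avoid dividing by zero.
    spreads = [pstdev(col) or 1.0 for col in columns]
    return {
        name: [(x - c) / s for x, c, s in zip(pool[name], centers, spreads)]
        for name in names
    }


def mean_pairwise_distance(community, z_traits):
    """Average Euclidean trait distance over all pairs of species in a community."""
    pairs = list(combinations(community, 2))
    if not pairs:
        return 0.0  # a one-species community has no trait variation
    return mean(dist(z_traits[a], z_traits[b]) for a, b in pairs)


# Hypothetical pool: (plant height in cm, leaf thickness in mm).
pool = {
    "tall_grass": (100, 0.20),
    "mid_grass": (90, 0.25),
    "short_forb": (20, 0.60),
    "rosette": (10, 0.70),
}
z = standardize_pool(pool)
similar = mean_pairwise_distance(["tall_grass", "mid_grass"], z)
varied = mean_pairwise_distance(["tall_grass", "rosette"], z)
print(similar < varied)
```

Standardizing across the pool rather than within each community matters here: if each community were rescaled on its own, a set of near-identical species would look just as spread out as a set of very different ones.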

For fun, I've taken stats from a fantastic baseball database[i] and calculated the FD of all baseball teams from 1871 to 2010. I used a select set of batting, fielding, and pitching statistics[ii], and you can see the data here. For the two teams that I pay the most attention to, I plotted their FD against wins, with World Series victories highlighted:

Given that these FD values represent how different the members of a team are, it's surprising that there is much of a pattern at all. But the negative relationship between wins and FD is strong and significant by several measures[iii]. So: the more similar a team is in terms of player statistics, the better the team does!

This pattern of less dissimilarity among players correlating with better performance at the team level has apparently been noticed before, by Stephen Jay Gould, who extended the pattern across teams to explain the gradual shrinking of differences among players over time:

"if general play has improved, with less variation among a group of consistently better players, then disparity among teams should also decrease"

and so:

"As play improves and bell curves march towards right walls, variation must shrink at the right tail." (from "Full House", thanks to Marc for this quote!).

Interesting, but is it useful? One obvious drawback of examining variation in individual performance this way is that it ignores what we already know about baseball: a high number of earned runs allowed is bad for a pitcher, and a low number of hits is bad for a hitter. In contrast, a high value for specific leaf area is neither good nor bad for a plant, just an indication of its nutrient acquisition strategy.

There are many exponentially nerdier ways to apply community ecology tools to baseball data, but I'll spare you those for now!

What can community ecology learn from baseball?

One new baseball stat that gets a lot of attention during trades is 'wins above replacement'. This is such a complicated statistic to calculate that the "simple" definition is that for fielders, you add together wRAA and UZR, while for pitchers it is based off of FIP. I hope that cleared things up.

The point in the end is to say how many wins a player is worth compared to a replacement-level player, meaning the kind of readily available substitute a team could call up from the minors. In ecology, the concept of 'wins above replacement' has at least two analogues.

First, community ecologists have been doing competition experiments since the dawn of time. The goal is to figure out what the effect of a species is at the community level, although fully factorial competition experiments at the community level are challenging to carry out. For example, Weigelt and colleagues showed that competitor plant species can have non-additive effects on a target species, but that the competitors could still be ranked by the strength of their effect. This result allowed them to predict the effect of adding or removing a competitor species from a mixture, in a roughly similar way to how a general manager would want to know how a trade would change his or her team's performance.

Second, ecologists have shown that both niche complementarity and a 'sampling effect' are responsible for driving the positive relationship between biodiversity and ecosystem functioning. The sampling effect refers to the increasing chance of including a particularly influential species when the number of species increases. Large-scale experiments in grasslands have been carried out where plants are grown in monoculture and in many combinations of up to 60 species. The use of the monocultures allows an analysis similar in spirit to 'wins above replacement', by testing how much the presence of a particular species, versus the number of species, alters the community performance.
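One widely used way to do this kind of accounting is the additive partition of Loreau and Hector (2001), which splits the extra yield of a mixture into a complementarity effect and a selection effect (their name for something close to the sampling effect). I'm not claiming this is the analysis used in the experiments above; the Python sketch below just shows the arithmetic, assuming each species was planted in equal proportion, and the yields are invented.

```python
def additive_partition(mono_yields, mix_yields):
    """Split a mixture's net biodiversity effect into complementarity and selection.

    mono_yields[i]: yield of species i grown alone (its monoculture).
    mix_yields[i]:  yield of species i within the mixture.
    Assumes equal planting proportions, so expected relative yield is 1/N.
    """
    n = len(mono_yields)
    expected_ry = 1.0 / n
    # Deviation of each species' relative yield from what its planting share predicts.
    delta_ry = [mix / mono - expected_ry for mix, mono in zip(mix_yields, mono_yields)]
    mean_delta_ry = sum(delta_ry) / n
    mean_mono = sum(mono_yields) / n
    # Population covariance (divide by N), as the partition requires.
    covariance = sum(
        (d - mean_delta_ry) * (m - mean_mono) for d, m in zip(delta_ry, mono_yields)
    ) / n
    complementarity = n * mean_delta_ry * mean_mono
    selection = n * covariance
    # Net effect: observed mixture yield minus the yield expected from monocultures.
    net = sum(mix_yields) - expected_ry * sum(mono_yields)
    return complementarity, selection, net


# Hypothetical two-species plot, yields in g per square meter.
comp, sel, net = additive_partition(mono_yields=[400, 200], mix_yields=[300, 120])
print(comp, sel, net)
```

A positive selection effect means the species that do well alone also gain the most in mixture, which is the ecological version of a team being carried by its star.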

We could take this analogy further, and think of communities more like teams. A restoration ecologist might calculate 'wins above replacement' for all the species in a set of communities, and then create All Star communities from the top performers.

Lessons learned

A. Shockingly, there are baseball nerds, and there are ecology nerds, and there are even double-whammy basebology nerds.

B. There are quantitative approaches to analyzing individual performance in these crazily disparate realms which might be useful to each other.

C. I might need to spend more time writing papers and less time geeking out about baseball!

More analogies to consider:

Reciprocal transplants: trades?

Trophic levels: minor league system?

Nitrogen fertilization: steroids?


[i] One of the most astonishing databases around: complete downloadable stats for every player since 1871. This database is what NEON should aspire to be, except that this one was compiled completely privately by some single-minded and visionary baseball geeks!

[ii] Batting: Hits, at bats, runs batted in, stolen bases, walks, home runs

Fielding: Putouts, assists, errors, zone rating

Pitching: Earned run average, home runs allowed, walks, strikeouts.

[iii] E.g. even after taking into account other more typical measures of success in offense (runs, R) and defense (runs allowed, RA), within years, there is still a negative slope for FD on wins:

lme(win ~ R + RA + FD, random = ~1|yearID, data = team)

              Value   Std.Err    DF   t-value   p-value
(Intercept)  80.289    0.7411  2159     108.3    <0.001
R             0.107    0.0009  2159     116.8    <0.001
RA           -0.105    0.0009  2159    -115.6    <0.001
FD           -1.729    0.8083  2159      -2.1    0.0325