Dear Dave,
First off, my apologies for being slow to return my comments to you. I've spent a lot of
time over the past months playing with the parts of the Neptune data that I have, and so
your paper felt very relevant and timely to me—and I needed a bit of time to absorb the
wealth of ideas in it (and the shock to the system of some of it!). I clearly did not have a
complete understanding of the nature of the Neptune data, and I'm very grateful that you
have written this paper to clarify things for the community at large.
I wasn't sure how much general vs. specific feedback you were after, so I'll start with the
extremes! On the most specific front, I am returning the PDF you sent me with
annotations of typos and other small (e.g. wording) suggestions I came across. The
Preview app doesn't have the greatest annotation tools, but (in case you haven't used
them before) if you choose the menu item View -> Sidebar -> Annotations, you'll see a
list of them so you can quickly find them.
Jumping right from the specific to the general, if I could make just one overall critical
comment about the paper it would be that I think it sometimes paints a more pessimistic
view of the record than I would put forward. One of my favorite sentences in the paper
is: "It may be time to acknowledge a broader view of life and establish the evolution of
unicellular organisms as one of the 'main' subjects of paleobiologic research." My
impression upon first reading was that the focus in the paper on the—clearly very real
and important—shortcomings of the microfossil record left it looking less attractive "as
one of the 'main' subjects of paleobiology" next to the rest of the fossil record. Now, I am
well aware that this may well just be based in ignorance and naiveté on my part, and
because I know you have many more decades of experience with deep sea
microfossils, I did think long and hard about my comments, and I've tried to provide
below specific instances where I think the paper might benefit from highlighting the
advantages that working with microfossil data provides over the kind of macrofossil stuff
that populates the PBDB, for example. I keep comparing Neptune to PBDB in my mind
throughout your paper, as I think that's the implicit comparison (though you don't
explicitly say so), since that's where so much of the macroevolutionary paleobiological
research focus has been in recent years.
Pages 1-3 read well; I really like the distinction you set up here between the marine
animal record and the microfossil record in terms of proportion of high-level groups
preserved vs. proportion of species preserved.
On page 5 you refer to table X. I thought it might be helpful to provide absolute numbers
(and references) for the preservable species, rather than just percentages.
At the bottom of page 6 you make the point that it's possible that unpreservable
microfossil taxa evolved and went extinct again. While I agree it's completely possible, it
doesn't strike me as a significant worry—and it certainly isn't something that
distinguishes the microfossil record from any other sorts of fossil record.
The same thing I think applies to the points you make on page 7. Regarding
time-averaging: While it's true that you do often get bed-level ecological structure of
animals preserved on the shelves in obrution deposits (and I think it's a fair distinction to
make that these sorts of deposits are absent from the microfossil record) most deposits
of marine invertebrates are just as time-averaged, if not more so. (E.g. Kidwell, S.M.,
1998, Time-averaging in the marine fossil record: overview of strategies and
uncertainties: Geobios, v. 30, p. 977–995; or a recent book chapter by Kowalewski and
Bambach, The Limits of Paleontological Resolution, in Topics in Geobiology, 2008,
Volume 21, 1-48. The latter places shelly shelf invertebrate assemblage time-averaging
at 10s to 100,000s of years.) Regarding the recognition of hiatuses: again, I think you're
absolutely right to point out the major difference between deep-sea and shallow-sea
sediments, in that the former rarely, if ever, show variations in lithology. However, I'm not
sure this necessarily makes the recognition of hiatuses much easier in shallower-water
sediments… I don't have any good references at hand to back this up, but I'm thinking
e.g. of the "coordinated stasis" debate in the 90s, where what had been previously
identified as pulsed turnover events turned out to be stratigraphic artifacts due to
hiatuses/sequence boundaries (maybe a Holland paper?). Even changes in lithology
can represent either rather little or very much time; and similarly, you do sometimes get
very significant hiatuses within homogeneous lithologies in shelf sediments, too.
I found your subsection "Incomplete Data" particularly interesting. From what
you've written, my understanding is that most faunal lists in Neptune consist of a "model
A" component, some more or less randomly chosen taxa, and a "model B" component,
those taxa on a pre-determined list of biostratigraphically relevant taxa also found to be
present in the sample. There are a couple of places where I'm not entirely clear on what
you're saying, though. On page 11, you write that "the differences in the average
reported diversity per sample/study simply reflect the average practical size of a
taxonomic list, and do not have a necessary relationship to actual real sample diversity."
Now does this mean that each study has a different taxonomic list, and that's what
determines list length—more so than underlying diversity?
If so, this should be a fairly easy prediction to test (and would back up what you're
saying/be an interesting thing to add to the paper if you have time). If the Neptune
database has publication information (I imagine it does), you should be able to parse
the data by time bins as well as by publication, and see if the variability is better
described by what time bin the taxon is in or by what publication they're from. If you
wanted to get statistical, it seems like testing model support for the two predictors with
something like Akaike weights would be a good way to go.
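In case it's useful, here is a rough sketch of the comparison I have in mind, in Python. The column names (list_length, time_bin, publication) are placeholders I've made up for whatever Neptune actually stores, and the data below are invented purely to show the mechanics:

```python
# Sketch: is list length better predicted by publication identity or by
# time bin? All column names and numbers below are invented.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "publication": rng.integers(0, 20, n),  # stand-in study IDs
    "time_bin": rng.integers(0, 15, n),     # stand-in 1-Myr bins
})
# A toy world in which publication, not age, drives list length:
df["list_length"] = 10 + 2 * df["publication"] + rng.poisson(3, n)

m_pub = smf.ols("list_length ~ C(publication)", data=df).fit()
m_bin = smf.ols("list_length ~ C(time_bin)", data=df).fit()

# Akaike weights: relative support for each model, from the AIC scores
aic = np.array([m_pub.aic, m_bin.aic])
w = np.exp(-0.5 * (aic - aic.min()))
w /= w.sum()
print(dict(zip(["publication", "time_bin"], w.round(3))))
```

On the real data you would of course read the occurrence table out of Neptune instead of simulating it; the point is just that two one-factor models plus Akaike weights would settle the question quickly.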
In the next paragraph, I got confused. You write that "data collected under model C will
generally show a good correlation between sample availability and total diversity, but
this is due to the strong correlation, at least in deep-sea drilling material, between
taxonomic effort (and thus total reported diversity) and sample availability". What I take
this to mean is that in stratigraphic sections with more samples, the observed diversity
goes up, but not for the reason we might think (i.e. pushing up along a collector
curve, the more things you look at, the more different kinds of things you see), but
rather because sample availability is correlated with taxonomic effort—meaning there
are longer (biostratigraphic) taxonomic lists used to check presence/absence than in
stratigraphic sections with fewer samples. If I've understood that right (and I'm not sure I
have!) it seems to me that this reduces down to essentially the same thing as a collector
curve, albeit via the detour of constructing a taxonomic list: the more diverse-seeming
assemblages seem thus because they have longer "model B" lists, not because they've
had more random samples taken. But the reason they have longer taxonomic lists is
because there is more "sample availability", as you put it, which I think means… they
have been more extensively—(randomly?!)—sampled.
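If it helps to see the baseline expectation, the collector-curve intuition fits in a few lines of Python (the pool of 200 taxa and the sample sizes are invented):

```python
# Collector curve: the more occurrences you examine from the same pool,
# the more distinct taxa you see. Everything here is invented.
import numpy as np

rng = np.random.default_rng(2)
pool = rng.integers(0, 200, 5000)  # 5000 occurrences drawn from 200 taxa
for n in (50, 200, 1000, 5000):
    seen = rng.choice(pool, size=n, replace=False)
    print(f"{n:5d} occurrences -> {len(np.unique(seen)):3d} taxa seen")
```

The question for Neptune is whether the rise happens via random draws or via the detour of longer "model B" lists.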
At the bottom of page 11 you refer to Figure 8, which I found a little hard to read. It was
not clear to me what the inset vs. the main plot are, and whether they show the same
data. If they do, it strikes me that the main plot (i.e. not the inset one) must be missing
some data, because the largest number of samples in which a taxon appears on that
plot is <100, whereas several percent of the taxa in the inset plot appear in
100-600 samples. Maybe you could indicate that some data is left out if this is true;
either way, it might be helpful to label and describe in the legend what the two plots
show. It would also be helpful to note on the plot (or in the legend) how many taxa there
are in total. I also couldn't quite follow the calculation you make in the figure legend.
What I thought I understood it to say was that—given what is known about preservation
and species durations—you would expect each taxon to appear in more samples than it
does, lending support to the idea that most taxa are undersampled (because of the
predominance of "model B" data collection). But I couldn't understand why you expected
the mode in plot 8a to be 300-400 taxa—from my reading of the plot (the y-axis is the
number of rad taxa which appear in the x-axis' number of samples, right?) a higher
value for the mode would imply that there are even more taxa that appear in only a few
samples, which isn't what I'd expect to see. But maybe you actually mean "mean", not
mode—but isn't the y-average on the histogram just a function of the total number of
taxa, i.e. it won't change with the shape of the distribution? (Since the bar heights have
to sum to the total number of taxa, their average is just that total divided by the number
of bins, whatever shape the histogram takes.) Maybe I'm being stupid here and you're
talking about the x-average, although it just doesn't look like the main plot has an
average anywhere near 100, since that's the maximum value… I think I may be
missing the point here.
A more general observation here with regard to the characteristics of the existing
microfossil dataset is that, while what you describe is undoubtedly true and possibly
quite problematic for data analysis, I'm not sure this really distinguishes the data from
the macrofossil record in any deep way. While the published studies that go into PBDB
are probably rarely as systematic in recording taxa as the ODP/DSDP reports (go
through the "model B" list, then add a few random "model A" taxa), I don't think the
record as a whole is all that different. In some sense the vast majority of workers in the
macrofossil record are following some sort of "model B" type list-checking, in many
cases for the same reasons as it's done in Neptune (report ammonites X, Y, and Z,
following the taxon list for Upper Jurassic index fossils, or report trilobites X and Y which
tells us the formation is of Ordovician stage Z); other studies will focus in on one very
specific group (turritellid gastropods, say, or some specific group of vertebrates) and
describe everything they find in that group, ignoring everything else—also essentially a
"model B" sort of process. Then there's some "model A" type of papers, monographs
etc. describing more exhaustively what's found, (but often with a distinct and
problematic bias for new things, rather than occurrences of common or
already-described taxa). Either way, separate additions of "model A" and "model B" lists
I think would lead to a record that looks fairly similar to a record consisting of "model C"
lists.
My main point here is not to suggest for a minute that the incompleteness of the
Neptune data isn't real or problematic, but that the situation is no better in the
macrofossil record, and I think that bears mentioning.
It might be worth mentioning in your section on reworking that this sort of long-timescale
process is known to occur in the macrofossil record, too, although I understand it's
referred to as remanié there, and it isn't known how common it is (though I agree most
'reworking' for macrofossil people is on the 1,000s-100,000s of years timescale). Useful
references might be Craig, G.Y. 1966. Concepts in Palaeoecology, Earth-Science Rev.
2:127-155, and for an example of multi-myr reworking, Flessa, K.W. Time-averaging
and temporal resolution in Recent marine shelly faunas, Paleo Soc Short Course #6.
At the risk of sounding like a broken record… It struck me when reading the section
"Age Model Problems" that, again, this isn't something that's unique to Neptune data,
and that the salient characteristic distinguishing dating of deep-sea paleontological data
is that it's much more tightly constrained than for the macrofossil world. Alroy's time bins
in the PBDB publications are 10 my long for a reason—so if we consider our age model
errors to typically screw us up by 1 my, as you suggest, then we can be completely safe
and happy with 2 my time bins in Neptune for macroevolutionary studies… that's still
five times better. Also, if this source of error (unlike reworking) is unbiased, it shouldn't
matter much for macroevolutionary studies. As long as it affects samples equally, and
more or less evenly through time, we should be OK as long as the signal we're trying to
see is strong enough, and I'd hope that for at least some of the most important
paleobiological questions that can be addressed with this data, it would be.
On page 15 you suggest that the scarcity of most taxa in Neptune (the low modal
number of samples in figure 8a) suggests the use of range-through methods for
determining species ranges. You mention that range-through methods are susceptible
to range-extension errors, but I think it's also worth noting that the macroevolutionary
paleobiology field (i.e. the post-Sepkoski, PBDB crowd) has largely abandoned
range-through for studies of diversity, for (I think) a different set of reasons.
Firstly, range-through leads to fairly ugly edge effects. If you imagine a hypothetical
diversity history that has constant diversity, with some constant non-zero turnover rate,
and constant but imperfect preservation, you would get a convex diversity curve—while
you'd be able to range through things in the middle, you can't (by definition) range
through the beginning or end points of your time series. If you compare your fig. 10 plots
for in-bin vs range-through Neptune data, I think you can see this problem in action
(there's a toy simulation of this below, too).
Secondly, range-through ignores the many biases, mostly related to uneven sampling
(in the broadest sense) through time. This goes back to Raup's 1976 criticism of the
earliest Sepkoski curves (here:
http://www.cornellcollege.edu/geology/courses/Greenstein/paleo/raup76.pdf), and I
think this is what ultimately spurred the development of the occurrence-level databases,
because they allowed for a correction of unequal sampling. [How well these corrections
actually work is another matter, and they all have their own biases, but range-through
ignores the problem altogether].
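Here is the toy simulation I mentioned for the edge effect: constant true diversity, constant turnover, imperfect preservation, and a range-through tally. It's Python, and every parameter is invented:

```python
# Toy simulation: constant diversity + turnover + imperfect sampling
# produces a range-through curve that sags at both ends of the window.
import numpy as np

rng = np.random.default_rng(1)
n_taxa, n_bins, p = 2000, 30, 0.3          # p = per-bin preservation prob.
births = rng.uniform(-25, n_bins, n_taxa)  # origins, some before the window
deaths = births + rng.exponential(5.0, n_taxa)  # true durations, ~5 bins

rt = np.zeros(n_bins, dtype=int)
bins = np.arange(n_bins)
for b, d in zip(births, deaths):
    alive = (b < bins + 1) & (bins < d)        # truly extant in each bin
    found = alive & (rng.random(n_bins) < p)   # imperfectly sampled
    if found.any():
        first = found.argmax()
        last = n_bins - 1 - found[::-1].argmax()
        rt[first:last + 1] += 1                # range-through: fill the gaps
print(rt)  # counts are visibly depressed in the first and last few bins
```

Taxa near the middle of the window can be ranged through from both sides; taxa near the edges can only be counted from their first (or up to their last) actual find, so the curve droops even though true diversity is flat.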
Also starting on page 15, you go through the example of comparing the range-through
Neptune diversity of forams in the 5-6 Ma bin to the Plankrange list. While I think this is a
really interesting exercise, there are a number of reasons why I would be more hesitant to
interpret the results as a 65/140 species error rate in Neptune. [As a side note before I
say what those reasons are, I found it hard to keep the numbers of different categories
of taxa (in Plankrange, but misplaced; in Plankrange, in right bin; etc) in my head, so I
found myself scribbling down a little table as I was reading through this section. I still
didn't understand how you ended up with 65 "valid" taxa and 40 to 50 "invalid" taxa (that
makes 105-110, neither the total number of 140 you cite for range-through, nor the 102
that are actually in the bin). I think it would make it easier to read if the numbers were
also in a little table rather than only described in the main text. Ideally, I'd like to see a
big table listing all of the species in both the databases, lined up in adjacent columns so
you can see which ones match in which categories—maybe in the supplementary info, if
the journal has it. I think that would make it much easier to see what's going on.]
Firstly, I think it's conceptually problematic to use the Plankrange diversity as an
indicator of "known", or true, foram diversity, against which to measure Neptune. From
what I can see, that database is compiled overwhelmingly from biostratigraphic
publications, suffering from precisely the "model B" bias that you identified as being so
pernicious above. Each of those publications is, again, only going to record the minimal
set of stratigraphically useful taxa to build a zonation. Neptune, which by your
description consists of "model C" data, thus at least has some component of "model A"
data in addition to the "model B" data; so a priori I would expect Neptune to capture a greater
proportion of the true diversity on the slides than Plankrange. So the 30 taxa you
couldn't match to Plankrange at all, for example, could all be valid taxa that just haven't
found their way into biostratigraphic schemes. I don't know if this is reasonable or not,
but if it is, you could now turn around and rather than seeing this as 30/140 erroneous
taxa in Neptune, see 30/71 taxa missing from Plankrange.
Secondly, you do acknowledge that some of the Neptune occurrences in the 5-6 Ma bin
could be legitimate, non-erroneous range extensions, but you suggest that most of them
are there as a result of reworking or age model errors. Maybe I'm just being hopelessly
optimistic here for Neptune, but rather than an assurance that "several of these … were
examined", I'd prefer to see a table with each of those taxa and a more convincing
assessment of whether they're really displaced or just new, real data. And I only say this
because I was very convinced (as well as troubled) by your description on page 12
(under "Reworking") of the difficulty of establishing the range of a species in the first
place—the trap of circular reasoning looms large here, to my mind.
In the "Solutions" section, I did find the promise of having an essentially perfect fossil
record as an attainable research goal quite inspiring. I would like to see a bit more
calculation—totally back of the envelope—of how much work you think that would entail.
Can it be done by one micropaleontologist in five years? By ten in ten years? Fifty
people in twenty years? Or, perhaps more appropriately, how many (wo-)man-hours
would it take? If it's not reasonable to guess for the other groups, I'd love to see that
number at least for rads. That said, I'm not convinced that, simply because the data we
have are imperfect and perfect data are within a generation's reach, we can't do any
macroevolutionary studies with Neptune data at all. Perhaps that was not the intended
implication?
In the "Analytic methods" section (p. 17-19), I had a few more comments.
1) I'm not sure that the boundary-crosser method would eliminate the range extension
problem, since it essentially tallies range-through diversity at a boundary, i.e. it
eliminates singletons and taxa that may not have coexisted in a bin—erroneous range
extensions will still inflate boundary-crosser counts if they lie on the other side of a
boundary from the true LAD of the taxon (see the little sketch after this list).
2) I understand your logic here with regard to subsampling—randomly removing
occurrences is likely to trim off those relatively rarer instances of erroneous range
extension. However, I think this will work much better when subsampling is done by
occurrence (i.e. by classical rarefaction); when subsampling by list, as is the case with
most of the techniques Rabosky & Sorhannus used (UW, OW, O2W)—if the errors are
evenly distributed among lists—you may actually still be stuck with most of the problem,
because you won't be able to throw out the few erroneous occurrences without also
throwing out all of the valid occurrences that are on the same list. However, that's a bit
beside the point. My main issue here is that this paragraph seems to imply that
subsampling is carried out in order to remove these erroneous data points, which I think
is not accurate. As I understand it, the approach of Rabosky & Sorhannus was to try to
standardize for the exponential increase in the number of occurrences throughout the
Cenozoic, and thus to attempt to remove a potentially very powerful bias in sampling
intensity from the diversity data—not to trim off erroneous occurrences.
3) As regards losing diversity of rare taxa by subsampling strongly, this is undoubtedly
true—but I think advocates of sampling standardization would argue that it's necessary
in order to compare apples to apples. And while it's also true that the resulting
diversity curves give only relative changes in diversity, rather than absolute ones, I think
the Alroys of the world would again argue that it's better to have a reliable relative
diversity curve than an absolute one that is subject to strong sampling bias.
4) I also disagree with your wording that "resampling does not actually identify the
sources of error, i.e., the samples in the database from which the problematic data
come". It's true that it doesn't address the errors you outline in the iRAT section, but it
does address a source of error you describe on page 9 and in figure 6, namely the
uneven intensity of sampling with time. I think you're right that it's insufficient to
subsample without considering the RAT sources of error, but subsampling does
address a real source of error.
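To illustrate the point in (1) with a toy example (ages in Ma, and all the numbers are invented):

```python
# One erroneous occurrence on the far side of a boundary turns a
# non-crosser into a boundary-crosser. Ages in Ma; numbers invented.
def crosses(occurrence_ages, boundary):
    """Observed FAD older and observed LAD younger than the boundary."""
    return max(occurrence_ages) > boundary > min(occurrence_ages)

true_occs = [7.2, 6.4, 5.8]             # true LAD ~5.8 Ma
print(crosses(true_occs, 5.0))          # False: not a 5 Ma boundary-crosser
print(crosses(true_occs + [4.6], 5.0))  # True, thanks to one reworked find
```

The boundary-crosser tally is only as good as the observed FADs and LADs, and those are exactly what erroneous range extensions corrupt.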
I really like the pacman method! I'm very pleased the paper builds up to such a practical
and positive end. To sum it all up, I think it's tremendously helpful to have such a
comprehensive summary of the nature of the Neptune data, the sources of error, and a
method of addressing those sources of error. The one source of error you don't really
talk about much beyond the mention in figure 6 and on page 9 is the uneven sampling
intensity, and I do think it's one that's worth addressing in addition to the RAT sources of
error. What occurred to me upon reading your paper was that a combination of data
filtering by a method like pacman (to address RAT error) followed by some form of
subsampling (to address uneven sampling) would be a comprehensive approach that
would address both classes of error. But that's just a thought.
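In case it's clearer in code, here is roughly the pipeline I'm picturing. This is purely a sketch in Python: it assumes a hypothetical occurrence table with taxon and age (Ma) columns, and the trim fractions and occurrence quota are placeholder numbers of mine, not anything from your paper:

```python
# Sketch of the combined pipeline: pacman-style trimming of each taxon's
# oldest and youngest occurrences (against RAT errors), followed by
# occurrence-level rarefaction within time bins (against uneven sampling).
import numpy as np
import pandas as pd

def pacman_trim(occ, top=0.05, bottom=0.05):
    """Drop the oldest `top` and youngest `bottom` fraction of each
    taxon's occurrences. Ages in Ma, so larger age = older."""
    def trim(g):
        oldest = g["age"].quantile(1 - top)   # cut occurrences above this
        youngest = g["age"].quantile(bottom)  # cut occurrences below this
        return g[(g["age"] <= oldest) & (g["age"] >= youngest)]
    return occ.groupby("taxon", group_keys=False).apply(trim)

def rarefied_diversity(occ, bin_width=1.0, quota=200, reps=50, seed=0):
    """Mean taxon count among `quota` occurrences drawn per time bin."""
    rng = np.random.default_rng(seed)
    out = {}
    for b, g in occ.groupby((occ["age"] // bin_width).astype(int)):
        if len(g) < quota:
            continue  # bin too thinly sampled to meet the quota
        draws = [g.sample(quota, random_state=int(rng.integers(2**31)))
                 ["taxon"].nunique() for _ in range(reps)]
        out[b * bin_width] = np.mean(draws)
    return pd.Series(out).sort_index()

# On a hypothetical occurrence table with columns taxon, age:
# occ = pd.read_csv("neptune_occurrences.csv")
# curve = rarefied_diversity(pacman_trim(occ))
```

The order matters, I think: trimming first means the rarefaction quota isn't spent on occurrences that pacman would have discarded anyway.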
Alright. My apologies again for having been so slow to get this back to you, but it really
got me thinking and I wanted to do it right. I hope some of my comments are useful. I
look forward to seeing the paper in print! What journal is it going to?
All the best,
- Ben.