Is the past recoverable from the data? Pseudoproxy modelling of uncertainties in palaeoecological data

Ecology
Modelling
Virtual Ecology
Proxy System Modelling
Uncertainties

This paper is about constructing a model that generates patterns of ecological change using population growth equations with underlying dynamics such as thresholds. We show how such a model can be used to assess uncertainties in palaeoecological data using a Virtual Ecological approach. The idea is to generate patterns with the same statistical properties that we see in empirical data, not to recreate a given ecosystem (i.e., this is a phenomenological model). Why do this? We have no experimental manipulations or direct observations over the long time-scales necessary to understand ecosystem-level change, and even the most highly resolved palaeoecological data from laminated sediment cores contain many uncertainties from environmental processes, sampling, and lab processing. In empirical palaeoecology, we have no ‘known truth’ against which to assess our inferences. By using a virtual ecological approach, we can generate data under ‘known’ conditions.

In reality, we have imperfect knowledge of a perfect world. In simulation, we have perfect knowledge of an imperfect representation of the world.

Why is this all important? Palaeoecological data provide insight into past ecological and climatic change. This source of data is invaluable for understanding ecological change and potential future ecosystem states. However, the data are highly uncertain, relying on many layers of processing such as laboratory preparation, radiocarbon dating, and age-depth modelling. The data are often inconsistently spaced through time and may have large time gaps between observations. Such sources of uncertainty can, at best, mask ‘true’ signals of change or, at worst, lead to false inferences. One way we can assess the influence of uncertainties on statistical inference is through simulation.

If we start with our perfect knowledge of a simulated world, we can also simulate observations from that world, in the same way that we would sample from the real world. This gives us a benchmark dataset, the original simulated data, and a reduced dataset, the sampled simulated data.

Here is my simulated world (Figure 1). Each species on the right (Figure 1 d) is simulated by a population growth equation. The growth rate of each species is its combined response to the two environmental drivers (Figure 1 b). Its optima and tolerances for each driver (Figure 1 c) determine whether the population growth rate of a species is positive or negative. If the environment is unfavourable, a species’ population growth rate becomes negative and it goes locally extinct. Species have the chance to re-establish if conditions become favourable again.
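To make the mechanics concrete, here is a minimal sketch of this kind of niche-driven growth model in Python. All names and parameter values (growth rates, extinction and favourability thresholds, carrying capacity, driver shapes) are illustrative assumptions, not the values or equations used in the actual model:

```python
import numpy as np

rng = np.random.default_rng(42)
n_species, n_drivers, n_steps = 200, 2, 5000

# Hypothetical niche parameters: an optimum and a tolerance per driver
optima = rng.uniform(0.0, 1.0, size=(n_species, n_drivers))
tolerances = rng.uniform(0.05, 0.3, size=(n_species, n_drivers))

# Illustrative drivers: a slow linear trend and a noisy constant
drivers = np.column_stack([
    np.linspace(0.0, 1.0, n_steps) + rng.normal(0.0, 0.02, n_steps),
    0.5 + rng.normal(0.0, 0.02, n_steps),
])

def growth_rate(env, r_max=0.2, favourability=0.5):
    """Gaussian niche response combined across drivers: growth is
    positive near a species' optima, negative when unfavourable."""
    suitability = np.exp(-((env - optima) ** 2) / (2 * tolerances ** 2))
    return r_max * (suitability.prod(axis=1) - favourability)

K = 100.0  # shared carrying capacity (placeholder)
abundance = np.zeros((n_steps, n_species))
abundance[0] = rng.uniform(0.0, 1.0, n_species)

for t in range(1, n_steps):
    r = growth_rate(drivers[t])
    N = abundance[t - 1].copy()
    N += r * N * (1.0 - N.sum() / K)    # density-dependent growth
    N[N < 1e-3] = 0.0                   # local extinction threshold
    N[(N == 0.0) & (r > 0.0)] = 1e-2    # re-establishment when favourable
    abundance[t] = N
```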

Figure 1: Visualisation of a simulated core sample: the accumulation rate, time-span and length of the core (a); driver conditions over time (b); the niche of each species with respect to the driving environment (c); and the abundances of pseudoproxies in the archive (d). Each species’ niche comprises an optimum and tolerance for each driver; the optima and tolerance curves in (c) are colour-coded to match each driver in (b). In the context of the proxy system model framework, environmental drivers (b) act on a sensor (c; here the sensor’s response is the populations’ growth rate changing in response to the environmental drivers), and the response of the sensor is recorded in an archive (d). The simulation runs from past to present, the first time-step being the oldest.

The number of species in Figure 1 is only a fraction of those simulated. There are actually around 200 potential species, with about 15-40 in existence at any given point in the simulation (depending on the replicate). I simulated a bunch of different driving environments (e.g., one where the environmental driver undergoes an abrupt shift). In the example above (Figure 1), the ‘slow linear’ driver eventually drives species that existed at the beginning of the simulation locally extinct (unless they are very generalist species!), while new species establish and thrive.
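As a sketch, each scenario only changes the shape of the forcing series fed into the model above. The shapes below are guesses at the kind of scenarios described, not the paper’s exact forcing series:

```python
t = np.arange(n_steps)

# Hypothetical scenario shapes, to be swapped in as a driver column
scenarios = {
    "slow linear": t / n_steps,
    "abrupt shift": np.where(t < n_steps // 2, 0.2, 0.8),
    "no trend": np.full(n_steps, 0.5),
}
```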

We then take those simulated species abundances (Figure 1 d) (remember, these are our ‘perfect’ benchmark data) and degrade them with mixing (i.e., a physical process that alters the core), sub-sampling (i.e., slicing the core into 1 cm segments at given intervals for processing), and proxy counting (i.e., counting species under a microscope from the 1 cm segments after lab processing). Figure 2 shows how we start with absolute abundances of species and recreate the observational process, ultimately resulting in data that look a lot more like real proxy data.
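Continuing the Python sketch above, the degradation pipeline might be written as below. The window size, sub-sampling interval, and count total match the figure captions, but the implementations themselves are simplified stand-ins for the paper’s actual routines:

```python
def mix(abund, window=10):
    """Uniform mixing: moving average of each species over `window`
    adjacent time-steps (a crude stand-in for sediment mixing)."""
    kernel = np.ones(window) / window
    return np.apply_along_axis(
        lambda col: np.convolve(col, kernel, mode="same"), 0, abund)

def sub_sample(abund, interval=10, thickness=1):
    """Slice the core every `interval` steps, averaging over
    `thickness` steps (the 1 cm segments)."""
    starts = np.arange(0, len(abund), interval)
    return np.stack([abund[i:i + thickness].mean(axis=0) for i in starts])

def count_proxies(abund, n=400, rng=np.random.default_rng(1)):
    """Proxy counting: draw `n` individuals per sub-sample in proportion
    to relative abundance (assumes every sub-sample is non-empty)."""
    p = abund / abund.sum(axis=1, keepdims=True)
    return np.stack([rng.multinomial(n, row) for row in p])

observed = count_proxies(sub_sample(mix(abundance)))
```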

Figure 2: Sample model output showing two species, visualising data from the error-free reference (a) and one treatment level from each: mixing (b); mixing combined with sub-sampling (c); and mixing combined with sub-sampling and proxy counting (d). Mixing over 10 time-steps is shown as a magnified section of species 56 displaying the error-free reference transitioning to the mixed data (b). The mixed data are then sub-sampled every 10 cm, with a thickness of 1 cm (c) and the sub-sampled data are counted at a resolution of 400 individuals per sub-sample (d). Red dashed lines indicate disturbance events that occur randomly throughout the simulation.

This is what a larger subset of the simulated species looks like before and after the virtual degradation and sub-sampling process (Figure 3).

Figure 3: Visualisation of the environmental drivers, the ‘error-free’ benchmark pseudoproxy archive, and pseudoproxies mixed over 10 time-steps, sub-sampled at 10 cm intervals, and counted at a resolution of 400 individuals per sub-sample. A subset of 10 species from the pool of 200 is shown. The ‘error-free’ record shows the absolute abundances per time-step and represents ‘perfect’ knowledge of the system through time; the degraded and sampled record is converted to relative abundances and represents observations comparable to empirical proxy records. Accumulation rate and depth are not pictured in favour of a more complete species record. Note that the y-axis is in simulation time; the most recent section of the core is time-step 5000.

Finally, we analyse both sets of data. In the simulation, the three sources of uncertainty (mixing, sub-sampling, and proxy counting) are applied individually, and in combination, at increasing levels of severity. So we end up with 1210 datasets (per simulation replicate!) ranging from the ‘error-free’ benchmark to the most degraded. We analyse them all! But here is an example of the benchmark and one level of degradation (Figure 4).
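To sketch how such a treatment set can be enumerated (the level values below are made up for illustration; the paper’s actual levels are what produce the 1210 combinations):

```python
from itertools import product

# Hypothetical treatment levels; `None` means the treatment is not
# applied, so (None, None, None) is the error-free benchmark.
mixing_levels = [None, 2, 5, 10, 20]        # time-steps mixed over
subsampling_levels = [None, 5, 10, 20]      # cm between 1 cm slices
counting_levels = [None, 100, 400, 1000]    # individuals per sub-sample

treatments = list(product(mixing_levels, subsampling_levels, counting_levels))
print(len(treatments))  # each combination yields one dataset per replicate
```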

Figure 4: Fisher Information (FI) (a) and principal curves (PrCs) (b) of both the ‘error-free’ pseudoproxies and the degraded and sub-sampled data for Scenario 1. The blue highlighted region in the FI indicates a ~700-year period of reversed directionality. The highlighted regions in the PrCs indicate two periods of different rates of change in the PrC of the degraded and sub-sampled data. The same regions are highlighted in the ‘error-free’ analysis, where no apparent change is visible. Red dashed lines indicate disturbance events for both FI and PrCs.

This figure shows two different multivariate community analyses (Fisher Information and principal curves), and how the results change between the ‘error-free’ data and the degraded and sub-sampled dataset. At the start, I mentioned that uncertainty can lead to incorrect inference. This example is of only one randomly selected model replicate, but it shows how interpretations can be influenced by uncertainties. The 700-year period of reversed directionality in Fisher Information (Figure 4 a) can change our inference from a system increasing in stability to one decreasing in stability (a decrease in Fisher Information is interpreted as a system becoming less predictable and less stable). The principal curve changing from a gradual increase (a good representation of change driven by the system’s primary driver) to a sigmoidal shape (Figure 4 b) can alter our interpretation of the ‘true’ rate of species turnover in the system. This example demonstrates well how a Virtual Ecological approach can be used to assess uncertainty and its influence on statistical inference.
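The analyses themselves (Fisher Information, principal curves) have their own machinery, but the comparison logic is simple: run the same method on the benchmark and on the degraded data, then look for diverging trajectories. As a simplified stand-in for a principal curve (not the paper’s method), here are first-axis PCA scores of Hellinger-transformed abundances, computed on both datasets from the sketches above:

```python
from sklearn.decomposition import PCA

def trajectory(abund):
    """Crude stand-in for a principal curve: scores on the first principal
    component of Hellinger-transformed (square-root relative) abundances."""
    rel = abund / abund.sum(axis=1, keepdims=True)
    return PCA(n_components=1).fit_transform(np.sqrt(rel)).ravel()

benchmark = trajectory(abundance)  # 5000 'error-free' time-steps
degraded = trajectory(observed)    # one sub-sample per 10 cm slice
# After aligning sub-samples to their source time-steps, places where the
# two trajectories diverge are where degradation could change inference.
```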