Optimal sampling and problematic likelihood functions in a simple population model

D.E. Pagendam and P.K. Pollett


Abstract: Markov chains provide excellent statistical models for studying many natural phenomena that evolve with time. One particular class of continuous-time Markov chain, called birth-death processes, can be used for modelling population dynamics in fields such as ecology and microbiology. The challenge for the practitioner when fitting these models is to take measurements of a population size over time in order to estimate the model parameters, such as per capita birth and death rates. In many biological contexts, it is impractical to follow the fate of each individual in a population continuously in time, so the researcher is often limited to a fixed number of measurements of population size over the duration of the study. We show that for a simple birth-death process, with positive Malthusian growth rate, subject to common practical constraints (such as the number of samples and timeframes), there is an optimal schedule for measuring the population size that minimises the expected confidence region of the parameter estimates. This type of experimental design results in a more efficient use of experimental resources, which is often an important consideration.
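To make the sampling problem concrete, the following is a minimal sketch of how a simple birth-death process might be simulated and observed at a fixed set of times. It uses the standard event-by-event (Gillespie-style) simulation of a continuous-time Markov chain. The function name, parameter values and the two schedules are assumptions chosen for illustration; in particular, the unequally spaced schedule is arbitrary and is not the optimal design discussed in the paper.

```python
import random

def simulate_birth_death(n0, birth, death, sample_times, rng):
    """Simulate a simple (linear) birth-death process with per capita rates
    `birth` and `death`, and record the population size at each time in
    `sample_times` (assumed sorted in increasing order)."""
    t, n = 0.0, n0
    observed = []
    for s in sample_times:
        # Advance the chain until the next event would fall after time s.
        while n > 0:
            wait = rng.expovariate((birth + death) * n)
            if t + wait > s:
                break
            t += wait
            # Birth with probability birth / (birth + death), else death.
            n += 1 if rng.random() < birth / (birth + death) else -1
        # Exponential waiting times are memoryless, so the clock can be
        # restarted at the sampling time without biasing the process.
        t = s
        observed.append(n)
    return observed

# Hypothetical rates with positive growth (birth > death).
rng = random.Random(1)
equidistant = [1.0, 2.0, 3.0, 4.0, 5.0]
unequal = [0.5, 1.0, 2.0, 3.5, 5.0]  # arbitrary, for comparison only
print(simulate_birth_death(20, 1.0, 0.6, equidistant, rng))
print(simulate_birth_death(20, 1.0, 0.6, unequal, rng))
```

Only the population sizes at the sampling times are returned, which mirrors the situation described above: the experimenter sees a fixed number of counts rather than the full history of births and deaths.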

Throughout our exposition of the optimal experimental design, we compare it to a simpler equidistant design, where the population is sampled at regular intervals. The equidistant design is a worthwhile comparison because it is much simpler to implement in practice. We acknowledge that practitioners are likely to prefer the simplest possible design, and we therefore focus on the conditions under which the optimal design is expected to be particularly beneficial over the simpler alternative. We also examine the exact manner in which the optimal design acts to minimise the area of the confidence region compared to the alternative design.

In order to find optimal experimental designs for our population model, we make use of a combination of statistical tools. Firstly, we use a Gaussian diffusion approximation of the underlying discrete-state Markov process, which allows us to obtain analytical expressions for Fisher's information matrix (FIM), which is crucial to optimising the experimental design. We also make use of the Cross-Entropy method of stochastic optimisation to maximise the determinant of the FIM and so obtain the optimal experimental designs. Our results show that the optimal schedule devised by Becker and Kersting (1983) for a simple model of population growth without death can be extended, for large populations, to the two-parameter model that incorporates both birth and death.
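The Cross-Entropy method mentioned above can be sketched generically as follows: candidates are drawn from a sampling distribution, the best-scoring ones are kept, and the distribution is refitted to them. This is a minimal illustration under stated assumptions, not the implementation used in the paper. The independent Gaussian sampling distribution, the tuning constants and the toy objective are all hypothetical; in the paper's setting the objective would be the determinant of the FIM as a function of the sampling times, whose expressions are not reproduced here.

```python
import random
import statistics

def cross_entropy_maximise(objective, mean, std, rng,
                           n_samples=200, n_elite=20, n_iter=60):
    """Maximise objective(x) over real vectors x with the cross-entropy
    method, using independent Gaussian sampling distributions."""
    mean, std = list(mean), list(std)
    for _ in range(n_iter):
        # 1. Draw candidates from the current sampling distribution.
        candidates = [[rng.gauss(m, s) for m, s in zip(mean, std)]
                      for _ in range(n_samples)]
        # 2. Keep the best-scoring ("elite") candidates.
        candidates.sort(key=objective, reverse=True)
        elite = candidates[:n_elite]
        # 3. Refit the sampling distribution to the elite set.
        for j in range(len(mean)):
            column = [x[j] for x in elite]
            mean[j] = statistics.fmean(column)
            std[j] = statistics.pstdev(column) + 1e-12  # keep std positive
    return mean

def toy_objective(x):
    """Hypothetical stand-in for det(FIM); its maximum is at (1, -2)."""
    return -((x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2)

best = cross_entropy_maximise(toy_objective, [0.0, 0.0], [5.0, 5.0],
                              random.Random(0))
print(best)
```

A design problem would additionally need the candidate sampling times to be kept ordered and inside the study period, for example by sorting and clipping each candidate before scoring it.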

Population models have a history of creating problematic likelihood functions with high levels of dependence between model parameters (see Givens and Poole (2002)). For the simple birth-death process, we find that the likelihood surface is also problematic: it poses serious difficulties both for point estimation and for defining confidence regions. There is also a very high level of correlation between the estimates of the birth and death rates, indicated by the likelihood surface having a long, narrow ridge that falls away steeply on either side.

We use simulation to examine the practical benefits of the optimal design over an equidistant design. Unless the period of time over which the population is observed is very long, the optimal design is only likely to provide a significant efficiency gain when the number of samples is relatively small. However, we find that, in general, confidence regions cannot be assumed to have elliptical contours. We therefore base our confidence regions on the asymptotic chi-square distribution of the generalised likelihood ratio, which restricts the region to the appropriate domain. Whilst confidence regions of this type may be poor due to the atypical nature of the likelihood surface, we utilise these regions for comparative purposes, since they reflect the contours of the surface. We find that our optimal design remains optimal even though the likelihood contours are not elliptical. We suggest that Bayesian inference with an informative prior probability distribution could overcome the problems associated with the likelihood surface, aiding point estimation and the understanding of parameter uncertainty for the model.
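A likelihood-ratio confidence region of the kind described above can be sketched on a grid of parameter values: a point is retained when twice the drop in log-likelihood from the maximum does not exceed the chi-square quantile with two degrees of freedom. The sketch below rests on several assumptions that are not taken from the paper. Each transition is treated as normal with the exact conditional mean and variance of the linear birth-death process, the grid is restricted to birth rate greater than death rate, and the counts are made up for illustration.

```python
import math

def log_likelihood(birth, death, times, counts, n0):
    """Approximate log-likelihood: each transition is treated as normal with
    the exact conditional mean and variance of the linear birth-death
    process. Requires birth > death >= 0 and positive counts."""
    r = birth - death
    total, prev_t, prev_n = 0.0, 0.0, n0
    for t, n in zip(times, counts):
        growth = math.exp(r * (t - prev_t))
        mean = prev_n * growth
        var = prev_n * (birth + death) / r * growth * (growth - 1.0)
        total += -0.5 * (math.log(2.0 * math.pi * var) + (n - mean) ** 2 / var)
        prev_t, prev_n = t, n
    return total

def lr_confidence_region(times, counts, n0, grid, level=0.95):
    """Return the grid points (birth, death) with birth > death that lie
    inside the likelihood-ratio region, together with the grid maximiser."""
    # The chi-square quantile with 2 degrees of freedom has a closed form.
    threshold = -2.0 * math.log(1.0 - level)
    scored = [((b, d), log_likelihood(b, d, times, counts, n0))
              for b in grid for d in grid if b > d]
    best_point, best_ll = max(scored, key=lambda item: item[1])
    region = [p for p, ll in scored if 2.0 * (best_ll - ll) <= threshold]
    return region, best_point

# Made-up counts, purely for illustration.
times = [1.0, 2.0, 3.0, 4.0]
counts = [61, 72, 90, 110]
grid = [0.05 * i for i in range(41)]  # rates from 0 to 2 in steps of 0.05
region, best = lr_confidence_region(times, counts, 50, grid)
print(best, len(region))
```

Because membership is decided directly from the likelihood values, the retained points follow the contours of the surface whatever their shape, and restricting the grid keeps the region inside the admissible parameter domain.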

Keywords: Markov chain; optimal design; sampling; birth-death process; population model; Bayesian inference

Acknowledgement: This work was funded by the Australian Research Council.


Last modified: 31 July 2007