Corresponding author: James Benjamin Grace
Academic editor: Gbenga Akomolafe
In this paper, we consider the problem of how to quantitatively characterise the degree to which a study object exhibits a generalised response. By generalised response, we mean a multivariate response in which numerous individual properties change in concert because of some internal integration. In latent variable structural equation modelling (LVSEM), we would typically approach this situation by using a latent variable to represent a general property of interest (e.g. performance) and multiple observed indicator variables that reflect the specific features associated with that general property. While ecologists have used LVSEM in a number of cases, there is substantial potential for its wider application. One obstacle is that LV models can be complex and easily over-specified, which degrades their value as a means of generalisation. It can also be challenging to diagnose the causes of misspecification and to understand which model modifications are sensible. In this paper, we present a protocol, consisting of a series of questions, designed to guide researchers through the evaluation process. These questions address: (1) theoretical development, (2) data requirements, (3) whether responses to perturbation are general, (4) unique reactions by individual measures and (5) how far generality can be extended. To illustrate the protocol, we draw on a recent study that considered how maintaining biodiversity as part of agricultural management might affect the overall quality of grapes used for wine-making. We extend our presentation to include the complexities that arise when there are multiple species with unique reactions.
The quest for generalisation in the ecological sciences is a fundamental challenge. One way that a general reaction by a system or organism can be detected is if there is a multivariate response where numerous individual properties change in concerted fashion. While such concerted reactions are often described using standard multivariate statistical analyses, causal investigations of the nature of integrated multivariate responses fall primarily into the purview of latent variable structural equation modelling (LVSEM,
Fig.
The hypothesis represented in Fig.
Fig.
Numerous complexities can be encountered when analysing models containing latent variables with multiple indicators. The available literature may be inadequate to help a beginning SEM user navigate the various diagnostics and decisions that such models require. Our primary objective in this paper is to provide a series of questions that can guide the investigator through the process. Our advice is targeted at the general objective outlined in Fig.
There exist many technical descriptions of the analytical machinery used to implement LVSEM. Here, we provide a non-technical summary and refer the reader to
Fig.
The classical approach to implementing SEM involves the analysis of covariances. For this, the rows of raw data are converted into a square variance-covariance matrix. Hypothesised models represent a set of expectations about the patterns of covariances that should be found in the data. Typically, covariance modelling estimates the parameters of the causal diagram via maximum likelihood, subject to the causal relationships specified in the causal graph. Covariance SEM also produces a statistic that summarises the differences between the observed covariances and those implied by the model structure, and uses it to test the null hypothesis that the observed and model-implied covariances are equal, except for random sampling variation. Failure to reject this null hypothesis indicates that the data are consistent with the assumed causal structure; on its own, it does not prove that structure correct.
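A minimal base-R sketch of the fit statistic just described may help make it concrete. It assumes multivariate normal data and maximum likelihood estimation; the helper `ml_chisq()` is hypothetical (not part of the paper or of any package). It computes the standard ML discrepancy between an observed covariance matrix and a model-implied one, then converts it to a chi-square test of exact fit. The usage lines borrow the four grape-quality covariances from Table 3 and compare them with two trivial "models": one that reproduces them exactly and the independence model, which sets every covariance to zero.

```r
# Hypothetical helper (not from the paper): ML discrepancy between an observed
# covariance matrix S and a model-implied matrix Sigma, plus the chi-square
# test of exact fit. Assumes multivariate normal data.
ml_chisq <- function(S, Sigma, n, df, scale = n - 1) {
  p <- nrow(S)
  # F_ML = log|Sigma| + tr(S Sigma^-1) - log|S| - p; equals 0 when Sigma == S
  f_ml <- log(det(Sigma)) + sum(diag(S %*% solve(Sigma))) - log(det(S)) - p
  # Classical scaling uses (n - 1); some software (e.g. lavaan by default) uses n
  chisq <- scale * f_ml
  list(F = f_ml, chisq = chisq,
       p.value = pchisq(chisq, df = df, lower.tail = FALSE))
}

# Covariances of N, Sugars, Tart and Malic from Table 3 (n = 50)
S <- matrix(c( 2.602, -1.187,  1.038,  1.270,
              -1.187,  1.896, -0.781, -0.726,
               1.038, -0.781,  2.536,  0.559,
               1.270, -0.726,  0.559,  1.688), nrow = 4, byrow = TRUE)

# A model that reproduces S exactly: zero discrepancy, exact fit not rejected
perfect <- ml_chisq(S, S, n = 50, df = 2)

# Independence model: only the 4 variances are free, so df = 10 - 4 = 6
indep <- ml_chisq(S, diag(diag(S)), n = 50, df = 6)
print(c(chisq = indep$chisq, p = indep$p.value))
```

Software multiplies the discrepancy by either n - 1 or n depending on its likelihood convention, which is why `scale` is exposed as an argument rather than hard-coded.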
For LVSEM, model structure is described using equations representing the relationships between latent variables and their indicators and equations describing relationships amongst latent variables (Suppl. material
The overall study objectives are summarised in Fig.
We learn about latent variables indirectly. More specifically, we learn about them through theorising and empirical investigations, rather than direct measurement. It is important, therefore, that we consider the theoretical meaning of constructs carefully and explicitly. Most ecologists are accustomed to using descriptive procedures, such as principal components analysis (PCA), when faced with a set of related measurements. PCA seeks to reduce a set of variables to some smaller number of composite variables (aka components) that contain most of the information in the set. PCA is purely a data-reduction method and there is no basis for drawing causal interpretations of the resulting components (
With LVSEM, we might pose a hypothesis, such as the one shown in Fig.
In thinking about our theoretical constructs, it is of fundamental importance whether we think the concept is unidimensional (behaves as if it is one thing) or multidimensional (behaves like a collection of different things). Taken literally (as estimation software will take it), the hypothesis being evaluated in Fig.
There are many other possibilities that might be supported by theory. The most common alternative is that a theoretical “construct” or concept may be a collection of independent or semi-independent causes. The details that accompany this situation are beyond our purpose in this paper and the reader is referred to
When a general property of a study system is of interest, the previous question about expected attributes deserves careful consideration when designing the sampling scheme. This is one of the interesting differences between scientific practice in the social sciences and in the ecological sciences. In the social sciences, particularly when studies involve human behaviour, the default assumption is that the latent properties are of primary interest. Studies may involve human attitudes and motivations, which are assumed from the outset to be “deeply latent” and only discernible indirectly. This has led to a formal process for developing proper measures of the constructs of interest. For example, the American Psychological Association's Dictionary of Psychology (
“The process of creating a new instrument [a set of specific measurements] for measuring an unobserved or latent construct, such as depression, sociability, or fourth-grade mathematics ability. The process includes defining the construct and test specifications, generating items and response scales, piloting the items in a large sample, conducting analyses to fine-tune the measure, and then readministering the refined measure to develop norms (if applicable) and to assess aspects of reliability and validity.”
Our purpose here is to raise awareness that other scientific disciplines have developed substantial methodologies that could be of interest to natural scientists but that have been systematically ignored, to the detriment of our scientific studies. It is beyond the scope of the present paper to consider this body of knowledge in detail, though our presentation illustrates the requirements a set of indicators is expected to meet if it is to represent a theoretical construct. For a more general introduction to scale development, one can refer to
A latent variable SE model can be developed with one or more indicator measurements. Having only a single measure provides limited opportunities. The most commonly adopted approach is simply to assume that the measured variable is a perfect representation of the latent property. The main accomplishment of such a model is a conceptual distinction between the concept of interest and the observed measure. When we have an estimate of the reliability (repeatability) of a measurement process, we can insert that information into our model and remove bias due to measurement error. With two or more indicators, it becomes possible to evaluate whether a latent cause is present. This is the situation we address in the current paper.
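The reliability correction for a single indicator is commonly implemented by fixing the indicator's loading to 1 and its error variance to (1 - reliability) × observed variance, so that only the repeatable part of the variance is attributed to the latent variable. The sketch below shows that arithmetic in base R and writes the corresponding lavaan-style syntax as a string. The reliability of 0.8, the variance of 2.5 and the names `SoilN`/`Soil_N` are hypothetical values chosen only for illustration, and both helpers are hypothetical.

```r
# Hypothetical helper: error variance to fix for a single-indicator latent variable.
# reliability : estimated proportion of observed variance that is repeatable
# obs_var     : observed variance of the indicator
fixed_error_variance <- function(reliability, obs_var) {
  stopifnot(reliability > 0, reliability <= 1, obs_var > 0)
  (1 - reliability) * obs_var
}

# Hypothetical helper: lavaan-style syntax with the loading fixed at 1
# and the indicator's error variance fixed at 'err'
single_indicator_syntax <- function(latent, indicator, err) {
  paste0(latent, " =~ 1*", indicator, "\n",
         indicator, " ~~ ", format(err), "*", indicator)
}

# Illustrative values only (not from the study)
err <- fixed_error_variance(reliability = 0.8, obs_var = 2.5)  # approximately 0.5
syntax <- single_indicator_syntax("SoilN", "Soil_N", err)
cat(syntax, "\n")
```

With reliability set to 1 the error variance becomes 0, which corresponds to the "perfect representation" assumption described above.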
It is one thing to conceptualise a set of observed variables as reflections of a concept of interest, but it is another thing for the data to agree with one’s conceptualisation. A simple first approach to this problem is to construct a correlation matrix to see if the patterns of correlations amongst indicator variables are roughly consistent with theoretical expectations. For this exercise, we focus on the sub-model shown in Fig.
Fig.
Anyone who starts working with LVSEM soon finds that, aside from error correlations, there are many ways in which data may deviate from equal correlation strengths amongst indicators; some of these are suggested in Fig.
It is customary in SEM practice to analyse latent variable models in two stages, first evaluating the fit between latent variables and indicators (Fig.
Table 3 presents the code used to conduct a CFA examination of the model shown in Fig.
Tables of results for all models run in the paper are provided in Suppl. material
Examination of results focuses initially on overall model fit (Suppl. material
Results show strong support for our initial model (Table S2.1). A test statistic (Model Chi-square) value of 0.808 with an associated
Having assessed the global model fit, we turn attention to the parameter estimates (Table S2.1). Again, we do not treat
The complexity of SE models and the variety of inferences we typically wish to make lead us to move through the evaluation of our overall hypothesis in stages. It is important to keep in mind that conclusions one might draw, based on the analysis of sub-models, may need to be reconsidered once the full model is examined. Having examined the latent response sub-model, we now move to a pair of competing models shown in Fig.
In Fig.
Results for the initial model (Fig.
As illustrated in Fig.
Our second question, represented in Fig.
Since SE models are used as explanatory representations of scientists' understanding of systems (
Regarding our example, we next examine individual parameter estimates to determine whether model LVmed2 can be simplified (Table S2.4).
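When a model is simplified by dropping a path, or by constraining parameters to be equal across groups, the simpler model is nested within the more complex one, and the two can be compared with a chi-square difference test. A minimal base-R sketch follows; `chisq_diff()` is a hypothetical helper, the input values are placeholders rather than results from this study, and the test as written assumes ML estimation with normally distributed data (robust estimators need a scaled difference test). For fitted lavaan models, `lavTestLRT()` (also reachable through `anova()`) performs this comparison.

```r
# Hypothetical helper: chi-square difference (likelihood-ratio) test
# for two nested models fitted by ML to the same data.
chisq_diff <- function(chisq_simple, df_simple, chisq_complex, df_complex) {
  d_chisq <- chisq_simple - chisq_complex  # increase in misfit from the simplification
  d_df    <- df_simple - df_complex        # number of constraints added
  stopifnot(d_df > 0)
  c(d_chisq = d_chisq, d_df = d_df,
    p.value = pchisq(d_chisq, df = d_df, lower.tail = FALSE))
}

# Placeholder values: dropping one path raises chi-square by 3.2 on 1 df
res <- chisq_diff(chisq_simple = 5.0, df_simple = 4,
                  chisq_complex = 1.8, df_complex = 3)
# A large p-value suggests the dropped path is not needed to reproduce the data
print(res)
```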
LVSEM has the capacity to formally evaluate parameter equality across groups. Referred to as
If multigroup models are specified without constraints, all parameters will, by default, be estimated independently for each group. One way to set equality constraints across groups is to add labels to the code. With two groups, for example, the format c("label1", "label2") assigns a separate name to a parameter in each group. Because the labels are unique, this specification still produces two independent parameter estimates, one for each group. If we specify c("lambda1", "lambda1"), the repeated use of a common label means a single value will be estimated for both groups (Table
Using the approach in Table
It is important to be able to judge whether a system exhibits a generalised multivariate response to environmental change rather than an independent collection of uncoordinated responses. This paper presents an approach to addressing that question. A particular aspect of the approach demonstrated is that it invokes causal reasoning. We ask if suites of observed properties behave as if they are jointly influenced by a “hidden hand” or integrative cause.
Studying generalised responses is inherently challenging. Our objective is to focus attention on the general, while moving the specifics to the background, at least initially. The sequence of operations described supports a “general first, specifics second” perspective. Ultimately, SEM forces us to address both. Along the way, we must confront the large number of possible explanations that can exist for the actual functioning of the system being studied. This complexity means one cannot take a rigid approach but must instead follow clues along a path towards a final model to use for interpretation. We suggest a series of questions that can guide investigators through several critical steps in model evaluation. We also recognise that research context matters, so the list may need to be modified for particular applications.
Success in applying a flexible, adaptive approach requires a solid understanding of how the analytical system ‘thinks’ about things. Within LVSEM, latent variables represent the common variance or overlapping information for a set of measures. They represent, in essence, the consensus opinion about the latent factor that functions as their common causal connection. There will, of course, be unique information associated with the individual measures, particularly if they are selected to represent multiple facets of a theoretical construct. Our core challenge is to capture the general opinions of the data without becoming overly distracted by the unique responses.
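To make the "common variance" idea concrete: in a single-factor model with standardised loadings λ_i, the latent variable implies a correlation of λ_i λ_j between indicators i and j, explains λ_i² of each indicator's variance (its communality) and leaves 1 - λ_i² as unique variance. The base-R sketch below computes these quantities for hypothetical loadings chosen only for illustration; they are not estimates from the study.

```r
# Hypothetical standardised loadings for four indicators of one latent variable
lambda <- c(N = 0.80, Sugars = -0.65, Tart = 0.50, Malic = 0.75)

# Correlations implied by a single common cause: r_ij = lambda_i * lambda_j
implied_R <- outer(lambda, lambda)
diag(implied_R) <- 1            # standardised indicators have unit variance

communality <- lambda^2         # variance shared through the latent variable
uniqueness  <- 1 - lambda^2     # variance specific to each indicator (incl. error)

print(round(implied_R, 2))
print(round(rbind(communality, uniqueness), 2))
```

Indicators with large unique variances carry information the latent variable does not summarise, which is where the individual reactions discussed above reside.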
Fig.
A number of mysteries are exposed in our multigroup model (Fig.
We hope this paper demonstrates how to approach the use of LVSEM to investigate multivariate responses and hints at the variety of scientific insights that can be gleaned from the effort. We believe there is an important opportunity for LVSEM to play a greater role in our quantitative understanding of ecological responses to environmental change.
We thank two anonymous reviewers for helpful comments and suggestions. This work was supported by the USGS Ecosystems and Land Change Science Climate Research and Development Programs. Any use of trade, firm or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government.
Structural equation meta-model representing the general modelling goal. Note that a meta-model is a generalisation that defines a finite set of possible fully-specified models. Dotted outlines are used to convey that the entities represented are general concepts rather than specific variables.
Structural equation model representing the hypothesis that observed intercorrelations amongst response indicator variables (1 – 4) can be explained by a common cause, the generalised response. In contrast to Fig.
Example SE model for the study of a general response to a specific perturbation and the role of a specific hypothesised mediator variable. See text for a discussion of notation.
Meta-model for the ecological example referenced in this paper (modified from
(A) The initial hypothesis evaluated by
(A) The Net Effect Model. (B) The Mediated Effect Model.
Standardised results for the final full model showing both groups.
Concepts related to the structural equation meta-model in Figure 4 and their relationships to measured variables (from
Concept | Measured variable(s) | Rationale |
Management intensity | Intensity is a three-level index {1,2,3}. | The primary purpose of management is to reduce competitive effects of non-crop plants on grape plants. It is assumed that competition primarily acts through reductions in soil water and nutrients, but other forms of interference are also possible. |
Non-crop vegetation properties | Plant species richness (numbers), | One possibility we wished to consider was a general beneficial effect of plant richness on grape qualities due to complementarity. |
Soil nitrogen | Total soil N content (%) | We considered it possible that variations in total soil N might help explain variations in grape N. Such an effect might or might not be indirectly related to management intensity. |
Grape qualities | Nitrogen concentration | We measured a suite of standard grape chemical parameters of importance for wine-making. While all of these parameters determine the character of wine, N concentration is perhaps of primary concern because of its critical role in the fermentation process (Bell and Henschke 2005). |
Correlations amongst simulated indicators of Grape Qualities.
Indicator | Nitrogen | Sugars | Tartaric Acid | Malic Acid |
Nitrogen | 1.00 | | | |
Sugars | -0.53 | 1.00 | | |
Tartaric Acid | 0.40 | -0.36 | 1.00 | |
Malic Acid | 0.61 | -0.41 | 0.27 | 1.00 |
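The correlations above can be recovered from the covariances used in the R code for the Latent Response Model (Table 3) through r_ij = s_ij / sqrt(s_ii s_jj), which is a useful check that the two tables describe the same simulated data. The sketch also computes the three tetrad differences (r12 r34 - r13 r24 and so on); under a single common cause with uncorrelated errors these are expected to be zero apart from sampling error, so they offer a quick, informal screen before any model is fitted. The tetrad screen is an added illustration, not a step of the protocol described in this paper.

```r
# Covariances for N, Sugars, Tart and Malic, from the lower triangle
# used in the R code for the Latent Response Model (Table 3)
vars <- c("N", "Sugars", "Tart", "Malic")
S <- matrix(c( 2.602, -1.187,  1.038,  1.270,
              -1.187,  1.896, -0.781, -0.726,
               1.038, -0.781,  2.536,  0.559,
               1.270, -0.726,  0.559,  1.688),
            nrow = 4, byrow = TRUE, dimnames = list(vars, vars))

# Standardise: r_ij = s_ij / sqrt(s_ii * s_jj)
R <- cov2cor(S)
print(round(R, 2))   # compare with the correlation table above

# Tetrad differences: near zero if one common cause with uncorrelated
# errors accounts for all four indicators (note t3 = t1 + t2)
tetrads <- c(
  t1 = R[1, 2] * R[3, 4] - R[1, 3] * R[2, 4],
  t2 = R[1, 3] * R[2, 4] - R[1, 4] * R[2, 3],
  t3 = R[1, 2] * R[3, 4] - R[1, 4] * R[2, 3]
)
print(round(tetrads, 3))
```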
R code for the Latent Response Model (Fig.
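library(lavaan)
# Lower triangle (with diagonal) of the covariance matrix for
# N, Sugars, Tart, Malic, Nfixers and Mgt
input.cov <- '
 2.602
-1.187  1.896
 1.038 -0.781  2.536
 1.270 -0.726  0.559  1.688
-0.592  0.451  0.147 -0.219  1.670
 0.821 -0.364 -0.455  0.578 -0.864  1.366 '
# Convert the lower triangle into a full, named covariance matrix
cov.dat <- getCov(input.cov, names = c("N", "Sugars", "Tart", "Malic", "Nfixers", "Mgt"))
# Latent Response Model: one latent variable reflected in four indicators
cfa1 <- 'GrapeQual =~ lambda1*N + lambda2*Sugars + lambda3*Tart + lambda4*Malic '
# Fit to the summary covariances (n = 50); the model uses four of the six variables
cfa1.fit <- sem(cfa1, sample.cov = cov.dat, sample.nobs = 50)
summary(cfa1.fit, fit.measures = TRUE, standardized = TRUE)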
R code for examining the Net Effect Model (Fig.
##
LVNet1 <- '
LVNet1.fit <- sem(LVNet1, sample.cov = cov.dat, sample.nobs = 50)
show(LVNet1.fit); fitMeasures(LVNet1.fit, "cfi")
##
LVNet2 <- '
R code for examining the Mediated Effect Model (Fig.
###
LVmed1 <- '
###
LVmed2 <- '
Example R code for multigroup analysis of full model.
##
mg.mod0 <- '
##
mg.mod1 <- '
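The label-based equality constraints described in the text can also be generated programmatically. The sketch below builds lavaan-style measurement syntax in which each loading receives either one label per group (estimated separately) or a repeated label (constrained equal). `group_loading_syntax()` is a hypothetical helper; the indicator names follow the grape-quality example, two groups are assumed, and labels are written unquoted, as in lavaan's multigroup examples.

```r
# Hypothetical helper (not part of lavaan): writes a one-factor measurement
# model for a multigroup analysis using lavaan's label syntax.
# equal = TRUE  -> repeated labels, one loading shared by all groups
# equal = FALSE -> unique labels, loadings estimated separately per group
group_loading_syntax <- function(latent, indicators, n_groups = 2, equal = TRUE) {
  terms <- vapply(seq_along(indicators), function(i) {
    base <- paste0("lambda", i)
    if (equal) {
      labels <- rep(base, n_groups)                       # e.g. c(lambda1, lambda1)
    } else {
      labels <- paste0(base, letters[seq_len(n_groups)])  # e.g. c(lambda1a, lambda1b)
    }
    paste0("c(", paste(labels, collapse = ", "), ")*", indicators[i])
  }, character(1))
  paste(latent, "=~", paste(terms, collapse = " + "))
}

ind <- c("N", "Sugars", "Tart", "Malic")
free_model  <- group_loading_syntax("GrapeQual", ind, equal = FALSE)
equal_model <- group_loading_syntax("GrapeQual", ind, equal = TRUE)
cat(free_model, "\n")
cat(equal_model, "\n")
```

Either string would then be passed to lavaan's `sem()` together with a `group` variable (raw data) or per-group lists of covariance matrices and sample sizes. lavaan's `group.equal = "loadings"` argument requests the same equality constraint without writing labels by hand.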
R code for final multigroup analysis of full model.
mg.mod4 <- '
A protocol for modelling generalised biological responses using latent variables in structural equation models
Mathematical equations and notation for latent variable structural equation modelling.
This text file contains the equations and notation mentioned in Grace JB, Steiner M (2021) A protocol for modelling generalised biological responses using latent variables in structural equation models. One Ecosystem.
File: oo_559472.pdf
Results Tables
This file contains the results tables for the demonstrations included in Grace JB, Steiner M (2021) A protocol for modelling generalised biological responses using latent variables in structural equation models. One Ecosystem.
File: oo_562638.pdf
R code
This text file contains the R code used to develop the demonstrations included in Grace JB, Steiner M (2021) A protocol for modelling generalised biological responses using latent variables in structural equation models. One Ecosystem.
File: oo_559445.R