One Ecosystem :
Review Article
Corresponding author: James Benjamin Grace (gracej@usgs.gov)
Academic editor: Gbenga Akomolafe
Received: 14 Apr 2021 | Accepted: 03 Jul 2021 | Published: 08 Jul 2021
This is an open access article distributed under the terms of the CC0 Public Domain Dedication.
Citation:
Grace JB, Steiner M (2021) A protocol for modelling generalised biological responses using latent variables in structural equation models. One Ecosystem 6: e67320. https://doi.org/10.3897/oneeco.6.e67320
In this paper, we consider the problem of how to quantitatively characterise the degree to which a study object exhibits a generalised response. By generalised response, we mean a multivariate response where numerous individual properties change in concerted fashion due to some internal integration. In latent variable structural equation modelling (LVSEM), we would typically approach this situation using a latent variable to represent a general property of interest (e.g. performance) and multiple observed indicator variables that reflect the specific features associated with that general property. While ecologists have used LVSEM in a number of cases, there is substantial potential for its wider application. One obstacle is that LV models can be complex and easily over-specified, degrading their value as a means of generalisation. It can also be challenging to diagnose causes of misspecification and understand which model modifications are sensible. In this paper, we present a protocol, consisting of a series of questions, designed to guide the researchers through the evaluation process. These questions address: (1) theoretical development, (2) data requirements, (3) whether responses to perturbation are general, (4) unique reactions by individual measures and (5) how far generality can be extended. For this illustration, we reference a recent study considering the potential consequences of maintaining biodiversity as part of agricultural management on the overall quality of grapes used for wine-making. We extend our presentation to include the complexities that occur when there are multiple species with unique reactions.
generalised responses, latent variables, multivariate responses, statistical modelling, structural equation modelling
The quest for generalisation in the ecological sciences is a fundamental challenge. One way that a general reaction by a system or organism can be detected is if there is a multivariate response where numerous individual properties change in concerted fashion. While such concerted reactions are often described using standard multivariate statistical analyses, causal investigations of the nature of integrated multivariate responses fall primarily into the purview of latent variable structural equation modelling (LVSEM,
Fig.
Structural equation meta-model representing the general modelling goal. Note that a meta-model is a generalisation that defines a finite set of possible fully-specified models. Dotted outlines are used to convey that the entities represented are general concepts rather than specific variables.
The hypothesis represented in Fig.
Fig.
Structural equation model representing the hypothesis that observed intercorrelations amongst response indicator variables (1 – 4) can be explained by a common cause, the generalised response. In contrast to Fig.
Numerous complexities can be encountered when analysing models containing latent variables with multiple indicators. The available literature is arguably inadequate for helping the beginning user of SEM to navigate the various diagnostics and decisions required for such models. Our primary objective in this paper is to provide a series of questions that can guide the investigator through the process. Our advice is targeted at the general objective outlined in Fig.
There exist many technical descriptions of the analytical machinery used to implement LVSEM. Here, we provide a non-technical summary and refer the reader to
Fig.
The classical approach to implementing SEM involves the analysis of covariances. For this, the rows of raw data are converted into a square variance-covariance matrix. Hypothesised models represent a set of expectations about the patterns of covariances that should be found in data. Typically, covariance modelling estimates the parameters of the causal diagram via maximum likelihood while respecting the causal relationships specified in the causal graph. Covariance SEM also produces a statistic that summarises the differences between the observed covariances and those implied by the model structure, and this statistic is used to test the null hypothesis that the observed and predicted covariances are equal, except for random sampling variation. Failure to reject this null hypothesis indicates that the data are consistent with the assumed causal structure, although it does not rule out other structures that imply the same covariances.
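As a rough illustration of what "differences between the observed and predicted covariances" means, the sketch below computes the standard maximum-likelihood discrepancy function in base R. The function name `f_ml`, the simulated data and the crude independence "model" are all assumptions made for illustration; this is not the authors' code, and software packages differ on whether the statistic is scaled by N or N - 1.

```r
# Maximum-likelihood discrepancy between an observed covariance matrix S
# and a model-implied covariance matrix Sigma (both p x p, positive definite).
f_ml <- function(S, Sigma) {
  p <- nrow(S)
  log(det(Sigma)) + sum(diag(S %*% solve(Sigma))) - log(det(S)) - p
}

# Simulated raw data (hypothetical): rows are sampling units, columns are variables.
set.seed(1)
raw <- matrix(rnorm(50 * 3), nrow = 50, ncol = 3,
              dimnames = list(NULL, c("y1", "y2", "y3")))

# Step 1: convert the rows of raw data into a square variance-covariance matrix.
S <- cov(raw)

# Step 2: a deliberately crude "model" that assumes the variables are uncorrelated,
# so the implied matrix keeps the variances and sets all covariances to zero.
Sigma_indep <- diag(diag(S))

# Step 3: discrepancy, test statistic and p-value.
n       <- nrow(raw)
F_val   <- f_ml(S, Sigma_indep)
chisq   <- (n - 1) * F_val          # N - 1 scaling assumed here
df      <- 3                        # three covariances constrained to zero
p_value <- pchisq(chisq, df = df, lower.tail = FALSE)
```

The discrepancy is zero when the implied matrix reproduces the observed one exactly, and grows as the model's expectations depart from the data.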
For LVSEM, model structure is described using equations representing the relationships between latent variables and their indicators and equations describing relationships amongst latent variables (Suppl. material
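For orientation, the two kinds of equations can be sketched in the widely used LISREL-style notation. The symbols below are the conventional ones and are an assumption here; the supplementary material may use different notation.

```latex
\begin{aligned}
\mathbf{y} &= \boldsymbol{\Lambda}\,\boldsymbol{\eta} + \boldsymbol{\varepsilon}
  && \text{(measurement model: indicators as functions of latent variables)} \\
\boldsymbol{\eta} &= \mathbf{B}\,\boldsymbol{\eta} + \boldsymbol{\Gamma}\,\boldsymbol{\xi} + \boldsymbol{\zeta}
  && \text{(structural model: relationships amongst latent variables)}
\end{aligned}
```

In this convention, the elements of Λ are loadings (λ), the elements of Γ are effects of exogenous on endogenous latent variables (γ) and the elements of B are effects amongst endogenous latent variables (β). This appears to correspond to the way the labels lambda, gamma and beta are used in the R code in this paper.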
Concepts related to the structural equation meta-model in Figure 4 and their relationships to measured variables (from
Concept of interest | Measurements | Scientific rationale
Management intensity | Intensity is a three-level index {1,2,3}. 1 = minimal control of inter-row vegetation, 2 = vegetation removal in every other row between grape plants, 3 = vegetation removal in all rows between grape plants. | The primary purpose of management is to reduce competitive effects of non-crop plants on grape plants. It is assumed that competition primarily acts through reductions in soil water and nutrients, but other forms of interference could be possible.
Non-crop vegetation properties | Plant species richness (numbers), abundance of N-fixing plants (% cover) | One possibility we wished to consider was a general beneficial effect of plant richness on grape qualities due to complementarity. Another possibility of interest was a specific effect of the abundance of N-fixing plants on grape properties due to facilitation.
Soil nitrogen | Total soil N content (%) | We considered it possible that variations in total soil N might help explain variations in grape N. Such an effect either might or might not be indirectly related to management intensity.
Grape qualities | Nitrogen concentration; sugar concentration; tartaric acid; malic acid | We measured a suite of standard grape chemical parameters of importance for wine-making. While all of these parameters determine the character of wine, N concentration is perhaps of primary concern because of its critical role in the fermentation process (Bell and Henschke 2005).
The overall study objectives are summarised in Fig.
Question #1: What are the Anticipated Characteristics of the Theoretical Construct(s) of Interest?
We learn about latent variables indirectly. More specifically, we learn about them through theorising and empirical investigations, rather than direct measurement. It is important, therefore, that we consider the theoretical meaning of constructs carefully and explicitly. Most ecologists are accustomed to using descriptive procedures, such as principal components analysis (PCA), when faced with a set of related measurements. PCA seeks to reduce a set of variables to some smaller number of composite variables (aka components) that contain most of the information in the set. PCA is purely a data-reduction method and there is no basis for drawing causal interpretations of the resulting components (
With LVSEM, we might pose a hypothesis, such as the one shown in Fig.
In thinking about our theoretical constructs, of fundamental importance is whether we think the concept is unidimensional (behaves like it is one thing) or multidimensional (behaves like a collection of different things). Taken literally (which software estimation will do), the hypothesis being evaluated in Fig.
There are many other possibilities that might be supported by theory. The most common alternative is that a theoretical “construct” or concept may be a collection of independent or semi-independent causes. The details that accompany this situation are beyond our purpose in this paper and the reader is referred to
Question #2: Are there Appropriate Measured Variables that can Serve as Indicators of the General Theoretical Constructs?
When interested in a general property of a study system, it is recommended that one gives careful consideration to the previous question about expected attributes when designing the sampling scheme. This reflects an interesting difference in scientific practice between the social sciences and the ecological sciences. In the social sciences, particularly when studies involve human behaviour, the default assumption is that the latent properties are of primary interest. Studies may involve human attitudes and motivations, which are assumed from the outset to be “deeply latent” and only discernible indirectly. This has led to a well-developed process for constructing proper measures of the constructs of interest. For example, the American Psychological Association Dictionary (
“The process of creating a new instrument [a set of specific measurements] for measuring an unobserved or latent construct, such as depression, sociability, or fourth-grade mathematics ability. The process includes defining the construct and test specifications, generating items and response scales, piloting the items in a large sample, conducting analyses to fine-tune the measure, and then readministering the refined measure to develop norms (if applicable) and to assess aspects of reliability and validity.”
Our purpose here is to raise awareness of the fact that there has been substantial development of methodologies in other scientific disciplines that could be of interest to natural scientists, but that has been systematically ignored to the detriment of our scientific studies. It is beyond the scope of the present paper to consider this body of knowledge in detail, though the expected requirements for a set of indicators to represent a theoretical construct will be illustrated via our presentation. For a more general introduction to scale development, one can refer to
When one wishes to develop a latent variable SE model, it is possible to proceed with one or more indicator measurements. Having only a single measure provides limited opportunities. The most commonly adopted approach is simply to assume that the measured variable is a perfect representation of the latent property. The main accomplishment of such a model is to make a conceptual distinction between the concept of interest and the observed measure. When we have some estimate of the reliability (repeatability) of a measurement process, we can insert that information into our model and remove bias due to measurement error. Once we have two or more indicators, it becomes possible to test for the presence of a latent cause. This is the example situation we address in the current paper.
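One common way to insert reliability information for a single-indicator latent variable is to fix the loading to 1 and fix the indicator's error variance to (1 - reliability) times its observed variance. The sketch below builds the corresponding lavaan model syntax in base R. The reliability of 0.90 is purely an illustrative assumption, not an estimate from the study; the variance of 1.366 is the value for Mgt in the covariance matrix given in this paper's code.

```r
# Inputs: observed variance of the single indicator and an assumed reliability
# (proportion of its variance that is not measurement error).
var_mgt     <- 1.366   # variance of Mgt in the paper's covariance matrix
reliability <- 0.90    # illustrative assumption only

# Error variance implied by the assumed reliability.
err_var <- (1 - reliability) * var_mgt

# lavaan model syntax: loading fixed to 1, residual variance fixed to err_var.
single_ind <- sprintf("ManInten =~ 1*Mgt\nMgt ~~ %.4f*Mgt", err_var)
cat(single_ind, "\n")
```

Setting the reliability to 1 gives an error variance of zero, which is the "perfect representation" assumption described above.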
Indicator validity refers to the requirement that measured variables are interpretable as measures of the concept of interest. This is a theoretical requirement, but one that should be addressed explicitly in a paper. We recommend constructing a table such as Table 1 as a formal means of stating the logic that connects indicators to latent variables.
Question #3: What do the Patterns of Intercorrelations Amongst Indicator Variables Suggest?
It is one thing to conceptualise a set of observed variables as reflections of a concept of interest, but it is another thing for the data to agree with one’s conceptualisation. A simple first approach to this problem is to construct a correlation matrix to see if the patterns of correlations amongst indicator variables are roughly consistent with theoretical expectations. For this exercise, we focus on the sub-model shown in Fig.
Fig.
When one starts working with LVSEM, it is found that there are many ways that data may deviate from showing equal correlation strengths amongst indicators, aside from error correlations, some of which are suggested in Fig.
 | Nitrogen | Sugars | Tartaric Acid | Malic Acid
Nitrogen | 1.00 | | |
Sugars | -0.53 | 1.00 | |
Tartaric Acid | 0.40 | -0.36 | 1.00 |
Malic Acid | 0.61 | -0.41 | 0.27 | 1.00
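The correlations in the table above can be reproduced from the covariance matrix that appears in this paper's R code. The following base-R sketch does so for the four grape indicators; the object names `S` and `R` are chosen here for illustration.

```r
# Covariances amongst the four grape indicators, copied from the covariance
# matrix given in the paper's R code (order: N, Sugars, Tart, Malic).
vars <- c("N", "Sugars", "Tart", "Malic")
S <- matrix(c( 2.602, -1.187,  1.038,  1.270,
              -1.187,  1.896, -0.781, -0.726,
               1.038, -0.781,  2.536,  0.559,
               1.270, -0.726,  0.559,  1.688),
            nrow = 4, byrow = TRUE, dimnames = list(vars, vars))

# Rescale covariances to correlations and round to two decimals.
R <- cov2cor(S)
round(R, 2)
```

Inspecting `round(R, 2)` alongside the theoretical expectations is a quick check on whether the indicators are intercorrelated in the directions and approximate strengths that a single common cause would imply.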
Question #4: Do Analyses Support There Being a Generalised Response?
It is customary in SEM practice to analyse latent variable models in two stages, first evaluating the fit between latent variables and indicators (Fig.
Table 3 presents the code used to conduct a CFA examination of the model shown in Fig.
R code for the Latent Response Model (Fig.
library(lavaan)
# Lower triangle of the covariance matrix (order: N, Sugars, Tart, Malic, Nfixers, Mgt)
input.cov <- '
 2.602
-1.187  1.896
 1.038 -0.781  2.536
 1.270 -0.726  0.559  1.688
-0.592  0.451  0.147 -0.219  1.670
 0.821 -0.364 -0.455  0.578 -0.864  1.366 '
cov.dat <- getCov(input.cov, names = c("N", "Sugars", "Tart", "Malic", "Nfixers", "Mgt"))
# One-factor model: the four grape measures as indicators of GrapeQual
cfa1 <- 'GrapeQual =~ lambda1*N + lambda2*Sugars + lambda3*Tart + lambda4*Malic'
# Fit using the covariance matrix; only the variables named in the model are used
cfa1.fit <- sem(cfa1, sample.cov = cov.dat, sample.nobs = 50)
Tables of results for all models run in the paper are provided in Suppl. material
Examination of results focuses initially on overall model fit (Suppl. material
Results show strong support for our initial model (Table S2.1). A test statistic (Model Chi-square) value of 0.808 with an associated p-value of 0.668 was found. This p-value is well above the 0.05 criterion, providing strong support for there not being major model-data discrepancies. A Comparative Fit Index value of 1.000 further indicates a near-perfect explanation of the observed covariances by the model. Thus, it is extremely unlikely that additions to our model, such as shown in Fig.
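The reported test result can be checked with a short calculation. The degrees of freedom are not stated in the text above, so the sketch below derives them under the usual assumption that the first loading is fixed to 1 to set the scale of the latent variable; with that assumption, the p-value it returns rounds to the reported 0.668.

```r
# One-factor model with p = 4 indicators.
p <- 4
n_moments <- p * (p + 1) / 2          # unique variances and covariances: 10

# Free parameters, assuming the first loading is fixed to 1:
# 3 loadings + 4 residual variances + 1 latent variance = 8.
n_free <- (p - 1) + p + 1

df <- n_moments - n_free              # 2

# Upper-tail probability for the reported model chi-square of 0.808.
p_value <- pchisq(0.808, df = df, lower.tail = FALSE)
round(p_value, 3)
```

With only two degrees of freedom, the test has limited power at a sample size of 50, which is one reason to treat the p-value as a continuous measure of evidence instead of a strict cutoff.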
Having assessed the global model fit, we turn attention to the parameter estimates (Table S2.1). Again, we do not treat p-values as absolute cutoffs, but instead as continuous measures of evidence that a parameter or model deviates from the default expectation (
Question #5: Does the Generalised Response Exhibit a Concerted Reaction to Perturbation? and
Question #6: Are there Unique Reactions by Specific Indicators?
The complexity of SE models and the variety of inferences we typically wish to make lead us to move through the evaluation of our overall hypothesis in stages. It is important to keep in mind that conclusions one might draw, based on the analysis of sub-models, may need to be reconsidered once the full model is examined. Having examined the latent response sub-model, we now move to a pair of competing models shown in Fig.
In Fig.
## Initial Net Effect Model
LVNet1 <- '
  GrapeQual =~ lambda1*N + lambda2*Sugars + lambda3*Tart + lambda4*Malic
  ManInten =~ lambda5*Mgt
  GrapeQual ~ gamma1*ManInten
'
LVNet1.fit <- sem(LVNet1, sample.cov = cov.dat, sample.nobs = 50)
show(LVNet1.fit)
fitMeasures(LVNet1.fit, "cfi")
subset(modindices(LVNet1.fit), mi > 3)   # list modification indices above 3
## Revised Net Effect Model
LVNet2 <- '
  GrapeQual =~ lambda1*N + lambda2*Sugars + lambda3*Tart + lambda4*Malic
  ManInten =~ lambda5*Mgt
  GrapeQual ~ gamma1*ManInten
  Tart ~ gamma3*ManInten
'
Results for the initial model (Fig.
As illustrated in Fig.
Our second question, represented in Fig.
### Initial Mediated Effect Model
LVmed1 <- '
  GrapeQual =~ lambda1*N + lambda2*Sugars + lambda3*Tart + lambda4*Malic
  ManInten =~ lambda5*Mgt
  NonCrop =~ lambda6*Nfixers
  GrapeQual ~ gamma1*ManInten + beta1*NonCrop
  NonCrop ~ gamma2*ManInten
'
### Revised Mediated Effect Model
LVmed2 <- '
  GrapeQual =~ lambda1*N + lambda2*Sugars + lambda3*Tart + lambda4*Malic
  ManInten =~ lambda5*Mgt
  NonCrop =~ lambda6*Nfixers
  GrapeQual ~ gamma1*ManInten + beta1*NonCrop
  NonCrop ~ gamma2*ManInten
  Tart ~ gamma3*ManInten   # added direct effect
'
Question #7: Can we Simplify the Model, Thereby Increasing Generality?
Since SE models are used for explanatory representations of scientists' understanding of systems (
Regarding our example, we next examine individual parameter estimates to determine whether model LVmed2 can be simplified (Table S2.4). P-values provide strong support for all estimated lambdas (all < 0.001), as well as all other estimated parameters, except beta1 (p = 0.713), which is the effect of the mediator Non-Crop Vegetation on Grape Qualities. We estimated a simplified model (not shown) with beta1 set to zero (beta1 == 0) and determined that model fit was improved: discrepancy increased very slightly while the number of estimated parameters was reduced by one (fit being a measure of the amount of discrepancy pro rata to the number of estimated parameters). We continue discussing ways to minimise the number of estimated parameters in the next section, where we address the complexity that arises when more than one variety of grape is being modelled.
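Since the simplified model is not shown, the following is one possible way to set it up and compare it with LVmed2 in lavaan, using the covariance matrix from this paper's code. The object names `LVmed2.simple`, `fit.full` and `fit.simple` are hypothetical, and the choice of a likelihood-ratio test plus AIC as the comparison is an assumption; the authors' exact procedure may differ.

```r
library(lavaan)

# Covariance matrix from this paper's code (lower triangle, row by row).
input.cov <- '
 2.602
-1.187  1.896
 1.038 -0.781  2.536
 1.270 -0.726  0.559  1.688
-0.592  0.451  0.147 -0.219  1.670
 0.821 -0.364 -0.455  0.578 -0.864  1.366'
cov.dat <- getCov(input.cov,
                  names = c("N", "Sugars", "Tart", "Malic", "Nfixers", "Mgt"))

# Revised mediated-effect model (LVmed2 in the text).
LVmed2 <- '
  GrapeQual =~ lambda1*N + lambda2*Sugars + lambda3*Tart + lambda4*Malic
  ManInten  =~ lambda5*Mgt
  NonCrop   =~ lambda6*Nfixers
  GrapeQual ~ gamma1*ManInten + beta1*NonCrop
  NonCrop   ~ gamma2*ManInten
  Tart      ~ gamma3*ManInten
'
# Same model with the mediator effect constrained to zero.
LVmed2.simple <- paste(LVmed2, "beta1 == 0", sep = "\n")

fit.full   <- sem(LVmed2,        sample.cov = cov.dat, sample.nobs = 50)
fit.simple <- sem(LVmed2.simple, sample.cov = cov.dat, sample.nobs = 50)

# Chi-square difference test and fit measures for the two nested models.
lavTestLRT(fit.simple, fit.full)
fitMeasures(fit.full,   c("chisq", "df", "aic"))
fitMeasures(fit.simple, c("chisq", "df", "aic"))
```

The constrained model gains one degree of freedom, so a negligible rise in chi-square alongside a lower AIC would correspond to the kind of improvement in fit described above.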
Question #8: What About Generality Across Groups?
LVSEM has the capacity to formally evaluate parameter equality across groups. Referred to as multi-group analysis, the investigator can test hypotheses by asking whether models of the same general form apply beyond single groups. With regard to the Swiss grape study, the investigators sampled vineyards that cultivated two different varieties of grapes, Chasselas and Pinot noir. Suppl. material
If multigroup models are specified without constraints, all parameters will be independently estimated for each group by default. One way to set equality constraints across groups is to add labels to the code. In this case, one first uses the format c("label1", "label2") to create names for the parameters where there are two groups. This example will generate two independent parameter estimates, one for each group, since the labels are unique. If we specify c("lambda1", "lambda1"), the repeated use of a common label means a single value will be estimated for both groups (Table
## CFA independence model with distinct labels for each group
mg.mod0 <- ' GrapeQual =~ c("lambda1a","lambda1b")*N + c("lambda2a","lambda2b")*Sugars + c("lambda3a","lambda3b")*Tart + c("lambda4a","lambda4b")*Malic'
## CFA model with parameters equal across groups (using repeat labels)
mg.mod1 <- ' GrapeQual =~ c("lambda1","lambda1")*N + c("lambda2","lambda2")*Sugars + c("lambda3","lambda3")*Tart + c("lambda4","lambda4")*Malic'
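Because the only difference between the two specifications above is whether a label is repeated or made unique per group, the syntax can also be generated programmatically. The helper functions `mg_label` and `mg_cfa` below are hypothetical conveniences written for this illustration (they are not part of lavaan), and they assume exactly two groups distinguished by the suffixes "a" and "b".

```r
# Build a lavaan multi-group label modifier for a two-group model.
# equal = TRUE repeats one label (a single shared estimate);
# equal = FALSE appends a group suffix (separate estimates per group).
mg_label <- function(name, equal, suffix = c("a", "b")) {
  labs <- if (equal) rep(name, length(suffix)) else paste0(name, suffix)
  sprintf("c(%s)", paste0('"', labs, '"', collapse = ","))
}

# Assemble a one-factor measurement model from a vector of indicator names.
mg_cfa <- function(indicators, equal, latent = "GrapeQual") {
  terms <- vapply(seq_along(indicators), function(i) {
    paste0(mg_label(paste0("lambda", i), equal), "*", indicators[i])
  }, character(1))
  paste(latent, "=~", paste(terms, collapse = " + "))
}

inds <- c("N", "Sugars", "Tart", "Malic")
mg.free  <- mg_cfa(inds, equal = FALSE)   # same form as mg.mod0
mg.equal <- mg_cfa(inds, equal = TRUE)    # same form as mg.mod1
cat(mg.free, mg.equal, sep = "\n")
```

Either string can then be passed to `sem()` together with grouped data. For loadings specifically, lavaan's `group.equal = "loadings"` argument is an alternative to repeated labels, although explicit labels make it easier to constrain some parameters while leaving others free.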
Using the approach in Table
mg.mod4 <- '
  # declare latent variables
  GrapeQual =~ c("lambda1","lambda1")*N + c("lambda2","lambda2")*Sugars + c("lambda3","lambda3")*Tart + c("lambda4","lambda4")*Malic
  ManInten =~ c("lambda5","lambda5")*Mgt
  NonCrop =~ c("lambda6","lambda6")*Nfixers
  # regressions
  GrapeQual ~ c("gamma1a","gamma1b")*ManInten + c("beta1a","beta1b")*NonCrop
  NonCrop ~ c("gamma2a","gamma2b")*ManInten
  Tart ~ c("gamma3a","gamma3b")*ManInten
  # set constraints
  beta1a == 0
  gamma3b == 0
  gamma2a == gamma2b
'
It is important to be able to judge whether a system exhibits a generalised multivariate response to environmental change rather than an independent collection of uncoordinated responses. This paper presents an approach to addressing that question. A particular aspect of the approach demonstrated is that it invokes causal reasoning. We ask if suites of observed properties behave as if they are jointly influenced by a “hidden hand” or integrative cause.
Studying generalised responses is inherently challenging. Our objective is to focus our attention on the general, while moving the specifics to the background, at least initially. The sequence of operations described supports a “general first, specifics second” perspective. Ultimately, SEM forces us to address both. Along the way, we must confront the large number of possible explanations that can exist for the actual functioning of the system being studied. This complexity means one cannot take a rigid approach, but must follow clues along a path to selecting a final model to use for interpretation. We suggest a series of questions that can guide investigators through several critical steps in model evaluation. In addition, we recognise that the research context matters, so the list may need to be modified for particular applications.
Success in applying a flexible, adaptive approach requires a solid understanding of how the analytical system ‘thinks’ about things. Within LVSEM, latent variables represent the common variance or overlapping information for a set of measures. They represent, in essence, the consensus opinion about the latent factor that functions as their common causal connection. There will, of course, be unique information associated with the individual measures, particularly if they are selected to represent multiple facets of a theoretical construct. Our core challenge is to capture the general opinions of the data without becoming overly distracted by the unique responses.
Fig.
A number of mysteries are exposed in our multigroup model (Fig.
It is our hope that this paper both demonstrates how to use LVSEM to investigate multivariate responses and hints at the variety of scientific insights that can be gleaned from the effort. We believe there is an important opportunity for LVSEM to play a greater role in our quantitative understanding of ecological responses to environmental change.
We thank two anonymous reviewers for helpful comments and suggestions. This work was supported by the USGS Ecosystems and Land Change Science Climate Research and Development Programs. Any use of trade, firm or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government.
This text file contains the equations and notation mentioned in Grace JB, Steiner M (2021) A protocol for modelling generalised biological responses using latent variables in structural equation models. One Ecosystem
This file contains the results tables for the demonstrations included in Grace JB, Steiner M (2021) A protocol for modelling generalised biological responses using latent variables in structural equation models. One Ecosystem.
This text file contains the R code used to develop the demonstrations included in Grace JB, Steiner M (2021) A protocol for modelling generalised biological responses using latent variables in structural equation models. One Ecosystem.