May the matrix be with you! Guidelines for the application of expert-based matrix approach for ecosystem services assessment and mapping

Matrices or look-up tables are increasingly popular, flexible tools for ecosystem services mapping and assessment. The matrix approach links ecosystem types or land cover types to ecosystem services by providing a score for ecosystem service (ES) capacity, supply, use, demand or other concepts. Using expert elicitation enables quick and integrative ES scoring that can meet general demand for validated ES mapping and assessment at different scales. Nevertheless, guidance is needed on how to collect and integrate expert knowledge to address some of the biases and limits of the expert elicitation method. This paper aims to propose a set of guidelines to produce ES matrices based on expert knowledge. It builds on existing literature and experience acquired through the production of several ES matrices in several ES assessments carried out in France. We propose a 7-step methodology for the expert-based matrix approach that aims to promote cogency in the method and coherency in the matrices produced. The aim here is to use collective knowledge to produce semi-quantitative estimates of ES quantities and not to analyse individual or societal preferences or importance of ES. The definition of the objectives and the preparation phase is particularly important in order to define the components of the capacity-to-demand ES chain to be addressed. The objectives and the ES components addressed will influence the composition of the expert panel. We recommend an individual


Introduction
Increasing demand for local and regional-scale ecosystem service (ES) mapping and assessment to support biodiversity management (Nagendra et al. 2013, Posner et al. 2016), land-use planning (Darvill and Lindo 2015, Kopperoinen et al. 2014) and environmental impact assessment (Geneletti 2013) has driven a need for robust and scientifically sound methods for assessing ES capacities, demands and/or preferences (Harrison et al. 2017).
Various methods for assessing ES have been used (see Burkhard and Maes 2017, Harrison et al. 2017), requiring different types and degrees of expertise from the people implementing them and mobilising different amounts of data (Harrison et al. 2017). The right choice of method should fit the goals of assessment and mapping (Jacobs et al. 2015) but also the applicability and appropriation of the method and the results expected by stakeholders and land managers. Beyond quantifications by expert-scientists, the ES concept can only be put into action if the assessments are understood and supported by the end-users (Harrison et al. 2017). The actual complexity and perceived difficulty in interpreting results are limiting factors to be considered in any ES assessment and mapping exercise (Jacobs and Burkhard 2017). Furthermore, acquiring, compiling and processing multiple data sources can prove challenging if not intractable for many ES, particularly at finer scales.
Amongst the different methods available, spatial proxy models, which link land cover classes or ecosystem types to ES, are very flexible and readily adaptable to different data sources and modelling techniques. The matrix approach, particularly with participatory expert-based scoring as developed here, has proven to provide rapid and readily-appropriable ES assessments (Burkhard et al. 2011, Burkhard et al. 2012, Jacobs et al. 2015). End-users, principally land managers and decision-makers, can appropriate them quickly and apply them to implement land-use planning or resource management policies that need an understanding of the importance of specific land cover or ecosystem types rather than precise estimates of ES quantities (Grêt-Regamey et al. 2008).
The matrix approach is increasingly being applied (Fig. 1) in various contexts, which offers good feedback on the pros and cons of the method. It has been applied in many case studies (e.g. Hermann et al. 2013, Kroll et al. 2012, Stoll et al. 2015, Vihervaara et al. 2010), in countries from Austria and Hungary (Hermann et al. 2013) to China (Cai et al. 2017), the USA (Cottillon 2013), Thailand (Kaiser et al. 2013), France (Campagne et al. 2017) and Nepal (Paudyal et al. 2015) and at scales from local (Nedkov and Burkhard 2012) to national (Depellegrin et al. 2016) and continent-wide (Stoll et al. 2015). The diversity of application confirms that "the approach has the potential to integrate all kinds of ES-related data based on diverse scientific disciplines or ES quantification methods and of varying quality and quantity in illustrative matrix tables and maps" (Burkhard 2017). Grêt-Regamey et al. (2015) proposed a tiered approach to ES mapping that classifies ES mapping and assessment methods into 3 levels (tiers) of increasing complexity from tier 1 to tier 3. Under this tiered framework of ES assessment methods, the matrix approach and expert knowledge are classified as tier 1 or tier 2 depending on the complexity of the methods used to build the matrix (Grêt-Regamey et al. 2017). Several aspects of capacity and demand can be addressed, including biophysical capacity, economic demand and societal preferences. Burkhard (2017) proposes a methodology of the matrix application for ES mapping based on ES indicators and a collection of suitable spatial data. Nevertheless, if source data is scarce, fragmented or not publicly available, expert knowledge can be the best available source or proxy for ES estimates (Jacobs and Burkhard 2017, Kienast et al. 2009).

Figure 1. Trends in the number of published studies mobilising the matrix approach to assess ecosystem service supply and demand between 2008 and 2017 (source: WoS, Google Scholar and Scopus).

Expert knowledge is commonly used as a surrogate for empirical data in many fields of ecological research (Drescher et al. 2013). Experts use a combination of field observations, formal knowledge and mental models to generate quantitative information (Fazey et al. 2006). The use of expert elicitation methods has been shown to be an efficient option to address the simplicity/complexity and precision/uncertainties trade-offs (Jacobs et al. 2015). Hou et al. (2013) and Jacobs et al. (2015) addressed critiques of the matrix approach and some of the issues regarding consideration of experts' uncertainties have been addressed in Campagne et al. (2017). Expert-based assessment has many well-known biases (subjectivity, interpretations etc.) that some methods help to reduce (Martin et al. 2012, Mukherjee et al. 2018).
The methodology we propose here aims to reduce interpretation and influence biases when using the expert-based matrix approach, where the ES scores elicited are best-knowledge estimates, not individual preferences. Indeed, it is possible to estimate the biophysical capacity or the economic or social value of ecosystem services of different ecosystem types by knowledge or preference elicitation methods based on individual interviews and/or collective deliberative workshops. This is the case for the widely-used multi-criteria approaches, ranking methods and life satisfaction approaches and also for other less frequently used methods, such as the Q method (Rodríguez-Vargas and Marburg 2011) or the Delphi prospective approaches (Martin et al. 2012).
The method presented here is a knowledge elicitation protocol in which experts provide their best evaluation of the potential (or the use, the demand etc.) of each ecosystem type to provide ES. The ES scores elicited are therefore not individual preferences (as in Kopperoinen et al. 2017), but best-knowledge estimates (as in Stoll et al. 2015). We propose to adopt an explicitly structured and robust procedure, involving several steps linked with the study design, capacity building, scoring and expert accuracy and uncertainty assessment.

Methods
Guidance is needed on how expert knowledge should be collected and integrated to construct scientifically sound ES evaluations (Martin et al. 2012). Here we propose general guidelines on how to implement expert-based ES matrices, based on published papers using the matrix approach and a series of local and regional assessments conducted in France with groups of experts. We propose a 7-step methodology for producing semi-quantitative estimates of ES quantities based on expert knowledge. It starts by defining the goal, which determines the precision of the definitions and lists of ES and ecosystem types, and ends with the different possible outputs (Fig. 2).

Step 1 - Goal and preparation phase
Clarifying the goals and objectives of the assessment is the first step of the process. The objectives will determine the list and degree of detail of ES and ecosystem types (ET) considered, which in turn will determine the size of the matrix and influence how the participatory approach is set (Jacobs and Burkhard 2017). For example, if the goal is ES mapping, the list of ET has to be based on a land cover (LC)/ET map where map scale will define the precision of the ES list.
The flexibility of the matrix approach means it can be readily adapted to assess different components of the ES supply-demand chain: supply (e.g. Stoll et al. 2015, Depellegrin et al. 2016, Egarter Vigl et al. 2017), flows (e.g. Burkhard et al. 2012) or demand (e.g. Tao et al. 2018). Nevertheless, it is crucial to define precisely which ES components are to be assessed and to adapt the definitions to achieve the best common understanding between the members of the expert panel. It should be stated that the objectives are not a co-creation of the expert panel but should be prepared with key stakeholders for the area considered.
(a) Ecosystem services and ecosystem types
Linked to the objectives, the ES and ET evaluated have to be precisely defined, with detailed definitions and examples. Any list of ES could be used, but we recommend using a standard list as reference, such as CICES V5.1 (Haines-Young and Potschin 2018) or MEA (e.g. Chaudhary et al. 2016). This list can be adapted to meet the assessment goal and be relevant for local stakeholders. Moreover, the vocabulary used to define ES, as well as the examples of ES indicators, has to be adapted to the assessment context.

Figure 2. Detailed methodology of the proposed 7-step guideline for expert-based matrix approaches.
ET are basically the ecosystem or LC types found in the case-study area which, depending on the goal, may be based on different typologies and on the LC data available. One limit is the number of ET to be considered: a highly precise typology will lead to a very large matrix, which will create time management issues and make it difficult for experts to provide estimates on closely-related ET. It is possible to reduce the number of ET by merging ET that are able to provide the same levels of ES. This was done in the application on the Scarpe-Escaut Regional Natural Park (RNP) in France (Campagne et al. 2017). ARCH (Assessing Regional Habitat Change, www.archnature.eu) is an ecosystem/land cover map typology with 64 ET that have been merged into 33 ET based on their considered ability to provide the same ES.
The preparation phase following the definition of the objectives should be done with key stakeholders who have a helicopter view of important issues for the area considered, in order to define tailored ES and ET lists and adapt each title, definition and example to the ES and ET for the case study. In order to be efficient, we recommend that the ES and ET lists are definitively defined and validated before the workshop and not re-discussed during the scoring workshop.

(b) Expert panel
Our definition of 'expert' is a person with extensive knowledge or skills based on research, experience or occupation in a particular field who could give a reasonable evaluation of the ES associated with different ET within the area studied (Drescher et al. 2013). A stakeholder is defined as a person who affects or is affected by a decision or action (Reed et al. 2009, Freeman 1984). Some stakeholders could be experts and vice versa, but determining the local values of ES requires persons who have the expertise needed. This is an important point to be clarified during the preparation phase, after the definition of the objectives, in interaction with key stakeholders. The key stakeholders are also particularly important for identifying and contacting experts to be included in the panel, based on the objective and the component of the ES capacity-demand chain considered. It is essential for transparency and consistency to invite experts who are credible for the ES component evaluated and the knowledge needed (Jacobs et al. 2015). As an example, in the case of a supply-and-demand ES assessment, the two matrices involve different sets of expertise and knowledge and consequently different panels of experts. The objective is to mobilise local and regional experts from various authorities, including natural resource managers, NGOs or any other group with insider knowledge of local and regional conditions (Kopperoinen et al. 2017). The number of experts to be invited is a complex question, but it is likely that the risk of bias decreases as the panel size increases (Drescher et al. 2013). Based on statistical resampling of panel members, Campagne et al. (2017) proposed that the expert panel, whatever the experts' profiles, should include at least 10 experts for a quorum and 15 to 20 for optimal size.
(c) The scoring
The main scoring scheme used in the literature is based on Burkhard et al. (2009) with scores ranging from 0 to 5, where "0 = no relevant capacity, 1 = low relevant capacity, 2 = relevant capacity, 3 = moderate relevant capacity, 4 = high relevant capacity and 5 = very high relevant capacity" to produce estimates of ET capacity to provide ES. Some authors have used other rankings, such as a 0-100 scale (Koschke et al. 2012) or a 0-2 scale (Vihervaara et al. 2010). Unpublished results in France show that a larger scale (0 to 10 for example) takes more time for experts to score due to the finer shades of ES. The score estimated is for a hypothetical standard ET for an average ecosystem condition-state and average year, usually around the peak vegetation season in the given region (Burkhard et al. 2012). The meaning of the scale values has to be discussed at length with the expert panel.
Besides the score value for each ES/ET combination, we recommend asking the experts to provide an indication of their confidence on the ET and ES considered. In our case-study sites, we asked the experts to fill in a confidence index for each ES and each ET. This confidence index was used to estimate expert confidence in providing the capacity score and could be used to compute score errors (Campagne et al. 2017). Each expert was asked to state their confidence in their own knowledge on each ET and each ES via a confidence score ranging from 1 = "I don't feel confident with my score", through 2 = "I feel fairly confident with my score", to 3 = "I feel confident with my score". The individual ET and ES scores can then be summed to give a confidence score for each ES/ET combination, which can be scaled in terms of ES/ET score variability (Campagne et al. 2017).
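The summing of ET and ES confidence indices can be sketched as follows. This is a minimal illustration in Python, not the procedure of Campagne et al. (2017): the function name, the ET and ES labels and the example values are hypothetical, and averaging the summed confidence across experts is an assumption made here for illustration.

```python
from statistics import mean

def combined_confidence(conf_et, conf_es):
    """Sum one expert's ET and ES confidence indices (each 1-3) for every ES/ET cell."""
    return {(et, es): c_et + c_es
            for et, c_et in conf_et.items()
            for es, c_es in conf_es.items()}

# Hypothetical confidence indices given by two experts
expert_a = combined_confidence({"Olive grove": 3, "Garrigue": 1},
                               {"Crops": 2, "Erosion control": 3})
expert_b = combined_confidence({"Olive grove": 2, "Garrigue": 2},
                               {"Crops": 3, "Erosion control": 1})

# Assumption: the panel-level confidence of a cell is the mean over experts
panel_confidence = {cell: mean([expert_a[cell], expert_b[cell]])
                    for cell in expert_a}
```

With a 1 to 3 index on each axis, the combined value of a cell ranges from 2 to 6.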

Step 2 - Workshop
The main aim of the workshop is to harmonise understanding of the nature of the ES and the ET assessed and get a common understanding of the goal of the study. The workshop should bring together all the experts and stakeholders in the study. The presentation must be adapted to the public and its diversity of expertise. We recommend taking the time to present each ES and each ET with a precise definition, a local example and a picture. Caution should be taken to limit the cognitive biases associated with the example and the picture; using several examples is better. The workshop is also designed to allow the experts to ask any and all questions they may have and to create interaction amongst all participants. A detailed presentation of the approach, the methodology and all definitions can help narrow differences in interpretations of the ES, the ET and, above all, the ES chain component addressed.
Different expert elicitation protocols or methods in citizen science could be used (Kopperoinen et al. 2017, Harrison et al. 2017, Priess and Kopperoinen 2017, Mukherjee et al. 2018). If the scores are determined in consensus, the workshop consists mainly of the discussion time. An essential issue in ES assessment is documenting the variability, since it holds the trade-offs and potential conflicts/solutions amongst stakeholders. Consensus methods hide these, unless the process of reaching the consensus is well recorded. If the filling-in process is done individually, the workshop needs to include a time when each expert individually fills in a predefined line or column of the matrix, in order to train the experts on scoring and leave time for questions and discussion.
If it is impossible to bring the entire expert panel together during the workshop, the experts can be met individually in order to obtain the appropriate expert panel size and all the necessary expertise, although this option is less attractive as it prevents interaction and discussion. The analyses of the variabilities between experts (Steps 5 and 6) may help identify ES/ET combinations with high variability, low confidence and/or low inter-rater consensus. This could open the possibility of organising a second workshop to address those cases specifically.

Step 3 - Initial matrix
Two main options can be used as a starting point for the filling-in process: using a pre-filled matrix or using an empty matrix (Table 1).

Pre-filled matrix
With the development of the matrix approach, the literature now contains a number of matrices that can be adapted and used as an initial matrix (Fig. 1). Otherwise, a matrix can be created as a first step with quantitative data or a small (but still suitably-sized) panel of experts and then used as the initial matrix for a larger expert panel. The initial matrix is given to the experts, who then have to adjust the pre-filled scores (as in the approach adopted in Stoll et al. 2015). This option may be quicker than the empty matrix option, but it does have drawbacks, particularly for the statistical analysis. The experts may be influenced by the pre-filled scores or unwilling to change values even if they harbour doubts. It is then difficult to tell whether a pre-filled score was left unchanged because the expert did not think he/she had the knowledge to credibly change it or because he/she agreed with it. There is some evidence that pre-existing evidence can influence the scoring, notably when it is associated with a majority effect and/or with some degree of authority (Martin et al. 2002, Gardikiotis 2017). It would be possible to track changes, but this requires meta-information on the expert's choice for each score, which would cancel the time gained with this option.

Empty matrix
Another approach is to start with an empty matrix where each score has to be defined by each expert. The process is longer, but it is free from the influence of pre-filled scores and is better suited to statistics that require independence of the dataset.

Step 4 - Filling-in the matrix
Three different options are presented to fill in the matrix: filling in consensus, full individual filling and partial individual filling. Each option can be applied with a pre-filled matrix or an empty matrix. The individual filling can also be completed with a consensus round, as explained in Step 5. The pros and cons of each method are summarised in Table 1.

Table 1. Overview of the different options for the initial matrix and the filling-in process.

• Consensus fill-in
All experts need to be present during a workshop and each score is discussed in order to adjust or set each score in a consensus. It is a long collective process but not a long individual process, as the final matrix is defined at the end of the workshop. It is important to facilitate the discussion in order to reduce power and personality influences (Anderson and Kilduff 2009). This method might prove hard to implement for large matrices because of time constraints.

• Full individual fill-in
Each expert adjusts or defines each score of the matrix. It is a long individual process and fewer experts may contribute, as they have to take the time independently (feedback teaches us that an 800-score matrix takes about 3-4 hours to complete in full starting with an empty matrix). However, all expertise is taken into account and it is the most convenient method for statistical analysis, since it is based on independent replicates (individual scoring) and homogeneous sampling size (same number of raters for every score). Accordingly, usual parametric statistics can be used, as well as concordance indexes (see Steps 5 and 6).

• Partial individual fill-in
Each expert adjusts or completes the part of the matrix related to their expertise. Depending on the expertise, the individual process is shorter than with a full individual fill-in and allows more experts to participate. Experts will thus only provide scores for the ES/ET combinations they consider within their core expertise. With an empty matrix, 10 to 15 people are needed for each score in order to reach a suitable panel size. In a partial individual fill-in process, it may prove difficult to ensure that at least 10 to 15 different experts complete each score, and the scores for many specific services and LC types may lack statistical robustness.
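To check whether each score of a partially filled matrix reaches a suitable panel size, the number of raters per ES/ET combination can be counted. The sketch below is a hypothetical Python illustration: the data layout (one dictionary per expert, with `None` for a score not provided), the function names and the example values are assumptions, and the threshold of 10 raters is the quorum mentioned above.

```python
from collections import Counter

MIN_RATERS = 10  # quorum mentioned in the text (Campagne et al. 2017)

def raters_per_cell(expert_matrices):
    """Count how many experts actually scored each (ET, ES) cell."""
    counts = Counter()
    for matrix in expert_matrices:
        for cell, score in matrix.items():
            if score is not None:  # None marks a score the expert skipped
                counts[cell] += 1
    return counts

def under_sampled(expert_matrices, cells, min_raters=MIN_RATERS):
    """Return the cells scored by fewer than min_raters experts."""
    counts = raters_per_cell(expert_matrices)
    return sorted(cell for cell in cells if counts[cell] < min_raters)

# Hypothetical panel of 12 experts: all score timber, only 4 score drinking water
cells = [("Forest", "Timber"), ("Forest", "Drinking water")]
panel = [{("Forest", "Timber"): 4,
          ("Forest", "Drinking water"): 2 if i < 4 else None}
         for i in range(12)]
weak_cells = under_sampled(panel, cells)
```

Cells returned by `under_sampled` are candidates for inviting additional experts before any statistics are computed.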
In both individual fill-in procedures, the experts can continue to fill the matrix after the workshop. Whatever the matrix-filling method used, the number of "missing data" (i.e. an expert failed to provide a score) has to be kept as small as possible. Depending on the statistical analysis considered, it is useful to record information on the experts themselves (gender, level of training, job title, organisation, main type of mission, expertise on specific ET etc.) for analysis of score biases.

Step 5 - Compiling the values

• Statistical central values and variabilities
The central value is generally computed using the arithmetic mean of all individual experts' scores for each ES/ET combination, but some authors advocate using the median of the individual scores (Kopperoinen et al. 2014). However, statistical evidence supports the use of parametric statistics on Likert-type rankings (such as a 0-5 scale) provided that the rater panel is large enough, usually 10-15 (Norman 2010). In the mapping approach, weighting can be applied during the extrapolation to larger geographic scales, as in Hermann et al. (2013), who chose to weight scores by the base unit area. Koschke et al. (2012) elected to apply "explicit weights as the importance of the various ecosystem services might differ with respect to the context, the included stakeholders, and the investigated region". These weights are linked to specific methodological choices. Campagne et al. (2017) tried a weight based on the confidence scores, but this implies that an expert with more confidence than another gives a better or more realistic estimate of the ET's capacity to provide the ES. It also increases the risks associated with overconfidence and underconfidence biases (Mukherjee et al. 2018). Accordingly, we recommend not using weights in the statistics.
Score variabilities can be estimated using the variance, the standard deviation of the scores or the standard error of the mean if the average is used, or the inter-quartile range if the median is used. Score variability is one way to identify differences in scoring agreement between experts (Campagne et al. 2017). As stated by Martin et al. (2012), it is important that differences in judgement between experts are retained and communicated.
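For a single ES/ET combination, the central values and variability measures listed above can be computed as in the following Python sketch. The function name and the twelve example scores are hypothetical; the quartiles use the default method of `statistics.quantiles`, which is one of several common conventions.

```python
from math import sqrt
from statistics import mean, median, stdev, quantiles

def summarise(scores):
    """Central values and variability of the experts' scores for one ES/ET cell."""
    n = len(scores)
    sd = stdev(scores)                 # sample standard deviation
    q1, _, q3 = quantiles(scores, n=4)  # quartiles (default 'exclusive' method)
    return {
        "mean": mean(scores),
        "median": median(scores),
        "sd": sd,
        "sem": sd / sqrt(n),           # standard error of the mean
        "iqr": q3 - q1,                # inter-quartile range
    }

# Hypothetical 0-5 scores from 12 experts for one ES/ET combination
cell_scores = [3, 4, 4, 5, 3, 4, 2, 4, 3, 4, 5, 3]
summary = summarise(cell_scores)
```

Applying `summarise` to every cell yields both the central score matrix and the variability matrix from the same individual scores.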

• Adjustment in consensus
Following the individual fill-in and the computation of central scores and variabilities, it is possible to conduct a "second round of exchange with the expert panel", notably to reconsider the scores with high variance (inspired by the Delphi approach; Martin et al. 2012). This requires the simultaneous presence of all experts or a sequence of successive scoring by the different experts, but digital tools could be used to retrieve the individual scores, centralise them and determine which ones to reconsider. The use of consensus rounds helps refine the output by reaching a certain degree of convergence or stabilisation (Jacobs et al. 2015).
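Selecting the scores to reconsider in a second round can be sketched as below. This Python example is illustrative only: the function name, the example scores and the standard deviation threshold of 1.0 are hypothetical choices, not values recommended in the text.

```python
from statistics import stdev

def cells_to_revisit(scores_by_cell, sd_threshold=1.0):
    """List ES/ET cells whose scores vary most between experts, most variable first.

    sd_threshold is a hypothetical cut-off chosen for illustration.
    """
    spread = {cell: stdev(scores)
              for cell, scores in scores_by_cell.items()
              if len(scores) > 1}
    flagged = [cell for cell, sd in spread.items() if sd > sd_threshold]
    return sorted(flagged, key=lambda cell: spread[cell], reverse=True)

# Hypothetical individual scores centralised after the first round
first_round = {
    ("Garrigue", "Drinking water"): [0, 5, 1, 4, 2, 5],
    ("Garrigue", "Fire risk"): [4, 4, 5, 4, 4, 5],
}
second_round_agenda = cells_to_revisit(first_round)
```

Ranking the flagged cells by spread lets a time-limited second round start with the most contested scores.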

• Final matrix
The final output of the fill-in process is a final matrix of the central scores that can be completed by a variability matrix and confidence scores (Stoll et al. 2015, Tao et al. 2018, Campagne et al. 2017).

Step 6 - Reliability and validation
Methods based on inter-rater reliability statistics, such as Krippendorff's alpha (Krippendorff 1980), intra-class correlation statistics (Müller and Büttner 1994) or other indexes of inter-rater reliability and agreement, are required to evaluate the degree of agreement or consistency amongst the experts (Jacobs et al. 2015). Agreement refers to the extent to which different experts provide the same estimates, while consistency refers to how far experts rank ET capacities to provide ES in the same order, though not necessarily with the same scores. These types of index complement the score variability in identifying the ES on which the different experts disagree. Computations can easily be done in R (R Development Core Team 2017) using, for example, the "irr" package. The inter-rater reliability indices can be used to identify ES that need more attention in order to interpret the sources of disagreement between experts (disciplinary bias, knowledge gaps, lack of relevant experts...).

Step 7 - Outputs
Three main types of outputs are usually of interest to the stakeholders: the ES matrix, maps and bundle graphs. The ES matrix, and particularly the central scores matrix, can be used directly by the stakeholders for their own analyses, particularly by land managers and land planners. We recommend communicating and raising awareness about the uncertainties and variabilities of the outputs when they are used. Maps are a second very important type of output (Burkhard 2017). The final matrix also serves to analyse ES bundles, as in the example below. The communication of the results as bundle graphs (flowers, spider graphs...) of ES values for a given ecosystem type or area allows a quick overview of the ES patterns (e.g. Fig. 5). It is also possible to produce bundle graphs of ET for a given ES. In that case, the graph quickly shows which ET have the higher or the lower capacity regarding a given ES.
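The three outputs can all be derived from the same final matrix used as a look-up table. The Python sketch below is a hypothetical illustration: the central scores, the small land cover grid and the function names are invented for the example and do not come from the case study.

```python
# Hypothetical final matrix of central scores: ET -> ES -> score (0-5)
final_matrix = {
    "Olive grove": {"Crops": 5, "Erosion control": 2},
    "Garrigue": {"Crops": 0, "Erosion control": 3},
    "Coniferous forest": {"Crops": 0, "Erosion control": 4},
}

# Hypothetical land cover grid: each cell holds an ET label
land_cover = [
    ["Olive grove", "Garrigue"],
    ["Coniferous forest", "Garrigue"],
]

def map_service(grid, matrix, service):
    """Replace each ET label of the grid by its score for one ES (look-up table)."""
    return [[matrix[et][service] for et in row] for row in grid]

def es_bundle(matrix, et):
    """ES values of one ET, i.e. the data behind a flower or spider graph."""
    return dict(matrix[et])

def rank_et(matrix, service):
    """ET ordered from highest to lowest capacity for one ES."""
    return sorted(matrix, key=lambda et: matrix[et][service], reverse=True)

erosion_map = map_service(land_cover, final_matrix, "Erosion control")
garrigue_bundle = es_bundle(final_matrix, "Garrigue")
erosion_ranking = rank_et(final_matrix, "Erosion control")
```

The same look-up can be applied to a variability or confidence matrix, which is one way to communicate uncertainty alongside the score maps.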
It is very important that the results and outputs are communicated back to all the experts who were involved in the assessment so they can see the results of their participation and provide feedback. This diffusion also contributes to the appropriation of the results and dissemination of the study.

Example of application
As an illustrative example, we present an application of the matrix approach in the Alpilles RNP in the south of France. This ES assessment was done in August 2017 (Surun 2017). The methodology steps were applied as described below. (Step 1) The goal was to assess the potential impact of a Life+ Nature project on the ES capacity of the ET managed through actions led under the Life+ project. LIFE is the EU's financial instrument supporting environmental, nature conservation and climate action projects throughout the EU (http://ec.europa.eu/environment/life/). LIFE12 NAT/FR/000107 is a LIFE+ Nature project on the Alpilles RNP which aims to optimise the link between human activities and the maintenance of ornithological biodiversity, to promote the appropriation of ecological issues by local stakeholders and to strengthen ornithological recognition of the territory by enhancing certain practices. Its actions, analysed here, are restoration of a grass layer in olive groves (A1), plantation of multispecies hedgerows (A2), creation of open areas in dense garrigue scrub by shrub cutting (A3) and implementation of good forestry practices with opened forest by shrub cutting and culling (A4).
Linked to these actions, the ET selected for the matrix are not based on a land cover/use typology but are those addressed by Life actions, with before-and-after action states, i.e. Bare-soil olive groves vs. Olive groves with grass cover, Multispecies hedgerows vs. Monospecific cypress hedgerows, Opened garrigue vs. Closed garrigue, Opened coniferous forest vs. Dense coniferous forest. Another 6 ecosystems were also considered in order to factor in other important ET within the study area, i. provisioning services and 5 cultural services were chosen (presented in Fig. 3). Moreover, in the context of the local conditions, the ecosystem disservice (EDS) "Contribution to fire risks" was added to the matrix.
Following Burkhard et al. (2009), ES and EDS scores were ranked from 0 to 5 to express capacity and we asked for a confidence index from 1 to 3 following Campagne et al. (2017).
In total, the matrix included 14 ET and 24 ES + 1 EDS, giving a total of 350 scores. As we assessed the capacity, the experts invited to participate had theoretical and practical knowledge of the local environment and/or ES. The panel of experts counted 5 generalist profiles, 3 forest profiles and 2 naturalist profiles, along with specialist profiles in livestock, agriculture and hunting. (Step 2) The workshop was held in August 2017 with all the in-panel experts and the course of the workshop followed the previous recommended description. (Step 3) As the evaluation had a very specific context (assessment of Life action impacts), there was no existing matrix so we started with an empty matrix. (Step 4) We chose a full individual fill-in to get 12 full matrices and run comparative analyses.
(Step 5) Compilation was done by taking the mean of the 12 matrices, with equal weight given to each expert. Adjustment by consensus was not done, as the matrix was too long to be filled in during the workshop and technical reasons made it difficult to organise two workshops. The final matrix consists of the central scores and mean confidence scores for each ES/EDS and ET (Fig. 3). The impacts of Life actions were computed as the difference between the two linked ET states (i.e. before and after the Life actions; Fig. 4). The potential impact of the Life+ project was analysed with a Student's t-test for paired data using the R software statistics package (R Development Core Team 2017) to test whether the two scores were significantly different.
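The before-and-after comparison can be sketched as follows. The case study used R; this is a Python illustration with hypothetical scores, assuming that the pairing unit is the expert (each expert scores both ET states for a given ES). Only the t statistic and degrees of freedom are computed here; obtaining a p-value additionally requires the t distribution (available in statistical software).

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(before, after):
    """Mean after-minus-before difference, paired t statistic and degrees of freedom."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean(diffs) / (sd / sqrt(n))
    return mean(diffs), t, n - 1

# Hypothetical scores of 12 experts for one ES, before and after an action
before = [1, 2, 1, 1, 2, 1, 0, 1, 2, 1, 1, 2]
after = [3, 3, 2, 3, 4, 2, 2, 3, 3, 2, 3, 3]
impact, t_value, dof = paired_t(before, after)

# Two-sided 5 % critical value of Student's t for 11 degrees of freedom
T_CRIT_11 = 2.201
significant = abs(t_value) > T_CRIT_11
```

The mean difference `impact` is the quantity that would be displayed in an impact matrix, with the test indicating which differences are distinguishable from zero.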
(Step 6) Comparison with quantitative data or models would have been a challenge, as the application was too specific for reliable comparison against existing data. We computed two indices of inter-rater reliability, i.e. Krippendorff's alpha and ICC, using the 'irr' package in R software. For Krippendorff's alpha, we considered the scores as interval data and, for the ICC, we considered the consistency of the scoring. The index values ranged from 0.062 to 0.90 for Krippendorff's alpha and from 0.09 to 0.90 for ICC. All the ICC values were significantly different from 0, indicating some degree of agreement between the experts, but there were nevertheless strong differences in concordance values for the different ES. This indicates that some ES have a very low level of agreement amongst the experts and this has to be further analysed. Eight ES had high ICC values (>0.6), i.e. PS1 "Cultivated crops (including seaweed farming)" (0.9), PS6 "Materials and fibres" (0.63), PS9 "Biomass-based energy sources" (0.61), RS1 "Global climate regulation by greenhouse gas reduction" (0.72), RS2 "Local climate regulation and local atmospheric composition" (0.6), RS7 "Mass stabilisation and control of erosion rates" (0.63), RS8 "Protection against winds and storms" (0.81) and RS10 "Limitations of noise pollutions and odour and visual nuisances" (0.68). DS1 "Contribution to fire risks" had an ICC value of 0.72. This indicates that, for these ES and DS1, the experts consistently ranked the different ET in the same order, which suggests that these scores should be robust. Inter-rater reliability was very low for some ES, such as PS5 "Drinking water" (0.09), PS8 "Genetic and medicinal materials" (0.3) and CS3 "Existence and bequest" (0.27), indicating very low agreement between experts on the scores of the different ET. These results can mean that ES scores for these services lack reliability, that there are diverse interpretations of the ES between experts or that these services encompass different types of knowledge that require different types of expertise. Those services should be interpreted with caution and further analyses, such as helicopter interviews of experts and literature reviews, are needed to understand the cause of the disagreement amongst experts.
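To make the notion of consistency concrete, the following sketch computes a single-rater, two-way consistency ICC from an ET-by-expert score table for one ES. It is a Python illustration under stated assumptions, not the 'irr' code used in the study: the text does not say which ICC model or unit was chosen, and the ratings below are invented.

```python
def icc_consistency(ratings):
    """Single-rater, two-way consistency ICC.

    ratings: list of rows, one per ecosystem type (subject),
    each row holding one score per expert (rater).
    Undefined (division by zero) if every score is identical.
    """
    n = len(ratings)       # number of ecosystem types
    k = len(ratings[0])    # number of experts
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)

# Hypothetical case: the second expert always scores one point higher,
# so the rank-order of ET is identical and consistency is perfect.
offset_only = [[1, 2], [2, 3], [3, 4], [4, 5]]
# Hypothetical case: the two experts rank the two ET in opposite order.
opposed = [[1, 3], [3, 1]]
```

The first example shows why a consistency ICC can be high even when experts differ in their absolute scoring level: a constant offset between experts is absorbed by the expert (column) effect and does not count as disagreement.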
(Step 7) To illustrate the impact of Life actions, graphs of ES bundles can be used (Fig. 5).
The results show that action A1 (restoration of a grass layer in olive groves) generates benefits in 21 ES, including 14 ES with significant differences in the ES provided before and after the actions (t-test, p<0.05). There were no estimated changes for 3 ES. Provision of DS1 "Contribution to fire risks" increases significantly, associated with the increase in grass cover (Fig. 5). Similar analyses can be made for the impacts of actions A2, A3 and A4.
These results need to be read in relation to the confidence scores expressed by each expert (averages shown in grey in the margins of Fig. 3) and to inter-rater reliability. Some services such as RS8 "Protection against winds and storms" and PS1 "Cultivated crops (including seaweed farming)" have both a high confidence value and a high ICC, indicating that the experts are self-confident and provided concordant scoring. For some other ES, such as PS5 "Drinking water" and PS8 "Genetic and medicinal materials", both the confidence indices and the ICCs are low. These ES scores should be treated with caution and certainly need complementary analyses or a new expert consultation. Finally, some other ES, such as the cultural services, have a high confidence index and low concordance. They will also need some further analysis, since the experts seem to consider they have knowledge of them but score them very differently.
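The joint reading of confidence and concordance described above can be sketched as a simple two-way classification. The ICC threshold of 0.6 is the one used in the text; the confidence cut-off, the confidence scale and the function name are hypothetical, and the case of low confidence with high concordance is not discussed in the text.

```python
ICC_HIGH = 0.6         # threshold used in the text for "high" ICC values
CONFIDENCE_HIGH = 2.0  # hypothetical cut-off; the confidence scale is assumed

def reliability_class(mean_confidence, icc):
    """Classify one ES from its mean confidence score and its ICC."""
    confident = mean_confidence >= CONFIDENCE_HIGH
    concordant = icc > ICC_HIGH
    if confident and concordant:
        return "confident and concordant"
    if confident:
        return "confident but discordant: further analysis needed"
    if concordant:
        return "concordant but low confidence (case not discussed in the text)"
    return "low confidence and low concordance: complementary analysis or new consultation"

# ICC values are those reported in the text; confidence values are invented.
examples = {
    "RS8": reliability_class(2.5, 0.81),
    "PS5": reliability_class(1.0, 0.09),
    "CS3": reliability_class(2.4, 0.27),
}
```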

Discussion
We proposed a 7-step methodology for producing an ES matrix based on expert elicitation protocols. The example we presented here serves to demonstrate the implementation and the interest of the proposed methodology. Our methodology aims to promote cogency in the method and coherency in the matrices produced and to reduce biases linked to experts' judgements. In a recently published book, Burkhard (2017) proposed a 10-step methodology for the matrix application for ES mapping. His protocol can be complementary to ours, since he focused mainly on the mapping issues and the use of spatial data, while our methodology aims to promote good practices to produce a reliable expert-based ES matrix and the analysis of different sources of variability. As stated by Choy et al. (2009), expert elicitation raises several key issues for ecological applications that are relevant in our context: 1. experts' opinions are an important source of information, 2. ecological problems should be broken down to facilitate the expert knowledge elicitation, 3. communication style is important in targeting different groups, 4. technology could be used to provide a more interactive environment, 5. expert panels provide a useful mechanism for facilitating elicitation. The Choy et al. (2009) discussion resonates with our 7-step approach for an ES matrix. ES are mostly complex and based on several interacting ecosystem functions and, generally, a human contribution. Expert opinion is generally based on integrating several pieces of information rather than on a very specific indicator, as quantitative modelling necessarily is. Our recommendations for the organisation of steps 1 to 5 and interactions with key stakeholders and experts are important in order to define the objectives and to exchange with the expert panels. We state that these steps are very important and should not be overlooked as mere preparatory elements. Good planning and good communication with experts are needed to reduce the many biases that can be linked with ES scoring by experts. We recommend using individual scoring instead of consensus methods in order to keep simpler options for statistical analysis of uncertainties (Campagne et al. 2017). More generally, interesting discussions on the use of the matrix approach can be found in Burkhard et al. (2012), Campagne et al. (2017), Hou et al. (2013) and Jacobs et al. (2015). Burkhard and Maes (2017) discuss ES mapping in general. In the following paragraphs, we summarise and group the pros and cons of the matrix approach in general and of the use of expert elicitation in particular, and specify the pros and cons linked to the methodology we propose here.

Flexibility
We agree with Schröter et al. (2012) in considering that Burkhard et al. (2012) made "an important contribution to the development of mapping supply and demand of ES".
The matrix approach enables quick ES assessments (Hermann et al. 2013, Stoll et al. 2015) and provides readily understandable and "mappable" ES data (Jacobs et al. 2015).
The flexibility of the matrix approach in terms of detail and levels of abstraction, from rather simple to highly complex, further adds to its attraction (see Burkhard et al. 2014).

Appropriability
The underlying ES concept, simplicity, flexibility and participatory approach with coproduction of the results converge to make the final output readily understandable and appropriable by stakeholders. As in the case applied by Kopperoinen et al. (2014), experts and stakeholders appreciated the approach as very fruitful and interesting, as it combined input from all experts working on a territory and provided output data integrating all their expertise-sets.

Cost efficiency
The matrix approach and expert elicitation are cost-efficient, as they provide a quick assessment for a large number of ES and a large number of ET (Jacobs and Burkhard 2017). Provided that rules are used and carefully applied (this paper), the results also have a quantified reliability and credibility that is not necessarily lower than that of more complex modelling approaches, which also incorporate expert knowledge at different stages of parameterisation (Jacobs and Burkhard 2017, Campagne et al. 2017).
As shown in Table 1, the time required to apply the approach varies depending on the initial matrix, the filling-in method and the precision of ES and ET.

Integrative
There have been very few attempts to quantitatively assess regulating services and cultural services because available data are scarce and relevant indicators are quite complex (European Commission 2014 in Schröter et al. 2012). However, the matrix approach can quickly provide estimates for all types of ES. As quantitative data based on different units pose the problem of comparability of ES/EDS potentials using different metrics, the matrix-based scoring approach is more integrative and provides same-scale estimates (i.e. 0 to 5). The intrinsic complexities of estimating values linked to ES or EDS, particularly those that cannot be easily quantified (some regulating services and most cultural services), challenge the reliability of many biophysical or economic valuation approaches (Aldred 2006). In the case of economic valuation, the question of putting value on services raises many problems (for a recent synthesis, see Rey-Valette et al. 2017), such as the perception and identification of services by social actors or the measurement of option and non-use values.

Spatiotemporal invariance
The matrix gives an average score of ES provided by ET/LC types. Two distant areas with the same ET will thus have the same scores without accounting for their specificity (Jacobs et al. 2015). Protection status, ecosystem health status, topographical, topological and other features, such as socio-economic data (human population density), are not directly taken into account. Additional spatial analysis can be factored in (e.g. Hermann et al. 2013) or spatial variability (e.g. regional variation) can be considered in the ET lists in the matrix, but this considerably increases the matrix size and the time required to fill it in.
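The look-up logic behind this spatial invariance can be shown with a minimal Python sketch. The ecosystem type names are taken from the example application, whereas the scores, the spatial unit identifiers and the function name are invented for illustration.

```python
# Hypothetical capacity scores (0 to 5) per ecosystem type for one ES.
capacity = {"Deciduous forest": 4.2, "Grassland": 2.8, "Annual crops": 1.1}

# Hypothetical map: each spatial unit (e.g. a polygon) has one ecosystem type.
land_cover = {
    "unit_a": "Grassland",
    "unit_b": "Deciduous forest",
    "unit_c": "Grassland",
}

def map_scores(land_cover, capacity):
    """Assign to each spatial unit the matrix score of its ecosystem type."""
    return {unit: capacity[et] for unit, et in land_cover.items()}

scores = map_scores(land_cover, capacity)
```

Here `unit_a` and `unit_c` receive the same score because they share an ecosystem type, whatever their location or condition. Capturing regional variation would require splitting "Grassland" into several ET entries, which enlarges the matrix.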
ES provision is temporally variable (Kandziora et al. 2013). As stated by Turkelboom et al. (2017), the matrix with a "list of ES can give the impression that provisioning, regulating and cultural ES can be met at the same time, while in most situations it is impossible to manage ecosystems in such a way that these ES are simultaneously utilized at desired levels".

Lack of consideration of spatial processes
The matrix does not take into account trade-offs and synergies between ES as they form an interaction network (Schröter et al. 2012), but correlation can be analysed. Also, the matrix alone does not take into account the impact on ES provision of the interaction between ET and landscape structure with its mosaic of patches (Syrbe et al. 2017).

Expert-based estimates
The results, based on expert judgements, are limited by the experts' own understanding and interpretation and by a number of cognitive and social biases. However, carefully designed methods can reduce some of the impact of specific biases (see Drescher et al. 2013, Mukherjee et al. 2018).
Some ES and ET are more difficult to define, which may lead to different interpretations. Moreover, the distinction between the different components of ES notions (supply, capacity, uses, demand etc.) is also debatable and may again lead to different interpretations. The workshop plays an important role in addressing this limitation, as it serves to set aside the time needed to explain all the matrix-related definitions and to go back over them if needed.
An intrinsic limit of expert-based methods is the subjectivity of their estimates. By definition, they are not measurements but a plausible score based on the best knowledge of the experts mobilised (Jacobs et al. 2015). Expert knowledge and empirical data exist on a continuum of subjectivity and both require validation steps (Martin et al. 2012). Spangenberg and Settele (2010) stressed that, despite an illusion of precision given by the rigorous methods used, the economic evaluation of services is based on conventional premises that preclude any claim to "objectively" calculate their value. Although the expert elicitation of ES scores contains many sources of uncertainty (Hou et al. 2013), it is neither obvious nor easy to determine whether that uncertainty is higher or lower than that of other empirical methods (Drescher et al. 2013).

Relative quantification
As the ES scoring used in the matrix approach is only semi-quantitative and expert-based, we cannot equate the scores directly to actual biophysical quantities. There is an obvious need to compare them with actual or model-based quantitative estimates in order to better define their domain of validity. Some preliminary results seem to indicate a monotonic relationship between expert-based ES scores and quantitative estimates, but more research is needed.
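One possible way to examine a monotonic relationship between expert scores and quantitative estimates is a rank correlation such as Spearman's rho. The sketch below is an illustration only: the text does not state which statistic was used in the preliminary results, and the data are invented.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; undefined if either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical data: expert scores (0 to 5) and a measured quantity per ET.
expert_scores = [0, 1, 2, 3, 5]
quantities = [0.1, 0.4, 2.0, 9.5, 40.0]
rho = spearman(expert_scores, quantities)
```

In this invented case the relationship is strongly non-linear but perfectly monotonic, so rho equals 1. A rank-based statistic suits semi-quantitative scores because it does not assume that one score point corresponds to a fixed biophysical amount.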

Conclusion
Finally, we are convinced of the usefulness of the expert-based matrix approach and recommend that each expert individually fills in a complete, empty matrix. This option offers a fair compromise between the time requested, taking into account all expertise, analytical and statistical advantages and a reasonable participatory time that avoids over-solicitation. As presented in the example, we quickly obtained relevant and appropriable results for stakeholders. The flexibility of the approach allows an unlimited adaptation to contextual objectives. However, for the matrix to be with you, this flexibility should be framed in order to achieve results with high scientific standards. We recommend the adoption of our explicitly structured and robust procedure, involving several steps linked with the study design, capacity building, scoring and expert accuracy and uncertainty assessment. We also strongly recommend that the ES matrix should not only focus on ES central scores, but also address the variabilities and uncertainties as part of the ES assessment.
Dépôts copilotée) Biodiversité et la Mission Economie de la Biodiversité. The application was funded by the Alpilles NAT/FR/000107 LIFE programme and the Alpilles Regional Natural Park.
e.: Deciduous forest, Rocky habitat, Vineyard with bare soil, Orchards with inter-row grass strips, Grassland, Annual crops. The list of ET was completed by detailed descriptions and local pictures to harmonise participants' mental representations. Based on CICES (V4.3, Haines-Young and Potschin 2013), we proposed an ES start-list that we then adapted by working with the park managers to select relevant ES and to adapt the definitions and examples to fit the local context. At the end, 10 regulating services, 9

Figure 3. The final capacity matrix with mean ES capacity scores, with mean confidence scores by ET and ES given in the table margins. ES = ecosystem service; EDS = ecosystem disservices.

Figure 4. ES impacts of Life actions. Difference between the ES/EDS scores for Life+ action-related ecosystem types (before-after). Bold underlined values indicate significant differences in scores (paired t-tests, p<0.05). ES = ecosystem service; EDS = ecosystem disservices.

Figure 5. Bundles of ES capacity of the potential Life action impacts. A1: restoration of a grass layer in olive groves, A2: plantation of multispecies hedgerows, A3: creation of open areas in dense garrigue scrubs by shrub cutting, A4: implementation of good forestry practices with opened forest by shrub cutting and culling. The wedges in the chart are same-shaped: length (radius) indicates the potential capacity of the ET to generate ES, black parts indicate ES provided before the Life actions and coloured parts indicate ES provided after the Life actions. When the black part of a wedge exceeds the coloured part, it indicates that the potential is considered higher before than after the Life+ action. If the coloured part exceeds the black part, the potential is considered to be higher after, rather than before, the Life+ action.
The matrix applications have mainly been used for mapping ES (e.g. Depellegrin et al. 2016, Egarter Vigl et al. 2017, Koschke et al. 2012, Burkhard et al. 2012), so the main outputs are maps of ES supply, demand and supply/demand budget (i.e. hotspots and coldspots of ES budget depending on the objectives defined at Step 1). As the ET list in the matrix can come from a land cover/use typology, ES scores can be linked to the spatial unit (Campagne et al. 2018). It can be applied based on either very simple or fairly complex methods (i.e. simple or advanced matrix mapping in Harrison et al. 2017 and Tiers 1 to 3 in Grêt-Regamey et al. 2015). Moreover, other concepts can be added to the ES assessed, such as ecological integrity indicators (e.g. Schröter et al. 2012, Islam Sohel et al. 2015), including the notion of biodiversity and the concept of ecosystem disservices, as recommended by Stoll et al. (2015) and employed in our example application and analysis in Campagne et al. 2018.