Distribution of bumblebees across Europe

Insect pollinators are a key component of biodiversity; they also play a major role in the reproduction of many species of wild plants and crops. It is widely acknowledged that insect pollinators are threatened by many environmental pressures, mostly of anthropogenic nature. Their decline is a global phenomenon. A better understanding of their distribution can help their monitoring and ultimately facilitate conservation actions. Since we only have partial knowledge of where pollinator species occur, the possibility to predict suitable environmental conditions from scattered species records can facilitate not ‡ ‡ ‡ § | ¶ # ¤ « » ˄ ˅ ¦ ˀ ˁ § © Polce C et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. only species monitoring, but also the identification of areas potentially vulnerable to pollinators decline. This data paper contains the predicted distribution of 47 species of bumblebees across the 28 Member States of the European Union (EU-28). Amongst the wild pollinators, bumblebees are one of the major groups contributing to the production of many crop species, hence their decline in Europe, North America and Asia can potentially threaten food security. Predictions were derived from distribution models, using species records with a spatial resolution of 10 km accessed from a central repository. Predictions were based on records from 1991 to 2012 and on a series of spatial environmental predictors from three main thematic areas: land use and land cover, climate and topography. These distributions were used to estimate the value of pollination as an ecosystem service. In light of the recent European Pollinators Initiative, this paper provides valuable information for a better understanding of where wild pollinators occur and it should be extended to other pollinator species.


Overview and background
Pollination is a key ecosystem service vital to the maintenance of both wild plant communities and agricultural productivity.Over three quarters of the world's major crops benefit from insect pollination (Klein et al. 2007), with an annual economic value estimated at $235-577 billion globally (in 2015 US$) (IPBES 2016).In Europe, insect pollination benefits more than 80% of crops (Williams 1994), with an estimated annual value of several billion euros (depending on the methods and crop considered, estimates range from €3 billion to just above €14 billion, see for instance : Vallecillo et al. 2018, Leonhardt et al. 2013, Gallai et al. 2009).Pollination services are mainly provided by wild insect pollinators (bees, hoverflies, flies, moths, beetles etc.) and domesticated bees (primarily western honey bee Apis mellifera and Bombus terrestris).Hence, there is growing concern that observed declines in insect pollinators, mainly bees (Potts et al. 2010, Biesmeijer 2006, Nieto et al. 2014), may impact on production and revenues from pollinator-dependent crops.Knowing the distribution of pollinators, therefore, is crucial for estimating their availability to pollinate crops (and wild plants).This information, in turn, can be used to ensure the maintenance of habitats that support insect pollinators, ultimately safeguarding the long term provision of (crop-)pollination services.In general, however, there is incomplete knowledge of where wild pollinators occur.To overcome this issue, two main approaches are usually adopted: one based on expert knowledge of the species' ecology (e.g.Zulian et al. 2013), the other based on species records (e.g.Polce et al. 2014, Polce et al. 2013).Both deliver a relative indicator between 0 and 1 expressing the environmental suitability to support the species of interest.This indicator is often interpreted as probability of occurrence of the species and, hence, used as a proxy for their potential presence.
Each of these approaches has some strengths and weaknesses: an expert-based model (EBM), for instance, can account for the effect of detailed local information, such as the presence of wild flower edges between crop-fields or other small patches of habitat suitable for pollinators.The EBM, however, could fail to reflect the environmental suitability for poorly known species or to capture environmental characteristics that can modify the expected suitability (e.g.climatic differences) or, again, it may not be able to predict species richness.A species-distribution model (SDM), on the other hand, can be formed by actual species records, which are used to characterise the 'quality' of the environment where species are recorded through statistics or machine-learning techniques for instance.An SDM, therefore, is constrained by the spatial and temporal resolution of these records; hence, it could fail to capture the effect of local landscape elements, if their spatial accuracy is greater than that which is available for the species records; or it could lead to biased predictions if the input data are biased (e.g.spatially, temporally biased) and no corrections are applied.These two approaches have been recently applied to derive a spatial indicator for the 'pollination potential by wild insect pollinators' across the European Union, within KIP INCA project ('Knowledge Innovation Project Integrated system for Natural Capital and ecosystem services Accounting').The original models by Zulian et al. (2013) and Polce et al. (2013) were adapted to meet the requirements of the accounting, such as the need to rely on datasets that are regularly updated.The EBM was adopted to predict the potential distribution of solitary bees across Europe, while the SDM was used to predict the potential distribution of (wild) bumblebees.The outputs of these models were then integrated to estimate the potential availability of wild insect pollinators across Europe and, finally, the pollination service (Vallecillo et al. 2018).While the EBM largely followed the scores and main dataset from Zulian et al. (2013), the SDM was based on a dataset never used for this purpose, while the approach largely followed Polce et al. (2013).Hence, in this paper, we present the predictions of bumblebee species from the SDM and describe the methods to generate them.
Bumblebees are important pollinators not only of wild plants, but also of crops.So their decline in Europe, North America and Asia is a cause of concern.Like many other species, bumblebees are also sensitive to environmental change.Maps derived from their records across Europe were recently produced to characterise their current climatic niche and their projected distribution based on climate change scenarios (Rasmont et al. 2015) as well as on the interplay between climate and land use change (Marshall et al. 2017).Work of this type allows the understanding of the expected shift in suitable habitat and climatic conditions for each species and, hence, an estimate of the potential risks due to environmental change.Bumblebees are also a highly charismatic group, as they are large brightly coloured insects associated with flowers.Therefore, they can effectively contribute to raising public awareness on the issue of pollinators and engage citizens in pollinatorsrelated activities (e.g.Bumblebee Conservation Trust).
In light of the recent EU Pollinators Initiative, the possibility to predict suitable environmental conditions based on species records is particularly important, not only to facilitate species monitoring over time and across geographical space, but also to predict areas potentially vulnerable to pollinators decline.
Hence, we present the potential distribution of 47 species of European bumblebees, derived from their records and key environmental drivers.

Overview of the study
For the importance of bumblebees within agricultural production, maps displaying their likely distribution are a key component of Natural Capital and Ecosystem Services Accounting.For the KIP INCA project, we inferred the potential distribution of bumblebees across Europe using species occurrences and their relationships with key environmental drivers.
Environmental drivers were chosen through a combination of ecological criteria (specific for this group) and statistics (see Methods).Species records were made available by the EU-FP7 funded STEP project ('Status and Trends of European Pollinators'), at a spatial accuracy of 10 km.They consisted of validated presence-only bumblebee records, gathered from different data donors in Europe (Suppl.material 7).These records are currently stored in the Atlas Hymenotpera (Rasmont and Francis 2018), a database which also includes records from other taxa of insects and which is constantly updated.We obtained access to bumblebee records for the period between 1970 and 2012, distributed throughout the territories of the European Union (except for Malta and Cyprus, for which the data were not yet available).These territories excluded the Outermost Regions and the Overseas Countries and Territories.The original dataset is described in Rasmont et al. (2015), while an update for the years 2010-2014 is presented in Rasmont and Iserbyt (2014).
We defined: • occupancy as the number of 10 km grid cells with records for a given bumblebee species; • record as a database entry for a given bumblebee species, with a spatial accuracy of 10 km.

Species data
The latitude and longitude of each record (world geodetic system WGS 1984, EPSG:4326) were projected to the LAEA (Lambert azimuthal equal-area projection, EPSG:3035) over a 10 km spatial resolution grid.For the purpose of this work, we used bumblebee records available from 1991 to 2012 inclusive, identified to the level of species.To allow us to obtain robust predictions, we considered all species with an occupancy of at least 25 grid cells (47 species).The list is included in Suppl.material 1, together with the list of species excluded from the analysis.

Environmental predictors
We used four types of environmental predictors: Copernicus (25 m spatial resolution, 7 m vertical accuracy).We used these two variables to characterise each 10 km grid cell in terms of its elevation and roughness.

•
Others: we computed the 'Average distance from natural and semi-natural areas' (snd_km) within each 10 x 10 km grid cell, rounded to the nearest metreand then converted to kilometre.A number of field-based studies reviewed by Ricketts et al. (2008) and Garibaldi et al. (2011), in fact, prove that increasing distance from semi-natural areas has a negative impact on a number of pollinator-related outcomes (namely: pollinator richness, visitation rates and stability of pollination services).CORINE LC classes, considered 'natural or semi-natural', are listed in Suppl.material 2.
To minimise multicollinearity between predictors (Guisan and Thuiller 2005), we employed Jolliffe's Principal Component Analysis with the rejection method B4 (Jolliffe 1972, Jolliffe 1973).Details of this procedure and the Pearson's correlation between the predictors are in Suppl.material 3.
In total, 22 predictors were used (

Model settings
Species distribution models were carried out within Maxent (Maximum Entropy Modelling of Species Geographic Distributions, Version 3.4.0,December 2016) (Phillips et al. 2004, Phillips et al. 2006, Phillips et al. 2017).One of the advantages of Maxent is the possibility to use presence-only data, such as the bumblebee records of our study.In addition to the environmental conditions at localities where a species is found, the model requires a random sample from the study region (i.e. from the background).This sample assumes a uniform survey / visiting record over the entire study area.This is seldom the case when species records come from different donors and geographic regions.In case this assumption is violated, the background information should reflect the sample bias.A possible correction is to restrict the selection of the background points to a region with records of similar species (called 'target group', TG) observed by similar methods (Phillips et al. 2009).We tested for violation of this assumption by comparing the AUC (Area Under the Curve of the Receiver Operating Characteristic) of models based on all records of bumblebees from 1970 to 2012 (and not necessarily identified to the level of species), against the AUC of models based on n-time sampling of an equal number of points from the entire study area (referred as 'null models') (Raes and ter Steege 2007).We found that the average AUCs of models from 100 sets of 500, 1000 and 5000 points drawn from the TG were significantly greater than the average AUCs of an equivalent number of null models from the entire study area (Suppl.material 4).The background localities for the individual SDMs were therefore drawn from within the TG.
We followed the model calibration described in Polce et al. (2013): • We fitted the data with "Hinge" features, which are base functions for piecewise linear splines (Phillips and Dudík 2008).

•
We modified default values of "Prevalence", which is defined as the probability of presence at ordinary occurrence points.When absence data are not available, Maxent assigns it 0.5 (Phillips and Dudík 2008).Prevalence is defined over specific spatial and temporal scales, which should be taken into account particularly when working with pools of species differing in their rarity (Elith et al. 2010).We followed Polce et al. (2013) and empirically computed prevalences, using the number of available records, the number of occurrences and the number of years with nonzero observations over the temporal scale .Each species was then assigned a value for prevalence from 0.1 to 0.5 (Suppl.material 1).

•
Additional model settings are listed in Suppl.material 4.
We assessed model predictions using the AUC, which, despite known assumptions and limitations (Austin 2007, Termansen et al. 2006), is commonly used as a thresholdindependent measure of model performance within SDMs.With presence-only data such as the pollinators' sightings, the maximum achievable AUC is less than one (Wiley et al. 2003), so standard thresholds for evaluating "goodness of fit" do not apply.As an alternative, we compared the average AUC value of each pollinator distribution model (AUC ) with the average AUC value of a set of null models (AUC ) where species records were replaced by randomly chosen locations (Raes and ter Steege 2007).We expected AUC > AUC .
At last, model predictions were interpreted as 'Probability of occurrence'.
For each species, two main outputs were considered: • The average 'Probability of occurrence' across the area of interest, from each of the five model runs.

•
The average threshold indicating, for each model run, the species 'Minimum training presence'.This threshold was used to convert the average probability of occurrence (P ) to 'Presence/Absence' across the study area: if P >= Threshold, then presence = 1 or else presence = 0.
These outputs were also used to derive: • A 'Species richness' map, by summing each species 'Presence/Absence' map.

•
An average 'Probability of occurrence' map, from the aggregated set of 47 species: single species P maps were summed and their average extracted by dividing it by the 'Species Richness' map.Hence, for each grid cell, the average P was based only on the number of species likely to be present.

Results
The assessment of model predictions through the comparison of species-model AUCs and null-model AUCs showed in all cases a performance significantly better than random (Suppl.material 4).
The predicted probability of occurrence for each species, the species richness (i.e. the number of bumblebee species predicted to be present) and the composite probability (average probability of occurrence for those species predicted to be present) are listed in Suppl.material 5 (the "Dataset"), for each pair of geographic coordinates (of the 10 km spatial resolution grid).Low resolution figures derived from these predictions are shown in Suppl.material 6, together with the low resolution figures of the species occupancies used as model inputs.Fig. 1 shows the predicted number of bumblebee species across EU28, Fig. 2 shows the predicted probability of occurrence from species predicted to be present (i.e. for species with P > 0); in other words, an overall probability of occurrence for bumblebees.Predicted number of bumblebee species, derived from the sum of each species presence maps.Presence was defined for each species using an individual threshold applied to the predicted probability of occurrence (see main text for details).Values above the threshold were set to 'presence' ( 1) and values below it to 'absence' (0).The sum of 'presences' indicate, for each locality on the map, the number of bumblebee species potentially present, i.e. the potential species richness.Predicted probability of occurrence for bumblebees, resulting from the individual species distribution maps.Single species distribution maps were summed and their average extracted by dividing it by the species richness map (Fig. 1).Hence, for each map locality, the average probability of occurrence was based only on the number species predicted to be present.
Column 50: Richness: predicted number of bumblebee species predicted to be present (i.e. with probability of occurrence greater than zero); Column 51: CompositeProbability: average probability of occurrence, from those species predicted to be present.

Creation dates
May 2018

Dataset contributors
See list of co-authors and Suppl.material 7.

Repository location
This paper (Suppl.material 5)

Publication date
Accepted for publication on 06 September 2018

Re-use potential
Any re-use of these data must acknowledge this source.Any interpretation of results obtained from re-using these data must acknowledge the characteristics of our dataset, which are described within this paper.The authors may be contacted in case of any doubts.
The predicted probability of occurrence must be interpreted as the potential environmental suitability based on the set of environmental predictors and the input occupancy.Additional records and updated environmental information can lead to different predictions.
The presence of a species is derived from the predicted probability of occurrence, after applying a threshold to discriminate presence from absence.No common rules exist to choose the threshold.A combination of statistical and ecological criteria might be used.For this study, we adopted 'Minimum training presence' as a threshold rule, which uses the suitability associated with the least suitable training presence record as the threshold.While this rule might lead to optimistic predictions (also assigning presence to areas at the margins of the species' ecological requirements), it also shows the potential suitability of the environment for the species.This information can be used, for instance, to select target areas for specific pollinator-friendly measures aiming at improving local environmental suitability.
Further research could investigate the possibility of adopting different thresholds for species' presence, based on species' commonality.Such fine-tuning could bring the predicted potential distribution closer to the species' actual distribution.

Disclaimer
The views expressed in this article are personal and do not necessarily reflect an official position of the European Commission.
Page 49: two maps showing the overall probability of occurrence for those species predicted to be present in each grid cell and predicted species richness derived therefrom.The presence of a species is derived from the predicted probability of occurrence, after applying a threshold to discriminate presence from absence.No common rules exist to choose the threshold.A combination of statistical and ecological criteria might be used.For this study, we adopted 'Minimum training presence' as a threshold rule, which uses the suitability associated with the least suitable training presence record as the threshold.While this rule might lead to optimistic predictions (also assigning presence to areas at the margin of the species' ecological requirements), it also shows the potential suitability of the environment for the species.This information can be used, for instance, to select target areas for specific pollinator-friendly measures aiming at improving local environmental suitability.
See main text for additional details and for the data sources.
Bioclimatic variables are biologically relevant variables representing annual trends, seasonality and extreme or limiting environmental factors.For these reasons, they are often used in species distribution modelling.The 19 bioclimatic variables (see WorldClim for their description) can be computed, for instance, with the 'biovars' function of the dismo package in R.• Topography: we computed mode and standard deviation of elevations within each 10 x 10 km grid cell, using the European Digital Elevation Model (EU-DEM) from •Land use / land cover (LULC): percentage cover of 16 continuous variables, derived from the 100 m spatial resolution CORINE Land Cover 2006 accounting layer (version 18.5, Mar.2017) (Corine LC).This layer was produced by the European Environmental Agency (EEA) through a harmonisation method, allowing the use of the CORINE LC data series to detect changes over time.We chose the CORINE LC hierarchical level 3, the finest one available across the whole area of interest and converted each class to percentage cover within the same 10 x 10 km grid cell of the species data.We then reduced the number of classes to a set of 16, which were ecologically relevant for the taxa.The correspondence with the original CORINE LC classes is listed in Suppl.material 2. • Climate: we computed monthly averages from daily minimum and maximum temperature and daily total precipitation, using the 25 km spatial resolution E-OBS gridded meteorological data.Since our interest was to have an indication of the climatic pattern, we computed the 12 monthly averages from two decades-data (1991-2012 to match bumblebee records), taking into account leap years.The monthly averages for each of these climatic variables (minimum temperature, maximum temperature and precipitation) served as inputs to compute 19 bioclimatic variables.

Table 1 .
Table 1): 16 aggregated LC classes, 4 bioclimatic variables, 1 topographic layer and 1 layer showing the average distance from semi-natural areas.Environmental predictors used to derive species distribution.
Filename: SM06_Bumblebee_distribution.pdf -Download file (6.01 MB) MO Excel table Brief description: A table-like spreadsheet listing primary sources of bumblebee records used to derive the predictions presented in this paper.