A map of pollinator floral resource habitats in the agricultural landscape of Central New York

We created a spatially and temporally-explicit model of floral area in Central New York State, USA, using public data from federal and state governmental agencies and non-governmental organisations. This model incorporates remote sensing-derived natural habitat, crop and land-use data products with roads GIS data to predict land cover indicative of floral resources for pollinators. The resulting dataset provides the necessary land-cover data to quantify floral resources available within a user-specified area (e

Therefore, estimates of floral resources for pollinators must take into account the land use and land cover within a heterogeneous landscape in order to model variability over space and time (Lonsdorf et al. 2009).An important step in developing this understanding is characterising the landscape into land-cover classes that can be translated to potential pollinator communities (Koh et al. 2015).Further, fine-scale information may play an important role in understanding pollinator distributions in some landscapes (Lonsdorf et al. 2009).Here we describe the process we used to create a spatial dataset that classifies land cover into categories relevant to their flowering vegetation communities at a high resolution (1 m) within a region of Central New York State.This dataset can be combined with data on flowering area and flowering phenology of plant communities in each landcover category to predict floral resources available to pollinators over the year (Iverson et al., in prep.).

Context
The focal area of this dataset covers 12 counties in New York State, within the United States of America (USA): Cayuga, Chemung, Cortland, Monroe, Onondaga, Ontario, Schuyler, Seneca, Tioga, Tompkins, Wayne and Yates (Fig. 1).We produced dataset versions that include crop data for the years 2012-2019.

Methods
We combined land-cover data relevant to estimating floral resources, including natural habitat types (including wetlands), crops, grasses (like pasture, hayfields, oldfields and urban lawns), roadside ditches and urban areas (see Table 2).This involved combining and reclassifying annual crop cover data from the Cropland Data Layer (CDL) (Boryan et al. 2011) and a natural habitat layer covering the northeast US and Atlantic Canada (Ferree and Anderson 2013) into classes relevant to predicting flowering plant communities.We then downscaled the land-cover classification information from the combined crop and habitat layer to a 1 m resolution lidar-based dataset that classifies, based on vegetation height and impervious cover (Chesapeake Conservancy 2020), for most (nine) of the counties within our study area.Counties not covered by this high resolution layer were still downscaled to 1-m resolution to match the rest of the data for further processing.To the downscaled data, we added wetland and waterbody delineations derived from the National Wetland Inventory (USFWS 2018) and our own delineations of roadside ditches, based on road vector data.1.
Description of datasets used to map floral resources of pollinators for Central New York.
Land-cover classes of the final combined land-cover dataset and the numeric code used to represent them in the output raster layers.The data origin column gives the input dataset that was used to provide information for the coverage of each land-cover class (CDL = Cropland Data Layer, Chesapeake = Chesapeake Bay Land Cover Data Project, NY Street = New York State Goverment roads layer, TNC = The Nature Conservancy Terrestrial Habitat Map, NWI = National Wetlands Inventory).Where there was information available from the high resolution Chesapeake Conservancy layer, more detailed delineations from that layer were used, based on the vegetation type.All input geographic datasets are publicly available from the sources listed in Table 1.We converted these layers to the same projected coordinate system, USA Contiguous Albers Equal Area Conic USGS (ESRI WKID: 102039), within the geographic coordinate system North American 1983 (EPSG: 4269).Geoprocessing was conducted using tools in ArcGIS ESRI (2020) available under the spatial analyst and data management licences and were scripted with the ArcGIS visual programming application "ModelBuilder".The full modelling workflow is described in Suppl.material 1.The land-cover classes of the final dataset (Table 2) aggregate crop and natural habitat classifications into classes that are sufficiently narrow to capture major variation in floral characteristics, yet coarse enough to allow for feasible sampling with replication within the study region (Suppl.material 1).Major annual and perennial crops in the region that provide floral resources are indicated by their species.Other perennial crops are grouped together ("perennial"), reflecting a similar weed community resulting from the common growing practice of using mowed grass alleyways between crop rows.Remaining annual crop types are categorised into two general groups, "insect-pollinated crops" and "nonresource crops".We use the term "insect-pollinated crops" for crops that flower under cultivation (e.g.sunflower).We use "non-resource crops" for any crop that does not produce insect-pollinated flowers, ecologically (e.g.wind-pollinated crops like wheat) or as it is cultivated (e.g.insect-pollinated plants that are harvested prior to flowering, like broccoli).While some of our "non-resource crops" are pollinated by insects if allowed to flower, we use this term to represent the actual availability of floral resources, based on management practices.Additionally, "non-resource crops-wintercover" indicate nonresource crops that are sown in autumn to overwinter, as opposed to grown within one growing season.Some discrepancies between the CDL and Terrestrial Habitat layers inevitably emerge, when the CDL classifies a 30 m grid cell as natural habitat, but no natural habitat is indicated at that cell in the Terrestrial Habitat layer.We resolved these mismatches by referring to the nearest Terrestrial Habitat class (processing workflow in Suppl.material 1).

High resolution landscape features
We derived high resolution delineations of landscape features from data layers on vegetation cover, wetland inventories and roads data.

High resolution vegetation data
We obtained 1 m resolution vegetation coverage data from a land-cover dataset produced by the Land Cover Data Project of the Chesapeake Conservancy (Chesapeake Conservancy 2020).This layer was created, based on lidar data obtained from the Federal Emergency Management Administration and the US Geological Survey (USGS), orthoimagery from the National Agriculture Imagery Program, county-level planimetrics data, statewide data on roads from the US Census and information from the National Wetlands Inventory.However, these data are only available for nine of the 12 counties in the study area (Cayuga, Chemung, Cortland, Onondaga, Ontario, Schuyler, Tioga, Tompkins and Yates Counties).Vegetation within this dataset is classified as either trees or low vegetation, with additional classes for water and impervious land-cover types.For our purpose, we reclassified all categories related to impervious cover as "no resource" to reflect no floral resources.We overlaid the high-resolution vegetation data with additional 1 m resolution features representing wetland and roadside ditch delineations derived from vector-based data, described below.

Vector-based wetland and water features
We used delineations from the vector-based National Wetland Inventory (NWI) dataset ( USFWS 2018) to define wetlands and, in some cases, waterbodies, within the final output land-cover data layer.To do this, we converted the NWI layer from vector to raster using the resolution of the Chesapeake Conservancy layer (1 m).We inserted wetland features within the Chesapeake Conservancy land-cover layer, using the latter layer's vegetation height classes to update NWI wetland cells as low or high wetland types, i.e. emergent or shrub wetlands, respectively.In areas not covered by the high resolution Chesapeake Conservancy layer, we maintained the NWI layer's wetland classifications and used the NWI waterbody delineations to replace the open water grid cells in the 30 m datasets, filling missing areas with the nearest non-water land-cover class from the CDL or Terrestrial Habitat layers.

Roadside ditches
Road verges and ditches can be an abundant source of floral resources for pollinators ( Phillips et al. 2020).Remotely sensing ditches from imagery requires very high resolution imagery and substantial analytical effort (Ayana et al. 2017); so instead, we based our prediction of likely flowering ditch locations on a roads layer obtained from the New York State Government (Winters 2018).We did not consider roads that intersected with a city and village boundaries layer (Gehrer 2018) because these were unlikely candidates for roads with ditches that are clearly differentiated from adjacent land covers (e.g.unmowed ditch next to agriculture or forest).We excluded road lines that were classified as "Parking lot" in the "Jurisdiction" layer attribute because these represented contiguous paved areas.Additionally, we excluded roads classified as a "Town Road" (a broad jurisdiction category that includes both city streets and rural roads) that did not contain "road" in its name (i.e."street", "place" "boulevard", "avenue" etc.).This last criterion is based on our observation that the latter names are given to urban streets as opposed to rural roads in the region.
Based on these criteria, we eliminated most urban and suburban streets from consideration, which were not likely to have a clearly differentiated 'ditch' habitat.We informally checked the buffer distances used to predict ditch locations against aerial photos, in order to assess the accuracy of our ditch placement parameters.
We then simulated ditches along the selected roads using a buffer from the road centreline, at a width dependent on the road type (full description in Suppl.material 1, Step 1c) and assigning a ditch width of 3 m, the average size in our study region, on each side of the road.We erased the portions of simulated ditches that intersected water features in the NWI layer.

Combining crop and natural habitat information with high resolution landscape features
We downscaled land-cover information from the combined crop and natural habitat land cover raster from 30 m to 1 m resolution, using Table 2 to assign the 30 m land-cover classes to the vegetation types in the 1 m resolution layer (see Suppl.material 1 for further reclassification details).Wetland, water and ditch features, which were already added to the high-resolution layer as described above, were preserved in this process and did not take on the crop or habitat land-cover classes from the 30 m resolution layer.
In cases where the vegetation type indicated in the 1 m resolution land cover layer (i.e.tree canopy or low vegetation) differed from the overlaying 30 m combined crop and habitat layer, we assigned the nearest height-matching vegetation land-cover class from crop or natural habitat land cover (further details in Suppl.material 1, Step 2).For the three counties that were not covered by the high-resolution layer (Monroe, Seneca and Wayne Counties), crop and habitat land-cover delineations remained the same as the 30 m combined crop and habitat layer, though we upscaled the raster to 1 m resolution so that wetland, water and ditch delineations could be added.

Special considerations for counties without high-resolution vegetation data
For the three counties without high-resolution vegetation data, we used an alternative approach to estimate the area of lawn and urban tree coverage within the developed landcover areas.In these counties, developed areas are represented by two development intensity classes, which should be converted to an average value for proportion of lawn and urban tree coverage.The conversion values in Table 3 were calculated from the average relationship between the two developed classes and the underlying proportion of lawn and urban tree coverage for the nine-county area where these 1 m-resolution data are available.The centroids of 30 m-cells were used as centres for 30 m-wide buffers to sample the proportional lawn and urban tree coverage within the 1 m data.The values in Table 3 are the averages across the study years of the average 30 m pixel coverage over the sampled region.We also explored an alternative method converting a continuous permeable surface coverage variable to estimated urban lawn and tree coverage, but this approach requires additional processing steps and does not improve predictions over the class-based averages (Suppl.material 1, Step 6).

Steps
The order of data synthesis is outlined below.These are encoded as ArcGIS Modelbuilder tools that were developed for this project and are uploaded to the repository associated with this article.More details on the geoprocessing routines within each step are described in Suppl.material 1.

1.
Prepare high-resolution data layers: reclassify relevant vegetation height classes (where available), rasterise national wetland inventory and estimate ditches.

2.
Combine high resolution layers prepared in step 1.

3.
Reclassify natural habitat 30 m raster to represent vegetation categories relevant to floral resources and erase wetland classes in preparation for combination with high resolution wetland data.4.
Combine reclassified natural habitat layer prepared in step 3 with crop layer, reclassified to reflect relevant floral resources land-cover classes. 5.
Downscale the combined natural habitat and crop layer to 1 m resolution and add the high-resolution vegetation, wetland and ditch features.6.
In three counties where the high resolution vegetation data are not available, use developed land-cover classes or percent permeable land cover to estimate urban lawn and tree coverage.

Quality control
Since we downscaled the 30 m resolution input data to 1 m resolution, the final land-cover data layer may not always match the classification indicated by the originating land-cover layer at a given point.This is due to the inclusion of fine-scale landscape information from the high resolution layers (the Chesapeake Conservancy, NWI and ditches layers).The additional details provided by these layers may indicate mismatches in vegetation type (e.g.trees mixed within field) or finer scale landscape features (e.g.ditches or small waterbodies), which were not included in the coarser resolution layers.In order to check that the data processing steps downscaled the 30 m resolution land-cover information with adequate fidelity, we compared the final land-cover class to the classes of the originating data layers using contingency tables based on 10,000 randomly placed points that sampled the land-cover identity in the final and input layers.In Table 4, we calculate the percent of the sample points whose land cover in the final layer matches the land cover of the originating layer ("Fidelity").Higher percent fidelity classes deviate less from the originating layers indicated in the "Data origin" column of Table 4.
In general, agricultural classes are preserved in the downscaled dataset, with fidelity values above 80% and, in many cases, above 90%.This reflects the homogeneity of agricultural areas, which makes it unlikely that the high resolution vegetation layer would indicate an unexpected vegetation type (e.g.trees in alfalfa cells).Exceptions to this could  Comparison of final land-cover data layer class to the input data layer class.The "fidelity" column quantifies the percent of sample points within the final land-cover class that matches the same general class in the originating layer.These values are the averages of the percent values taken for each of the eight years for which we generated separate data layers.In cases where multiple originating land-cover classes were aggregated to form the final land-cover class, these classes are indicated in the "Original class(es)" column.Land-cover classes present in Table 2, but not present here, were not sampled by the 10,000 random points used to generate these statistics and are rare land covers for this region.Vegetation in developed land-cover classes have 100% fidelity because cells with these two land-cover classes are only found outside of the coverage of the high resolution landcover dataset and generally do not coincide with waterbodies or ditches that would change their identity in the final layer.Within the coverage of the high resolution land-cover dataset, low vegetation is reclassified as "lawn" and tree canopy is reclassified as "urban trees".
Natural areas have lower fidelity, likely because these land covers are more heterogeneous.Classifications at 30 m resolution represent the most predominant land cover, whereas the 1 m vegetation data can better reflect a mix of land-cover types.Our downscaling process approximated this by taking land-cover information from nearby areas with the appropriate vegetation type, but this would lead to more cases where the final land cover differed from the class of the originating layer.This is shown in more detail in Suppl.material 2, which provides the full contingency tables across all land-cover combinations.Shrubland, which has 42% fidelity in Table 4, also occurs prominently in areas classified as agricultural land in the TNC dataset (38% in Suppl.material 2).This could indicate woody vegetation in old fields undergoing succession that border shrublands.Likewise, wet emergent vegetation falling outside the original NWI delineations occur mainly within shrub wetland, likely representing low wetland vegetation that would not have been noted in the NWI dataset, but was mapped by the high resolution vegetation layer.
In addition to the full contingency tables associated with Table 4, Suppl.material 2 also contains contingency tables comparing how well the original land cover is preserved in the final land-cover data, i.e. the percentage of sample points whose original classification matches the final dataset.These tables give an idea of the composition of the coarser resolution cells, once downscaled.The two types of contingency tables are roughly analogous to "user's" and "producer's" accuracy typically used in remote sensing classification (Lillesand et al. 2015), except that we compare the final land-cover classifications to professionally-produced input land-cover datasets rather than field data.User's accuracy estimates how often the map class is present on the ground, while producer's accuracy estimates how often the habitat on the ground is mapped correctly.An exhaustive field validation of the final land-cover dataset is beyond the scope of this project, though extensive methods documentation and accuracy assessments are available for many of the input layers, for example, Boryan et al. (2011), Ferree and Anderson (2013), Lark et al. (2021).Likewise, the accuracy of flowering ditch locations were not validated to field conditions, but could be the subject of future resesarch.
We estimated the error associated with predicting lawn and urban tree cover in counties without high resolution data by using the developed land-cover factors in Table 3 to calculate urban lawn and tree cover in the nine counties where this information is available.We calculated the root mean square error (RMSE) comparing between high resolution coverage estimates and category-based predictions for 100 points randomly placed in developed areas, for buffers ranging from 15 m to 1 km radius.The results in Table 5 show that the RMSE of both lawn and urban tree percent cover estimates decreases (i.e.prediction accuracy improves) with increasing buffer size.An alternative method of estimating lawn and urban tree cover using continuous permeable land-cover data had similar error values (Suppl.material 1, Step 6).However, we recommend estimating lawn and urban tree coverage using the simpler category-based method presented here, as the alternative method does not offer any improvement in prediction accuracy while adding more processing steps.

Dataset description
The output floral resources land-cover layers (Fig. 2) are stored as 1 m resolution rasters in the USA Contiguous Albers Equal Area Conic USGS projection (ESRI WKID: 102039).Data are available on Zenodo.The dataset covers 12 counties: Cayuga, Chemung, Cortland, Monroe, Onondaga, Ontario, Schuyler, Seneca, Tioga, Tompkins, Wayne and Yates.Of these, nine counties (Cayuga, Chemung, Cortland, Onondaga, Ontario, Schuyler, Tioga, Tompkins and Yates) have 1 m resolution delineations of low vegetation and tree canopies, which take on appropriate classifications based on the underlying 30 m data or nearby appropriate land covers, as described above.Outside of these counties, land covers follow the delineations of the 30 m layers, except for wetland, water and ditch features.

Object name
A map of pollinator floral resource habitats in the agricultural landscape of Central New York.

Figure 2 .
Figure 2.Final land-cover layer (2016 shown).Area with high resolution vegetation data is outlined in black (majority of south-eastern region).The three counties without detailed vegetation data are available with urban vegetation represented by alternative methods shown in the two inset callouts in the upper left: either as a continuous percent permeability value (upper callout) or as low and medium developed intensity categories (middle callout).In the area with high resolution vegetation data that covers most of the region, urban vegetation consists of lawn or urban tree categories (example in lower callout).

Table 3 .
Modelled mean (and standard deviation) of lawn and urban tree proportional coverage in the 1 m resolution layer, for developed (low and medium intensity) land-cover classes in the 30 m data.Values represent the average (and propagated standard deviation) across the study years of average 30 m pixel coverage in the sampled region.bealongfield edges bordering forest or other contrasting land-cover types or cases where the CDL was misclassified(Lark et al. 2021).

Table 5 .
Estimated root mean square error (RMSE) of lawn and urban land cover predictions using developed land-cover variables.Error is calculated, based on buffers around 100 random points placed in the nine counties where high resolution data are available.Calculated error is for the 2016 dataset.There are two versions of the dataset, each consisting of eight rasters representing CDL crop data from years 2012-2019.The versions differ in their representation of the developed areas in the counties beyond the coverage of the high resolution vegetation layer (i.e.Monroe, Seneca and Wayne Counties).A simplified version classifies developed areas into "low" and "medium" development categories.An alternative version converts these areas to continuous values representing the percent permeable area.Either of these variables can be converted to an estimate of lawn and urban tree coverage, though we recommend the category-based version (see Usage Notes).Both versions are available online in the Zenodo repository.