U-shaped deep-learning models for island ecosystem type classification, a case study in Con Dao Island of Vietnam

The monitoring of ecosystem dynamics consumes the time and resources of scientists and land-use managers, especially in island wetland ecosystems, which have been significantly affected by both the current state of the oceans and human activities. Deep-learning models that classify natural and anthropogenic ecosystem types from remote sensing data have become a potential replacement for manual image interpretation. This study develops a U-Net deep-learning model for classifying 10 island ecosystem types, together with clouds and their shadows, using Sentinel-2, ALOS and NOAA remote sensing data. We tested and compared different optimiser methods and benchmarked the results against two traditional methods: support vector machines and random forests. In total, 48 U-Net models were trained.


Introduction
Currently, more than 100,000 islands are home to 500 million residents in total, encompass 20% of global biodiversity and provide these residents with sustenance (Muñoz et al. 2013). Small islands with an area under 10,000 km² are home to about 500,000 people (Liyun et al. 2018). According to the Millennium Ecosystem Assessment (MEA), island ecosystems are isolated from inland areas and surrounded by a large area of water or sea. A sixth of the Earth's surface is covered by island ecosystems and the oceans around them (MEA 2003). These ecosystems also support more rare, endangered and vulnerable species than those found on the mainland (Balzan et al. 2018). They provide both terrestrial and marine ecosystem services (Laurans et al. 2013). However, islands are amongst the places on the planet most susceptible to the effects of human activities and environmental change. Eighty percent of recorded species extinctions have occurred on islands, and islands are presently home to 45 percent of the world's endangered species (Mueller-Dombois 1992). Consequently, changes in island ecosystems have received great attention from scientists in recent years (McLean et al. 2001, Laurans et al. 2013). Improved earth observation and analytical capabilities have transformed our perspective of the world, allowing for a more global view (Araujo et al. 2015, Kennedy et al. 2021), which has the potential to profoundly affect how humanity manages limited island resources in particular (Laso et al. 2020). Several types of remote sensing sensors have been used to categorise natural and artificial ecosystems at various scales, including MODIS for global land use/cover monitoring (Nichol and Abbas 2015), Sentinel-2 and Landsat for national/regional monitoring (Dang et al. 2020b) and Worldview and Planet for local monitoring ). One obstacle is how remote sensing experts communicate their findings to potential end-users (island managers, policy-makers and conservation practitioners).
For example, continuous data on coral reef cover need to be discretised into information that is usable for research, monitoring, planning and management (Kennedy et al. 2021). An automated procedure that can classify island ecosystems and monitor their changes from multi-temporal remote sensing data would be valuable, provided that it is clear, transparent and discoverable for the end-users.
In artificial intelligence, machine learning (ML) classifies information based on stored knowledge and performs the task without further assistance. Numerous research projects have used deep-learning methods to identify vegetation clusters, emphasising coastal and wetland habitats rather than island ecosystems (Dang et al. 2020b). Deep-learning technology evolved as a response to the limitations of many computer programmes in coping with the world's infinite complexity. A primary challenge in object classification from remote sensing images has been to identify real-world objects reliably from a vast number of pixels. Image classification systems have therefore often relied on statistical classifiers that find characteristics (e.g. surface type or ground cover) based on reflectance values across several spectral bands, or on preset rules that logically divide images into areas (Zhu et al. 2017). Nowadays, deep-learning models using remote sensing data (e.g. Sentinel-2, Landsat, Worldview and UAV) have been applied to different land-use planning fields . Various classifiers have been applied in these processes, ranging from the Support Vector Machine (SVM) to neural networks such as the Convolutional Neural Network (CNN), the fully convolutional network (FCN) and the U-shaped convolutional neural network (U-Net) (Zhu et al. 2017). The greatest challenge for deep-learning applications is determining which learning algorithms are needed to detect image features reliably, how many training samples are needed and how variable their performance is.
Additionally, deep-learning models for land-cover classification have commonly been designed for inland or coastal ecosystems (Feng et al. 2019) rather than for isolated islands, which contain different dynamic natural ecosystems, such as wetland and deep-sea ecosystems. Developing deep-learning models for island ecosystem classification is therefore becoming more relevant for scientists and managers (Hamylton et al. 2020). Deep-learning-based models, which use both spatial and spectral data, are being evaluated as a possible end-to-end solution for classifying island ecosystems, since they can distinguish between objects affected by water, waves, tides and currents.
This study aims to develop the most suitable deep-learning models, based on the U-shaped neural network (U-Net), for classifying and monitoring ten ecosystems on a particular island in Vietnam using Sentinel-2 images. It addresses three questions related to deep-learning-based ecosystem type classification on this island:
• What are the benefits of using deep-learning models for island ecosystem type classification?
• How do U-Net models compare with traditional models in island ecosystem type classification?
• How were ecosystem types distributed on Con Dao Island, Vietnam, during the last five years?
A 4-band Sentinel-2 image (red, green, blue and near-infrared) and digital elevation models (DEMs) were used as input data for the basic U-Net models to classify the different island ecosystems. A land-cover map of an island in Vietnam covering about 20 km x 25 km was built as a mask for training the deep-learning models (Sections "study area" and "input dataset preparation"). The accuracy of the trained U-Net models was compared with that of two benchmark techniques, namely Random Forest (RF) and Support Vector Machine (SVM). Lastly, new Sentinel-2 images acquired since 2017 were used to analyse land-cover changes on Con Dao Island, Vietnam.

Study Area
Con Dao Island, a district of Ba Ria-Vung Tau Province in south-eastern Vietnam located approximately 187 km from Vung Tau City, was chosen as the study area (Fig. 1). Con Dao comprises 16 sub-islands with a total natural area of approximately 76 km², the largest of which is Con Son Island (52 km²), the economic, political, cultural and social centre of the island district (Hong Nguyen et al. 2014). Con Dao National Park has a high level of biodiversity with many rare and valuable species; it has been designated as a Ramsar site and is a member of the Network of Important Sea Turtle Conservation Areas of the Indian Ocean-Southeast Asia region (IOSEA) (PPC 2019).
In addition to typical habitats comprising forests, rivers, streams, lakes, sandbanks and residential areas, the study area includes specific ecosystems such as coral reefs, seagrasses, shallow seas and deep seas. First, the mangrove forest ecosystem on Con Dao Island is limited in extent, covering approximately 30 ha, mainly on three sub-islands (PPC 2019). This ecosystem type can only develop in arc-shaped bays that are exposed to few powerful waves, contain dead coral and gravel and receive annual alluvial deposits. The mangrove area on this island is therefore smaller than in coastal areas. Second, coral reefs occur around most of the islands at depths of 6-22 m, typically at around 6-8 m (PPC 2019). Third, the seagrass ecosystem covers more than 1000 ha, is managed by Con Dao National Park and is located mainly in two areas: Con Son Bay and Dam Trau Bay (PPC 2019). The ecosystems of Con Dao may be classified into terrestrial, wetland and marine types (Tuan 2012). The forest ecosystem is divided into natural timber, mangrove and bamboo forests. Other habitats, such as highland areca palm forests, sandy beaches and annual trees, are dispersed throughout residential areas.
Using remote sensing and GIS technology, Tuan (2012) showed that the forest area had not changed much during the preceding 20 years; only about 1.65% of the islands' area showed a decrease. The main reason for this decrease was the expansion of the airport and the water reservoir in the period 1996-2000 (Hong Nguyen et al. 2014). For the 2020s, tourism development was identified as a spearhead economic sector, and all production and service activities are aimed at serving ecotourism development in an effective and sustainable manner (Hong Nguyen et al. 2014). The growth in population and in the number of tourists is forecast to place various pressures on local land-use planning from 2021 to 2030 (PPC 2017). It is therefore necessary to develop a system for monitoring ecosystem changes in Con Dao under the current socio-economic development context.

Input dataset preparation
The deep-learning model is set up in three steps (Fig. 2), which are explained in detail in this section, from data review and collection to training and testing the models. The best deep-learning models are compared with two traditional models before being used for new predictions.
Fig. 2 shows the three main steps in developing a U-Net model for island ecosystem type classification. Digital elevation models (DEMs) and tidal data were needed for the first step, separating the inland, offshore and coastal ecosystems, and for building new land-cover predictions (sections 2.4 and 2.5). The DEM data were also used to separate cliffs, defined as areas with a slope steeper than 25 degrees. Medium-resolution ALOS and NOAA elevation data for inland areas and the seafloor were merged with topographical map data at 1:10,000 scale. The 30 m ALOS elevation data (ALOS-DEM), derived from the Panchromatic Remote-sensing Instrument for Stereo Mapping (PRISM), were collected via the Google Earth Engine platform (Mahdianpari et al. 2017). Because ALOS-DEM data provide elevation information only for terrestrial areas, the offshore relief was taken from the NOAA data (Kuo et al. 2000). A 90 m resolution raster for the offshore area was created from the NOAA-DEM data, projected to WGS84/UTM zone 48N and resampled to 30 m. To generate a complete DEM covering both the terrestrial and offshore parts of the research area, the authors combined the NOAA-DEM data with the ALOS-DEM data in ArcGIS.
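The DEM merging and cliff-mask steps above can be sketched in NumPy. This is a minimal illustration under stated assumptions: both rasters are already co-registered on the same WGS84/UTM 48N grid, ALOS-DEM cells over the sea are stored as NaN, the 90 m NOAA raster is resampled to 30 m by simple nearest-neighbour repetition and slope is computed with finite differences. The authors performed these steps in ArcGIS, so the function names here are hypothetical.

```python
import numpy as np

def resample_nearest(grid: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbour resampling: each coarse cell becomes a factor x factor block."""
    return np.repeat(np.repeat(grid, factor, axis=0), factor, axis=1)

def merge_dems(alos: np.ndarray, noaa_90m: np.ndarray) -> np.ndarray:
    """Fill cells where ALOS-DEM has no data (NaN, e.g. offshore) with NOAA bathymetry.
    Assumes the resampled NOAA grid covers at least the ALOS grid extent."""
    noaa_30m = resample_nearest(noaa_90m, 3)                 # 90 m -> 30 m
    noaa_30m = noaa_30m[: alos.shape[0], : alos.shape[1]]    # crop to the ALOS grid
    return np.where(np.isnan(alos), noaa_30m, alos)

def slope_degrees(dem: np.ndarray, cell_size: float = 30.0) -> np.ndarray:
    """Slope from finite differences of elevation (metres) over cell_size (metres)."""
    dz_dy, dz_dx = np.gradient(dem, cell_size)
    return np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))

def cliff_mask(dem: np.ndarray, threshold_deg: float = 25.0) -> np.ndarray:
    """Boolean mask of cells steeper than the cliff threshold."""
    return slope_degrees(dem) > threshold_deg

# Hypothetical example: left half land (10 m), right half sea (no ALOS data)
alos = np.full((6, 6), np.nan)
alos[:, :3] = 10.0
noaa_90m = np.full((2, 2), -20.0)
merged = merge_dems(alos, noaa_90m)

# A ramp rising 30 m per 30 m cell has a 45 degree slope, so it counts as cliff
ramp = np.tile(np.arange(5) * 30.0, (5, 1))
```

Nearest-neighbour repetition keeps the original NOAA values unchanged; bilinear resampling would give a smoother seafloor and may be preferable in practice.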
The land-sea boundary visible on a Sentinel-2 image depends on the tide level, which varies between high and low tide during a day. Because the tides in the research area fluctuate from 0.5 to 3.5 m, the boundary between land and sea can lie anywhere between -2 m and +2 m in the elevation data, which may correspond to a large coastal area. Tidal information was therefore also collected to correct the boundary between inland and wetland ecosystems obtained from the DEM and Sentinel-2 data. According to the Sentinel-2 metadata, seven images were acquired at about 3:00 am, when the local tide was about 2.0-2.3 m. The coastline therefore did not change substantially across the seven images.
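The elevation-based zoning described above can be sketched as follows, assuming that the -2 m to +2 m band quoted in the text marks the uncertain land-sea (intertidal) zone. The function and zone codes are hypothetical, and in the study this band was further corrected with the tidal information rather than used as a fixed rule.

```python
import numpy as np

# Hypothetical zone codes for the first separation step
OFFSHORE, INTERTIDAL, INLAND = 0, 1, 2

def tidal_zones(dem: np.ndarray, low: float = -2.0, high: float = 2.0) -> np.ndarray:
    """Split a merged DEM (metres) into offshore, intertidal and inland zones."""
    return np.select(
        [dem < low, dem <= high],   # conditions are checked in order; first match wins
        [OFFSHORE, INTERTIDAL],
        default=INLAND,             # everything above `high`
    )

zones = tidal_zones(np.array([-10.0, -2.0, 0.0, 2.0, 2.5, 50.0]))
```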
Besides the cliffs separated using the ALOS and NOAA DEM data, nine other island ecosystem types, together with clouds and their shadows, were identified by combining the Sentinel-2 image acquired in February 2019 with a field mission in January 2021. The first stage of classification was pixel-based image segmentation in eCognition software (Trimble 2018). The segmentation aimed to achieve uniformity within each image object, and pairs of adjacent objects were merged to reduce heterogeneity. However, in some regions, objects with different colours, structures and shapes were classified as the same type; conversely, some objects with the same colours, structures and shapes were assigned to different categories. It was therefore necessary to combine visual interpretation with the collected auxiliary data to increase the accuracy of the manual island ecosystem type classification.
Fieldwork was carried out in January 2021 on Con Dao Island, Ba Ria-Vung Tau Province, to verify the visual interpretation done in the office. Good-quality Sentinel-2 images of an island are difficult to find because of clouds and their shadows; in particular, no suitable image was available for 2020. The fieldwork was therefore carried out in January 2021, before the image acquired in April 2021 had been published. To improve accuracy during the fieldwork, the authors worked with the Con Dao National Park managers to identify areas of the ten island ecosystems that had remained stable over three years and used these areas as samples; areas with large land-cover changes were excluded from sampling. With this method, the authors could identify correct samples for 2019. Inland ecosystems were easily accessible, whereas wetland ecosystems required both boats and diving equipment for observation and sampling. Twenty polygons of each island ecosystem type were randomly selected from the image interpretation to assess its accuracy against the fieldwork samples, each polygon being limited to a circular plot with a radius of 40 m. In total, 180 polygons (20 polygons x 9 categories) were checked in the field and compared with the visual interpretation of the satellite image acquired on 7 February 2019 (Fig. 1). Fig. 3 shows the sampling on the Sentinel-2 image and in the field in January 2021. In the natural-colour composite, shallow water surfaces along the coastline are easy to distinguish by their light tones, while deep-water surfaces have darker tones and lie further from the shallow water. Two ecosystem types located next to each other, seagrass and sandy dunes, are also distributed along the coast and have a linear shape.
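The accuracy-sampling design above (20 randomly selected polygons per type, 9 types, 40 m plots) can be illustrated with a stratified random draw. The polygon identifiers, class names and inventory sizes below are hypothetical placeholders; the sketch only shows the arithmetic and sampling without replacement within each class.

```python
import math
import numpy as np

def stratified_sample(polygons_by_class: dict, per_class: int = 20, seed: int = 0) -> dict:
    """Randomly pick `per_class` polygon IDs from each class, without replacement."""
    rng = np.random.default_rng(seed)
    return {
        cls: rng.choice(ids, size=per_class, replace=False).tolist()
        for cls, ids in polygons_by_class.items()
    }

# Hypothetical inventory: 9 ecosystem types, each with 300 candidate polygons
inventory = {f"type_{k}": list(range(k * 1000, k * 1000 + 300)) for k in range(9)}
samples = stratified_sample(inventory)
total = sum(len(v) for v in samples.values())   # 20 x 9 = 180 polygons
plot_area_m2 = math.pi * 40.0 ** 2              # one 40 m-radius circular plot, about 5027 m^2
```

Sampling within each class, rather than from the pooled polygons, guarantees that rare types such as coral reefs receive the same number of check plots as extensive ones.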
Comparing the samples in the remote-sensing image analysis with those from the fieldwork, the authors differentiated these two ecosystem types by their luminosity: sandy dunes tend to reflect light more strongly than seagrass.
In the study area, mangroves and corals are difficult to distinguish on the images with a pixel-based classification because their total area is small and scattered. However, these ecosystem types are easily accessible in the field, so they were added to the outcome of the U-Net model after the fieldwork. For natural forests, the vegetation density is high, so the pixels have a relatively uniform reflectance spectrum with natural-colour tones, and forest edges often have irregular shapes. Residential areas contain many different objects, such as buildings, gardens, roads and parks, so their reflectance spectrum is not uniform, although their boundaries are relatively clear. The spatial arrangement of residential areas manifests itself in the orderly repetition of colour tones and similar structures.
Based on the main sample characteristics, the authors interpreted ecosystem types by their colours, structures and shapes from the segmentation results in eCognition (Trimble 2018). This process produced 9546 polygons classified into 11 categories (nine island ecosystem types, cloud and shadow). Regions with the same colours, structures and shapes were combined into one ecosystem type. For areas with the same colours, structures and shapes but different natural characteristics, higher-resolution imagery such as Google Earth or the land-use map was additionally used. The more precise results of this step are essential for developing the U-shaped deep-learning models in the next steps.

Setting up U-Net model for island ecosystem type classification
The basic U-Net architecture is a supervised learning algorithm based on a Convolutional Neural Network (CNN) that identifies the classes of interest by adjusting the parameters of convolutional filters . The name "U-Net" refers to its shape, which resembles the letter "U" and has three main parts: the contraction path (or encoder), the bottleneck and the expansion path (or decoder). First, the U-Net does not use any fully connected layers in the classification process; instead, the expansion half of the network connects the features, so the U-Net can process input data of any size. Second, the U-Net uses padding, which allows a large image to be partitioned into tiles and reassembled into the complete image. This is critical in segmentation because it avoids the limitation of GPU memory in the classification process (Dang et al. 2020a). These properties explain why the U-Net has been applied in various studies, including research on ecosystem type classification.
The structure of the U-Net model for island ecosystem type classification is presented in Fig. 4. In the encoder, the input image passes through a series of blocks, each with two convolutional layers with a 3 x 3 kernel and one 2 x 2 max-pooling layer. The number of kernels, and hence of feature maps, is doubled after each block, so the spatial resolution decreases while the number of feature channels increases. This structure supports efficient learning of complex features ). However, the most important part of the basic U-Net is the expansion path (or decoder). The decoder also consists of blocks, each with two 3 x 3 convolutional layers and one 2 x 2 up-sampling layer. In addition, the feature map of each encoder block is combined with the corresponding decoder block through a skip connection to preserve the structure of features during reconstruction (Diakogiannis et al. 2020). The numbers of encoder and decoder blocks are the same. In this study, the U-Net model for island ecosystem type classification was implemented in Python using TensorFlow and Scikit-Learn. Numerous combinations of parameters, including the number of filters, number of hidden layers, batch size and dropout probability, were pre-defined to obtain the optimal parameters for the model. The number of iterations was also adjusted to avoid over-fitting during training. The results were compared using accuracy assessment indices such as Overall Accuracy, loss function and Kappa values.
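To make the encoder-decoder arithmetic above concrete, the sketch below traces feature-map shapes through a basic U-Net with "same" padding. It assumes four encoder/decoder blocks, a 256 x 256 x 4 input and 64 base filters (mirroring the naming of the UNet-Adadelta-256-64 model) and 11 output classes; the depth actually used in this study is not stated here, so these values are illustrative. The function only computes shapes and builds no network.

```python
def unet_shapes(size=256, bands=4, base_filters=64, depth=4, n_classes=11):
    """Trace (height, width, channels) through a basic U-Net with 'same' padding."""
    shapes = {"input": (size, size, bands), "encoder": [], "decoder": []}
    h, filters = size, base_filters
    skips = []
    for _ in range(depth):
        # two 3x3 convolutions keep the spatial size and set channels to `filters`
        shapes["encoder"].append((h, h, filters))
        skips.append((h, filters))
        h //= 2            # 2x2 max pooling halves height and width
        filters *= 2       # the number of filters doubles after each block
    shapes["bottleneck"] = (h, h, filters)
    for skip_h, skip_f in reversed(skips):
        h *= 2             # 2x2 up-sampling doubles height and width
        assert h == skip_h, "input size must be divisible by 2**depth"
        # the skip connection is concatenated with the up-sampled features,
        # then two 3x3 convolutions bring the channels back to skip_f
        filters = skip_f
        shapes["decoder"].append((h, h, filters))
    shapes["output"] = (h, h, n_classes)  # 1x1 convolution + softmax per pixel
    return shapes

s = unet_shapes()
```

Because every pooling step must be undone by an up-sampling step, the input tile size has to be divisible by 2 to the power of the depth (256 = 2^8 satisfies this for depth 4).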

Model optimisation
Based on the deep-learning approach, various methods have been used to optimise a U-Net model, such as changing the training size, the optimiser function and the loss function. Regarding the loss function, various options were considered in this study. The loss function calculates the quantity that the model attempts to minimise during training. The mean squared error is the most frequently used loss function in regression models, while cross-entropy is the most commonly used loss function in classification models based on probability calculations (Pasupa et al. 2020). Binary cross-entropy calculates the cross-entropy loss between actual and predicted labels for binary data; because the 11 land-cover classes were assigned integer values in the model before being translated to their names in the integration step of the U-Net models, binary cross-entropy was not appropriate. The "categorical cross-entropy" loss function was therefore selected for the multiple island ecosystem types. Multi-class classification models use this function with the output label encoded as an integer or a one-hot code. The cross-entropy loss value is computed after a Softmax activation layer (Elfwing et al. 2018), which is why it is also called "Softmax Loss". The categorical cross-entropy loss function evaluates the performance of a model that outputs probabilities between 0 and 1, based on the following formula:

$$CE = -\log\left(\frac{e^{s_p}}{\sum_{j=1}^{C} e^{s_j}}\right) \qquad (1)$$

where $s_j$ denotes the network's estimated score for class $j$ of the $C = 11$ classes and $s_p$ denotes the network's estimated score for the positive (true) class.
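The categorical (softmax) cross-entropy loss described above can be checked numerically with a small NumPy sketch. It computes the loss for one pixel from raw scores, CE = -log(exp(s_p) / sum_j exp(s_j)), using the log-sum-exp trick for numerical stability, and shows that integer labels and one-hot labels give the same value. The function names are hypothetical; in Keras, integer labels are normally paired with the `sparse_categorical_crossentropy` loss and one-hot labels with `categorical_crossentropy`.

```python
import numpy as np

def softmax_cross_entropy(scores: np.ndarray, positive: int) -> float:
    """CE = -log(exp(s_p) / sum_j exp(s_j)) for one sample with integer label `positive`."""
    shifted = scores - scores.max()                  # stabilise the exponentials
    log_sum_exp = np.log(np.exp(shifted).sum())
    return float(log_sum_exp - shifted[positive])

def one_hot_cross_entropy(scores: np.ndarray, target: np.ndarray) -> float:
    """The same loss written as -sum_j t_j * log(softmax(s)_j) with a one-hot target t."""
    shifted = scores - scores.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return float(-(target * log_probs).sum())

C = 11                                   # island ecosystem types, cloud and shadow
uniform = np.zeros(C)                    # equal scores give probability 1/11 per class
loss_uniform = softmax_cross_entropy(uniform, positive=3)   # equals log(11)
scores = np.linspace(-1.0, 2.0, C)
label = 7
loss_int = softmax_cross_entropy(scores, label)
loss_onehot = one_hot_cross_entropy(scores, np.eye(C)[label])
```

The uniform case is a useful sanity check: an untrained network that assigns equal scores to all 11 classes starts with a loss of about log(11), roughly 2.40.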
Different optimisers may be used to train neural networks in order to reduce their associated costs (e.g. loss of information, training time and uncertainty). In this study, four optimiser types were applied: Adaptive Moment Estimation (Adam), Adaptive Gradient Algorithm (Adagrad), Adadelta and Stochastic Gradient Descent (SGD) (Fig. 5) (Dang et al. 2020a). The errors of the trained models (i.e. the loss function) were calculated continuously during the optimisation cycles. After each epoch, the weights of all trained U-Net models were adjusted to reduce the loss as much as possible for the next evaluation. Fig. 5 offers a high-level summary of these optimisation techniques. The training size and the number of filters were varied for each optimiser. Finally, selecting the most suitable optimiser is an efficient way to determine the model with the highest accuracy and the lowest loss value.
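The four optimisers differ only in how they turn a gradient into a weight update. The NumPy sketch below applies the standard update rules of SGD, Adagrad, Adadelta and Adam to a toy quadratic loss rather than to a U-Net; the learning rates and other hyper-parameters are illustrative assumptions, not the values used in this study. In TensorFlow these optimisers are available as `tf.keras.optimizers.SGD`, `Adagrad`, `Adadelta` and `Adam`.

```python
import numpy as np

def optimise(name: str, steps: int = 200, eps: float = 1e-7) -> float:
    """Minimise the toy loss 0.5 * ||w||^2 (whose gradient is w) with one optimiser."""
    w = np.array([5.0, -3.0])
    state = {k: np.zeros_like(w) for k in ("G", "Eg2", "Edx2", "m", "v")}
    for t in range(1, steps + 1):
        g = w  # gradient of 0.5 * ||w||^2
        if name == "SGD":
            w = w - 0.1 * g                              # fixed learning rate
        elif name == "Adagrad":
            state["G"] += g ** 2                         # accumulate squared gradients
            w = w - 0.5 * g / (np.sqrt(state["G"]) + eps)
        elif name == "Adadelta":
            rho = 0.95                                   # decay of running averages
            state["Eg2"] = rho * state["Eg2"] + (1 - rho) * g ** 2
            dx = -np.sqrt(state["Edx2"] + 1e-6) / np.sqrt(state["Eg2"] + 1e-6) * g
            state["Edx2"] = rho * state["Edx2"] + (1 - rho) * dx ** 2
            w = w + dx
        elif name == "Adam":
            b1, b2 = 0.9, 0.999
            state["m"] = b1 * state["m"] + (1 - b1) * g          # first moment
            state["v"] = b2 * state["v"] + (1 - b2) * g ** 2     # second moment
            m_hat = state["m"] / (1 - b1 ** t)                   # bias correction
            v_hat = state["v"] / (1 - b2 ** t)
            w = w - 0.1 * m_hat / (np.sqrt(v_hat) + eps)
        else:
            raise ValueError(f"unknown optimiser: {name}")
    return 0.5 * float(w @ w)

initial_loss = 0.5 * (5.0 ** 2 + 3.0 ** 2)   # 17.0
final_losses = {n: optimise(n) for n in ["SGD", "Adagrad", "Adadelta", "Adam"]}
```

On this toy problem the adaptive methods scale each step by running statistics of past gradients, which is why Adadelta starts with very small steps and accelerates; real U-Net training behaviour can differ considerably.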

Model comparison
To assess the performance of all trained U-Net models for ecosystem classification, which follow an object-based approach, two traditional pixel-based models were generated: random forest (RF) and support vector machine (SVM). Because these two models are pixel-based, their parameters differ from those of the U-Net models, as follows:

Random Forest (RF)
Random Forest (RF) is a powerful supervised-learning algorithm that combines the predictions of decision trees to solve classification and regression problems. The algorithm was introduced by Breiman in 2001 (Breiman 1996). RF combines (or ensembles) a large number of weak models to obtain more accurate results than a single model. The predictions of the sub-models (the individual decision trees) are combined by voting to obtain the final classification; majority voting is most commonly used (Lary et al. 2016), while other approaches, such as veto and weighted voting, are used less frequently. The procedure can be summarised as follows:
1. Select random samples from the training dataset;
2. Create a decision tree in the forest for each sample;
3. Vote on the predicted results; and
4. Return the prediction with the most votes.
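The voting step of the procedure above can be sketched in NumPy as follows. The prediction array is a hypothetical stand-in for the outputs of individual decision trees; ties are resolved in favour of the smallest class label, which is an arbitrary choice of this sketch.

```python
import numpy as np

def majority_vote(tree_predictions: np.ndarray) -> np.ndarray:
    """tree_predictions has shape (n_trees, n_samples) with integer class labels.
    Returns, for each sample, the class predicted by the most trees."""
    n_classes = tree_predictions.max() + 1
    # count the votes for each class and sample: shape (n_classes, n_samples)
    votes = np.stack([(tree_predictions == c).sum(axis=0) for c in range(n_classes)])
    return votes.argmax(axis=0)  # argmax returns the first (smallest) label on ties

preds = np.array([[0, 1, 2],
                  [0, 2, 2],
                  [1, 2, 2]])   # 3 trees x 3 samples
final = majority_vote(preds)    # -> array([0, 2, 2])
```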
RF mainly reduces the variance of the model by averaging the predictions of many decision trees, which helps it avoid over-fitting (Mahdianpari et al. 2017); this is one of its main advantages. RF can also handle missing data by using the median of adjacent values. The performance of RF is affected by several parameters, such as max_features, n_estimators and min_samples_leaf. The selection of parameter values is very important because it directly affects the speed and accuracy of the model. Higher parameter values (e.g. more trees) generally give higher accuracy but make the model slower, and beyond a certain value the accuracy changes little. Parameter values therefore need to be optimised to balance accuracy and speed. In this study, the input data for the RF model, which were similar to those for the U-Net models, were processed with 100 trees (n_estimators) to obtain the final model. RF can be memory-intensive because it builds a large number of decision trees, each of which must be processed.
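A minimal scikit-learn configuration using the settings named above (100 trees via `n_estimators`, together with `max_features` and `min_samples_leaf`) might look as follows. The data are synthetic stand-ins generated with `make_classification` for per-pixel features such as four Sentinel-2 bands plus elevation, and the values of `max_features` and `min_samples_leaf` are assumptions. Note that scikit-learn's `RandomForestClassifier` combines its trees by averaging their predicted class probabilities rather than by hard majority voting.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic per-pixel samples: 5 features (e.g. R, G, B, NIR, elevation), 3 classes
X, y = make_classification(n_samples=500, n_features=5, n_informative=3,
                           n_redundant=0, n_classes=3, n_clusters_per_class=1,
                           random_state=0)

rf = RandomForestClassifier(
    n_estimators=100,      # number of trees, as in this study
    max_features="sqrt",   # features considered at each split (assumed)
    min_samples_leaf=1,    # minimum samples per leaf (assumed)
    random_state=0,        # reproducible bootstrap samples
)
rf.fit(X, y)
train_accuracy = rf.score(X, y)
```

Training accuracy is only a smoke test here; held-out or cross-validated accuracy should be used to compare against the U-Net models.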

Support Vector Machine (SVM)
SVM (Support Vector Machine) is a popular supervised-learning algorithm that was first proposed in the 1970s (Karatzoglou et al. 2006). It has been applied in many fields, including chemistry (Houssein et al. 2020), biology (Huo et al. 2020) and especially Earth Science for remote sensing image classification (Sabat-Tomala et al. 2020). It is effective in high-dimensional spaces and has a low memory cost. The basic idea of SVM is to find an optimal hyperplane with the maximal margin that divides the dataset into the pre-defined classes learned from the training dataset (Cervantes et al. 2020). A linear hyperplane, however, cannot capture non-linear properties of the data. To overcome this limitation, the soft margin and kernel functions established by Vapnik and Cortes were used (Gopinath et al. 2020): the kernel implicitly maps the data from the original (e.g. 2-dimensional) feature space into a higher-dimensional space, where the classes can be separated.
The performance of SVM depends strongly on the choice of kernel function because the kernel determines the flexibility of the decision boundaries (Razaque et al. 2021). The scikit-learn implementation of SVM offers five kernel options: linear, poly, RBF, sigmoid and precomputed. The authors chose the RBF kernel for this research because it is one of the most widely used kernels, has a Gaussian form and performs well in image classification problems. Besides the kernel function, two parameters affect the performance of the model: C and gamma. The gamma parameter defines how far the influence of a single training sample reaches. The C parameter, the regularisation parameter of SVM, trades off the correct classification of training samples against the maximisation of the decision function's margin. The values of the two parameters need to be optimised to obtain the best performance during SVM development. Commonly, SVM models can be optimised with a higher gamma value and a lower C value. In this study, gamma was set to 0.2 and C to 1.0. As for RF, the input data samples for SVM were divided into two arrays: attribute (feature) values and label (observed) values. The attribute values were scaled to the range [0, 1] using Min-Max normalisation to speed up training. We also applied K-Fold cross-validation with k = 10 to evaluate the models. K-Fold cross-validation splits a dataset into k non-overlapping folds; each fold is used once for validation while the remaining k - 1 folds are used for training. This technique helps to detect and avoid over-fitting when training the models.
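The SVM settings above (RBF kernel, gamma = 0.2, C = 1.0, Min-Max normalisation and 10-fold cross-validation) can be expressed with scikit-learn as below. Wrapping the scaler and the classifier in a pipeline ensures that the Min-Max scaling is fitted only on the training folds. The synthetic data from `make_classification` are an assumption standing in for the study's per-pixel samples. For reference, the RBF kernel is K(x, x') = exp(-gamma * ||x - x'||^2).

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Synthetic stand-in for the attribute array X and the label array y
X, y = make_classification(n_samples=600, n_features=5, n_informative=3,
                           n_redundant=0, n_classes=3, n_clusters_per_class=1,
                           class_sep=2.0, random_state=0)

# Min-Max scaling to [0, 1] followed by an RBF-kernel SVM with the stated parameters
svm = make_pipeline(MinMaxScaler(), SVC(kernel="rbf", gamma=0.2, C=1.0))

# 10 non-overlapping folds; each fold is held out once for validation
cv = KFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(svm, X, y, cv=cv)
mean_accuracy = scores.mean()
```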
All models were implemented on a workstation (Intel Xeon Silver 4112 2.6 GHz; RAM: 128 GB DDR4 3200 MHz; Graphics: Nvidia Quadro RTX5000, 16 GB, 4DP) in the Python programming language using the TensorFlow and Scikit-Learn frameworks. After the SVM and RF models were completed, their results were compared with the best U-Net model to check the improvement achieved by the selected deep-learning models for island ecosystem type classification.

Application of trained U-Net models for island ecosystem type classification
Once the optimal U-Net model for classifying island ecosystem types from Sentinel-2 and DEM data had been established, it was used to identify the ten island ecosystem types, together with clouds and their shadows, on new images. This research project concentrated on ten habitats on Con Dao Island. Six new Sentinel-2 images of the study region were selected for interpretation across a three-year period (2017, 2019 and 2021), and data collection and preprocessing were performed as described in the sections above. When a new image was fed into the trained U-Net, the model used the previously learned parameters to convert it into spatial matrices, create intermediate matrices and assign the appropriate class to each pixel. This prediction step is self-contained and does not need additional training data.
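Applying the trained network to a full scene can be sketched as tiling: the image is padded to a multiple of the 256-pixel training size, each tile is passed to the model and the class with the highest probability is kept for every pixel. In this sketch `predict_fn` is a hypothetical placeholder for the trained U-Net's prediction on a single tile; a simple threshold rule stands in for it so that the code runs on its own.

```python
import numpy as np

def predict_tiled(image: np.ndarray, predict_fn, tile: int = 256) -> np.ndarray:
    """Classify an (H, W, bands) image tile by tile and return an (H, W) label map.
    predict_fn maps a (tile, tile, bands) array to (tile, tile, n_classes) probabilities."""
    h, w, _ = image.shape
    pad_h, pad_w = (-h) % tile, (-w) % tile            # padding up to a multiple of `tile`
    padded = np.pad(image, ((0, pad_h), (0, pad_w), (0, 0)), mode="constant")
    labels = np.zeros(padded.shape[:2], dtype=np.int64)
    for r in range(0, padded.shape[0], tile):
        for c in range(0, padded.shape[1], tile):
            probs = predict_fn(padded[r:r + tile, c:c + tile])
            labels[r:r + tile, c:c + tile] = probs.argmax(axis=-1)
    return labels[:h, :w]                              # drop the padded border

def dummy_predict(patch: np.ndarray) -> np.ndarray:
    """Stand-in model: class 1 where the first band exceeds 0.5, otherwise class 0."""
    is_one = (patch[..., 0] > 0.5).astype(float)
    return np.stack([1.0 - is_one, is_one], axis=-1)

rng = np.random.default_rng(0)
scene = rng.random((300, 500, 4))                      # hypothetical 4-band scene
label_map = predict_tiled(scene, dummy_predict)
```

Non-overlapping tiles are the simplest choice; overlapping tiles with averaged probabilities are a common refinement that reduces artefacts at tile edges.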

U-Net model performance
Based on changes in the training size, the number of filters and the optimiser method, 48 U-Net models were trained. The overall accuracy and loss function values were used to compare the performance of these U-Net models (Table 1). Accuracy showed an upward trend with an increasing number of filters. Although increasing the training size did not show a clear trend in the loss and accuracy values, a training size of 256 x 256 x 4 gave more accurate predictions for all optimiser methods. Among the four optimiser methods, the UNet-SGD models had the lowest performance, with an average loss value of 0.59 and an average accuracy of 75.1%. Three U-Net models had an accuracy higher than 80%: UNet-Adam-256-32, UNet-Adadelta-256-64 and UNet-Adagrad-256-64. In particular, the UNet-Adadelta-256-64 model had the highest performance, with an accuracy of 93.36% and a loss function value of 0.16 (Fig. 6 and Table 1). In general, the loss and accuracy curves were closely aligned: the values fluctuated during the first 30 epochs before converging in the last 30 epochs. Faster convergence was observed in the UNet-Adam-256-32 and UNet-Adagrad-256-64 models. The UNet-Adadelta-256-64 model provided better predictions for some specific island ecosystem types than the other models, whereas the prediction performance of the UNet-Adam-256-32 model, although its overall accuracy was 81.73%, was more balanced across all island ecosystem types.

Accuracy comparison
The island ecosystem type classification of Con Dao Island obtained from the five trained models (three U-Net models and two benchmark models) is shown in Fig. 7 and Table 2. Most inland ecosystems, as well as clouds and their shadows, were predicted similarly by all three model types. The wetland ecosystems along the coasts differ, especially coral reefs, shallow-water areas and deep-water areas. The UNet-Adadelta-256-64 model detected most of the coral reefs, while the UNet-Adam-256-32 model detected only about 75% of them. The shallow-water areas were interpreted heterogeneously by the UNet-Adam-256-32 and UNet-Adagrad-256-64 models, whereas the distribution of this ecosystem type appears more homogeneous in the interpretation of the UNet-Adadelta-256-64 model. Among the two benchmark models, the RF model did not detect the coral reefs and mixed residential areas with forest and sandy dunes. Neither benchmark model could clearly separate deep-water areas from the deep sea. The differences between the results of all U-Net models and the two benchmark models are most visible in the shallow-water areas: according to the benchmark models, this ecosystem type occurs in the eastern part of the Island, whereas all U-Net models interpreted it as distributed around the Island.

Figure: The loss function and accuracy values of the three trained U-Net models that achieved the highest performance.

The accuracy comparison between the three U-Net models and the two benchmark models for the new predictions is shown in Table 2. All three U-Net models can detect four island ecosystem types: deep sea, seagrass, residential areas and natural forests. The UNet-Adadelta-256-64 model is the best model for classifying most island ecosystem types, with an overall accuracy of 86.6% and a Kappa index of 0.9. The other two U-Net models interpret coral reefs and deep-water areas with low accuracy.
Of the two benchmark models, RF achieved an overall accuracy of only 50%, with a Kappa index of 0.5. Although SVM interpreted seagrass and natural forest with accuracies above 80%, it could not interpret coral reefs or deep-water areas. These results confirm that all U-Net models achieved higher accuracy than the two benchmark models.
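Overall (total) accuracy and the Kappa index are conventionally derived from a confusion matrix of reference versus predicted classes. The validation protocol behind Table 2 is not reproduced here, so the following Python function is only a sketch of the standard formulas. The function name and the row/column convention (rows = reference, columns = prediction) are assumptions. Per-class producer's accuracy, the share of a class's reference samples that the model detected, is one way to express statements such as a model detecting about 75% of the coral reefs.

```python
def accuracy_metrics(cm):
    """Overall accuracy, Cohen's kappa and per-class producer's accuracy.

    cm[i][j] is the number of validation samples whose reference class is i
    and whose predicted class is j (rows = reference, columns = prediction).
    """
    n_classes = len(cm)
    total = sum(sum(row) for row in cm)
    agreed = sum(cm[i][i] for i in range(n_classes))
    overall = agreed / total

    row_totals = [sum(row) for row in cm]
    col_totals = [sum(cm[i][j] for i in range(n_classes)) for j in range(n_classes)]
    # Agreement expected by chance, computed from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = float("nan") if expected == 1 else (overall - expected) / (1 - expected)

    # Producer's accuracy: fraction of each reference class that was detected.
    producers = [cm[i][i] / row_totals[i] if row_totals[i] else float("nan")
                 for i in range(n_classes)]
    return overall, kappa, producers
```

For example, `accuracy_metrics([[50, 10], [5, 35]])` gives an overall accuracy of 0.85 and a kappa of about 0.69. Kappa is lower than overall accuracy here because it discounts the agreement expected by chance.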

Island ecosystem changes on Con Dao Island
Fig. 8 depicts the distribution of ten island ecosystem types on Con Dao Island. In addition to the ten ecosystem types successfully separated by the UNet-Adadelta-256-64 model, cliffs were identified from the DEM data as areas with a slope greater than 30 degrees. Interpreting a full Sentinel-2 image takes about 125 to 140 seconds. Clouds and their shadows were found in all Sentinel-2 images and were then combined into one type. Mangroves and natural forests have been maintained or have decreased slightly since 2017. The wetland ecosystems changed significantly due to ocean currents and tide levels, especially shallow-water areas (about 7-10%), deep-water areas (about 10-14%) and other types (about 2-4%). In the dry season (from October to April), ocean currents from the north-eastern part of the Island have created suitable conditions for seagrass and coral reefs to develop on the south-eastern side of the Island. Sand dunes and residential areas are also stable on this side. The area of the deep-water regions increased significantly in July, and various steep cliffs developed along the north-western side.

Figure: Island ecosystem classification from ALOS and NOAA DEM data and multi-temporal Sentinel-2 images, based on the UNet-Adadelta-256-64 model.
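Cliffs were delineated from the DEM with a 30-degree slope threshold, but this section does not describe the slope algorithm. The following minimal Python sketch assumes a DEM on a projected grid with square cells measured in metres. It computes slope with simple finite differences (central differences inside the grid, one-sided differences at the edges) and flags cells steeper than the threshold. The function names are hypothetical. GIS software commonly uses a 3×3 kernel such as Horn's method instead, which can give slightly different slopes.

```python
import math


def slope_degrees(dem, cell_size):
    """Slope in degrees for each cell of a DEM (list of rows, at least 2 x 2).

    Elevations and cell_size are both in metres.
    """
    rows, cols = len(dem), len(dem[0])

    def diff(profile, k):
        # Elevation change per cell along a 1-D profile at index k.
        if k == 0:
            return profile[1] - profile[0]
        if k == len(profile) - 1:
            return profile[-1] - profile[-2]
        return (profile[k + 1] - profile[k - 1]) / 2

    slopes = []
    for r in range(rows):
        row_slopes = []
        for c in range(cols):
            dzdx = diff(dem[r], c) / cell_size
            dzdy = diff([dem[i][c] for i in range(rows)], r) / cell_size
            # Slope angle from the magnitude of the elevation gradient.
            row_slopes.append(math.degrees(math.atan(math.hypot(dzdx, dzdy))))
        slopes.append(row_slopes)
    return slopes


def cliff_mask(dem, cell_size, threshold_deg=30.0):
    """True where the slope exceeds the threshold (30 degrees in this study)."""
    return [[s > threshold_deg for s in row] for row in slope_degrees(dem, cell_size)]
```

A surface that rises 10 m per 10 m cell has a 45-degree slope and is flagged. A surface rising 1 m per 10 m cell (about 5.7 degrees) is not flagged.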

Comparison with formal networks/frameworks
A tool that suits the specific needs of different stakeholders (e.g. land managers) is valuable. This research project developed several deep-learning models to interpret ten different inland and offshore ecosystem types on Con Dao Island, Vietnam. Because island ecosystems are commonly affected by both local and global climate, especially by storms and waves, the land cover of all ecosystems can change rapidly between the rainy and dry seasons. Previous studies have developed classification models for inland and coastal wetland ecosystems; however, they did not identify some island ecosystems, such as coral reefs and seagrass. Adding these two ecosystem types to the trained models helps meet the needs of island managers. Producing an island land-cover map with conventional interpretation techniques and field samples can take considerable time. In contrast, the UNet-Adadelta-256-64 model can quickly and effectively interpret ten island ecosystem types, as well as clouds and their shadows, from recent satellite images, using the trained weights and calibration results stored in the model.
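Applying a trained U-Net to a full Sentinel-2 scene normally means cutting the scene into patches of the network's input size and then mosaicking the predictions. The following Python sketch computes those patch windows. It assumes that "256" in the model names denotes a 256 × 256 pixel input patch; this section does not explain the naming scheme. The helper names are hypothetical, and the model call itself is left out. The last patch in each direction is shifted back to end at the image border, so it overlaps its neighbour instead of requiring padding.

```python
def tile_origins(length, tile):
    """Start offsets of tiles of size `tile` that together cover [0, length).

    Tiles are laid edge to edge; the last one is moved back so that it ends
    exactly at the border. An image shorter than one tile returns [0] and
    would need padding before prediction.
    """
    if length <= tile:
        return [0]
    starts = list(range(0, length - tile + 1, tile))
    if starts[-1] != length - tile:
        starts.append(length - tile)
    return starts


def tile_windows(height, width, tile=256):
    """(row, col, size) windows covering a height x width scene."""
    return [(r, c, tile)
            for r in tile_origins(height, tile)
            for c in tile_origins(width, tile)]
```

For a 10,980 × 10,980 pixel Sentinel-2 tile at 10 m resolution, `tile_windows(10980, 10980)` yields 43 × 43 = 1,849 windows. Predictions from the overlapping border windows can simply overwrite or be averaged with their neighbours.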
In addition to the inland ecosystems included in earlier models, clouds with their shadows and seven wetland ecosystem types, defined according to the Ramsar and MONRE classification systems, were added to the trained models. The addition of seven wetland ecosystem types is the first difference from all models developed in previous studies (Pouliot et al. 2019, Dang et al. 2020a). Previous studies mainly explored methods and models to describe wetlands. They did not explain why their findings matched the wetland classification systems or how the findings could be implemented in practice. In this study, the preparation, training and testing of all U-Net models based on the remote sensing images were explained in detail. Second, unlike traditional models, all trained U-Net models can explicitly separate clouds and their shadows, which cover natural and anthropogenic ecosystems on islands. Cloud-free Sentinel-2 or Landsat images are easy to collect for inland or coastal areas, but this is more difficult for islands because of local weather and terrain. For example, on most islands, clouds and their shadows occur near high mountains, even in summer. It is therefore necessary to include them in island cover interpretation models. Cloud cover reduces the availability of usable satellite data in the study region by preventing optical sensors from acquiring high-quality images of the island ecosystem types. Cloud and surface brightness vary considerably, and in some cases it is hard to distinguish white clouds from bright land, particularly where the land surface is covered with sand. In addition, hazy cloud boundaries and thin clouds obscure ground surfaces, creating ambiguity and making the data harder to interpret. Cloud shadows may also be confused with dark, moist soil, water and other dark objects. These issues affect how the trained models interpret clouds and their shadows, and they reduced the models' accuracy for these objects to about 80%.

Improvement of island ecosystem type classification models
Because the study area is a small island and the training and testing samples were collected within one year, the U-Net models could not clearly detect coral reefs, mangroves or sand dunes. All islands are affected by currents, waves and annual storms, which can cause dramatic changes in offshore sediments and climate. In particular, coral reefs can develop in waters with temperatures ranging from 20 to 32°C. During the rainy season, they can easily vanish when a wave or current carrying offshore sediments flows over them, converting them to shallow-water cover. Mangrove ecosystems, meanwhile, commonly develop in coastal areas. The areas of coral reefs and mangroves observed on islands are therefore rather small. A key advantage of deep-learning methods is that trained U-Net models can be updated with new data to build more accurate models. As more samples become available, the models may predict island ecosystem types more accurately and offer more management options. Multi-temporal remote sensing data can be used in this step to optimise the overall accuracy, as well as the accuracy of coral reef interpretation. Because mangrove ecosystems cover only a very small area in the study area, more mangrove samples need to be collected in coastal areas. However, adding coastal mangrove samples may also increase the variability between island and coastal ecosystem types. To address this issue, SAR data from Sentinel-1 may improve the interpretation of mangroves and coral reefs. So may Sentinel-3 data on sea surface topography, sea and land surface temperature, and ocean and land surface colour. However, these are relatively new sensors and require further research. Applications of SAR data for analysing climatic conditions have also been reported in several Data Cube initiatives in European and Asian countries, and such data could help correct the mapped distribution of mangroves and coral reefs. However, the resolution of these data is still low. High-resolution images, for instance from Lidar or unmanned aerial vehicles (UAVs), could also be used to monitor these ecosystems in the future.
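One hypothetical way to use Sentinel-3 sea surface temperature (SST) as an auxiliary check builds on the 20 to 32°C range given above for coral reefs. Pixels classified as coral reef where the SST falls outside that range can be flagged for review. This Python sketch is not part of the study's method. It assumes the SST grid has already been resampled to the classification grid (SST products are much coarser than Sentinel-2, as noted above), and the function name and class id are illustrative.

```python
def flag_implausible_coral(labels, sst, coral_class, low=20.0, high=32.0):
    """Return (row, col) of coral-reef pixels whose SST lies outside [low, high] degC.

    labels and sst are co-registered grids (lists of rows) holding class ids
    and sea surface temperatures respectively.
    """
    flagged = []
    for r, (label_row, sst_row) in enumerate(zip(labels, sst)):
        for c, (label, temp) in enumerate(zip(label_row, sst_row)):
            # Only coral-reef pixels are checked; other classes are ignored.
            if label == coral_class and not (low <= temp <= high):
                flagged.append((r, c))
    return flagged
```

Flagging rather than relabelling keeps the decision with the analyst, which matters when the auxiliary data are much coarser than the classification.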
Developing 48 U-Net models for island ecosystem classification is expensive and time-consuming. A workstation with an Intel(R) Xeon(R) CPU @ 2.6 GHz, 32 GB RAM and an NVIDIA GeForce GTX 1070 GPU was built for this study. Each U-Net model took 30 to 40 seconds per training epoch. By comparison, each RF and SVM model took 60 to 70 seconds to train, on average. Although training a U-Net model takes time, a trained model can be updated with new data. Future U-Net models may benefit from replacing the traditional gradient-based optimisers with other optimisation methods, such as evolutionary algorithms or swarm intelligence. They may also benefit from new multi-spectral satellite image data that provide additional information. High-resolution data could be combined with supercomputing resources to rapidly interpret all kinds of (island) ecosystem types.
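The U-Net variants compared in this study differ in their optimiser (Adam, Adagrad and Adadelta). These are the kind of traditional gradient-based methods that evolutionary or swarm approaches would replace. For reference, here is a minimal scalar Python sketch of the three published update rules, applied to the toy loss (θ − 3)². The hyperparameter defaults shown are commonly used values, not the settings used to train the models in this study. Adadelta in its original form has no learning rate, which corresponds to `lr=1.0` here; some frameworks add such a multiplier.

```python
import math


class Adagrad:
    def __init__(self, lr=0.01, eps=1e-8):
        self.lr, self.eps = lr, eps
        self.accum = 0.0  # running sum of squared gradients

    def step(self, theta, grad):
        self.accum += grad * grad
        return theta - self.lr * grad / (math.sqrt(self.accum) + self.eps)


class Adadelta:
    def __init__(self, rho=0.95, eps=1e-6, lr=1.0):
        self.rho, self.eps, self.lr = rho, eps, lr
        self.eg2 = 0.0   # decaying average of squared gradients
        self.edx2 = 0.0  # decaying average of squared updates

    def step(self, theta, grad):
        self.eg2 = self.rho * self.eg2 + (1 - self.rho) * grad * grad
        delta = -math.sqrt(self.edx2 + self.eps) / math.sqrt(self.eg2 + self.eps) * grad
        self.edx2 = self.rho * self.edx2 + (1 - self.rho) * delta * delta
        return theta + self.lr * delta


class Adam:
    def __init__(self, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.b1, self.b2, self.eps = lr, beta1, beta2, eps
        self.m = 0.0  # first-moment (mean) estimate
        self.v = 0.0  # second-moment estimate
        self.t = 0    # step counter for bias correction

    def step(self, theta, grad):
        self.t += 1
        self.m = self.b1 * self.m + (1 - self.b1) * grad
        self.v = self.b2 * self.v + (1 - self.b2) * grad * grad
        m_hat = self.m / (1 - self.b1 ** self.t)
        v_hat = self.v / (1 - self.b2 ** self.t)
        return theta - self.lr * m_hat / (math.sqrt(v_hat) + self.eps)


def minimise(optimiser, theta=0.0, steps=500):
    """Minimise the toy loss (theta - 3)^2, whose gradient is 2 * (theta - 3)."""
    for _ in range(steps):
        theta = optimiser.step(theta, 2.0 * (theta - 3.0))
    return theta
```

For example, `minimise(Adagrad(lr=0.5))` approaches 3.0. In a real network, the same rules are applied elementwise to every weight, with one set of state variables per weight.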

Conclusions
This study demonstrated the benefits of combining deep-learning and remote-sensing data for monitoring island ecosystem types. The UNet-Adadelta-256-64 model was developed to interpret the distribution of ten island ecosystem types, as well as clouds and their shadows, and can be applied to new satellite images of any coastal region at any time. The accuracy of the model reached 93%, with a loss function value of 0.16. The best-trained U-Net model was used to identify the island ecosystem types on Con Dao Island over six years of Sentinel-2 data. A total of 11 different ecosystem types were found on Con Dao Island. Besides more common ecosystem types, characteristic coral reefs and seagrass surround the Island, whereas the distribution of the shallow-water ecosystems depends on the season and currents. After five years, the inland ecosystems had not changed, except for residential areas due to urbanisation. Land-use managers could use these data and approaches to monitor island ecosystem dynamics every season, instead of relying on traditional methods that assess changes every five years. In the future, the model could be retrained with additional samples and used to classify ecosystems on other islands.
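Seasonal monitoring of this kind amounts to comparing classified maps from different dates. The following minimal Python sketch converts pixel counts to hectares and reports the relative change in area of each class. It assumes two co-registered maps stored as grids of class ids, with 10 m pixels (the resolution of Sentinel-2's finest bands). This section does not state the grid used for the study's maps. Nor does the study specify whether its percentage changes are relative to each class's earlier area or to the total area; this sketch uses the former. The function names are hypothetical.

```python
from collections import Counter


def class_areas_ha(labels, pixel_size_m=10.0):
    """Area in hectares of each class in a classified map (list of rows of class ids)."""
    pixel_area_ha = pixel_size_m * pixel_size_m / 10_000  # 1 ha = 10,000 m^2
    counts = Counter(label for row in labels for label in row)
    return {cls: n * pixel_area_ha for cls, n in counts.items()}


def area_change_percent(before, after, pixel_size_m=10.0):
    """Relative change in area (%) of each class between two co-registered maps.

    A class absent from the earlier map has no defined relative change and
    is reported as None.
    """
    a0 = class_areas_ha(before, pixel_size_m)
    a1 = class_areas_ha(after, pixel_size_m)
    change = {}
    for cls in sorted(set(a0) | set(a1)):
        old, new = a0.get(cls, 0.0), a1.get(cls, 0.0)
        change[cls] = None if old == 0 else 100.0 * (new - old) / old
    return change
```

Reporting the change as a share of the total mapped area would instead divide `new - old` by the sum of all class areas.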