Research gaps and trends in the Arctic tundra: a topic-modelling approach

Climate change is affecting the biodiversity, ecosystem services and well-being of people living in the Arctic tundra. Understanding the societal implications of these changes and adapting to them depend on knowledge produced by multiple disciplines. We analysed peer-reviewed publications to identify the main research themes relating to the Arctic tundra and assessed to what extent current research builds on multiple disciplines to confront the upcoming challenges of rapid environmental change. We used a topic-modelling approach, built on the Latent Dirichlet Allocation algorithm, to detect topics from semantic similarity. We found that plant and soil ecology dominate tundra research and are highly connected to other ecological disciplines and biophysical sciences. Despite a fivefold increase in the number of publications during the past decades, the proportion of studies that address the societal implications of climate change remains low. The strong scientific interest in the tundra reflects concern about the rapid warming of the Arctic, but few studies adopt the cross-disciplinary approach necessary to fully assess the implications of these changes for society.


Introduction
In the coming decades, climate warming will rapidly transform the tundra ecosystems of the Arctic. Thawing permafrost, snow icing events, decreasing snow cover, changing rainfall patterns and hydrological cycles, more intense wildfires, shifts in growing and flowering seasons, and the expansion of shrubs and trees are all observable changes that are affecting Arctic tundra ecosystems (Elmhagen et al. 2015, Myers-Smith et al. 2015, Box et al. 2019). Current research focuses on ecosystem functioning and on biotic and abiotic interactions within the ecosystem, but there is a need for research that specifically assesses the implications of climate change for biological diversity, ecosystem services and the well-being of people living in the tundra regions (Malinauskaite et al. 2019).
Several authors have worked on identifying research gaps in the tundra biome. Post et al. (2019) presented a broad synthesis of some of the key concerns facing the Arctic under a scenario of 2 °C global warming. They concluded that the accelerating changes in the Arctic, compared with the rest of the world, can drastically alter ecological systems through species range shifts and declines in large herbivores, and threaten indigenous peoples who depend heavily on wildlife and other natural resources for their livelihoods. Other scholars have focused on the implications of climate change for specific indicators, such as phenology (Diepstraten et al. 2018), shrub expansion (Martin et al. 2017, Myers-Smith and Hik 2018) or the role of large herbivores in mitigating the expansion of shrubs and trees (Olofsson et al. 2001). Climate-related impacts are not equally distributed across the Arctic, but depend on the region and the ecosystem context (Soininen et al. 2018). Understanding the implications of localised climate-related impacts on tundra ecosystems and societies is crucial for adaptation, but current observation systems are biased towards specific bioclimatic zones and disciplines and do not fully reflect the breadth of impacts associated with Arctic warming (Biebow et al. 2019, Virkkala et al. 2019).
Although the biological aspects of climate change are routinely studied, the societal implications of climate change in the Arctic have received less attention. Malinauskaite et al. (2019) conducted a thematic review of the ecosystem services literature and found a knowledge gap in the mechanisms and feedbacks of social-ecological interactions, which leads to inefficiencies in integrating ecosystem services into policy-making. Their review was based on a limited number of articles (n = 33) that directly referred to the concept of ecosystem services. Ford et al. (2012) noted an increasing trend in socio-ecological system (SES) research globally, but found that it is mainly carried out by universities and governmental organisations and does not sufficiently include the priorities, knowledge and concerns of local and indigenous people. Social sciences and humanities are also under-represented in Arctic research, although they are necessary for understanding the implications of rapid Arctic warming (Niemeyer et al. 2005, Whiteman et al. 2013, Blue 2016).
In this study, we quantitatively assess the temporal trends of different research disciplines and identify the main knowledge gaps for understanding the implications of rapid Arctic warming for tundra ecosystems and societies. We use machine learning and a bibliometric approach to synthesise trends and topics of relevance across all disciplines and geographical regions. To identify latent topics in the literature, we use Latent Dirichlet Allocation (LDA) (Blei et al. 2003), a method that is quickly becoming a standard procedure for investigating quantitative patterns and trends in peer-reviewed literature (Valle et al. 2014, Syed and Weber 2018, Luiz et al. 2019). Topic modelling is a probabilistic approach to text mining that clusters words into topics based on their semantic similarity.
This statistical approach facilitates the discovery of the latent topics addressed by each article from the content of its text (e.g. the abstract). Each topic can then be labelled according to its most predominant keywords for further screening. Because the clustering of research topics is unsupervised, researchers can process a large corpus of articles and identify the main topics that each article addresses. This is more efficient than manually tagging every article. It is also less prone to the user fatigue (Healy et al. 2004) and subjective biases that can make a manual synthesis non-repeatable. More comprehensive reviews of large corpora provide an in-depth understanding, but they require considerable time, a network of researchers to assess each article individually, or both (Soininen et al. 2018).
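The labelling step can be sketched minimally in Python. The sketch assumes a fitted model has already produced a topic-word probability matrix, called `phi` here. The vocabulary and numbers are hypothetical and are not taken from the study. Each topic is summarised by its highest-probability words, and a reader then uses those words to name or screen the topic.

```python
# Hypothetical topic-word matrix phi (rows = topics, columns = vocabulary terms),
# standing in for the output of a fitted LDA model. The numbers are illustrative only.
vocabulary = ["shrub", "soil", "carbon", "reindeer", "herding", "pollen", "nitrogen", "growth"]
phi = [
    [0.30, 0.05, 0.05, 0.02, 0.01, 0.02, 0.05, 0.50],  # topic 0
    [0.02, 0.40, 0.30, 0.01, 0.01, 0.01, 0.24, 0.01],  # topic 1
    [0.05, 0.01, 0.01, 0.45, 0.40, 0.01, 0.01, 0.06],  # topic 2
]

def top_words(topic_word_probs, vocab, n=3):
    """Return the n most probable terms of one topic, highest probability first."""
    ranked = sorted(zip(vocab, topic_word_probs), key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in ranked[:n]]

# Print the top words that would be used to label each topic.
for k, row in enumerate(phi):
    print(k, top_words(row, vocabulary))
```

With these illustrative numbers, topic 1 is summarised by "soil", "carbon" and "nitrogen", and topic 2 by "reindeer" and "herding". A reader would then tag topic 2 with a discipline such as SES.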
Our study aims to complement the more detailed reviews, which target a limited set of topics and disciplines. It does so by identifying knowledge gaps and the degree to which research addresses more than one discipline, with the purpose of better understanding the societal implications of climate change.

Methods
We used bibliometric analysis, which quantitatively assesses trends based on metadata (e.g. author, year or keywords) and visualises temporal trends based on the information retrieved. The corpus of documents can also be used for topic discovery with text analysis tools. We used Latent Dirichlet Allocation (LDA), a probabilistic model that assumes that every word is present in every topic and that all topics are present in a given document, each with varying probability (Blei et al. 2003). LDA groups words into topics according to their co-occurrence, so that each topic is defined by a coherent set of terms and can be characterised by its most representative words. These words were then used to assign each topic to its corresponding discipline.
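The generative process usually written for LDA is sketched below for reference. This is the smoothed variant, with a Dirichlet prior on each topic's word distribution, a form widely used in practice. The symbols are our own notation rather than the paper's:

- K is the number of topics (K = 50 in this study).
- D is the number of abstracts.
- N_d is the number of words in abstract d.
- α and β are Dirichlet hyperparameters whose values are not reported in this section.

Draws from a Dirichlet distribution have strictly positive components. Every word therefore has non-zero probability in every topic, and every topic has non-zero probability in every abstract, which is the assumption described above. The last line gives the probability of observing word w in abstract d as a topic-weighted mixture.

```latex
\begin{aligned}
\boldsymbol{\phi}_k &\sim \mathrm{Dirichlet}(\beta), && k = 1,\dots,K && \text{(word distribution of topic } k\text{)}\\
\boldsymbol{\theta}_d &\sim \mathrm{Dirichlet}(\alpha), && d = 1,\dots,D && \text{(topic mixture of abstract } d\text{)}\\
z_{d,n} \mid \boldsymbol{\theta}_d &\sim \mathrm{Categorical}(\boldsymbol{\theta}_d), && n = 1,\dots,N_d && \text{(topic of the } n\text{th word)}\\
w_{d,n} \mid z_{d,n}, \boldsymbol{\phi} &\sim \mathrm{Categorical}(\boldsymbol{\phi}_{z_{d,n}}) && && \text{(the observed word)}\\[4pt]
p(w \mid \boldsymbol{\theta}_d, \boldsymbol{\phi}) &= \sum_{k=1}^{K} \theta_{d,k}\,\phi_{k,w}
\end{aligned}
```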

Database creation and processing
We searched the Elsevier Scopus database for relevant publications on 26 November 2019, using the search string TITLE-ABS-KEY (tundra). We selected Scopus because of its wider coverage compared with other databases (Falagas et al. 2008). We used this broad search string to identify how research topics have grown and declined over time and to explore how the tundra has been studied across research disciplines. We used only abstracts in this analysis to obtain a broad overview of publication trends relating to the tundra.
First, we removed articles for which no abstract was retrieved from the database. We converted all words starting with "graz" and "herbiv" to "grazing" to avoid confusion between these two terms. We also removed the journal names and copyright notices written at the end of the abstracts. This reduces noise in the topics that might otherwise be associated with each journal's publication scope. We then lemmatised the text, consolidating the different inflected forms of a word (e.g. verb tenses) into a single, consistent form (e.g. runs, running and ran all become run). This reduces the number of distinct words in the text. For this purpose, we used the English lemmatisation tool from the udpipe package in R (Wijffels 2019). Finally, we removed stop words to reduce noise in the database. Stop words are the most common words in the language, such as pronouns (e.g. me, we, their) and interrogative words (e.g. who, why, where).
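These cleaning steps can be sketched minimally in Python for a single abstract. The sketch is illustrative only. The study used udpipe's English lemmatiser in R, whereas here a tiny hypothetical lookup table (`LEMMAS`) and stop-word list (`STOP_WORDS`) stand in for the real resources. Copyright notices are assumed to begin with "©" or the word "copyright". Journal-name removal is omitted because it would need a list of journal names.

```python
import re

# Hypothetical stand-ins for real resources: the study used udpipe's English
# lemmatiser in R and a full stop-word list; these tiny tables are for illustration.
LEMMAS = {"measured": "measure", "plants": "plant", "ran": "run",
          "running": "run", "runs": "run", "were": "be"}
STOP_WORDS = {"me", "we", "their", "who", "why", "where", "and", "be", "the", "of"}

def preprocess(abstract):
    """Return cleaned tokens for one abstract, or None if no abstract was retrieved."""
    if abstract is None or not abstract.strip():
        return None  # records without an abstract are removed
    # Assumption: the copyright notice starts at the first "©" or "copyright".
    text = re.split(r"©|copyright", abstract, maxsplit=1, flags=re.IGNORECASE)[0]
    tokens = re.findall(r"[a-z]+", text.lower())
    cleaned = []
    for token in tokens:
        if token.startswith(("graz", "herbiv")):
            token = "grazing"              # merge grazing/herbivory variants
        token = LEMMAS.get(token, token)   # lemmatise (toy lookup)
        if token not in STOP_WORDS:        # drop stop words last
            cleaned.append(token)
    return cleaned

example = ("We measured grazed plants and herbivores where reindeer were running. "
           "© 2019 Elsevier Ltd. All rights reserved.")
print(preprocess(example))
```

For the example abstract, the copyright notice is dropped and "grazed" and "herbivores" both become "grazing". "Running" is lemmatised to "run", while "we", "and", "where" and the lemma "be" are removed as stop words.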

Statistical analyses
The statistical analyses were performed in R using the package bibliometrix (Aria and Cuccurullo 2017). Topic identification was done using an LDA model from the package textmineR (Jones 2019).
We set the number of topics (k) to 50. We considered that four to five topics per discipline would allow us to identify the disciplines, and therefore chose 50 topics as a conservative estimate of k. We pooled these 50 topics into nine disciplines (modified from Virkkala et al. (2019)). To do so, we individually assessed the top words of each topic and manually assigned a discipline to each topic based on these words. Topic coherence measures the semantic similarity among a topic's top words, and we used the average coherence of the topics within a discipline as the coherence value for that discipline. Once the LDA model was established, we used it to assess which disciplines were covered in each article. Because each discipline was composed of several topics, we took the sum of the probabilities of its topics as the total probability of that discipline in a given article. We assessed the main topic and degree of multidisciplinarity of each article by identifying which topics had more than a 20% probability in its abstract. Articles in which only a single discipline had more than a 20% probability were considered single-discipline articles and were used to assess the temporal trends in tundra research.
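The aggregation and threshold steps can be sketched in Python for a hypothetical four-topic model. The real model had 50 topics and nine disciplines. The topic-to-discipline mapping, topic probabilities and coherence values below are illustrative assumptions, not the study's data. Topic probabilities are summed within each discipline, and a discipline counts as present when its sum exceeds the 20% threshold described above. Per-topic coherence scores are averaged within each discipline.

```python
# Hypothetical topic-to-discipline mapping for a 4-topic model; the study mapped
# 50 topics to nine disciplines by reading each topic's top words.
TOPIC_TO_DISCIPLINE = {0: "plant ecology", 1: "plant ecology", 2: "soil ecology", 3: "SES"}
THRESHOLD = 0.20  # a discipline counts as present above this probability

def discipline_shares(theta_d):
    """Sum one abstract's topic probabilities within each discipline."""
    shares = {}
    for topic, prob in enumerate(theta_d):
        discipline = TOPIC_TO_DISCIPLINE[topic]
        shares[discipline] = shares.get(discipline, 0.0) + prob
    return shares

def present_disciplines(theta_d, threshold=THRESHOLD):
    """Disciplines whose summed probability exceeds the threshold, largest first."""
    shares = discipline_shares(theta_d)
    kept = [d for d, p in shares.items() if p > threshold]
    return sorted(kept, key=shares.get, reverse=True)

def discipline_coherence(topic_coherence):
    """Average the per-topic coherence scores within each discipline."""
    totals, counts = {}, {}
    for topic, score in topic_coherence.items():
        discipline = TOPIC_TO_DISCIPLINE[topic]
        totals[discipline] = totals.get(discipline, 0.0) + score
        counts[discipline] = counts.get(discipline, 0) + 1
    return {d: totals[d] / counts[d] for d in totals}

# Illustrative topic probabilities for two abstracts (each row sums to 1).
abstract_a = [0.15, 0.60, 0.15, 0.10]  # plant 0.75 -> single-discipline article
abstract_b = [0.05, 0.10, 0.50, 0.35]  # soil 0.50, SES 0.35 -> two disciplines
for theta in (abstract_a, abstract_b):
    found = present_disciplines(theta)
    print(found, "single" if len(found) == 1 else f"{len(found)} disciplines")
```

The first abstract would count as a single-discipline (plant ecology) article. The second would count towards the articles combining two disciplines.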
Finally, we estimated the closeness between disciplines using cosine correlation. For this purpose, we aggregated all the keywords of each discipline and calculated the cosine correlation for every pair of disciplines. This approach allowed us to identify which disciplines are more closely correlated, and thus more easily interconnected, and which have weaker connections. We treat weaker connections as a proxy for gaps in interdisciplinary collaboration.
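The pairwise comparison can be sketched in Python. We assume that "cosine correlation" here means the cosine similarity between the word-count vectors of each discipline's aggregated keywords. The keyword lists are hypothetical, although the discipline names match those used in the text.

```python
import math
from collections import Counter

# Hypothetical keyword lists; in the study, the keywords of all topics assigned to
# a discipline were aggregated into one list per discipline.
keywords = {
    "plant ecology": ["shrub", "shrub", "growth", "nitrogen", "grazing"],
    "plant-herbivore interaction": ["grazing", "reindeer", "shrub", "growth"],
    "SES": ["herding", "reindeer", "community", "policy"],
}

def cosine(words_a, words_b):
    """Cosine similarity between the word-count vectors of two keyword lists."""
    a, b = Counter(words_a), Counter(words_b)
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Compute the similarity for every pair of disciplines.
names = list(keywords)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        print(f"{first} vs {second}: {cosine(keywords[first], keywords[second]):.2f}")
```

With these toy lists, plant ecology and plant-herbivore interaction score about 0.76 and plant-herbivore interaction and SES score 0.25. Plant ecology and SES score 0, because they share no keywords. A pair with a score near 0, like plant ecology and SES here, is the kind of weak connection the text treats as a potential collaboration gap.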

Results
After removing 253 records with no abstract and nine duplicate records, the search returned 9274 articles that use the word tundra in their title, abstract or keywords. Interest in tundra research has grown 5-fold during the last 20 years, from fewer than 100 articles per year in the 1990s to over 500 articles per year in 2018 (Fig. 1).
Manual tagging of disciplines, based on the top 20 words of each topic, resulted in a coherent topic classification (Suppl. material 1). Plant ecology dominated research on the tundra, with 14 topics, followed by soil ecology, with 11 topics (Table 1). These results indicate that the prevailing research on tundra systems addresses fundamental ecosystem science and the functioning of the ecosystem in a changing climate (e.g. nutrient flow from soil to animals through primary productivity). Among the articles that cover only one discipline (n = 5077), plant ecology, soil ecology and paleoecology clearly dominated, together accounting for over 70% of the publications. In these articles, only one discipline had a probability higher than 20%.

Figure 1. Temporal trends in research disciplines in the tundra, based on the articles covering a single discipline (n = 5077).

Table 1. Summary of the topics belonging to each discipline, the mean coherence of each topic group and the number of articles in which only one discipline had a probability higher than 20%.

The temporal trends in research disciplines show an erratic pattern until the 1980s (Fig. 1). Total publication numbers have steadily increased over time, but the proportions of the disciplines have remained consistent, with only minor variations. Plant ecology, soil ecology and paleoecology have dominated tundra research, while the other disciplines have had generally low research volumes (lower than 15%). However, an article's main discipline does not mean that other disciplines are absent from it; rather, some disciplines are studied in combination with others (e.g. plant ecology alongside animal ecology). We also examined the combination of disciplines in each abstract, counting every discipline present with more than 20% probability. In the articles covering a single discipline (n = 5077), plant ecology, soil ecology and paleoecology were the most prominent disciplines, with over 1000 abstracts assigned to each. Among the articles covering two disciplines (n = 3785), plant ecology was combined with soil ecology (n = 682), paleoecology (n = 403), biogeochemistry (n = 352), animal ecology (n = 248) and SES (n = 167). Soil ecology and biogeochemistry were studied together in 350 articles. Among the 407 articles covering three disciplines, plant or soil ecology was present in nearly all cases. Only five articles combined four disciplines.
Cosine correlation coefficients show that the disciplines are closely interconnected and share common characteristics (Fig. 2). Paleoecology and SES are the disciplines most weakly coupled to the other scientific disciplines. Plant ecology, on the other hand, is strongly correlated with most of the other disciplines, and its strongest correlation is with plant-herbivore interaction.
Societal implications of a changing Arctic tundra are studied in a total of 873 articles, either as the main discipline (n = 282) or alongside other disciplines. This represents less than 10% of the research on the tundra, showing that human dimensions are under-represented in tundra research as a whole. The SES discipline had the lowest coherence score, reflecting a highly fragmented field of research that draws on a broad range of perspectives. The cosine correlation coefficients show that SES is weakly correlated with most disciplines. The exceptions are animal ecology (cosine correlation = 0.68) and plant ecology (cosine correlation = 0.53). This suggests that the links between humans and nature are poorly understood.
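The proportions quoted in the Results follow directly from the reported article counts. The short Python sketch below works through that arithmetic, using only numbers stated in the text. The counts of articles covering one to four disciplines sum exactly to the corpus size, so they partition the corpus.

```python
# Article counts reported in the Results (corpus size after removing records
# without abstracts and duplicates).
TOTAL = 9274
by_number_of_disciplines = {1: 5077, 2: 3785, 3: 407, 4: 5}
ses_any, ses_main = 873, 282  # SES present at all / SES as the main discipline

# The four groups (one to four disciplines per article) cover the whole corpus.
assert sum(by_number_of_disciplines.values()) == TOTAL

def share(n, total=TOTAL):
    """Percentage of the corpus, rounded to one decimal place."""
    return round(100 * n / total, 1)

print(share(by_number_of_disciplines[1]))  # single-discipline articles
print(share(ses_any), share(ses_main))     # SES anywhere / SES as main discipline
```

Single-discipline articles make up about 54.7% of the corpus, i.e. more than half. Articles involving SES make up about 9.4%, below the 10% mentioned above, and articles with SES as the main discipline make up about 3.0%.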

Discussion
Our study presents a quantitative assessment of research topics and trends in the tundra ecosystem. Research interest in the tundra has increased 5-fold since the early 1980s. This is a strong increase compared with global publication output, which doubled over the same period (Bornmann and Mutz 2015). It reflects the high societal relevance that research on the tundra system has gained as the climate warms (Conservation of Arctic Flora and Fauna (CAFF) 2013, Arctic Monitoring 2017). Despite increased efforts to understand how ecosystems in the Arctic are changing, the societal implications of Arctic warming remain a major knowledge gap. This gap hinders both effective preparation for climate change and the advancement of research in this region. It is also evident in the low proportion of articles that include more than one discipline and in the few articles that address human dimensions.
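The comparison of growth rates can be made concrete by converting each fold-change into an equivalent constant annual growth rate. As an assumption, both increases are taken over the same 20-year window quoted in the Results. The function name is our own, for illustration only.

```python
def annual_growth_rate(fold_change, years):
    """Constant annual growth rate r that satisfies (1 + r) ** years == fold_change."""
    return fold_change ** (1 / years) - 1

# Assumption: both increases are measured over the same 20-year window.
tundra = annual_growth_rate(5, 20)  # five-fold increase in tundra publications
world = annual_growth_rate(2, 20)   # doubling of global publication output
print(f"tundra {tundra:.1%} per year, global {world:.1%} per year")
```

Under this assumption, tundra publications grew by roughly 8.4% per year against roughly 3.5% per year globally. A longer window would lower both rates but would not reverse their order.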
In our study, more than half of the analysed articles (n = 5077) were assigned to a single discipline. More integrative studies with a stronger multidisciplinary or interdisciplinary focus are needed to strengthen the flow of information between disciplines. Such studies should directly aim to bridge the gaps between single-focus disciplines, even closely related ones such as plant and soil ecology, to achieve more efficient, information-driven management. The potential effects of the expected shifts in the tundra ecosystem (Wipf et al. 2006, Ylänne et al. 2015) need to be considered not only from an ecological point of view, but also in terms of their social and economic impacts (Berkes and Jolly 2002, Parkinson and Evengård 2009, Jansson et al. 2015). Furthermore, Arctic warming is expected to have large societal implications. Research on tundra ecosystems therefore needs a stronger focus on human dimensions, integrating social science with ecology to address the implications of climate change for livelihoods.

Figure 2. Cosine correlation matrix between the disciplines. Darker colours represent higher cosine correlations.

The low coherence of all topics indicates that the language used across articles is sparse. Each article tends to be specific to a given ecosystem process. Tundra plant ecology, for example, can cover the forest-tundra ecotone, the dwarf shrub tundra or the nutrient uptake of plants under different biotic and abiotic conditions, amongst others. On the other hand, this language specificity makes it easier to assign a discipline to each topic based on its top keywords, since these keywords strongly represent their corresponding discipline. For example, forest growth is a clear representative of the plant ecology discipline. The SES discipline has a low coherence score (0.02). This shows that the field of research most relevant for understanding societal implications is fragmented and less prevalent than the traditional disciplines. This fragmentation is related to the fact that SES research brings together pieces of knowledge from different disciplines.
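Topic coherence can be computed in several ways, and the measure used in the study is not specified here. As an assumption, the Python sketch below implements probabilistic coherence, which we understand to be the measure textmineR reports. For every pair of a topic's top words (a, b), with a ranked above b, it averages P(b | a) - P(b), estimated from document frequencies. Under a measure of this kind, top words that rarely appear in the same abstracts give values near or below zero, which is what a sparse, fragmented vocabulary produces. The abstracts and word lists are hypothetical.

```python
from itertools import combinations

def probabilistic_coherence(top_words, documents):
    """Average P(b | a) - P(b) over pairs of top words, with a ranked above b.

    top_words: at least two of a topic's most probable words, highest first.
    documents: token lists; probabilities are fractions of documents.
    """
    docs = [set(doc) for doc in documents]

    def prob(*words):
        # Fraction of documents that contain all of the given words.
        return sum(all(w in doc for w in words) for doc in docs) / len(docs)

    diffs = []
    for a, b in combinations(top_words, 2):  # keeps rank order: a precedes b
        p_a = prob(a)
        p_b_given_a = prob(a, b) / p_a if p_a else 0.0
        diffs.append(p_b_given_a - prob(b))
    return sum(diffs) / len(diffs)

# Hypothetical abstracts, already preprocessed into tokens.
docs = [["reindeer", "herding", "policy"], ["reindeer", "herding"],
        ["shrub", "growth"], ["shrub", "growth", "reindeer"]]
print(probabilistic_coherence(["reindeer", "herding"], docs))  # words that co-occur
print(probabilistic_coherence(["policy", "growth"], docs))     # words that never co-occur
```

Here "reindeer" and "herding" appear together in half of the abstracts and score about 0.17. "Policy" and "growth" never co-occur and score -0.5.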
The cosine similarity analysis (Fig. 2) shows that all disciplines are connected through their main keywords, and plant ecology has a consistently high correlation with most of the other disciplines. Given the importance of primary productivity in tundra ecosystems, it is expected that most disciplines are, at least partly, related to plant ecology (Stoessel et al. 2019). In general, the cosine correlation analysis shows that the different disciplines are not isolated fragments of knowledge, but rather highly interconnected information highways. The knowledge generated in a given discipline depends, at least partly, on previous research in related disciplines and will in turn feed other disciplines with new information. A structured integration of disciplines could expedite this information flow and generate new management and research opportunities.

Conclusion
Our study describes the current status and historical trends of research in the tundra ecosystem. We show that plant ecology dominates research in tundra ecosystems. We also identify a research gap: more multidisciplinary approaches are needed that integrate the expertise of different disciplines. Such approaches would achieve a broader understanding and more efficient management of ecosystem shifts and the societal impacts of climate change.

Data availability
The data underpinning the analysis reported in this paper are deposited at UiT The Arctic University of Norway's data management system at https://doi.org/10.18710/WBKY7Q