Initially, we performed validation using ANN and LSTM to choose the best model for our data. ANN and LSTM models are trained and tested with data using an input parameter and an output parameter. The obtained R-squared values for temperature, snow cover and NDVI using ANN are 0.681, 0.66, 0.404 (Fig. 3a–c) and 0.89, 0.674, 0.48 using LSTM, respectively. (Fig. 4a–c).

From the above results, we can conclude that deep learning offers many advantages for time series forecasting, such as machine learning time dependence and dealing with time structures such as characteristics and seasonality. . It can be applied for the multiple numbers of inputs used for a mapping function and thus support multivariate time series forecasting. With its efficient way of extracting patterns from long sequences of input data, it can be perfectly applicable to deal with massive amounts of data, complex parameters and multi-step actions. To increase the accuracy of LSTM, we tried two input models and one output model with different hyperparameters.

### LSTM implementation

In this section, LSTM was applied to different models to forecast time series data to find the best fitting model (which has high r-squared values) for individual environmental factors.

#### Using the LSTM model with one input and one output

Here, the LSTM model has been implemented to predict a factor (temperature example) using the same factor as the input characteristic. Take data for each parameter individually from 2001 to 2015 to train the model and the next 2 years of data to test the model. A dense layer is added at the end of the model to make it more robust. Since only one value needs to be predicted in the output, only one neuron will be defined in the dense layer. We have used the root mean square error as the loss function and the Adam optimizer is used to optimize the algorithm. The figures mentioned below show the test results of temperature, snowfall and NDVI respectively with one characteristic. The R-squared values of these results are 0.89 for temperature Fig. 4a, 0.674 for snow cover Fig. 4b and 0.48 for NDVI Fig. 4c. If the R-squared value is greater than 0.75, then it is considered a good model. But as mentioned, it is only bigger for temperature and not for snow and NDVI, so this model with one characteristic is best suited for temperature but not ideal for the other two parameters.

#### Using the LSTM model with two input and one output features

LSTM can easily run into difficulties with multiple input parameters. All we would like is perhaps a 3D input vector that needs to be fed into the LSTM input form. So if you find a way to modify the input parameters to be represented in 3D vector form, we can use LSTM. The parameters used, that is to say the temperature, the snow cover, the NDVI are interdependent on each other. In the above method, the model takes each parameter individually for input as well as output and forecast, but the obtained R-squared values are not efficient and we cannot accept the model. To increase the accuracy of the model, we implemented the LSTM model with several combinations of these input parameters and an output characteristic and forecasted monthly, seasonally and annually accordingly.

#### Using the LSTM model with input functionality and output functionality

In the monthly forecast, we will take the parameters based on the month and take two parameters as input and one parameter as output. Here we take different combinations of parameters such as temperature and snow cover, temperature and NDVI, snow cover and NDVI. To render all input features on a scale of 0 to 1, the input data is normalized using MinMaxScaler. Since this is a time series problem, if we are to correctly forecast snowfall, we must consider the temperature and snowfall of previous months, as they are interrelated. Therefore, to forecast the NDVI, we consider these two combinations temperature-NDVI, snow cover-NDVI from previous months. A test dataset is created using 30% of the latest data as well as data created going back 60 months in the past for forecasting. This problem is a multivariate regression problem because it involves several input climatic characteristics. The model is compiled using the Adam optimizer and the root mean square error is used as the loss function for this regression problem. To get the precise values, we need to tune a set of hyperparameters, i.e. number of neurons, optimizer, epochs, etc. In this monthly case, we tuned the neurons to find the point where the model performs well. The present investigation used dropout with different values such as 0.25, 0.5, and 0.75 to avoid overfitting, and it was observed that a value of 0.5 was more suitable for the dropout layer for all models based on model performance, stability and efficiency. Changing the number of neurons can also occur underfitting and overfitting, so we need to choose an appropriate number of neurons. The R-squared values of these results are for temperature and snow cover Fig. 5a,b is 0.85 previous 0.51, for temperature and NDVI Fig. 6a,b is 0.79 previous 0.48 and for snow cover and NDVI Fig. 7a,b is 0.83 previous to 0.48. If the R-squared value is greater than 0.75, then it is considered a good model.

#### Seasonal

As it is a question of temperature, snow cover and vegetation which vary according to the seasons, we have divided a year into four seasons, namely winter (December-February), pre-monsoon (March-May), monsoon (June-August), and post-monsoon (September-November). We converted the monthly data to seasonal data by taking the average of the months of that specific season, for example to find the temperature value of the monsoon season, we took the average of 3 months of the monsoon season . In this seasonal case, we tuned the various optimizers i.e. ADAM, RMSProp, Adamax to get accurate results. In this method, the model is most suitable for the ADAM optimizer because it combines the advantages of AdaGrad and RMSProp and it achieves high R-squared values for temperature and snow cover Fig. 8a is 0.95, for temperature and NDVI Fig. 8b is 0.89 and for snow cover and NDVI Fig. 8c, it is 0.92 (Table 1).

#### Annual

The dataset consists of 17 years of monthly data from which we averaged all the months in that year to find annual data. The data is insufficient as it contains only 17 years of annual data. Compared to SVM, decision tree and other classifiers, neural networks may not categorize data very well for a lower volume of the dataset. Therefore, the prediction model must be large enough to capture the interrelationships of the data as well as the characteristics of the problem domain (for example, the number of categories). The initial layers of the model are used to capture the interrelationships between different parts of the input (i.e. patterns and outlines). Features that can help classify the desired outputs are captured by the output layer to give the final decision. If the number of entity parameters and the amount of data required for prediction are large, the chances of problems having high complexity increase proportionally. As a result, this causes an overfitting of the model. The R-squared values obtained for temperature and snow cover Fig. 9a are 0.95, for temperature and vegetation Fig. 9b, they are 0.97 and for snow cover and NDVI Fig. 9c, they are 0.98. We therefore do not consider the annual case of our model because of the overfitting.

#### LSTM forecast

Here we applied LSTM to forecast data by taking an input and forecasting an output for the next seven years. The data is split into 75% for training and 25% for testing the two-layer LSTM model with a dense output layer, then we forecast the output step by step for the test data and the model feeds back the current predicted value in the entry window by moving it one step forward to forecast the next time step and this is called the advance window technique. The study uses a rolling window of size 12. The 13th data point was forecast using the first 12 data entries. And similarly, the 14th data point is forecast using a window between 1 and 13 data entries and the same procedure continues. To move the window, panda’s shift function is used to shift the entire column by the specified number. Similarly, the column is shifted by 1 and then combined with the actual data. After setting the window size, the first 12 columns of the array become input features ×1, ×2,…, ×12 and the 13th column becomes the target y. It is like using LSTM for the NLP task, it looks like a 12 length fixed sequence of a sentence containing 12 words each and trying to predict the 13th word. Using this method, environmental factors individually for seven years are predicted. Figures 10a-c show the predicted values for temperature, snow cover, and NDVI for the next seven years.