






Lecture Notes for Lab 15 for GEOL 4342
Typology: Lecture notes







Lab 15: Time Series Forecasting
Instructor: Rashik Islam

What is Time Series Forecasting?

Time series forecasting is an important area of machine learning that is often neglected. It is important because so many prediction problems involve a time component. These problems are neglected because the time component makes them more difficult to handle.

In a conventional machine learning dataset, predictions are made for new data when the actual outcome may not be known until some future date. The future is being predicted, but all prior observations are treated equally, perhaps with some very minor temporal dynamics to handle concept drift, such as using only the last year of observations rather than all available data.

A time series adds an explicit order dependence between observations: a time dimension. This additional dimension is both a constraint and a structure that provides a source of additional information.

Components of Time Series Forecasting:

Time series analysis provides a body of techniques to better understand a dataset. Perhaps the most useful of these is the decomposition of a time series into 4 constituent parts:
value at the previous time step to predict the value at the next time-step.

    X(t-1)    Y(t)
    ?         100
    100       110
    110       108
    108       115
    115       120
    120       ?

Univariate Time Series:

These are datasets where only a single variable is observed at each time, such as temperature each hour. The example in the previous section is a univariate time series dataset.

Multivariate Time Series:

These are datasets where two or more variables are observed at each time. Below is a worked example to make the sliding window method concrete for multivariate time series. Assume we have the contrived multivariate time series dataset below, with two observations at each time step:

    time    measure1    measure2
    1       0.2         88
    2       0.5         89
    3       0.7         87
    4       0.4         88
    5       1.0         90

We can re-frame this time series dataset as a supervised learning problem with a window width of one:

    X1(t-1)    X2(t-1)    X3(t)    Y(t)
    ?          ?          0.2      88
    0.2        88         0.5      89
    0.5        89         0.7      87
    0.7        87         0.4      88
    0.4        88         1.0      90
    1.0        90         ?        ?
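The multivariate re-framing above can be sketched with pandas `shift`. This is a minimal illustration, not part of the original lab code. It assumes the column labels mean that X1(t-1) and X2(t-1) are the previous values of measure1 and measure2, X3(t) is the current measure1, and Y(t) is the current measure2, which is taken as the target. The variable names `data`, `framed`, and `supervised` are made up for this example.

```python
import pandas as pd

# The contrived two-variable series from the worked example.
data = pd.DataFrame(
    {"measure1": [0.2, 0.5, 0.7, 0.4, 1.0], "measure2": [88, 89, 87, 88, 90]},
    index=pd.Index([1, 2, 3, 4, 5], name="time"),
)

# Window width of one: the previous step of both variables plus the current
# measure1 are the inputs; the current measure2 is the (assumed) target.
framed = pd.DataFrame(
    {
        "X1(t-1)": data["measure1"].shift(1),
        "X2(t-1)": data["measure2"].shift(1),
        "X3(t)": data["measure1"],
        "Y(t)": data["measure2"],
    }
)

# The first row has no previous step, so its lagged columns are NaN.
# Dropping it leaves only rows that are complete enough to train on.
supervised = framed.dropna()
print(supervised)
```

The final row of the table (1.0, 90, ?, ?) does not appear here because the frame is not extended past the last observed time step; that row would hold the inputs for a forecast whose outcome is not yet known.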
Date Time Features:

These are components of the time step itself for each observation. Let’s start with some of the simplest features that we can use: features derived from the date/time of each observation. These can start off simply and head off into quite complex domain-specific areas. Two features that we can start with are the integer month and day for each observation.

    Month    Day    Temperature

    import pandas as pd

    # Load the series with the date column parsed as a DatetimeIndex
    file_path = '/content/drive/My Drive/Air__Quality_Research/daily-min-temperatures.csv'
    series = pd.read_csv(file_path, header=0, index_col=0, parse_dates=True)

    # Derive the integer month and day from the index
    series['month'] = series.index.month
    series['day'] = series.index.day

    # Rename the columns (the first column holds the temperature values)
    series.columns = ['temperature', 'month', 'day']
    series.head()

Using just the month and day information alone to predict temperature is not sophisticated and will likely result in a poor model. Nevertheless, this information coupled with additional engineered features may ultimately result in a better model. You may enumerate all the properties of a time-stamp and consider what might be useful for your problem, such as:
Observed: Strong repeating seasonal pattern with moderate noise.
Trend: Weak and slowly varying long-term change.
Seasonal: Dominant, stable annual cycle with consistent amplitude.
Residual: Random noise with no clear remaining structure.

Lag Features:

Lag features are the classical way that time series forecasting problems are transformed into supervised learning problems. The simplest approach is to predict the value at the next time (t+1) given the value at the current time (t). The supervised learning problem with shifted values looks as follows:

    Value(t)    Value(t+1)
    # Pair each temperature with the value from the previous time step
    df = pd.concat(
        [series['temperature'].shift(1), series['temperature']],
        axis=1
    )
    df.columns = ['temp_t-1', 'temp_t']

    # Carry over the date-time features
    df['month'] = series['month']
    df['day'] = series['day']

    # Drop the first row, which has no previous value
    df = df.dropna()
    print(df.head())

You can see that we would have to discard the first row to use the dataset to train a supervised learning model, as it does not contain enough data to work with. The addition of lag features is called the sliding window method, in this case with a window width of 1. We can expand the window width and include more lagged features. For example, below is the above case modified to include the last 3 observed values to predict the value at the next time step.

    df = pd.concat(
        [
            series['temperature'].shift(3),
            series['temperature'].shift(2),
            series['temperature'].shift(1),
            series['temperature']
        ],
        axis=1
    )
    df.columns = ['temp_t-3', 'temp_t-2', 'temp_t-1', 'temp_t']

    df['month'] = series['month']
    df['day'] = series['day']

    # Uncomment to drop the first three rows, which contain NaN lag values
    #df = df.dropna()
    print(df.head())
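Widening the window by hand means writing one `shift` call per lag. A small helper can build the same layout for any window width. The sketch below is an illustration only: `make_lag_features` is a hypothetical name, not a pandas function, and the demo values are the five numbers from the univariate example rather than the temperature dataset.

```python
import pandas as pd


def make_lag_features(values: pd.Series, n_lags: int) -> pd.DataFrame:
    """Build columns t-n_lags ... t-1 plus the target column t.

    Hypothetical helper for illustration; rows without a full window
    of past values are dropped.
    """
    # Oldest lag first, so the columns read left to right in time order.
    columns = {f"t-{k}": values.shift(k) for k in range(n_lags, 0, -1)}
    columns["t"] = values
    return pd.DataFrame(columns).dropna()


# Illustrative values taken from the univariate example.
demo = pd.Series([100, 110, 108, 115, 120], name="value")
lagged = make_lag_features(demo, n_lags=3)
print(lagged)
```

With five observations and a window width of 3, only two complete rows remain, which shows the cost of a wider window: each extra lag discards one more row from the start of the series.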
Exercise 15:

The Lab_09_Houston.csv dataset contains hourly air quality and meteorological measurements recorded in Houston, Texas starting from January 1, 2013. Load the Houston dataset and perform initial exploration.

Task 15.1 Perform seasonal decomposition on the Temp (Temperature) and O3 (Ozone) columns. Compare the seasonal pattern of O3 with Temp. Is the seasonality similar or different? Explain why.

Task 15.2 Transform the time series (Temp, O3) into a supervised learning problem using date-time features, 3 lag features, and window features. Use Temp as the target variable.

Task 15.3 Extend the feature set to a multivariate time series by including additional predictors alongside O3.

Task 15.4 Resample the hourly dataset to daily frequency using the mean and repeat the seasonal decomposition for Temp with period=365. Compare the Trend component between the hourly and daily decompositions.