
Time Series Analysis


Difference between regression and time series: time series observations are not necessarily independent and not necessarily identically distributed. A time series is a list of observations where the ordering matters. Ordering matters because the observations are dependent, and changing the order could change the meaning of the data.


  • Is there a trend, meaning that, on average, the measurements tend to increase (or decrease) over time?
  • Is there seasonality, meaning that there is a regularly repeating pattern of highs and lows related to calendar time such as seasons, quarters, months, days of the week, and so on?
  • Are there outliers? In regression, outliers are points far from the fitted line. With time series data, outliers are points far from the rest of the data.
  • Is there a long-run cycle or period unrelated to seasonality factors?
  • Is there constant variance over time, or is the variance non-constant?
  • Are there any abrupt changes to either the level of the series or the variance?

Residual analysis: the correlation should be 0 between residuals separated by any given time span; in other words, the residuals should be unrelated to each other. Theoretically, the residuals are assumed to have an ACF with correlation = 0 at all nonzero lags.


What is (weak) stationarity: the autocorrelation for any particular lag is the same regardless of where we are in time.

  • The mean E(x_t) is the same for all t.
  • The variance of x_t is the same for all t.
  • The covariance (and also the correlation) between x_t and x_{t-h} is the same for all t, depending only on the lag h.


The distributional properties of a stationary time series do not depend on time.

Time series are stationary if they do not have trend or seasonal effects. Summary statistics calculated on the time series are consistent over time, like the mean or the variance of the observations.

When a time series is stationary, it can be easier to model. Many statistical modeling methods assume or require the time series to be stationary in order to be effective.

How to check stationarity:

  1. Look at Plots: You can review a time series plot of your data and visually check if there are any obvious trends or seasonality.
  2. Summary Statistics: You can review the summary statistics for your data for seasons or random partitions and check for obvious or significant differences.
  3. Statistical Tests: You can use statistical tests to check if the expectations of stationarity are met or have been violated, e.g., a unit root test such as the Augmented Dickey-Fuller (ADF) test.
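Check #2 above can be sketched with plain numpy: partition the series and compare summary statistics across the partitions. The function name and tolerances here are illustrative choices, not a standard API (for check #3, `statsmodels.tsa.stattools.adfuller` implements the ADF test).

```python
import numpy as np

def partition_check(x):
    """Compare mean and variance of the two halves of a series.

    Large differences suggest non-stationarity. This is a rough
    diagnostic sketch, not a formal statistical test.
    """
    half = len(x) // 2
    first, second = x[:half], x[half:]
    mean_diff = abs(first.mean() - second.mean())
    var_ratio = first.var() / second.var()
    return mean_diff, var_ratio

rng = np.random.default_rng(0)
stationary = rng.normal(0, 1, 1000)              # white noise: stationary
trending = stationary + np.linspace(0, 5, 1000)  # added trend: non-stationary

# For the stationary series the half-means are close and the variance
# ratio is near 1; for the trending series the half-means clearly differ.
print(partition_check(stationary))
print(partition_check(trending))
```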

Auto-correlation (ACF):


For an ACF to make sense, the series must be weakly stationary. The ACF can be used to identify the possible structure of time series data. The ideal sample ACF of residuals has no significant correlations at any lag.
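A minimal sketch of the sample ACF in numpy (in practice, `statsmodels.tsa.stattools.acf` or `pandas.Series.autocorr` do this; the function below is for illustration):

```python
import numpy as np

def sample_acf(x, nlags):
    """Sample autocorrelation r_h for h = 0..nlags."""
    x = np.asarray(x, dtype=float)
    xm = x - x.mean()
    denom = np.sum(xm ** 2)
    return np.array([np.sum(xm[h:] * xm[:len(x) - h]) / denom
                     for h in range(nlags + 1)])

rng = np.random.default_rng(1)
noise = rng.normal(size=2000)
r = sample_acf(noise, 5)
# r[0] is always 1; for white noise the remaining lags hover near 0,
# which is the ideal pattern for an ACF of residuals.
```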

Partial Auto-correlation Function (PACF):

For a time series, the partial auto-correlation between x_t and x_{t-h} is defined as the conditional correlation between x_t and x_{t-h}, conditional on x_{t-h+1}, ..., x_{t-1}, the set of observations that come between the time points t and t−h. (The two variances in the denominator will equal each other in a stationary series.)



De-trending: de-trend each series using a linear regression with t, the index of time, as the predictor variable. The de-trended values for each series are the residuals from this linear regression on t. De-trending is useful conceptually because it removes the common steering force that time may exert on each series and helps create stationarity.
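The de-trending step above can be sketched with `np.polyfit`: regress the series on the time index and keep the residuals (the simulated slope and noise are illustrative assumptions).

```python
import numpy as np

# De-trend by regressing the series on the time index t and keeping residuals.
rng = np.random.default_rng(2)
t = np.arange(200)
series = 3.0 + 0.05 * t + rng.normal(0, 1, 200)  # linear trend plus noise

slope, intercept = np.polyfit(t, series, 1)      # fit x_t ~ slope*t + intercept
detrended = series - (slope * t + intercept)     # residuals from the fit

# The residuals have mean zero and no remaining linear trend,
# which is exactly what de-trending is meant to achieve.
```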


The First-order Autoregression Model (AR(1)):

x_t  = \delta + \phi_1 x_{t-1} + \omega_t


  • \omega_t \sim N(0, \sigma_\omega^2), meaning that the errors are independently distributed with a normal distribution that has mean 0 and constant variance.
  • The errors \omega_t are independent of the past values x_{t-1}, x_{t-2}, \ldots
  • The series x_1, x_2, … is (weakly) stationary.  A requirement for a stationary AR(1) is that |\phi_1| < 1.


  • The mean is \mu = E(x_t) = \frac{\delta}{1-\phi_1}
  • The variance is Var(x_t) = \frac{\sigma_\omega^2}{1-\phi_1^2}
  • The correlation between observations h time periods apart is \rho_h = \phi_1^h
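The three properties above can be verified by simulation. This is a sketch with arbitrarily chosen parameters (\delta = 2, \phi_1 = 0.6, \sigma_\omega = 1):

```python
import numpy as np

# Simulate AR(1): x_t = delta + phi1 * x_{t-1} + w_t, with |phi1| < 1.
rng = np.random.default_rng(3)
delta, phi1, sigma = 2.0, 0.6, 1.0
n = 100_000
x = np.empty(n)
x[0] = delta / (1 - phi1)              # start at the theoretical mean
for t in range(1, n):
    x[t] = delta + phi1 * x[t - 1] + rng.normal(0, sigma)

mean_theory = delta / (1 - phi1)       # delta / (1 - phi1) = 5.0
var_theory = sigma**2 / (1 - phi1**2)  # sigma^2 / (1 - phi1^2) = 1.5625
rho1_theory = phi1                     # lag-1 autocorrelation = phi1

# Sample lag-1 autocorrelation for comparison with the theory.
xm = x - x.mean()
rho1_sample = np.sum(xm[1:] * xm[:-1]) / np.sum(xm**2)
```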

Pattern of ACF: the theoretical ACF of an AR model tapers off gradually toward zero rather than cutting off; for an AR(1), it decays exponentially as \rho_h = \phi_1^h.



Pattern of PACF:  the theoretical PACF “shuts off” past the order of the model. Use the PACF to choose the order of an AR model.

Moving Average Models (MA)

The moving-average model specifies that the output variable depends linearly on the current and various past values of a stochastic (imperfectly predictable) term (white noise).

The first-order moving average model MA(1) is: x_t = \mu + w_t + \theta_1 w_{t-1}


  • E[x_t] = \mu
  • Var[x_t] = \sigma_w^2(1 + \theta_1^2)
  • The autocorrelation function (ACF) is \rho_1 = \frac{\theta_1}{1+\theta_1^2}, with \rho_h = 0 for h \ge 2.

A property of MA(q) models in general is that the ACF is nonzero for the first q lags and the autocorrelations equal 0 for all lags > q. Use this property to choose q. Unlike an AR model, an MA model is always stationary.
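The MA(1) ACF cutoff can be checked by simulation (the parameter \theta_1 = 0.8 here is an arbitrary illustrative choice):

```python
import numpy as np

# Simulate MA(1): x_t = mu + w_t + theta1 * w_{t-1}. Its ACF should be
# nonzero at lag 1 (theta1 / (1 + theta1^2)) and near zero beyond.
rng = np.random.default_rng(4)
mu, theta1, n = 0.0, 0.8, 100_000
w = rng.normal(size=n + 1)
x = mu + w[1:] + theta1 * w[:-1]

rho1_theory = theta1 / (1 + theta1**2)  # = 0.8 / 1.64 ≈ 0.488

# Sample autocorrelations at lags 1-3: only lag 1 should be sizable.
xm = x - x.mean()
denom = np.sum(xm**2)
rho = [np.sum(xm[h:] * xm[:-h]) / denom for h in (1, 2, 3)]
```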

Pattern of ACF: the theoretical ACF of an MA(q) model “shuts off” (cuts to zero) after lag q.

Pattern of PACF: the theoretical PACF of an MA model tapers off gradually rather than cutting off.

Auto-regressive Integrated Moving Average (ARIMA):

Elements of the model: the AR order p, the degree of differencing d, and the MA order q, written ARIMA(p, d, q).
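The "I" (integrated) element is differencing: apply `np.diff` d times to remove trend before fitting the AR and MA parts (the trend slope 0.1 below is an illustrative assumption; `statsmodels.tsa.arima.model.ARIMA` handles all three elements together in practice):

```python
import numpy as np

# A linear trend disappears after one difference (d = 1).
rng = np.random.default_rng(5)
t = np.arange(500)
series = 0.1 * t + rng.normal(0, 1, 500)  # linear trend plus noise

diff1 = np.diff(series)                   # x_t - x_{t-1}
# After one difference the trend contributes only a constant offset
# (0.1 on average), so the differenced series is stationary around it.
```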






Vector Autoregressive models (VAR):

  1. The structure is that each variable is a linear function of past lags of itself and past lags of the other variables. The term u_t includes terms to simultaneously fit the constant and trend.
  2. Use information criterion statistics (e.g., AIC, BIC) to compare VAR models of different orders.
  3. Use the ACF plot of the residuals.
  4. Examine the cross-correlation of the residuals.
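The structure in point 1 can be sketched for a two-variable VAR(1) using least squares (in practice `statsmodels.tsa.api.VAR` does the fitting and the order selection by information criteria; the coefficient matrix A below is an assumption of the simulation, not from the source):

```python
import numpy as np

# Simulate a two-variable VAR(1): y_t = A @ y_{t-1} + noise, where each
# variable is a linear function of lag-1 values of both variables.
rng = np.random.default_rng(6)
A = np.array([[0.5, 0.1],
              [0.2, 0.4]])          # true lag-1 coefficient matrix (stable)
n = 50_000
y = np.zeros((n, 2))
for t in range(1, n):
    y[t] = A @ y[t - 1] + rng.normal(0, 1, 2)

# Fit by ordinary least squares: regress current values on lagged values.
X = y[:-1]                          # lagged values (regressors)
Y = y[1:]                           # current values (responses)
A_hat = np.linalg.lstsq(X, Y, rcond=None)[0].T  # estimated coefficients
```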




