Tutorial: M4 Daily

This notebook is designed to give a simple introduction to forecasting using the Deep4Cast package. The time series data is taken from the M4 dataset, specifically, the Daily subset of the data.

[1]:
import numpy as np
import os
import pandas as pd
import datetime as dt
import matplotlib.pyplot as plt

import torch
from torch.utils.data import DataLoader

from deep4cast.forecasters import Forecaster
from deep4cast.models import WaveNet
from deep4cast.datasets import TimeSeriesDataset
import deep4cast.transforms as transforms
import deep4cast.metrics as metrics

# Make RNG predictable
np.random.seed(0)
torch.manual_seed(0)
# Use a gpu if available, otherwise use cpu
device = ('cuda' if torch.cuda.is_available() else 'cpu')

%matplotlib inline

Dataset

In this section we inspect the dataset, split it into a training and a test set, and prepare it for easy consumption with PyTorch-based data loaders. Model construction and training will be done in the next section.

[2]:
if not os.path.exists('data/Daily-train.csv'):
    !wget https://raw.githubusercontent.com/M4Competition/M4-methods/master/Dataset/Train/Daily-train.csv -P data/
if not os.path.exists('data/Daily-test.csv'):
    !wget https://raw.githubusercontent.com/M4Competition/M4-methods/master/Dataset/Test/Daily-test.csv -P data/
[3]:
data_arr = pd.read_csv('data/Daily-train.csv')
data_arr = data_arr.iloc[:, 1:].values
data_arr = list(data_arr)
for i, ts in enumerate(data_arr):
    data_arr[i] = ts[~np.isnan(ts)][None, :]

Divide into train and test

We use the DataLoader object from PyTorch to build mini-batches from the data sets.

However, we first need to specify how much history to use in creating a forecast of a given length:

  • horizon = the number of time steps to forecast
  • lookback = the number of time steps leading up to the period to be forecast

[4]:
horizon = 14
lookback = 128

We’ve also found that it is not necessary to train on the full dataset, so here we select a random 10% sample of the time series for training. We will evaluate on the full dataset later.

[5]:
import random

data_train = []
for time_series in data_arr:
    data_train.append(time_series[:, :-horizon])
data_train = random.sample(data_train, int(len(data_train) * 0.1))

We follow Torchvision in processing examples using Transforms chained together by Compose.

  • ToTensor creates a tensor from the example.
  • LogTransform takes the natural logarithm of the targets after adding an offset (similar to torch.log1p).
  • RemoveLast subtracts the final value in the lookback from both lookback and horizon.
  • Target specifies which index in the array to forecast.

We need to perform these transformations so that the input features are of unit scale. If the input features are not of unit scale (i.e., O(1)), the optimizer won’t be able to find an optimum due to blow-ups in the gradient calculations.

[6]:
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.LogTransform(targets=[0], offset=1.0),
    transforms.RemoveLast(targets=[0]),
    transforms.Target(targets=[0]),
])

TimeSeriesDataset inherits from PyTorch’s Dataset class for use with a DataLoader. It handles the creation of the examples used to train the network, using lookback and horizon to partition the time series.

The parameter step controls how far apart consecutive windowed samples from a time series are spaced. For example, for a time series of length 100, a lookback of 24, and a horizon of 12, the original time series is split into smaller training examples of length 24 + 12 = 36. How much these examples overlap is controlled by the parameter step in TimeSeriesDataset, as sketched below.
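To make the windowing concrete, here is the arithmetic for that example as a plain-Python sketch, assuming TimeSeriesDataset starts a new window at every step-th offset:

series_length = 100
window = 24 + 12  # lookback + horizon

for step in (1, 12, 36):
    n_windows = (series_length - window) // step + 1
    print('step={}: {} windows'.format(step, n_windows))
# step=1: 65 windows (adjacent windows overlap by 35 time steps)
# step=12: 6 windows (adjacent windows overlap by 24 time steps)
# step=36: 2 windows (windows are disjoint)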

[7]:
data_train = TimeSeriesDataset(
    data_train,
    lookback,
    horizon,
    step=1,
    transform=transform
)

# Create mini-batch data loader
dataloader_train = DataLoader(
    data_train,
    batch_size=512,
    shuffle=True,
    pin_memory=True,
    num_workers=1
)

Modeling and Forecasting

Temporal Convolutions

The network architecture used here is based on ideas related to WaveNet. We employ the same architecture with a few modifications (e.g., a fully connected output layer for vector forecasts). It turns out that we do not need many layers in this example to achieve state-of-the-art results, most likely because of the simple autoregressive nature of the data.

In many ways, a temporal convolutional architecture is among the simplest possible architectures that we could employ using neural networks. In our approach, every layer has the same number of convolutional filters and uses residual connections.
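For intuition, below is a minimal sketch of one such residual block. It illustrates the general pattern (a dilated causal convolution plus a residual connection), not the actual WaveNet class shipped with Deep4Cast, which differs in details such as gating and skip channels. Assuming a kernel size of 2 with dilations doubling per layer (1, 2, 4, ..., 64), seven layers give a receptive field of 1 + (1 + 2 + ... + 64) = 128 time steps, matching the receptive field printed below and the lookback chosen above.

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    # Hypothetical illustration of a dilated causal convolution
    # with a residual connection; not Deep4Cast's implementation.
    def __init__(self, channels, dilation):
        super().__init__()
        # Left-pad so the convolution is causal (no look-ahead).
        self.pad = nn.ConstantPad1d((dilation, 0), 0.0)
        self.conv = nn.Conv1d(channels, channels, kernel_size=2,
                              dilation=dilation)

    def forward(self, x):
        return x + torch.relu(self.conv(self.pad(x)))

# Dilations 1, 2, 4, ..., 64: receptive field = 1 + (1 + 2 + ... + 64) = 128.
blocks = nn.Sequential(*[ResidualBlock(89, 2 ** i) for i in range(7)])
x = torch.randn(8, 89, 128)  # (batch, channels, lookback)
print(blocks(x).shape)       # torch.Size([8, 89, 128])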

When it comes to loss functions, we use the log-likelihood of probability distributions from the torch.distributions module. This means that if one supplies a normal distribution, the likelihood of the transformed data is modeled as coming from a normal distribution.
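Concretely, the training loss is the negative log-likelihood of the (transformed) targets under the distribution whose parameters the network emits. A minimal sketch of the idea with a normal distribution (illustrative only, not Forecaster's internals):

import torch

loc = torch.zeros(4, 14)    # predicted means for a batch
scale = torch.ones(4, 14)   # predicted scales (must be positive)
y = torch.randn(4, 14)      # observed (transformed) targets

dist = torch.distributions.Normal(loc, scale)
nll = -dist.log_prob(y).mean()  # minimized during training
samples = dist.sample((100,))   # sampling yields probabilistic forecasts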

[8]:
# Define the model architecture
model = WaveNet(input_channels=1,
                output_channels=1,
                horizon=horizon,
                hidden_channels=89,
                skip_channels=199,
                dense_units=156,
                n_layers=7)

print('Number of model parameters: {}.'.format(model.n_parameters))
print('Receptive field size: {}.'.format(model.receptive_field_size))

# Enable multi-gpu if available
if torch.cuda.device_count() > 1:
    print('Using {} GPUs.'.format(torch.cuda.device_count()))
    model = torch.nn.DataParallel(model)

# .. and the optimizer
optim = torch.optim.Adam(model.parameters(), lr=0.0008097436666349985)

# .. and the loss
loss = torch.distributions.StudentT
Number of model parameters: 341347.
Receptive field size: 128.
Using 2 GPUs.
[9]:
# Fit the forecaster
forecaster = Forecaster(model, loss, optim, n_epochs=5, device=device)
forecaster.fit(dataloader_train, eval_model=True)
/home/austin/miniconda3/envs/d4cGithub/lib/python3.6/site-packages/torch/nn/parallel/_functions.py:61: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.
  warnings.warn('Was asked to gather along dimension 0, but all '
Epoch 1/5 [915731/915731 (100%)]        Loss: -1.863526 Elapsed/Remaining: 3m52s/15m30s
Training error: -2.67e+01.
Epoch 2/5 [915731/915731 (100%)]        Loss: -1.963631 Elapsed/Remaining: 11m21s/17m2s
Training error: -2.71e+01.
Epoch 3/5 [915731/915731 (100%)]        Loss: -1.983338 Elapsed/Remaining: 18m42s/12m28s
Training error: -2.75e+01.
Epoch 4/5 [915731/915731 (100%)]        Loss: -1.974977 Elapsed/Remaining: 26m2s/6m30s
Training error: -2.78e+01.
Epoch 5/5 [915731/915731 (100%)]        Loss: -2.073579 Elapsed/Remaining: 33m20s/0m0s
Training error: -2.83e+01.

Evaluation

Before any evaluation scores can be calculated, we load the held-out test data.

[10]:
data_train = pd.read_csv('data/Daily-train.csv')
data_test = pd.read_csv('data/Daily-test.csv')
data_train = data_train.iloc[:, 1:].values
data_test = data_test.iloc[:, 1:].values

data_arr = []
for ts_train, ts_test in zip(data_train, data_test):
    ts_a = ts_train[~np.isnan(ts_train)]
    ts_b = ts_test
    ts = np.concatenate([ts_a, ts_b])[None, :]
    data_arr.append(ts)
[11]:
# Keep the final lookback + horizon window of each series for testing
data_test = []
for time_series in data_arr:
    data_test.append(time_series[:, -horizon-lookback:])

data_test = TimeSeriesDataset(
    data_test,
    lookback,
    horizon,
    step=1,
    transform=transform
)
dataloader_test = DataLoader(
    data_test,
    batch_size=1024,
    shuffle=False,
    num_workers=2
)

We need to post-process the output forecasts. The output from the forecaster is of the form (n_samples, n_time_series, n_variables, n_timesteps). This means that a point forecast needs to be calculated from the samples, for example by taking the mean or the median (see the sketch after the next cell).

[12]:
# Get time series of actuals for the testing period
y_test = []
for example in dataloader_test:
    example = dataloader_test.dataset.transform.untransform(example)
    y_test.append(example['y'])
y_test = np.concatenate(y_test)

# Get corresponding predictions
y_samples = forecaster.predict(dataloader_test, n_samples=100)
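
With the samples in hand, a point forecast and a prediction interval can be read off by collapsing the sample dimension (axis 0), for example:

y_median = np.median(y_samples, axis=0)         # point forecast
y_lower = np.percentile(y_samples, 10, axis=0)  # 80% prediction interval
y_upper = np.percentile(y_samples, 90, axis=0)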

We evaluate the forecasts using the symmetric MAPE (sMAPE).
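For reference, the sMAPE of a point forecast ŷ against actuals y over a horizon of length h is 200/h · Σ |ŷ_t − y_t| / (|y_t| + |ŷ_t|). A hypothetical NumPy sketch of this definition (deep4cast.metrics.smape may reduce the sample dimension differently):

def smape_sketch(y_pred, y_true):
    # M4-style symmetric MAPE, in percent.
    return 200.0 * np.mean(
        np.abs(y_pred - y_true) / (np.abs(y_true) + np.abs(y_pred))
    )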

[13]:
# Evaluate forecasts
test_smape = metrics.smape(y_samples, y_test)

print('SMAPE: {}%'.format(test_smape.mean()))
SMAPE: 3.1666347980499268%