autopycoin.dataset.generator.WindowGenerator

class autopycoin.dataset.generator.WindowGenerator(input_width: int, label_width: int, shift: Union[None, int] = None, valid_size: Union[int, float] = 0, test_size: Union[int, float] = 0, flat: bool = False, sequence_stride: int = 1, batch_size: int = None, preprocessing: Union[None, Callable] = None)

Transform a time series into a format usable by a TensorFlow model.

The time series can be a pandas DataFrame, a TensorFlow tensor or a NumPy array.

Parameters
input_widthint

The number of historical time steps to use during the forecasting.

label_widthint

The number of time steps to forecast.

shiftint

The offset between the input time steps (input_width) and the label time steps (label_width). If label_width is greater than shift, the input and label datasets share some identical values.

valid_sizeint or float

The number of examples in the validation set. Pass a float between 0 and 1 to specify a proportion instead.

test_sizeint or float

The number of examples in the test set. Pass a float between 0 and 1 to specify a proportion instead.

flatbool

Flatten the inputs and labels tensors.

batch_sizeint

The number of examples per batch. If None, all examples are stacked in one batch. Defaults to None.

preprocessingcallable or None

Preprocessing function applied to the data. It must accept input of shape ((inputs, …), labels). It is applied after the train, validation and test split. Defaults to None.
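How input_width, label_width and shift interact can be sketched with plain index arithmetic. The sketch below assumes the windowing convention of the TensorFlow time series tutorial, in which one window spans input_width + shift steps and the labels are its last label_width steps. That convention and the helper name window_indices are assumptions made for illustration; they are not taken from the autopycoin source.

```python
import numpy as np


def window_indices(input_width, label_width, shift):
    """Return (input_idx, label_idx) for a single window.

    Assumed convention (not confirmed by the autopycoin source): the
    window spans input_width + shift steps and the labels are its last
    label_width steps.
    """
    total_width = input_width + shift
    input_idx = np.arange(input_width)
    label_idx = np.arange(total_width - label_width, total_width)
    return input_idx, label_idx


# label_width <= shift: inputs and labels do not overlap.
inputs_a, labels_a = window_indices(input_width=3, label_width=2, shift=10)

# label_width > shift: some steps are both inputs and labels.
inputs_b, labels_b = window_indices(input_width=3, label_width=4, shift=2)
shared = np.intersect1d(inputs_b, labels_b)

print(labels_a)  # [11 12]
print(shared)    # [1 2]
```

Under this assumption, the second call shows the overlap described for shift: steps 1 and 2 appear in both the inputs and the labels.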

Notes

The dataset’s shape depends on the columns passed to the from_array method. Up to four tensors can currently make up the inputs part of the dataset.

Output shape when all column components are defined: tuple of shape ((inputs, known, date_inputs, date_labels), labels).

inputs tensor:

The input tensor, of shape (batch_size, input_width, input_columns), or (batch_size, input_width * input_columns) if flat is set to True. These are the historical values.

known tensor:

The known tensor, of shape (batch_size, input_width, known_columns), or (batch_size, input_width * known_columns) if flat is set to True. It holds the variables whose values are known in advance or can be estimated, for example dates or temperatures.

date_inputs tensor:

Dates of shape (batch_size, input_width), associated with the inputs tensor. Defaults to a tensor generated by tf.range.

date_labels tensor:

Dates of shape (batch_size, label_width), associated with the labels tensor. Defaults to a tensor generated by tf.range.

labels tensor:

The output variables, of shape (batch_size, label_width, label_columns), or (batch_size, label_width * label_columns) if flat is set to True. These are the values to predict.
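The effect of flat on the tensor shapes listed above can be illustrated with NumPy arrays standing in for the TensorFlow tensors. This is a minimal sketch: the sizes are arbitrary, and it assumes that flattening simply merges the time-step and column axes, as the documented shapes suggest.

```python
import numpy as np

# Arbitrary sizes chosen for illustration.
batch_size, input_width, input_columns = 4, 3, 2
label_width, label_columns = 2, 1

# Shapes with flat=False.
inputs = np.zeros((batch_size, input_width, input_columns))
labels = np.zeros((batch_size, label_width, label_columns))

# Assumed behavior with flat=True: time and column axes are merged.
flat_inputs = inputs.reshape(batch_size, input_width * input_columns)
flat_labels = labels.reshape(batch_size, label_width * label_columns)

print(inputs.shape, flat_inputs.shape)  # (4, 3, 2) (4, 6)
print(labels.shape, flat_labels.shape)  # (4, 2, 1) (4, 2)
```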

Examples

>>> import pandas as pd
>>> from autopycoin.data import random_ts
>>> from autopycoin.dataset import WindowGenerator
...
... # We generate data
>>> data = random_ts(n_steps=100,
...                  trend_degree=2,
...                  periods=[10],
...                  fourier_orders=[10],
...                  trend_mean=0,
...                  trend_std=1,
...                  seasonality_mean=0,
...                  seasonality_std=1,
...                  batch_size=1,
...                  n_variables=1,
...                  noise=True,
...                  seed=42)
...
>>> w_oneshot = WindowGenerator(input_width=3,
...                             label_width=2,
...                             shift=10,
...                             valid_size=2,
...                             test_size=3,
...                             flat=True,
...                             batch_size=None,
...                             preprocessing=None)
...
... # Here only the inputs and labels tensors are generated
>>> w_oneshot = w_oneshot.from_array(data[0],
...     input_columns=[0],
...     label_columns=[0])

Attributes
input_widthint

Return the input_width.

label_widthint

Return the label_width.

shiftint

Return the shift.

valid_sizeint

Return the valid_size.

test_sizeint

Return the test_size.

flatbool

Return the attribute flat.

batch_sizeint or None

Return the attribute batch_size.

traindataset

Return the train dataset.

validdataset

Return the valid dataset.

testdataset

Build the test dataset.

dataDataFrame or ndarray or Tensor

Return the original data.

property batch_size

Return the attribute batch_size.

property data: numpy.ndarray

Return the original data.

property date_columns: List[Union[slice, int]]

Return date_columns.

property flat

Return the attribute flat.

from_array(data: Union[pandas.core.frame.DataFrame, numpy.ndarray, tensorflow.python.framework.ops.Tensor, pandas.core.series.Series], input_columns: Union[None, List[Union[int, str]]] = None, label_columns: Union[None, List[Union[int, str]]] = None, known_columns: Union[None, List[Union[int, str]]] = None, date_columns: Union[None, List[Union[int, str]]] = None)

Feed WindowGenerator with a pandas DataFrame, a NumPy ndarray or a TensorFlow tensor.

This method must be called before using the train, test or valid properties, because it initializes the data.

Parameters
dataDataFrame, Series, list, ndarray or Tensor of shape (timesteps, variables)

The time series data from which the train, valid and test datasets are built.

input_columnslist[str or int]

The input column names. Variables used to forecast target values.

label_columnslist[str or int]

The label column names. Target variables to forecast. Defaults to None.

known_columnslist[str or int]

The known column names. Variables whose exact or closely estimated values are known for the target period, for example dates or temperatures. Defaults to None.

date_columnslist[str or int]

The date column names. Dates associated with each step. Defaults to None. Date columns are cast to string and joined with the ‘-’ delimiter to be used as xticks in the plot function.

Returns
selfWindowGenerator

Return the instance.
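The string conversion described for date_columns can be sketched as follows. This is an illustration only: the year, month and day values are made up, and the snippet mimics the documented "cast to string and join with ‘-’" behavior rather than reproducing the library's internal code.

```python
import numpy as np

# Hypothetical date columns (year, month, day) for three time steps.
dates = np.array([[2021, 1, 1],
                  [2021, 1, 2],
                  [2021, 1, 3]])

# Cast every component to string, then join each row with "-".
tick_labels = ["-".join(row) for row in dates.astype(str)]

print(tick_labels)  # ['2021-1-1', '2021-1-2', '2021-1-3']
```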

property input_columns: List[Union[slice, int]]

Return the input_columns.

property input_width: int

Return the input_width.

property known_columns: List[Union[slice, int]]

Return the known_columns.

property label_columns: List[Union[slice, int]]

Return the label_columns.

property label_width: int

Return the label_width.

production(data: Union[pandas.core.frame.DataFrame, numpy.array, tensorflow.python.framework.ops.Tensor], batch_size: Optional[int] = None) → tensorflow.python.data.ops.dataset_ops.DatasetV2

Build the production dataset.

Parameters
dataDataFrame of shape (input_width + shift, variables)

Data to forecast. The input steps must be included in data.

Returns
dataPrefetchDataset of shape (inputs, known, date_inputs, date_labels), labels

MapDataset which returns data with shape ((inputs, known, date_inputs, date_labels), labels).

Raises
AssertionError

Raised if not all the columns defined in the constructor are present in data.

property sequence_stride

Return the attribute sequence_stride.

property shift: int

Return the shift.

property test: Optional[tensorflow.python.data.ops.dataset_ops.DatasetV2]

Build the test dataset.

Returns
datasetDataset

property test_size: int

Return the test_size.

property train: tensorflow.python.data.ops.dataset_ops.DatasetV2

Return the train dataset.

Returns
datasetDataset

Train dataset. It cannot be empty.

property valid: tensorflow.python.data.ops.dataset_ops.DatasetV2

Return the valid dataset.

Returns
datasetDataset

property valid_size: int

Return the valid_size.