`autopycoin.dataset.generator`.WindowGenerator

class autopycoin.dataset.generator.WindowGenerator(input_width: int, label_width: int, shift: Union[None, int] = None, valid_size: Union[int, float] = 0, test_size: Union[int, float] = 0, flat: bool = False, sequence_stride: int = 1, batch_size: int = None, preprocessing: Union[None, Callable] = None)

Transform a time series into a format usable by a TensorFlow model.

The time series can be a pandas DataFrame, a TensorFlow tensor or a NumPy array.

Parameters

input_width (int): The number of historical time steps used as input for the forecast.
label_width (int): The number of time steps to forecast.
shift (int or None): The offset between the input time steps (input_width) and the label time steps (label_width). If label_width is greater than shift, the input and label windows overlap and share some identical values.
valid_size (int or float): The number of examples in the validation set. Use a float between 0 and 1 to specify a proportion instead.
test_size (int or float): The number of examples in the test set. Use a float between 0 and 1 to specify a proportion instead.
flat (bool): Flatten the inputs and labels tensors.
batch_size (int or None): The number of examples per batch. If None, all examples are stacked in one batch. Defaults to None.
preprocessing (callable or None): Preprocessing function to apply to the data. This function must accept input of shape ((inputs, …), labels). It is applied after the train, validation and test split. Defaults to None.
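
How input_width, label_width and shift interact can be illustrated with a small pure-Python sketch. It does not use autopycoin and is not the library's implementation. It assumes that each window spans input_width + shift steps and that the labels are the last label_width steps of that window, which matches the production data shape (input_width + shift, variables) documented below. The helper name make_windows is hypothetical.

```python
def make_windows(series, input_width, label_width, shift, sequence_stride=1):
    """Slice a 1-D sequence into (inputs, labels) pairs.

    Assumed convention (not taken from the autopycoin source): a window
    spans input_width + shift steps and the labels are the last
    label_width steps of that window.
    """
    total_width = input_width + shift
    label_start = total_width - label_width
    windows = []
    # Move the window start by sequence_stride steps each time.
    for start in range(0, len(series) - total_width + 1, sequence_stride):
        window = series[start:start + total_width]
        windows.append((window[:input_width], window[label_start:]))
    return windows


series = list(range(10))

# shift >= label_width: inputs and labels do not overlap.
disjoint = make_windows(series, input_width=3, label_width=2, shift=2)

# label_width > shift: the last input step is also the first label step.
overlapping = make_windows(series, input_width=3, label_width=2, shift=1)
```

Under this assumption, the first window of `disjoint` pairs steps 0 to 2 with steps 3 to 4, while the first window of `overlapping` pairs steps 0 to 2 with steps 2 to 3, so step 2 appears in both tensors.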

Notes

The dataset’s shape depends on the columns defined in the from_array method. Up to four input tensors can currently be included in the inputs dataset.

Output shape: when all columns components are defined: Tuple of shape ((inputs, known, date_inputs, date_labels), labels)

inputs tensor: The input tensor of shape (batch_size, input_width, input_columns), or (batch_size, input_width * input_columns) if flat is set to True. These are the historical values.
known tensor: The known tensor of shape (batch_size, input_width, known_columns), or (batch_size, input_width * known_columns) if flat is set to True. These are the variables whose values are known in advance or estimated, for example dates or temperatures.
date_inputs tensor: Dates of shape (batch_size, input_width) associated with the inputs tensor. Defaults to a tensor generated by tf.range.
date_labels tensor: Dates of shape (batch_size, label_width) associated with the labels tensor. Defaults to a tensor generated by tf.range.
labels tensor: The output variables of shape (batch_size, label_width, label_columns), or (batch_size, label_width * label_columns) if flat is set to True. These are the values to predict.
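
The effect of flat on the inputs and labels shapes listed above can be summarised with a short sketch. The helper expected_shapes is hypothetical and only restates the shapes given in these notes; it does not call autopycoin.

```python
def expected_shapes(batch_size, input_width, label_width,
                    n_input_columns, n_label_columns, flat):
    """Return the (inputs, labels) shapes described in the notes above."""
    if flat:
        # Time steps and columns are merged into a single axis.
        inputs = (batch_size, input_width * n_input_columns)
        labels = (batch_size, label_width * n_label_columns)
    else:
        inputs = (batch_size, input_width, n_input_columns)
        labels = (batch_size, label_width, n_label_columns)
    return inputs, labels


# One input column and one label column, flattened.
flat_shapes = expected_shapes(32, 3, 2, 1, 1, flat=True)

# Four input columns and one label column, not flattened.
full_shapes = expected_shapes(32, 3, 2, 4, 1, flat=False)
```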

Examples

>>> import pandas as pd
>>> from autopycoin.data import random_ts
>>> from autopycoin.dataset import WindowGenerator
...
... # We generate data
>>> data = random_ts(n_steps=100,
...                  trend_degree=2,
...                  periods=[10],
...                  fourier_orders=[10],
...                  trend_mean=0,
...                  trend_std=1,
...                  seasonality_mean=0,
...                  seasonality_std=1,
...                  batch_size=1,
...                  n_variables=1,
...                  noise=True,
...                  seed=42)
...
>>> w_oneshot = WindowGenerator(input_width=3,
...                             label_width=2,
...                             shift=10,
...                             valid_size=2,
...                             test_size=3,
...                             flat=True,
...                             batch_size=None,
...                             preprocessing=None)
...
... # Here only the inputs and labels tensors are generated
>>> w_oneshot = w_oneshot.from_array(data[0],
...     input_columns=[0],
...     label_columns=[0])

Attributes

input_width (int): Return the input_width.
label_width (int): Return the label_width.
shift (int): Return the shift.
valid_size (int): Return the valid_size.
test_size (int): Return the test_size.
flat (bool): Return the attribute flat.
batch_size (int or None): Return the attribute batch_size.
train (dataset): Return the train dataset.
valid (dataset): Return the valid dataset.
test (dataset): Build the test dataset.
data (DataFrame or ndarray or Tensor): Return the original data.

property batch_size: Return the attribute batch_size.

property data: numpy.ndarray: Return the original data.

property date_columns: List[Union[slice, int]]: Return date_columns.

property flat: Return the attribute flat.

from_array(data: Union[pandas.core.frame.DataFrame, numpy.ndarray, tensorflow.python.framework.ops.Tensor, pandas.core.series.Series], input_columns: Union[None, List[Union[int, str]]] = None, label_columns: Union[None, List[Union[int, str]]] = None, known_columns: Union[None, List[Union[int, str]]] = None, date_columns: Union[None, List[Union[int, str]]] = None)

Feed the WindowGenerator with a pandas DataFrame or a NumPy ndarray.

This method must be called before using the train, test or valid properties, as it initializes the data.

Parameters

data (DataFrame, Series, list, ndarray or Tensor of shape (timesteps, variables)): The time series on which the train, valid and test datasets are built.
input_columns (list[str or int]): The input column names. Variables used to forecast the target values.
label_columns (list[str or int]): The label column names. Target variables to forecast. Defaults to None.
known_columns (list[str or int]): The known column names. Variables whose exact or reliably estimated values are available for the target period, for example dates or temperatures. Defaults to None.
date_columns (list[str or int]): The date column names. Dates associated with each step. Date columns are cast to string and joined with the ‘-’ delimiter to be used as xticks in the plot function. Defaults to None.
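
The date_columns description (cast to string, then join with ‘-’) can be pictured with the sketch below. The helper join_date_columns is hypothetical and only mirrors that sentence; the exact formatting applied by the library, such as zero padding, is not specified here.

```python
def join_date_columns(rows, delimiter="-"):
    """Cast each date component to str and build one label per time step."""
    return [delimiter.join(str(part) for part in row) for row in rows]


# One row per time step: (year, month, day) read from three date columns.
date_rows = [(2021, 1, 30), (2021, 1, 31), (2021, 2, 1)]
tick_labels = join_date_columns(date_rows)
```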

Returns

self (WindowGenerator): Return the instance.

property input_columns: List[Union[slice, int]]: Return the input_columns.

property input_width: int: Return the input_width.

property known_columns: List[Union[slice, int]]: Return the known_columns.

property label_columns: List[Union[slice, int]]: Return the label_columns.

property label_width: int: Return the label_width.

production(data: Union[pandas.core.frame.DataFrame, numpy.array, tensorflow.python.framework.ops.Tensor], batch_size: Optional[int] = None) → tensorflow.python.data.ops.dataset_ops.DatasetV2

Build the production dataset.

Parameters

data (DataFrame of shape (input_width + shift, variables)): Data to forecast. The input steps must be included in data.

Returns

data (PrefetchDataset of shape ((inputs, known, date_inputs, date_labels), labels)): Dataset which yields data with shape ((inputs, known, date_inputs, date_labels), labels).

Raises

AssertionError: Raised if any of the columns defined in from_array is missing from data.
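
The requirements on production data can be checked before calling the method. The sketch below is a hypothetical pre-check, not part of autopycoin: the helper check_production_data and its arguments are assumptions. It reproduces the documented column condition with an AssertionError and additionally assumes that the data must have exactly input_width + shift rows, as the documented shape suggests.

```python
def check_production_data(columns, n_rows, required_columns, input_width, shift):
    """Validate column names and row count of a production data set."""
    # Documented condition: every configured column must be present.
    missing = [name for name in required_columns if name not in columns]
    assert not missing, f"Missing columns: {missing}"
    # Assumption: the row count must match the documented shape exactly.
    expected_rows = input_width + shift
    assert n_rows == expected_rows, (
        f"Expected {expected_rows} rows, got {n_rows}"
    )
    return True


# With input_width=3 and shift=10, as in the example above, 13 rows are needed.
is_valid = check_production_data(
    columns=["value", "temperature"],
    n_rows=13,
    required_columns=["value"],
    input_width=3,
    shift=10,
)
```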

property sequence_stride: Return the attribute sequence_stride.

property shift: int: Return the shift.

property test: Optional[tensorflow.python.data.ops.dataset_ops.DatasetV2]

Build the test dataset.

Returns

dataset (Dataset)

property test_size: int: Return the test_size.

property train: tensorflow.python.data.ops.dataset_ops.DatasetV2

Return the train dataset.

Returns

dataset (Dataset): Train dataset. It cannot be empty.

property valid: tensorflow.python.data.ops.dataset_ops.DatasetV2

Return the valid dataset.

Returns

dataset (Dataset)

property valid_size: int: Return the valid_size.
