autopycoin.dataset.generator.WindowGenerator
- class autopycoin.dataset.generator.WindowGenerator(input_width: int, label_width: int, shift: Union[None, int] = None, valid_size: Union[int, float] = 0, test_size: Union[int, float] = 0, flat: bool = False, sequence_stride: int = 1, batch_size: int = None, preprocessing: Union[None, Callable] = None)
Transform a time series into a format usable by a TensorFlow model.
The time series can be a pandas DataFrame, a TensorFlow tensor or a NumPy array.
- Parameters
- input_width
int The number of historical time steps to use during the forecasting.
- label_width
int The number of time steps to forecast.
- shift
int The shift between the input time steps (input_width) and the label time steps (label_width). If label_width is greater than shift, the input and label datasets will share some identical values.
- valid_size
int or float The number of examples in the validation set. Use a float between 0 and 1 to specify a proportion instead.
- test_size
int or float The number of examples in the test set. Use a float between 0 and 1 to specify a proportion instead.
- flat
bool Flatten the inputs and labels tensors.
- batch_size
int The number of examples per batch. If None, all examples are stacked in one batch. Defaults to None.
- preprocessing
callable or None Preprocessing function to apply to the data. This function must accept input of shape ((inputs, …), labels). It is applied after the train, validation and test split. Defaults to None.
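To make the relationship between input_width, label_width and shift concrete, here is a minimal pure-Python sketch. It is not the library's implementation: `make_windows` is a hypothetical helper, and it assumes the common convention in which each window spans input_width + shift steps, the inputs are its first input_width steps and the labels are its last label_width steps.

```python
def make_windows(series, input_width, label_width, shift, sequence_stride=1):
    """Slice a sequence into (inputs, labels) pairs.

    Assumed convention (hypothetical, for illustration only): each window
    spans input_width + shift steps; the inputs are its first input_width
    steps and the labels are its last label_width steps.
    """
    total = input_width + shift
    pairs = []
    for start in range(0, len(series) - total + 1, sequence_stride):
        window = series[start:start + total]
        pairs.append((window[:input_width], window[total - label_width:]))
    return pairs


series = list(range(10))

# shift >= label_width: inputs and labels do not overlap.
print(make_windows(series, input_width=3, label_width=2, shift=2)[0])
# ([0, 1, 2], [3, 4])

# label_width > shift: the labels reach back into the input steps.
print(make_windows(series, input_width=3, label_width=2, shift=1)[0])
# ([0, 1, 2], [2, 3])
```

In the second call the value 2 appears in both the inputs and the labels, which is the overlap described for the shift parameter.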
Notes
The dataset’s shape depends on the columns defined in the from_array method. There are currently four input tensors which can be added inside the inputs dataset.
Output shape when all column components are defined: tuple of shape ((inputs, known, date_inputs, date_labels), labels)
- inputs tensor:
The input tensor of shape (batch_size, input_width, input_columns), or (batch_size, input_width * input_columns) if flat is set to True. These are the historical values.
- known tensor:
The known tensor of shape (batch_size, input_width, known_columns), or (batch_size, input_width * known_columns) if flat is set to True. It holds the variables whose values are known in advance or estimated, for example dates or temperatures.
- date_inputs tensor:
Dates of shape (batch_size, input_width) associated with the inputs tensor. Defaults to a tensor generated by tf.range.
- date_labels tensor:
Dates of shape (batch_size, label_width) associated with the labels tensor. Defaults to a tensor generated by tf.range.
- labels tensor:
The output variables of shape (batch_size, label_width, label_columns), or (batch_size, label_width * label_columns) if flat is set to True. They are the values to predict.
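The effect of flat on the shapes above can be illustrated with plain nested lists. This is only a sketch: `flatten_window` is a hypothetical helper, and the row-major ordering (all columns of step 0, then step 1, and so on) is an assumption, not something the documentation states.

```python
def flatten_window(window):
    """Flatten one (width, columns) window, given as a list of rows."""
    return [value for row in window for value in row]


# A batch of 2 windows with input_width=3 and 2 input columns.
batch = [
    [[1, 10], [2, 20], [3, 30]],
    [[2, 20], [3, 30], [4, 40]],
]

# Shape without flattening: (batch_size, input_width, input_columns).
print(len(batch), len(batch[0]), len(batch[0][0]))  # 2 3 2

# Shape with flattening: (batch_size, input_width * input_columns).
flat_batch = [flatten_window(window) for window in batch]
print(len(flat_batch), len(flat_batch[0]))  # 2 6
print(flat_batch[0])  # [1, 10, 2, 20, 3, 30]
```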
Examples
>>> import pandas as pd
>>> from autopycoin.data import random_ts
>>> from autopycoin.dataset import WindowGenerator
...
>>> # We generate data
>>> data = random_ts(n_steps=100,
...                  trend_degree=2,
...                  periods=[10],
...                  fourier_orders=[10],
...                  trend_mean=0,
...                  trend_std=1,
...                  seasonality_mean=0,
...                  seasonality_std=1,
...                  batch_size=1,
...                  n_variables=1,
...                  noise=True,
...                  seed=42)
...
>>> w_oneshot = WindowGenerator(input_width=3,
...                             label_width=2,
...                             shift=10,
...                             valid_size=2,
...                             test_size=3,
...                             flat=True,
...                             batch_size=None,
...                             preprocessing=None)
...
>>> # Here just the inputs and labels tensors are generated
>>> w_oneshot = w_oneshot.from_array(data[0],
...                                  input_columns=[0],
...                                  label_columns=[0])
- Attributes
- input_width
int Return the input_width.
- label_width
int Return the label_width.
- shift
int Return the shift.
- valid_size
int Return the valid_size.
- test_size
int Return the test_size.
- flat
bool Return the attribute flat.
- batch_size
int or None Return the attribute batch_size.
- train
dataset Return the train dataset.
- valid
dataset Return the valid dataset.
- test
dataset Build the test dataset.
- data
DataFrame or ndarray or Tensor Return the original data.
- property batch_size
Return the attribute batch_size.
- property data: numpy.ndarray
Return the original data.
- property flat
Return the attribute flat.
- from_array(data: Union[pandas.core.frame.DataFrame, numpy.ndarray, tensorflow.python.framework.ops.Tensor, pandas.core.series.Series], input_columns: Union[None, List[Union[int, str]]] = None, label_columns: Union[None, List[Union[int, str]]] = None, known_columns: Union[None, List[Union[int, str]]] = None, date_columns: Union[None, List[Union[int, str]]] = None)
Feed WindowGenerator with a pandas DataFrame or a NumPy ndarray.
This method has to be called before using the train, valid or test properties, as it initializes the data.
- Parameters
- data
DataFrame, Series, list, ndarray or Tensor of shape (timesteps, variables) The time series data on which the train, valid and test datasets are built.
- input_columns
list[str or int] The input column names. Variables used to forecast target values.
- label_columns
list[str or int] The label column names. Target variables to forecast. Defaults to None.
- known_columns
list[str or int] The known column names. Variables whose exact or well-estimated values are available for the target period, for example dates or temperatures. Defaults to None.
- date_columns
list[str or int] The date column names. Dates associated with each step. Date columns are cast to string and joined with the ‘-’ delimiter to be used as xticks in the plot function. Defaults to None.
- Returns
- self
WindowGenerator Return the instance.
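The handling described for date_columns (each column cast to string, then joined with the ‘-’ delimiter) can be sketched as follows. `join_date_columns` is a hypothetical helper written for illustration; it is not part of autopycoin, and the year, month, day layout of the rows is assumed.

```python
def join_date_columns(rows):
    """Cast each date component to str and join the components with '-'."""
    return ["-".join(str(part) for part in row) for row in rows]


# Each row holds the values of the date columns for one time step
# (assumed here to be year, month and day columns).
dates = [(2021, 1, 5), (2021, 1, 6)]
print(join_date_columns(dates))  # ['2021-1-5', '2021-1-6']
```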
- production(data: Union[pandas.core.frame.DataFrame, numpy.array, tensorflow.python.framework.ops.Tensor], batch_size: Optional[int] = None) -> tensorflow.python.data.ops.dataset_ops.DatasetV2
Build the production dataset.
- Parameters
- data
DataFrame of shape (input_width + shift, variables) Data to forecast. The input steps must be included in data.
- Returns
- data
PrefetchDataset of shape ((inputs, known, date_inputs, date_labels), labels) MapDataset which returns data with shape ((inputs, known, date_inputs, date_labels), labels).
- Raises
- AssertionError
Raised if not all columns defined in the constructor method are inside data.
- property sequence_stride
Return the attribute sequence_stride.
- property test: Optional[tensorflow.python.data.ops.dataset_ops.DatasetV2]
Build the test dataset.
- Returns
- dataset
Dataset
- property train: tensorflow.python.data.ops.dataset_ops.DatasetV2
Return the train dataset.
- Returns
- dataset
Dataset Train dataset. It cannot be empty.
- property valid: tensorflow.python.data.ops.dataset_ops.DatasetV2
Return the valid dataset.
- Returns
- dataset
Dataset