Working with multivariate time series forecasting
With Qlik Predict, you can train machine learning models to forecast time-specific metrics. Using neural network-based methods, models learn and predict complex patterns involving time-specific associations, grouped target data, historical features, and known future variables. To create a time series forecast, prepare a training dataset, use it in a time series experiment, deploy a model, and then create apply datasets that you can use to generate predictions.
Components of a time series problem
With time series forecasting, the goal is to predict target values for specific dates into the future. For example, you might want to predict sales for the next week, month, or quarter.
When developing your time series problem, define the following components:
-
Target and groups
-
Date index
-
Forecast horizon
-
Covariates
Simplified illustration outlining the components of a time series forecasting problem in Qlik Predict.

Target
As with other experiment types, the target is the column for which you want the model to predict future values. For time series experiments, the target needs to contain numeric data—for example, sales or inventory.
If you are using groups in the time series forecast, models will predict one target value per group per time step in the forecast window. If you are not using groups, your trained models will predict one target value for each time step in the forecast window.
Date index
The date index tracks the time series metrics over a consistent time interval (the time step). Decide on your time step early: how often do you need to predict future values?
Specifically, the date index is a column that appears in both your training and apply datasets for time series problems. The date index determines the structure of both datasets: each row represents a step in time (or, with groups, a step in time for each unique grouping).
When you add your training dataset in a time series experiment, possible date index columns are automatically identified and presented to you as Insights at the column level. You can identify them from the Possible date index insight in schema view.
Groups
Groups are features containing categorical information for which you want to generate separate predictions. Classic examples of groups are store number and product type, which could be used to organize data for a target such as sales. By selecting store number and product type as groups, your time series models will provide predictions for each individual value across these columns. For example, with a target of sales, if you have three store numbers (1, 2, and 3) and two product types (grocery and produce), your model will generate sales predictions for each of the six unique combinations of these values.
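As a quick check of the example above, the number of separate forecasts is the product of the distinct group values, and the model returns one value per combination per time step. A minimal Python sketch; the forecast window of 14 daily steps is an assumption borrowed from the forecast window example later on this page:

```python
from itertools import product

# Group values from the example above (illustrative data).
store_numbers = [1, 2, 3]
product_types = ["grocery", "produce"]

# Every (store, product type) pair is forecast separately.
combinations = list(product(store_numbers, product_types))

# Assumed forecast window: 14 daily time steps.
forecast_window = 14

# One predicted target value per combination per time step.
predictions_expected = len(combinations) * forecast_window

print(len(combinations))     # 6
print(predictions_expected)  # 84
```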
You should incorporate groups into your time series problem if you have the data and need individual predictions by category. Another advantage of groups is that models can learn globally, better understanding the patterns that exist between the different groupings you define.
You can configure the groups to use for each experiment version. If you do not specify groups but groups are identified in your training dataset, the training will use groups.
Groups are identified by duplicate values in the date index column—for example, for a date of 1/14/2025, you have two records: one for store A, and the other for store B.
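The duplicate-date rule can be illustrated with a small check. This is a sketch of the idea only, not Qlik's detection logic; the function name `has_groups` is hypothetical:

```python
from collections import Counter
from datetime import date


def has_groups(date_index):
    """Hypothetical check: True if any date appears more than once in the date index."""
    counts = Counter(date_index)
    return any(n > 1 for n in counts.values())


# Two records per date, for example one for store A and one for store B.
dates = [date(2025, 1, 14), date(2025, 1, 14), date(2025, 1, 15), date(2025, 1, 15)]
print(has_groups(dates))  # True
```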
Each group in a time series experiment (or the target alone, when no groups are used) is considered a separate time series within your dataset. See What is a time series?.
Primary and secondary groups
For time series problems with two group columns, one group is the primary group and the other is the secondary group. For examples, see Preparing a training dataset, Training dataset example — two groups, and Apply dataset example — two groups.
The primary grouping defines independent time series. For example, each store becomes its own separate time series, allowing the model to learn different behaviors and patterns across stores.
The secondary grouping is treated differently. Instead of creating fully separate series, the system pivots those values into additional features (covariates), allowing related sub-series to provide contextual information to each other.
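To picture what pivoting secondary values into features means, the sketch below reshapes long-format rows so that each (primary group, date) pair carries one column per secondary value. This is a conceptual illustration under assumed column names (`date`, `store`, `product`, `sales`); it does not reproduce Qlik's internal representation, which is not documented here:

```python
from collections import defaultdict


def pivot_secondary(rows, primary, secondary, target):
    """Illustration only: one wide record per (primary value, date).

    Each secondary value becomes its own column, for example "sales_grocery".
    """
    wide = defaultdict(dict)
    for row in rows:
        key = (row[primary], row["date"])
        wide[key][f"{target}_{row[secondary]}"] = row[target]
    return dict(wide)


# Hypothetical long-format data: stores are primary, products are secondary.
rows = [
    {"date": "2025-01-14", "store": "A", "product": "grocery", "sales": 120},
    {"date": "2025-01-14", "store": "A", "product": "produce", "sales": 80},
    {"date": "2025-01-14", "store": "B", "product": "grocery", "sales": 95},
    {"date": "2025-01-14", "store": "B", "product": "produce", "sales": 60},
]
wide = pivot_secondary(rows, "store", "product", "sales")
print(wide[("A", "2025-01-14")])  # {'sales_grocery': 120, 'sales_produce': 80}
```

Store A's series now sees its produce sales as extra context next to its grocery sales, which is the kind of shared information the secondary grouping is meant to provide.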
You generally do not need to know which group is the primary and which is the secondary, although you can optionally choose a primary group in your experiment. Certain considerations apply for predicting with deployed time series models—see Preparing an apply dataset.
All primary groups should share the same secondary groups. For example, if stores are primary groups and products are secondary groups, each store should contain the same set of products. Missing secondary groups would lead to inconsistent feature dimensions across time series.
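Before training, you can check that every primary group has the same set of secondary values. A minimal sketch, assuming rows are dictionaries with hypothetical `store` (primary) and `product` (secondary) columns:

```python
from collections import defaultdict


def secondary_sets_consistent(rows, primary, secondary):
    """Return (ok, sets): ok is True if every primary value has the same secondary values."""
    sets = defaultdict(set)
    for row in rows:
        sets[row[primary]].add(row[secondary])
    distinct = {frozenset(values) for values in sets.values()}
    return len(distinct) <= 1, dict(sets)


rows = [
    {"store": "A", "product": "grocery"},
    {"store": "A", "product": "produce"},
    {"store": "B", "product": "grocery"},  # store B has no "produce" rows
]
ok, sets = secondary_sets_consistent(rows, "store", "product")
print(ok)  # False
```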
If you expect secondary group values to not align with primary group values at prediction time, one possible approach is to combine the original primary and secondary group values into a new grouping column and use it as the primary group when retraining the model. In this setup, the model no longer depends on secondary groups, but correlated information between groups might be reduced.
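Combining the two group columns is a simple data preparation step. In this sketch, the column names and the `|` separator are assumptions, and dropping the original columns is one choice among several (you could keep them if they are useful as features):

```python
def combine_groups(rows, primary, secondary, new_column="store_product", sep="|"):
    """Return copies of rows with one combined grouping column.

    The combined column can then be used as the only (primary) group when retraining.
    This sketch drops the original primary and secondary columns.
    """
    combined = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k not in (primary, secondary)}
        new_row[new_column] = f"{row[primary]}{sep}{row[secondary]}"
        combined.append(new_row)
    return combined


rows = [{"date": "2025-01-14", "store": "A", "product": "grocery", "sales": 120}]
print(combine_groups(rows, "store", "product"))
# [{'date': '2025-01-14', 'sales': 120, 'store_product': 'A|grocery'}]
```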
Forecast horizon
The forecast horizon specifies how far into the future you want to forecast. The forecast horizon is composed of the forecast window (the number of time steps for which you need predictions) and forecast gap (an optional number of time steps after your historical data for which you do not want predictions).
You set the forecast window and gap size when configuring an experiment version. These values are used both during model training and when generating predictions from models deployed as ML deployments.
The forecast window is the number of time steps for which you want to predict into the future. For example, if your time step is one day and you want to forecast sales for the next two weeks, you would set your forecast window to 14.
The forecast gap is the number of time steps in the future for which you do not require predictions. Setting a forecast gap is optional. The forecast gap starts at the end of the historical training data you have provided, and the forecast window begins where the forecast gap ends.
For example, you might be looking to predict future sales, but you are only interested in future sales for dates later than one week after the end of your input data. In this case, with a time step of days, you could set your forecast gap size to seven time steps.
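The relationship between the last observed date, the gap, and the window is plain date arithmetic. A minimal sketch, assuming a fixed-length time step expressed as a `timedelta` (calendar steps such as months need different arithmetic) and a hypothetical last observed date:

```python
from datetime import date, timedelta


def forecast_dates(last_observed, step, gap, window):
    """Return (dates skipped by the gap, dates covered by the forecast window)."""
    gap_dates = [last_observed + step * i for i in range(1, gap + 1)]
    window_dates = [last_observed + step * i for i in range(gap + 1, gap + window + 1)]
    return gap_dates, window_dates


# Daily time step, a gap of 7 steps, and a window of 14 steps.
gap_dates, window_dates = forecast_dates(date(2025, 3, 31), timedelta(days=1), gap=7, window=14)
print(gap_dates[0], gap_dates[-1])        # 2025-04-01 2025-04-07
print(window_dates[0], window_dates[-1])  # 2025-04-08 2025-04-21
```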
Your selected forecast window, in addition to how much training data you have, limits how far into the future you can forecast. For more information, see Maximum forecast window.
Covariates
In time series problems, features are often called covariates. Similar to other machine learning problems, covariates are the other variables that you suspect have an influence on the outcome of the target. Each covariate is represented as a single column in your training dataset.
In time series forecasting, there are several types of covariates and they have some important distinctions:
-
Static covariates: Columns that do not vary over the course of a time series. Static covariates are applicable in time series experiments where groups are being used. For example, suppose you have groups for Product and Store Number, and there is a feature Default Discount. If Product A in Store 1 has a default discount of 10% and Product B in Store 2 has a default discount of 20%, Default Discount would be a static covariate. That is, it does not vary within the data for the group within which it appears.
Static covariates are detected automatically from historical features you include in the experiment. You do not need to indicate which features are static covariates.
-
Past covariates: Time-dependent variables that are available only in the historical data, and which vary across this data. Past covariates are detected automatically from historical features you include in the experiment. You do not need to explicitly indicate which features are past covariates.
-
Future covariates: Future covariates, also known as future features, are time-dependent variables for which you will know the future values within the forecast horizon. When using future covariates in training, you need to indicate them as future features in the training configuration.
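Qlik detects static and past covariates for you, so you never have to label them. Still, the distinction can be made concrete with a simple heuristic. The sketch below assumes (this is not Qlik's documented algorithm) that a feature is static when groups are used and its value never changes within any group, past otherwise, and future only when you flag it:

```python
from collections import defaultdict


def classify_covariates(rows, groups, features, future_features):
    """Heuristic sketch: label each feature 'future', 'static', or 'past'."""
    labels = {}
    for feature in features:
        if feature in future_features:  # future features are set in the configuration
            labels[feature] = "future"
            continue
        values_per_group = defaultdict(set)
        for row in rows:
            key = tuple(row[g] for g in groups)
            values_per_group[key].add(row[feature])
        # Static covariates only apply when groups are used.
        static = bool(groups) and all(len(v) == 1 for v in values_per_group.values())
        labels[feature] = "static" if static else "past"
    return labels


# Hypothetical data based on the Default Discount example above.
rows = [
    {"product": "A", "store": 1, "default_discount": 0.10, "price": 2.5, "promo": 0},
    {"product": "A", "store": 1, "default_discount": 0.10, "price": 2.7, "promo": 1},
    {"product": "B", "store": 2, "default_discount": 0.20, "price": 4.0, "promo": 0},
    {"product": "B", "store": 2, "default_discount": 0.20, "price": 4.1, "promo": 1},
]
labels = classify_covariates(rows, ["product", "store"], ["default_discount", "price", "promo"], {"promo"})
print(labels)  # {'default_discount': 'static', 'price': 'past', 'promo': 'future'}
```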
Future features
With future features, you can provide additional data to your models about future information you already know or can reasonably expect. In particular, you have access to future values for this feature spanning your selected forecast horizon. When defining future features, you need to provide historical as well as future data.
For example, for a model predicting metrics that could be influenced by future discounts offered by a store, you could include the historically observed discounts, as well as the discounts for future time periods within the forecast window. Other examples of future features could be weather or calendar information.
Other important concepts
This section outlines concepts that are relevant to your time series problem, but that you do not configure directly in an experiment or ML deployment. These are properties that are defined by your data or by other properties you configure for the model.
Time steps
The time step is defined by your training dataset and is important for both training and predictions.
In your training dataset, the time step is the interval at which the data in your date index is recorded. For example, the time step can be daily, hourly, every minute, or every second. The smallest time step that can be detected is one millisecond.
It is important to be aware of the time step used in your training data. Other experiment parameters you define, such as forecast window and forecast gap size, will follow this time step interval.
After deploying your model, the apply data for which you want to create predictions will need to follow the same time step as defined in the training dataset.
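One way to estimate the time step of a dataset is to take the most common interval between consecutive distinct timestamps. Duplicate timestamps from groups collapse into one, and an occasional missing record does not change the result. This is an illustrative sketch, not necessarily how Qlik infers the time step:

```python
from collections import Counter
from datetime import datetime


def infer_time_step(timestamps):
    """Return the most common interval between consecutive distinct timestamps."""
    unique_sorted = sorted(set(timestamps))
    if len(unique_sorted) < 2:
        raise ValueError("need at least two distinct timestamps")
    diffs = Counter(b - a for a, b in zip(unique_sorted, unique_sorted[1:]))
    step, _ = diffs.most_common(1)[0]
    return step


hourly = [datetime(2025, 1, 1, h) for h in (0, 1, 2, 4, 5)]  # 03:00 is missing
print(infer_time_step(hourly))  # 1:00:00
```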
Quality
When you select a training dataset, the system infers the time step used. If there are some missing values or gaps in the date index, columns such as target, groups, and covariates can be interpolated automatically by the system. However, if your data contains time intervals that are inconsistent to the point where different time steps are detected, the data must be fixed first. For example, if you have several months of data recorded once daily, but there is a section in which data is consistently recorded on a weekly basis, the dataset cannot be used because multiple time steps will be detected.
Apply window
The apply window, or look-back period, is the portion of the training data that the algorithm can use to provide the predictions for your specified forecast window.
The apply window is calculated and set by the system. It is measured in time steps. The apply window is defined by what you set as the forecast window and gap (forecast horizon). Your apply window size is shown in the experiment configuration panel and the Model training summary, after running at least one experiment version. It is also shown in an ML deployment Model schema when creating or editing a batch prediction configuration.
The apply window is identified automatically from your training configuration. To generate predictions for a given forecast window, you need to provide the historical data covering at least your apply window. This is provided in your apply dataset. See Preparing an apply dataset.
Maximum forecast window
The maximum forecast window is estimated as you configure your time series experiment. After you have run a version of the training, the maximum forecast window is confirmed with certainty. The maximum forecast window is displayed to you as the Estimated maximum forecast or Maximum forecast under Based on your data, when you open Target and experiment type in the experiment configuration panel. The maximum forecast window is the maximum number of time steps for which you can generate forecasts, given your chosen forecast window, how much historical data you have provided, and the minimum sample size expected by the system. The more historical data you provide, the further in time you will be able to predict. However, to generate reliable predictions, it is important to select a reasonable forecast window.
The maximum forecast window can be up to 180 time steps.
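When you script experiment settings or review them before training, a small validation step can catch impossible values early. The `ForecastHorizon` class below is hypothetical; only the 180-step limit comes from this page, and `estimated_max` stands for the value shown under Based on your data:

```python
from dataclasses import dataclass

MAX_FORECAST_STEPS = 180  # upper limit stated in this documentation


@dataclass
class ForecastHorizon:
    """Hypothetical container for forecast horizon settings, in time steps."""

    window: int   # number of time steps to predict
    gap: int = 0  # optional time steps to skip after the last observed date

    def validate(self, estimated_max=MAX_FORECAST_STEPS):
        if self.window < 1:
            raise ValueError("forecast window must be at least one time step")
        if self.gap < 0:
            raise ValueError("forecast gap cannot be negative")
        limit = min(estimated_max, MAX_FORECAST_STEPS)
        if self.window > limit:
            raise ValueError(f"forecast window {self.window} exceeds the maximum of {limit}")


ForecastHorizon(window=14, gap=7).validate()  # passes silently
```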
Forecast cut-off time
The forecast cut-off time is especially important when defining your apply dataset during predictions. The forecast cut-off time is the last date in your sample for which you have a target value. Essentially, dates after this cut-off time are the dates for which you want to generate predictions.
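Given this definition, the cut-off time can be found by taking the latest date that still has a target value. A minimal sketch with hypothetical column names, assuming ISO-formatted date strings (which sort chronologically) and `None` for empty target cells:

```python
def forecast_cutoff(rows, date_col="date", target_col="sales"):
    """Return the last date that has a target value, or None if there is none."""
    dates_with_target = [r[date_col] for r in rows if r[target_col] is not None]
    return max(dates_with_target) if dates_with_target else None


apply_rows = [
    {"date": "2025-03-30", "sales": 140},
    {"date": "2025-03-31", "sales": 152},
    {"date": "2025-04-01", "sales": None},  # future row: prediction requested
]
print(forecast_cutoff(apply_rows))  # 2025-03-31
```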
What is a time series?
In Qlik Predict time series forecasting, each group (or the target alone, when no groups are used) is considered a separate time series within the training dataset. For example, suppose your training dataset contains sales metrics. These sales metrics are defined for each store and product type. With Store and Product Type columns defined as groups, there are three time series in the training dataset.
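The idea that each unique combination of group values forms its own series can be sketched by splitting rows on their group values. The data below is hypothetical, chosen so that three combinations (and therefore three series) appear:

```python
from collections import defaultdict


def split_into_series(rows, groups, date_col="date", target_col="sales"):
    """Map each unique combination of group values to its own date-ordered series."""
    series = defaultdict(list)
    for row in rows:
        key = tuple(row[g] for g in groups)  # () when no groups are used
        series[key].append((row[date_col], row[target_col]))
    return {k: sorted(v) for k, v in series.items()}


rows = [
    {"date": "2025-01-15", "store": "A", "product_type": "grocery", "sales": 130},
    {"date": "2025-01-14", "store": "A", "product_type": "grocery", "sales": 120},
    {"date": "2025-01-14", "store": "A", "product_type": "produce", "sales": 80},
    {"date": "2025-01-14", "store": "B", "product_type": "grocery", "sales": 95},
]
series = split_into_series(rows, ["store", "product_type"])
print(len(series))  # 3
```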
Preparing a training dataset
For multivariate time series forecasts, your training dataset needs to contain the following columns:
-
Date index
-
Target column
-
Group columns (optional)
-
Feature columns (optional—without features, you are training a univariate forecasting model)
Illustrations showing the required columns and data for time series training datasets. Scenarios with no groups, one group, and two groups are described.

Linear diagram outlining the needed components, and timeline, of a training dataset for a time series forecasting model.

Date index column
You need a date index containing full dates or time stamps. This column is the chronological index along which the target and covariate metrics are tracked. The date index column organizes the time-based measurements sequentially along a consistent time interval (the time step).
The date index column is organized as follows, depending on whether or not you are using groups:
-
No groups: A single record for each time step. For example, with a daily forecast, each row represents a single day.
-
With groups: One or more duplicate entries for each time step depending on the groups used.
There is flexibility in the time step you use. For example, you could record dates one or more times on a daily, weekly, or monthly basis.
Missing or inconsistently recorded values in this column are sometimes acceptable, if they can be interpolated. However, your date index values cannot contain multiple different time steps. For example, if the interval is determined to be once daily, but at some point, an interval of twice daily is identified, an error will occur during training.
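The difference between missing records and a second time step can be illustrated with a simple rule. This sketch assumes (Qlik's exact rules are not documented here) that an interval that is a whole multiple of the time step means records are missing and could be interpolated, while any other interval means a different time step is present, as in the twice-daily example:

```python
from datetime import datetime, timedelta


def check_date_index(timestamps, step):
    """Return (missing, conflicting) pairs of consecutive distinct timestamps."""
    unique_sorted = sorted(set(timestamps))
    missing, conflicting = [], []
    for earlier, later in zip(unique_sorted, unique_sorted[1:]):
        delta = later - earlier
        if delta % step:  # not a whole multiple of the step: another time step
            conflicting.append((earlier, later))
        elif delta > step:  # whole multiple: records are missing in between
            missing.append((earlier, later))
    return missing, conflicting


daily = [datetime(2025, 1, d) for d in (1, 2, 4, 5, 6)]  # January 3 is missing
twice_daily = [datetime(2025, 1, 6, 12), datetime(2025, 1, 7), datetime(2025, 1, 7, 12)]
missing, conflicting = check_date_index(daily + twice_daily, timedelta(days=1))
print(len(missing), len(conflicting))  # 1 3
```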
Target column and group columns
Your dataset needs to have a target column containing a numeric metric that you want to forecast. A common example is sales.
If you are using groups, you provide historical target values for each possible value in groups that you add. For example, if your target is Sales and you add a group Store Number that contains data for Store A and Store B, your dataset needs to include two separate records for each time step: one with the sales value for Store A, and the other with the sales value for Store B.
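A quick completeness check finds duplicate records for the same date and group, and date and group pairs with no record at all. Missing pairs may still be interpolated (see Quality), but finding them early helps. A minimal sketch with hypothetical column names:

```python
from collections import Counter


def check_group_records(rows, groups, date_col="date"):
    """Return (duplicate keys, missing keys), where a key is (date, *group values)."""
    keys = [(r[date_col],) + tuple(r[g] for g in groups) for r in rows]
    counts = Counter(keys)
    duplicates = sorted(k for k, n in counts.items() if n > 1)
    dates = sorted({k[0] for k in keys})
    group_values = sorted({k[1:] for k in keys})
    missing = [(d,) + g for d in dates for g in group_values if (d,) + g not in counts]
    return duplicates, missing


rows = [
    {"date": "2025-01-14", "store": "A", "sales": 100},
    {"date": "2025-01-14", "store": "B", "sales": 90},
    {"date": "2025-01-15", "store": "A", "sales": 110},  # no Store B record on this date
]
duplicates, missing = check_group_records(rows, ["store"])
print(duplicates, missing)  # [] [('2025-01-15', 'B')]
```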
Feature columns
You can train a time series model without any covariates. However, if you include covariates, provide a column in the dataset for each feature. Feature data should generally be historically recorded, unless you are adding future features. Future feature columns can contain both historical and future data. Only include future feature data in the training dataset if you are confident that the future values of these columns will be known when you create predictions.
Keep track of which features you will use as future features, as you will need to select them as such in the training configuration.
Data volume
Your dataset needs to contain enough records—data volume is determined by the time range shared across all groups. Only the data from this overlapping period is used to train the experiment.
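The shared time range is the intersection of each group's date range: the latest start and the earliest end. A minimal sketch with hypothetical group names and dates:

```python
from datetime import date


def shared_date_range(series_dates):
    """Return (start, end) of the period covered by every group, or None if there is no overlap."""
    start = max(min(dates) for dates in series_dates.values())
    end = min(max(dates) for dates in series_dates.values())
    return (start, end) if start <= end else None


series_dates = {
    "Store A": [date(2024, 1, 1), date(2024, 12, 31)],
    "Store B": [date(2024, 3, 1), date(2025, 2, 28)],
}
print(shared_date_range(series_dates))
# (datetime.date(2024, 3, 1), datetime.date(2024, 12, 31))
```

In this example, only March to December 2024 would count toward the data volume, even though each store has twelve months of data.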
The volume of your historical data plays a part in determining how far into the future you can predict. Your desired forecast window also affects how much historical data you need.
Generally, more historical data is better than less. However, the data needs to be of good quality and capture the desired trends. If the data provides irrelevant information or contains inaccuracies, it is not helpful to have it in the model. Consider a balance between optimizing volume and maintaining quality and relevance.
Examples
Preparing an apply dataset
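After you deploy a time series model, you need to create an apply dataset for which predictions will be made. The apply dataset needs to contain the following:
-
All columns included in the training dataset, with the same column headers.
-
The same time step as the training dataset.
-
All groups and group values that were present in the training dataset.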
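Note: If the apply dataset contains new group values that were not in the training data, no predictions are generated for those rows. If you need predictions for these new group values, retrain the model with training data that includes them.
Note: Group values that are missing at prediction time or from the apply dataset are handled as follows:
-
Missing primary group values (values the model was trained on) are allowed.
-
Missing secondary group values are not allowed. The prediction fails with an error.
Where possible, collect and provide all secondary group data at prediction time. However, if you mainly expect secondary groups to be missing at prediction time, one possible solution is to not use secondary groups at all.
Instead, you can combine the original primary and secondary group values into a single new grouping column, use it as the new primary group, and retrain the model on that structure. In this setup, the new model depends only on the newly introduced primary group.
The drawback is that you might lose some of the correlated information between groups, because the groups are now treated as completely separate time series rather than as related sub-series that provide contextual information to each other.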
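-
Historical records (per target and group) before the forecast cut-off time, at least as many as there are time steps in the model's apply window. These must be complete records containing the historically observed dates or timestamps, target values, and covariate values. The apply window is determined by the forecast window and gap configured during training. In other words, the further into the future you need to forecast, the more historical data your apply dataset needs in order to run predictions.
-
Records for every future time step in the forecast horizon. For these future records, include values only for the date index column and any future features. Leave the values in other columns empty.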
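The last two requirements can be sketched as a helper that appends one mostly empty row per future time step to the historical records. Everything here is an assumption for illustration: the helper name, the `date` column, a dataset without groups, a fixed-length step, and `apply_window` standing for the size shown in the Model training summary:

```python
from datetime import date, timedelta


def build_apply_rows(history, step, horizon, future_values, apply_window):
    """Hypothetical helper: history rows followed by one row per future time step.

    history: complete past records (date, target, covariates), oldest first.
    horizon: forecast gap + forecast window, in time steps.
    future_values: one dict of known future-feature values per future step.
    """
    if len(history) < apply_window:
        raise ValueError("history must cover at least the apply window")
    if len(future_values) != horizon:
        raise ValueError("need future-feature values for every step in the horizon")
    rows = list(history)
    last_date = history[-1]["date"]  # forecast cut-off: last date with a target
    empty = {column: None for column in history[-1]}  # same columns, all empty
    for i, known in enumerate(future_values, start=1):
        row = dict(empty)
        row["date"] = last_date + step * i
        row.update(known)  # only future features get values
        rows.append(row)
    return rows


history = [
    {"date": date(2025, 3, 30), "sales": 140, "discount": 0.0},
    {"date": date(2025, 3, 31), "sales": 152, "discount": 0.1},
]
future = [{"discount": 0.0}, {"discount": 0.2}, {"discount": 0.0}]
rows = build_apply_rows(history, timedelta(days=1), horizon=3, future_values=future, apply_window=2)
print(rows[-1])  # {'date': datetime.date(2025, 4, 3), 'sales': None, 'discount': 0.0}
```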
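Illustrations showing the required columns and data for apply datasets used to generate predictions from time series forecasting models. Scenarios with no groups, one group, and two groups are described.

Linear diagram outlining the needed components, and timeline, of an apply dataset used to generate predictions with a time series forecasting model.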
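The group-value rules in the notes above can be checked before you run predictions. This sketch mirrors those rules for a two-group model; it is not Qlik's validation code, and representing groups as (primary, secondary) tuples is an assumption:

```python
def check_apply_groups(train_pairs, apply_pairs):
    """Compare (primary, secondary) group pairs between training and apply data.

    Returns:
      unseen: pairs absent from the training data (those rows get no predictions).
      missing: for each trained primary value present in the apply data, the trained
        secondary values it lacks (predictions would fail).
    Primary values absent from the apply data are not reported, because that is allowed.
    """
    train = set(train_pairs)
    apply = set(apply_pairs)
    unseen = sorted(apply - train)
    trained_primaries = {p for p, _ in train}
    trained_secondaries = {s for _, s in train}
    missing = {}
    for primary in sorted({p for p, _ in apply} & trained_primaries):
        present = {s for p, s in apply if p == primary}
        absent = trained_secondaries - present
        if absent:
            missing[primary] = sorted(absent)
    return unseen, missing


train = [("A", "grocery"), ("A", "produce"), ("B", "grocery"), ("B", "produce")]
apply = [("A", "grocery"), ("A", "produce"), ("B", "grocery"), ("C", "grocery")]
unseen, missing = check_apply_groups(train, apply)
print(unseen)   # [('C', 'grocery')]
print(missing)  # {'B': ['produce']}
```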
