Artificial Intelligence

MachineLearnAthon

Sales Forecasting

Context

Sales forecasts are indispensable for companies, especially in the manufacturing industry. They enable well-founded decisions and therefore often serve as the basis for planning processes along the supply chain: they help to plan production capacities, optimize stock levels and avoid supply bottlenecks. Because inaccurate forecasts lead to planning errors, forecast accuracy is of great importance. However, demand patterns can be very complex. They depend on various factors, such as the shape of the time series (e.g. trend, seasonal or sporadic demand), the company’s own marketing and the current purchasing power of customers.

In addition, more and more data is becoming available for creating forecasts. Machine learning algorithms can exploit such large volumes of data to generate precise forecasts.

Task description

The aim of this challenge is to calculate a sales forecast for a company using machine learning methods. The dataset documents the sales of a production company located in the Ruhr area in Germany. The training data cover the sales of five products from 1 December 2012 to 30 June 2018. The task is to create monthly sales forecasts for July, August and September 2018. To build and check your forecasts, please divide the provided training data into a training set, a validation set and a test set.
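One way to prepare the data for this task is sketched below: individual sales records are summed to monthly totals per product, and the most recent months are held out for validation and testing so that the split respects time order. This is a minimal sketch, not a prescribed method. The column names `Date`, `X1` and `SalesQuantity`, the helper names and the toy values are assumptions for illustration.

```python
import pandas as pd

# Toy rows in the layout of the data sample; values are made up for illustration.
raw = pd.DataFrame({
    "Date": ["01.12.2012", "06.12.2012", "15.01.2013",
             "03.02.2013", "20.02.2013", "11.03.2013"],
    "X1": ["P1"] * 6,
    "SalesQuantity": [100.0, 50.0, 120.0, 80.0, 40.0, 90.0],
})


def monthly_sales(df):
    """Sum SalesQuantity per product (X1) and calendar month."""
    out = df.copy()
    # Dates in the sample use the day.month.year format.
    out["Date"] = pd.to_datetime(out["Date"], format="%d.%m.%Y")
    out["Month"] = out["Date"].dt.to_period("M")
    return out.groupby(["X1", "Month"], as_index=False)["SalesQuantity"].sum()


def chronological_split(monthly, n_val=3, n_test=3):
    """Hold out the last n_test months for testing and the n_val months
    before them for validation; everything earlier is training data."""
    months = monthly["Month"].drop_duplicates().sort_values().tolist()
    if n_val + n_test >= len(months):
        raise ValueError("not enough months left for training")
    test_start = months[-n_test]
    val_start = months[-(n_test + n_val)]
    train = monthly[monthly["Month"] < val_start]
    val = monthly[(monthly["Month"] >= val_start) & (monthly["Month"] < test_start)]
    test = monthly[monthly["Month"] >= test_start]
    return train, val, test


monthly = monthly_sales(raw)
train, val, test = chronological_split(monthly, n_val=1, n_test=1)
print(train, val, test, sep="\n\n")
```

On the full dataset, holding out three months each for validation and testing would mirror the three-month forecast horizon, but other split sizes are equally possible.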

To improve your sales prediction, we suggest first applying clustering methods to group the time series.
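As an illustration of this suggestion, the sketch below groups time series by shape using k-means from scikit-learn. The choice of k-means, the z-normalisation step, the function name `cluster_series` and the toy numbers are all assumptions; other clustering methods and distance measures may suit the data better.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans


def cluster_series(wide, n_clusters=2, seed=0):
    """Cluster the rows of `wide` (one time series per row, one month per column)."""
    values = wide.to_numpy(dtype=float)
    # z-normalise each series so clusters reflect the shape, not the sales volume
    mean = values.mean(axis=1, keepdims=True)
    std = values.std(axis=1, keepdims=True)
    std[std == 0] = 1.0  # constant series: avoid division by zero
    scaled = (values - mean) / std
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = model.fit_predict(scaled)
    return pd.Series(labels, index=wide.index, name="cluster")


# Toy monthly series: two rising and two falling, at different volumes.
wide = pd.DataFrame(
    [[10, 20, 30, 40, 50, 60],
     [100, 210, 290, 405, 500, 610],
     [60, 50, 40, 30, 20, 10],
     [590, 500, 410, 300, 190, 100]],
    index=["MC1", "MC2", "MC3", "MC4"],
)
clusters = cluster_series(wide)
print(clusters)
```

The resulting cluster label could then be used as an additional feature, or a separate forecasting model could be fitted per cluster.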

Dataset

https://www.kaggle.com/competitions/sales-forecasting-new/data

Sample from the data

Date        X1  X2  X3  X4  X5   X6   X7   SalesQuantity  SalesUnit
01.12.2012  P1  S1  D1  O1  MD1  MG1  MC1         7992.0  PC
06.12.2012  P2  S1  D1  O2  MD3  MG2  MC2        79923.0  PC
06.12.2012  P3  S1  D1  O3  MD4  MG3  MC3       287724.0  M
02.12.2012  P2  S1  D1  O4  MD5  MG4  MC4        47954.0  PC
08.12.2012  P4  S1  D2  O5  MD3  MG5  MC5        47954.0  PC

Description

Variable       Type         Values                  Description
Date           Datetime     Dec 2012 to June 2018   Date/time of sale (shifted for privacy)
X7             Categorical  MC1–MC2032              Product identifier
X1 – X6        Categorical  X1: P1–P5               Hierarchical anonymized product features (see notes below)
                            X2: S1–S6
                            X3: D1–D13
                            X4: O1–O53
                            X5: MD1–MD5
                            X6: MG1–MG208
SalesQuantity  Numerical    376.0 to 500,000,000.0  Number of units sold (may be transformed)
SalesUnit      Categorical  PC (piece)              Unit of sale
                            M (meter)
                            M² (square meter)
                            KG (kg)

Notes on the hierarchical anonymized product features:
• X7 represents the exact product designation.
• Each value of X7 maps uniquely to one value of X6.
• Each value of X6 maps uniquely to one value of X5.
• X4 represents subcategories of X2.
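The hierarchy among the product features can be checked directly on the data. The following sketch tests whether each value of one column occurs with exactly one value of another column; the helper name `maps_uniquely` is hypothetical, and the rows are taken from the data sample shown earlier.

```python
import pandas as pd


def maps_uniquely(df, child, parent):
    """Return True if every value of `child` occurs with exactly one value of `parent`."""
    return bool((df.groupby(child)[parent].nunique() <= 1).all())


# X5, X6 and X7 values of the five sample rows
sample = pd.DataFrame({
    "X5": ["MD1", "MD3", "MD4", "MD5", "MD3"],
    "X6": ["MG1", "MG2", "MG3", "MG4", "MG5"],
    "X7": ["MC1", "MC2", "MC3", "MC4", "MC5"],
})

print(maps_uniquely(sample, "X7", "X6"))  # each X7 value belongs to one X6 value
print(maps_uniquely(sample, "X6", "X5"))  # each X6 value belongs to one X5 value
print(maps_uniquely(sample, "X5", "X6"))  # the reverse direction need not hold
```

Running the same check on the full dataset is a quick way to confirm the documented hierarchy before using these columns as grouping features.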

Evaluation method

Goal: Precisely forecast the future product sales for July, August and September 2018.
Metric: The forecasts will be evaluated using the root mean square error (RMSE). This metric is particularly suited to sales forecasting because it penalizes large deviations from the actual value more strongly than, for example, absolute error measures. The RMSE will be calculated based on your submission.

Please additionally report the RMSE on your training data.

    \[\mathrm{RMSE} := \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}\]

where yᵢ is the actual value, ŷᵢ is the predicted value and N is the number of forecast values.
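For reporting the RMSE on your own data splits, the formula can be computed in a few lines. This is a minimal sketch using NumPy; the function name `rmse` and the example numbers are made up for illustration.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean square error between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 30.0, 44.0]
print(rmse(actual, predicted))
```

In this example the squared errors are 4, 4, 0 and 16, so the single deviation of 4 units contributes two thirds of the total squared error, which shows how the metric weights large deviations.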

Submission format: Please submit a CSV file containing the following variables in this exact order:
Date, X1, Forecast, SalesUnit
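A submission with this column order could be produced as sketched below. The helper name `to_submission_csv`, the forecast numbers and the date format in the `Date` column are assumptions; the challenge text does not specify the date format, so check the competition page before submitting.

```python
import pandas as pd

# Required columns in the required order
SUBMISSION_COLUMNS = ["Date", "X1", "Forecast", "SalesUnit"]


def to_submission_csv(forecasts):
    """Return CSV text with the required columns in the required order."""
    missing = [c for c in SUBMISSION_COLUMNS if c not in forecasts.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return forecasts[SUBMISSION_COLUMNS].to_csv(index=False)


forecasts = pd.DataFrame({
    "X1": ["P1", "P1", "P1"],
    "SalesUnit": ["PC", "PC", "PC"],
    "Forecast": [1000.0, 1100.0, 900.0],        # placeholder numbers, not real predictions
    "Date": ["2018-07", "2018-08", "2018-09"],  # date format is an assumption
})

csv_text = to_submission_csv(forecasts)
print(csv_text)
```

Selecting the columns by an explicit list fixes the order regardless of how the forecast table was assembled.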

Tutorials and related study material

Micro lectures: The Micro Lectures convey the most important information on various topics in a compact format. Each Micro Lecture focuses on a specific topic:

  • Installing Python
  • Introduction to ML
  • Data Visualization
  • Preparation
  • Clustering
  • Feature Extraction
  • Time Series Forecasting
  • Forecasting Methods
  • Hyperparameter Tuning
  • Overfitting
  • Imbalanced Datasets
  • Random Forest
  • XGBoost
  • LightGBM
  • Evaluation metrics

The creation of these resources has been (partially) funded by the ERASMUS+ grant program of the European Union under grant no. 2022-1-DE01-KA220-HED-000086932.

Neither the European Commission nor the project’s national funding agency DAAD is responsible for the content or liable for any losses or damage resulting from the use of these resources.