Artificial Intelligence
MachineLearnAthon
Hand-drawn Unit Operation Recognition
Context
Process flow diagrams (PFDs) are key documents in the chemical process industry. A variety of PFDs are used across all engineering stages, from early-stage process development to detailed engineering, construction, operation, and disassembly. These PFDs represent essential information about chemical processes [1], such as process topology, major unit operations, control equipment, and piping information [2], [3]. See below for an example:
Figure 1: Example of a typical PFD found in the chemical process industry
PFDs contain three main sources of information: (i) unit operations, (ii) connectivities, and (iii) additional text.
Task description
In this challenge, you will classify hand-drawn unit operation symbols (see Figure 2). These symbols come in various categories, each representing a specific unit operation used in chemical engineering, such as distillation, filtration, mixing, and reactions. A given unit operation is sometimes drawn in several different ways.
The primary goal is to develop robust machine learning models capable of accurately classifying these symbols into their respective categories. Participants will have access to a diverse dataset of hand-drawn unit operation symbols for training and validation, and their models will be evaluated on their ability to correctly classify symbols in a test dataset.
By successfully addressing this challenge, participants will contribute to the digitization of PFDs in the chemical process industry, making essential process information more accessible to engineers and operators. Join us in this endeavor to bridge the gap between legacy hand-drawn diagrams and modern machine-readable representations.
Figure 2: Some examples of unit operations drawn by hand. From top to bottom, these are PFR reactors, valves, absorption columns, CSTR reactors, compressors, pumps, turbines, and storage.
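The task description mentions using the provided data for both training and validation. One common way to do this is a stratified hold-out split, which keeps roughly the same share of every symbol category in both parts. The sketch below is only an illustration under stated assumptions: the `labels` mapping, the function name `stratified_split`, and the 20 % validation fraction are hypothetical and not prescribed by the challenge.

```python
import random
from collections import defaultdict


def stratified_split(labels, val_fraction=0.2, seed=0):
    """Split image IDs into train/validation lists, class by class.

    labels: dict mapping image ID -> label ID (hypothetical structure).
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible

    # Group image IDs by their label.
    by_class = defaultdict(list)
    for image_id, label in labels.items():
        by_class[label].append(image_id)

    train_ids, val_ids = [], []
    for label in sorted(by_class):
        ids = sorted(by_class[label])
        rng.shuffle(ids)
        # Hold out at least one sample per class, unless the class has only one.
        n_val = max(1, round(len(ids) * val_fraction)) if len(ids) > 1 else 0
        val_ids.extend(ids[:n_val])
        train_ids.extend(ids[n_val:])
    return sorted(train_ids), sorted(val_ids)


# Hypothetical toy data: 30 images spread evenly over 3 classes.
labels = {i: i % 3 for i in range(30)}
train_ids, val_ids = stratified_split(labels)
print(len(train_ids), len(val_ids))
```

Splitting per class rather than over the whole dataset at once avoids a validation set that happens to miss a rare symbol category.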
Dataset
https://www.kaggle.com/competitions/hand-drawn-unit-operations-recognition/data
| Name | Datatype | Description |
| --- | --- | --- |
| ID | Int | The ID of the image, used to retrieve it |
| labelID | Int | The ID of the image label |
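As a minimal sketch of how the two fields above might be read, the snippet below parses a label table into a dictionary and counts how many images fall into each class. It assumes the labels are delivered as a CSV file with an `ID,labelID` header; that layout, the sample rows, and the helper name `load_labels` are assumptions for illustration, so check the Kaggle data page for the actual file names and format.

```python
import csv
import io
from collections import Counter


def load_labels(csv_file):
    """Read rows with ID and labelID columns into a dict {ID: labelID}."""
    reader = csv.DictReader(csv_file)
    return {int(row["ID"]): int(row["labelID"]) for row in reader}


# Hypothetical sample content standing in for the real label file.
sample = io.StringIO("ID,labelID\n0,3\n1,3\n2,5\n3,0\n")
labels = load_labels(sample)

# Class frequencies help spot imbalance before training.
class_counts = Counter(labels.values())
print(labels)
print(class_counts.most_common())
```

With a real file, `load_labels(open(path, newline=""))` would replace the in-memory sample.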
License

Evaluation methods
Accuracy is a measure of how close a model’s predictions are to the actual values. It is commonly used in classification tasks and is defined as the proportion of correctly predicted labels to the total number of instances.
Accuracy is expressed as:
\[
\text{Accuracy} = \frac{1}{N}\sum_{i=1}^{N} \mathbf{1}\!\left(\hat{y}_i = y_i\right)
\]
where:
- $N$ is the total number of samples.
- $y_i$ is the true label for the $i$-th sample.
- $\hat{y}_i$ is the predicted label for the $i$-th sample.
- $\mathbf{1}(\hat{y}_i = y_i)$ is an indicator function that returns 1 if $\hat{y}_i = y_i$, and 0 otherwise.
Accuracy provides a simple and intuitive measure of performance, but it can be misleading for imbalanced datasets or multi-class problems, where per-class precision and recall may provide more insight.
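The accuracy definition above translates directly into a few lines of Python. The sketch below also computes per-class recall to illustrate the caveat about imbalanced data; the function names and the toy labels are assumptions made for this example and are not part of the official evaluation code.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def per_class_recall(y_true, y_pred):
    """For each true class, the fraction of its samples predicted correctly."""
    total = defaultdict(int)
    hits = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            hits[t] += 1
    return {c: hits[c] / total[c] for c in total}


# Toy example: a model that always predicts class 0 on imbalanced data.
y_true = [0, 0, 0, 0, 1, 2]
y_pred = [0, 0, 0, 0, 0, 0]
print(accuracy(y_true, y_pred))          # 4 of 6 correct
print(per_class_recall(y_true, y_pred))  # classes 1 and 2 are never found
```

In this toy case, two thirds of the predictions are correct even though two of the three classes are never recognized, which is the situation where per-class metrics are more informative.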
Submission format:
For each ID in the test set, you must predict a class label for the TARGET variable. The file should contain a header and have the following format:
ID,TARGET
2,0
5,0
6,0
etc.
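A file in this layout can be produced with Python's standard `csv` module. The following is a minimal sketch: the `predictions` dictionary and the helper name `write_submission` are hypothetical, and it writes one integer value per ID, matching the sample rows above.

```python
import csv
import io


def write_submission(predictions, out_file):
    """Write an ID,TARGET header followed by one row per test image.

    predictions: dict mapping test image ID -> predicted label (hypothetical).
    """
    writer = csv.writer(out_file, lineterminator="\n")
    writer.writerow(["ID", "TARGET"])
    for image_id in sorted(predictions):
        writer.writerow([image_id, predictions[image_id]])


# Write to an in-memory buffer here; use open("submission.csv", "w", newline="")
# to create an actual file (the file name is an assumption).
buffer = io.StringIO()
write_submission({5: 0, 2: 0, 6: 0}, buffer)
submission_text = buffer.getvalue()
print(submission_text)
```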
Tutorials
Python tutorial: Python’s versatility, extensive library support, readability, and active community make it a foundational language for machine learning and contribute to its widespread adoption in the field. Its role in machine learning is expected to continue growing as the field evolves and new tools and techniques emerge.
PyTorch tutorial: PyTorch is an open-source deep learning framework. It is designed to provide a flexible and dynamic platform for building and training artificial neural networks.
References
[1] G. Nasby, “Using process flowsheets as communication tools,” Chem Eng Prog, vol. 108, no. 10, pp. 36–44, Oct. 2012.
[2] L. S. Balhorn, Q. Gao, D. Goldstein, and A. M. Schweidtmann, “Flowsheet Recognition using Deep Convolutional Neural Networks,” Computer Aided Chemical Engineering, vol. 49, pp. 1567–1572, Jan. 2022, doi: 10.1016/B978-0-323-85159-6.50261-X.
[3] M. F. Theisen, K. N. Flores, L. Schulze Balhorn, and A. M. Schweidtmann, “Digitization of chemical process flow diagrams using deep convolutional neural networks,” Digital Chemical Engineering, vol. 6, p. 100072, Mar. 2023, doi: 10.1016/j.dche.2022.100072.
The creation of these resources has been (partially) funded by the ERASMUS+ grant program of the European Union under grant no. 2022-1-DE01-KA220-HED-000086932. Neither the European Commission nor the project’s national funding agency DAAD is responsible for the content or liable for any losses or damage resulting from the use of these resources.
