Abstract
This paper presents a data-driven approach that combines multivariate time series with static numerical data to detect longitudinal cracks on steel slabs at the continuous caster. Thermocouple data from 68 sensors were processed, and the extracted features were combined with chemical and process information. Because the data are highly imbalanced, we evaluated the effect of training set size and of different downsampling strategies. The impact of removing chemical information and of removing process information was evaluated separately for every strategy. Finally, we report the effect of shuffling the dataset, since shuffling is commonly used in the literature and many studies do not state whether the temporal order was preserved. The baseline for comparison is the model currently in use at ArcelorMittal Belgium, a rule-based differential model inspired by the current literature. We chose gradient boosted regression trees (GBRT) as the classifier because it meets several criteria required by industry, mainly the ability to handle large input spaces and highly imbalanced data while remaining a grey-box model: feature importance can be inferred after training, which helps in understanding process behavior. After decreasing the imbalance in the dataset to 4% and training on two years of data, we observed a relative improvement of 15% (Fβ = 0.22, β = 2.4) over the baseline (Fβ = 0.19). Moreover, the largest performance gain was obtained when both process and chemical information were combined with the thermocouple data; removing either from the training set reduced overall performance. This confirms the hypothesis that the chemical composition of the liquid steel and process information can improve the detection of longitudinal cracks at the continuous caster. Shuffling the dataset before the train/test split resulted in an overall performance gain, and the results were not affected by removing process or chemical data.
Our interpretation of the shuffling result is that the model learns the behavior of neighboring samples, which produces a biased test set. Such a model would perform worse when deployed, because a shuffled evaluation does not represent a real-use scenario. Detecting defects early in the production chain is essential to enhance steel quality and secure the supply chain within the factory, as the cost of a defect grows significantly the longer it stays in production, wasting time and material. Evidence of which variables are most significant for detection is valuable because it motivates the search for more complex approaches that can further improve prediction. Based on the outcome of this research, the factory aims to enhance the model currently in use with the most important features. Feature engineering also deserves further discussion, as several signals have a larger influence; e.g., sensors at the center of the mould are more relevant because most longitudinal cracks occur in that region. Another research avenue is the use of more advanced models such as Long Short-Term Memory (LSTM) networks and Convolutional Neural Networks (CNNs), black-box models that are better suited to training on large amounts of data without the need for feature extraction or engineering.
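The evaluation protocol described in the abstract (a chronological train/test split, downsampling of the majority class in the training set, and an Fβ score) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy data, the 80/20 split, and the reading of "4%" as the fraction of positive samples in the training set are all assumptions made for illustration only.

```python
import random

def f_beta(tp, fp, fn, beta):
    """F-beta from confusion counts; beta > 1 weights recall above precision."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0

def chronological_split(samples, train_fraction):
    """Train on the earliest samples and test on the latest (no shuffling)."""
    ordered = sorted(samples, key=lambda s: s["t"])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]

def downsample_negatives(train, target_pos_rate, seed=0):
    """Randomly drop negatives until positives make up about target_pos_rate."""
    pos = [s for s in train if s["label"] == 1]
    neg = [s for s in train if s["label"] == 0]
    # n_pos / (n_pos + n_keep) = target  =>  n_keep = n_pos * (1 - target) / target
    n_keep = min(len(neg), round(len(pos) * (1 - target_pos_rate) / target_pos_rate))
    kept = random.Random(seed).sample(neg, n_keep)
    return sorted(pos + kept, key=lambda s: s["t"])

# Hypothetical toy data: 10,000 time-ordered samples with 1% positives.
samples = [{"t": i, "label": 1 if i % 100 == 0 else 0} for i in range(10_000)]

# Split first, then rebalance only the training part; the test set keeps
# its natural imbalance so that it resembles deployment conditions.
train, test = chronological_split(samples, train_fraction=0.8)
balanced = downsample_negatives(train, target_pos_rate=0.04)
pos_rate = sum(s["label"] for s in balanced) / len(balanced)
```

Splitting before downsampling, and never shuffling across the cut, keeps every test sample strictly later in time than every training sample. That is the property the shuffled variant discussed above gives up.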
Original language | English (US) |
---|---|
Pages (from-to) | 711-718 |
Number of pages | 8 |
Journal | AISTech - Iron and Steel Technology Conference Proceedings |
Volume | 2022-May |
State | Published - 2022 |
Externally published | Yes |
Event | AISTech 2022 Iron and Steel Technology Conference and Exposition, Pittsburgh, United States (May 16-18, 2022) |
All Science Journal Classification (ASJC) codes
- Industrial and Manufacturing Engineering
Keywords
- Continuous Casting
- Detection
- Feature Extraction
- Longitudinal Cracks
- Machine Learning