DOI: https://doi.org/10.36719/2663-4619/121/90-103
Bakhshali Bakhtiyarov
Azerbaijan State Oil and Industry University
PhD student
https://orcid.org/0009-0006-2172-4632
bekhtiyarov@gmail.com
Application of Fuzzy Logic for Clustering Big Data Sets
Abstract
The rapid growth of industrial data generated by modern SCADA systems in manufacturing requires Big Data frameworks capable of large-scale, real-time analytics. Apache Spark has emerged as one of the most efficient platforms, with reported execution speeds up to 100 times faster than Hadoop MapReduce for in-memory workloads. In spiral steel pipe production, sensor-rich SCADA environments generate continuous, high-dimensional data streams that demand clustering algorithms efficient in both time and memory. This paper introduces SRSIO-FCM, a scalable partitioning fuzzy clustering algorithm designed to address the challenges of Big Data clustering in industrial settings. Implemented on the Apache Spark platform, SRSIO-FCM is evaluated against SLFCM, a scalable version of the Literal Fuzzy c-Means (LFCM) algorithm. Performance on large-scale SCADA datasets is assessed using F-measure, Adjusted Rand Index (ARI), objective function value (OFV), and execution time. The experimental results indicate that SRSIO-FCM achieves higher clustering accuracy and substantially lower execution time than SLFCM, supporting its suitability for real-time monitoring and predictive analytics in spiral steel pipe manufacturing.
Keywords: Spark framework, fuzzy clustering, SCADA data, Big Data analytics, industrial process optimization