Smart City Gnosys

Smart city article details

Title A Dynamic Ensemble Learning Based Data Mining Framework For Medical Imbalanced Big Data
ID_Doc 1587
Authors Rithani M.; Kumar R.P.; Ali A.
Year 2025
Published Knowledge-Based Systems, 310
DOI http://dx.doi.org/10.1016/j.knosys.2024.112947
Abstract In the era of big data, domains such as the Internet of Things, smart cities, healthcare, and social media rely heavily on advanced data analytics. In medical data, certain critical diseases are significantly underrepresented compared to more prevalent conditions, creating a class imbalance that can lead to biased models favoring majority class predictions. This imbalance reduces the accuracy and reliability of predictions for the minority class, which is often essential for early diagnosis and intervention in rare but severe diseases. The problem is particularly acute in cancer classification, which faces high dimensionality, redundancy, and severe class imbalance. To address these challenges, this paper proposes a novel framework that integrates a Relevance Vector Machine classifier with an Incremental Ensemble framework to effectively manage data imbalance. It employs a Gaussian Mixture Models-based combined resampling algorithm to balance the dataset. Mutual Information Gain Maximization enhances the effectiveness of feature selection. To further enhance performance, an Adaptive Weighted Broad Learning System is incorporated, with a density-based weight generation mechanism that uses prior distribution information. Additionally, an Incremental Dynamic Learning Policy-based Relevance Vector Machine classifier is incorporated to adapt to new data and maintain high accuracy. The proposed model achieves superior performance with an Accuracy of 99%, a Kappa value of 98%, an F1-Score of 99%, and an MCC of 96.9%. These results underscore the model's effectiveness in addressing class imbalance, enhancing predictive accuracy for minority classes, and offering a robust solution for complex medical datasets essential for improved healthcare outcomes. © 2024
Author Keywords Big data classification; Gaussian mixture model; Incremental weighted ensemble broad learning system; Mutual information gain maximization; Relevance vector machine
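
Illustrative note: the abstract reports Accuracy alongside Kappa, F1-Score, and MCC. The sketch below is not taken from the paper; it is a minimal, standard-library illustration of why the latter three are commonly reported for imbalanced data, using the textbook binary definitions. The function name `binary_metrics` and the example counts (950 majority and 50 minority samples) are assumptions made purely for illustration.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy, F1, Cohen's kappa, and MCC from binary confusion counts.

    Hypothetical helper for illustration only; it is not the paper's evaluation code.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total

    # F1 is the harmonic mean of precision and recall on the positive (minority) class.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # Cohen's kappa: observed agreement corrected for chance agreement p_e.
    p_yes = ((tp + fp) / total) * ((tp + fn) / total)
    p_no = ((fn + tn) / total) * ((fp + tn) / total)
    p_e = p_yes + p_no
    kappa = (accuracy - p_e) / (1 - p_e) if p_e != 1 else 0.0

    # Matthews correlation coefficient; defined as 0 when any marginal is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"accuracy": accuracy, "f1": f1, "kappa": kappa, "mcc": mcc}


# Assumed toy scenario: 1000 samples, 50 of them in the minority (positive) class.
# A classifier that always predicts the majority class never finds a positive case.
majority_only = binary_metrics(tp=0, fp=0, fn=50, tn=950)

# A classifier that labels every sample correctly, for comparison.
perfect = binary_metrics(tp=50, fp=0, fn=0, tn=950)
```

In this assumed scenario the majority-only classifier still scores high on accuracy while F1, kappa, and MCC all fall to zero, which is the bias toward majority class predictions that the abstract describes.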