| Abstract |
This paper proposes a Natural Language Processing (NLP) semantic-feature-driven, heterogeneous consensus model-based Structured Query Language Injection Attack (SQLIA) prediction model for System Configuration Profile Management (SCPM), named NSF-HC SQLIAS. To identify an optimal NLP computing environment, the study first applied three semantic feature extraction models, namely term frequency-inverse document frequency (TF-IDF), Continuous Bag of Words (CBOW), and N-Skip Gram (SKG), to extract low-dimensional features. Next, random sampling and up-sampling were applied to alleviate class imbalance. The resampled features, along with the original features, then underwent feature selection using a significant predictor test and variance threshold feature selection (VTFS) to improve time efficiency and reduce redundant computation. The selected features were Min–Max normalized to alleviate convergence and over-fitting problems. The normalized features were then classified into two classes by nine machine learning algorithms spanning Naïve Bayes variants, regression techniques, pattern mining, association rule mining, neuro-computing, and ensemble learning. These nine base classifiers formed a heterogeneous ensemble in which each classifier labelled every SQL query as normal traffic or SQLIA; the proposed consensus (CONS) model then assigned each query the label with the maximum vote count. An in-depth performance characterization revealed that the NSF-HC SQLIAS model with CBOW features, random sampling, the significant predictor test, and normalization achieves the highest accuracy (98.6%), an F-measure of 0.993, and an area under the curve (AUC) of 0.999. A comparative evaluation demonstrated the superiority of the proposed NSF-HC SQLIAS model over the other approaches considered. © 2025 IETE. |
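The preprocessing and voting stages summarized above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the regex tokenizer, the smoothed IDF formula ln((1+n)/(1+df)) + 1, random up-sampling to the majority-class size, and a population-variance threshold are all choices made here for illustration. The CBOW/skip-gram embeddings, the significant predictor test, and the nine base classifiers are not reproduced; `consensus` simply takes each base classifier's predicted labels as hypothetical inputs. All function names (`tokenize`, `tf_idf`, `upsample`, `variance_threshold`, `min_max`, `consensus`) are hypothetical.

```python
import math
import random
import re
from collections import Counter

# Hypothetical tokenizer (an assumption): lowercase words, digit runs,
# and single non-space symbols such as quotes and operators.
TOKEN_RE = re.compile(r"[a-z_]+|\d+|[^\sa-z_\d]")

def tokenize(query):
    return TOKEN_RE.findall(query.lower())

def tf_idf(queries):
    """Return (vocab, matrix): normalized term frequency times smoothed IDF."""
    docs = [tokenize(q) for q in queries]
    vocab = sorted({t for d in docs for t in d})
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    matrix = []
    for d in docs:
        counts, total = Counter(d), len(d) or 1  # guard against empty queries
        matrix.append([counts[t] / total * idf[t] for t in vocab])
    return vocab, matrix

def upsample(rows, labels, seed=0):
    """Randomly duplicate minority-class rows until every class matches the largest."""
    rng = random.Random(seed)
    by_class = {}
    for r, y in zip(rows, labels):
        by_class.setdefault(y, []).append(r)
    target = max(len(m) for m in by_class.values())
    out_rows, out_labels = [], []
    for y, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for r in members + extra:
            out_rows.append(r)
            out_labels.append(y)
    return out_rows, out_labels

def variance_threshold(matrix, threshold=0.0):
    """Return indices of columns whose population variance exceeds threshold."""
    n = len(matrix)
    keep = []
    for j in range(len(matrix[0])):
        col = [row[j] for row in matrix]
        mean = sum(col) / n
        if sum((x - mean) ** 2 for x in col) / n > threshold:
            keep.append(j)
    return keep

def min_max(matrix):
    """Scale each column to [0, 1]; constant columns map to 0.0."""
    cols = list(zip(*matrix))
    lows, highs = [min(c) for c in cols], [max(c) for c in cols]
    return [[(x - lo) / (hi - lo) if hi > lo else 0.0
             for x, lo, hi in zip(row, lows, highs)] for row in matrix]

def consensus(votes):
    """votes: one label list per base classifier; majority label per query."""
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*votes)]

queries = [
    "SELECT name FROM users WHERE id = 7",
    "SELECT * FROM items WHERE price < 10",
    "SELECT name FROM users WHERE id = '' OR '1'='1'",
]
vocab, X = tf_idf(queries)
X, y = upsample(X, ["normal", "normal", "sqlia"])
keep = variance_threshold(X)
X = min_max([[row[j] for j in keep] for row in X])

# Hypothetical outputs of three base classifiers for two queries.
votes = [["normal", "sqlia"], ["sqlia", "sqlia"], ["normal", "normal"]]
print(consensus(votes))  # ['normal', 'sqlia']
```

With two classes and an odd number of voters (such as the nine base classifiers described above), a majority always exists, so the sketch needs no tie-breaking rule; with an even number of voters, `Counter.most_common` would fall back to the first label encountered.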