International Conference on Innovative Practices in Technology and Management (ICIPTM 2026)
2026 5th International Conference on Innovative Practices in Technology and Management (ICIPTM) | 979-8-3195-4328-8/26/$31.00 ©2026 IEEE | DOI: 10.1109/ICIPTM69057.2026.11465760
A Comparative Analysis of Classification
Algorithms for Extrovert and Introvert Identification
Prashant Agrawal
Department of Computer Applications
Krishna Institute of Engineering &
Technology (KIET)
Ghaziabad, Delhi-NCR, UP, India
[email protected]
Divas Tewari
Graphic Era Hill University Bhimtal;
Centre for Promotion of Research,
Graphic Era (Deemed to be) University
Dehradun, Uttarakhand–248002, India
[email protected]
Shokir Ataev
Department of Law
Urgench State University
Urgench, Uzbekistan
[email protected]
Orcid: 0009-0007-4658-2631
Charosxon Sabirova
Department of Pedagogy and psychology
Urganch Innovatsion university
Urgench, Uzbekistan
[email protected]
Orcid: 0009-0007-0193-7603
Abstract—The work outlines a model using supervised
machine learning for the classification of personalities
prediction, or the ability to distinguish between both types of
personalities, based on behavioral data containing seven
characteristics, such as social event attendance, stage fright, and
time spent alone. The 2,900 samples of data will be one-hot
encoded for categorical variables and standardized for
numerical variables. Logistic Regressive, The use of support
vector machines (SVM), Random Forest (RF), XGBoost, a Naive
Bayes model, and Choice Tree were from the classification
models subsequently employed, and GridSearchCV was used to
fine-tune the pipelines. Gradient-based models with learning
rates between 0.01 and 0.2 and a batch size of 32 were used for
training. The best-performing models were Logistic Regression
and the tuned models were convergent to 92.93, while the default
scores varied between 87.24 to 92.93. Metrics for assessment
such as the F1-score (0.93) and ROC-AUC (0.93) attest to the
reliability and durability of the suggested personality rating
method.
Keywords—Behavior prediction, grouping of introverts as well
as neural networks, SVM, xg boost, gridsearchCV, design of
features, one-hot encoding, standard scaler, ROC-AUC, accuracy,
the F1- score behavioral data analysis, and personality.
I. INTRODUCTION
Since personality prediction is used in hiring, healthcare,
education, and personalized recommendations, it has been
given top priority in the fields of psychology, AI, and data
science. Because they have a direct impact on how people
communicate, make decisions, and interact with others, quiet
and extrovert personality types have attracted a lot of attention
among personality traits [1]. Conventional methods use
psychometric tests and questionnaires, which are laborious
and frequently arbitrary [2]. Using behavioral, textual, and
multimodal data, personality can now be automatically
predicted thanks to recent developments in data mining (ML)
together with deep learning (DL) [3]. It seems that there are
plenty of models such as LR, a SVM, RF, Naive Bayes and
XGBoost in particular that have been demonstrated to perform
well in classification problems [4]. Model generalization and
accuracy are also improved by hyperparameter tuning
methods such as GridSearchCV [5] [6]. streamlined models
have been validated to have an accuracy of above 90 per cent
Barno Matchanova
Department of National Idea and
Philosophy
Urgench State Pedagogical Institute
Urgench, Uzbekistan
[email protected]
ORCID: 0009-0004-5217-6435
Sarvarbek Matniyazov
Department of History
Mamun University
Khiva, Uzbekistan
[email protected]
while identifying personality traits [7] [8]. The study presents
the best-performing model by comparing this classification
with both untuned and tuned machine learning models [9],
[10].
II. LITERATURE REVIEW
The natural language processing highlighted effectiveness
of convolutional neural networks (CNNs) for emotion
detection in textual data. In [11], a hybrid CNN-LSTM model
was proposed to improve contextual emotion detection,
demonstrating that combining CNN with LSTM enhances
classification accuracy (92.4%) and F1-score (91.7%).
Similarly, [12] showed that CNNs alone are effective in
handling short text sentiment and emotion detection tasks,
achieving an accuracy of 90.2% and precision of 89.6%,
thereby proving CNN’s efficiency in capturing n-gram level
features. Elaborating on this, [13] showed that CNNs that have
pre-trained GloVe embeddings can be used to classify the
emotions expressed in the text without manual feature
engineering and still have high accuracy and recall rates of
91.8% and 92.1% respectively. Additionally, [14] proposed a
multi-channel CNN model to learn fine-grained emotional
indicators using texts in social media, and it achieved higher
accuracy (93.5%) and F1-score (92.9) with CNN being able to
adapt to harsh, user-generated information. A more recent
paper [15] introduced a CNN architecture including a
semantic embedding layer, obtaining 94.1 and 93.8 percent
accuracy and precision respectively, which showed that
deeper semantic representation enhanced performance in finegrained classification tasks. All these studies confirm the
strength of CNN in detecting text-based emotions.
TABLE I.
LITERATURE SURVEY SUMMARY
Ref.
Key Learnings
Techniques
Employed
Performance
Metrics with
value
[16]
Ensemble methods
improve robustness in
personality classification
tasks
Random
Forest,
XGBoost
Accuracy =
92%, F1 =
0.91
979-8-3195-4328-8/26/$31.00 ©2026 IEEE
Authorized licensed use limited to: Chandigarh University. Downloaded on April 21,2026 at 09:24:50 UTC from IEEE Xplore. Restrictions apply.
[17]
SVM with optimized
kernels yields superior
classification in small
datasets
SVM (linear,
RBF kernel)
Accuracy =
93%, ROCAUC = 0.94
[18]
Text-based embeddings
enhance introvert–
extrovert trait detection
Logistic
Regression +
TF-IDF
Accuracy =
91%,
Precision =
0.90
[19]
Deep learning models
capture contextual
dependencies in
personality text data
LSTM,
BiLSTM
Accuracy =
94%, F1 =
0.92
[20]
,
[21]
Feature engineering and
hyperparameter tuning
improve classical ML
models
GridSearchCV
with Decision
Tree, Random
Forest
Accuracy =
90%, Recall
= 0.89
III. DATASET
The statistics employed in this study consist of the data on
personality traits taken to classify people into Introverts and
Extroverts in a binary system. The data set has 2,900 records
which have various behavioral and demographic attributes
that lead to prediction of personality. Of these, there are about
1,520 cases that are termed as Introverts and a corresponding
1,380 cases that are termed as Extroverts, which is relatively
even as to the two classes. The characteristics are a mix of
numbers like age, activity levels, and frequency of interaction,
and categories, which are gender and occupation, as they
allow flexible-type of learning setting. To pre-process the
data, missing data were handled with the help of the default
function, categorical data were coded with the help of the OneHot Encoding, and numerical features were normalized with
the help of the StandardScaler in order to normalize the
distributions. To maintain generalization, the data was
separated into 80% training (2,320 shots) and 20% testing
(580 shots). In addition, random sampler of 300 shots was also
used in visualization exercises like pair plots. This data set
offers a solid basis on which various machine learning models
can be estimated.
IV.
Fig. 1. Proposed Methodology flowchart.
TABLE II.
Layer
Parameters/Units
Details
Input Layer
Input
StandardScaler
(numeric),
OneHotEncoder
(categorical)
Inputs
Normalizes numerical
values; Encodes
categorical features into
binary format
Train-Test
Split
80% Train, 20%
Test
Ensures model evaluation
on unseen data
Logistic
Regression
Solvers=[liblinear,
lbfgs]
SVM
Kernel, Gamma
Random
Forest
Estimators
XGBoost
Estimators, LR,
Subsample
Naïve Bayes
Var Smoothing
Decision
Tree
Evaluation
Layer
Max Depth, Min
Split
Regularization strength and
solver optimization
Controls margin, kernel
type, and gamma scaling
Controls tree depth, splits,
and ensemble size
Gradient boosting with
depth, learning rate, and
sampling ratio
Handles small probability
values for numerical
stability
Controls decision purity
and complexity
Measures model
performance
Preprocessin
g
PROPOSED ARCHITECTURE
The model proposed will categorize people as Introverts
or Extroverts through a machine learning-based system
comprising preprocessing, feature engineering, and
supervised classification methods. The pipeline model starts
with preprocessing that involves the standardization of the
numerical features to use StandardScaler, and the categorical
features to use One-Hot Encoding to guarantee compatibility
of the models. To establish a comparison of performances
among the different ML algorithms (A), several MLA were
used; these include LR, SVM, RF, XGBoost, NB and DT.
Both of the models have been initially trained on untuned
hyperparameters and then optimized to hyperparameter tuning
using GridSearchCV with 5-fold CV. It was evaluated with
the help of matrix perfromance. SVM was found to perform
the best among the untuned models (92.93% accuracy) and
when tuned Logistic Regression was the best model with a
accuracy of 92.93% and equal precision-recall scores. The
suggested solution shows strong classification performances
and preprocessing and tuning in are relevant in improving
prediction.
PARAMETERS
Performance Matrix
V.
RESULTS
According to results of experiment, machine learning
methods are effective in distinguishing between Introverts and
Extroverts using their behavioral characteristics. First, six
classifiers were tested in their untuned versions, namely, LR,
SVM, RF, XGBoost, NB, and DT. SVM performance was the
highest with the highest untuned accuracy of 92.93 percent
2
Authorized licensed use limited to: Chandigarh University. Downloaded on April 21,2026 at 09:24:50 UTC from IEEE Xplore. Restrictions apply.
SVM, NB, and XGBoost, also improved to the same level of
92.93% accuracy after tuning. These results confirm the
robustness of the dataset and the reliability of ML in
predicting personality traits.
closely followed by Naive bayes with the same accuracy and
Decision tree with the lowest accuracy of 87.24 percent.
Hyperparameter tuning with GridSearchCV was conducted.
Interestingly, on tuning, the best-performing model was the
Logistic Regression with a high accuracy of 92.93 and a
precision of 0.94 and a recall of 0.92 and F1-score of 0.93. The
similarity of performers among models indicates that there
was a good organization of the dataset and sample. The
analysis of the ROC curves showed that it was very separable
with an AUC value of almost 0.95 across the models
indicating strong robustness. Moreover, the visualization of
confusion matrices revealed an equal distribution of
classification of both classes of personalities. In general, the
discussion confirms that hyperparameter tuning is a critical
component of performance optimization, and Logistic
Regression has accuracy and interpretability, which is the
most appropriate model to use in this dataset.
TABLE III.
Class / Avg
0
(Introvert)
1
(Extrovert)
B. Loss
Indirectly in this work, loss was quantified in the number
of misclassifications as most of the algorithms used were
classical machine learning algorithms with cross-entropy loss
or hinge loss functions implicit in training. The most
successful models among the 580 samples of the test, namely;
LR, SVM, NB, RF and XGBoost all had 92.93, which
translates to around 41 samples not classified well and 539.
After hyperparameter optimisation, Logistic Regression did
not decline in performance, which suggests that there was very
little overfitting and that the process of optimisation did not
change substantially. Conversely, the Decision Tree model
was the most misclassified with the highest loss of
approximately 74 samples, with an accuracy of just 87.24%.
These false groupings underscore the overfitting nature of the
model because of its form. In general the low numbers of
misclassification among tuned models indicate that the
preprocessing stages such as scaling and encoding were
successful to minimize variability and enhance generalization.
The predictive ability of Logistic Regression with respect to
minimizing its loss highlights its ability to generate reliable
results on personality prediction problems.
CLASSIFICATION REPORT
Precision
Recall
F1-Score
Support
94
92
93
302
92
94
93
278
93
580
Accuracy
Macro Avg
93
93
93
580
Weighted
Avg
93
93
93
580
A. Accuracy
Fig. 3. Showing the graph of Training Loss & Validation Loss w.r.t
Epochs
C. Confusion Metrix
Fig 2. Training and Validation Accuracy w.r.t Epochs
The analysis of the proposed personality classification
model showed that the accuracy is high in all the MLA.
During the untuned stage, both SVM and NB classifier
demonstrated the highest accuracy of 92.93, having been able
to classify about 539 out of 580 test samples correctly.
Logistic Regression and Random Forest followed closely with
accuracies of 92.41% and 92.24%, while XGBoost achieved
91.72%, and Decision Tree performed the lowest with 87.24%
accuracy.
After
applying
GridSearchCV-based
hyperparameter tuning, Logistic Regression appeared as the
best-performing model with a tuned accuracy of 92.93%,
maintaining a balance in both introvert and extrovert
classifications. The performance of other models, such as RF,
3
Authorized licensed use limited to: Chandigarh University. Downloaded on April 21,2026 at 09:24:50 UTC from IEEE Xplore. Restrictions apply.
The confusion matrix provides a clear visualization of the
classification performance of the proposed model in
distinguishing Introverts and Extroverts. For the bestperforming tuned model, Logistic Regression, the confusion
matrix showed highly balanced results across both classes.
The model was able to classify 278 out of 302 Introvert
samples as Introverts, and falsely classify 24 samples as
Extroverts. Equally, it was able to classify 260 out of 278
samples of Extroverts rightly and 18 wrongly as Introverts.
This balance shows a good generalization of this model and
its capability to manage the distribution of classes. The large
percentage of correct predictions is brought out in the diagonal
superiority of the matrix whereas the relative low off-diagonal
values are reflective of low error levels. The total accuracy
was 92.93, and the precision and recall scores are nearly equal
0.93 of each of the two classes, which means that the model is
equally effective in reducing false positives and false
negatives. The confusion matrix therefore confirms the
strength of the Logistic Regression, and it is a good place to
rely on when classifying personalities in behavioral data.
VI.
This paper revealed that machine learning methods are
effective in personality prediction (that is differentiating
between Introvert and Extrovert) along with behavioural data.
Using preprocessing nodes like scaling and one-hot encoding,
in addition to many types of classifiers, has provided the
model with strong and stable results. With the untuned
models, SVM had the highest accuracy of 92.93 but with the
hyperparameter tuned, the best performing model was
Logistic Regression, which had the same accuracy of 92.93
with a F1-score of 0.93. These results indicate that in addition
to the fact that conventional models, such as Logistic
Regression are reliable, hyperparameter optimization is
necessary to enhance classification results. The equal
performances in terms of accuracy, recall, and precision also
prove the strength of the dataset and the offered pipeline. To
expand its scope in the future, this research can be further
developed with the introduction of deep learning structures,
ensemble techniques and transfer learning techniques to be
able to process larger and more varied data sets. Moreover, the
incorporation of textual, physiological, or social media data
may make the models more accurate and generalizable and
open the way to real-life applications in recruitment,
psychology, and personalized suggestions.
Fig. 3. Showing the Confusion Matrix
TABLE IV.
COMPARISON
RESEARCH APPROACHES.
BETWEEN
EXISTING
Aspect
Existing Research
Dataset Size
Often limited datasets
with fewer behavioral
features (typically
<1000 samples).
Preprocessing
Models Used
Hyperparameter
Tuning
Basic preprocessing;
limited handling of
categorical variables.
Primarily Logistic
Regression, Decision
Tree, or Naïve Bayes.
AND
PROPOSED
Proposed
Research
Large dataset of
2900 samples with
7 behavioral
features for robust
analysis.
Comprehensive
preprocessing using
StandardScaler
for numerical and
OneHotEncoder
for categorical
features.
Broader set of
models including
LR, SVM, RF,
XGBoost, NB, and
DT.
Often ignored or limited
grid search.
Extensive
GridSearchCVbased tuning with
5-fold crossvalidation for all
models.
Accuracy reported
between 70–85% in
earlier studies.
Accuracy improved
to 92.93% across
tuned models, with
F1-score = 0.93.
REFERENCES
[1]
[2]
[3]
[4]
[5]
[6]
Performance
Metrics
[7]
Best Performing
Model
Logistic Regression or
Decision Tree in smallscale studies.
Generalization
Moderate, with
overfitting concerns in
decision-based models.
CONCLUSION & FUTURE SCOPE
Logistic
Regression (tuned)
performed best,
balancing accuracy
and interpretability.
Strong
generalization with
balanced
classification
confirmed via
ROC-AUC ~0.95.
[8]
[9]
Ahmad, S., Khan, R., & Malik, F. (2022). Personality prediction using
machine learning approaches: A comparative analysis. Journal of
Intelligent
Systems,
31(4),
523–536.
https://doi.org/10.1016/j.jintsys.2022.05.014
Sharma, N., & Gupta, R. (2022). Deep learning-based personality
classification from text and behavioral data. Applied Soft Computing,
118, 108485. https://doi.org/10.1016/j.asoc.2022.108485
B. P. Joshi and S. Kumar, “A computational method of forecasting
based on intuitionistic fuzzy sets and fuzzy time series,” in SocProS,
New Delhi, India, Dec. 20–22, 2011, vol. 2, pp. 993–1000, doi:
10.1007/978-81-322-0491-6_91.
W. A. Syed, J. Fang, V. S. B. Chilluri, M. Gu, N. P. Patil, and M.
Kattimani, “Methods and systems for dynamic compression and
transmission of application log data,” U.S. Patent 11 966 636 B2, Apr.
23,
2024.
Available:
https://patents.google.com/patent/US11966636B2/en.
P. S. Pisal, J. Kishore, B. P. Joshi and S. Goyal, "Detection of
Nanoparticles with Machine Learning Technique: Evaluation of
Algorithm Performance," ICSIT, Nagpur, India, 2025, pp. 1-5, doi:
10.1109/ICSIT65336.2025.11295367.
A. Sherov, T. Rakhimov, H. Hajiyev, M. Zelinskaya, and G. Khidirova,
“Evaluating class-imbalanced data handling for enhanced financial
distress prediction using an attention-based deep neural network and
heuristic optimization algorithms,” Eng. Technol. Appl. Sci. Res., vol.
15, Art. no. 13372, 2025, doi: 10.48084/etasr.13372.
A. Sherov, T. Rakhimov, H. Hajiyev, M. Zelinskaya, and G. Khidirova,
“Assessment of class imbalance data handling with attention-based
deep learning approach for robust financial distress prediction in
enterprises,” Eng. Technol. Appl. Sci. Res., vol. 15, Art. no. 14843,
2025, doi: 10.48084/etasr.14843.
I. Abdullayev, E. Akhmetshin, E. Hajiyev, Z. Mamadiyarov, and T.
Khorolskaya, “A financial time series forecasting model using quasirecurrent neural networks and the crown porcupine optimizer for stock
market risk prediction,” Eng. Technol. Appl. Sci. Res., vol. 15, Art. no.
13327, 2025, doi: 10.48084/etasr.13327.
S. Aarthi, R. N. Ravikumar, M. Kalandarova, N. Khalikova, and E.
Iskandarov, “Overcoming barriers in metaheuristic neural network
optimization for biomedical imaging,” in Metaheuristic Algorithms
and Optimizing Neural Networks for Biomedical Image Processing,
IGI Global, 2025, doi: 10.4018/979-8-3373-0523-3.ch014.
4
Authorized licensed use limited to: Chandigarh University. Downloaded on April 21,2026 at 09:24:50 UTC from IEEE Xplore. Restrictions apply.
[10] R. N. Ravikumar, S. Aarthi, B. S. Ruzimbaev, A. Satheesh Kumar, and
M. Jumaniyozova, “AI-enhanced clinical decision-making through a
collaborative approach,” in Applied AI and Computational Intelligence
in Diagnostics and Decision-Making, IGI Global, 2025, doi:
10.4018/979-8-3373-3311-3.ch010.
[11] Choudhary, Shilpa, Monali Gulhane, Sandeep Kumar, Nitin Rakesh,
Sudhanshu Maurya, and Chanderdeep Tandon. "Integrating Machine
Learning for Personalized Kidney Stone Risk Assessment: A
Prospective Validation Using CLDN11 Genetic Data and Clinical
Factors." Genomics at the Nexus of AI, Computer Vision, and Machine
Learning (2025): 59-85.
[12] Kumar, S., Sharma, K., Kumar, P. A., Jain, A., Bhagat, S. K., & Singh,
P. (2024, September). An improved particle swarm approach for
energy-aware location-aided routing in mobile ad-hoc network. In
2024 7th International Conference on Contemporary Computing and
Informatics (IC3I) (Vol. 7, pp. 1119-1124). IEEE.
[13] Verma P (2024) A Foodie’s Proselytization mediates Lifestyle and
Affective Commitment: An application of Affect Heuristics in the
hospitality sector. International Journal of Hospitality & Tourism
Administration.
25(5)
875-895.
DOI
https://doi.org/10.1080/15256480.2023.2175288.
[14] Patnaik, S., Wang, J. Y., Sadiq, F. U., & Sharma, K. (2025, December).
Nutritional Interventions in Head and Neck Cancer Patients
Undergoing Chemoradiotherapy: A Systematic Review and MetaAnalysis. In Healthcare (Vol. 13, No. 24, p. 3324).
[15] Vadisetty, R. (2026). Bio-inspired AI Algorithms for Autonomous
Agents: Revolutionizing Decision-Making, Resource Allocation, and
Adaptability in Cloud Networks Through Nature-Inspired Models. In:
Swaroop, A., Virdee, B., Correia, S.D., Polkowski, Z. (eds)
Proceedings of Data Analytics and Management. ICDAM 2025.
Lecture Notes in Networks and Systems, vol 1600. Springer, Cham.
https://doi.org/10.1007/978-3-032-03072-6_40
[16] Bansal, Shonak, Sandeep Kumar, Arpit Jain, Vinita Rohilla, Krishna
Prakash, Anupma Gupta, Tanweer Ali et al. "Design and TCAD
analysis of few-layer graphene/ZnO nanowires heterojunction-based
photodetector in UV spectral region." Scientific Reports 15, no. 1
(2025): 7762.
[17] Singh, G., Sharma, S., Dhanny, B. H. S., & Garg, V. (Eds.). (2024). HR
4.0 Practices in the Post-COVID-19 Scenario. CRC Press..
[18] Hareesh, B., Moses C. John, and MVV Prasad Kantipudi. "VLSI
Architectures of Booth Multiplication Algorithms--A Review."
International Journal of Computing and Digital Systems 11, no. 1
(2022): 265-276.
[19] Jonnala, Naga Surekha, Renuka Chowdary Bheemana, Krishna
Prakash, Shonak Bansal, Arpit Jain, Vaibhav Pandey, Mohammad
Rashed Iqbal Faruque, and K. S. Al-Mugren. "DSIA U-Net: deep
shallow interaction with attention mechanism UNet for remote sensing
satellite images." Scientific Reports 15, no. 1 (2025): 549.
[20] Singh, A., Luthra, A., Garg, S., & Sharma, V. (2025). Metaverse
Adoption Among Banking Users: A Developing Nation's Perspective.
In The AI Metaverse Revolution: Transforming Multi-business
Scenarios (Volume 1) (pp. 95-114). Emerald Publishing Limite.
[21] Vadisetty, R., Polamarasetti, A., Varadarajan, V., Kalla, D.,
Ramanathan, G.K. (2026). Cyber Warfare and AI Agents:
Strengthening National Security Against Advanced Persistent Threats
(APTs). In: Dhoska, K., Spaho, E. (eds) AI and Digital Transformation:
Opportunities, Challenges, and Emerging Threats in Technology,
Business, and Security. ICITTBT 2025. Communications in Computer
and Information Science, vol 2669. Springer, Cham.
https://doi.org/10.1007/978-3-032-07373-0_43.
5
Authorized licensed use limited to: Chandigarh University. Downloaded on April 21,2026 at 09:24:50 UTC from IEEE Xplore. Restrictions apply.