Optimizing Mental Health Detection in Indian Armed Forces Personnel through Feature Engineering Driven Dataset Reduction, Addressing Suicide, Depression, and Stress

Authors

  • Sudipto Roy, Research Scholar, Department of Computer Science, Shri Vaishnav Vidyapeeth Vishwavidyalaya, Indore, India
  • Jigyasu Dubey, Head of Department of Computer Science, Shri Vaishnav Vidyapeeth Vishwavidyalaya, Indore, India

DOI:

https://doi.org/10.32628/CSEIT241026

Keywords:

Machine Learning, Psychometric Test, Feature Engineering, Exploratory Data Analysis, Dimensionality Reduction, Principal Component Analysis

Abstract

In machine learning, the quality of a dataset strongly influences model performance. This research aims to provide a comprehensive guide to improving the accuracy and efficiency of dataset construction by integrating multivariate reduction techniques with novel feature engineering strategies, implemented in the Python ecosystem. As datasets become more diverse and complex, optimizing their precision becomes more critical. The study explores dimensionality reduction methods, such as Principal Component Analysis (PCA) and t-Distributed Stochastic Neighbor Embedding (t-SNE), alongside feature selection approaches that streamline datasets while preserving vital information. In conjunction with these reduction techniques, the research introduces feature engineering methods that increase the discriminative power of the remaining features and so enrich the dataset's representational capacity; these include polynomial feature creation, interaction term generation, and domain-specific transformation functions. Practical implementations of the techniques are demonstrated in Python to show their applicability across domains. Empirical evaluations on real-world datasets indicate higher accuracy and efficiency than conventional dataset construction approaches. The resulting framework is generic and contributes to the broader discussion of dataset precision in machine learning.
Beyond deepening the understanding of multivariate reduction and feature engineering, the findings offer a practical guide for researchers and practitioners seeking to optimize precision in machine learning applications.
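
The abstract names polynomial feature creation, interaction term generation, and feature selection but does not give an implementation. The sketch below is one minimal way such a pipeline could look in plain Python; it is illustrative and not the authors' code. The function names, the toy column names (stress, sleep, flag), and the data values are hypothetical, and a simple variance filter stands in for feature selection; it is not PCA or t-SNE.

```python
from itertools import combinations
from statistics import pvariance


def add_polynomial_and_interaction_terms(rows, names):
    """Append squared terms and pairwise products to every row."""
    new_names = list(names)
    new_names += [f"{n}^2" for n in names]
    new_names += [f"{a}*{b}" for a, b in combinations(names, 2)]
    new_rows = []
    for row in rows:
        squares = [x * x for x in row]
        products = [row[i] * row[j]
                    for i, j in combinations(range(len(row)), 2)]
        new_rows.append(list(row) + squares + products)
    return new_rows, new_names


def drop_low_variance(rows, names, threshold=0.0):
    """Keep only columns whose population variance exceeds the threshold."""
    columns = list(zip(*rows))
    keep = [i for i, col in enumerate(columns) if pvariance(col) > threshold]
    kept_rows = [[row[i] for i in keep] for row in rows]
    kept_names = [names[i] for i in keep]
    return kept_rows, kept_names


# Hypothetical toy data: three questionnaire-style scores per respondent.
# The "flag" column is constant, so it carries no information.
names = ["stress", "sleep", "flag"]
rows = [
    [3.0, 7.0, 1.0],
    [5.0, 6.0, 1.0],
    [8.0, 4.0, 1.0],
    [6.0, 5.0, 1.0],
]

expanded_rows, expanded_names = add_polynomial_and_interaction_terms(rows, names)
reduced_rows, reduced_names = drop_low_variance(expanded_rows, expanded_names)
```

Here the expansion step grows three columns to nine, and the variance filter then discards the constant column and its square. In practice a library such as scikit-learn would likely replace both helpers, and the threshold would be tuned to the data.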

Published

06-03-2024

Section

Research Articles
