International Journal of Computer Trends and Technology

Research Article | Open Access
Volume 74 | Issue 4 | Year 2026 | Article ID: IJCTT-V74I4P104 | DOI: https://doi.org/10.14445/22312803/IJCTT-V74I4P104

Lung Cancer Risk Prediction using Random Forest and Logistic Regression


Bhavya Mittal, Pari Sharma, Princy Sharma, Priya Dwivedi, Ravindra Chauhan

Received: 28 Feb 2026 | Revised: 30 Mar 2026 | Accepted: 18 Apr 2026 | Published: 30 Apr 2026

Citation:

Bhavya Mittal, Pari Sharma, Princy Sharma, Priya Dwivedi, Ravindra Chauhan, "Lung Cancer Risk Prediction using Random Forest and Logistic Regression," International Journal of Computer Trends and Technology (IJCTT), vol. 74, no. 4, pp. 45-51, 2026. Crossref, https://doi.org/10.14445/22312803/IJCTT-V74I4P104

Abstract

Lung cancer remains one of the leading causes of death worldwide, which makes early detection important for improving patient outcomes. This study presents a machine learning method that predicts lung cancer risk from clinical symptoms combined with environmental air quality factors. The system was developed on a balanced dataset of 3,000 records containing patient symptoms, Air Quality Index (AQI), and PM2.5 levels. Feature engineering was used to create combined indicators such as smoke anxiety, breath-cough patterns, and pollution exposure. Random Forest and Logistic Regression models were compared and reached accuracies of 56% and 52%, respectively. Although these accuracies are moderate, the results indicate that environmental factors can contribute to early risk assessment. Of the two models, Random Forest performed better, likely because it captures nonlinear relationships more effectively. This work provides a simple approach that may support preliminary lung cancer risk screening.
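
The workflow described in the abstract (engineer combined features, then compare Random Forest with Logistic Regression on a balanced dataset) can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' implementation: the data are synthetic, the column names and encodings are hypothetical, and the formulas for the smoke-anxiety, breath-cough, and pollution-exposure indicators are assumptions, since the abstract names these features but does not define them. The scaling constants, the 80/20 split, and the model hyperparameters are likewise illustrative choices.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 3000  # record count from the abstract; every value below is synthetic

# Hypothetical raw columns (names and encodings are assumptions).
smoking = rng.integers(0, 2, n)
anxiety = rng.integers(0, 2, n)
shortness_of_breath = rng.integers(0, 2, n)
coughing = rng.integers(0, 2, n)
aqi = rng.uniform(20, 300, n)
pm25 = rng.uniform(5, 150, n)

# Balanced synthetic labels: exactly half positive, half negative, shuffled.
y = rng.permutation(np.repeat([0, 1], n // 2))

# Assumed fixed scales used only to bring AQI and PM2.5 to comparable ranges.
AQI_SCALE = 500.0
PM25_SCALE = 150.0


def engineer_features(smoking, anxiety, breath, cough, aqi, pm25):
    """Append combined indicators to the raw columns (illustrative formulas)."""
    smoke_anxiety = smoking * anxiety        # 1 only when both flags are set
    breath_cough = breath + cough            # 0, 1, or 2 respiratory symptoms
    pollution_exposure = (aqi / AQI_SCALE + pm25 / PM25_SCALE) / 2
    return np.column_stack([
        smoking, anxiety, breath, cough, aqi, pm25,
        smoke_anxiety, breath_cough, pollution_exposure,
    ])


X = engineer_features(smoking, anxiety, shortness_of_breath, coughing, aqi, pm25)

# Stratified hold-out split keeps the classes balanced in both partitions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    # Logistic Regression is scale-sensitive, so standardize its inputs first.
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: accuracy = {scores[name]:.3f}")
```

Because the labels here are random, both models should score near chance; the sketch shows the shape of the comparison and does not reproduce the reported 56% and 52%. On a balanced dataset, plain accuracy is a reasonable first metric, which is why it is the only one computed in this example.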

Keywords

Lung Cancer Diagnosis, Air Quality Index (AQI), Machine Learning, Feature Engineering, Random Forest.
