Hybrid Methods for Feature Selection Algorithms in the Field of Medical Records


  • Yuda Syahidin, Manajemen Informasi Kesehatan, Piksi Ganesha Politechnic, Bandung, Indonesia.
  • Ade Irma Suryani, Rekam Medis dan Informasi Kesehatan, Piksi Ganesha Politechnic, Bandung, Indonesia.




Predictive, Feature Selection, Hybrid Method


As big data grows in the healthcare community, accurate analysis of medical data supports early disease detection, patient care, and community services. However, the accuracy of the analysis decreases when the medical data are incomplete. Feature selection is the process of selecting a subset of features that are relevant to a predictive modeling problem. It identifies and removes unnecessary, irrelevant, and redundant attributes from the dataset, that is, attributes that either do not contribute to the accuracy of the model or actively reduce it. A central challenge in medical-records research is that the data are both structured and unstructured, so an additional method is needed to help the algorithm select good features. This paper provides an overview of a proposed hybrid method that can be used with feature selection algorithms in the field of medical records.
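
The abstract does not specify how the hybrid method is built, so the following is only a minimal sketch of one common reading of "hybrid" feature selection: a filter stage that discards irrelevant and redundant attributes, followed by a wrapper stage that keeps a feature only if it improves a model's validation accuracy. Everything in it is an assumption made for illustration: the synthetic four-column dataset, the correlation thresholds, the nearest-centroid classifier, and all function names are hypothetical and are not taken from the paper.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / sqrt(sxx * syy)

def column(rows, j):
    return [r[j] for r in rows]

def filter_stage(X, y, min_relevance=0.1, max_redundancy=0.9):
    """Filter: drop features weakly correlated with the label (irrelevant)
    or strongly correlated with an already kept feature (redundant)."""
    relevance = {j: abs(pearson(column(X, j), y)) for j in range(len(X[0]))}
    ranked = sorted((j for j in relevance if relevance[j] >= min_relevance),
                    key=lambda j: -relevance[j])
    kept = []
    for j in ranked:
        if all(abs(pearson(column(X, j), column(X, k))) < max_redundancy
               for k in kept):
            kept.append(j)
    return kept

def centroid_accuracy(X_tr, y_tr, X_va, y_va, features):
    """Validation accuracy of a nearest-centroid classifier on `features`."""
    centroids = {}
    for label in set(y_tr):
        members = [r for r, t in zip(X_tr, y_tr) if t == label]
        centroids[label] = [sum(r[j] for r in members) / len(members)
                            for j in features]
    correct = 0
    for r, t in zip(X_va, y_va):
        pred = min(centroids, key=lambda c: sum(
            (r[j] - m) ** 2 for j, m in zip(features, centroids[c])))
        correct += pred == t
    return correct / len(y_va)

def wrapper_stage(X_tr, y_tr, X_va, y_va, candidates):
    """Wrapper: greedy forward selection; each round adds the candidate that
    raises validation accuracy the most, and stops when none improves it."""
    selected, best = [], 0.0
    improved = True
    while improved:
        improved, best_j = False, None
        for j in candidates:
            if j in selected:
                continue
            score = centroid_accuracy(X_tr, y_tr, X_va, y_va, selected + [j])
            if score > best:
                best, best_j, improved = score, j, True
        if improved:
            selected.append(best_j)
    return selected, best

# Synthetic stand-in for a tabular medical dataset (hypothetical columns):
# f0 informative, f1 a rescaled copy of f0 (redundant),
# f2 unrelated to the label (irrelevant), f3 weakly informative.
X, y = [], []
for i in range(200):
    label = i % 2
    f0 = label + ((i * 7 % 10) / 10 - 0.45) * 0.5
    X.append([f0, 2 * f0 + 1, (i // 2) % 5, 0.3 * label + ((i * 3) % 7) / 7])
    y.append(label)

X_tr, y_tr, X_va, y_va = X[:140], y[:140], X[140:], y[140:]
kept = filter_stage(X_tr, y_tr)
selected, best = wrapper_stage(X_tr, y_tr, X_va, y_va, kept)
print("after filter:", kept)
print("after wrapper:", selected, "validation accuracy:", best)
```

Running the cheap filter first shrinks the candidate set, so the more expensive wrapper search only has to evaluate features that already look relevant and non-redundant. On real medical records, the thresholds and the classifier would need to be chosen and validated for the data at hand.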

