Clipper: An efficient cluster-based data pruning technique for biomedical data to increase the accuracy of machine learning model prediction

dc.contributor.author: Karadeniz, M. B.
dc.contributor.author: Efeoglu, Ebru
dc.contributor.author: Celik, Burak
dc.contributor.author: Kocyigit, Adem
dc.contributor.author: Turetken, Bahattin
dc.date.accessioned: 2025-05-20T18:59:16Z
dc.date.issued: 2025
dc.department: Bilecik Şeyh Edebali Üniversitesi
dc.description.abstract: The exponential rise in clinical research costs could potentially be cut by half through efficient, machine learning-driven data processing techniques. Traditional methods such as data preprocessing and hyperparameter tuning are effective for model optimization, but they often introduce complexities that can diminish the benefits of integrating machine learning. To overcome this issue, we present Clipper: a novel, cluster-based data pruning approach designed specifically for biomedical data, aiming to enhance the predictive accuracy of machine learning models. Clipper's key advantage lies in its ability to automate the data pruning process, optimizing accuracy without the need for manual hyperparameter adjustments, a typically cumbersome aspect of machine learning tasks. In a comprehensive comparative analysis, the proposed Clipper methodology demonstrates superior performance across various medical and biological datasets. Our experiments reveal Clipper's consistent superiority over baseline models, with significant accuracy improvements: 44% for Heart Disease, 7% for Breast Cancer, 40% for Parkinson's Disease, and 20% for Raisin classification. Specifically, the model achieves classification accuracies of 99.5% for Heart Disease, 99.64% for Breast Cancer, 99.47% for Parkinson's Disease, and 93% for Raisin classification, thereby substantially outperforming contemporary state-of-the-art computational techniques. The empirical evidence suggests that Clipper serves as an effective accuracy enhancer for baseline models, eliminating the need for parameter tuning or complex preprocessing steps. Furthermore, Clipper produces robust outputs even at very low split rates, where baseline models typically perform poorly.
dc.identifier.doi: 10.1016/j.eij.2025.100641
dc.identifier.issn: 1110-8665
dc.identifier.issn: 2090-4754
dc.identifier.scopus: 2-s2.0-105000287047
dc.identifier.scopusquality: Q1
dc.identifier.uri: https://doi.org/10.1016/j.eij.2025.100641
dc.identifier.uri: https://hdl.handle.net/11552/8294
dc.identifier.volume: 30
dc.identifier.wos: WOS:001452395900001
dc.identifier.wosquality: Q1
dc.indekslendigikaynak: WoS
dc.indekslendigikaynak: Scopus
dc.indekslendigikaynak: WoS - Science Citation Index Expanded
dc.language.iso: en
dc.publisher: Cairo Univ, Fac Computers & Information
dc.relation.ispartof: Egyptian Informatics Journal
dc.relation.publicationcategory: Article - International Refereed Journal - Institutional Faculty Member
dc.rights: info:eu-repo/semantics/closedAccess
dc.snmz: KA_WOS_20250518
dc.subject: Machine learning
dc.subject: Clustering
dc.subject: Biomedical data
dc.subject: Pruning
dc.title: Clipper: An efficient cluster-based data pruning technique for biomedical data to increase the accuracy of machine learning model prediction
dc.type: Article

Files

Original bundle

Name: Makale.pdf
Size: 1.6 MB
Format: Adobe Portable Document Format