Model Construction for Cataract Prediction Based on Public Expression Database and Clinical Cohort
1. The Fourth Hospital of Cangzhou City (Nanpi County People’s Hospital), Cangzhou 061500, Hebei, China; 2. Cangzhou Eye Hospital, Cangzhou 061000, Hebei, China
Abstract: A cataract prediction model was established and evaluated using differentially expressed genes screened from the Gene Expression Omnibus (GEO) database. First, cataract-related microarray datasets were screened from GEO with bioinformatics methods and analyzed with GEO2R and NetworkAnalyst to identify the most significant differentially expressed genes. Then, using the cataract screening cohort of the Health Management Center of our hospital, a cataract risk prediction model was constructed by Cox proportional hazards regression and presented as a nomogram. The discrimination, calibration, predictive ability and clinical benefit of the model were evaluated with the C-index, calibration curves, receiver operating characteristic (ROC) curves and decision curve analysis (DCA). In the GSE5645, GSE193629 and GSE161701 datasets, protamine 1 (PRM1) was highly expressed and serotonin 2C receptor (HTR2C) was lowly expressed. Age, body mass, systolic blood pressure, contrast sensitivity (CS), objective scatter index (OSI), modulation transfer function (MTF) cutoff, Strehl ratio (SR), dynamic vision, PRM1, HTR2C and connexin 46 (CX46) differed significantly between the cataract and non-cataract groups (P<0.05). Five variables (age, OSI, MTF cutoff, PRM1 and HTR2C) were finally included in the prediction model (P<0.05). The final prediction model was log[h(t)/h0(t)] = 2.6892 + 0.012×age + 1.320×OSI - 0.041×MTF cutoff + 0.029×PRM1 - 6.549×HTR2C. The C-index of the model was 0.875 (confidence interval (CI): 0.862 to 0.886), and the predicted probabilities were close to the observed probabilities. The area under the ROC curve (AUC) was 0.904 (95% CI: 0.884 to 0.923), with a sensitivity of 82.4% and a specificity of 92.3%. The mean AUC in ten-fold cross-validation was 0.911. DCA showed that the net benefit was greater than 0 when the high-risk threshold probability ranged from 0.25 to 0.75.
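To make the reported equation and the C-index concrete, the following Python sketch evaluates the linear predictor exactly as printed and computes Harrell's concordance index for right-censored data. The function names and the toy inputs are hypothetical, and the abstract does not state the units or scaling of OSI, MTF cutoff, PRM1 or HTR2C, so any values passed in are assumptions. Because the constant 2.6892 shifts every patient's score by the same amount, it has no effect on the C-index.

```python
from typing import Sequence

# Coefficients as printed in the abstract. Units of each predictor are
# ASSUMED to match whatever scale the authors used; they are not stated.
INTERCEPT = 2.6892
COEFS = {"age": 0.012, "osi": 1.320, "mtf_cutoff": -0.041,
         "prm1": 0.029, "htr2c": -6.549}


def linear_predictor(age: float, osi: float, mtf_cutoff: float,
                     prm1: float, htr2c: float) -> float:
    """Evaluate log[h(t)/h0(t)] term by term as the published equation is written."""
    return (INTERCEPT
            + COEFS["age"] * age
            + COEFS["osi"] * osi
            + COEFS["mtf_cutoff"] * mtf_cutoff
            + COEFS["prm1"] * prm1
            + COEFS["htr2c"] * htr2c)


def harrell_c_index(times: Sequence[float], events: Sequence[int],
                    risk: Sequence[float]) -> float:
    """Harrell's C: share of comparable pairs in which the earlier event has the higher risk."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier failure in a pair
        for j in range(n):
            if times[i] < times[j]:  # i failed before j was last observed
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied risk scores count as half-concordant
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

For example, `harrell_c_index([1, 2, 3, 4], [1, 1, 1, 1], [4, 3, 2, 1])` returns 1.0, because the patient with the earliest event always has the highest score. A value of 0.5 corresponds to random ordering, which is the reference against which the reported 0.875 should be read.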
The clinical prediction model for cataract established in this study showed good discrimination, calibration, predictive ability, internal validity and clinical benefit, suggesting high clinical application value.
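The clinical-benefit claim rests on decision curve analysis. As a rough illustration of what "net benefit greater than 0" means, the Python sketch below computes net benefit at a threshold probability p_t as TP/n - (FP/n) × p_t/(1 - p_t), alongside the "treat all" reference line. It assumes a binary outcome and already-computed predicted probabilities; the abstract does not say how the authors converted the Cox model into probabilities (for example, at which follow-up time), so the function names and toy data here are hypothetical.

```python
from typing import Sequence


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float],
                threshold: float) -> float:
    """Net benefit of treating patients whose predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def treat_all_net_benefit(y_true: Sequence[int], threshold: float) -> float:
    """Reference strategy: treat everyone regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)
```

A model is useful at a given threshold when its net benefit exceeds both 0 (treat none) and the treat-all line. With the toy data `y_true=[1, 1, 0, 0]` and `y_prob=[0.9, 0.8, 0.3, 0.6]`, the model's net benefit at p_t = 0.5 is 0.25, while treat-all gives 0.0.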
GUO Zhiqiang, ZHANG Liyou, XU Lijuan, GONG Meina, HAN Xiao. Model Construction for Cataract Prediction Based on Public Expression Database and Clinical Cohort[J]. Life Science Research, 2023, 27(5): 447-454.