Introduction

First described in 1906 by German psychiatrist Alois Alzheimer, Alzheimer’s disease (AD) is a progressive neurodegenerative disorder of the central nervous system that results in the gradual loss of memory and other cognitive capacities. The hallmarks of AD are the accumulation of extracellular protein plaques containing amyloid and intracellular neurofibrillary tangles (NTs) containing hyperphosphorylated tau protein. However, it is still uncertain if these plaques and NTs actually cause the associated neuronal and functional impairments. Moreover, epidemiological, neuroimaging, and neuropathological studies suggest that AD results from a combination of genetic, lifestyle, and environmental triggers and that multiple pathomechanisms contribute to neurodegeneration. Alzheimer’s disease is the most prominent cause of dementia1 and the sixth most prevalent cause of premature death worldwide. Over 55 million people globally currently suffer from dementia, and this number is expected to rise above 150 million by 2050 due to population aging. Further, more than 75\(\%\) of these patients are undiagnosed and lack access to treatment and care2. Only four drugs have been approved for AD therapy, and these only treat symptoms to improve patient quality of life, but do not stop the progression of the underlying neurodegenerative processes3. Moreover, drug development for AD has a high failure rate, presumably due to the many unknown aspects of disease pathogenesis. Therefore, there is an urgent need to identify novel pathogenic mechanisms and therapeutic targets4. Mounting evidence suggests that normal age-related functional decline and diseases of aging arise at least in part from the accumulation of molecular damage due to oxidative stress and inflammation (5,6 for reviews). The oxidative stress (OS) theory of aging proposes that the slow and steady accumulation of macromolecular damage from reactive oxygen species (ROS) eventually leads to the functional impairment or irreversible loss of terminally differentiated cells7. Related theories include the mitochondrial theory of aging8, the cellular senescence theory of aging9, and the molecular inflammation theory of aging10. In all these theories, the accumulation of ROS-induced damage is posited as a potential mechanism leading to age-related functional decline, failure of endogenous repair mechanisms, and neurodegenerative diseases, of which AD is the most common11,12,13. It has been demonstrated that products of protein, DNA, RNA, and lipid oxidation (such as carbonyls, 3-nitrotyrosine, 8-oxo-7,8-dihydroguanine, and lipid peroxides) are substantially elevated in the brains of AD patients14,15,16,17. Moreover, oxidative damage to macromolecules is among the earliest signs of AD onset and an index of disease progression. In this study, we identified multiple OS-associated biomarkers in AD for the first time by comparing brain tissue gene expression profiles between patients and matched controls from the Gene Expression Omnibus (GEO), investigated the functional annotations of differentially expressed genes (DEGs), and constructed gene co-expression and protein-protein interaction (PPI) networks to define hub genes. We then constructed a diagnostic model based on these hub genes and used the miRNET database to predict hub gene targeted drugs and miRNAs as novel potential treatments.

Results

Identification of DEGs and module genes

A total of 11046 differentially expressed genes (DEGs) were found between AD and control samples from the GEO GSE132903 dataset, of which 5805 were upregulated and 5241 downregulated (Fig. 1). The top 15 most significantly upregulated DEGs were TMEM132E-DT, KRAS, CHP1, NDUFAB1, ATP5PD, TUBB2A, SLC16A14, RAB2A, NAP1L5, PFN2, MRPL15, COPS8, NEDD8, HSPB3 and NRN1, and the top 15 most downregulated DEGs were KATNIP, STAG1, NIPBL, ZNF438, MZF1, TMEM150A, ZHX3, SSH3, TNIP1, HBP1, NFKB1, MCM7, CERS1, RBPJ and NFKBIA (Fig. 2).

Figure 1
figure 1

Volcano plot of differential gene expression between AD patients and matched controls from the GEO database.

Figure 2
figure 2

Heatmap of differentially expressed genes (DEGs) between AD patients and controls from the GEO database.

In weighted gene co-expression network cluster analysis, the gene sets of all samples were included in the dendrogram (i.e., there were no outliers) (Figs. 3,  4). The soft threshold was set at 17 (R2 \(=\) 0.85) to construct a scale-free network (Fig. 5), and 20 modules were identified (Fig. 6). Based on the criteria (|Cor| > 0.4), blue, turquoise, yellow, and black modules, collectively including 7098 genes, were selected for further analysis (Fig. 7).

Figure 3
figure 3

Outlier sample detection. The gene sets from all samples are contained in the dendrogram.

Figure 4
figure 4

Clustering tree.

Figure 5
figure 5

Soft threshold screening.

Figure 6
figure 6

WGCNA modules.

Figure 7
figure 7

Module-disease correlations.

Screening and functional enrichment of candidate genes

Analysis of the overlap among the 11046 DEGs, 7098 module genes, and 446 OSRGs yielded 156 candidate hub genes associated with OS and AD (Fig. 8a). These candidate hub genes were enriched in 1577 GO terms (1453 BP, 43 CC, 81 MF), including biological process (BP) terms “response to oxidative stress,” “cellular response to oxidative stress”, “cellular response to chemical stress”, and “response to reactive oxygen species,” cellular component (CC) terms “membrane raft”, “membrane microdomain”, and “mitochondrial matrix” (Fig. 10a), and molecular function (MF) terms “antioxidant activity”, “ubiquitin-like protein ligase binding”, and “ubiquitin protein ligase binding” (See Table 1 in the Supplementary Material). Additionally, candidate genes were enriched in 237 KEGG pathways, including “FOXO signaling pathway”, “MicroRNAs in cancer”, “AGE-RAGE signaling pathway in diabetic complications”, and “Shigellosis” (Fig. 10b and see Table 2 in the Supplementary Material).

Figure 8
figure 8

(a) Wayne diagram of candidate hub gene screening. (b) Chromosomal locations of candidate hub genes. The gene expression petterns of these genes in the GSE132903 datasets are represented in the inner circular heatmaps. Red indicates gene up-regulation, blue represents downregulation, and the outer circle of heatmap represents the control group, the inner circle of heatmap represents the disease group. The outer circle represents chromosomes; lines coming from each gene point to their specific chromosomal locations.

Figure 9
figure 9

Manhattan plot of candidate genes.

Figure 10
figure 10

(a) GO enrichment analysis of candidate hub genes. (b) KEGG pathway enrichment analysis of candidate hub genes.

With two types of AD, early-onset and late-onset, both associated with autosomal dominant or sporadic inheritance18, studying chromosomal location can help researchers gain a deeper insight into genetic factors. The visualization of gene expression patterns and chromosomal locations of candidate hub genes in AD and normal controls were exhibited in Figs. 8b and 9, suggesting that these genes were distributed in all chromosomes except for chromosome Y, and there were most genes in chromosome 1.

Identification of hub genes by co-expression analysis

To identify hub genes (those most strongly interacting with the greatest number of other DEGs), a PPI network was constructed using the STRING website (Fig. 11a) and analyzed using Cytoscape, which yielded a sub-network with 11 candidates (MAPK3 (on chromosome 16), MAPK1 (on chromosome 22), SP1 (on chromosome 12), CASP3 (on chromosome 4), MAPT (on chromosome 17), BCL2 (on chromosome 18), APP (on chromosome 21), FOXO1 (on chromosome 13), ETS1 (on chromosome 11), MAPK9 (on chromosome 5), and EDN1 (on chromosome 6)) (Figs. 8b,  11b). We further validated these hub genes with screening thresholds p. adj < 0.01 and abs(logFC) > 0.58519, and the results demonstrate the reliability of these hub genes in terms of statistical significance (See Table 6 in the Supplementary Material). Of these, MAPK3 and MAPK1 were the most strongly connected to the rest of the network with connectivity scores of 15, followed by SP1 (10) and CASP3 (10) (See Fig. S1 in the Supplementary Material).

Figure 11
figure 11

(a) Protein–protein interaction network constructed from candidate hub genes. (b) Hub sub-networks.

All 11 candidate hub genes were expressed at different levels in AD and normal samples of the GSE132903 dataset, with APP, MAPK1, MAPK9, and MAPT expression levels higher in control samples and the rest higher in AD samples (Fig. 12). The capacity of each hub gene to distinguish AD from control samples according to expression level was then examined by ROC curve analysis, and the seven candidates with best discrimination performance (area under the ROC curve > 0.7), namely SP1, CASP3, MAPT, BCL2, FOXO1, ETS1, and MAPK9, were selected for further analyses (Fig. 13a).

Figure 12
figure 12

Expression levels of candidate hub genes in patients and matched controls.

Figure 13
figure 13

(a) ROC curve analysis of candidate hub genes. The seven with best performance for distinguishing AD from control samples according to area under the curve (AUC) were selected for further analyses. (b) Correlations between expression levels of various candidate hub gene pairs. (c) Correlation between FOXO1 and BCL2 expression. (d) Correlation between SP1 and MAPK9 expression.

Of these seven candidate hub genes in the GSE118553 dataset, all except MAPT were significantly differentially expressed between AD and normal samples, with CASP3 and MAPK9 expressed at higher levels in normal samples and SP1, BCL2, FOXO1, and ETS1 expressed at higher levels in the AD samples (see Fig. S5). The AUCs of the ROC curves for SP1, CASP3, BCL2, and MAPK9 were greater than 0.7 (see Fig. S6 in the supplementary material). Moreover, SP1, BCL2, FOXO1, ETS1, and MAPK9 were differentially expressed between AD and control samples from the GSE5281 dataset (see Fig. S7 in the Supplementary Material). Also, the AUCs of the ROC curves for SP1, BCL2, FOXO1, ETS1, and MAPK9 were greater than 0.7 (See Fig. S8 in the Supplementary Material). Based on these analyses, MAPK9, FOXO1, BCL2, ETS1, and SP1 were identified as hub genes. Co-expression analysis revealed that BCL2 and FOXO1 exhibited the strongest positive correlation, while MAPK9 and SP1 demonstrated the strongest negative correlation (Fig. 13b–d).

Correlations between immune cell infiltration scores and hub gene expression levels

To examine the effects of hub gene expression on neuroinflammation, we assessed associations with the brain infiltration of specific immune cell types. Expression of BCL2 was significantly correlated with the infiltration scores of 24 immune cell types (all considered except for Type 2 T helper cells, regulatory T cells, eosinophils, and central memory CD4 T cells). The correlations were positive with T follicular helper cell, natural killer cell, natural killer T cell, and effector memory CD8 T cell infiltration scores, and negative with effector memory CD4 T cell and Activated CD4 T cell infiltration scores (Fig. 14a). There was no significant correlation between ETS1 expression and either Type 2 T helper cell or activated CD4 T cell infiltration scores. ETS1 expression was negatively correlated only with effector memory CD4 T cell infiltration score, while all other correlations were positive (Fig. 14b). The expression levels of BCL2, ETS1, FOXO1, and SP1 were positively correlated with infiltration scores for the majority of immune cells (See Figs. S2, S3 in the Supplementary Material). In contrast, MAPK9 expression was positively correlated only with Type 2 T helper cell, regulatory T cell, effector memory CD4 T cell, central memory CD4 T cell, and activated CD4 T cell infiltration scores, but negatively correlated with the infiltration scores of most other immune cells (See Fig. S4 in the Supplementary Material).

Figure 14
figure 14

Correlations of immune cell infiltration scores with expression levels of BCL2 and ETS1.

Gene set enrichment analysis

Gene set enrichment analysis (GSEA) was then conducted to explore the molecular functions of these hub genes. All five hub genes are included in the “Alzheimer’s disease pathway’ (i.e., the Alzheimer’s disease pathway is overrepresented in the hub gene list), with MAPK9 downregulated and the remaining four upregulated. However, all five genes are also included in the list of genes implicated in “Parkinson’s disease’ and high expression levels of BCL2, ETS1, and FOXO1 are associated with “ribosome,” “chronic myeloid leukemia,” and “small cell lung cancer’ (Fig.  15a–e).

Figure 15
figure 15

Gene set enrichment analysis for (a) BCL2, (b) ETS1, (c) FOXO1, (d) MAPK9, and (e) SP1.

Drugs, genes, and transcription factors regulating hub genes

78 target drugs were predicted by FOXO1, SP1, MAPK9 and BCL2 (Fig. 16a and see Table 3 in the Supplementary material). For instance, there were 3 target drugs (Fluorouracil, Cyclophosphamide, epirubicin) of FOXO1, and 2 target drugs (Definite, Definite) of SP1. There were 21 drugs predicted by MAPK9, such as Camptothecin, Chembl406845, OSI-632, PD-0166285 and SB-220025, and 43 drugs were predicted by BCL2, including Hydroquinone, Pentoxifylline, Edaravone, Gossypol, etc.

The Drug-Gene Interaction database identified 78 small-molecule drugs potentially influencing hub gene expression or function. Moreover, a hub gene-miRNA regulatory network and a hub gene-TF network generated using the miRNet database identified 43 targeted miRNAs (Fig. 16b and see Table 4 in the Supplementary Material) and 36 TFs. For example, GATA2 and NFIC are TFs controlling SP1, ETS1, and MAPK9 expression (Fig. 16c and Table 5 in the Supplementary Material).

Figure 16
figure 16

(a) Small-molecule drugs influencing hub gene expression or function. (b) MicroRNAs targeting hub genes. (c) Transcription factors regulating hub gene expression.

Diagnostic model based on 5 hub genes

A diagnostic model was then constructed based on expression levels of the five hub genes using LASSO regression and ROC analyses. For the optimal model developed using the training dataset, lambda.min = 0.00071235 (Fig. 17), and the coefficients of the five genes were BCL2 \(= -1.9400531\), ETS1 \(=\) 1.3313010, FOXO1 \(=\) 3.6540838, MAPK9 \(=\) 0.6012298, and SP1 \(=\) 1.5144388. Moreover, the confusion matrix showed that the specificity of the model reached 0.74, accuracy 0.75, and recall 0.76 for the training set (Fig. 18a). Further, the AUC of the ROC curve was 0.818, indicating good discrimination performance (Fig. 18b).

Figure 17
figure 17

LASSO regression model \(\lambda\)-screening.

Model performance was then validated using the GEO dataset GSE5281 (Fig. 18c). The confusion matrix showed that the model achieved similar precision (0.94), accuracy (0.786), and recall (0.75) on this dataset. The AUC of the ROC curve was also similar (0.88), underscoring the general utility of the diagnostic model (Fig. 18d).

Figure 18
figure 18

(a) Confusion matrix for training set. (b) ROC curve for training set. (c) Confusion matrix for validation set. (d) ROC curve for validation set.

Discussion

A prominent early feature of AD is the accumulation of OS markers20,21,22, which results from an imbalance between endogenous ROS generation and cellular antioxidant capacity. The primary endogenous sources of ROS, such as mitochondrial respiration and various enzymatic reactions, and the physiological mechanisms regulating endogenous antioxidants are well described. Moreover, many exogenous factors are known to directly scavenge ROS and/or enhance endogenous antioxidant capacity. Therefore, limiting OS and associated damage may be a particularly promising area of AD drug research and development4. Here we identified numerous potential OS markers by comparing gene expression profiles between patients and matched controls. X chromosome aberrations have been confirmed by previous studies23 to be highly associated with oxidative stress and could be a breakthrough for future disease-related investigations. Meanwhile, mutations in three different genes on chromosomes 1, 14, and 21 were reported to result in autosomal dominant forms of familial Alzheimer’s disease, that is24, the APP gene on chromosome 21, the PS1 gene on chromosome 14, and the PS2 gene onchromosome 123,25. Based on the visualization of the chromosomal location of candidate hub genes (Figs. 8b and 9), it was showed that the AD-related differential genes identified were distributed on these crutial chromosomes, with the most differential candidates on chromosome 1. Meanwhile, chromosome 6 locus and chromosome 12 are highly associated with late-onset Alzheimer’s disease26, while AD is also highly neuropathologically similar to familial British dementia and familial Danish dementia, which are highly associated with chromosome 1327. It further indicates that these chromosomes are significantly associated with AD development.

Then we defined five major hub genes of a network associated with oxidative stress in AD patients. These detected hub genes included the stress-associated signaling factor MAPK9, the negative regulator of apoptosis BCL2, and three transcription factors, FOXO1, ETS1, and SP1, associated with various stress and protective responses. A diagnostic model based on the expression levels of these five genes distinguished patients from controls with high sensitivity and selectivity.

Expression of MAPK9 was consistently higher in normal samples while the expression levels of FOXO1, BCL2, ETS1, and SP1 were higher in AD samples (See Fig. S5 in the Supplementary Material). Construction of hub gene-miRNA and hub gene-TF networks identified multiple regulatory pathways contributing to these differences in expression, and many of the interactions of DEGs with TFs and miRNAs we identified were demonstrated experimentally in past studies28,29,30,31,32. These DEGs and upstream TFs are involved in a variety of biological processes, such as the immune response, apoptosis, and vascular morphogenesis. Previous studies have strongly implicated BCL2 and FOXO1 in neuronal survival, while FOXO and MAPK signaling pathways are known as important regulators of the cell cycle and autophagy/apoptosis33. Previous studies have also reported BCL2 overexpression in the fontal cortex of AD patients and suggested that this reflects a compensatory anti-apoptotic response that is stronger in frontal cortex than in other brain regions34,35,36. There is also compelling evidence for a relationship between dysregulated BCL2 expression and impaired learning and memory. High FOXO1 expression may be linked to age-related neurodegeneration through regulation of advanced glycation end product (AGE) signaling pathways, cellular senescence mechanisms, autophagic pathways, and apoptotic pathways induced by accumulation of misfolded proteins or cellular damage37,38. Elevated ETS1 expression may be associated with microglia activation and proliferation, a critical initiating event in neuroinflammation39,40. The transcription factor SP1 is strongly implicated in the regulation of OS response genes, and notably also in the expression of amyloid precursor protein (APP) and tau protein. Moreover, SP1 has been found to be significantly upregulated in the frontal cortex of AD patients30,41,41,42. Finally, MAPK9 (also known as JNK2) plays an important role in the oxidative stress response43, and many studies have confirmed that JNK signaling is associated with aging44,45.

A diagnostic model constructed based on the expression levels of these five genes (MAPK9, BCL2, FOXO1, ETS1, and SP1) using LASSO regression and ROC analysis demonstrated high accuracy for distinguishing patients from matched controls in both training and validation datasets. Therefore, this model may have clinical value for the early diagnosis of AD. BCL2 is a primary negative regulator of apoptosis that may be induced in response to stressors such as DNA damage. In contrast, however, neurons with neurofibrillary tangles (NTs), a hallmark of AD, exhibit downregulated BCL2 expression and thus potentially greater susceptibility to apoptosis46. Recent research has confirmed that BCL2 also regulates intracellular Ca\(^2+\) signaling by influencing mitochondrial- and endoplasmic reticulum-based mechanisms for Ca2 homeostasis, and aberrant neuronal Ca\(^2+\) signaling has long been implicated in AD as well as other neurodegenerative disorders. Therefore, anti-apoptotic BCL2 and its derived peptides may have potential for therapeutic prevention of AD pathogenesis47. ETS1 is a pro-angiogenic transcription factor, and enhanced intravascular immunoreactivity as well as extravasation have been documented in the cortical microvasculature of AD brain. The expression of ETS1 is also strongly correlated with expression of tumor necrosis factor-α (TNF-α), an important pro-inflammatory cytokine released by reactive microglia (ref). Therefore the expression pattern of ETS1 in the brain of AD patients is likely of great importance to disease progression39. The FOXO transcription factors regulate the balance between cell longevity and tumor suppression by controlling the expression of proteins involved in cell cycle arrest, stress resistance, and apoptosis. Therefore, among these downstream proteins are numerous potential molecular targets for the treatment of AD48. The c-Jun N-terminal kinase (JNK) signaling cascade is triggered in response to a broad spectrum of stressors49. Of particular relevance to AD, JNK-MAPK proteins are involved in the regulation of apoptosis during oxidative damage through phosphorylation of effector proteins or transcription factors, which may lead to dysregulation of tau phosphorylation and accumulation of hyperphosphorylated tau, the major component of NTs50,51. The transcription factor SP1 also responds to inflammatory signals, such as those induced by neuronal damage. Elevated expression of SP1 in AD samples may be a protective response, so it would be intriguing to investigate whether regulation of SP1 by drugs such as mithramycin can improve AD symptoms such as memory dysfunction52. The known functions of these hub genes are also consistent with the strong correlations with immune infiltration scores and enrichment analyses using GO and DEGG databases. The identification of these hub genes may thus facilitate the development of novel disease-modifying therapies as well as symptom-targeted drugs.

In this study, we firstly identified five hub genes that target oxidative stress (OS) process in Alzheimer’s disease (AD) through bioinformatic analysis, which were then examined for biological processes and pathways, associations with brain infiltration of specific immune cells, and for their capacity as a diagnostic model to distinguish patients from controls. While further experimental validation is required to verify OS- and AD-related functions, the high performance of the diagnostic model and previous investigations strongly implicate these five hub genes, their downstream targets for transcriptional regulation, and upstream regulators (including numerous miRNAs) in the OS response in AD. It is worth that 78 drugs identified that target these hub genes have been not proven to mitigate AD, and no EST1-modulating drugs were detected, suggesting that testing of these agents in AD animal models and development of small-molecule EST1 regulators are potentially productive areas for future research. The distribution of candidate hub genes on chromosomes may provide insights for future studies on potential pathogenic mechanisms underlying potential pathogenesis on other chromosomes. Also, it is vital to supplement this model to distinguish AD from Parkinson’s disease.

Methods

Data sources

Datasets GSE132903, GSE118553, and GSE5281 were obtained from the National Center for Biotechnology Information database (https://www.ncbi.nlm.nih.gov/). The GSE132903 dataset contains expression profiles of middle temporal gyrus from 97 Alzheimer’s disease patients and 98 non-demented controls, GSE118553 includes transcriptomic data of brain tissues from 52 AD patients and 31 controls, and GSE5281 comprises gene expression data of brain samples from 16 AD patients and 11 controls. Moreover, 446 oxidative stress-related genes (OSRGs) were obtained from the Gene Ontology (GO) database (Data cohort: GO:0006979). The minimum age of study subjects in these datasets is 70 years.

Identification and functional enrichment analysis of candidate genes influencing the oxidative stress response in Alzheimer’s disease

The Limma package (version 3.50.1) was used to perform differential expression analysis between AD and control samples of the GSE132903 dataset. In this study, since the training set is microarray data with overall small logFC values, the screening threshold of DEGs was adjusted p value (p. adj) < 0.0553. Considering that a large number of genes were available in the original dataset, the adjusted p-values provided a good balance between finding statistically significant genes and limiting the occurrence of false positives by implementing a Benjamini-Hochberg (BH) correction54 to control the false discovery rate of the entire significant genes, and considered for multiple testing. Further, the traits of AD patients and normal controls of the GSE132903 dataset were including in WGCNA analysis (WGCNA version 1.70-3)55 to define gene modules related to AD. Briefly, samples were first clustered to determine whether there were outliers that needed to be removed. Then, the optimal soft threshold was defined to ensure that gene interactions maximally conformed to a scale-free distribution. Based on the optimal soft threshold, a gene dendrogram was plotted using the dynamic tree cutting algorithm. After calculating the correlations between each module and traits, the modules with |Cor| > 0.4 were selected as key modules and the component genes were included as module genes for subsequent analyses. Candidate genes potentially influencing the OS response in AD were then identified by the overlap among DEGs, module genes, and OSRGs. The GO and KEGG enrichment analyses of candidate genes were performed using clusterProfiler (version 4.2.2), with a significance criterion of p < 0.05. Furthermore, genomic location distributions and expression levels of the candidate genes were visualized using OmicCircos (version 1.32.0).

Identification and correlation analysis of hub genes

A PPI network of the hub candidate genes was constructed using the STRING website, and core network genes selected using the MCODE plug-in of Cytoscape (version 3.8.2). The genes in the core network were then evaluated for distinguishing AD from normal samples using ROC curve analysis, and those with AUC > 0.7 were selected. These candidate hub genes were further verified by ROC analysis using GSE118553 and GSE5281 dataset samples. Candidates showing reliable group classification performance in all datasets were selected as the final hub genes. Subsequently, the correlation in expression level between each hub gene pair was calculated and scatter plots drawn for the gene pairs with strongest positive and negative correlations.

Immune infiltration analysis and GSEA

Neuroinflammation is a major driver of AD progression, so we examined if these hub genes were associated with the infiltration of various immune cells56. Correlations with the infiltration of 28 immune cell types in samples from GSE132903 were estimated by single-sample (ss)GSEA using GSVA (version 1.42.1). To further explore the functions of these hub genes, samples with high and low expression were analyzed using The Molecular Signatures Database (MSigDB). Among the significantly enriched KEGG pathways (p. adj < 0.05), the top five with normalized enrichment score (NES) > 0 and NES < 0 were selected for presentation.

Regulatory networks and target drugs of hub genes

The miRNAs and transcription factors (TFs) regulating hub genes were predicted using the miRNET database57. To improve the clarity of network diagrams, only miRNAs targeting at least three hub genes were screened and visualized. Additionally, drugs influencing expression of hub genes were predicted using the Drug–Gene Interaction database (DGIdb)58, which includes drug–gene interaction data from 30 sources.

Construction of a diagnostic model

A diagnostic model based on expression levels of the five hub genes was then established using GSE132903 as the training set. The expression levels of all hub genes were extracted and subjected to LASSO regression analysis to derive coefficients for diagnostic model construction. To evaluate the effectiveness of the diagnostic model, all samples in GSE132903 were categorized as disease or control, and the effectiveness of this categorization verified by plotting a confusion matrix and ROC curves. In addition, these analyses were repeated using GSE5281 as an external validation set.