Predicting severity in COVID-19 disease using sepsis blood gene expression signatures | Scientific Reports –

Mechanisms of sepsis severity and mortality in COVID-19 patients

To identify COVID-19 specific severity mechanisms, we initially compared the whole blood gene expression profiles associated with defined severity groups from a cohort of 124 patients recruited at various times relative to hospital admission. Patient severity was assessed using two measures, one based on the requirement for respiratory support and the second based on whether a patient died in hospital. Based on principal component analysis (PCA) of global gene expression profiles, patients could be well separated according to respiratory support requirement group and eventual in-hospital mortality on two dimensions (Supplemental Fig. 1a). The association of several parameters (including respiratory support requirement group, various comorbidities, and time of sampling relative to hospital admission) with gene expression principal components was also explored using multiple linear regression (Supplemental Fig. 1b). These results indicated strong association with the second principal component, PC2, for eventual mortality (p value = 0.000059; multiple linear regression) and respiratory support group (p value = 0.0062), with PC2 explaining 17.8% of the total gene expression variance. Interestingly, other influences on gene expression, including sex, non-COVID-19 active infections, active cancer diagnosis, etc., were not associated with the most variable PCs (i.e., PC1-4).

Differential expression analysis comparing patient severity groups, as defined by need for respiratory support, indicated several thousands of genes were dysregulated in severe COVID-19 disease. Most notably, when comparing severe vs. moderate patients, there were 2671 differentially expressed (DE) genes, with 1794 up-regulated and 877 down-regulated (≥ ± 1.5-fold change; adjusted p ≤ 0.05). These were analyzed for over-representation of Reactome22 and Hallmark gene-sets23, which capture pathways and well-defined biological processes, respectively. Up-regulated genes differentiating severe vs. moderately ill patients were enriched in several pathways, including those involving neutrophil degranulation, interferon α/β/γ signaling, interleukin (IL)-4 and IL-13 signaling, oxidative stress, and signaling by NOTCH. Furthermore, canonical inflammation hallmarks such as TNF-α signaling via NFκ-B and the Inflammatory response were also enriched (Fig. 1). Down-regulated DE genes were enriched in other pathways/hallmarks, including adaptive immune pathways such as programmed cell death-1 (PD-1) signaling, generation of second messenger molecules, TCR signaling, and allograft rejection. These results indicated that while anti-viral responses were enhanced, T-cell responses and activity were likely reduced during severe COVID-19 disease, consistent with severe all-cause sepsis12,28. The severe vs. intermediate comparison also showed a significant dysregulation of gene expression (535 up-regulated, 159 down-regulated); with enriched pathways/hallmarks including neutrophil degranulation, interferon α/β/γ signalling, coagulation, and complement. The intermediate vs. moderate comparison yielded relatively few DE genes (136 up-regulated and 33 down-regulated) and enriched pathways/hallmarks, indicating these groups were most similar based on gene expression (or conversely that the intermediate group showed some heterogeneity).

Figure 1
figure 1

Functional characterization of the gene expression differences between severity groups and eventual mortality in Quebec COVID-19 patients. Shown are the top 10 enriched pathways/hallmarks for each comparison, direction, and gene set database assessed (i.e., lowest p values). Severity groups were assessed using the respiratory support required at time of sampling with 54 severe, 28 intermediate, and 42 moderate patients. There were 20 patients who died and 100 patients who survived. The severe vs. moderate, severe vs. intermediate, and intermediate vs. moderate, died vs. survived, comparisons yielded 2671 (1794 up, 877 down), 694 (535 up, 159 down), 169 (136 up, 33 down), and 1708 (1006 up, 702 down) DE genes, respectively (displaying ≥  ± 1.5-fold change; adjusted p ≤ 0.05). Age, sex, batch, and time of sampling relative to hospital admission were included in the underlying DESeq2 model to correct for their contribution to gene expression. A post-hoc power analysis (performed with the online tool RnaSeqSampleSize53) using parameters estimated from previous RNA-Seq studies, indicated our study was sufficiently powered (p = 0.8) to detect DE gene differences in gene expression (at least 15% of transcriptome) between patient groups.

When comparing patients who succumbed in the hospital to those who survived, there were 1006 up-regulated and 702 down-regulated genes. The majority of these genes overlapped with the DE genes obtained when comparing severe vs. moderate patients (1122 overlapping genes) (Fig. 1). As such, neutrophil degranulation, interleukin signaling, down-regulation of T cell signaling pathways, and inflammatory hallmarks were commonly enriched between respiratory support severity and eventual mortality comparisons. Interestingly, heme metabolism, cholesterol homeostasis, and p53 pathways were uniquely enriched in DE genes from the mortality comparison, indicating that additional mechanisms became prominently dysregulated when mortality was imminent. Indeed, heme metabolism was the most significantly enriched hallmark among genes DE between patients who died vs. survived, and was downregulated (Fig. 1). This suggests that the down-regulation of heme metabolism is likely a specific feature of severe Covid-19 disease and was most evident among patients who succumbed to this disease. We further examined gene expression data of previously published COVID-19 positive and negative patients from Toronto, Ontario, who were recruited on the first day of ICU admission for COVID-19 specific pathways (GEO accession GSE185263)12. Highly enriched interferon-α/β/γ signaling (up-regulated) and heme metabolism (down-regulated) was apparent in COVID-19 positive patients when compared to negative patients as well as in COVID-19 positive compared to uninfected healthy controls (Supplemental Fig. 2).

The common enrichment of the neutrophil degranulation pathway across severity comparisons indicated that neutrophil activity and/or abundance was a marked feature of severe COVID-19 sepsis. This feature has been described in previous studies of all-cause sepsis, with several neutrophil and granulocyte genes and pathways being dysregulated in all-cause sepsis patients progressing to organ dysfunction/failure12,29. The ratio of neutrophils to lymphocytes cellular abundance in patients has been touted as a moderately accurate (AUC ≈ 0.695) and easily measurable prognostic biomarker of sepsis severity and outcomes in patients with advanced sepsis30. We estimated the neutrophil to lymphocyte ratio using the gene expression-based deconvolution program CIBERSORT31 and from cell differential values measured in hospital in order to explore their association to respiratory support group (Supplemental Fig. 3). It was clear that patients with severe and intermediate COVID-19 had the highest neutrophil to lymphocyte ratio which significantly differed from that of moderate patients (p = 9.2 × 10–7 by the Kruskal–Wallis test using CIBERSORT estimated ratios).

Although our initial motivation was to identify COVID-19 specific markers of patient severity, it was apparent from our analyses that existing comorbidities in patients contributed to the observed gene expression values (Supplemental Fig. 1b), including chronic renal, lung and heart insufficiencies and active cancer. It is well-accepted that existing comorbidities influence the risk of sepsis and sepsis severity32, however, few studies have examined the role of comorbidities on the transcriptomic profiles in COVID-19 sepsis. Accordingly, we performed DE analysis, followed by pathway/hallmark analysis, to identify mechanisms dysregulated between COVID-19 patients with existing chronic insufficiencies vs. those without (Supplemental Fig. 4). The largest dysregulation observed was among patients with chronic lung insufficiencies, which is notable considering SARS-CoV-2 infection is characterized by progression of the infection into the lower respiratory tract (i.e., lung tissue). This comparison yielded down-regulated MSigDB hallmarks, including complement, coagulation, and inflammatory responses. Interestingly, patients with chronic renal insufficiencies, cf. those with no chronic renal issues, also exhibited dysregulation of unique mechanistic subsets, including an up-regulation of interferon α/β/γ signaling, which was also observed among comparisons of Covid-19 severity. When comparing those with chronic heart insufficiencies vs. not, few DE genes and pathways/hallmarks arose. These results indicate that overall specific gene expression differences occurred due to pre-existing comorbidities during COVID-19 disease, highlighting the possibility of adjusting therapeutic interventions according to comorbidity status.

COVID-19 patients displayed dysregulation of all-cause sepsis severity signatures

Several mechanisms were dysregulated between COVID-19 severity groups, and many of these could be biomarkers of COVID-19 pathogenesis and thus valuable for patient prognostication. Some of these mechanisms have been implicated in sepsis, including those reflecting neutrophil activity, inflammatory cytokines, and adaptive immune pathways strengthening the notion that severe COVID-19 is a type of sepsis. To further indicate the extent to which severe COVID-19 overlaps with all-cause sepsis, we took the reverse approach, specifically determining whether prognostic gene expression signatures uncovered in all-cause sepsis cohorts are dysregulated in COVID-19 patients.

We previously described reduced 8–12 gene mechanistic signatures of sepsis severity uncovered in all-cause sepsis training cohorts11,12. These included the cellular reprogramming (CR; 8 genes), Organ Dysfunction (12 genes), and Mortality (10 genes) signatures, which were identified and validated in early sepsis patients at first clinical presentation to the ER as well as in the ICU. The CR and Organ Dysfunction signatures were shown to be highly accurate in predicting impending severe sepsis and organ dysfunction. Similarly, the Mortality signature was strongly associated with 30-day mortality and reasonably predictive of impending mortality. We determined whether these signatures were dysregulated between COVID-19 severity groups using ROAST, a supervised gene-set DE algorithm that assesses whether a set of genes, as a unit, is significantly up- or down-regulated25.

When analyzing all patients among the Quebec cohort, the CR, Organ Dysfunction, and Mortality signatures were strongly associated with the assessed endpoints (Table 2). In particular, when comparing severe vs. moderate patients each signature displayed p values < 0.0001, indicating a strong degree of signature dysregulation between these patient groups. This suggested that all-cause sepsis signatures were also highly relevant to COVID-19 pathogenesis in severely-afflicted patients. It is important to note that this cohort was comprised of patients sampled early and late in their hospital stays, and, as indicated in Supplemental Fig. 1b, sampling to admission time played a significant role in gene expression variability. Accordingly, we assessed the gene-set level DE of the severity signatures among patients’ groups separated by sampling time (i.e., early and late). This was done to minimize the role that disease progression might have had in the performance of the signatures. Patients were stratified into early (sampled within 1–5 days) and late (sampled between 6 and 20 days) post-hospital admission groups to assess whether gene expression signatures exhibited altered behaviour when accounting for this source of heterogeneity. The results indicated that across early and late sampled patients, the signatures were still significantly dysregulated between severe vs. moderate and dead vs. surviving patients. Severe vs. intermediate and intermediate vs. moderate patients exhibited lesser dysregulation, and this was most evident in late sampled patients. The gene set level dysregulation of the full CR, Organ Dysfunction, and Mortality signatures12 was also assessed (Supplemental Table 1), and displayed similar results to those obtained using the reduced signatures (Table 2).

Table 2 Gene-set differential expression of the reduced cellular reprogramming/CR (8 genes), organ dysfunction (12 genes), and mortality (10 genes) signatures in COVID-19 patient cohorts.

We also determined the predictive accuracy of the CR, Organ Dysfunction, and Mortality signatures when used to directly predict patient severity groups. Logistic regression models, for which the reduced signature gene sets provided input into the model as covariates/predictors (Fig. 2a), were trained to predict respiratory support group and eventual mortality. In the case of respiratory support group, a binary classification scheme was trained to predict severe vs. intermediate + moderate patients. We combined the intermediate and moderate patients because their gene expression profiles were most similar when analyzed using differential expression analysis. Due to the modest size of these predictive reduced signatures (8–12 genes), they reflected subsets with the greatest potential for translation, since smaller signatures can be more easily assessed in clinics using qRT-PCR. A tenfold repeated cross validation was used to estimate predictive performance. The Organ Dysfunction (mean AUC = 0.79, sensitivity = 0.71, specificity = 0.68) and CR (mean AUC = 0.86, sensitivity = 0.80, specificity = 0.79) signatures displayed good performance in early sampled patients. Among late sampled patients, the Organ Dysfunction (mean AUC = 0.92, sensitivity = 0.80, specificity = 0.77) and CR (mean AUC = 0.91, sensitivity = 0.83, specificity = 0.81) signatures also displayed excellent performance.

Figure 2
figure 2

Predictive performance of the reduced CR, Organ Dysfunction, and Mortality signatures among patients of the COVID-19 patient cohorts. (a) Accuracy/AUCs across a tenfold repeated cross validation scheme (i.e., total 100 folds) using logistic regression. (b) Kaplan Meier curves of patients with High or Low Mortality signature enrichment in all patients (i.e., early and late sampled patients). Time to mortality was assessed over 30 days, with the baseline time being the day of sampling. Patients were stratified into either High (score ≥ 0; n = 60) or Low (score < 0; n = 64) signature expression based on GSVA enrichment statistics, and this was used as a predictor of 30-day survival.

The Mortality signature also displayed good performance in predicting eventual mortality in early (mean AUC = 0.86, sensitivity = 0.77, specificity = 0.75) and late (mean AUC = 0.89, sensitivity = 0.78, specificity = 0.86) sampled patients. We also explored the association of the Mortality signature with 30-day mortality using Kaplan Meier survival curves across all patients. The patient-level enrichment/expression scores were estimated for the Mortality signatures using GSVA that summarizes the expression of a gene set when compared to all other genes (i.e., genes not in the set) within each sample. Patients were separated into High (enrichment score ≥ 0; n = 60) and Low (enrichment score < 0; n = 64) signature enrichment categories to test their association with 30-day mortality. There was a significant association of high mortality signature enrichment with 30-day mortality (full mortality signature: log-rank p value = 0.00062; reduced mortality signature: log-rank p value = 0.0013) (Fig. 2b).

COVID-19 patients exhibited evidence of all-cause sepsis endotypes

Our group also recently described, in ER patients at first clinical presentation, five sepsis endotypes. Endotypes are distinct disease groupings characterized by unique pathobiological mechanisms and DE gene expression signatures12. Importantly, endotypes are hypothesized to accurately explain the heterogeneous nature of the sepsis syndrome in contrast to earlier paradigms considering sepsis as a single disease. The early sepsis endotypes included two poor prognosis endotypes, namely the Neutrophilic-Suppressive (NPS) and Inflammatory (INF) endotypes, that were respectively associated with neutrophil activation/immune suppression and increased pro-inflammatory responses. Three additional fair-prognosis endotypes were also uncovered, namely the Innate Host Defences (IHD), Interferon (IFN), and Adaptive (ADA) endotypes. To predict endotype status in other cohorts, five 200-gene endotype-specific signatures were derived. Evidence of sepsis endotypes among COVID-19 patients would further support the notion that severe COVID-19 disease is the same as, or similar to, all-cause sepsis.

Initially, to characterize endotypes in patient whole blood RNA-Seq data, we determined, using a data-driven approach, whether the gene expression data supported the partitioning of these patients into 5 discrete clusters. Cluster stability metrics inferred that 4–6 clusters could be reliably discerned among all patients analyzed together and also among patients separated into early and late sampled groups (Supplemental Fig. 5). We next sought to classify patients into one of the five previously-described all-cause sepsis endotypes. A better assessment of severity signatures was obtained by stratifying patients into the early and late cohort patients, which partially resolved gene expression heterogeneity due to sampling time relative to hospital admission. Accordingly, endotype status was separately predicted in early and late sampled patients (Fig. 3) by calculating a sample-wise gene-set enrichment score for each 200-gene endotype-specific signature using GSVA (v1.30.0). Patients were then assigned to the endotype for which they exhibited the highest expression/enrichment score; this is depicted in Fig. 3 as blocks of red (demonstrating enrichment) in the bottom five rows of each subfigure.

Figure 3
figure 3

Heatmap depicting GSVA endotype signature enrichment scores and endotype classification COVID-19 patients. (a) Endotype classification of early sampled patients. (b) Endotype classification of late sampled patients. Endotype enrichment scores are indicated in the bottom five rows in each subfigure with red blocks indicating positive enrichment. Various metrics are indicated in the top five bars in each subgraph including endotype assignment, patient severity (as assessed by respiratory support requirement at sampling), impending mortality, and ICU admission, as well as Early (top subfigure) or Late (bottom subfigure) sampling. In both the early and late groups, the NPS endotype displayed the worst patient severity and highest mortality rates.

Across both early and late sampled patients, endotype classification worked well to separate patients into five endotypes of similar size. In early sampled patients, these (predicted) endotypes positively correlated with assignment to severe respiratory support requirement group (p value = 0.0015), qSOFA (p value = 0.045), and Glasgow Coma Score (p value = 0.017) (Table 3; Supplemental Table 2). The predicted NPS patients were primarily comprised of patients belonging to the severe respiratory support group (77%; 13/17) and also had the highest mortality, with 29.4% (5/17) of patients succumbing. The INF endotype also displayed a high number of patients belonging to the severe respiratory support group (37.5%; 6/16) in addition to the highest qSOFA scores (recorded early in hospitalization). In early sampled patients, the clinical trends of the predicted NPS and INF endotypes were generally consistent with the all-cause sepsis discovery cohort12. However, a subgroup of more severely afflicted patients (higher severity and mortality) was also evident in the IFN and ADA endotypes, which differed from trends observed in the smaller all-cause sepsis discovery and validation cohorts examined previously12. For example, the patients predicted as part of the generally less severe ADA endotype, displayed high proportions of severe respiratory support patients (29.4%; 5/17).

Table 3 Clinical differences between predicted endotypes in the early sampled group.

In late sampled patients, the predicted NPS endotype patients again displayed the highest rates of mortality (60%) and ICU care (90%) (Supplemental Table 3). Beyond this observation, the patients assigned to other predicted endotypes showed less consistent clinical parameters relative to endotype assignment when compared to the early sampled patients (Table 3) and all-cause sepsis patients12, possibly reflecting the fact that these endotypes were originally predicted at first clinical presentation (ER prior to ICU admission and any diagnosis). This was most apparent in ADA patients, who showed very high proportions requiring ICU care and severe respiratory support. Across early and late sampled patients, we reasoned that these inconsistencies could be attributed to mechanistic/gene-expression influences that were additional to the endotype signatures in COVID-19 patients (for example, Covid-specific gene expression, COVID-19 specific endotype signatures and/or mixed endotypes).

Comparing clinical data between endotypes indicated that sepsis endotypes might be useful for clinical risk stratification in COVID-19 disease. We next examined global gene expression differences between predicted endotypes to characterize highly dysregulated endotype-specific genes. It is reasonable to hypothesize that these genes might have regulatory functions and could be targeted for endotype-specific therapies. We specifically focused on characterizing the NPS and INF-predicted endotypes of the early sampled group, since detection and treatment would be most impactful in these endotypes and disease stage. DE gene expression analysis was performed comparing each endotype to all others (i.e., in a one vs. rest scheme) to identify global gene expression differences between endotypes. The NPS endotype displayed 2728 (1352 upregulated, 1376 down-regulated) unique DE genes while the INF endotype had 573 (520 upregulated, 53 down-regulated genes) unique DE genes.

These differences were explored in the context of protein:protein interaction (PPI) networks, specifically drawing from the International Molecular Exchange Consortium (ImeX) data, which includes the InnateDB database33 (version 5.4) of manually curated immune interactions (Fig. 4). PPI networks reveal known direct-molecular/binding, metabolic or regulatory interactions between individual proteins, and thus represent function-based interactions in cells.

Figure 4
figure 4

Function-based, minimally connected first order protein:protein interaction networks of the top endotype specific DE genes. (a) NPS endotype network. (b) INF endotype network. The top 200 endotype-specific DE genes (by absolute value fold changes) were input into NetworkAnalyst54. Each DE gene set formed a well-connected minimum connected first-order network, indicating that the genes involved are functionally related and likely collectively regulate or play key roles in one or more related biological mechanisms. Red and green nodes are genes with increased and decreased expression specific to the endotype, respectively, while grey nodes are interconnecting first-order interaction nodes. The size of the nodes indicates their connectivity (hub degree) within the network, i.e., how well interconnected any given node is to other nodes in the Network (NB. hubs are key molecules in signalling since they are highly interconnected; they receive and integrate multiple signals and pass them on to downstream nodes). Lines represent edges that indicate known (experimentally-determined) protein:protein interactions derived from (version 5.4).

It was apparent that DE genes from the NPS endotype included both up- (red) and down- (green) regulated genes, consistent with the notion that the endotype captures concurrent neutrophil stimulation and immunosuppressive mechanisms (e.g., down-regulated IL-8, IL7R). Of most relevance was the identification of network hubs that indicate highly connected proteins/genes (shown by the size of the nodes/circles) and reflect their importance in the network as key signaling molecules that receive and integrate multiple biological signals. Importantly, these hubs reflect potential endotype-specific drug targets, which, when targeted, could disrupt the network structure and thus the endotype’s pathophysiological basis of severity. Hubs of the NPS endotype network included peroxisome proliferator-activated receptor gamma (PPARG), synuclein alpha (SNCA), suppressor of cytokine signaling 3 (SOCS3), and C-X-C motif chemokine ligand 8/interleukin-8 (CXCL8/IL-8), among others. The INF endotype DE genes were primarily comprised of up-regulated genes. Hubs of the INF network included forkhead box O3 (FOXO3), galectin 3 (LGALS3), transglutaminase 2 (TGM2), ELAV Like RNA Binding Protein 1 (ELAV-1), and interferon-stimulated gene 15 (ISG15). Many of these can be targeted by existing drugs as compiled in Gene-Cards ( and thus are addressable with repurposed drugs (Supplemental Table 4).