Introduction
The incidence of surgical cardiovascular disease is 238.9 per 100,000 population worldwide [1]. Cardiac surgeries are among the main methods for the treatment of cardiovascular diseases: approximately 1.5 million procedures are performed worldwide each year [2], more than 250,000 of which are performed in China [3]. Despite advances in medical technology, the perioperative mortality of such surgical interventions remains 1.8%–7.8%, depending on the type of surgery [4]. The identification of high risk patients and related risk factors may facilitate the establishment of individualized treatment, thus guiding clinical decision-making and improving patient prognosis.
In the past 30 years, many predictive models, based primarily on large databases, have been established to screen preoperative risk factors and identify high risk patients. Unfortunately, the application of most of these models is limited by insufficient methodological rigor, lack of external validation, or poor external validation results. The most commonly used models worldwide are the European System for Cardiac Operative Risk Evaluation II (EuroSCORE II) [5] and the Society of Thoracic Surgeons (STS) cardiac surgery risk model [6–9]. The former model covers almost all types of cardiac surgery, whereas the latter model establishes a separate prediction algorithm for each type of surgery. In China, the most widely applied risk assessment score for coronary artery bypass grafting (CABG), the Sino System for Coronary Operative Risk Evaluation (SinoSCORE), was established on the basis of the cardiovascular surgery registry [10]. By analyzing the relevant literature, this review preliminarily compares the three risk prediction models (Table 1).
Table 1. Comparison of Research Methods for Three Risk-Prediction Models.
Characteristic | EuroSCORE II | STS score | SinoSCORE |
---|---|---|---|
Type of surgery | Major cardiac surgery | Seven types | CABG |
Study population | 22,381 | 965,063* | 9564 |
Number of centers involved | 154 | 819 | 43 |
Number of variables | 18 | Four categories† | 11 |
Primary outcome | Mortality at the base hospital | Operative mortality | In-hospital mortality |
Secondary outcomes | Mortality at 30 and 90 days | Renal failure, stroke, reoperation because of any cause, prolonged ventilation, deep sternal wound infection, composite major morbidity or mortality, prolonged length of stay (>14 days), short length of stay (<6 days and alive) | None |
*Total of 774,881 coronary artery bypass grafting surgeries, 88,521 isolated valve surgeries, and 101,661 valve plus coronary artery bypass grafting surgeries. †Continuous variables, binary variables, categorical variables, interaction terms. Abbreviations: CABG (coronary artery bypass grafting).
EuroSCORE II
EuroSCORE II is an update to the original EuroSCORE [11], which was launched in 1999. The original EuroSCORE was derived from a European cardiac database including patients who had undergone cardiac surgery before the end of 1995, most of whom underwent CABG, and nearly one-third of whom underwent valve surgery [12]. The risk assessment system weighted 17 risk factors: nine patient-related, four derived from preoperative cardiac status, and four dependent on the timing and nature of the procedure. The total risk score was stratified to predict 30-day postoperative mortality on the basis of patient scores. EuroSCORE, despite having been widely accepted and used in Europe, North America, and Asia in the first few years after its development, tends to overestimate the risk of death in low risk patients while underestimating the actual risk of death in high risk patients [5]. EuroSCORE II, released in 2012, builds on the previous version, and its new dataset includes 22,381 patients from 154 centers in 43 countries in Europe. In this updated version, the model was rebuilt by adding, deleting, or redefining variables to estimate in-hospital mortality.
The new model is based on the assessment of the following risk factors: patient-related and cardiac-related factors (NYHA class, grade 4 angina pectoris, insulin-dependent diabetes mellitus, age, sex, peripheral vascular disease, chronic obstructive pulmonary disease, reduced mobility due to nerve or muscle dysfunction, renal insufficiency assessed by creatinine clearance instead of the serum creatinine used in the original version, previous cardiac surgery, left ventricular (LV) function, active endocarditis, recent myocardial infarction, and pulmonary artery pressure) and surgery-related factors (emergency status and three categories of procedure: single non-CABG surgery, combination of two surgeries, and combination of three or more surgeries). These factors can be scored with an online tool (http://www.euroscore.org/calc), and the model showed highly satisfactory calibration and discriminatory ability in the internal validation.
Application of EuroSCORE in Western Countries
Biancari et al. [13] have evaluated the performance of EuroSCORE in the prediction of postoperative mortality and adverse events in 1027 Finnish patients receiving CABG. This assessment model performed well in the prediction of death, postoperative dialysis, prolonged use of inotropes, and prolonged ICU stay, with AUC values of 0.852, 0.805, 0.748, and 0.793, respectively.
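The AUC values quoted throughout this review measure discrimination: the probability that a randomly chosen patient who died was assigned a higher predicted risk than a randomly chosen survivor. The following is a minimal Python sketch of that calculation, assuming a simple pairwise (Mann-Whitney) formulation. The helper name `auc` and the six-patient data set are hypothetical and purely illustrative; they are not taken from any of the cited studies.

```python
from itertools import product


def auc(predicted_risk, died):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    predicted_risk: model-predicted probabilities of death
    died: 1 if the patient died, 0 if the patient survived
    """
    deaths = [p for p, y in zip(predicted_risk, died) if y == 1]
    survivors = [p for p, y in zip(predicted_risk, died) if y == 0]
    if not deaths or not survivors:
        raise ValueError("need at least one death and one survivor")
    # Count death/survivor pairs in which the death had the higher
    # predicted risk; tied predictions count as half a pair.
    score = 0.0
    for d, s in product(deaths, survivors):
        if d > s:
            score += 1.0
        elif d == s:
            score += 0.5
    return score / (len(deaths) * len(survivors))


# Synthetic illustration only: six patients, two of whom died.
risk = [0.02, 0.05, 0.06, 0.30, 0.08, 0.01]
outcome = [0, 0, 1, 1, 0, 0]
print(auc(risk, outcome))
```

In this toy example, seven of the eight death/survivor pairs are ranked correctly, so the AUC is 7/8. An AUC of 0.5 corresponds to chance-level discrimination.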
In another study, Garcia et al. [14] have tested the validity of EuroSCORE and EuroSCORE II in a Spanish population and found that the mortality predicted by EuroSCORE II was closer to the actual value. The actual observed value was 6.5%, and the value predicted by EuroSCORE II was 5.7% (AUC = 0.79, HL: P < 0.001), whereas the value predicted by EuroSCORE was 9.8% (AUC = 0.77, HL: P < 0.001). Thus, EuroSCORE II was considered an acceptable predictor of in-hospital mortality after cardiac surgery in the Spanish population, with a tendency to underestimate risk.
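The HL P values reported alongside each AUC come from the Hosmer-Lemeshow goodness-of-fit test, in which a small P value indicates that predicted and observed deaths disagree across risk groups, that is, poor calibration. A minimal sketch of the test statistic follows, assuming patients are sorted by predicted risk and split into groups of nearly equal size. The function name and the data are hypothetical. Converting the statistic to a P value requires a chi-square distribution (commonly with the number of groups minus 2 degrees of freedom), which is omitted here to keep the sketch within the standard library.

```python
def hosmer_lemeshow_statistic(predicted_risk, died, groups=10):
    """Hosmer-Lemeshow chi-square statistic over risk-ordered groups."""
    pairs = sorted(zip(predicted_risk, died))
    n = len(pairs)
    if n < groups:
        raise ValueError("need at least one patient per group")
    statistic = 0.0
    for g in range(groups):
        # Split the risk-sorted patients into nearly equal-size groups.
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        observed = sum(y for _, y in chunk)   # deaths seen in this group
        expected = sum(p for p, _ in chunk)   # deaths the model predicts
        mean_risk = expected / size
        # Skip degenerate groups whose variance term would be zero.
        if 0.0 < mean_risk < 1.0:
            variance = size * mean_risk * (1.0 - mean_risk)
            statistic += (observed - expected) ** 2 / variance
    return statistic


# Synthetic illustration only: 10 patients at 10% risk, 10 at 50% risk.
risk = [0.1] * 10 + [0.5] * 10
well_calibrated = [1] + [0] * 9 + [1] * 5 + [0] * 5
miscalibrated = [1] * 3 + [0] * 7 + [1] * 5 + [0] * 5
print(hosmer_lemeshow_statistic(risk, well_calibrated, groups=2))
print(hosmer_lemeshow_statistic(risk, miscalibrated, groups=2))
```

In the well-calibrated toy cohort the statistic is essentially zero, whereas three observed deaths against one expected death in the low-risk group raise it to roughly 4.4. Larger statistics correspond to smaller P values.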
Singh et al. [15] have prospectively collected clinical data for 1666 patients receiving cardiac surgery and grouped them according to the type of surgery for validation analysis of EuroSCORE II. EuroSCORE II showed strong discriminatory power and good calibration for 30-day mortality in the New Zealand patients receiving CABG (AUC = 0.934, H–L: P = 0.848). Although the discriminatory ability and calibration were good for the overall cohort (AUC = 0.831, H–L: P = 0.317), the model discriminated poorly in valve, thoracic aortic, and complex combined cardiac surgery.
Osnabrugge et al. [16] have evaluated the performance of EuroSCORE II in a large multicenter database in the United States and compared it with the STS score. The discrimination of EuroSCORE II was poorer than that of the STS score, with AUC values of 0.77 and 0.81, respectively.
EuroSCORE II has been widely used in European countries to assess the risk of cardiac surgery and has important predictive value in various postoperative outcomes.
Application of EuroSCORE II in Eastern Countries
EuroSCORE II has also been extensively implemented in China. A multicenter retrospective study [17] in 11,170 patients from four medical centers (Beijing Fuwai Hospital, Shanghai Changhai Hospital, Shanghai Fudan University Zhongshan Hospital, and Guangzhou Chuangdong Cardiovascular Research Institute) was conducted from January 2008 to December 2011. Data from patients undergoing heart valve surgery were used to assess the predictive performance of EuroSCORE II. The updated model version showed better prediction accuracy: the actual mortality was 2.02%, whereas the predicted mortality was 2.62% (C-index = 0.72). In addition, discrimination was reasonable for postoperative ARDS (C-index = 0.75) and prolonged ventilation time (C-index = 0.70), but poor for acute renal failure (C-index = 0.65) and prolonged ICU stay (C-index = 0.66). Xinpei et al. have found that EuroSCORE II was not suitable for predicting prolonged ventilation time in a study of 110 patients receiving CABG; its AUC value of <0.6 might have been due to the small sample size. Another study validating the performance of this predictive model in patients receiving CABG obtained good results in predicting in-hospital mortality (AUC = 0.762, H–L: P = 0.191, O/E = 1.24) [18]. On the basis of the study results, the researchers considered EuroSCORE II suitable for use in Chinese patients receiving CABG. However, Bai et al. [19] have found that the model underestimated the mortality of high risk patients receiving CABG (O/E = 1.58) and thus was not suitable for risk assessment in high risk populations.
EuroSCORE II has also been validated in other Eastern countries. A study in India [20] has indicated that the model underestimated mortality, but had better discriminatory ability and calibration than the previous version; its discriminatory power was good in all cardiac and valve surgeries combined with CABG, and it scored the best in valve surgery.
A Japanese study evaluating EuroSCORE II [21] has indicated poor discriminatory power in predicting death (AUC = 0.66) and underestimation of the risk of death in high risk patients (O/E = 1.44).
A validation study of EuroSCORE II in Iran [22] has also revealed poor prediction of mortality in local patients undergoing cardiac surgery, with poor discriminatory power and model fitting (AUC = 0.667, HL: P < 0.01).
Studies increasingly suggest that the predictive power of this updated model can be increased by considering other combinations of risk factors. In this regard, Li et al. [23] have found that an increase in cardiac troponin T, which specifically and accurately reflects myocardial injury, had a higher predictive ability than EuroSCORE II. Bai et al. [24] have argued that the composite SYNTAX anatomical score improved the predictive power of EuroSCORE II for mortality in patients receiving CABG. Therefore, in the future, more factors must be considered for inclusion in this predictive model to improve its present effectiveness and accuracy, and extend its existing capabilities.
The predictive performance of EuroSCORE II is generally good in European populations but weaker in other regions. In addition to postoperative death, the model has predictive value for other postoperative complications. Because EuroSCORE II did not perform risk stratification by population before modeling, it has limited ability to predict mortality in high-risk groups and generally underestimates their risk.
STS Score
The STS score is a model based on the Society of Thoracic Surgeons Adult Cardiac Surgery Database. According to the type of surgery, it is divided into seven models, including an isolated CABG model, an isolated valve surgery model, and a valve combined with CABG model [6–8]. The score is calculated online through a website (http://risk-calc.sts.org/STSWebRisk-Calc273/de.aspx). The model includes nine predicted endpoints: operative mortality, stroke, renal failure, prolonged ventilation, reoperation, deep sternal infection, a composite of these outcomes, and prolonged or shortened postoperative hospital stay. Compared with other models, the STS score is more targeted, and its prediction range includes eight outcomes in addition to mortality.
Application of the STS Score in Western Countries
Since its establishment, the STS score has been widely implemented in cardiac surgery risk assessment worldwide.
Ad et al. [25], when evaluating the mortality risk of cardiac surgery patients in the United States, have confirmed that the mortality predicted by the STS score was closest to the actual value, although all scores performed well. The actual mortality rate of the patients was 1.8%, and the value predicted by the STS score was 2.7%, whereas the values predicted by EuroSCORE II and EuroSCORE were 3.3% and 7.8%, respectively.
Kirmani et al. [26] have found that the STS score and EuroSCORE II have equivalent discriminatory ability in predicting mortality in the UK population.
In another investigation by Singh et al. [27] in 933 patients from New Zealand who received isolated CABG, a validation analysis indicated excellent discriminatory ability and calibration of the STS score (AUC = 0.921, HL: P = 0.294). These findings suggest that the STS score has good predictive value for postoperative 30-day mortality in patients in this region.
Balan et al. [28] have reported good predictive value of the model for 30-day mortality after aortic valve replacement and have shown the model’s suitability for assessing the risk of death in patients undergoing percutaneous aortic valve replacement. In a study performed by Duchnowski et al. [29], EuroSCORE II and the STS score had satisfactory discrimination in predicting 30-day and 1-year mortality in patients undergoing aortic valve replacement, with both AUC values exceeding 0.8.
Application of the STS Score in Eastern Countries
The STS score has not performed ideally in China. Fuwai et al. [30] have verified the adaptability of the STS score in China by assessing the risk of in-hospital mortality in 9846 patients undergoing valve surgery (AUC = 0.734, HL: P = 0.58), and have found poor predictive effects for patients with multi-valve surgery and for patients overall. Wang et al. [31] have also analyzed the data for 4493 patients older than 16 years who had undergone single-valve surgery, and used the STS score to predict cerebrovascular accidents, renal failure, prolonged ventilation time, reoperation, and postoperative hospital stay. Good calibration and discrimination were achieved for most major complications: cerebrovascular accident (AUC = 0.714, HL: P = 0.052), renal failure (AUC = 0.724, HL: P = 0.474), prolonged ventilation time (AUC = 0.717, HL: P = 0.468), and prolonged postoperative hospital stay (AUC = 0.713, HL: P = 0.712). The STS score was deemed suitable for evaluating the major postoperative complications in Chinese single-valve surgery patients, except for reoperation and short postoperative hospital stay.
Zhang et al. [32] have collected data for 1333 patients who underwent heart valve surgery in Fuwai Hospital and used the STS score to predict the risk of prolonged ICU stay (≥124 h). The model showed poor discrimination and calibration (AUC = 0.70, HL: P ≤ 0.001). The researchers also collected the data for 1559 patients who underwent isolated CABG to evaluate the predictive role of the STS score in terms of postoperative 30-day mortality, and found that the model had poor discriminatory power (AUC = 0.619, HL: P > 0.1) [33]. Ma et al. [34] have also confirmed that the STS score had unsatisfactory performance in predicting the 30-day mortality of patients receiving CABG in East China (AUC = 0.681, H–L: P > 0.05). Further research is therefore needed to explore whether this model is suitable for the assessment of Chinese patients receiving CABG.
Gao et al. [35] have established that the STS score had poor predictive performance for aortic and mitral valve replacement. In the aortic valve group, the actual mortality was 1.84%, whereas the predicted mortality was 0.98% (AUC = 0.600, HL: P < 0.05). In the mitral valve group, the observed mortality was 1.55%, and the predicted mortality was 1.32% (AUC = 0.650, HL: P > 0.05). In another study, Kuwaki et al. [21] have found that the STS model score overestimated the mortality of low risk patients (actual value = 1.8%, predicted value = 3.5%, O/E = 0.51) and thus was not suitable for evaluating the mortality risk of patients undergoing aortic valve replacement for aortic stenosis, in contrast to the results obtained in Western countries.
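The observed-to-expected (O/E) ratio used in these comparisons is simply the actual mortality divided by the predicted mortality: values above 1 indicate that the model underestimates risk, and values below 1 indicate overestimation. The short Python sketch below recomputes the ratio from the percentages quoted in the paragraph above. The helper name `oe_ratio` is hypothetical.

```python
def oe_ratio(observed_pct, expected_pct):
    """Observed-to-expected mortality ratio.

    A ratio above 1 means the model underestimates mortality;
    a ratio below 1 means the model overestimates it.
    """
    if expected_pct <= 0:
        raise ValueError("expected mortality must be positive")
    return observed_pct / expected_pct


# (observed %, predicted %) pairs quoted in the paragraph above.
cohorts = {
    "aortic valve group [35]": (1.84, 0.98),
    "mitral valve group [35]": (1.55, 1.32),
    "Kuwaki et al. [21]": (1.8, 3.5),
}
for name, (observed, expected) in cohorts.items():
    print(f"{name}: O/E = {oe_ratio(observed, expected):.2f}")
```

The last pair reproduces the O/E of 0.51 reported by Kuwaki et al., whereas the aortic valve group gives a ratio of about 1.88, indicating that the STS score predicted roughly half of the deaths actually observed.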
The applicability of the STS score in India is also poor. The results of an investigation by Borde et al. [36], assessing the performance of EuroSCORE II and STS score in India, have revealed that both models overestimated the risk of death after cardiac surgery, with AUC values less than 0.7.
The STS score has high application value in Western countries, particularly in the United States, but generally performs poorly in most Eastern countries, possibly because of racial and regional differences in patient populations. In 2018, in response to continuous changes in patient characteristics, risk profiles, and surgical practices, STS developed a new set of risk models based on contemporary patient data [37, 38], in which the predicted endpoints were consistent with those in the older version. In the selection of variables, various preoperative potential risk factors and their interactions were fully considered. However, no external validation has been performed to enable international application of the new model.
SinoSCORE
SinoSCORE [10], established in 2010, is a risk assessment model based on the perioperative data for 9564 patients receiving CABG at 43 heart centers in China. This model implements logistic regression analysis and is accessible via an online assessment tool (http://www.cvs-China.com/sino.asp). The model is based on 11 predictors used to assess the risk of postoperative mortality in patients receiving CABG and is applied mainly in China.
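For readers unfamiliar with how a logistic regression model turns risk factors into a predicted mortality, the following minimal Python sketch shows the general form: a weighted sum of predictors is passed through the logistic function to give a probability between 0 and 1. The intercept, the weights, and the three predictor names are hypothetical values chosen for illustration; they are not the published SinoSCORE coefficients or variable definitions.

```python
import math


def logistic_risk(intercept, coefficients, patient):
    """Predicted probability of death from a logistic regression model."""
    # Linear predictor: intercept plus the weight of each risk factor present.
    linear = intercept + sum(
        weight * patient.get(name, 0) for name, weight in coefficients.items()
    )
    # The logistic function maps the linear predictor to a probability.
    return 1.0 / (1.0 + math.exp(-linear))


# HYPOTHETICAL weights for illustration only; not SinoSCORE's published values.
intercept = -4.0
coefficients = {"age_over_70": 0.8, "renal_insufficiency": 0.9, "non_elective": 1.1}

low_risk = {"age_over_70": 0}
high_risk = {"age_over_70": 1, "renal_insufficiency": 1, "non_elective": 1}
print(f"{logistic_risk(intercept, coefficients, low_risk):.3f}")
print(f"{logistic_risk(intercept, coefficients, high_risk):.3f}")
```

With these made-up weights, the patient with no risk factors has a predicted mortality of about 1.8%, and the patient with all three has about 23%. Because the weights are estimated from the development cohort, a model of this form can lose calibration when it is applied to a population with a different risk profile.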
In the first few years after its release, SinoSCORE showed good performance in predicting mortality and surgical complications after cardiac surgery in different Chinese populations.
A verification study has indicated that SinoSCORE was applicable to southwestern China, on the basis of data primarily from patients undergoing valve surgery [39]. The actual in-hospital mortality rate was 2.25%, and the predicted value was 2.35%. Although the risk of death was overestimated, the discriminatory ability and calibration were favorable (AUC = 0.75, H–L: P = 0.582).
Su et al. [40] have used SinoSCORE to predict postoperative in-hospital mortality and major complications after CABG, such as renal failure, multiple organ failure, and perioperative intra-aortic balloon counterpulsation. The model had good predictive value, with AUC values of 0.81, 0.768, 0.832, and 0.737, respectively. The P-values of the Hosmer-Lemeshow goodness-of-fit test all exceeded 0.05.
Bai et al. [41] have obtained similar results indicating that SinoSCORE has good predictive value for outcomes such as postoperative mortality, low cardiac output, cerebrovascular accident, multiple organ failure, tracheotomy, and insertion of an intra-aortic counterpulsation balloon. All AUC values were greater than 0.7, thereby indicating SinoSCORE’s suitability for patients receiving CABG at that center.
Recent studies have found that the clinical predictive power of SinoSCORE has gradually weakened with changes in the population’s health status, medical technology level, and distribution of surgical types. Li et al. [42] have divided patients receiving CABG into four groups according to the SinoSCORE prediction of mortality: group I (≤0), group II (<2%), group III (2%–5%), and group IV (≥5%). SinoSCORE was found to underestimate the mortality of the patients in group I, but to overestimate the mortality of the other three groups and the overall patient mortality. When the patients were grouped according to whether CABG was combined with other surgeries, all AUC values were less than 0.7, thus indicating SinoSCORE’s poor discriminatory power. Lin et al. [43] have applied SinoSCORE to evaluate the in-hospital mortality of 1976 patients receiving CABG with preoperative heart failure, and have found poor performance, with an AUC of 0.698; the actual mortality rate was 1.41%, and the predicted mortality rate was 7.66%.
Wang et al. [44] have combined SinoSCORE with preoperative serum prealbumin, which is associated with patient nutritional status, to predict the mortality of patients receiving CABG, and have found that the AUC increased by 0.091, a statistically significant difference with respect to SinoSCORE alone.
The decline in the predictive power of SinoSCORE suggests that the risk factors must be updated, and a new predictive model must be established for more regional populations, considering various risk levels and different surgical types.
In 2020, Hu et al. [45] developed and validated a new predictive model for in-hospital all-cause mortality in patients receiving CABG by using the database of the Chinese Cardiac Surgery Registry. This model established 16 independent variables including 21 risk factors, and achieved good discrimination, with a C-index of 0.79 in the training set and 0.78 in the internal validation set. The model included 56,775 patients in the database who underwent CABG, including combined surgery, from January 2013 to December 2016, with in-hospital death as the predicted endpoint. The model was developed with data from 2013 to 2015 and validated with data from 2016. The following predictive variables were finally determined: age, NYHA class, sex, history of previous MI (21 days before surgery), preoperative critical state, renal function, chronic obstructive pulmonary disease, angina, left ventricular ejection fraction, non-elective surgery, combined valve surgery, combined other surgery, history of cerebrovascular accident, and history of previous PCI. Compared with SinoSCORE, this risk model is not confined to isolated CABG, and the number of final predictive variables has been increased to 16, thus increasing the applicability of the model to a larger range of populations. However, the model requires further external validation before it can be used in wider clinical practice.
Comparison of EuroSCORE II, STS Score, and SinoSCORE in a Chinese Population
Zhou et al. [46] have compared the predictive performance of the EuroSCORE II, STS score, and SinoSCORE models for postoperative clinical outcomes of patients undergoing cardiac valve surgery in Heilongjiang Province. The actual in-hospital mortality rate was 4.1%, the prediction result of EuroSCORE II model was 3.30% (AUC = 0.803, HL: P = 0.005), the prediction result of STS was 1.35% (AUC = 0.657, HL: P = 0.381), and the prediction result of SinoSCORE was 5.6% (AUC = 0.812, HL: P = 0.161). Among the three models, SinoSCORE had the best predictive discrimination for post-cardiac surgery death, prolonged hospital stay, prolonged mechanical ventilation, and renal failure in this region. The STS model had the worst predictive discrimination, whereas all models had poor predictive discrimination for reoperation.
Zhang et al. [47] have collected clinical data for 1047 patients with isolated CABG in the Jiangsu area and evaluated the performance of the three models in predicting postoperative mortality. The predicted value of EuroSCORE II was the closest to the actual value, and the calibration of all three models was good. In terms of discrimination, EuroSCORE II and the STS score performed better (both AUC > 0.75) than SinoSCORE (AUC = 0.70). Ma et al. [34] have compared the accuracy of these three models in predicting death in 1616 patients receiving CABG in East China, and found that SinoSCORE achieved excellent discrimination (AUC = 0.888), followed by the STS risk assessment system (AUC = 0.844) and EuroSCORE II (AUC = 0.814).
Another study [48] has reported that in older patients receiving CABG (age ≥70 years) in East China, all three models had poor mortality prediction performance: all three underestimated the mortality rate, although SinoSCORE had better predictive efficiency than the EuroSCORE II and STS risk evaluation systems in patients aged over 70 years. The actual mortality in that study was 2.52%, whereas the expected mortality values of SinoSCORE, EuroSCORE II, and STS for the entire cohort were 0.78%, 1.43%, and 0.78%, respectively. The three models are therefore insufficient for risk prediction among the older population in China, and further research is needed to establish a predictive model suitable for this type of high risk population with poor cardiac prognosis (Table 2).
Table 2. Comparison of Studies Examining Predictive Ability for Mortality in Cardiac Surgery.
Author/year | Population | Region | Conclusion | AUC (SinoSCORE) | AUC (STS score) | AUC (EuroSCORE II) |
---|---|---|---|---|---|---|
Ma X/2017 [34] | CABG 1616 | East China | SinoSCORE achieves the best discrimination. | 0.888 | 0.844 | 0.814 |
Zhou Y/2020 [46] | cardiac valve surgery 1011 | Heilongjiang Province | SinoSCORE performs better than STS score and EuroSCORE II. | 0.812 | 0.657 | 0.803 |
Zhang WR/2014 [47] | CABG 1047 | Jiangsu Province | The predicted value of EuroSCORE II is closest to the actual value. | 0.734 | 0.759 | 0.763 |
Shan L/2018 [48] | CABG 1946 | East China | SinoSCORE achieves better predictive efficiency than the other models. | 0.829 | 0.790 | 0.769 |
Abbreviations: AUC (area under the receiver operating characteristic curve); CABG (coronary artery bypass grafting).
Several studies have noted that many factors have not been incorporated into the existing models. Some researchers believe that frailty is a potential predictor of death and complications after cardiac surgery, and that preoperative frailty assessment is important [49–51]. Polineni et al. [52] have shown that biomarkers such as NT-proBNP can be used to identify patients at higher risk of in-hospital mortality after cardiac surgery. Khan et al. [53] have found that hyponatremia was a previously overlooked risk factor for poor outcomes after cardiac surgery. Risk factors associated with cardiac surgery should be fully considered in future model establishment to achieve better predictive effects.
Summary
To date, the most widely used predictive models worldwide are EuroSCORE II and the STS score, and the model most widely used in China is SinoSCORE. In contrast to the other two models, SinoSCORE was developed from Chinese data; therefore, this model has shown better predictive performance in various domestic studies, although its performance has been gradually weakening (see Figure 1). The reasons for this decrease are as follows. SinoSCORE was established more than 10 years ago, and its modeling cohort included only patients undergoing coronary artery bypass surgery and was small in scale. In addition, during this period, the health status of patients undergoing cardiac surgery, the distribution of surgery types, and the level of medical technology have changed considerably.
According to the survey results, the volume of valve operations in cardiac surgery in China has been increasing for many years and considerably exceeds that of CABG [3]. Tse et al. [54] have shown that TR proportionality (the ratio of the pre-procedural effective regurgitant orifice area to the right ventricular end-diastolic area) outperforms guideline-based classification of TR severity in outcome prediction, and adds incremental prognostic value to current risk-predictive models. Therefore, the existing models currently cannot meet the needs for risk assessment of cardiac surgery in China. Potential preoperative risk factors must be fully considered to establish a comprehensive and effective risk assessment model that can guide clinical practice.