***********************************************************************
 Score and predict probabilities for reduced ejection fraction
***********************************************************************;

** Keep only the 34 predictors in the dataset to be scored. The variable
   names in the input dataset must match the names below for the score
   code to work;

data to_be_scored;
	set input_data;
	keep
		male index_dx_out age dx_defibrillator hosp_chf
		rx_ace rx_antagonist rx_bblocker rx_digoxin rx_loop_diuretic
		rx_nitrates rx_thiazide
		dx_afib dx_anemia dx_cabg dx_cardiomyopathy dx_copd dx_depression
		dx_htn_nephropathy dx_hyperlipidemia dx_hypertension dx_hypotension
		dx_mi dx_obesity dx_oth_dysrhythmia dx_psychosis dx_rheumatic_heart
		dx_sleep_apnea dx_stable_angina dx_valve_disorder
		hf_systolic hf_diastolic hf_left hf_unspecified
	;
run;

** Score using a code file;

data Scores;
	set to_be_scored;
	%inc 'C:\Users\rjd48\Dropbox (Partners HealthCare)\Work\Since Apr 2015\CV projects\CHF\Bayer\Write ups\Algorithms paper\Revision\Round 2\Score codes.txt';
	** I_hf_category and U_hf_category classify each patient into the class
	   with the higher (>0.5) predicted probability. They are dropped because
	   a different cutoff, chosen to maximize prediction accuracy in the
	   training dataset, is applied below;
	drop I_hf_category U_hf_category;
run;

** Classify as rEF when the predicted probability exceeds the selected
   cutoff of 0.4686, and as pEF otherwise;

data predicted_cat;
	set scores;
	predicted_cat='pEF';
	if P_hf_category1 > 0.4686 then predicted_cat='rEF';
run;

proc freq data=predicted_cat;
	table predicted_cat;
run;
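
/* Optional sanity check (a sketch, not part of the original scoring steps):
   summarize the predicted probability within each assigned class. Assuming
   P_hf_category1 is the predicted probability compared against the cutoff
   above, the minimum for rEF should exceed 0.4686 and the maximum for pEF
   should not. */
proc means data=predicted_cat n min mean max;
	class predicted_cat;
	var P_hf_category1;
run;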