Team Members:

  • Loon Hai Qi
  • Nurul Afiqah Binte Rashid
  • Vincent Yeo Yong Kwang

Problem Statement

Given the first 48 hours of ICU patients' data, predict per patient their:

    1. In-hospital mortality (1: will die, 0: will not die)
    1. Length of stay (days in hospital)

So as to evaluate the severity of patients and priortise them for optimal resources allocation to maximise monitoring and support to those who need them more.

Static Data that are collected at the time the patient is admitted to the ICU:

  • RecordID (a unique integer for each ICU stay)
  • Age (years)
  • Gender (0: female, or 1: male)
  • Height (cm)
  • ICUType (1: Coronary Care Unit, 2: Cardiac Surgery Recovery Unit, 3: Medical ICU, or 4: Surgical ICU)
  • Weight (kg)*.
In [1]:
# OS folder management
import os
from datetime import datetime 

import warnings
warnings.filterwarnings('ignore')

# Basic Python dataframe packages
import pandas as pd
import numpy as np

# Visualization packages
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker

import seaborn as sns
sns.set(font_scale=2)

import math
from statistics import mean

1. Data Exploration & Visualization

Data Consolidation

Below the data is extracted from Project_Data folder. Within this folder, the data is separated into 4 folds.

  • Fold1 Folder
  • Fold2 Folder
  • Fold3 Folder
  • Fold4 Folder
  • Fold1_Outcomes.csv
  • Fold2_Outcomes.csv
  • Fold3_Outcomes.csv
  • Fold4_Outcomes.csv
In [2]:
data_dir = os.path.join(os.getcwd(), "Project_Data")
os.listdir(data_dir)
Out[2]:
['Fold1',
 'Fold1_Outcomes.csv',
 'Fold2',
 'Fold2_Outcomes.csv',
 'Fold3',
 'Fold3_Outcomes.csv',
 'Fold4',
 'Fold4_Outcomes.csv']

Below, the data from each file is extracted and collated into these variables:

  • dirpaths - to collect all directory pathnames
  • outcomes - to collect all csv files
  • folds - to collect all txt.files in each Folds' folder (i.e. Fold1, Fold2, Fold3, Fold4), which contains the patients' data
In [3]:
isFirst = True
dirpaths = []
outcomes = [] # collect csv files
folds = {} # collect lists of txt.files

for (dirpath, dirnames, filenames) in os.walk(data_dir):
    dirpaths.append(dirpath)
    if isFirst:
        outcomes = filenames
        outcomes.sort()
        dirnames.sort()
        folds = {key:[] for key in dirnames}
        isFirst = False
    
    else:
        folds[dirpath[-5:]] = filenames

# print(dirpaths)
# print("Number of outcome files: ", outcomes)

for key, value in folds.items():
    print("Number of patients for", key, 'is', len(value))
Number of patients for Fold1 is 1000
Number of patients for Fold2 is 1000
Number of patients for Fold3 is 1000
Number of patients for Fold4 is 1000

Below, the data from each file is extracted and collated into these variables:

  • static_variables => List of all static variables
  • temporal_variables => List of all temporal variables
In [4]:
# define static and temporal variables
static_variables = ['RecordID', 'Age', 'Gender', 'Height', 'ICUType', 'Weight']
print("Number of static variables: " + str(len(static_variables)))

temporal_variables = {'Albumin': 'Serum Albumin (g/dL)',
    'ALP': 'Alkaline phosphatase (IU/L)',
    'ALT': 'Alanine transaminase (IU/L)',
    'AST': 'Aspartate transaminase (IU/L)',
    'Bilirubin': 'Bilirubin (mg/dL)',
    'BUN': 'Blood urea nitrogen (mg/dL)',
    'Cholesterol': 'Cholesterol (mg/dL)',
    'Creatinine': 'Serum creatinine (mg/dL)',
    'DiasABP': 'Invasive diastolic arterial blood pressure (mmHg)',
    'FiO2': 'Fractional inspired O2 (0-1)',
    'GCS': 'Glasgow Coma Score (3-15)',
    'Glucose': 'Serum glucose (mg/dL)',
    'HCO3': 'Serum bicarbonate (mmol/L)',
    'HCT': 'Hematocrit (%)',
    'HR': 'Heart rate (bpm)',
    'K': 'Serum potassium (mEq/L)',
    'Lactate': 'Lactate (mmol/L)',
    'Mg': 'Serum magnesium (mmol/L)',
    'MAP': 'Invasive mean arterial blood pressure (mmHg)',
    'MechVent': 'Mechanical ventilation respiration (0:false or 1:true)',
    'Na': 'Serum sodium (mEq/L)',
    'NIDiasABP': 'Non-invasive diastolic arterial blood pressure (mmHg)',
    'NIMAP': 'Non-invasive mean arterial blood pressure (mmHg)',
    'NISysABP': 'Non-invasive systolic arterial blood pressure (mmHg)',
    'PaCO2': 'partial pressure of arterial CO2 (mmHg)',
    'PaO2': 'Partial pressure of arterial O2 (mmHg)',
    'pH': 'Arterial pH (0-14)',
    'Platelets': 'Platelets (cells/nL)',
    'RespRate': 'Respiration rate (bpm)',
    'SaO2': 'O2 saturation in hemoglobin (%)',
    'SysABP': 'Invasive systolic arterial blood pressure (mmHg)',
    'Temp': 'Temperature (°C)',
    'TroponinI': 'Troponin-I (μg/L)',
    'TroponinT': 'Troponin-T (μg/L)',
    'Urine': 'Urine output (mL)',
    'WBC': 'White blood cell count (cells/nL)',
    'Weight': 'Weight (kg)'}

print("Number of temporal variables: " + str(len(temporal_variables.keys())))
Number of static variables: 6
Number of temporal variables: 37
In [5]:
# function to extract a patient's data from txt file and store as dataframe in long form
def extractPatientRecords(temp_dir):    
    data = pd.read_csv(temp_dir, sep=",")
    
    patient_id = temp_dir[-10:].split('.')[0] #assume recordID is 6 digits
    
    return (patient_id, data)
In [6]:
# assume patient is unique:
# Extract all patients records from txt files 

all_patients = {} # {patient id: raw dataframe from txt.file}
cv_fold = {} # {fold# : list of all patients'id}

t1 = datetime.now()
print("Started Data Extraction at", t1.strftime('%Y-%m-%d %H:%M:%S'))

for path in dirpaths[1:]:
    # get last 4 chracters from path
    key = path[-5:]
    print("\nData Extraction on", key, "has begun")
    cv_fold[key] = []
    
    # get list from folds
    patients_files = folds[key]
   
    # iterate one file at a time 
    for filename in patients_files:
        patient_id, data = extractPatientRecords(os.path.join(path,filename))        
        if data.empty:
            print("Empty Dataframe found at " + filename)
            continue
        else:
            cv_fold[key].append(patient_id)
            all_patients[patient_id] = data
        
    t2 = datetime.now()
    diff = t2-t1
    print(key, "is done.\nTook about", diff.seconds, "seconds from start")

t2 = datetime.now()
diff = t2-t1
print("\nEntire Data Extraction completed after", diff.seconds, "seconds from start")

print("\nTotal number of folds is", len(cv_fold))
print("Fold1:", len(cv_fold['Fold1']))
print("Fold2:", len(cv_fold['Fold2']))
print("Fold3:", len(cv_fold['Fold3']))
print("Fold4:", len(cv_fold['Fold4']))
print("Total number of patients is", len(all_patients))
Started Data Extraction at 2019-11-01 14:01:28

Data Extraction on Fold1 has begun
Fold1 is done.
Took about 5 seconds from start

Data Extraction on Fold2 has begun
Fold2 is done.
Took about 8 seconds from start

Data Extraction on Fold3 has begun
Fold3 is done.
Took about 11 seconds from start

Data Extraction on Fold4 has begun
Fold4 is done.
Took about 14 seconds from start

Entire Data Extraction completed after 14 seconds from start

Total number of folds is 4
Fold1: 1000
Fold2: 1000
Fold3: 1000
Fold4: 1000
Total number of patients is 4000
In [7]:
#  extract static data frame of a patient
def getPatientStaticData(data):
    static_df = data.loc[data['Time'] == '00:00', :]
    static_df = static_df.loc[data['Parameter'].isin(static_variables)]
    del static_df['Time']
    static_df = static_df.transpose().reset_index(drop=True)
    
    header = static_df.iloc[0] 
    static_df = static_df[1:] 
    static_df.columns = header
    return static_df
In [8]:
# get patients' static dataframe(s) in 4 folds - for training and testing
all_static_dfs_folds = {}

for key, ids_list in cv_fold.items():
    
    all_static_dfs_folds[key] = pd.DataFrame()
    print(key, "has started extracting static data")
    for patient_id in ids_list:
        all_static_dfs_folds[key] = all_static_dfs_folds[key].append(getPatientStaticData(all_patients[patient_id]), ignore_index=True)

    print(key, "has completed\n")

print(len(all_static_dfs_folds), "folds of patients' static data has been extracted")
Fold1 has started extracting static data
Fold1 has completed

Fold2 has started extracting static data
Fold2 has completed

Fold3 has started extracting static data
Fold3 has completed

Fold4 has started extracting static data
Fold4 has completed

4 folds of patients' static data has been extracted
In [9]:
# get patients' static dataframe in one dataframe - for data exploration

all_static_dfs = pd.DataFrame() 

for key, ids_list in cv_fold.items():
    
    print(key, "has started extracting static data")
    for patient_id in ids_list:
        all_static_dfs = all_static_dfs.append(getPatientStaticData(all_patients[patient_id]), ignore_index=True)

    print(key, "has completed\n")

print(len(all_static_dfs), "folds of patients' static data has been extracted")
Fold1 has started extracting static data
Fold1 has completed

Fold2 has started extracting static data
Fold2 has completed

Fold3 has started extracting static data
Fold3 has completed

Fold4 has started extracting static data
Fold4 has completed

4000 folds of patients' static data has been extracted
In [10]:
# last non-nan value of each column.
def findMostRecent(data):
    if data.last_valid_index() is None:
        return np.nan
    else:
        return data[data.last_valid_index()] 
In [11]:
def findEarliest(data):
    if data.first_valid_index() is None:
        return np.nan
    else:
        return data[data.first_valid_index()]
In [12]:
# return a patient's temporal data in a single row given a patient's dataframe of table
# for data exploration
# frequency, most recent, earliest, overall_changes of temporal data
def extractPatientTemporalData(data, aggregate_by="freq"):
    # get Record ID and Overwrite it
    record_id = data.loc[data['Time'] == '00:00', :]['Value'][0]
    age = data.loc[data['Time'] == '00:00', :]['Value'][1] 
    gender = data.loc[data['Time'] == '00:00', :]['Value'][2] 
    height = data.loc[data['Time'] == '00:00', :]['Value'][3] 
    icu_type = data.loc[data['Time'] == '00:00', :]['Value'][4] 
    weight = data.loc[data['Time'] == '00:00', :]['Value'][5]
    
    if aggregate_by == "freq":

        data_1 = data[['Parameter', 'Value']].groupby(['Parameter']).count()
        data_1 = data_1.T
        data_1['RecordID'] = record_id
        data_1['Age'] = age
        data_1['Gender'] = gender
        data_1['Height'] = height
        data_1['ICUType'] = icu_type
        data_1['Weight'] = weight

        return data_1
    
    else:
        data_1 = data[['Time', 'Parameter', 'Value']].groupby(['Time', 'Parameter']).median().reset_index()
        data_pivot = data_1.pivot(index='Time', columns='Parameter', values='Value')
        
        if aggregate_by == "most_recent":
            return data_pivot.apply(findMostRecent, axis=0)
    
        if aggregate_by == "earliest":
            return data_pivot.apply(findEarliest, axis=0)
        
        # difference between the most recent and the earliest
        if aggregate_by == "overall_changes":
            data_pivot = data_pivot.apply(findMostRecent, axis=0) - data_pivot.apply(findEarliest, axis=0)
            data_pivot['RecordID'] = record_id
            data_pivot['Age'] = age
            data_pivot['Gender'] = gender
            data_pivot['Height'] = height
            data_pivot['ICUType'] = icu_type
            data_pivot['Weight'] = weight
            return data_pivot
In [13]:
# Collate all observations into 4 dataframe of patients data of 4 folds - for training and testing
def getAllTemporalDataFrameByAggregationTypeInFolds(cv_fold, all_patients, aggregator="freq"):
    
    all_temporal_dfs_folds = {}
    
    for key, ids_list in cv_fold.items():
        
        all_temporal_dfs_folds[key] = pd.DataFrame()
        print(key, "has started extracting temporal data")
        
        for patient_id in ids_list:
            all_temporal_dfs_folds[key] = all_temporal_dfs_folds[key].append(extractPatientTemporalData(all_patients[patient_id], aggregator), ignore_index=True)

        print(key, "has completed\n")

    print(len(all_temporal_dfs_folds), "folds of patients' Temporals data has been extracted with aggregator", aggregator)
    
    return all_temporal_dfs_folds
In [14]:
# Collate all observations into 1 dataframe of patients data - for data exploration
def getAllTemporalDataFrameByAggregationType(cv_fold, all_patients, aggregator="freq"):
    
    dataframe = pd.DataFrame()
    
    for key, ids_list in cv_fold.items():

        print(key, "has started extracting temporal data")

        for patient_id in ids_list:
            dataframe = dataframe.append(extractPatientTemporalData(all_patients[patient_id], aggregator), ignore_index=True)

        print(key, "has completed\n")

    print(len(dataframe), "patients' Temporals data has been extracted with aggregator", aggregator)
    
    return dataframe

Data Structures

Storage of temporal variables for 2 different aspects of the project:

  1. For model building and evaluation - in 4 folds
  2. Data exploration - 4 folds consolidated into one dataframe

Extracting Frequency of Monitored Temporal Variables

Assumption:

  • Higher frequency of temporal variables may indicate higher significance of that variable for the patients' health condition in length of stay and mortality rate.
  • Higher frequency may indicate lower likelihood of missing data.
In [15]:
# in 4 folds - frequency of monitoring temporal variables
all_temporal_dfs_folds__freq = getAllTemporalDataFrameByAggregationTypeInFolds(cv_fold, all_patients, "freq")
all_temporal_dfs_folds__freq
Fold1 has started extracting temporal data
Fold1 has completed

Fold2 has started extracting temporal data
Fold2 has completed

Fold3 has started extracting temporal data
Fold3 has completed

Fold4 has started extracting temporal data
Fold4 has completed

4 folds of patients' Temporals data has been extracted with aggregator freq
Out[15]:
{'Fold1':      ALP  ALT  AST   Age  Albumin  BUN  Bilirubin  Cholesterol  Creatinine  \
 0    NaN  NaN  NaN  54.0      NaN  2.0        NaN          NaN         2.0   
 1    NaN  NaN  NaN  76.0      NaN  3.0        NaN          NaN         3.0   
 2    2.0  2.0  2.0  44.0      2.0  3.0        2.0          NaN         3.0   
 3    1.0  1.0  1.0  68.0      1.0  3.0        1.0          NaN         3.0   
 4    NaN  NaN  NaN  88.0      1.0  2.0        NaN          NaN         2.0   
 5    1.0  2.0  2.0  64.0      NaN  4.0        1.0          1.0         4.0   
 6    NaN  NaN  NaN  68.0      NaN  4.0        NaN          NaN         4.0   
 7    1.0  1.0  1.0  78.0      1.0  5.0        1.0          NaN         5.0   
 8    NaN  NaN  NaN  64.0      NaN  2.0        NaN          NaN         2.0   
 9    NaN  NaN  NaN  74.0      NaN  3.0        NaN          NaN         3.0   
 10   1.0  1.0  1.0  64.0      1.0  3.0        1.0          NaN         3.0   
 11   NaN  NaN  NaN  71.0      NaN  2.0        NaN          NaN         2.0   
 12   NaN  NaN  NaN  66.0      NaN  2.0        NaN          NaN         2.0   
 13   1.0  1.0  1.0  84.0      NaN  3.0        1.0          NaN         3.0   
 14   NaN  NaN  1.0  77.0      1.0  2.0        NaN          NaN         2.0   
 15   NaN  NaN  NaN  78.0      NaN  3.0        NaN          NaN         3.0   
 16   NaN  NaN  NaN  65.0      NaN  9.0        NaN          NaN        15.0   
 17   NaN  NaN  NaN  84.0      1.0  4.0        NaN          NaN         4.0   
 18   2.0  2.0  2.0  78.0      2.0  7.0        2.0          1.0         7.0   
 19   NaN  NaN  NaN  40.0      NaN  3.0        NaN          NaN         3.0   
 20   3.0  3.0  3.0  48.0      1.0  2.0        3.0          NaN         2.0   
 21   NaN  NaN  NaN  58.0      NaN  2.0        NaN          NaN         3.0   
 22   NaN  NaN  NaN  81.0      NaN  2.0        NaN          NaN         2.0   
 23   NaN  NaN  NaN  35.0      NaN  8.0        NaN          NaN         8.0   
 24   NaN  NaN  NaN  26.0      NaN  4.0        NaN          NaN         4.0   
 25   NaN  NaN  NaN  66.0      NaN  2.0        NaN          NaN         2.0   
 26   NaN  NaN  NaN  80.0      NaN  3.0        NaN          NaN         3.0   
 27   3.0  3.0  3.0  53.0      1.0  4.0        3.0          NaN         4.0   
 28   NaN  NaN  NaN  74.0      NaN  4.0        NaN          NaN         4.0   
 29   NaN  NaN  NaN  80.0      NaN  2.0        NaN          NaN         2.0   
 ..   ...  ...  ...   ...      ...  ...        ...          ...         ...   
 970  1.0  1.0  1.0  59.0      1.0  4.0        1.0          NaN         4.0   
 971  NaN  NaN  NaN  80.0      NaN  3.0        NaN          NaN         3.0   
 972  2.0  2.0  2.0  81.0      2.0  5.0        2.0          1.0         5.0   
 973  2.0  2.0  2.0  43.0      1.0  3.0        1.0          2.0         3.0   
 974  1.0  1.0  1.0  69.0      1.0  6.0        1.0          NaN         6.0   
 975  NaN  NaN  NaN  84.0      1.0  5.0        NaN          NaN         5.0   
 976  1.0  1.0  1.0  60.0      1.0  3.0        1.0          NaN         3.0   
 977  NaN  NaN  NaN  82.0      NaN  4.0        NaN          NaN         4.0   
 978  NaN  NaN  NaN  83.0      NaN  4.0        NaN          NaN         4.0   
 979  NaN  NaN  NaN  80.0      NaN  3.0        NaN          NaN         3.0   
 980  2.0  2.0  2.0  84.0      1.0  6.0        1.0          NaN         6.0   
 981  NaN  NaN  NaN  71.0      NaN  NaN        NaN          NaN         NaN   
 982  NaN  NaN  NaN  89.0      1.0  3.0        NaN          NaN         3.0   
 983  1.0  1.0  1.0  65.0      NaN  3.0        1.0          NaN         3.0   
 984  1.0  1.0  1.0  69.0      1.0  3.0        1.0          1.0         3.0   
 985  NaN  NaN  NaN  50.0      NaN  3.0        NaN          NaN         3.0   
 986  NaN  NaN  NaN  82.0      NaN  4.0        NaN          NaN         4.0   
 987  1.0  1.0  1.0  59.0      1.0  7.0        1.0          NaN         7.0   
 988  NaN  NaN  NaN  19.0      NaN  3.0        NaN          NaN         3.0   
 989  1.0  1.0  1.0  79.0      1.0  2.0        1.0          NaN         2.0   
 990  2.0  2.0  2.0  84.0      1.0  4.0        2.0          NaN         4.0   
 991  NaN  NaN  NaN  66.0      NaN  NaN        NaN          NaN         NaN   
 992  NaN  NaN  NaN  84.0      NaN  NaN        NaN          NaN         NaN   
 993  NaN  NaN  NaN  90.0      NaN  3.0        NaN          NaN         3.0   
 994  NaN  NaN  NaN  71.0      NaN  3.0        NaN          NaN         3.0   
 995  1.0  1.0  1.0  35.0      1.0  2.0        1.0          NaN         2.0   
 996  NaN  NaN  NaN  73.0      NaN  3.0        NaN          NaN         3.0   
 997  NaN  NaN  NaN  81.0      NaN  6.0        NaN          NaN         6.0   
 998  NaN  NaN  NaN  63.0      NaN  3.0        NaN          NaN         3.0   
 999  NaN  NaN  NaN  82.0      NaN  2.0        NaN          NaN         2.0   
 
      DiasABP  ...  RespRate  SaO2  SysABP  Temp  TroponinI  TroponinT  Urine  \
 0        NaN  ...      42.0   NaN     NaN  14.0        NaN        NaN   38.0   
 1       68.0  ...       NaN   6.0    68.0  46.0        NaN        NaN   41.0   
 2       16.0  ...       NaN   1.0    16.0  14.0        NaN        NaN   41.0   
 3        NaN  ...      59.0   NaN     NaN  13.0        NaN        NaN    6.0   
 4        NaN  ...      48.0   NaN     NaN  15.0        NaN        NaN   38.0   
 5       45.0  ...       NaN   5.0    45.0   9.0        1.0        NaN   30.0   
 6       54.0  ...      60.0   NaN    54.0  13.0        2.0        NaN   34.0   
 7       60.0  ...       NaN   5.0    60.0  36.0        2.0        NaN   37.0   
 8        NaN  ...      46.0   NaN     NaN  10.0        NaN        NaN    5.0   
 9       56.0  ...       NaN   3.0    56.0  30.0        NaN        NaN   47.0   
 10       NaN  ...      64.0   NaN     NaN  24.0        NaN        NaN   35.0   
 11      48.0  ...       NaN   4.0    48.0  51.0        NaN        NaN   47.0   
 12       NaN  ...       NaN   NaN     NaN  12.0        NaN        NaN   32.0   
 13       NaN  ...      40.0   1.0     NaN  13.0        NaN        NaN   28.0   
 14       NaN  ...      55.0   NaN     NaN  17.0        1.0        NaN   30.0   
 15      71.0  ...       NaN  13.0    71.0  50.0        NaN        NaN   48.0   
 16      46.0  ...      48.0   2.0    46.0  14.0        NaN        NaN   32.0   
 17       NaN  ...      54.0   NaN     NaN  15.0        NaN        NaN   34.0   
 18      45.0  ...       NaN   NaN    45.0  19.0        NaN        3.0   35.0   
 19      57.0  ...       NaN   3.0    57.0  15.0        NaN        NaN   43.0   
 20       NaN  ...      38.0   NaN     NaN   8.0        NaN        NaN    NaN   
 21      63.0  ...       NaN   4.0    63.0  42.0        NaN        NaN   48.0   
 22       NaN  ...      51.0   NaN     NaN  12.0        NaN        2.0   36.0   
 23       NaN  ...      42.0   NaN     NaN  11.0        NaN        4.0   36.0   
 24       NaN  ...       NaN   NaN     NaN   NaN        NaN        NaN    NaN   
 25       NaN  ...      53.0   NaN     NaN  11.0        3.0        NaN    NaN   
 26      71.0  ...       NaN   6.0    71.0  17.0        NaN        NaN   40.0   
 27      52.0  ...       NaN   1.0    52.0  13.0        NaN        1.0   46.0   
 28      65.0  ...       NaN   8.0    65.0  32.0        NaN        NaN   45.0   
 29       NaN  ...       NaN   NaN     NaN  10.0        NaN        NaN   11.0   
 ..       ...  ...       ...   ...     ...   ...        ...        ...    ...   
 970     29.0  ...       NaN   4.0    29.0  33.0        NaN        NaN   13.0   
 971      NaN  ...       NaN   NaN     NaN  12.0        NaN        1.0   45.0   
 972     43.0  ...       NaN   1.0    43.0   9.0        NaN        NaN   27.0   
 973     25.0  ...       NaN  10.0    25.0   8.0        NaN        NaN   23.0   
 974     53.0  ...       NaN   2.0    53.0   9.0        NaN        1.0   47.0   
 975     42.0  ...       NaN   NaN    42.0  13.0        NaN        NaN   40.0   
 976     65.0  ...       NaN   3.0    65.0  13.0        NaN        3.0   26.0   
 977     58.0  ...       NaN   6.0    58.0  62.0        NaN        NaN   48.0   
 978     76.0  ...       NaN   3.0    76.0  49.0        NaN        1.0   49.0   
 979      NaN  ...      61.0   NaN     NaN  15.0        NaN        2.0   19.0   
 980     51.0  ...       NaN   7.0    51.0  11.0        3.0        NaN   28.0   
 981     56.0  ...       NaN   NaN    56.0  57.0        NaN        NaN   36.0   
 982     49.0  ...       NaN   NaN    49.0  11.0        NaN        NaN   34.0   
 983     19.0  ...       NaN   3.0    19.0  17.0        NaN        2.0   44.0   
 984     60.0  ...       NaN   NaN    60.0  19.0        2.0        NaN   45.0   
 985     50.0  ...       NaN   NaN    50.0  13.0        NaN        NaN   35.0   
 986     40.0  ...      56.0   2.0    40.0  11.0        NaN        NaN   37.0   
 987     53.0  ...       NaN   NaN    53.0  21.0        NaN        NaN   36.0   
 988     45.0  ...       NaN   2.0    45.0  11.0        NaN        NaN   43.0   
 989     59.0  ...       NaN   NaN    59.0  21.0        NaN        NaN   35.0   
 990      NaN  ...      53.0   NaN     NaN  13.0        NaN        3.0   26.0   
 991      NaN  ...      35.0   NaN     NaN   9.0        NaN        NaN    8.0   
 992     50.0  ...       NaN   NaN    50.0  13.0        NaN        NaN   33.0   
 993      NaN  ...       NaN   NaN     NaN   NaN        NaN        NaN    NaN   
 994     13.0  ...       NaN   NaN    13.0  11.0        NaN        NaN   18.0   
 995      NaN  ...      51.0   NaN     NaN  12.0        NaN        3.0   38.0   
 996     20.0  ...       NaN   NaN    20.0   8.0        NaN        NaN   17.0   
 997     26.0  ...      53.0   1.0    26.0  12.0        NaN        NaN   39.0   
 998     46.0  ...       NaN   4.0    46.0  11.0        NaN        NaN   47.0   
 999     71.0  ...       NaN   4.0    71.0  48.0        NaN        NaN   45.0   
 
      WBC  Weight    pH  
 0    2.0    -1.0   NaN  
 1    3.0    76.0   8.0  
 2    3.0    56.7   4.0  
 3    3.0    84.6   NaN  
 4    2.0    -1.0   NaN  
 5    4.0   114.0   7.0  
 6    3.0     3.0   NaN  
 7    3.0   111.0  15.0  
 8    1.0    60.7   NaN  
 9    3.0    66.1  10.0  
 10   4.0    65.0   NaN  
 11   2.0    56.0   6.0  
 12   2.0    84.5   NaN  
 13   3.0   102.6   NaN  
 14   2.0    90.1   NaN  
 15   4.0    63.0  15.0  
 16   2.0    66.3   2.0  
 17   5.0    82.5   NaN  
 18   8.0    72.8  11.0  
 19   2.0    84.7  11.0  
 20   4.0    42.3   NaN  
 21   3.0    98.0  13.0  
 22   2.0    63.7   NaN  
 23   3.0    71.8   NaN  
 24   3.0    -1.0   3.0  
 25   2.0    82.0   NaN  
 26   2.0    60.0   7.0  
 27   4.0    73.5   7.0  
 28   3.0    75.9  13.0  
 29   2.0    70.0   2.0  
 ..   ...     ...   ...  
 970  3.0    81.6   9.0  
 971  2.0    80.0   NaN  
 972  5.0    84.0   1.0  
 973  3.0   113.0   7.0  
 974  4.0    75.0  14.0  
 975  5.0    82.3   6.0  
 976  3.0   107.0   8.0  
 977  5.0    64.7  10.0  
 978  3.0    70.0   4.0  
 979  3.0    62.6   NaN  
 980  5.0    -1.0   7.0  
 981  NaN    79.0   NaN  
 982  2.0    55.3   2.0  
 983  6.0    77.0   9.0  
 984  3.0    63.7  15.0  
 985  3.0    54.0   3.0  
 986  2.0    66.0   3.0  
 987  5.0    95.0   5.0  
 988  3.0    87.5   8.0  
 989  3.0    55.5   6.0  
 990  5.0    73.0   NaN  
 991  NaN    -1.0   NaN  
 992  NaN    60.0   5.0  
 993  5.0    -1.0   NaN  
 994  3.0    61.0   1.0  
 995  3.0    39.8   NaN  
 996  3.0   104.0   1.0  
 997  4.0    95.4   5.0  
 998  2.0    70.5  12.0  
 999  2.0    70.4   6.0  
 
 [1000 rows x 42 columns],
 'Fold2':      ALP  ALT  AST   Age  Albumin  BUN  Bilirubin  Cholesterol  Creatinine  \
 0    1.0  1.0  1.0  56.0      NaN  4.0        1.0          NaN         4.0   
 1    NaN  NaN  NaN  72.0      2.0  6.0        NaN          NaN         6.0   
 2    NaN  NaN  NaN  68.0      NaN  3.0        NaN          NaN         3.0   
 3    NaN  NaN  NaN  77.0      NaN  3.0        NaN          NaN         3.0   
 4    NaN  NaN  NaN  66.0      NaN  1.0        NaN          NaN         1.0   
 5    1.0  1.0  1.0  35.0      1.0  2.0        1.0          NaN         2.0   
 6    NaN  NaN  NaN  79.0      NaN  2.0        NaN          NaN         2.0   
 7    2.0  3.0  3.0  44.0      NaN  4.0        1.0          NaN         4.0   
 8    NaN  NaN  NaN  21.0      NaN  3.0        NaN          NaN         3.0   
 9    4.0  4.0  4.0  71.0      2.0  6.0        4.0          NaN         6.0   
 10   2.0  2.0  2.0  90.0      2.0  5.0        2.0          1.0         5.0   
 11   NaN  NaN  NaN  53.0      NaN  4.0        NaN          NaN         4.0   
 12   NaN  NaN  NaN  70.0      NaN  5.0        1.0          NaN         5.0   
 13   NaN  NaN  NaN  70.0      NaN  2.0        NaN          NaN         2.0   
 14   NaN  NaN  NaN  47.0      NaN  2.0        1.0          NaN         2.0   
 15   NaN  NaN  NaN  47.0      NaN  3.0        NaN          NaN         3.0   
 16   NaN  NaN  NaN  57.0      NaN  2.0        NaN          1.0         2.0   
 17   2.0  2.0  2.0  88.0      2.0  3.0        2.0          NaN         3.0   
 18   NaN  1.0  1.0  90.0      1.0  2.0        1.0          NaN         2.0   
 19   1.0  1.0  1.0  68.0      2.0  3.0        1.0          NaN         3.0   
 20   NaN  NaN  NaN  51.0      NaN  1.0        NaN          NaN         1.0   
 21   1.0  1.0  1.0  52.0      1.0  3.0        1.0          NaN         3.0   
 22   1.0  1.0  1.0  49.0      NaN  2.0        1.0          NaN         2.0   
 23   NaN  NaN  NaN  66.0      NaN  2.0        NaN          NaN         2.0   
 24   NaN  NaN  NaN  78.0      NaN  2.0        NaN          NaN         2.0   
 25   2.0  2.0  2.0  45.0      2.0  3.0        2.0          NaN         3.0   
 26   1.0  1.0  1.0  90.0      1.0  1.0        1.0          NaN         1.0   
 27   NaN  NaN  NaN  83.0      NaN  3.0        NaN          NaN         3.0   
 28   2.0  2.0  2.0  51.0      2.0  3.0        2.0          NaN         3.0   
 29   2.0  2.0  2.0  49.0      2.0  4.0        2.0          NaN         4.0   
 ..   ...  ...  ...   ...      ...  ...        ...          ...         ...   
 970  1.0  1.0  1.0  85.0      1.0  2.0        2.0          NaN         2.0   
 971  NaN  NaN  NaN  60.0      NaN  2.0        NaN          NaN         2.0   
 972  1.0  1.0  1.0  23.0      1.0  5.0        1.0          NaN         5.0   
 973  NaN  NaN  NaN  63.0      NaN  2.0        NaN          NaN         2.0   
 974  NaN  NaN  NaN  65.0      NaN  2.0        NaN          1.0         2.0   
 975  1.0  1.0  1.0  74.0      1.0  1.0        1.0          NaN         1.0   
 976  NaN  NaN  NaN  24.0      NaN  4.0        NaN          NaN         4.0   
 977  NaN  NaN  NaN  32.0      NaN  2.0        NaN          NaN         2.0   
 978  1.0  1.0  1.0  40.0      NaN  2.0        1.0          NaN         2.0   
 979  NaN  NaN  NaN  43.0      1.0  2.0        NaN          NaN         2.0   
 980  1.0  1.0  1.0  79.0      1.0  2.0        1.0          NaN         2.0   
 981  NaN  NaN  NaN  48.0      NaN  2.0        NaN          NaN         2.0   
 982  NaN  NaN  NaN  41.0      NaN  2.0        NaN          NaN         2.0   
 983  1.0  1.0  1.0  81.0      1.0  6.0        1.0          NaN         6.0   
 984  NaN  NaN  NaN  67.0      NaN  2.0        NaN          NaN         2.0   
 985  NaN  NaN  NaN  55.0      NaN  2.0        NaN          NaN         2.0   
 986  NaN  NaN  NaN  22.0      NaN  5.0        NaN          NaN         5.0   
 987  3.0  3.0  3.0  80.0      2.0  8.0        3.0          NaN         8.0   
 988  4.0  4.0  4.0  90.0      2.0  4.0        4.0          NaN         4.0   
 989  3.0  3.0  3.0  65.0      3.0  6.0        3.0          NaN         6.0   
 990  NaN  NaN  NaN  63.0      NaN  3.0        NaN          1.0         3.0   
 991  3.0  3.0  3.0  63.0      1.0  3.0        3.0          NaN         3.0   
 992  NaN  NaN  NaN  64.0      NaN  4.0        NaN          NaN         4.0   
 993  1.0  1.0  1.0  40.0      2.0  6.0        2.0          NaN         6.0   
 994  NaN  NaN  NaN  80.0      1.0  2.0        NaN          NaN         2.0   
 995  1.0  1.0  1.0  87.0      1.0  3.0        1.0          1.0         3.0   
 996  1.0  1.0  1.0  90.0      NaN  4.0        NaN          NaN         4.0   
 997  NaN  NaN  NaN  79.0      NaN  3.0        NaN          NaN         3.0   
 998  NaN  NaN  NaN  88.0      NaN  2.0        NaN          NaN         2.0   
 999  NaN  NaN  NaN  61.0      NaN  3.0        NaN          NaN         3.0   
 
      DiasABP  ...  RespRate  SaO2  SysABP  Temp  TroponinI  TroponinT  Urine  \
 0       28.0  ...       NaN   NaN    28.0  12.0        NaN        NaN   26.0   
 1       44.0  ...       NaN   NaN    44.0  16.0        NaN        NaN   47.0   
 2       49.0  ...       NaN   NaN    49.0  18.0        NaN        4.0   50.0   
 3       52.0  ...       NaN   1.0    52.0  65.0        NaN        NaN   46.0   
 4       41.0  ...       NaN   4.0    41.0  46.0        NaN        NaN   43.0   
 5        NaN  ...      29.0   NaN     NaN   9.0        NaN        NaN    9.0   
 6        NaN  ...       NaN   NaN     NaN  12.0        NaN        NaN   11.0   
 7       59.0  ...       NaN   1.0    59.0  41.0        NaN        NaN   37.0   
 8       24.0  ...       NaN   NaN    24.0  16.0        NaN        NaN   31.0   
 9       59.0  ...       NaN  19.0    59.0  31.0        NaN        NaN   43.0   
 10       4.0  ...      51.0   NaN     4.0  12.0        NaN        4.0   44.0   
 11       NaN  ...       NaN   NaN     NaN  12.0        NaN        NaN   14.0   
 12      34.0  ...       NaN  10.0    35.0  15.0        NaN        4.0   34.0   
 13      67.0  ...       NaN   2.0    67.0  64.0        NaN        NaN   44.0   
 14       NaN  ...       NaN   NaN     NaN  14.0        NaN        NaN   25.0   
 15      56.0  ...       NaN   2.0    56.0  41.0        NaN        NaN   42.0   
 16       NaN  ...      17.0   NaN     NaN  10.0        NaN        NaN   16.0   
 17      43.0  ...       NaN   NaN    43.0  14.0        NaN        NaN   43.0   
 18       NaN  ...      56.0   NaN     NaN  13.0        NaN        3.0   32.0   
 19       NaN  ...       NaN   NaN     NaN  11.0        NaN        NaN   31.0   
 20       3.0  ...      45.0   NaN     3.0   7.0        NaN        NaN    8.0   
 21       6.0  ...       NaN   NaN     6.0  16.0        NaN        NaN   42.0   
 22       NaN  ...       NaN   NaN     NaN   NaN        NaN        NaN    NaN   
 23       NaN  ...       NaN   2.0     NaN  11.0        NaN        NaN   37.0   
 24      46.0  ...       NaN   5.0    46.0  12.0        NaN        NaN   41.0   
 25      71.0  ...       NaN  10.0    71.0  22.0        NaN        NaN   43.0   
 26       NaN  ...      34.0   NaN     NaN   9.0        NaN        1.0    7.0   
 27       NaN  ...      45.0   NaN     NaN   8.0        NaN        2.0    7.0   
 28      32.0  ...       NaN   2.0    32.0  15.0        NaN        NaN   30.0   
 29       NaN  ...       NaN   NaN     NaN  29.0        NaN        1.0   40.0   
 ..       ...  ...       ...   ...     ...   ...        ...        ...    ...   
 970      NaN  ...      44.0   NaN     NaN  18.0        NaN        NaN   31.0   
 971     43.0  ...       NaN   1.0    43.0  39.0        NaN        NaN   31.0   
 972     64.0  ...       NaN   NaN    64.0  22.0        NaN        NaN   49.0   
 973     33.0  ...       NaN   1.0    33.0  34.0        NaN        NaN   43.0   
 974     15.0  ...       NaN   NaN    15.0  11.0        NaN        NaN   21.0   
 975      NaN  ...      57.0   NaN     NaN  10.0        NaN        NaN   30.0   
 976      NaN  ...      29.0   NaN     NaN  12.0        NaN        NaN   25.0   
 977      NaN  ...      34.0   NaN     NaN  11.0        NaN        NaN   15.0   
 978     63.0  ...      74.0   1.0    63.0   9.0        NaN        NaN    8.0   
 979     51.0  ...       NaN   1.0    51.0  14.0        NaN        2.0   46.0   
 980     54.0  ...       NaN   1.0    54.0  12.0        NaN        NaN   27.0   
 981     24.0  ...       NaN   NaN    24.0  31.0        NaN        NaN   43.0   
 982      NaN  ...      27.0   NaN     NaN  11.0        NaN        NaN   23.0   
 983     67.0  ...       NaN   4.0    67.0  66.0        NaN        1.0   45.0   
 984     54.0  ...       NaN   NaN    54.0  17.0        NaN        3.0   29.0   
 985     73.0  ...       NaN   4.0    73.0  22.0        NaN        NaN   46.0   
 986     61.0  ...       NaN   NaN    61.0  21.0        NaN        NaN   50.0   
 987     74.0  ...       NaN   3.0    74.0  36.0        NaN        2.0   40.0   
 988      NaN  ...       NaN   NaN     NaN  11.0        NaN        7.0   46.0   
 989     42.0  ...       NaN   3.0    42.0  11.0        NaN        4.0   41.0   
 990     47.0  ...       NaN   8.0    47.0  50.0        2.0        NaN   39.0   
 991      NaN  ...      46.0   NaN     NaN  18.0        NaN        NaN   32.0   
 992      NaN  ...      39.0   NaN     NaN  10.0        NaN        NaN   25.0   
 993     65.0  ...       NaN   1.0    65.0  21.0        NaN        NaN   42.0   
 994     45.0  ...      45.0   NaN    45.0  12.0        NaN        1.0   41.0   
 995     59.0  ...       NaN   NaN    59.0  12.0        NaN        5.0   45.0   
 996      NaN  ...      70.0   NaN     NaN  25.0        NaN        NaN   37.0   
 997     77.0  ...       NaN   9.0    77.0  57.0        NaN        NaN   42.0   
 998     99.0  ...     113.0   NaN    99.0  11.0        NaN        NaN   44.0   
 999      NaN  ...       NaN   NaN     NaN  11.0        NaN        NaN   19.0   
 
       WBC  Weight    pH  
 0     3.0  110.00   4.0  
 1     3.0  220.00  14.0  
 2     4.0  100.00   4.0  
 3     4.0   77.50  15.0  
 4     NaN   65.00   9.0  
 5     2.0   -1.00   NaN  
 6     2.0   81.70   NaN  
 7     7.0   70.00  15.0  
 8     3.0   84.00   3.0  
 9     9.0  119.00  27.0  
 10    6.0   55.50   NaN  
 11    3.0    1.30   3.0  
 12    6.0   75.60   9.0  
 13    2.0   91.60  10.0  
 14    4.0   56.00   NaN  
 15    2.0   75.30   9.0  
 16    2.0   70.20   NaN  
 17    3.0   73.80   4.0  
 18    2.0   56.40   NaN  
 19    3.0  135.20   NaN  
 20    2.0   75.20   4.0  
 21    3.0   71.00   1.0  
 22    4.0   -1.00   NaN  
 23    2.0   96.60   4.0  
 24    2.0  103.00   5.0  
 25    3.0  110.00  17.0  
 26    1.0  128.60   NaN  
 27    3.0   80.00   NaN  
 28    3.0   73.40   6.0  
 29    2.0   88.60   1.0  
 ..    ...     ...   ...  
 970   2.0   -1.00   NaN  
 971   3.0   91.57   8.0  
 972   4.0   -1.00  11.0  
 973   2.0  139.00   7.0  
 974   2.0   70.80   7.0  
 975   1.0   59.20   NaN  
 976   3.0  131.90   1.0  
 977   2.0  102.00   NaN  
 978   2.0   72.00   1.0  
 979   2.0   80.00   6.0  
 980   2.0   88.60   6.0  
 981   2.0    2.00   4.0  
 982   2.0  143.80   NaN  
 983  10.0   70.00   9.0  
 984   2.0    3.00   4.0  
 985   3.0   79.30  11.0  
 986   3.0   90.00  12.0  
 987  10.0   71.00  15.0  
 988   5.0   65.00   7.0  
 989   5.0   62.90   6.0  
 990   4.0   98.00  14.0  
 991   3.0   78.00   NaN  
 992   3.0   -1.00   NaN  
 993  11.0   71.00   7.0  
 994   2.0   -1.00   NaN  
 995   2.0   39.20   8.0  
 996   3.0   83.30   NaN  
 997   3.0   65.80  14.0  
 998   2.0   59.00   1.0  
 999   4.0   72.10   NaN  
 
 [1000 rows x 42 columns],
 'Fold3':      ALP  ALT  AST   Age  Albumin  BUN  Bilirubin  Cholesterol  Creatinine  \
 0    NaN  NaN  NaN  57.0      NaN  4.0        NaN          NaN         4.0   
 1    NaN  NaN  NaN  87.0      NaN  3.0        NaN          NaN         3.0   
 2    1.0  1.0  1.0  73.0      NaN  4.0        1.0          NaN         4.0   
 3    NaN  NaN  NaN  72.0      NaN  3.0        NaN          NaN         3.0   
 4    NaN  NaN  NaN  76.0      NaN  2.0        NaN          NaN         2.0   
 5    1.0  1.0  1.0  59.0      1.0  4.0        1.0          NaN         4.0   
 6    1.0  1.0  1.0  76.0      1.0  4.0        1.0          NaN         4.0   
 7    NaN  NaN  NaN  43.0      NaN  3.0        NaN          NaN         3.0   
 8    NaN  NaN  NaN  60.0      NaN  4.0        NaN          NaN         4.0   
 9    NaN  NaN  NaN  60.0      1.0  5.0        NaN          NaN         6.0   
 10   NaN  NaN  NaN  60.0      NaN  1.0        NaN          NaN         1.0   
 11   1.0  1.0  1.0  69.0      1.0  6.0        1.0          NaN         6.0   
 12   NaN  NaN  NaN  74.0      NaN  3.0        NaN          NaN         3.0   
 13   1.0  1.0  1.0  78.0      1.0  4.0        1.0          NaN         4.0   
 14   NaN  NaN  NaN  82.0      NaN  3.0        NaN          NaN         3.0   
 15   NaN  NaN  NaN  24.0      NaN  2.0        NaN          NaN         2.0   
 16   1.0  1.0  1.0  87.0      1.0  3.0        1.0          NaN         3.0   
 17   NaN  NaN  NaN  90.0      NaN  3.0        NaN          NaN         3.0   
 18   1.0  1.0  1.0  68.0      NaN  3.0        1.0          NaN         3.0   
 19   1.0  1.0  1.0  72.0      1.0  3.0        1.0          NaN         3.0   
 20   NaN  NaN  NaN  81.0      NaN  3.0        NaN          NaN         3.0   
 21   NaN  NaN  NaN  75.0      NaN  3.0        NaN          NaN         3.0   
 22   NaN  NaN  NaN  70.0      NaN  3.0        NaN          NaN         3.0   
 23   1.0  1.0  1.0  77.0      NaN  4.0        1.0          NaN         4.0   
 24   1.0  1.0  1.0  72.0      1.0  3.0        1.0          NaN         3.0   
 25   NaN  NaN  NaN  68.0      NaN  2.0        NaN          NaN         2.0   
 26   1.0  1.0  1.0  63.0      1.0  4.0        1.0          NaN         4.0   
 27   3.0  3.0  3.0  68.0      1.0  4.0        3.0          NaN         4.0   
 28   NaN  NaN  NaN  76.0      NaN  2.0        NaN          NaN         3.0   
 29   NaN  NaN  NaN  49.0      NaN  2.0        NaN          NaN         2.0   
 ..   ...  ...  ...   ...      ...  ...        ...          ...         ...   
 970  1.0  1.0  1.0  68.0      NaN  3.0        1.0          NaN         3.0   
 971  NaN  NaN  NaN  70.0      NaN  3.0        NaN          NaN         3.0   
 972  NaN  NaN  NaN  74.0      NaN  3.0        NaN          NaN         3.0   
 973  1.0  1.0  1.0  79.0      1.0  2.0        1.0          NaN         2.0   
 974  NaN  NaN  NaN  55.0      NaN  3.0        NaN          NaN         3.0   
 975  1.0  1.0  1.0  48.0      1.0  3.0        1.0          NaN         3.0   
 976  NaN  NaN  NaN  74.0      1.0  3.0        1.0          1.0         3.0   
 977  NaN  NaN  NaN  69.0      NaN  7.0        NaN          NaN         8.0   
 978  NaN  NaN  NaN  79.0      NaN  2.0        NaN          NaN         2.0   
 979  NaN  NaN  NaN  25.0      NaN  3.0        NaN          NaN         3.0   
 980  NaN  1.0  1.0  76.0      2.0  4.0        NaN          1.0         4.0   
 981  NaN  NaN  NaN  73.0      NaN  5.0        NaN          NaN         5.0   
 982  NaN  NaN  NaN  90.0      NaN  2.0        NaN          NaN         2.0   
 983  2.0  2.0  2.0  51.0      2.0  2.0        2.0          NaN         2.0   
 984  NaN  NaN  NaN  75.0      NaN  8.0        NaN          NaN         8.0   
 985  2.0  2.0  2.0  57.0      2.0  4.0        2.0          NaN         4.0   
 986  3.0  3.0  3.0  37.0      NaN  3.0        3.0          NaN         3.0   
 987  NaN  NaN  NaN  35.0      NaN  3.0        NaN          NaN         3.0   
 988  1.0  1.0  1.0  26.0      NaN  3.0        1.0          NaN         3.0   
 989  NaN  NaN  NaN  84.0      1.0  2.0        NaN          NaN         2.0   
 990  NaN  NaN  NaN  76.0      NaN  1.0        NaN          NaN         1.0   
 991  1.0  1.0  1.0  90.0      1.0  5.0        1.0          NaN         5.0   
 992  NaN  NaN  NaN  59.0      NaN  6.0        NaN          NaN         6.0   
 993  NaN  NaN  NaN  55.0      NaN  2.0        NaN          NaN         2.0   
 994  4.0  4.0  4.0  66.0      2.0  4.0        4.0          NaN         4.0   
 995  NaN  NaN  NaN  63.0      NaN  2.0        NaN          NaN         2.0   
 996  2.0  2.0  2.0  26.0      2.0  3.0        2.0          NaN         3.0   
 997  NaN  NaN  NaN  78.0      NaN  3.0        NaN          NaN         3.0   
 998  NaN  NaN  NaN  77.0      NaN  3.0        NaN          NaN         3.0   
 999  NaN  NaN  NaN  38.0      NaN  3.0        NaN          NaN         3.0   
 
      DiasABP  ...  RespRate  SaO2  SysABP  Temp  TroponinI  TroponinT  Urine  \
 0       73.0  ...       NaN   8.0    73.0  63.0        NaN        NaN   45.0   
 1        2.0  ...      41.0   NaN     2.0  13.0        NaN        NaN   31.0   
 2       67.0  ...       NaN  16.0    67.0  64.0        NaN        NaN   49.0   
 3        NaN  ...       NaN   NaN     NaN  13.0        NaN        NaN    NaN   
 4       62.0  ...       NaN   3.0    62.0  62.0        NaN        NaN   38.0   
 5       50.0  ...       NaN  11.0    50.0  12.0        NaN        3.0   42.0   
 6       41.0  ...       NaN   2.0    41.0  14.0        NaN        NaN   36.0   
 7        NaN  ...      48.0   NaN     NaN  15.0        NaN        NaN   45.0   
 8        NaN  ...      45.0   NaN     NaN  16.0        NaN        NaN   10.0   
 9       78.0  ...       NaN   NaN    78.0  16.0        NaN        NaN   39.0   
 10      45.0  ...      48.0   NaN    45.0  13.0        NaN        NaN    7.0   
 11       NaN  ...       NaN   1.0     NaN  12.0        NaN        2.0   34.0   
 12      58.0  ...       NaN   8.0    58.0  57.0        NaN        NaN   46.0   
 13      41.0  ...       NaN   NaN    41.0  12.0        NaN        2.0    NaN   
 14      59.0  ...      59.0   3.0    59.0  11.0        NaN        NaN   49.0   
 15       NaN  ...      10.0   NaN     NaN   4.0        NaN        NaN   14.0   
 16       NaN  ...      48.0   NaN     NaN  13.0        NaN        3.0   30.0   
 17      72.0  ...       NaN  10.0    72.0  43.0        NaN        NaN   35.0   
 18      41.0  ...       NaN   5.0    41.0  52.0        NaN        NaN   50.0   
 19      47.0  ...       NaN   NaN    47.0  12.0        NaN        1.0   31.0   
 20       NaN  ...      46.0   NaN     NaN  12.0        NaN        2.0   38.0   
 21      63.0  ...       NaN   4.0    63.0  41.0        NaN        NaN   41.0   
 22      22.0  ...       NaN   1.0    22.0  10.0        NaN        NaN   32.0   
 23      69.0  ...       NaN   9.0    69.0  68.0        NaN        3.0   41.0   
 24      65.0  ...      66.0   NaN    65.0  14.0        NaN        NaN   41.0   
 25      81.0  ...       NaN   3.0    81.0  77.0        NaN        NaN   48.0   
 26      61.0  ...       NaN   8.0    61.0  11.0        NaN        NaN   36.0   
 27      53.0  ...       NaN   NaN    53.0  13.0        NaN        NaN   32.0   
 28      56.0  ...       NaN   7.0    56.0  57.0        2.0        NaN   45.0   
 29      63.0  ...       NaN   3.0    63.0  42.0        NaN        NaN   44.0   
 ..       ...  ...       ...   ...     ...   ...        ...        ...    ...   
 970     55.0  ...       NaN   2.0    55.0  13.0        3.0        NaN   42.0   
 971     55.0  ...      55.0   4.0    55.0  50.0        NaN        NaN   38.0   
 972     56.0  ...       NaN   1.0    56.0  47.0        NaN        NaN   51.0   
 973      NaN  ...      59.0   NaN     NaN  12.0        NaN        NaN   39.0   
 974     66.0  ...       NaN  12.0    66.0  45.0        NaN        NaN   51.0   
 975     46.0  ...       NaN   NaN    46.0  16.0        NaN        NaN   33.0   
 976      NaN  ...       NaN   NaN     NaN  12.0        NaN        4.0   28.0   
 977     61.0  ...       NaN   NaN    61.0  13.0        NaN        3.0   37.0   
 978      NaN  ...      52.0   1.0     NaN  47.0        NaN        NaN   40.0   
 979      NaN  ...      51.0   NaN     NaN  13.0        NaN        NaN   30.0   
 980     57.0  ...       NaN   NaN    57.0  12.0        NaN        2.0   40.0   
 981      NaN  ...       NaN   NaN     NaN  13.0        NaN        NaN   24.0   
 982      NaN  ...      38.0   NaN     NaN  10.0        NaN        NaN   21.0   
 983      NaN  ...      47.0   NaN     NaN  10.0        NaN        NaN   34.0   
 984     48.0  ...       NaN   NaN    48.0  18.0        NaN        3.0   45.0   
 985     69.0  ...       NaN  10.0    69.0  15.0        NaN        NaN   32.0   
 986      NaN  ...      19.0   NaN     NaN  11.0        NaN        NaN    6.0   
 987     52.0  ...       NaN   NaN    52.0  14.0        NaN        NaN   47.0   
 988      5.0  ...       NaN   NaN     5.0  10.0        NaN        NaN    9.0   
 989     35.0  ...       NaN   3.0    35.0  23.0        NaN        NaN   30.0   
 990     67.0  ...       NaN   1.0    67.0  66.0        NaN        NaN   49.0   
 991      NaN  ...       NaN   NaN     NaN  13.0        NaN        NaN   44.0   
 992     23.0  ...       NaN   NaN    23.0   9.0        NaN        NaN   29.0   
 993      NaN  ...      56.0   NaN     NaN  12.0        NaN        NaN   39.0   
 994     63.0  ...       NaN   5.0    63.0  14.0        NaN        NaN   41.0   
 995     66.0  ...       NaN   5.0    66.0   9.0        NaN        NaN   41.0   
 996     14.0  ...       NaN   3.0    14.0  12.0        NaN        3.0   25.0   
 997     69.0  ...       NaN  13.0    69.0  38.0        NaN        NaN   47.0   
 998     15.0  ...       NaN   3.0    15.0  12.0        NaN        NaN   41.0   
 999     80.0  ...       NaN   6.0    80.0  14.0        NaN        NaN   43.0   
 
      WBC  Weight    pH  
 0    4.0   88.60  17.0  
 1    3.0   72.50   NaN  
 2    5.0   77.27  25.0  
 3    3.0  131.80   1.0  
 4    2.0   86.10   8.0  
 5    4.0   58.00  15.0  
 6    2.0  114.20   5.0  
 7    2.0   96.40   NaN  
 8    4.0   63.00   NaN  
 9    3.0   74.10   4.0  
 10   1.0   78.50   NaN  
 11   5.0   86.80   4.0  
 12   4.0   76.60  12.0  
 13   3.0   59.00   6.0  
 14   3.0   51.00   5.0  
 15   2.0   88.90   NaN  
 16   3.0   75.00   NaN  
 17   9.0   70.00  28.0  
 18   5.0   85.00  12.0  
 19   3.0  152.40   7.0  
 20   2.0   79.90   NaN  
 21   3.0  121.40  10.0  
 22   3.0   71.50   2.0  
 23   4.0  108.00  30.0  
 24   3.0   72.10   NaN  
 25   1.0   70.00  12.0  
 26   4.0  100.00  12.0  
 27   3.0  104.70   8.0  
 28   2.0   82.50   7.0  
 29   2.0   52.70  11.0  
 ..   ...     ...   ...  
 970  3.0   72.70   9.0  
 971  3.0  100.00   5.0  
 972  3.0  104.00  12.0  
 973  2.0   52.10   NaN  
 974  3.0  108.70  16.0  
 975  3.0   -1.00   2.0  
 976  2.0   69.50   5.0  
 977  5.0   55.00   6.0  
 978  2.0   60.00   3.0  
 979  2.0   71.00   NaN  
 980  2.0  128.80   7.0  
 981  2.0   79.00   NaN  
 982  2.0   -1.00   NaN  
 983  2.0   50.20   NaN  
 984  7.0   67.30   9.0  
 985  4.0   64.00  18.0  
 986  2.0   -1.00   NaN  
 987  3.0   81.00  11.0  
 988  3.0   64.20   4.0  
 989  2.0   61.00   4.0  
 990  4.0   79.30  10.0  
 991  4.0   51.10   2.0  
 992  4.0   79.40   7.0  
 993  2.0   -1.00   NaN  
 994  5.0   95.50  15.0  
 995  1.0   96.80   7.0  
 996  3.0   75.00   6.0  
 997  4.0  101.80  20.0  
 998  3.0   65.20   3.0  
 999  4.0   67.80  11.0  
 
 [1000 rows x 42 columns],
 'Fold4':      ALP  ALT  AST   Age  Albumin  BUN  Bilirubin  Cholesterol  Creatinine  \
 0    NaN  NaN  NaN  39.0      NaN  2.0        NaN          NaN         2.0   
 1    NaN  NaN  NaN  70.0      1.0  2.0        NaN          NaN         2.0   
 2    NaN  NaN  NaN  61.0      NaN  3.0        NaN          NaN         3.0   
 3    3.0  3.0  3.0  64.0      1.0  3.0        3.0          NaN         3.0   
 4    2.0  2.0  2.0  45.0      NaN  3.0        2.0          NaN         3.0   
 5    2.0  2.0  2.0  77.0      NaN  2.0        2.0          NaN         2.0   
 6    1.0  1.0  1.0  90.0      NaN  2.0        1.0          1.0         2.0   
 7    NaN  NaN  NaN  66.0      NaN  3.0        NaN          NaN         3.0   
 8    NaN  NaN  NaN  54.0      NaN  2.0        NaN          NaN         2.0   
 9    NaN  NaN  NaN  74.0      NaN  3.0        NaN          NaN         3.0   
 10   1.0  1.0  1.0  73.0      1.0  8.0        1.0          NaN         8.0   
 11   NaN  NaN  NaN  62.0      NaN  5.0        NaN          NaN         5.0   
 12   NaN  NaN  NaN  56.0      1.0  3.0        1.0          1.0         3.0   
 13   NaN  NaN  NaN  57.0      NaN  2.0        NaN          NaN         2.0   
 14   NaN  NaN  NaN  74.0      NaN  3.0        NaN          NaN         3.0   
 15   NaN  NaN  NaN  74.0      NaN  3.0        NaN          NaN         3.0   
 16   2.0  2.0  2.0  67.0      1.0  2.0        2.0          1.0         2.0   
 17   NaN  NaN  NaN  49.0      NaN  3.0        NaN          1.0         3.0   
 18   3.0  3.0  3.0  58.0      1.0  5.0        3.0          NaN         5.0   
 19   NaN  NaN  NaN  72.0      NaN  9.0        NaN          NaN         9.0   
 20   NaN  NaN  NaN  79.0      NaN  3.0        NaN          NaN         3.0   
 21   NaN  NaN  NaN  82.0      NaN  3.0        NaN          NaN         3.0   
 22   NaN  NaN  NaN  45.0      NaN  4.0        NaN          NaN         4.0   
 23   NaN  NaN  NaN  68.0      NaN  2.0        NaN          NaN         2.0   
 24   NaN  NaN  NaN  59.0      NaN  2.0        NaN          NaN         2.0   
 25   NaN  NaN  NaN  24.0      NaN  3.0        NaN          NaN         3.0   
 26   3.0  3.0  3.0  52.0      3.0  5.0        3.0          NaN         5.0   
 27   NaN  NaN  NaN  52.0      NaN  4.0        NaN          NaN         4.0   
 28   2.0  2.0  2.0  85.0      1.0  3.0        2.0          NaN         3.0   
 29   NaN  NaN  NaN  59.0      1.0  4.0        NaN          NaN         4.0   
 ..   ...  ...  ...   ...      ...  ...        ...          ...         ...   
 970  NaN  NaN  NaN  69.0      NaN  3.0        NaN          NaN         3.0   
 971  1.0  1.0  1.0  67.0      1.0  4.0        4.0          NaN         4.0   
 972  NaN  NaN  NaN  78.0      NaN  3.0        NaN          NaN         3.0   
 973  NaN  NaN  NaN  61.0      NaN  3.0        NaN          NaN         3.0   
 974  1.0  1.0  1.0  60.0      NaN  3.0        1.0          NaN         3.0   
 975  NaN  1.0  1.0  38.0      1.0  3.0        NaN          NaN         3.0   
 976  1.0  1.0  1.0  55.0      1.0  2.0        1.0          NaN         2.0   
 977  NaN  1.0  1.0  57.0      2.0  4.0        1.0          NaN         4.0   
 978  1.0  1.0  1.0  85.0      NaN  3.0        1.0          NaN         3.0   
 979  NaN  NaN  NaN  83.0      NaN  4.0        NaN          NaN         4.0   
 980  1.0  1.0  1.0  80.0      1.0  3.0        1.0          NaN         3.0   
 981  NaN  NaN  NaN  67.0      NaN  7.0        NaN          NaN         7.0   
 982  NaN  NaN  NaN  73.0      NaN  3.0        NaN          NaN         3.0   
 983  NaN  NaN  NaN  74.0      NaN  2.0        NaN          NaN         2.0   
 984  3.0  3.0  3.0  65.0      1.0  3.0        2.0          NaN         3.0   
 985  1.0  1.0  1.0  50.0      NaN  2.0        1.0          1.0         2.0   
 986  1.0  1.0  1.0  34.0      1.0  3.0        1.0          NaN         3.0   
 987  NaN  NaN  NaN  75.0      NaN  3.0        NaN          NaN         3.0   
 988  NaN  NaN  NaN  72.0      NaN  4.0        NaN          NaN         4.0   
 989  2.0  2.0  2.0  66.0      1.0  5.0        2.0          NaN         5.0   
 990  2.0  2.0  2.0  43.0      NaN  3.0        1.0          NaN         3.0   
 991  1.0  1.0  1.0  88.0      NaN  2.0        1.0          NaN         2.0   
 992  NaN  NaN  NaN  89.0      1.0  3.0        NaN          NaN         3.0   
 993  2.0  2.0  2.0  86.0      1.0  2.0        2.0          NaN         2.0   
 994  NaN  NaN  NaN  51.0      NaN  3.0        NaN          NaN         3.0   
 995  NaN  NaN  NaN  70.0      NaN  2.0        NaN          NaN         2.0   
 996  NaN  NaN  NaN  25.0      NaN  5.0        NaN          1.0         5.0   
 997  1.0  1.0  1.0  44.0      NaN  4.0        1.0          NaN         4.0   
 998  3.0  3.0  3.0  37.0      1.0  4.0        3.0          NaN         4.0   
 999  2.0  2.0  2.0  78.0      2.0  6.0        2.0          NaN         6.0   
 
      DiasABP  ...  RespRate  SaO2  SysABP  Temp  TroponinI  TroponinT  Urine  \
 0        NaN  ...       NaN   NaN     NaN  12.0        NaN        NaN   44.0   
 1       28.0  ...       NaN   NaN    28.0  11.0        NaN        NaN   34.0   
 2       61.0  ...       NaN   9.0    61.0  12.0        NaN        NaN   40.0   
 3       80.0  ...       NaN  11.0    80.0  31.0        NaN        NaN   47.0   
 4       22.0  ...       NaN   1.0    22.0  16.0        NaN        NaN   35.0   
 5       43.0  ...       NaN   1.0    43.0  17.0        NaN        NaN   38.0   
 6        NaN  ...       NaN   NaN     NaN  15.0        NaN        NaN   46.0   
 7       58.0  ...       NaN   NaN    58.0  46.0        NaN        NaN   44.0   
 8        NaN  ...       NaN   NaN     NaN   NaN        NaN        NaN    NaN   
 9       63.0  ...       NaN   1.0    63.0  51.0        NaN        NaN    4.0   
 10      42.0  ...      45.0   NaN    42.0  12.0        NaN        NaN   16.0   
 11      43.0  ...      55.0   NaN    43.0  17.0        NaN        NaN   34.0   
 12      50.0  ...       NaN  13.0    50.0  47.0        NaN        3.0   44.0   
 13      47.0  ...       NaN   1.0    47.0  11.0        NaN        NaN   33.0   
 14      75.0  ...       NaN   7.0    75.0  50.0        NaN        NaN   49.0   
 15      76.0  ...       NaN   7.0    76.0  62.0        NaN        NaN   52.0   
 16      43.0  ...       NaN   NaN    43.0  14.0        NaN        2.0   31.0   
 17      36.0  ...      53.0   1.0    36.0  13.0        NaN        NaN   26.0   
 18      76.0  ...       NaN   6.0    76.0  76.0        NaN        3.0   38.0   
 19      39.0  ...       NaN   1.0    39.0  12.0        NaN        4.0   36.0   
 20      65.0  ...       NaN   6.0    65.0  36.0        NaN        NaN   45.0   
 21      78.0  ...       NaN   1.0    78.0  63.0        NaN        NaN   42.0   
 22      43.0  ...       NaN   NaN    43.0   9.0        NaN        NaN   35.0   
 23       NaN  ...      44.0   NaN     NaN  11.0        NaN        3.0   12.0   
 24      45.0  ...       NaN   NaN    45.0  47.0        NaN        NaN   46.0   
 25      47.0  ...       NaN   NaN    47.0  12.0        NaN        NaN   42.0   
 26       NaN  ...      45.0   NaN     NaN  11.0        NaN        NaN    5.0   
 27      64.0  ...      65.0   NaN    64.0  14.0        NaN        NaN   46.0   
 28       NaN  ...      34.0   NaN     NaN  10.0        NaN        NaN    5.0   
 29      53.0  ...       NaN   NaN    53.0  23.0        NaN        3.0   31.0   
 ..       ...  ...       ...   ...     ...   ...        ...        ...    ...   
 970     53.0  ...       NaN   NaN    53.0  39.0        NaN        NaN   41.0   
 971      NaN  ...       NaN   NaN     NaN  12.0        NaN        NaN   25.0   
 972     65.0  ...       NaN   9.0    65.0  52.0        NaN        NaN   48.0   
 973     73.0  ...       NaN   7.0    73.0  13.0        3.0        NaN   46.0   
 974     11.0  ...      59.0   NaN    11.0  11.0        NaN        3.0   11.0   
 975     44.0  ...       NaN   8.0    44.0  10.0        3.0        NaN   33.0   
 976     67.0  ...       NaN   2.0    67.0  22.0        NaN        NaN   39.0   
 977     82.0  ...       NaN   1.0    82.0  19.0        NaN        NaN   38.0   
 978      NaN  ...      69.0   NaN     NaN  13.0        NaN        NaN   39.0   
 979     72.0  ...      76.0   1.0    72.0  13.0        NaN        2.0   37.0   
 980      NaN  ...      57.0   NaN     NaN  14.0        NaN        NaN   28.0   
 981     54.0  ...       NaN   3.0    54.0  13.0        NaN        3.0   40.0   
 982     82.0  ...       NaN   7.0    83.0  43.0        NaN        NaN   47.0   
 983      5.0  ...       NaN   2.0     5.0  11.0        NaN        NaN   27.0   
 984      NaN  ...      44.0   NaN     NaN  11.0        NaN        NaN    9.0   
 985      NaN  ...      42.0   NaN     NaN   9.0        NaN        NaN   35.0   
 986     66.0  ...       NaN   NaN    66.0  13.0        NaN        NaN   39.0   
 987      NaN  ...      41.0   NaN     NaN  12.0        4.0        NaN   21.0   
 988     56.0  ...       NaN   NaN    56.0  13.0        NaN        NaN   46.0   
 989      NaN  ...      49.0   NaN     NaN  11.0        NaN        NaN   11.0   
 990     53.0  ...       NaN   2.0    53.0  11.0        NaN        1.0   41.0   
 991      NaN  ...       NaN   1.0     NaN  12.0        NaN        NaN   36.0   
 992      NaN  ...      41.0   NaN     NaN  11.0        NaN        NaN   33.0   
 993      NaN  ...       NaN   NaN     NaN  12.0        NaN        NaN   25.0   
 994     58.0  ...       NaN   4.0    58.0  14.0        NaN        NaN   46.0   
 995     52.0  ...       NaN   NaN    52.0  21.0        NaN        NaN   39.0   
 996      NaN  ...      22.0   NaN     NaN  10.0        NaN        NaN    8.0   
 997      6.0  ...       NaN   NaN     6.0  13.0        NaN        NaN   36.0   
 998     52.0  ...       NaN   NaN    52.0  11.0        NaN        NaN   13.0   
 999     92.0  ...       NaN  15.0    92.0  86.0        NaN        NaN   48.0   
 
      WBC  Weight    pH  
 0    2.0   253.0   3.0  
 1    2.0   123.5   2.0  
 2    3.0    80.0  11.0  
 3    5.0    80.0  16.0  
 4    3.0   105.5   6.0  
 5    6.0    58.9   1.0  
 6    2.0    67.7   NaN  
 7    3.0    92.7   7.0  
 8    2.0    -1.0   1.0  
 9    3.0    80.9  12.0  
 10   2.0    70.6   3.0  
 11   5.0    82.5   8.0  
 12   3.0    64.0  10.0  
 13   2.0    77.1   3.0  
 14   3.0    95.0  17.0  
 15   3.0   109.0  22.0  
 16   2.0    63.8   6.0  
 17   2.0    68.9   1.0  
 18   2.0    90.0   9.0  
 19   4.0    63.8   7.0  
 20   4.0    59.5  16.0  
 21   2.0    71.2  10.0  
 22   5.0    82.0  14.0  
 23   2.0    90.0   1.0  
 24   2.0    93.3   7.0  
 25   3.0   100.0   6.0  
 26   4.0    76.1   NaN  
 27   2.0    65.0   NaN  
 28   2.0    -1.0   NaN  
 29   3.0    41.0   9.0  
 ..   ...     ...   ...  
 970  4.0   104.0   8.0  
 971  3.0    68.0   NaN  
 972  4.0    54.1  16.0  
 973  3.0    95.0  10.0  
 974  3.0    95.3   NaN  
 975  3.0    80.0   9.0  
 976  2.0   109.7  10.0  
 977  3.0    93.5   8.0  
 978  3.0    61.5   NaN  
 979  4.0    70.0   5.0  
 980  3.0    63.9   NaN  
 981  3.0   118.0   8.0  
 982  2.0    57.7  16.0  
 983  2.0    65.0   3.0  
 984  4.0   105.1   NaN  
 985  2.0    -1.0   NaN  
 986  3.0    70.0   1.0  
 987  3.0    83.0   NaN  
 988  4.0    80.0   6.0  
 989  2.0    78.2   2.0  
 990  2.0    92.9   8.0  
 991  2.0    90.7   1.0  
 992  2.0    64.0   NaN  
 993  2.0    53.0   NaN  
 994  3.0    75.0  12.0  
 995  2.0    87.0   7.0  
 996  3.0   166.4   NaN  
 997  3.0   109.0   3.0  
 998  4.0    87.4   5.0  
 999  7.0    70.7  25.0  
 
 [1000 rows x 42 columns]}
In [16]:
# get patients' 4000 temporal dataframe - by frequency of measurement of temporal data
all_temporal_dfs__freq = getAllTemporalDataFrameByAggregationType(cv_fold, all_patients, "freq")
all_temporal_dfs__freq.head()
Fold1 has started extracting temporal data
Fold1 has completed

Fold2 has started extracting temporal data
Fold2 has completed

Fold3 has started extracting temporal data
Fold3 has completed

Fold4 has started extracting temporal data
Fold4 has completed

4000 patients' Temporals data has been extracted with aggregator freq
Out[16]:
ALP ALT AST Age Albumin BUN Bilirubin Cholesterol Creatinine DiasABP ... RespRate SaO2 SysABP Temp TroponinI TroponinT Urine WBC Weight pH
0 NaN NaN NaN 54.0 NaN 2.0 NaN NaN 2.0 NaN ... 42.0 NaN NaN 14.0 NaN NaN 38.0 2.0 -1.0 NaN
1 NaN NaN NaN 76.0 NaN 3.0 NaN NaN 3.0 68.0 ... NaN 6.0 68.0 46.0 NaN NaN 41.0 3.0 76.0 8.0
2 2.0 2.0 2.0 44.0 2.0 3.0 2.0 NaN 3.0 16.0 ... NaN 1.0 16.0 14.0 NaN NaN 41.0 3.0 56.7 4.0
3 1.0 1.0 1.0 68.0 1.0 3.0 1.0 NaN 3.0 NaN ... 59.0 NaN NaN 13.0 NaN NaN 6.0 3.0 84.6 NaN
4 NaN NaN NaN 88.0 1.0 2.0 NaN NaN 2.0 NaN ... 48.0 NaN NaN 15.0 NaN NaN 38.0 2.0 -1.0 NaN

5 rows × 42 columns

Extracting Most Recent Values of Temporal Variables within 48 Hours Period

Assumption:

  • Assuming that the patient is still alive within the 48 hours, his/her condition by the end of the period may have higher significance in determining the length of stay and mortality rate.
  • May reduce the likelihood of missing data.
In [17]:
# in 4 folds - last value of temporal variables within the 48 hours period
all_temporal_dfs_folds__most_recent = getAllTemporalDataFrameByAggregationTypeInFolds(cv_fold, all_patients, "most_recent")
all_temporal_dfs_folds__most_recent
Fold1 has started extracting temporal data
Fold1 has completed

Fold2 has started extracting temporal data
Fold2 has completed

Fold3 has started extracting temporal data
Fold3 has completed

Fold4 has started extracting temporal data
Fold4 has completed

4 folds of patients' Temporals data has been extracted with aggregator most_recent
Out[17]:
{'Fold1':       Age   BUN  Creatinine   GCS  Gender  Glucose  HCO3   HCT     HR  Height  \
 0    54.0   8.0         0.7  15.0     0.0    115.0  28.0  30.3   86.0    -1.0   
 1    76.0  21.0         1.3  15.0     1.0    146.0  24.0  29.4   65.0   175.3   
 2    44.0   3.0         0.3   5.0     0.0    143.0  25.0  29.4   71.0    -1.0   
 3    68.0  10.0         0.7  15.0     1.0    117.0  28.0  36.3   79.0   180.3   
 4    88.0  25.0         1.0  15.0     0.0     92.0  20.0  30.9   68.0    -1.0   
 5    64.0  16.0         0.7   8.0     1.0    153.0  21.0  35.5   92.0   180.3   
 6    68.0  36.0         4.1  15.0     0.0    115.0  26.0  30.0   60.0   162.6   
 7    78.0  58.0         0.6   9.0     0.0    116.0  12.0  33.0   58.0   162.6   
 8    64.0  23.0         0.7  15.0     0.0    112.0  25.0  28.3  122.0    -1.0   
 9    74.0  22.0         1.3  15.0     1.0    114.0  26.0  28.4   78.0   175.3   
 10   64.0  55.0         1.2  15.0     0.0     81.0  18.0  28.7   91.0    -1.0   
 11   71.0   9.0         0.6  15.0     0.0    138.0  27.0  27.4   95.0   157.5   
 12   66.0  16.0         1.3  15.0     0.0    110.0  25.0  30.0   93.0   157.5   
 13   84.0  89.0         3.3  15.0     1.0    167.0  31.0  28.0   73.0   170.2   
 14   77.0  40.0         1.1  15.0     1.0    151.0  31.0  31.8   68.0   162.6   
 15   78.0  18.0         1.1  15.0     1.0    148.0  22.0  27.7  106.0   167.6   
 16   65.0  47.0         2.4  15.0     1.0    110.0  20.0  30.2   88.0    -1.0   
 17   84.0  32.0         1.1  15.0     1.0    182.0  27.0  29.7   83.0   182.9   
 18   78.0  24.0         1.4  11.0     0.0    137.0  17.0  33.8   73.0    -1.0   
 19   40.0   7.0         0.5  15.0     0.0     96.0  28.0  21.5   92.0   165.1   
 20   48.0   5.0         2.2  15.0     0.0    110.0  23.0  23.6   78.0   154.9   
 21   58.0  13.0         0.6  15.0     1.0     91.0  27.0  24.9   88.0   188.0   
 22   81.0  32.0         1.2  15.0     1.0    129.0  22.0  28.4   61.0    -1.0   
 23   35.0  35.0         1.4  15.0     0.0     68.0  17.0  25.3   82.0    -1.0   
 24   26.0   8.0         0.6   NaN     0.0     95.0  24.0  26.9    NaN    -1.0   
 25   66.0  20.0         4.7  15.0     0.0    104.0  21.0  31.5   65.0   137.2   
 26   80.0  22.0         0.7   8.0     0.0    129.0  21.0  32.8   72.0    -1.0   
 27   53.0  12.0         0.5  14.0     0.0     94.0  23.0  23.2   94.0   177.8   
 28   74.0  21.0         1.4  15.0     1.0    170.0  26.0  26.6   95.0   177.8   
 29   80.0  29.0         1.3  15.0     1.0    106.0  29.0  39.9   85.0   180.3   
 ..    ...   ...         ...   ...     ...      ...   ...   ...    ...     ...   
 970  59.0  22.0         1.8  10.0     1.0    143.0  23.0  31.1  121.0   167.6   
 971  80.0  12.0         0.8   3.0     0.0    243.0  22.0  24.5   58.0    -1.0   
 972  81.0  20.0         1.0  15.0     1.0    108.0  28.0  39.5   89.0   180.3   
 973  43.0  12.0         0.9   7.0     1.0     95.0  22.0  39.4  119.0   177.8   
 974  69.0  21.0         0.9  15.0     0.0    117.0  25.0  29.1   65.0   157.5   
 975  84.0  24.0         1.0  11.0     0.0    142.0  27.0  32.8  100.0    -1.0   
 976  60.0  33.0         5.2  10.0     1.0    123.0  26.0  31.2   89.0    -1.0   
 977  82.0  19.0         0.8  15.0     0.0    135.0  25.0  25.7   89.0   152.4   
 978  83.0  14.0         0.6  12.0     1.0    110.0  22.0  29.1  110.0   170.2   
 979  80.0  31.0         1.1  15.0     1.0     89.0  21.0  34.6   72.0    -1.0   
 980  84.0  12.0         0.6  15.0     0.0     79.0  28.0  32.6   62.0    -1.0   
 981  71.0   NaN         NaN   7.0     1.0      NaN   NaN   NaN   69.0   180.3   
 982  89.0  19.0         0.6   6.0     0.0    147.0  26.0  34.2   89.0    -1.0   
 983  65.0  16.0         0.7   8.0     1.0     24.0  22.0  32.7   90.0   180.3   
 984  69.0   6.0         0.7   9.0     1.0    118.0  22.0  30.7   75.0   172.7   
 985  50.0   4.0         0.5  15.0     0.0     73.0  20.0  24.7   96.0   162.6   
 986  82.0  61.0         1.8  15.0     0.0    193.0  14.0  35.5   71.0    -1.0   
 987  59.0  30.0         1.0   3.0     0.0    185.0  32.0  28.2  132.0    -1.0   
 988  19.0   8.0         0.8  15.0     1.0     96.0  26.0  38.8   83.0    -1.0   
 989  79.0   8.0         0.5   9.0     0.0    115.0  23.0  27.8   82.0   152.4   
 990  84.0  19.0         0.7  14.0     0.0     77.0  23.0  33.9  118.0    -1.0   
 991  66.0   NaN         NaN  15.0     0.0      NaN   NaN   NaN   85.0    -1.0   
 992  84.0   NaN         NaN   9.0     0.0      NaN   NaN   NaN   76.0    -1.0   
 993  90.0  29.0         1.1   NaN     0.0    184.0  21.0  30.4    NaN    -1.0   
 994  71.0  18.0         0.6  15.0     0.0    118.0  22.0  36.2   77.0   160.0   
 995  35.0  33.0         1.1  11.0     0.0     82.0  19.0  29.6   56.0    -1.0   
 996  73.0  21.0         0.7  15.0     1.0    127.0  24.0  39.2   88.0    -1.0   
 997  81.0  59.0         2.1  13.0     1.0    123.0  26.0  26.0   67.0    -1.0   
 998  63.0  25.0         1.4  15.0     1.0    229.0  26.0  25.7  102.0   172.7   
 999  82.0  17.0         0.6  15.0     0.0    139.0  27.0  29.8   80.0   162.6   
 
      ...    pH    ALP     ALT     AST  Albumin  Bilirubin  Lactate  \
 0    ...   NaN    NaN     NaN     NaN      NaN        NaN      NaN   
 1    ...  7.37    NaN     NaN     NaN      NaN        NaN      NaN   
 2    ...  7.47  105.0    75.0   164.0      2.3        2.8      0.9   
 3    ...   NaN  105.0    12.0    15.0      4.4        0.2      NaN   
 4    ...   NaN    NaN     NaN     NaN      3.3        NaN      NaN   
 5    ...  7.46  101.0    60.0   162.0      NaN        0.4      NaN   
 6    ...   NaN    NaN     NaN     NaN      NaN        NaN      NaN   
 7    ...  7.37   47.0    46.0    82.0      1.9        0.3      1.8   
 8    ...   NaN    NaN     NaN     NaN      NaN        NaN      NaN   
 9    ...  7.38    NaN     NaN     NaN      NaN        NaN      NaN   
 10   ...   NaN  402.0    36.0    47.0      2.7        0.1      5.9   
 11   ...  7.41    NaN     NaN     NaN      NaN        NaN      NaN   
 12   ...   NaN    NaN     NaN     NaN      NaN        NaN      NaN   
 13   ...   NaN   19.0    15.0    20.0      NaN        0.1      NaN   
 14   ...   NaN    NaN     NaN    57.0      2.9        NaN      NaN   
 15   ...  7.49    NaN     NaN     NaN      NaN        NaN      1.5   
 16   ...  7.36    NaN     NaN     NaN      NaN        NaN      NaN   
 17   ...   NaN    NaN     NaN     NaN      2.6        NaN      NaN   
 18   ...  7.39   51.0    10.0    20.0      2.5        1.6      0.8   
 19   ...  7.44    NaN     NaN     NaN      NaN        NaN      NaN   
 20   ...   NaN  173.0    63.0   152.0      2.0        8.0      NaN   
 21   ...  7.44    NaN     NaN     NaN      NaN        NaN      4.0   
 22   ...   NaN    NaN     NaN     NaN      NaN        NaN      1.6   
 23   ...   NaN    NaN     NaN     NaN      NaN        NaN      NaN   
 24   ...  7.37    NaN     NaN     NaN      NaN        NaN      0.7   
 25   ...   NaN    NaN     NaN     NaN      NaN        NaN      NaN   
 26   ...  7.48    NaN     NaN     NaN      NaN        NaN      NaN   
 27   ...  7.45  112.0    13.0    20.0      2.0        2.0      1.3   
 28   ...  7.42    NaN     NaN     NaN      NaN        NaN      NaN   
 29   ...  7.53    NaN     NaN     NaN      NaN        NaN      NaN   
 ..   ...   ...    ...     ...     ...      ...        ...      ...   
 970  ...  7.34   65.0    19.0    18.0      3.6        0.5      NaN   
 971  ...   NaN    NaN     NaN     NaN      NaN        NaN      NaN   
 972  ...  7.49   71.0    29.0    27.0      3.5        1.1      NaN   
 973  ...  7.43   64.0    78.0   280.0      3.6        0.7      NaN   
 974  ...  7.42   47.0    23.0    41.0      2.8        0.6      1.1   
 975  ...  7.45    NaN     NaN     NaN      3.9        NaN      1.4   
 976  ...  7.47  113.0    17.0    45.0      3.6        0.3      1.3   
 977  ...  7.37    NaN     NaN     NaN      NaN        NaN      3.9   
 978  ...  7.47    NaN     NaN     NaN      NaN        NaN      1.8   
 979  ...   NaN    NaN     NaN     NaN      NaN        NaN      NaN   
 980  ...  7.35  108.0   116.0    48.0      2.7        0.5      0.9   
 981  ...   NaN    NaN     NaN     NaN      NaN        NaN      NaN   
 982  ...  7.39    NaN     NaN     NaN      3.6        NaN      2.0   
 983  ...  7.35   65.0    20.0    30.0      NaN        0.3      NaN   
 984  ...  7.41  188.0    21.0    24.0      2.4        0.5      NaN   
 985  ...  7.48    NaN     NaN     NaN      NaN        NaN      2.1   
 986  ...  7.21    NaN     NaN     NaN      NaN        NaN      1.6   
 987  ...  7.32   42.0    31.0    26.0      2.8        0.6      1.6   
 988  ...  7.39    NaN     NaN     NaN      NaN        NaN      0.8   
 989  ...  7.46   48.0    37.0    68.0      3.0        0.7      1.0   
 990  ...   NaN  110.0    65.0    57.0      3.2        0.5      1.5   
 991  ...   NaN    NaN     NaN     NaN      NaN        NaN      NaN   
 992  ...  7.40    NaN     NaN     NaN      NaN        NaN      NaN   
 993  ...   NaN    NaN     NaN     NaN      NaN        NaN      NaN   
 994  ...  7.35    NaN     NaN     NaN      NaN        NaN      NaN   
 995  ...   NaN   61.0  1343.0  1785.0      3.0        0.8      1.2   
 996  ...  7.35    NaN     NaN     NaN      NaN        NaN      1.4   
 997  ...  7.38    NaN     NaN     NaN      NaN        NaN      2.7   
 998  ...  7.43    NaN     NaN     NaN      NaN        NaN      NaN   
 999  ...  7.37    NaN     NaN     NaN      NaN        NaN      NaN   
 
      Cholesterol  TroponinI  TroponinT  
 0            NaN        NaN        NaN  
 1            NaN        NaN        NaN  
 2            NaN        NaN        NaN  
 3            NaN        NaN        NaN  
 4            NaN        NaN        NaN  
 5          212.0        1.3        NaN  
 6            NaN        0.8        NaN  
 7            NaN        3.1        NaN  
 8            NaN        NaN        NaN  
 9            NaN        NaN        NaN  
 10           NaN        NaN        NaN  
 11           NaN        NaN        NaN  
 12           NaN        NaN        NaN  
 13           NaN        NaN        NaN  
 14           NaN        6.6        NaN  
 15           NaN        NaN        NaN  
 16           NaN        NaN        NaN  
 17           NaN        NaN        NaN  
 18          84.0        NaN       0.31  
 19           NaN        NaN        NaN  
 20           NaN        NaN        NaN  
 21           NaN        NaN        NaN  
 22           NaN        NaN       0.13  
 23           NaN        NaN       0.37  
 24           NaN        NaN        NaN  
 25           NaN        1.8        NaN  
 26           NaN        NaN        NaN  
 27           NaN        NaN       0.02  
 28           NaN        NaN        NaN  
 29           NaN        NaN        NaN  
 ..           ...        ...        ...  
 970          NaN        NaN        NaN  
 971          NaN        NaN       0.08  
 972        190.0        NaN        NaN  
 973        150.0        NaN        NaN  
 974          NaN        NaN       0.59  
 975          NaN        NaN        NaN  
 976          NaN        NaN       4.77  
 977          NaN        NaN        NaN  
 978          NaN        NaN       0.05  
 979          NaN        NaN       0.02  
 980          NaN        1.2        NaN  
 981          NaN        NaN        NaN  
 982          NaN        NaN        NaN  
 983          NaN        NaN       1.38  
 984         96.0        4.6        NaN  
 985          NaN        NaN        NaN  
 986          NaN        NaN        NaN  
 987          NaN        NaN        NaN  
 988          NaN        NaN        NaN  
 989          NaN        NaN        NaN  
 990          NaN        NaN       0.29  
 991          NaN        NaN        NaN  
 992          NaN        NaN        NaN  
 993          NaN        NaN        NaN  
 994          NaN        NaN        NaN  
 995          NaN        NaN       0.39  
 996          NaN        NaN        NaN  
 997          NaN        NaN        NaN  
 998          NaN        NaN        NaN  
 999          NaN        NaN        NaN  
 
 [1000 rows x 42 columns],
 'Fold2':        ALP     ALT     AST   Age    BUN  Bilirubin  Creatinine  DiasABP  FiO2  \
 0     66.0    26.0    77.0  56.0   17.0        1.7         0.7     80.0  1.00   
 1      NaN     NaN     NaN  72.0   39.0        NaN         5.0     69.0  0.50   
 2      NaN     NaN     NaN  68.0   49.0        NaN         3.7     70.0  0.40   
 3      NaN     NaN     NaN  77.0   13.0        NaN         1.1     59.0  0.50   
 4      NaN     NaN     NaN  66.0   12.0        NaN         0.8     47.0   NaN   
 5     31.0   122.0    95.0  35.0   13.0        0.3         0.6      NaN   NaN   
 6      NaN     NaN     NaN  79.0   28.0        NaN         1.0      NaN   NaN   
 7     46.0    67.0   123.0  44.0   34.0        1.0         0.9     52.0  0.50   
 8      NaN     NaN     NaN  21.0   16.0        NaN         1.0     62.0  0.40   
 9     81.0  3633.0  4146.0  71.0   25.0        0.9         2.5     59.0  0.60   
 10    45.0    31.0    85.0  90.0   38.0        0.3         1.3     62.0   NaN   
 11     NaN     NaN     NaN  53.0   23.0        NaN         1.1      NaN  0.35   
 12     NaN     NaN     NaN  70.0   34.0        0.9         1.4      0.0  0.50   
 13     NaN     NaN     NaN  70.0   10.0        NaN         0.7     53.0  0.40   
 14     NaN     NaN     NaN  47.0   17.0        0.5         0.6      NaN  0.40   
 15     NaN     NaN     NaN  47.0   24.0        NaN         1.3     54.0  0.50   
 16     NaN     NaN     NaN  57.0   14.0        NaN         0.5      NaN   NaN   
 17   103.0    28.0    34.0  88.0   76.0        0.7         4.2     54.0  0.95   
 18     NaN    95.0    90.0  90.0   24.0        0.3         0.8      NaN   NaN   
 19    97.0     6.0     7.0  68.0   41.0        0.2         1.5      NaN  0.30   
 20     NaN     NaN     NaN  51.0   22.0        NaN         0.8     58.0   NaN   
 21    75.0    55.0    77.0  52.0   19.0        0.5         1.1     47.0  0.40   
 22    89.0    40.0    32.0  49.0    5.0        0.9         0.1      NaN   NaN   
 23     NaN     NaN     NaN  66.0   14.0        NaN         1.1      NaN  1.00   
 24     NaN     NaN     NaN  78.0   11.0        NaN         0.7     60.0  0.40   
 25   102.0    66.0    52.0  45.0   10.0        0.6         0.6     79.0  0.60   
 26    66.0    16.0    22.0  90.0   25.0        0.4         1.2      NaN   NaN   
 27     NaN     NaN     NaN  83.0   25.0        NaN         1.3      NaN   NaN   
 28    99.0    21.0    85.0  51.0    9.0       10.9         0.6     42.0  0.50   
 29    94.0   392.0   158.0  49.0  137.0        1.0         9.6      NaN   NaN   
 ..     ...     ...     ...   ...    ...        ...         ...      ...   ...   
 970   65.0    13.0    27.0  85.0   30.0        1.9         0.9      NaN  1.00   
 971    NaN     NaN     NaN  60.0   28.0        NaN         1.0     70.0  0.40   
 972   41.0    45.0    61.0  23.0    7.0        1.5         0.7     78.0  0.40   
 973    NaN     NaN     NaN  63.0   14.0        NaN         0.7     70.0   NaN   
 974    NaN     NaN     NaN  65.0   13.0        NaN         0.9     40.0   NaN   
 975   48.0    14.0    14.0  74.0   23.0        0.5         1.2      NaN   NaN   
 976    NaN     NaN     NaN  24.0   20.0        NaN         1.8      NaN   NaN   
 977    NaN     NaN     NaN  32.0    8.0        NaN         0.7      NaN   NaN   
 978   87.0    41.0    31.0  40.0   21.0        1.1         1.4    105.0   NaN   
 979    NaN     NaN     NaN  43.0   37.0        NaN         2.1     65.0  0.60   
 980   56.0    26.0    57.0  79.0   10.0        0.8         0.4     62.0  0.30   
 981    NaN     NaN     NaN  48.0   18.0        NaN         0.6     56.0  0.60   
 982    NaN     NaN     NaN  41.0    6.0        NaN         0.6      NaN   NaN   
 983   25.0     7.0    17.0  81.0   21.0        0.8         0.9     63.0  0.60   
 984    NaN     NaN     NaN  67.0    9.0        NaN         0.7     75.0  0.40   
 985    NaN     NaN     NaN  55.0   17.0        NaN         0.7     69.0  0.50   
 986    NaN     NaN     NaN  22.0    6.0        NaN         0.8     80.0  0.40   
 987   35.0    43.0    69.0  80.0   51.0        0.4         1.9     52.0  0.60   
 988  289.0   414.0   216.0  90.0   37.0        1.4         1.4      NaN  0.80   
 989   73.0    35.0    22.0  65.0   20.0        0.6         0.5     56.0   NaN   
 990    NaN     NaN     NaN  63.0   32.0        NaN         1.0     45.0  0.70   
 991  318.0   129.0    28.0  63.0   16.0        1.3         0.9      NaN   NaN   
 992    NaN     NaN     NaN  64.0   16.0        NaN         0.8      NaN   NaN   
 993   41.0    16.0    25.0  40.0   15.0        2.5         0.6     75.0  0.40   
 994    NaN     NaN     NaN  80.0   18.0        NaN         1.2     66.0   NaN   
 995   62.0    35.0   119.0  87.0   16.0        0.5         0.6     39.0  0.40   
 996   72.0    19.0    53.0  90.0   48.0        NaN         2.0      NaN   NaN   
 997    NaN     NaN     NaN  79.0   23.0        NaN         0.9     50.0  0.40   
 998    NaN     NaN     NaN  88.0   14.0        NaN         1.3     65.0   NaN   
 999    NaN     NaN     NaN  61.0   14.0        NaN         0.9      NaN   NaN   
 
       GCS  ...    WBC  Weight    pH  Albumin  Lactate  TroponinT  SaO2  \
 0     9.0  ...  16.00  108.10  7.43      NaN      NaN        NaN   NaN   
 1    15.0  ...  13.20  100.00  7.32      2.7      NaN        NaN   NaN   
 2    15.0  ...   6.60  104.90  7.34      NaN      1.5       1.40   NaN   
 3    15.0  ...   8.60   87.60  7.37      NaN      2.8        NaN  98.0   
 4    15.0  ...    NaN   73.40  7.40      NaN      NaN        NaN  99.0   
 5    15.0  ...   6.70   -1.00   NaN      3.6      NaN        NaN   NaN   
 6    15.0  ...  11.20   81.70   NaN      NaN      NaN        NaN   NaN   
 7    10.0  ...  28.00   70.00  7.51      NaN      1.1        NaN  94.0   
 8    15.0  ...   9.40   84.00  7.45      NaN      1.9        NaN   NaN   
 9     9.0  ...  31.00  123.10  7.49      2.4      5.7        NaN  95.0   
 10   15.0  ...   7.50   55.50   NaN      2.5      2.9       0.53   NaN   
 11   15.0  ...   9.00   74.80  7.44      NaN      1.3        NaN   NaN   
 12    6.0  ...   3.20   78.40  7.36      NaN      1.4       0.98  98.0   
 13   15.0  ...  16.90   99.40  7.42      NaN      NaN        NaN  98.0   
 14   10.0  ...   3.20   56.00   NaN      NaN      NaN        NaN   NaN   
 15   15.0  ...  16.20   82.20  7.41      NaN      NaN        NaN  97.0   
 16   15.0  ...   8.70   68.80   NaN      NaN      NaN        NaN   NaN   
 17   15.0  ...  16.80   77.20  7.38      2.4      1.5        NaN   NaN   
 18   14.0  ...  20.50   56.40   NaN      3.3      1.6       0.03   NaN   
 19   15.0  ...  15.00  135.20   NaN      2.6      NaN        NaN   NaN   
 20   15.0  ...  21.10   75.10  7.36      NaN      1.4        NaN   NaN   
 21   10.0  ...  14.40   71.00  7.40      3.5      3.5        NaN   NaN   
 22    NaN  ...   0.75   -1.00   NaN      NaN      0.7        NaN   NaN   
 23   14.0  ...  11.50   96.40  7.40      NaN      2.4        NaN  95.0   
 24   15.0  ...   8.00  102.50  7.42      NaN      NaN        NaN  97.0   
 25    6.0  ...  35.60  115.80  7.43      2.7      1.2        NaN  97.0   
 26   13.0  ...   6.90  128.60   NaN      3.6      NaN       0.01   NaN   
 27   15.0  ...   8.50   80.00   NaN      NaN      NaN       0.10   NaN   
 28    7.0  ...  10.60   73.90  7.44      3.0      NaN        NaN  97.0   
 29   15.0  ...  14.70   88.60  7.44      2.4      5.1       2.49   NaN   
 ..    ...  ...    ...     ...   ...      ...      ...        ...   ...   
 970  15.0  ...  56.40   -1.00   NaN      3.3      NaN        NaN   NaN   
 971  15.0  ...   7.60   91.57  7.46      NaN      NaN        NaN  97.0   
 972   4.0  ...  17.00   -1.00  7.52      3.7      1.9        NaN   NaN   
 973  15.0  ...  11.10  146.60  7.42      NaN      NaN        NaN  96.0   
 974  15.0  ...   7.10   70.80  7.39      NaN      NaN        NaN   NaN   
 975   3.0  ...   4.50   59.20   NaN      1.7      5.0        NaN   NaN   
 976  15.0  ...   8.20  134.20  7.53      NaN      NaN        NaN   NaN   
 977  15.0  ...   7.80  102.00   NaN      NaN      NaN        NaN   NaN   
 978  15.0  ...  10.70   72.00  7.48      NaN      NaN        NaN  97.0   
 979   6.0  ...  15.20  107.10  7.27      1.5      1.3       0.48  96.0   
 980   7.0  ...   5.10   88.60  7.42      2.9      NaN        NaN  97.0   
 981  15.0  ...  10.20   79.10  7.39      NaN      NaN        NaN   NaN   
 982  15.0  ...   4.90  143.80   NaN      NaN      NaN        NaN   NaN   
 983   8.0  ...  12.00   70.00  7.35      1.8      0.9       0.02  96.0   
 984  15.0  ...   8.80   70.00  7.44      NaN      1.3       0.14   NaN   
 985  15.0  ...   8.50   91.60  7.33      NaN      NaN        NaN  92.0   
 986   3.0  ...  12.60   75.00  7.53      NaN      2.0        NaN   NaN   
 987   3.0  ...  23.30   71.00  7.35      2.2      3.4       0.03  97.0   
 988  15.0  ...  15.20   65.00  7.24      3.1      1.4       1.31   NaN   
 989  15.0  ...  23.60   67.00  7.43      2.1      1.3       0.04  95.0   
 990  15.0  ...  13.40   98.00  7.32      NaN      NaN        NaN  98.0   
 991  15.0  ...  13.70   86.60   NaN      2.6      NaN        NaN   NaN   
 992  15.0  ...   9.70   -1.00   NaN      NaN      NaN        NaN   NaN   
 993  10.0  ...   7.80  101.00  7.45      2.7      0.8        NaN  97.0   
 994  15.0  ...   5.90   -1.00   NaN      3.1      2.0       0.02   NaN   
 995  11.0  ...  11.40   39.20  7.46      3.4      1.7       1.14   NaN   
 996  15.0  ...   9.10   83.30   NaN      NaN      NaN        NaN   NaN   
 997  15.0  ...  14.80   75.80  7.41      NaN      1.5        NaN  94.0   
 998  15.0  ...  14.20   59.00  7.43      NaN      NaN        NaN   NaN   
 999  15.0  ...   5.50   71.00   NaN      NaN      NaN        NaN   NaN   
 
      RespRate  Cholesterol  TroponinI  
 0         NaN          NaN        NaN  
 1         NaN          NaN        NaN  
 2         NaN          NaN        NaN  
 3         NaN          NaN        NaN  
 4         NaN          NaN        NaN  
 5        21.0          NaN        NaN  
 6         NaN          NaN        NaN  
 7         NaN          NaN        NaN  
 8         NaN          NaN        NaN  
 9         NaN          NaN        NaN  
 10       24.0         91.0        NaN  
 11        NaN          NaN        NaN  
 12        NaN          NaN        NaN  
 13        NaN          NaN        NaN  
 14        NaN          NaN        NaN  
 15        NaN          NaN        NaN  
 16       20.0        243.0        NaN  
 17        NaN          NaN        NaN  
 18       25.0          NaN        NaN  
 19        NaN          NaN        NaN  
 20       12.0          NaN        NaN  
 21        NaN          NaN        NaN  
 22        NaN          NaN        NaN  
 23        NaN          NaN        NaN  
 24        NaN          NaN        NaN  
 25        NaN          NaN        NaN  
 26       10.0          NaN        NaN  
 27       15.0          NaN        NaN  
 28        NaN          NaN        NaN  
 29        NaN          NaN        NaN  
 ..        ...          ...        ...  
 970      20.0          NaN        NaN  
 971       NaN          NaN        NaN  
 972       NaN          NaN        NaN  
 973       NaN          NaN        NaN  
 974       NaN        207.0        NaN  
 975      30.0          NaN        NaN  
 976      18.0          NaN        NaN  
 977      17.0          NaN        NaN  
 978      21.0          NaN        NaN  
 979       NaN          NaN        NaN  
 980       NaN          NaN        NaN  
 981       NaN          NaN        NaN  
 982      16.0          NaN        NaN  
 983       NaN          NaN        NaN  
 984       NaN          NaN        NaN  
 985       NaN          NaN        NaN  
 986       NaN          NaN        NaN  
 987       NaN          NaN        NaN  
 988       NaN          NaN        NaN  
 989       NaN          NaN        NaN  
 990       NaN        217.0       10.2  
 991      25.0          NaN        NaN  
 992      18.0          NaN        NaN  
 993       NaN          NaN        NaN  
 994      22.0          NaN        NaN  
 995       NaN        158.0        NaN  
 996      19.0          NaN        NaN  
 997       NaN          NaN        NaN  
 998      19.0          NaN        NaN  
 999       NaN          NaN        NaN  
 
 [1000 rows x 42 columns],
 'Fold3':       Age   BUN  Creatinine  DiasABP  FiO2   GCS  Gender  Glucose  HCO3   HCT  \
 0    57.0  17.0         0.7     69.0  0.70  15.0     1.0    186.0  30.0  30.5   
 1    87.0  16.0         0.8     55.0   NaN  11.0     1.0     92.0  27.0  36.3   
 2    73.0  12.0         0.8     62.0  0.50  11.0     0.0    162.0  23.0  32.4   
 3    72.0  57.0         4.3      NaN  0.40  11.0     0.0    207.0  17.0  23.2   
 4    76.0  23.0         1.4     50.0  0.50  14.0     1.0    143.0  23.0  25.1   
 5    59.0   9.0         0.4     62.0  0.40   9.0     0.0    128.0  24.0  34.3   
 6    76.0  61.0         4.0     57.0  0.40  14.0     1.0    215.0  25.0  28.5   
 7    43.0  47.0         5.0      NaN   NaN  15.0     0.0     82.0  27.0  30.3   
 8    60.0   9.0         0.7      NaN   NaN  15.0     0.0    163.0  28.0  27.5   
 9    60.0  12.0         0.6     60.0  0.35   7.0     1.0    131.0  23.0  27.5   
 10   60.0  12.0         0.9     55.0   NaN  15.0     1.0    127.0  25.0  36.4   
 11   69.0  48.0         2.2      NaN  0.50  15.0     1.0    113.0  31.0  35.2   
 12   74.0  16.0         0.8     50.0  0.50  15.0     0.0    119.0  24.0  31.4   
 13   78.0  41.0         5.5     62.0  0.35  15.0     0.0    141.0  30.0  31.4   
 14   82.0  15.0         0.7     99.0  0.60  15.0     0.0    118.0  23.0  29.6   
 15   24.0  13.0         0.7      NaN   NaN  14.0     0.0    166.0  25.0  36.1   
 16   87.0  23.0         1.1      NaN  0.35  11.0     1.0    170.0  25.0  21.7   
 17   90.0  14.0         1.1     83.0  0.60   9.0     1.0    109.0  21.0  32.2   
 18   68.0  22.0         1.5     88.0  1.00  14.0     1.0    118.0  24.0  35.2   
 19   72.0  41.0         0.7     62.0  0.50  15.0     0.0    161.0  31.0  29.6   
 20   81.0  22.0         0.8      NaN   NaN  14.0     0.0    114.0  34.0  37.0   
 21   75.0  22.0         1.4     71.0  0.70  15.0     1.0    176.0  24.0  32.6   
 22   70.0  30.0         1.1     45.0  0.35  15.0     1.0    200.0  20.0  30.1   
 23   77.0  32.0         3.2     34.0  0.50   3.0     1.0     96.0  26.0  28.1   
 24   72.0  28.0         0.5     59.0  0.70  14.0     0.0    202.0  24.0  27.1   
 25   68.0  39.0         2.5     40.0  0.50  15.0     1.0    121.0  19.0  24.9   
 26   63.0  23.0         1.1     78.0   NaN   3.0     1.0    156.0  23.0  34.6   
 27   68.0  42.0         3.0     47.0  0.50  10.0     1.0    219.0  20.0  25.8   
 28   76.0  31.0         2.1     76.0   NaN  15.0     1.0    131.0  21.0  27.1   
 29   49.0  23.0         0.9     40.0  0.40  15.0     1.0    118.0  24.0  25.5   
 ..    ...   ...         ...      ...   ...   ...     ...      ...   ...   ...   
 970  68.0  19.0         0.9     47.0  0.50  15.0     1.0    126.0  28.0  34.8   
 971  70.0  17.0         1.0     65.0   NaN  15.0     1.0    144.0  24.0  33.7   
 972  74.0  23.0         0.9     62.0  0.40  15.0     0.0    112.0  27.0  30.1   
 973  79.0   8.0         0.6      NaN   NaN  15.0     0.0     74.0  24.0  32.4   
 974  55.0  23.0         1.0     49.0  0.40  15.0     0.0      NaN  27.0  24.4   
 975  48.0  23.0         0.8     98.0   NaN  15.0     0.0    123.0  27.0  26.4   
 976  74.0  23.0         1.1      NaN  0.70  15.0     1.0     97.0  25.0  30.4   
 977  69.0  30.0         1.2     69.0  0.40  11.0     1.0     70.0  23.0  23.7   
 978  79.0  32.0         1.2      NaN   NaN  15.0     0.0    106.0  21.0  29.5   
 979  25.0   6.0         0.6      NaN   NaN  14.0     0.0    132.0  28.0  27.1   
 980  76.0  26.0         1.1     61.0  0.40  12.0     1.0    203.0  25.0  38.7   
 981  73.0  75.0         3.1      NaN   NaN  15.0     1.0    143.0  14.0  35.1   
 982  90.0   8.0         0.5      NaN   NaN  14.0     0.0    111.0  28.0  29.9   
 983  51.0   6.0         0.6      NaN   NaN  15.0     0.0     89.0  27.0  31.0   
 984  75.0   7.0         0.6     52.0  0.40   8.0     0.0    167.0  21.0  31.3   
 985  57.0  22.0         0.7     53.0  0.40  10.0     1.0    235.0  27.0  29.2   
 986  37.0   9.0         0.9      NaN   NaN  15.0     0.0    108.0  25.0  24.8   
 987  35.0  13.0         1.1     47.0  0.40   9.0     1.0    110.0  24.0  24.4   
 988  26.0  10.0         0.8     66.0  0.50   3.0     1.0    110.0  24.0  27.0   
 989  84.0  26.0         0.9     47.0   NaN  15.0     0.0     90.0  20.0  35.4   
 990  76.0  14.0         0.8     51.0  0.50  15.0     1.0    146.0   NaN  30.9   
 991  90.0  34.0         1.3      NaN  0.50  10.0     0.0    187.0  29.0  30.3   
 992  59.0  47.0         6.9     55.0  0.50   9.0     1.0    118.0  30.0  23.0   
 993  55.0   7.0         0.7      NaN   NaN  12.0     1.0    125.0  20.0  28.4   
 994  66.0  21.0         0.9     80.0  0.50  15.0     1.0    167.0  19.0  36.0   
 995  63.0  12.0         0.7     51.0  0.35  15.0     1.0    152.0  20.0  33.6   
 996  26.0   7.0         0.7     77.0  0.40  15.0     1.0     79.0  26.0  31.4   
 997  78.0  32.0         1.4     53.0  0.50  15.0     1.0     92.0  27.0  30.9   
 998  77.0  11.0         0.8    135.0  0.40  14.0     1.0     75.0  18.0  35.0   
 999  38.0   7.0         0.5     57.0  0.50  15.0     0.0    108.0  23.0  27.2   
 
      ...  RespRate    ALP    ALT    AST  Bilirubin  Lactate  Albumin  \
 0    ...       NaN    NaN    NaN    NaN        NaN      NaN      NaN   
 1    ...      10.0    NaN    NaN    NaN        NaN      NaN      NaN   
 2    ...       NaN   28.0   17.0   29.0        0.6     1.80      NaN   
 3    ...       NaN    NaN    NaN    NaN        NaN     2.60      NaN   
 4    ...       NaN    NaN    NaN    NaN        NaN      NaN      NaN   
 5    ...       NaN   66.0   19.0   38.0        0.4     1.50      3.4   
 6    ...       NaN  118.0   68.0   84.0        0.3      NaN      3.4   
 7    ...      16.0    NaN    NaN    NaN        NaN      NaN      NaN   
 8    ...      41.0    NaN    NaN    NaN        NaN      NaN      NaN   
 9    ...       NaN    NaN    NaN    NaN        NaN     2.30      3.3   
 10   ...      16.0    NaN    NaN    NaN        NaN      NaN      NaN   
 11   ...       NaN   57.0   10.0   16.0        0.3     1.20      3.7   
 12   ...       NaN    NaN    NaN    NaN        NaN      NaN      NaN   
 13   ...       NaN  101.0    9.0   17.0        0.4     1.80      3.6   
 14   ...      19.0    NaN    NaN    NaN        NaN     1.90      NaN   
 15   ...      20.0    NaN    NaN    NaN        NaN      NaN      NaN   
 16   ...      22.0   67.0   13.0   20.0        0.7     7.60      3.0   
 17   ...       NaN    NaN    NaN    NaN        NaN     1.70      NaN   
 18   ...       NaN   25.0   27.0   45.0        0.6     1.60      NaN   
 19   ...       NaN   98.0   17.0   21.0        1.0     0.90      2.7   
 20   ...      21.0    NaN    NaN    NaN        NaN      NaN      NaN   
 21   ...       NaN    NaN    NaN    NaN        NaN     0.90      NaN   
 22   ...       NaN    NaN    NaN    NaN        NaN     1.91      NaN   
 23   ...       NaN   43.0   24.0  458.0        0.9     3.50      NaN   
 24   ...      15.0   36.0   31.0   17.0        2.9      NaN      5.3   
 25   ...       NaN    NaN    NaN    NaN        NaN     1.60      NaN   
 26   ...       NaN   60.0   21.0   72.0        0.8     2.90      3.1   
 27   ...       NaN  190.0  124.0   33.0        0.7     1.60      2.1   
 28   ...       NaN    NaN    NaN    NaN        NaN      NaN      NaN   
 29   ...       NaN    NaN    NaN    NaN        NaN      NaN      NaN   
 ..   ...       ...    ...    ...    ...        ...      ...      ...   
 970  ...       NaN   41.0   76.0   85.0        1.2     3.30      NaN   
 971  ...      27.0    NaN    NaN    NaN        NaN     1.40      NaN   
 972  ...       NaN    NaN    NaN    NaN        NaN      NaN      NaN   
 973  ...      20.0   75.0   12.0   17.0        0.3      NaN      3.0   
 974  ...       NaN    NaN    NaN    NaN        NaN      NaN      NaN   
 975  ...       NaN   64.0   10.0   17.0        0.2     2.70      2.2   
 976  ...       NaN    NaN    NaN    NaN        0.2     1.60      2.6   
 977  ...       NaN    NaN    NaN    NaN        NaN     1.80      NaN   
 978  ...      13.0    NaN    NaN    NaN        NaN      NaN      NaN   
 979  ...      12.0    NaN    NaN    NaN        NaN      NaN      NaN   
 980  ...       NaN    NaN   14.0   16.0        NaN      NaN      3.4   
 981  ...       NaN    NaN    NaN    NaN        NaN      NaN      NaN   
 982  ...      27.0    NaN    NaN    NaN        NaN      NaN      NaN   
 983  ...      17.0   66.0   74.0   55.0        0.3      NaN      3.7   
 984  ...       NaN    NaN    NaN    NaN        NaN     1.50      NaN   
 985  ...       NaN  132.0  146.0  188.0        1.1     1.60      2.2   
 986  ...      21.0  127.0   68.0   18.0        0.3      NaN      NaN   
 987  ...       NaN    NaN    NaN    NaN        NaN     1.90      NaN   
 988  ...       NaN   68.0    9.0   15.0        0.4     2.70      NaN   
 989  ...       NaN    NaN    NaN    NaN        NaN      NaN      2.6   
 990  ...       NaN    NaN    NaN    NaN        NaN      NaN      NaN   
 991  ...       NaN   49.0    9.0   10.0        0.6     1.20      2.7   
 992  ...       NaN    NaN    NaN    NaN        NaN     1.50      NaN   
 993  ...      23.0    NaN    NaN    NaN        NaN      NaN      NaN   
 994  ...       NaN  217.0  468.0  464.0        0.9     2.20      1.9   
 995  ...       NaN    NaN    NaN    NaN        NaN      NaN      NaN   
 996  ...       NaN   90.0   12.0   54.0        1.1     0.90      3.6   
 997  ...       NaN    NaN    NaN    NaN        NaN     1.40      NaN   
 998  ...       NaN    NaN    NaN    NaN        NaN      NaN      NaN   
 999  ...       NaN    NaN    NaN    NaN        NaN      NaN      NaN   
 
      TroponinT  TroponinI  Cholesterol  
 0          NaN        NaN          NaN  
 1          NaN        NaN          NaN  
 2          NaN        NaN          NaN  
 3          NaN        NaN          NaN  
 4          NaN        NaN          NaN  
 5         0.67        NaN          NaN  
 6          NaN        NaN          NaN  
 7          NaN        NaN          NaN  
 8          NaN        NaN          NaN  
 9          NaN        NaN          NaN  
 10         NaN        NaN          NaN  
 11        0.08        NaN          NaN  
 12         NaN        NaN          NaN  
 13        0.06        NaN          NaN  
 14         NaN        NaN          NaN  
 15         NaN        NaN          NaN  
 16        0.18        NaN          NaN  
 17         NaN        NaN          NaN  
 18         NaN        NaN          NaN  
 19        0.03        NaN          NaN  
 20        0.03        NaN          NaN  
 21         NaN        NaN          NaN  
 22         NaN        NaN          NaN  
 23       11.58        NaN          NaN  
 24         NaN        NaN          NaN  
 25         NaN        NaN          NaN  
 26         NaN        NaN          NaN  
 27         NaN        NaN          NaN  
 28         NaN       13.1          NaN  
 29         NaN        NaN          NaN  
 ..         ...        ...          ...  
 970        NaN       15.0          NaN  
 971        NaN        NaN          NaN  
 972        NaN        NaN          NaN  
 973        NaN        NaN          NaN  
 974        NaN        NaN          NaN  
 975        NaN        NaN          NaN  
 976       0.50        NaN        116.0  
 977       0.11        NaN          NaN  
 978        NaN        NaN          NaN  
 979        NaN        NaN          NaN  
 980       0.01        NaN        237.0  
 981        NaN        NaN          NaN  
 982        NaN        NaN          NaN  
 983        NaN        NaN          NaN  
 984       0.14        NaN          NaN  
 985        NaN        NaN          NaN  
 986        NaN        NaN          NaN  
 987        NaN        NaN          NaN  
 988        NaN        NaN          NaN  
 989        NaN        NaN          NaN  
 990        NaN        NaN          NaN  
 991        NaN        NaN          NaN  
 992        NaN        NaN          NaN  
 993        NaN        NaN          NaN  
 994        NaN        NaN          NaN  
 995        NaN        NaN          NaN  
 996       0.21        NaN          NaN  
 997        NaN        NaN          NaN  
 998        NaN        NaN          NaN  
 999        NaN        NaN          NaN  
 
 [1000 rows x 42 columns],
 'Fold4':       Age    BUN  Creatinine  FiO2   GCS  Gender  Glucose  HCO3   HCT     HR  \
 0    39.0   13.0         0.5  0.40  10.0     0.0     86.0  33.0  32.9   97.0   
 1    70.0   26.0         0.5  0.50  11.0     0.0    144.0  30.0  29.4   89.0   
 2    61.0   18.0         0.9  0.50  15.0     1.0     99.0  30.0  28.8   98.0   
 3    64.0   14.0         1.0  0.60  15.0     1.0    157.0  22.0  26.3  106.0   
 4    45.0   22.0         0.6  0.40  11.0     1.0    139.0  28.0  27.6   84.0   
 5    77.0   29.0         1.7   NaN  13.0     1.0     98.0  23.0  29.9   87.0   
 6    90.0   18.0         0.8  0.40   9.0     0.0     65.0  24.0  29.6  114.0   
 7    66.0   11.0         1.0  0.60  14.0     1.0    110.0  23.0  30.6   90.0   
 8    54.0    9.0         0.7   NaN   NaN     1.0    103.0  26.0  39.7    NaN   
 9    74.0   11.0         0.8  0.40  15.0     1.0    142.0  27.0  24.7   74.0   
 10   73.0   20.0         3.6   NaN  15.0     1.0    141.0  14.0  32.8  107.0   
 11   62.0   20.0         0.7   NaN  13.0     1.0    135.0  26.0  21.6   79.0   
 12   56.0   15.0         1.1  0.40   7.0     1.0    129.0  15.0  39.3   62.0   
 13   57.0   10.0         0.4  0.50  15.0     0.0    106.0  30.0  24.8   92.0   
 14   74.0   14.0         0.8  0.70   3.0     0.0     91.0  23.0  30.2   72.0   
 15   74.0   22.0         1.1  0.50  15.0     1.0     65.0  25.0  30.6   90.0   
 16   67.0   35.0         0.8  0.35  15.0     1.0    173.0  33.0  36.0   99.0   
 17   49.0   13.0         0.9   NaN  14.0     1.0    167.0  24.0  35.5   74.0   
 18   58.0   46.0         2.5  0.50   8.0     0.0    140.0  16.0  24.3   77.0   
 19   72.0   39.0         1.0  0.40   4.0     0.0    159.0  21.0  30.0   83.0   
 20   79.0   31.0         1.3  0.50  14.0     1.0    107.0  19.0  27.1   77.0   
 21   82.0   26.0         1.1  0.40  15.0     1.0    117.0  23.0  31.5   85.0   
 22   45.0   15.0         0.8  0.50   7.0     1.0    208.0  22.0  24.7   92.0   
 23   68.0   31.0         1.3   NaN  15.0     1.0    124.0  23.0  31.7   75.0   
 24   59.0   25.0         1.4  0.40  15.0     0.0    102.0  26.0  26.1   80.0   
 25   24.0   14.0         0.9  0.40  12.0     1.0    141.0  27.0  20.4   70.0   
 26   52.0   25.0         1.0   NaN  15.0     1.0    106.0  15.0  23.4   66.0   
 27   52.0   21.0         0.8   NaN  15.0     0.0    114.0  23.0  29.0   76.0   
 28   85.0   56.0         1.5   NaN  13.0     0.0    108.0  16.0  34.0  108.0   
 29   59.0   68.0         5.9  0.40   9.0     0.0    100.0  19.0  29.1  106.0   
 ..    ...    ...         ...   ...   ...     ...      ...   ...   ...    ...   
 970  69.0   22.0         1.0  0.50  15.0     1.0    120.0  27.0  24.9   84.0   
 971  67.0    9.0         0.6   NaN  15.0     1.0    147.0  18.0  26.2   78.0   
 972  78.0   10.0         0.5  0.40  11.0     0.0     66.0  27.0  30.5   92.0   
 973  61.0   26.0         0.7  0.40   6.0     1.0    136.0  25.0  27.7   90.0   
 974  60.0   17.0         1.0   NaN  15.0     1.0    112.0  26.0  33.0   74.0   
 975  38.0    7.0         0.5  0.50   6.0     0.0    138.0  21.0  25.0   80.0   
 976  55.0   23.0         0.7  0.60   9.0     0.0    140.0  21.0  27.9   72.0   
 977  57.0   10.0         0.5  0.40   6.0     1.0    126.0  25.0  24.3  106.0   
 978  85.0   40.0         1.1   NaN  15.0     0.0    101.0  20.0  33.9   60.0   
 979  83.0   48.0         3.2  1.00   7.0     1.0    156.0  21.0  40.1   86.0   
 980  80.0   13.0         0.6   NaN  14.0     1.0    103.0  17.0  26.9   62.0   
 981  67.0   27.0         0.6  0.50   8.0     0.0    107.0  26.0  28.3   71.0   
 982  73.0   29.0         1.7  0.60  15.0     0.0    111.0  32.0  27.5   67.0   
 983  74.0   59.0         5.5   NaN  15.0     0.0    181.0  17.0  31.4   75.0   
 984  65.0    6.0         0.5   NaN  15.0     1.0    220.0  23.0  36.3  108.0   
 985  50.0    6.0         0.5   NaN  15.0     0.0     88.0  21.0  40.1   75.0   
 986  34.0    9.0         0.8   NaN  15.0     1.0    118.0  24.0  33.4   65.0   
 987  75.0   20.0         1.1   NaN  15.0     1.0    109.0  21.0  28.4   74.0   
 988  72.0   16.0         0.6  0.70  13.0     0.0    149.0  31.0  35.0   84.0   
 989  66.0   96.0         6.5   NaN  15.0     1.0    131.0  12.0  60.3   56.0   
 990  43.0   20.0         1.0  0.40  15.0     1.0     95.0  25.0  35.6   97.0   
 991  88.0   39.0         1.6  1.00  15.0     1.0    112.0  21.0  32.6   60.0   
 992  89.0   14.0         1.0   NaN  11.0     1.0     96.0  23.0  36.3   83.0   
 993  86.0   69.0         2.2   NaN  15.0     1.0    102.0  20.0  31.7   70.0   
 994  51.0   15.0         0.5  0.40  10.0     0.0    111.0  27.0  29.1  106.0   
 995  70.0   18.0         1.0  0.50  15.0     0.0    106.0  22.0  30.3   89.0   
 996  25.0    7.0         0.7   NaN  15.0     1.0     88.0  28.0  31.9   80.0   
 997  44.0    6.0         1.0  0.40   5.0     1.0    132.0  25.0  37.8   86.0   
 998  37.0  114.0        11.7  0.50   3.0     1.0    118.0  21.0  27.1   82.0   
 999  78.0   24.0         1.5  0.50  14.0     0.0    126.0  19.0  30.7   84.0   
 
      ...  SysABP   SaO2    ALP     ALT     AST  Bilirubin  Cholesterol  \
 0    ...     NaN    NaN    NaN     NaN     NaN        NaN          NaN   
 1    ...   120.0    NaN    NaN     NaN     NaN        NaN          NaN   
 2    ...    82.0   98.0    NaN     NaN     NaN        NaN          NaN   
 3    ...   101.0   94.0   71.0    83.0    60.0        0.6          NaN   
 4    ...   130.0   97.0   46.0    34.0    43.0        0.7          NaN   
 5    ...   139.0  100.0  112.0    39.0    87.0        0.5          NaN   
 6    ...     NaN    NaN   88.0    19.0    22.0        0.7        105.0   
 7    ...    96.0    NaN    NaN     NaN     NaN        NaN          NaN   
 8    ...     NaN    NaN    NaN     NaN     NaN        NaN          NaN   
 9    ...   113.0   98.0    NaN     NaN     NaN        NaN          NaN   
 10   ...   136.0    NaN   84.0    55.0    77.0        1.0          NaN   
 11   ...   124.0    NaN    NaN     NaN     NaN        NaN          NaN   
 12   ...    90.0  100.0    NaN     NaN     NaN        0.7        218.0   
 13   ...   117.0   95.0    NaN     NaN     NaN        NaN          NaN   
 14   ...   116.0   98.0    NaN     NaN     NaN        NaN          NaN   
 15   ...   153.0   98.0    NaN     NaN     NaN        NaN          NaN   
 16   ...   140.0    NaN   93.0    63.0    22.0        0.3        101.0   
 17   ...   101.0   97.0    NaN     NaN     NaN        NaN        204.0   
 18   ...   125.0   98.0   84.0     7.0    34.0        0.6          NaN   
 19   ...   141.0   99.0    NaN     NaN     NaN        NaN          NaN   
 20   ...   141.0   97.0    NaN     NaN     NaN        NaN          NaN   
 21   ...   120.0   96.0    NaN     NaN     NaN        NaN          NaN   
 22   ...   108.0    NaN    NaN     NaN     NaN        NaN          NaN   
 23   ...     NaN    NaN    NaN     NaN     NaN        NaN          NaN   
 24   ...   141.0    NaN    NaN     NaN     NaN        NaN          NaN   
 25   ...   152.0    NaN    NaN     NaN     NaN        NaN          NaN   
 26   ...     NaN    NaN  255.0    35.0    50.0       10.4          NaN   
 27   ...   122.0    NaN    NaN     NaN     NaN        NaN          NaN   
 28   ...     NaN    NaN  322.0    50.0    26.0        2.3          NaN   
 29   ...   149.0    NaN    NaN     NaN     NaN        NaN          NaN   
 ..   ...     ...    ...    ...     ...     ...        ...          ...   
 970  ...   105.0    NaN    NaN     NaN     NaN        NaN          NaN   
 971  ...     NaN    NaN   55.0    18.0    21.0        0.7          NaN   
 972  ...   108.0   97.0    NaN     NaN     NaN        NaN          NaN   
 973  ...   157.0   99.0    NaN     NaN     NaN        NaN          NaN   
 974  ...   111.0    NaN   98.0    54.0   223.0        0.6          NaN   
 975  ...   139.0   98.0    NaN    29.0    46.0        NaN          NaN   
 976  ...   150.0   94.0   71.0    38.0    49.0        0.3          NaN   
 977  ...   103.0   98.0    NaN    30.0    22.0        0.6          NaN   
 978  ...     NaN    NaN   51.0    12.0    17.0        0.3          NaN   
 979  ...   124.0   97.0    NaN     NaN     NaN        NaN          NaN   
 980  ...     NaN    NaN   41.0    13.0    16.0        1.3          NaN   
 981  ...   108.0   99.0    NaN     NaN     NaN        NaN          NaN   
 982  ...   104.0   98.0    NaN     NaN     NaN        NaN          NaN   
 983  ...    95.0   91.0    NaN     NaN     NaN        NaN          NaN   
 984  ...     NaN    NaN  215.0    91.0    44.0        5.4          NaN   
 985  ...     NaN    NaN   57.0     9.0    13.0        0.2        132.0   
 986  ...   113.0    NaN   37.0    39.0    46.0        0.9          NaN   
 987  ...     NaN    NaN    NaN     NaN     NaN        NaN          NaN   
 988  ...   158.0    NaN    NaN     NaN     NaN        NaN          NaN   
 989  ...     NaN    NaN  144.0    36.0   205.0       34.7          NaN   
 990  ...   157.0   96.0   48.0   114.0    40.0        0.3          NaN   
 991  ...     NaN   95.0  163.0   115.0    80.0        9.2          NaN   
 992  ...     NaN    NaN    NaN     NaN     NaN        NaN          NaN   
 993  ...     NaN    NaN  155.0    28.0    35.0        0.9          NaN   
 994  ...   112.0   98.0    NaN     NaN     NaN        NaN          NaN   
 995  ...   152.0    NaN    NaN     NaN     NaN        NaN          NaN   
 996  ...     NaN    NaN    NaN     NaN     NaN        NaN        117.0   
 997  ...   113.0    NaN   51.0    20.0    20.0        0.5          NaN   
 998  ...   145.0    NaN  158.0  1513.0  1277.0        0.6          NaN   
 999  ...   129.0   98.0   42.0     9.0    99.0        0.5          NaN   
 
      RespRate  TroponinT  TroponinI  
 0         NaN        NaN        NaN  
 1         NaN        NaN        NaN  
 2         NaN        NaN        NaN  
 3         NaN        NaN        NaN  
 4         NaN        NaN        NaN  
 5         NaN        NaN        NaN  
 6         NaN        NaN        NaN  
 7         NaN        NaN        NaN  
 8         NaN        NaN        NaN  
 9         NaN        NaN        NaN  
 10       22.0        NaN        NaN  
 11       20.0        NaN        NaN  
 12        NaN      11.18        NaN  
 13        NaN        NaN        NaN  
 14        NaN        NaN        NaN  
 15        NaN        NaN        NaN  
 16        NaN       0.03        NaN  
 17       19.0        NaN        NaN  
 18        NaN       1.80        NaN  
 19        NaN       0.38        NaN  
 20        NaN        NaN        NaN  
 21        NaN        NaN        NaN  
 22        NaN        NaN        NaN  
 23       18.0       0.61        NaN  
 24        NaN        NaN        NaN  
 25        NaN        NaN        NaN  
 26       19.0        NaN        NaN  
 27       20.0        NaN        NaN  
 28       20.0        NaN        NaN  
 29        NaN       0.38        NaN  
 ..        ...        ...        ...  
 970       NaN        NaN        NaN  
 971       NaN        NaN        NaN  
 972       NaN        NaN        NaN  
 973       NaN        NaN       11.7  
 974      16.0       4.63        NaN  
 975       NaN        NaN        6.3  
 976       NaN        NaN        NaN  
 977       NaN        NaN        NaN  
 978      15.0        NaN        NaN  
 979      24.0       0.11        NaN  
 980      21.0        NaN        NaN  
 981       NaN       0.03        NaN  
 982       NaN        NaN        NaN  
 983       NaN        NaN        NaN  
 984      13.0        NaN        NaN  
 985      22.0        NaN        NaN  
 986       NaN        NaN        NaN  
 987      24.0        NaN        0.4  
 988       NaN        NaN        NaN  
 989      17.0        NaN        NaN  
 990       NaN       0.02        NaN  
 991       NaN        NaN        NaN  
 992      23.0        NaN        NaN  
 993       NaN        NaN        NaN  
 994       NaN        NaN        NaN  
 995       NaN        NaN        NaN  
 996      18.0        NaN        NaN  
 997       NaN        NaN        NaN  
 998       NaN        NaN        NaN  
 999       NaN        NaN        NaN  
 
 [1000 rows x 42 columns]}
In [18]:
# get patients' 4000 temporal dataframe - by the last value within the 48 hours period
all_temporal_dfs__most_recent = getAllTemporalDataFrameByAggregationType(cv_fold, all_patients, "most_recent")
all_temporal_dfs__most_recent.head()
Fold1 has started extracting temporal data
Fold1 has completed

Fold2 has started extracting temporal data
Fold2 has completed

Fold3 has started extracting temporal data
Fold3 has completed

Fold4 has started extracting temporal data
Fold4 has completed

4000 patients' Temporals data has been extracted with aggregator most_recent
Out[18]:
Age BUN Creatinine GCS Gender Glucose HCO3 HCT HR Height ... pH ALP ALT AST Albumin Bilirubin Lactate Cholesterol TroponinI TroponinT
0 54.0 8.0 0.7 15.0 0.0 115.0 28.0 30.3 86.0 -1.0 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 76.0 21.0 1.3 15.0 1.0 146.0 24.0 29.4 65.0 175.3 ... 7.37 NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 44.0 3.0 0.3 5.0 0.0 143.0 25.0 29.4 71.0 -1.0 ... 7.47 105.0 75.0 164.0 2.3 2.8 0.9 NaN NaN NaN
3 68.0 10.0 0.7 15.0 1.0 117.0 28.0 36.3 79.0 180.3 ... NaN 105.0 12.0 15.0 4.4 0.2 NaN NaN NaN NaN
4 88.0 25.0 1.0 15.0 0.0 92.0 20.0 30.9 68.0 -1.0 ... NaN NaN NaN NaN 3.3 NaN NaN NaN NaN NaN

5 rows × 42 columns

Extracting Earliest Values of Temporal Variables in the initial time period of the 48 Hours

Assumption:

  • Patients' condition in the initial hours of the 48 hours time period, his/her condition may have a higher significance in determining the length of stay and mortality rate.
In [19]:
# in 4 folds  - by the first value of each of the temporal data
all_temporal_dfs_folds__earliest = getAllTemporalDataFrameByAggregationType(cv_fold, all_patients, "earliest")
all_temporal_dfs_folds__earliest
Fold1 has started extracting temporal data
Fold1 has completed

Fold2 has started extracting temporal data
Fold2 has completed

Fold3 has started extracting temporal data
Fold3 has completed

Fold4 has started extracting temporal data
Fold4 has completed

4000 patients' Temporals data has been extracted with aggregator earliest
Out[19]:
Age BUN Creatinine GCS Gender Glucose HCO3 HCT HR Height ... pH ALP ALT AST Albumin Bilirubin Lactate Cholesterol TroponinI TroponinT
0 54.0 13.0 0.8 15.0 0.0 205.0 26.0 33.7 73.0 -1.0 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 76.0 16.0 0.8 3.0 1.0 105.0 21.0 24.7 88.0 175.3 ... 7.45 NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 44.0 8.0 0.4 7.0 0.0 141.0 24.0 28.5 100.0 -1.0 ... 7.51 127.0 91.0 235.0 2.7 3.0 1.3 NaN NaN NaN
3 68.0 23.0 0.9 15.0 1.0 129.0 28.0 41.3 79.0 180.3 ... NaN 105.0 12.0 15.0 4.4 0.2 NaN NaN NaN NaN
4 88.0 45.0 1.0 15.0 0.0 113.0 18.0 22.6 93.0 -1.0 ... NaN NaN NaN NaN 3.3 NaN NaN NaN NaN NaN
5 64.0 15.0 1.4 7.0 1.0 264.0 19.0 41.6 78.0 180.3 ... 7.29 101.0 45.0 47.0 NaN 0.4 NaN 212.0 1.3 NaN
6 68.0 32.0 3.4 15.0 0.0 94.0 25.0 31.9 73.0 162.6 ... NaN NaN NaN NaN NaN NaN NaN NaN 0.7 NaN
7 78.0 81.0 0.9 15.0 0.0 132.0 18.0 32.6 111.0 162.6 ... 7.40 47.0 46.0 82.0 1.9 0.3 1.4 NaN 3.5 NaN
8 64.0 21.0 0.7 15.0 0.0 113.0 21.0 28.3 127.0 -1.0 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
9 74.0 19.0 1.1 10.0 1.0 106.0 23.0 31.5 67.0 175.3 ... 7.39 NaN NaN NaN NaN NaN NaN NaN NaN NaN
10 64.0 64.0 1.3 15.0 0.0 106.0 20.0 18.1 101.0 -1.0 ... NaN 402.0 36.0 47.0 2.7 0.1 5.7 NaN NaN NaN
11 71.0 9.0 0.5 10.0 0.0 132.0 25.0 34.1 84.0 157.5 ... 7.44 NaN NaN NaN NaN NaN NaN NaN NaN NaN
12 66.0 18.0 1.4 15.0 0.0 105.0 25.0 28.1 88.0 157.5 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
13 84.0 83.0 3.6 15.0 1.0 166.0 31.0 27.0 70.0 170.2 ... NaN 19.0 15.0 20.0 NaN 0.1 NaN NaN NaN NaN
14 77.0 44.0 1.4 15.0 1.0 157.0 31.0 27.6 80.0 162.6 ... NaN NaN NaN 57.0 2.9 NaN NaN NaN 6.6 NaN
15 78.0 21.0 1.0 3.0 1.0 90.0 22.0 26.0 73.0 167.6 ... 7.36 NaN NaN NaN NaN NaN 1.5 NaN NaN NaN
16 65.0 36.0 1.9 9.0 1.0 341.0 28.0 41.8 94.0 -1.0 ... 7.34 NaN NaN NaN NaN NaN NaN NaN NaN NaN
17 84.0 31.0 1.1 15.0 1.0 170.0 28.0 29.6 101.0 182.9 ... NaN NaN NaN NaN 2.6 NaN NaN NaN NaN NaN
18 78.0 16.0 0.9 3.0 0.0 204.0 19.0 15.1 124.0 -1.0 ... 7.10 48.0 10.0 13.0 2.2 0.9 1.9 84.0 NaN 0.03
19 40.0 10.0 0.5 3.0 0.0 114.0 27.0 25.0 79.0 165.1 ... 7.11 NaN NaN NaN NaN NaN NaN NaN NaN NaN
20 48.0 7.0 3.5 15.0 0.0 63.0 23.0 26.2 115.0 154.9 ... NaN 202.0 58.0 102.0 2.0 6.8 NaN NaN NaN NaN
21 58.0 18.0 0.8 3.0 1.0 213.0 21.0 34.6 119.0 188.0 ... 7.38 NaN NaN NaN NaN NaN 4.0 NaN NaN NaN
22 81.0 27.0 1.4 15.0 1.0 140.0 22.0 31.7 62.0 -1.0 ... NaN NaN NaN NaN NaN NaN 1.6 NaN NaN 0.15
23 35.0 68.0 2.3 15.0 0.0 603.0 11.0 25.5 112.0 -1.0 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN 0.15
24 26.0 9.0 0.5 NaN 0.0 175.0 20.0 17.0 NaN -1.0 ... 7.37 NaN NaN NaN NaN NaN 0.8 NaN NaN NaN
25 66.0 27.0 5.0 15.0 0.0 76.0 22.0 31.1 76.0 137.2 ... NaN NaN NaN NaN NaN NaN NaN NaN 1.2 NaN
26 80.0 19.0 0.8 6.0 0.0 201.0 28.0 40.9 51.0 -1.0 ... 7.44 NaN NaN NaN NaN NaN NaN NaN NaN NaN
27 53.0 33.0 1.0 8.0 0.0 150.0 21.0 26.8 98.0 177.8 ... 7.48 124.0 14.0 20.0 2.0 1.3 1.8 NaN NaN 0.02
28 74.0 10.0 1.0 3.0 1.0 131.0 21.0 24.9 103.0 177.8 ... 7.39 NaN NaN NaN NaN NaN NaN NaN NaN NaN
29 80.0 30.0 1.1 15.0 1.0 172.0 27.0 37.9 67.0 180.3 ... 7.49 NaN NaN NaN NaN NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
3970 69.0 17.0 0.9 10.0 1.0 131.0 25.0 22.0 90.0 177.8 ... 7.45 NaN NaN NaN NaN NaN 3.1 NaN NaN NaN
3971 67.0 7.0 0.6 15.0 1.0 100.0 17.0 20.5 87.0 -1.0 ... NaN 55.0 18.0 21.0 3.0 0.3 NaN NaN NaN NaN
3972 78.0 11.0 0.4 3.0 0.0 66.0 22.0 24.3 97.0 157.5 ... 7.25 NaN NaN NaN NaN NaN 2.8 NaN NaN NaN
3973 61.0 33.0 0.8 3.0 1.0 163.0 27.0 33.3 90.0 182.9 ... 7.46 NaN NaN NaN NaN NaN 1.8 NaN 47.5 NaN
3974 60.0 16.0 0.9 15.0 1.0 105.0 23.0 37.4 94.0 -1.0 ... NaN 98.0 54.0 223.0 NaN 0.6 NaN NaN NaN 9.03
3975 38.0 12.0 0.8 10.0 0.0 131.0 18.0 37.6 86.0 165.1 ... 7.35 NaN 29.0 46.0 4.6 NaN 6.9 NaN 13.5 NaN
3976 55.0 19.0 0.5 9.0 0.0 143.0 19.0 22.2 96.0 -1.0 ... 7.15 71.0 38.0 49.0 2.6 0.3 1.2 NaN NaN NaN
3977 57.0 21.0 0.9 9.0 1.0 121.0 25.0 27.2 140.0 188.0 ... 7.37 NaN 30.0 22.0 2.3 0.6 2.8 NaN NaN NaN
3978 85.0 77.0 1.1 15.0 0.0 114.0 21.0 35.6 60.0 -1.0 ... NaN 51.0 12.0 17.0 NaN 0.3 NaN NaN NaN NaN
3979 83.0 18.0 1.0 13.0 1.0 220.0 24.0 51.0 90.0 180.3 ... 7.48 NaN NaN NaN NaN NaN 1.3 NaN NaN 0.05
3980 80.0 17.0 0.7 14.0 1.0 142.0 20.0 25.6 89.0 167.6 ... NaN 41.0 13.0 16.0 2.5 1.3 NaN NaN NaN NaN
3981 67.0 19.0 0.6 10.0 0.0 90.0 26.0 31.1 106.0 170.2 ... 7.29 NaN NaN NaN NaN NaN 1.1 NaN NaN 0.02
3982 73.0 20.0 1.5 3.0 0.0 146.0 27.0 22.7 88.0 152.4 ... 7.26 NaN NaN NaN NaN NaN NaN NaN NaN NaN
3983 74.0 55.0 4.7 15.0 0.0 117.0 16.0 30.0 45.0 -1.0 ... 7.13 NaN NaN NaN NaN NaN 3.1 NaN NaN NaN
3984 65.0 17.0 0.9 15.0 1.0 184.0 25.0 34.5 100.0 -1.0 ... NaN 216.0 122.0 60.0 3.1 6.7 NaN NaN NaN NaN
3985 50.0 6.0 0.4 15.0 0.0 90.0 19.0 37.6 58.0 -1.0 ... NaN 57.0 9.0 13.0 NaN 0.2 NaN 132.0 NaN NaN
3986 34.0 20.0 0.8 15.0 1.0 151.0 23.0 35.0 76.0 175.3 ... 7.38 37.0 39.0 46.0 3.9 0.9 NaN NaN NaN NaN
3987 75.0 29.0 1.3 15.0 1.0 113.0 24.0 27.0 83.0 -1.0 ... NaN NaN NaN NaN NaN NaN NaN NaN 1.5 NaN
3988 72.0 15.0 0.8 10.0 0.0 193.0 30.0 31.2 71.0 165.1 ... 7.03 NaN NaN NaN NaN NaN 4.0 NaN NaN NaN
3989 66.0 80.0 4.0 15.0 1.0 53.0 13.0 61.8 62.0 -1.0 ... 7.42 148.0 34.0 139.0 2.7 36.0 2.6 NaN NaN NaN
3990 43.0 23.0 1.2 7.0 1.0 151.0 22.0 42.9 106.0 -1.0 ... 7.28 50.0 150.0 86.0 NaN 0.3 5.4 NaN NaN 0.02
3991 88.0 29.0 1.4 15.0 1.0 123.0 21.0 35.2 70.0 -1.0 ... 7.29 163.0 115.0 80.0 NaN 9.2 NaN NaN NaN NaN
3992 89.0 13.0 1.1 12.0 1.0 100.0 22.0 32.5 87.0 177.8 ... NaN NaN NaN NaN 3.6 NaN NaN NaN NaN NaN
3993 86.0 57.0 2.2 15.0 1.0 106.0 18.0 31.8 72.0 162.6 ... NaN 192.0 35.0 37.0 2.6 1.1 NaN NaN NaN NaN
3994 51.0 9.0 0.5 3.0 0.0 122.0 25.0 29.0 90.0 -1.0 ... 7.34 NaN NaN NaN NaN NaN NaN NaN NaN NaN
3995 70.0 14.0 0.8 3.0 0.0 121.0 21.0 27.8 83.0 -1.0 ... 7.42 NaN NaN NaN NaN NaN 2.3 NaN NaN NaN
3996 25.0 5.0 0.9 15.0 1.0 96.0 20.0 30.3 81.0 -1.0 ... NaN NaN NaN NaN NaN NaN NaN 117.0 NaN NaN
3997 44.0 10.0 1.2 8.0 1.0 99.0 23.0 37.9 86.0 -1.0 ... 7.39 51.0 20.0 20.0 NaN 0.5 NaN NaN NaN NaN
3998 37.0 65.0 7.6 6.0 1.0 125.0 31.0 28.8 74.0 -1.0 ... 7.52 176.0 2364.0 2038.0 3.1 0.9 1.9 NaN NaN NaN
3999 78.0 22.0 1.0 3.0 0.0 126.0 24.0 22.3 88.0 157.5 ... 7.37 46.0 28.0 153.0 2.2 0.7 1.0 NaN NaN NaN

4000 rows × 42 columns

In [20]:
# get patients' 4000 temporal dataframe - by the first value of each of the temporal data
all_temporal_dfs__earliest = getAllTemporalDataFrameByAggregationType(cv_fold, all_patients, "earliest")
all_temporal_dfs__earliest.head()
Fold1 has started extracting temporal data
Fold1 has completed

Fold2 has started extracting temporal data
Fold2 has completed

Fold3 has started extracting temporal data
Fold3 has completed

Fold4 has started extracting temporal data
Fold4 has completed

4000 patients' Temporals data has been extracted with aggregator earliest
Out[20]:
Age BUN Creatinine GCS Gender Glucose HCO3 HCT HR Height ... pH ALP ALT AST Albumin Bilirubin Lactate Cholesterol TroponinI TroponinT
0 54.0 13.0 0.8 15.0 0.0 205.0 26.0 33.7 73.0 -1.0 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 76.0 16.0 0.8 3.0 1.0 105.0 21.0 24.7 88.0 175.3 ... 7.45 NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 44.0 8.0 0.4 7.0 0.0 141.0 24.0 28.5 100.0 -1.0 ... 7.51 127.0 91.0 235.0 2.7 3.0 1.3 NaN NaN NaN
3 68.0 23.0 0.9 15.0 1.0 129.0 28.0 41.3 79.0 180.3 ... NaN 105.0 12.0 15.0 4.4 0.2 NaN NaN NaN NaN
4 88.0 45.0 1.0 15.0 0.0 113.0 18.0 22.6 93.0 -1.0 ... NaN NaN NaN NaN 3.3 NaN NaN NaN NaN NaN

5 rows × 42 columns

In [21]:
def extractPatientOutcome(temp_dir):
    data = pd.read_csv(temp_dir, sep=",", encoding = "ISO-8859-1")
    return data
In [22]:
# get patients' outcome dataframe
all_outcome_dfs_folds = {} 

for csv_file in outcomes:
    fold_num = csv_file[0:5]
    all_outcome_dfs_folds[fold_num] = pd.DataFrame()
    all_outcome_dfs_folds[fold_num] = all_outcome_dfs_folds[fold_num].append(extractPatientOutcome(os.path.join(dirpaths[:1][0], csv_file)), ignore_index=True)

print(len(all_outcome_dfs_folds), "folds of patients' outcome data has been extracted")
4 folds of patients' outcome data has been extracted
In [23]:
# get patients' outcome dataframe
all_outcome_dfs = pd.DataFrame()

for csv_file in outcomes:
    all_outcome_dfs = all_outcome_dfs.append(extractPatientOutcome(os.path.join(dirpaths[:1][0], csv_file)), ignore_index=True)

print(len(all_outcome_dfs), "folds of patients' outcome data has been extracted")
4000 folds of patients' outcome data has been extracted
In [24]:
all_temporal_dfs__most_recent.head()
Out[24]:
Age BUN Creatinine GCS Gender Glucose HCO3 HCT HR Height ... pH ALP ALT AST Albumin Bilirubin Lactate Cholesterol TroponinI TroponinT
0 54.0 8.0 0.7 15.0 0.0 115.0 28.0 30.3 86.0 -1.0 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 76.0 21.0 1.3 15.0 1.0 146.0 24.0 29.4 65.0 175.3 ... 7.37 NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 44.0 3.0 0.3 5.0 0.0 143.0 25.0 29.4 71.0 -1.0 ... 7.47 105.0 75.0 164.0 2.3 2.8 0.9 NaN NaN NaN
3 68.0 10.0 0.7 15.0 1.0 117.0 28.0 36.3 79.0 180.3 ... NaN 105.0 12.0 15.0 4.4 0.2 NaN NaN NaN NaN
4 88.0 25.0 1.0 15.0 0.0 92.0 20.0 30.9 68.0 -1.0 ... NaN NaN NaN NaN 3.3 NaN NaN NaN NaN NaN

5 rows × 42 columns

Summary of Data Structures:

  • dirpaths => List of all directory paths
  • outcomes => Lists of all outcome.csv files
  • folds = Dictionary of all patients txt.files by folds
  • static_variables => List of all static variables
  • temporal_variables => List of all temporal variables
  • all_patients => {patient id: raw dataframe from txt.file}
  • cv_fold => {fold# : list of all patients'id}
  • all_static_dfs
  • all_temporal_dfs__freq
  • all_temporal_dfs__most_recent
  • all_temporal_dfs__earliest
  • all_outcome_dfs

Folds Dictionary Objects for training and testing:

  • all_static_dfs_folds
  • all_temporal_dfs_folds__freq
  • all_temporal_dfs_folds__most_recent
  • all_temporal_dfs_folds__earliest
  • all_outcome_dfs_folds

Data Exploration

Objectives:

  • Understand and analyse patients' data and distribution.
  • Determine the factors to be considered in design matrix.
  • Explore relationships of the patients' health variables.
In [25]:
# function to visualize a patients' temporal data given an id
def visualizePatientTemporalData(data):
    temporal = data[5:] # include weight
    patient_id = data.loc[data['Time'] == '00:00', :][:-5] 

    plt.figure(figsize=(24, 9))
    tick_spacing = 2

    chart = sns.scatterplot(temporal['Time'], temporal['Parameter'], s=150)
    chart.set_title("Patient ID: " + str(int(patient_id['Value'])))
    chart.xaxis.set_major_locator(ticker.MultipleLocator(tick_spacing))
    plt.setp(chart.get_xticklabels(), rotation=45)
    plt.show()

MUST HAVE Visualization

Plot of each patient's temporal data.

In [26]:
# test visualization of Temporal data
visualizePatientTemporalData(all_patients['132539'])                        

Further Data Exploration

all_features_df_XXX - Merge outcome variables with static and temporal variables

In [27]:
# dataframes of all variables
print(all_outcome_dfs.columns)
print(all_static_dfs.columns)
print(all_temporal_dfs__most_recent.columns)
Index(['RecordID', 'Length_of_stay', 'In-hospital_death'], dtype='object')
Index(['Age', 'Gender', 'Height', 'ICUType', 'RecordID', 'Weight'], dtype='object')
Index(['Age', 'BUN', 'Creatinine', 'GCS', 'Gender', 'Glucose', 'HCO3', 'HCT',
       'HR', 'Height', 'ICUType', 'K', 'Mg', 'NIDiasABP', 'NIMAP', 'NISysABP',
       'Na', 'Platelets', 'RecordID', 'RespRate', 'Temp', 'Urine', 'WBC',
       'Weight', 'DiasABP', 'FiO2', 'MAP', 'MechVent', 'PaCO2', 'PaO2', 'SaO2',
       'SysABP', 'pH', 'ALP', 'ALT', 'AST', 'Albumin', 'Bilirubin', 'Lactate',
       'Cholesterol', 'TroponinI', 'TroponinT'],
      dtype='object')
In [28]:
# exploration by frequency of measurement temporal data
all_features_df = pd.merge(all_outcome_dfs, all_static_dfs)
all_features_df__freq = pd.merge(all_features_df, all_temporal_dfs__freq, how='left', on=["RecordID", "Age", "Gender", "Height", "ICUType", "Weight"])
print("Number of Features is", len(all_features_df__freq.columns))
print(all_features_df__freq.columns)
all_features_df__freq.head()
Number of Features is 44
Index(['RecordID', 'Length_of_stay', 'In-hospital_death', 'Age', 'Gender',
       'Height', 'ICUType', 'Weight', 'ALP', 'ALT', 'AST', 'Albumin', 'BUN',
       'Bilirubin', 'Cholesterol', 'Creatinine', 'DiasABP', 'FiO2', 'GCS',
       'Glucose', 'HCO3', 'HCT', 'HR', 'K', 'Lactate', 'MAP', 'MechVent', 'Mg',
       'NIDiasABP', 'NIMAP', 'NISysABP', 'Na', 'PaCO2', 'PaO2', 'Platelets',
       'RespRate', 'SaO2', 'SysABP', 'Temp', 'TroponinI', 'TroponinT', 'Urine',
       'WBC', 'pH'],
      dtype='object')
Out[28]:
RecordID Length_of_stay In-hospital_death Age Gender Height ICUType Weight ALP ALT ... Platelets RespRate SaO2 SysABP Temp TroponinI TroponinT Urine WBC pH
0 132539 5 0 54 0 -1 4 -1 NaN NaN ... 2.0 42.0 NaN NaN 14.0 NaN NaN 38.0 2.0 NaN
1 132540 8 0 76 1 175.3 2 76 NaN NaN ... 5.0 NaN 6.0 68.0 46.0 NaN NaN 41.0 3.0 8.0
2 132541 19 0 44 0 -1 3 56.7 2.0 2.0 ... 3.0 NaN 1.0 16.0 14.0 NaN NaN 41.0 3.0 4.0
3 132543 9 0 68 1 180.3 3 84.6 1.0 1.0 ... 3.0 59.0 NaN NaN 13.0 NaN NaN 6.0 3.0 NaN
4 132545 4 0 88 0 -1 3 -1 NaN NaN ... 2.0 48.0 NaN NaN 15.0 NaN NaN 38.0 2.0 NaN

5 rows × 44 columns

In [29]:
# exploration by patient's condition in which he/she came into ICU by (earliest temporal data)
all_features_df = pd.merge(all_outcome_dfs, all_static_dfs)
all_features_df__earliest = pd.merge(all_features_df, all_temporal_dfs__earliest, how='left', on=["RecordID", "Age", "Gender", "Height", "ICUType", "Weight"])
print("Number of Features is", len(all_features_df__earliest.columns))
print(all_features_df__earliest.columns)
all_features_df__earliest.head()
Number of Features is 44
Index(['RecordID', 'Length_of_stay', 'In-hospital_death', 'Age', 'Gender',
       'Height', 'ICUType', 'Weight', 'BUN', 'Creatinine', 'GCS', 'Glucose',
       'HCO3', 'HCT', 'HR', 'K', 'Mg', 'NIDiasABP', 'NIMAP', 'NISysABP', 'Na',
       'Platelets', 'RespRate', 'Temp', 'Urine', 'WBC', 'DiasABP', 'FiO2',
       'MAP', 'MechVent', 'PaCO2', 'PaO2', 'SaO2', 'SysABP', 'pH', 'ALP',
       'ALT', 'AST', 'Albumin', 'Bilirubin', 'Lactate', 'Cholesterol',
       'TroponinI', 'TroponinT'],
      dtype='object')
Out[29]:
RecordID Length_of_stay In-hospital_death Age Gender Height ICUType Weight BUN Creatinine ... pH ALP ALT AST Albumin Bilirubin Lactate Cholesterol TroponinI TroponinT
0 132539 5 0 54 0 -1 4 -1 13.0 0.8 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 132540 8 0 76 1 175.3 2 76 16.0 0.8 ... 7.45 NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 132541 19 0 44 0 -1 3 56.7 8.0 0.4 ... 7.51 127.0 91.0 235.0 2.7 3.0 1.3 NaN NaN NaN
3 132543 9 0 68 1 180.3 3 84.6 23.0 0.9 ... NaN 105.0 12.0 15.0 4.4 0.2 NaN NaN NaN NaN
4 132545 4 0 88 0 -1 3 -1 45.0 1.0 ... NaN NaN NaN NaN 3.3 NaN NaN NaN NaN NaN

5 rows × 44 columns

In [30]:
# exploration by patient's condition by end of 48 hours in ICU (latest temporal data)
all_features_df = pd.merge(all_outcome_dfs, all_static_dfs)
all_features_df__most_recent = pd.merge(all_features_df, all_temporal_dfs__most_recent, how='left', on=["RecordID", "Age", "Gender", "Height", "ICUType", "Weight"])
print("Number of Features is", len(all_features_df__most_recent.columns))
print(all_features_df__most_recent.columns)
all_features_df__most_recent.head()
Number of Features is 44
Index(['RecordID', 'Length_of_stay', 'In-hospital_death', 'Age', 'Gender',
       'Height', 'ICUType', 'Weight', 'BUN', 'Creatinine', 'GCS', 'Glucose',
       'HCO3', 'HCT', 'HR', 'K', 'Mg', 'NIDiasABP', 'NIMAP', 'NISysABP', 'Na',
       'Platelets', 'RespRate', 'Temp', 'Urine', 'WBC', 'DiasABP', 'FiO2',
       'MAP', 'MechVent', 'PaCO2', 'PaO2', 'SaO2', 'SysABP', 'pH', 'ALP',
       'ALT', 'AST', 'Albumin', 'Bilirubin', 'Lactate', 'Cholesterol',
       'TroponinI', 'TroponinT'],
      dtype='object')
Out[30]:
RecordID Length_of_stay In-hospital_death Age Gender Height ICUType Weight BUN Creatinine ... pH ALP ALT AST Albumin Bilirubin Lactate Cholesterol TroponinI TroponinT
0 132539 5 0 54 0 -1 4 -1 8.0 0.7 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 132540 8 0 76 1 175.3 2 76 NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 132541 19 0 44 0 -1 3 56.7 3.0 0.3 ... 7.47 105.0 75.0 164.0 2.3 2.8 0.9 NaN NaN NaN
3 132543 9 0 68 1 180.3 3 84.6 10.0 0.7 ... NaN 105.0 12.0 15.0 4.4 0.2 NaN NaN NaN NaN
4 132545 4 0 88 0 -1 3 -1 25.0 1.0 ... NaN NaN NaN NaN 3.3 NaN NaN NaN NaN NaN

5 rows × 44 columns

Below is an exploration of the data types of variables and number of observations. This is to check for missing values for each parameter and for data cleaning. This process is executed for each aggregation type (i.e. freq, most recent, earliest).

In [31]:
all_features_df__most_recent.info()
# Static Variables are in non-null objects
# Temporal data are in float
<class 'pandas.core.frame.DataFrame'>
Int64Index: 4000 entries, 0 to 3999
Data columns (total 44 columns):
RecordID             4000 non-null object
Length_of_stay       4000 non-null int64
In-hospital_death    4000 non-null int64
Age                  4000 non-null object
Gender               4000 non-null object
Height               4000 non-null object
ICUType              4000 non-null object
Weight               4000 non-null object
BUN                  2508 non-null float64
Creatinine           2508 non-null float64
GCS                  2496 non-null float64
Glucose              2501 non-null float64
HCO3                 2504 non-null float64
HCT                  2508 non-null float64
HR                   2497 non-null float64
K                    2503 non-null float64
Mg                   2488 non-null float64
NIDiasABP            2297 non-null float64
NIMAP                2296 non-null float64
NISysABP             2302 non-null float64
Na                   2504 non-null float64
Platelets            2506 non-null float64
RespRate             899 non-null float64
Temp                 2496 non-null float64
Urine                2461 non-null float64
WBC                  2503 non-null float64
DiasABP              1591 non-null float64
FiO2                 1551 non-null float64
MAP                  1584 non-null float64
MechVent             1364 non-null float64
PaCO2                1745 non-null float64
PaO2                 1745 non-null float64
SaO2                 847 non-null float64
SysABP               1591 non-null float64
pH                   1762 non-null float64
ALP                  1155 non-null float64
ALT                  1182 non-null float64
AST                  1183 non-null float64
Albumin              1106 non-null float64
Bilirubin            1175 non-null float64
Lactate              1343 non-null float64
Cholesterol          234 non-null float64
TroponinI            127 non-null float64
TroponinT            609 non-null float64
dtypes: float64(36), int64(2), object(6)
memory usage: 1.4+ MB

Data Cleaning

In [32]:
def cleanData(df): 
    # Convert Age, Gender, ICUType to convert to categorical or binary variables
    df[["Age", "Gender", "ICUType"]] = df[["Age", "Gender", "ICUType"]].astype(int)
    
    # Convert Height, Weight from objects into float
    df[["Height","Weight"]] = df[["Height","Weight"]].apply(pd.to_numeric)
    
    # Missing Data -1: To change to NaN. To ensure that there are no negative values for the visualisation.
    df = df.replace(to_replace=-1, value = np.nan)
    
    # Missing Data NaN: To change to 0. To ensure that there are no null values for the visualisation.
    df[['MechVent']] = df[['MechVent']].fillna(value = 0)
    return df
In [33]:
all_features_df__most_recent = cleanData(all_features_df__most_recent)
all_features_df__most_recent.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 4000 entries, 0 to 3999
Data columns (total 44 columns):
RecordID             4000 non-null int64
Length_of_stay       4000 non-null int64
In-hospital_death    4000 non-null int64
Age                  4000 non-null int32
Gender               3997 non-null float64
Height               2106 non-null float64
ICUType              4000 non-null int32
Weight               3674 non-null float64
BUN                  2508 non-null float64
Creatinine           2508 non-null float64
GCS                  2496 non-null float64
Glucose              2501 non-null float64
HCO3                 2504 non-null float64
HCT                  2508 non-null float64
HR                   2497 non-null float64
K                    2503 non-null float64
Mg                   2488 non-null float64
NIDiasABP            2297 non-null float64
NIMAP                2296 non-null float64
NISysABP             2302 non-null float64
Na                   2504 non-null float64
Platelets            2506 non-null float64
RespRate             899 non-null float64
Temp                 2496 non-null float64
Urine                2461 non-null float64
WBC                  2503 non-null float64
DiasABP              1591 non-null float64
FiO2                 1551 non-null float64
MAP                  1584 non-null float64
MechVent             4000 non-null float64
PaCO2                1745 non-null float64
PaO2                 1745 non-null float64
SaO2                 847 non-null float64
SysABP               1591 non-null float64
pH                   1762 non-null float64
ALP                  1155 non-null float64
ALT                  1182 non-null float64
AST                  1183 non-null float64
Albumin              1106 non-null float64
Bilirubin            1175 non-null float64
Lactate              1343 non-null float64
Cholesterol          234 non-null float64
TroponinI            127 non-null float64
TroponinT            609 non-null float64
dtypes: float64(39), int32(2), int64(3)
memory usage: 1.3 MB
In [34]:
all_features_df__freq = cleanData(all_features_df__freq)
In [35]:
all_features_df__earliest = cleanData(all_features_df__earliest)

Visualize kde distribution of variables

This is to explore the distribution of the features across all ICUs.

In [36]:
# visulize all distirbution of each parameter (Boxplot, histogram, kde)
cat_feature = ["Gender", "ICUType", "MechVent"]
num_feature = ["Age","Height","Weight","BUN","Creatinine","GCS","Glucose","HCO3","HCT","HR",
               "K","Mg","NIDiasABP","NIMAP","NISysABP","Na","Platelets","RespRate","Temp",
               "TroponinT","Urine","WBC","Lactate","DiasABP","FiO2","MAP","PaCO2","PaO2",
               "SaO2","SysABP","pH","ALP","ALT","AST","Albumin","Bilirubin","TroponinI",
               "Cholesterol"]
print(len(cat_feature))
print(len(num_feature))
3
38
In [37]:
# This is to explore the distribution of the features across all ICUs, gender, MechVent or In-hospital death.
from collections import OrderedDict

def visualizeKDEDistribution(df, catType="IcuType"):
    plt.figure(figsize = (30, 90))

    colors = OrderedDict({1: 'blue', 2: 'orange', 3: 'green', 4: 'red'})
    bin_colors = OrderedDict({0: 'blue', 1: 'red'})
    icuTypes = OrderedDict({1: 'Coronary Care Unit', 2: "Cardiac Surgery Recovery Unit", 3: 'Medical ICU', 4: 'Surgical ICU'})
    genderTypes = OrderedDict({0: 'Female', 1: 'Male'})
    mechVentTypes = OrderedDict({0: 'Not Mech Vent', 1: 'Mech Vent'})
    inHospitalDeathTypes = OrderedDict({0: 'Survive', 1: 'In-Hospital Death'})
    
    for i, feature in enumerate(num_feature):
        ax = plt.subplot(19, 2, i + 1)
        
        if catType == "IcuType":
            for icuTypes_level, color in colors.items():
                sns.kdeplot(df.loc[df['ICUType'] == icuTypes_level, feature].dropna(), 
                            ax = ax, color = color, label = icuTypes[icuTypes_level])

            plt.title(f'{feature.capitalize()} Distribution'); plt.xlabel(f'{feature}'); plt.ylabel('Density')
        
        if catType == "Gender": 
            for genderTypes_level, color in bin_colors.items():
                sns.kdeplot(df.loc[df['Gender'] == genderTypes_level, feature].dropna(), 
                            ax = ax, color = color, label = genderTypes[genderTypes_level])

            plt.title(f'{feature.capitalize()} Distribution'); plt.xlabel(f'{feature}'); plt.ylabel('Density')

        if catType == "MechVent": 
            for mechVentTypes_level, color in bin_colors.items():
                sns.kdeplot(df.loc[df['MechVent'] == mechVentTypes_level, feature].dropna(), 
                            ax = ax, color = color, label = mechVentTypes[mechVentTypes_level])

            plt.title(f'{feature.capitalize()} Distribution'); plt.xlabel(f'{feature}'); plt.ylabel('Density')
   
        if catType == "In-hospital_death": 
            for inHospitalDeathTypes_level, color in bin_colors.items():
                sns.kdeplot(df.loc[df['In-hospital_death'] == inHospitalDeathTypes_level, feature].dropna(), 
                            ax = ax, color = color, label = inHospitalDeathTypes[inHospitalDeathTypes_level])

            plt.title(f'{feature.capitalize()} Distribution'); plt.xlabel(f'{feature}'); plt.ylabel('Density')
            
    plt.subplots_adjust(top = 2)
In [38]:
# find by ICU Type, how many in-hospital deaths
# distribution of the length of stay
all_features_df = pd.merge(all_outcome_dfs, all_static_dfs)
all_features_df.head()

# length of stay by ICUType and in-hospital death distribution
sns.catplot(x="ICUType", y="Length_of_stay", col="In-hospital_death", kind="box", data=all_features_df, aspect=1);

Above are some observations of the length of stay by ICUType and in-hospital death distribution:

  1. There are some outliers who spend a long time in the hospital but still died.
  2. There is a larger spread of length of stays for non-deaths than deaths
  3. Cardiac ICUType has the largest IQR for length of stay as comparied to the rest

Objective: To examine the number of patients based on survival or dead outcomes over the 4 folds

In [39]:
# Get the list of record ids of all patients
all_record_ids = []

# Get the list of survival patients and dead patients
outcome_dead_list = []
outcome_survival_list =[]
outcome_df_folds = pd.DataFrame(columns = ['Dead' , 'Survived'], index=['Fold1', 'Fold2', 'Fold3' , 'Fold4'])

outcome_fold1 = all_outcome_dfs_folds['Fold1']

for n in range(len(outcome_fold1)):
    all_record_ids.append(outcome_fold1['RecordID'][n])
    if (outcome_fold1['In-hospital_death'][n] == 1):
        outcome_dead_list.append(outcome_fold1['RecordID'][n])
        all_record_ids.append
    else :
        outcome_survival_list.append(outcome_fold1['RecordID'][n])

fold1_outcome_dead = len(outcome_dead_list)
fold1_outcome_survive = len(outcome_survival_list)
outcome_df_folds.iloc[0] = [fold1_outcome_dead, fold1_outcome_survive]
outcome_fold2 = all_outcome_dfs_folds['Fold2']

for n in range(len(outcome_fold2)):
    all_record_ids.append(outcome_fold2['RecordID'][n])
    if (outcome_fold2['In-hospital_death'][n] == 1):
        outcome_dead_list.append(outcome_fold2['RecordID'][n])
        all_record_ids.append
    else :
        outcome_survival_list.append(outcome_fold2['RecordID'][n])

fold2_outcome_dead = len(outcome_dead_list)-fold1_outcome_dead
fold2_outcome_survive = len(outcome_survival_list)-fold1_outcome_survive
outcome_df_folds.iloc[1] = [fold2_outcome_dead, fold2_outcome_survive]
outcome_fold3 = all_outcome_dfs_folds['Fold3']

for n in range(len(outcome_fold3)):
    all_record_ids.append(outcome_fold2['RecordID'][n])
    if (outcome_fold3['In-hospital_death'][n] == 1):
        outcome_dead_list.append(outcome_fold3['RecordID'][n])
    else :
        outcome_survival_list.append(outcome_fold3['RecordID'][n])

fold3_outcome_dead = len(outcome_dead_list)-fold1_outcome_dead-fold2_outcome_dead
fold3_outcome_survive = len(outcome_survival_list)-fold1_outcome_survive-fold2_outcome_survive
outcome_df_folds.iloc[2] = [fold3_outcome_dead, fold3_outcome_survive]
outcome_fold4 = all_outcome_dfs_folds['Fold4']

for n in range(len(outcome_fold4)):
    all_record_ids.append(outcome_fold2['RecordID'][n])
    if (outcome_fold4['In-hospital_death'][n] == 1):
        outcome_dead_list.append(outcome_fold4['RecordID'][n])
    else :
        outcome_survival_list.append(outcome_fold4['RecordID'][n])

fold4_outcome_dead = len(outcome_dead_list)-fold1_outcome_dead-fold2_outcome_dead-fold3_outcome_dead
fold4_outcome_survive = len(outcome_survival_list)-fold1_outcome_survive-fold2_outcome_survive-fold3_outcome_survive
outcome_df_folds.iloc[3] = [fold4_outcome_dead, fold4_outcome_survive]

print("The total number of patients are", len(all_record_ids))
print("The number of dead patients are", len(outcome_dead_list))
print("The number of patients who survived are", len(outcome_survival_list), "\n")

print("Number of Patients based on Dead or Survival Outcome:")
print(outcome_df_folds)
outcome_df_folds.plot(kind='bar', stacked=False, figsize=[12,6])  
plt.title('Number of Patients based on Dead or Survival Outcome') 
plt.xticks(rotation=0)
plt.show()       
The total number of patients are 4000
The number of dead patients are 554
The number of patients who survived are 3446 

Number of Patients based on Dead or Survival Outcome:
       Dead  Survived
Fold1   136       864
Fold2   148       852
Fold3   142       858
Fold4   128       872
In [40]:
from fractions import Fraction

print("The ratio of death over survival in each fold:")
for index, row in outcome_df_folds.iterrows():
    print("Ratio of", index, ":", Fraction(row['Dead']/row['Survived']).limit_denominator(), "=", row['Dead']/row['Survived'])
The ratio of death over survival in each fold:
Ratio of Fold1 : 17/108 = 0.1574074074074074
Ratio of Fold2 : 37/213 = 0.17370892018779344
Ratio of Fold3 : 71/429 = 0.1655011655011655
Ratio of Fold4 : 16/109 = 0.14678899082568808

Observation: The ratio between death and survival outcomes in each fold is almost similar.

Objective: To examine how ICU Type is associated to the length of stays and in-hospital death

In [41]:
sns.catplot(x="ICUType", kind="count", col="In-hospital_death", data=all_features_df, aspect=1)
Out[41]:
<seaborn.axisgrid.FacetGrid at 0x1b525f1dba8>

Above are the observations of in-hospital death against ICU types:

  • Cardiac has the disproportionate number of deaths - lesser deaths when compared to the rest of the other ICU.

Objective: To examine the most frequent monitored variables for each ICUType

In [42]:
# frequency of each temporal variable for death and ICU types
all_features_df__freq[all_features_df__freq['ICUType'] == 1].loc[:, 'ALP':'pH'].sum().sort_values(ascending=False).plot.bar(figsize = (14, 5))
plt.xlabel('Variables'); plt.ylabel('Sum of Frequency');
plt.title('Sum of Frequency vs Variables in Coronary Care Unit');
# Top 12 frequently monitored variables are HR, MAP, SysABP, DiasABP, Urine, NISysABP, NIDiasABP, NIMAP, RespRate, Temp, GCS, FiO2
In [43]:
all_features_df__freq[all_features_df__freq['ICUType'] == 2].loc[:, 'ALP':'pH'].sum().sort_values(ascending=False).plot.bar(figsize = (14, 5), color='orange')
plt.xlabel('Variables'); plt.ylabel('Sum of Frequency');
plt.title('Sum of Frequency vs Variables in Cardiac Surgery Recovery Unit');
# Top 12 frequently monitored variables are HR, MAP, SysABP, DiasABP, Urine, Temp, GCS, NISysABP, NIDiasABP, NIMAP, pH, PaCO2
In [44]:
all_features_df__freq[all_features_df__freq['ICUType'] == 3].loc[:, 'ALP':'pH'].sum().sort_values(ascending=False).plot.bar(figsize = (14, 5), color='green')
plt.xlabel('Variables'); plt.ylabel('Sum of Frequency');
plt.title('Sum of Frequency vs Variables in Medical ICU');
# Top 12 frequently monitored variables are HR, NISysABP, NIDiasABP, NIMAP, Urine, SysABP, DiasABP, MAP, RespRate, Temp, GCS, FiO2
In [45]:
all_features_df__freq[all_features_df__freq['ICUType'] == 4].loc[:, 'ALP':'pH'].sum().sort_values(ascending=False).plot.bar(figsize = (14, 5), color='red')
plt.xlabel('Variables'); plt.ylabel('Sum of Frequency');
plt.title('Sum of Frequency vs Variables in Surgical ICU');
# Top 12 frequently monitored variables are HR, SysABP, DiasABP, MAP, Urine, GCS, NISysABP, NIDiasABP, NIMAP, Temp, RespRate, FiO2
In [46]:
print('ICUType 1 and 2 has difference in variable', set(['HR', 'MAP', 'SysABP', 'DiasABP', 'Urine', 'NISysABP', 'NIDiasABP', 'NIMAP', 'RespRate', 'Temp', 'GCS', 'FiO2']).symmetric_difference(set(['HR', 'MAP', 'SysABP', 'DiasABP', 'Urine', 'Temp', 'GCS', 'NISysABP', 'NIDiasABP', 'NIMAP', 'pH', 'PaCO2'])))
print('ICUType 1 and 3 has no difference in variable', set(['HR', 'MAP', 'SysABP', 'DiasABP', 'Urine', 'NISysABP', 'NIDiasABP', 'NIMAP', 'RespRate', 'Temp', 'GCS', 'FiO2']).symmetric_difference(set(['HR', 'NISysABP', 'NIDiasABP', 'NIMAP', 'Urine', 'SysABP', 'DiasABP', 'MAP', 'RespRate', 'Temp', 'GCS', 'FiO2'])))
print('ICUType 1 and 4 has no difference in variable', set(['HR', 'MAP', 'SysABP', 'DiasABP', 'Urine', 'NISysABP', 'NIDiasABP', 'NIMAP', 'RespRate', 'Temp', 'GCS', 'FiO2']).symmetric_difference((['HR', 'SysABP', 'DiasABP', 'MAP', 'Urine', 'GCS', 'NISysABP', 'NIDiasABP', 'NIMAP', 'Temp', 'RespRate', 'FiO2'])))

print('ICUType 2 and 3 has difference in variable', set(['HR', 'MAP', 'SysABP', 'DiasABP', 'Urine', 'Temp', 'GCS', 'NISysABP', 'NIDiasABP', 'NIMAP', 'pH', 'PaCO2']).symmetric_difference((['HR', 'NISysABP', 'NIDiasABP', 'NIMAP', 'Urine', 'SysABP', 'DiasABP', 'MAP', 'RespRate', 'Temp', 'GCS', 'FiO2'])))
print('ICUType 2 and 4 has difference in variable', set(['HR', 'MAP', 'SysABP', 'DiasABP', 'Urine', 'Temp', 'GCS', 'NISysABP', 'NIDiasABP', 'NIMAP', 'pH', 'PaCO2']).symmetric_difference((['HR', 'SysABP', 'DiasABP', 'MAP', 'Urine', 'GCS', 'NISysABP', 'NIDiasABP', 'NIMAP', 'Temp', 'RespRate', 'FiO2'])))

print('ICUType 3 and 4 has no difference in variable', set(['HR', 'NISysABP', 'NIDiasABP', 'NIMAP', 'Urine', 'SysABP', 'DiasABP', 'MAP', 'RespRate', 'Temp', 'GCS', 'FiO2']).symmetric_difference((['HR', 'SysABP', 'DiasABP', 'MAP', 'Urine', 'GCS', 'NISysABP', 'NIDiasABP', 'NIMAP', 'Temp', 'RespRate', 'FiO2'])))
# ICUType 1 and 2, 2 and 3, 2 and 4 have some different variables
# ICUType 1 and 3, 1 and 4, 3 and 4 has same variables

# the different variables {'pH', 'PaCO2'} is in ICUType 2 and {'RespRate','FiO2'} in ICUType 1,3,4
ICUType 1 and 2 has difference in variable {'PaCO2', 'pH', 'RespRate', 'FiO2'}
ICUType 1 and 3 has no difference in variable set()
ICUType 1 and 4 has no difference in variable set()
ICUType 2 and 3 has difference in variable {'RespRate', 'FiO2', 'PaCO2', 'pH'}
ICUType 2 and 4 has difference in variable {'RespRate', 'FiO2', 'PaCO2', 'pH'}
ICUType 3 and 4 has no difference in variable set()

Objective: Distribution of each features association to In-hospital death based on on the most recent value of the period 48 hours

In [47]:
feat = ['Age', 'HR', 'MAP', 'SysABP','DiasABP', 'NISysABP', 'NIDiasABP', 'NIMAP', 'Temp', 'GCS', 'pH', 'RespRate', 'PaCO2', 'FiO2']
colors = OrderedDict({1: 'blue', 2: 'orange', 3: 'green', 4: 'red'})
icuTypes = OrderedDict({1: 'Coronary Care Unit', 2: "Cardiac Surgery Recovery Unit", 3: 'Medical ICU', 4: 'Surgical ICU'})

for num in range(1, 5):
    
    all_features_df__most_recent[(all_features_df__most_recent['ICUType'] == num) & (all_features_df__most_recent['In-hospital_death'] == 0) ].loc[:, feat].plot.box(figsize = (20, 5), color=colors[num])
    plt.xlabel('Variables'); plt.ylabel('Unit');
    plt.xticks(rotation='vertical')
    plt.title('BoxPlot of variables for ' + icuTypes[num] + ' for those who survive');

    all_features_df__most_recent[(all_features_df__most_recent['ICUType'] == num) & (all_features_df__most_recent['In-hospital_death'] == 1) ].loc[:, feat].plot.box(figsize = (20, 5), color=colors[num])
    plt.xlabel('Variables'); plt.ylabel('Unit');
    plt.xticks(rotation='vertical')
    plt.title('BoxPlot of variables for ' + icuTypes[num] + ' for those who died');

Observation: Compare the distributions of length of stay and in-hospital death for possible associations

In [48]:
plt.figure(figsize = (10, 2))

bin_colors = OrderedDict({0: 'red', 1: 'blue'})

inHospitalDeathTypes = OrderedDict({0: 'Survive', 1: 'In-Hospital Death'})

feature = "Length of Stay"

all_features_df__earliest_stay = all_features_df__earliest.loc[all_features_df__earliest['Length_of_stay'] <= 60]

for inHospitalDeathTypes_level, color in bin_colors.items():
    sns.distplot(all_features_df__earliest_stay.loc[all_features_df__earliest_stay['In-hospital_death'] == inHospitalDeathTypes_level, "Length_of_stay"].dropna(), 
                 color = color, label = inHospitalDeathTypes[inHospitalDeathTypes_level], kde=False)
plt.legend()
plt.title(f'{feature.capitalize()} Distribution'); 
plt.xlabel(f'{feature}'); 
plt.ylabel('Frequency')

plt.subplots_adjust(top = 2)

# The in-hospital death seem to be right skewed. Majority of those who died falls in the range of 0-10 and 10-20 days in hospital

Objective: To examine the distribution of the conditions in which patient come in whether is there association to mortality

In [49]:
visualizeKDEDistribution(all_features_df__earliest, "In-hospital_death")

Key Observations on the condition of patients upon arrival in ICU associating to in-hospital death:

  1. Age: Those below 70 are likely to survive as compared to those who are older.
  2. BUN has a higher kurtosis for those who survive. Those who has BUN more than 25 are more likely to die
  3. Creatinine has a higher kurtosis for those who survive. Those who has more than 2 are more likely to die
  4. GCS has no clear distribution to indicate association to in-hospital
  5. Glucose level greater than 200 is associated to more deaths
  6. HCO3 below 18 and above 27 associated to more deaths
  7. HR of more than 100 are associated to more deaths
  8. Resprate more than 30 has higher association to deaths
  9. Presence of ALT and AST has higher association to deaths
  10. Albumin level below 3.2 has higher association to death.
In [50]:
# it may make more sense to find our in what condition the patient comes in
visualizeKDEDistribution(all_features_df__earliest, "IcuType")

Key observations comparing with the previous graph

  1. Those below 70 are likely to survive as compared to those who are older. (Deaths from mostly blue and yellow)
  2. BUN has a higher kurtosis for those who survive. Those who has BUN more than 25 are more likely to die (blue and green)
  3. Creatinine has a higher kurtosis for those who survive. Those who has more than 2 are more likely to die (blue and green)
  4. GCS has no clear distribution to indicate association to in-hospital
  5. Glucose level greater than 200 is associated to more deaths (Blue and Green)
  6. HCO3 below 18 and above 27 associated to more deaths (Red, Blue, Green)
  7. HR of more than 100 are associated to more deaths (Red, Blue, Green)
  8. Resprate more than 30 (Blue, Green)
  9. presence of ALT and AST has higher association to deaths
  10. Albumin level below 3.2 has higher association to death. (all)
In [51]:
# it may make more sense to find our in what condition the patient comes in
visualizeKDEDistribution(all_features_df__earliest, "Gender")
# Both Genders have quite comparable distribution and density measures for each of the features,

Objective: Explore correlation between features and targets

In [52]:
def plotFeaturesCorrelationMatrix(df, num_feature=None):
    
    if num_feature == None:
        corr = df.corr()
        mask = np.zeros_like(corr)
        mask[np.triu_indices_from(mask)] = True
        with sns.axes_style("white"):
            sns.set(rc={'figure.figsize':(30,20)})
            ax = sns.heatmap(corr, mask=mask, vmin=-1, vmax=1, square=True, cmap='seismic',annot=True)

    else:    
        corr = df[num_feature].corr()
        mask = np.zeros_like(corr)
        mask[np.triu_indices_from(mask)] = True
        with sns.axes_style("white"):
            sns.set(rc={'figure.figsize':(30,20)})
            ax = sns.heatmap(corr, mask=mask, vmin=-1, vmax=1, square=True, cmap='seismic',annot=True)

    return corr
In [53]:
correlation_matrix_earliest = plotFeaturesCorrelationMatrix(all_features_df__earliest)
In [54]:
# to extract and find the top correlated features pair in list format
levels = {'high': 0.8, 'mid': 0.4}
def extractCorrelatedFeatures(correlation_matrix, levels):
    correlation_dict = {}
    
    highly_correlation = correlation_matrix[(correlation_matrix.ix[:,:] >= levels['high']) & (correlation_matrix.ix[:,:] < 1)]
    correlation_dict['high_positive'] = findFeaturePairs(highly_correlation.dropna(axis=1, how='all').dropna(axis=0, how='all'))
 
    highly_correlation = correlation_matrix[(correlation_matrix.ix[:,:] <= (-1 * levels['high']))]
    correlation_dict['high_negative'] = findFeaturePairs(highly_correlation.dropna(axis=1, how='all').dropna(axis=0, how='all'))

    mid_correlation = correlation_matrix[(correlation_matrix.ix[:,:] >= levels['mid']) & (correlation_matrix.ix[:,:] < levels['high'])]
    correlation_dict['mid_positive'] = findFeaturePairs(mid_correlation.dropna(axis=1, how='all').dropna(axis=0, how='all'))
 

    mid_correlation = correlation_matrix[((correlation_matrix.ix[:,:] <= (-1 * levels['mid'])) & 
                                          (correlation_matrix.ix[:,:] > (-1 * levels['high'])))]
    correlation_dict['mid_negative'] = findFeaturePairs(mid_correlation.dropna(axis=1, how='all').dropna(axis=0, how='all'))
    
    low_correlation = correlation_matrix[(correlation_matrix.ix[:,:] < levels['mid'])]
    correlation_dict['low_positive'] = findFeaturePairs(low_correlation.dropna(axis=1, how='all').dropna(axis=0, how='all'))
 
    low_correlation = correlation_matrix[((correlation_matrix.ix[:,:] >= (-1 * levels['mid'])) & (correlation_matrix.ix[:,:] < 0))]
    correlation_dict['low_negative'] = findFeaturePairs(low_correlation.dropna(axis=1, how='all').dropna(axis=0, how='all'))
 
    no_correlation = correlation_matrix[(correlation_matrix.ix[:,:] == 0)]
    correlation_dict['no'] = findFeaturePairs(no_correlation.dropna(axis=1, how='all').dropna(axis=0, how='all'))
    
    return correlation_dict
In [55]:
# given dataframe iterate to find the correlation pair:

def findFeaturePairs(correlation_matrix):
    arr_pairs = {}
    
    for row in correlation_matrix:
        if row != "RecordID":
            new_df = correlation_matrix[row][correlation_matrix[row].notna()].drop_duplicates(keep='first')
            if "RecordID" not in new_df:
                arr_pairs[frozenset({row, new_df.index[0]})] = new_df[0]

    return arr_pairs
In [56]:
import operator
def printFeaturePairs(correlation_dict):
    # loop each dictionary
    for idx, dictionary in correlation_dict.items():
        print("====================",idx, "====================")
        if '_negative' in idx:
            sorted_d = sorted(dictionary.items(), key=operator.itemgetter(1))
        else:
            sorted_d = sorted(dictionary.items(), key=operator.itemgetter(1), reverse=True)
            
        for _set in sorted_d:
            print("Feature pair {} with corr. coeff. {}".format([feature for feature in _set[0]] , round(_set[1], 3)))

        print("====================",idx, "====================\n")
In [57]:
correlation_dict = extractCorrelatedFeatures(correlation_matrix_earliest, levels) 
printFeaturePairs(correlation_dict)
==================== high_positive ====================
Feature pair ['AST', 'ALT'] with corr. coeff. 0.902
Feature pair ['NIDiasABP', 'NIMAP'] with corr. coeff. 0.901
Feature pair ['DiasABP', 'SysABP'] with corr. coeff. 0.836
Feature pair ['NIMAP', 'NISysABP'] with corr. coeff. 0.819
==================== high_positive ====================

==================== high_negative ====================
==================== high_negative ====================

==================== mid_positive ====================
Feature pair ['BUN', 'Creatinine'] with corr. coeff. 0.683
Feature pair ['NIDiasABP', 'NISysABP'] with corr. coeff. 0.595
Feature pair ['PaCO2', 'HCO3'] with corr. coeff. 0.486
Feature pair ['Cholesterol', 'Albumin'] with corr. coeff. 0.412
==================== mid_positive ====================

==================== mid_negative ====================
Feature pair ['GCS', 'MechVent'] with corr. coeff. -0.659
Feature pair ['GCS', 'PaO2'] with corr. coeff. -0.417
==================== mid_negative ====================

==================== low_positive ====================
==================== low_positive ====================

==================== low_negative ====================
Feature pair ['Length_of_stay', 'Albumin'] with corr. coeff. -0.248
Feature pair ['Age', 'ICUType'] with corr. coeff. -0.187
Feature pair ['RespRate', 'Gender'] with corr. coeff. -0.08
Feature pair ['Length_of_stay', 'HCT'] with corr. coeff. -0.075
Feature pair ['In-hospital_death', 'Weight'] with corr. coeff. -0.051
Feature pair ['Length_of_stay', 'TroponinT'] with corr. coeff. -0.047
Feature pair ['Length_of_stay', 'Urine'] with corr. coeff. -0.047
Feature pair ['Length_of_stay', 'TroponinI'] with corr. coeff. -0.029
Feature pair ['ALP', 'Gender'] with corr. coeff. -0.028
Feature pair ['Length_of_stay', 'ALT'] with corr. coeff. -0.027
Feature pair ['Length_of_stay', 'AST'] with corr. coeff. -0.027
Feature pair ['Age', 'MechVent'] with corr. coeff. -0.026
Feature pair ['Length_of_stay', 'FiO2'] with corr. coeff. -0.024
Feature pair ['NIDiasABP', 'Length_of_stay'] with corr. coeff. -0.021
Feature pair ['In-hospital_death', 'Gender'] with corr. coeff. -0.016
Feature pair ['WBC', 'Gender'] with corr. coeff. -0.015
Feature pair ['Length_of_stay', 'DiasABP'] with corr. coeff. -0.011
Feature pair ['Length_of_stay', 'SysABP'] with corr. coeff. -0.01
Feature pair ['Platelets', 'In-hospital_death'] with corr. coeff. -0.01
Feature pair ['Temp', 'Length_of_stay'] with corr. coeff. -0.005
==================== low_negative ====================

==================== no ====================
==================== no ====================

Observation:

  • There is no clear strong correlation between the features with target variables.
  • Therefore, features need to be further processed to increase.

However, the correlated features can be identified to process them together to prevent multicollinearity

Objective: Use Linear Regression model to find correlated features with target length of stay applied to each aggregation type

In [58]:
from statsmodels.formula.api import ols
def printBestFeaturesForRegression(data, features):
    betas = []
    beta_stds = []
    pvalues = []
    R2s = []
    best_R2_model = None
    for f in features:
        if f == 'Length_of_stay': continue
        res = ols(formula='Length_of_stay ~ {}'.format(f), data=data).fit()
        betas.append(res.params[f])
        beta_stds.append(res.bse[f])
        pvalues.append(res.pvalues[f])
        R2s.append(res.rsquared)
    #     print(res.rsquared)
        if res.rsquared >= np.max(R2s):
            best_R2_model = res

    slr_res = pd.DataFrame(data={'β':np.round(betas,4), 'β_std': np.round(beta_stds,4), 
                                 'p-value':np.round(pvalues,4), 'R-squared':np.round(R2s,4)}, 
                           index=features)
    print('Result of Simple linear Regression')
    print(slr_res[['β', 'β_std', 'p-value','R-squared']])
    print('\nFeatures with significant correlation')
    print(slr_res.index[slr_res['p-value']<0.05])
    print("\npredictor with maximum R-squared is", slr_res['R-squared'].idxmax())
In [59]:
features = ['Age', 'Gender','Height', 'ICUType', 'Weight', 'BUN', 'Creatinine', 'GCS', 'Glucose',
       'HCO3', 'HCT', 'HR', 'K', 'Mg', 'NIDiasABP', 'NIMAP', 'NISysABP', 'Na',
       'Platelets', 'RespRate', 'Temp', 'TroponinT', 'Urine', 'WBC', 'Lactate',
       'DiasABP', 'FiO2', 'MAP', 'MechVent', 'PaCO2', 'PaO2', 'SaO2', 'SysABP',
       'pH', 'ALP', 'ALT', 'AST', 'Albumin', 'Bilirubin', 'TroponinI',
       'Cholesterol']

printBestFeaturesForRegression(all_features_df__earliest, features)
Result of Simple linear Regression
                  β   β_std  p-value  R-squared
Age         -0.0241  0.0110   0.0281     0.0012
Gender       0.2192  0.3890   0.5731     0.0001
Height       0.0046  0.0137   0.7379     0.0001
ICUType      1.5346  0.1909   0.0000     0.0159
Weight       0.0199  0.0086   0.0209     0.0015
BUN          0.0374  0.0089   0.0000     0.0045
Creatinine   0.4439  0.1305   0.0007     0.0029
GCS         -0.2245  0.0395   0.0000     0.0081
Glucose      0.0017  0.0026   0.4962     0.0001
HCO3        -0.1472  0.0421   0.0005     0.0031
HCT         -0.1515  0.0322   0.0000     0.0056
HR           0.0764  0.0096   0.0000     0.0157
K           -0.3154  0.2721   0.2465     0.0003
Mg          -1.0695  0.4087   0.0089     0.0018
NIDiasABP   -0.0147  0.0119   0.2165     0.0004
NIMAP       -0.0160  0.0115   0.1617     0.0006
NISysABP    -0.0154  0.0074   0.0374     0.0012
Na          -0.0384  0.0414   0.3534     0.0002
Platelets    0.0007  0.0017   0.6685     0.0000
RespRate     0.0666  0.0348   0.0557     0.0033
Temp        -0.0233  0.0684   0.7331     0.0000
TroponinT   -0.1896  0.1372   0.1673     0.0022
Urine       -0.0013  0.0004   0.0037     0.0022
WBC          0.0277  0.0248   0.2643     0.0003
Lactate      0.2081  0.1340   0.1204     0.0011
DiasABP     -0.0073  0.0125   0.5603     0.0001
FiO2        -1.2499  0.9978   0.2104     0.0006
MAP         -0.0071  0.0089   0.4268     0.0002
MechVent     3.9246  0.3952   0.0000     0.0241
PaCO2       -0.0076  0.0220   0.7317     0.0000
PaO2        -0.0074  0.0019   0.0001     0.0051
SaO2        -0.0624  0.0694   0.3683     0.0005
SysABP      -0.0033  0.0064   0.6012     0.0001
pH           0.0576  0.0729   0.4293     0.0002
ALP          0.0041  0.0030   0.1800     0.0011
ALT         -0.0006  0.0005   0.2560     0.0008
AST         -0.0004  0.0004   0.2672     0.0007
Albumin     -5.1325  0.4990   0.0000     0.0616
Bilirubin    0.2146  0.0759   0.0048     0.0046
TroponinI   -0.0299  0.0732   0.6835     0.0008
Cholesterol -0.0183  0.0134   0.1713     0.0062

Features with significant correlation
Index(['Age', 'ICUType', 'Weight', 'BUN', 'Creatinine', 'GCS', 'HCO3', 'HCT',
       'HR', 'Mg', 'NISysABP', 'Urine', 'MechVent', 'PaO2', 'Albumin',
       'Bilirubin'],
      dtype='object')

predictor with maximum R-squared is Albumin

In the presence and level of the variables when patient is admitted into the ICU, Albumin level is a predictor with the maximum R-Squared with Length of Stay.

The variables that are measured in the earlier part of the ICU admission:

  • 'Age', 'ICUType', 'Weight', 'BUN', 'Creatinine', 'GCS', 'HCO3', 'HCT', 'HR', 'Mg', 'NISysABP', 'Urine', 'MechVent', 'PaO2', 'Albumin','Bilirubin'

  • has significant correlation with Length of Stay in terms of patients' condition when admitted into the ICUType

In [60]:
features = ['Age', 'Gender','Height', 'ICUType', 'Weight', 'BUN', 'Creatinine', 'GCS', 'Glucose',
       'HCO3', 'HCT', 'HR', 'K', 'Mg', 'NIDiasABP', 'NIMAP', 'NISysABP', 'Na',
       'Platelets', 'RespRate', 'Temp', 'TroponinT', 'Urine', 'WBC', 'Lactate',
       'DiasABP', 'FiO2', 'MAP', 'MechVent', 'PaCO2', 'PaO2', 'SaO2', 'SysABP',
       'pH', 'ALP', 'ALT', 'AST', 'Albumin', 'Bilirubin', 'TroponinI',
       'Cholesterol']

printBestFeaturesForRegression(all_features_df__most_recent, features)
Result of Simple linear Regression
                  β   β_std  p-value  R-squared
Age         -0.0241  0.0110   0.0281     0.0012
Gender       0.2192  0.3890   0.5731     0.0001
Height       0.0046  0.0137   0.7379     0.0001
ICUType      1.5346  0.1909   0.0000     0.0159
Weight       0.0199  0.0086   0.0209     0.0015
BUN          0.0438  0.0114   0.0001     0.0058
Creatinine   0.4140  0.1697   0.0148     0.0024
GCS         -0.7832  0.0674   0.0000     0.0513
Glucose      0.0179  0.0060   0.0029     0.0036
HCO3        -0.1871  0.0550   0.0007     0.0046
HCT         -0.3932  0.0520   0.0000     0.0223
HR           0.0360  0.0139   0.0094     0.0027
K            0.3215  0.4961   0.5170     0.0002
Mg          -0.6593  0.8166   0.4195     0.0003
NIDiasABP   -0.0488  0.0170   0.0040     0.0036
NIMAP       -0.0557  0.0168   0.0009     0.0048
NISysABP    -0.0308  0.0102   0.0027     0.0039
Na          -0.0902  0.0576   0.1175     0.0010
Platelets   -0.0059  0.0023   0.0117     0.0025
RespRate    -0.0573  0.0411   0.1640     0.0022
Temp        -0.0305  0.1503   0.8391     0.0000
TroponinT   -0.3745  0.2227   0.0932     0.0046
Urine       -0.0062  0.0014   0.0000     0.0074
WBC          0.0211  0.0413   0.6090     0.0001
Lactate     -0.2189  0.2411   0.3641     0.0006
DiasABP      0.0221  0.0206   0.2844     0.0007
FiO2        -2.2920  2.1368   0.2836     0.0007
MAP         -0.0045  0.0169   0.7910     0.0000
MechVent     2.7335  0.4046   0.0000     0.0113
PaCO2       -0.0391  0.0353   0.2677     0.0007
PaO2        -0.0032  0.0061   0.6054     0.0002
SaO2        -0.1123  0.1040   0.2809     0.0014
SysABP      -0.0031  0.0117   0.7929     0.0000
pH           0.0080  0.1094   0.9418     0.0000
ALP          0.0019  0.0036   0.5969     0.0002
ALT         -0.0000  0.0005   0.9474     0.0000
AST         -0.0001  0.0004   0.8702     0.0000
Albumin     -5.4225  0.6062   0.0000     0.0676
Bilirubin    0.1819  0.0914   0.0467     0.0034
TroponinI   -0.1311  0.1091   0.2318     0.0114
Cholesterol -0.0255  0.0132   0.0538     0.0159

Features with significant correlation
Index(['Age', 'ICUType', 'Weight', 'BUN', 'Creatinine', 'GCS', 'Glucose',
       'HCO3', 'HCT', 'HR', 'NIDiasABP', 'NIMAP', 'NISysABP', 'Platelets',
       'Urine', 'MechVent', 'Albumin', 'Bilirubin'],
      dtype='object')

predictor with maximum R-squared is Albumin

In monitoring patients' health by the end of 48 hours, Albumin level is a predictor with the maximum R-Squared with Length of Stay.

The Variables of patients' health by the end of 48 hours:

  • 'Age', 'ICUType', 'Weight', 'BUN', 'Creatinine', 'GCS', 'Glucose','HCO3', 'HCT', 'HR', 'NIDiasABP', 'NIMAP', 'NISysABP', 'Platelets','Urine', 'MechVent', 'Albumin', 'Bilirubin'

  • has significant correlation with Length of Stay in terms of patients' health by the end of 48 hours

In [61]:
features = ['Age', 'Gender','Height', 'ICUType', 'Weight', 'BUN', 'Creatinine', 'GCS', 'Glucose',
       'HCO3', 'HCT', 'HR', 'K', 'Mg', 'NIDiasABP', 'NIMAP', 'NISysABP', 'Na',
       'Platelets', 'RespRate', 'Temp', 'TroponinT', 'Urine', 'WBC', 'Lactate',
       'DiasABP', 'FiO2', 'MAP', 'MechVent', 'PaCO2', 'PaO2', 'SaO2', 'SysABP',
       'pH', 'ALP', 'ALT', 'AST', 'Albumin', 'Bilirubin', 'TroponinI',
       'Cholesterol']

printBestFeaturesForRegression(all_features_df__freq, features)
Result of Simple linear Regression
                  β   β_std  p-value  R-squared
Age         -0.0241  0.0110   0.0281     0.0012
Gender       0.2192  0.3890   0.5731     0.0001
Height       0.0046  0.0137   0.7379     0.0001
ICUType      1.5346  0.1909   0.0000     0.0159
Weight       0.0199  0.0086   0.0209     0.0015
BUN          1.0436  0.1202   0.0000     0.0192
Creatinine   1.0091  0.1190   0.0000     0.0183
GCS          0.0606  0.0256   0.0179     0.0015
Glucose      0.9342  0.1141   0.0000     0.0174
HCO3         0.9920  0.1192   0.0000     0.0177
HCT          0.4849  0.0781   0.0000     0.0099
HR           0.1019  0.0136   0.0000     0.0144
K            0.8125  0.1066   0.0000     0.0150
Mg           1.1113  0.1162   0.0000     0.0235
NIDiasABP   -0.0697  0.0106   0.0000     0.0125
NIMAP       -0.0697  0.0107   0.0000     0.0122
NISysABP    -0.0696  0.0106   0.0000     0.0125
Na           0.9502  0.1083   0.0000     0.0197
Platelets    0.8950  0.1063   0.0000     0.0181
RespRate     0.0477  0.0154   0.0020     0.0089
Temp         0.0609  0.0113   0.0000     0.0075
TroponinT   -0.0566  0.3054   0.8530     0.0000
Urine        0.1146  0.0178   0.0000     0.0109
WBC          1.2120  0.1301   0.0000     0.0221
Lactate      0.6369  0.0814   0.0000     0.0279
DiasABP      0.0608  0.0120   0.0000     0.0093
FiO2         0.3279  0.0435   0.0000     0.0210
MAP          0.0511  0.0115   0.0000     0.0072
MechVent     0.3277  0.0250   0.0000     0.0413
PaCO2        0.3245  0.0440   0.0000     0.0181
PaO2         0.3302  0.0441   0.0000     0.0187
SaO2         0.1209  0.0790   0.1261     0.0013
SysABP       0.0606  0.0120   0.0000     0.0092
pH           0.2805  0.0426   0.0000     0.0144
ALP          0.7130  0.2569   0.0056     0.0046
ALT          0.7311  0.2492   0.0034     0.0051
AST          0.7433  0.2488   0.0029     0.0052
Albumin      0.8738  0.3945   0.0269     0.0031
Bilirubin    0.7776  0.2529   0.0021     0.0056
TroponinI    1.1327  0.5997   0.0604     0.0179
Cholesterol  0.2456  3.5162   0.9444     0.0000

Features with significant correlation
Index(['Age', 'ICUType', 'Weight', 'BUN', 'Creatinine', 'GCS', 'Glucose',
       'HCO3', 'HCT', 'HR', 'K', 'Mg', 'NIDiasABP', 'NIMAP', 'NISysABP', 'Na',
       'Platelets', 'RespRate', 'Temp', 'Urine', 'WBC', 'Lactate', 'DiasABP',
       'FiO2', 'MAP', 'MechVent', 'PaCO2', 'PaO2', 'SysABP', 'pH', 'ALP',
       'ALT', 'AST', 'Albumin', 'Bilirubin'],
      dtype='object')

predictor with maximum R-squared is MechVent

With the frequency distribution of monitoring patients' health signs, MechVent is a predictor with maximum R-squared for length of stay.

The Frequency of monitoring the variables:

  • 'Age', 'ICUType', 'Weight', 'BUN', 'Creatinine', 'GCS', 'Glucose', 'HCO3', 'HCT', 'HR', 'K', 'Mg', 'NIDiasABP', 'NIMAP', 'NISysABP', 'Na', 'Platelets', 'RespRate', 'Temp', 'Urine', 'WBC', 'Lactate', 'DiasABP', 'FiO2', 'MAP', 'MechVent', 'PaCO2', 'PaO2', 'SysABP', 'pH', 'ALP', 'ALT', 'AST', 'Albumin', 'Bilirubin'
  • has significant correlation with Length of Stay in terms of frequency of monitoring each of the variables

2. Feature Engineering

Objective: To convert the temporal data to a single feature vector per patient to obtain the Design Matrix for modelling

Design Matrix 1

Physiological Features

Physiology is the scientific study of the functions and mechanisms which work within a living system. There are 12 physiological features that are tracked annd recorded within the first 24 hours of ICU admission as stated in a journal30024-X.pdf).

These features are:

  • Heart Rate
  • Temperature
  • Systolic Blood Pressure
  • Bicarbonate
  • Glasgow Coma Scale
  • White Blood Cell Count
  • Partial Pressure of Arterial Oxygen in Inspired Air
  • Urine Output
  • Bilirubin Concentration
  • Potassium Concentrations
  • Serum Urea
  • Sodium

Below lists the occurence of each paramater within the respective timeframe for a patient:

In [62]:
physiological_feats = ["Bilirubin", "BUN", "GCS", "HCO3", "HR", "K", "Na", "PaO2", "SysABP", "Temp", "Urine", "WBC"]
X = all_static_dfs
df_tmp = pd.DataFrame()

data = all_patients['132540']
idx = data['Parameter'].isin(physiological_feats)
df_phy_temp = data.loc[idx, :]

Below list a dataframe of a consolidated count for each physiological parameter for each patient.

In [63]:
df_phy_par = df_phy_temp.groupby(['Parameter'])['Value'] 
df_phy_par.describe()
Out[63]:
count mean std min 25% 50% 75% max
Parameter
BUN 3.0 18.333333 2.516611 16.0 17.00 18.00 19.5 21.0
GCS 15.0 13.333333 3.265986 3.0 13.50 15.00 15.0 15.0
HCO3 3.0 22.333333 1.527525 21.0 21.50 22.00 23.0 24.0
HR 68.0 80.794118 6.739411 65.0 80.00 80.00 88.0 90.0
K 2.0 3.900000 0.565685 3.5 3.70 3.90 4.1 4.3
Na 2.0 137.000000 2.828427 135.0 136.00 137.00 138.0 139.0
PaO2 7.0 210.142857 136.881248 82.0 114.50 153.00 281.0 445.0
SysABP 68.0 113.411765 16.338979 66.0 104.50 116.50 125.0 138.0
Temp 46.0 36.939130 0.986234 34.5 36.70 37.45 37.6 37.9
Urine 41.0 151.560976 161.509760 0.0 50.00 90.00 220.0 770.0
WBC 3.0 11.266667 3.350124 7.4 10.25 13.10 13.2 13.3
In [64]:
physiological_feats.append("In-hospital_death") # to add target variable for data exploration
In [65]:
plt.rcParams.update({'font.size': 14})

Objective: To examine the correlation of physiological features based on frequency

In [66]:
df_phy_freq = all_features_df__freq[physiological_feats]
#df_phy_freq

correlation_matrix_phy_freq = df_phy_freq.corr()
fig, ax = plt.subplots(figsize=(12,12)) 
sns.heatmap(data=correlation_matrix_phy_freq, cmap='seismic', annot=True, vmin=-1, vmax=1)
plt.show()

Objective: To examine the correlation of physiological features based on most recent

In [67]:
df_phy_most_recent = all_features_df__most_recent[physiological_feats]
#df_phy_most_recent

correlation_matrix_phy_most_recent = df_phy_most_recent.corr()
fig, ax = plt.subplots(figsize=(12,12)) 
sns.heatmap(data=correlation_matrix_phy_most_recent, cmap='seismic', annot=True, vmin=-1, vmax=1)
plt.show()

Reconsidering Physiological Variables

Based on the findings on the correlation matrix between physiological features, the aggregators to be considered for the design matrix between freq and most_recent. Most of the features have a high correlation when using the freq aggregator, thus, the most of the features used for the design matrix are most_recent features.

freq - Frequency:

  • Bicarbonate (mmol/L)
  • Urine Output (mL)

most_recent - Most Recent:

  • Heart Rate
  • Temperature
  • Systolic Blood Pressure
  • Glasgow Coma Scale
  • White Blood Cell Count
  • Partial Pressure of Arterial Oxygen in Inspired Air
  • Bilirubin Concentration
  • Potassium Concentrations
  • Serum Urea
  • Sodium
In [68]:
#Fix values with wrong inputs in most recent dataframes for physiological variables.
def replace_value(df, c, value=np.nan, below=None, above=None):
    idx = c
    
    if below is not None:
        idx = df[c] < below
        
    if above is not None:
        idx = df[c] > above
    
    
    if 'function' in str(type(value)):
        # value replacement is a function of the input
        df.loc[idx, c] = df.loc[idx, c].apply(value)
    else:
        df.loc[idx, c] = value
        
    return df
In [69]:
df_phy_most_recent = replace_value(df_phy_most_recent, 'SysABP', value=np.nan, below=1)

df_phy_most_recent = replace_value(df_phy_most_recent, 'HR', value=np.nan, below=1)
df_phy_most_recent = replace_value(df_phy_most_recent, 'HR', value=np.nan, above=299)

df_phy_most_recent = replace_value(df_phy_most_recent, 'PaO2', value=np.nan, below=1)
df_phy_most_recent = replace_value(df_phy_most_recent, 'PaO2', value=lambda x: x*10, below=20)

df_phy_most_recent = replace_value(df_phy_most_recent, 'Temp', value=np.nan, below=25)
df_phy_most_recent = replace_value(df_phy_most_recent, 'Temp', value=np.nan, above=45)

df_phy_most_recent = replace_value(df_phy_most_recent, 'WBC', value=np.nan, below=1)

df_phy_most_recent = replace_value(df_phy_most_recent, 'HCO3', value=np.nan, below=1)

df_phy_most_recent = replace_value(df_phy_most_recent, 'Urine', value=np.nan, below=1)

df_phy_most_recent = replace_value(df_phy_most_recent, 'BUN', value=np.nan, below=1)

df_phy_most_recent = replace_value(df_phy_most_recent, 'Na', value=np.nan, below=1)

df_phy_most_recent = replace_value(df_phy_most_recent, 'Bilirubin', value=np.nan, below=0)
#df_phy_most_recent
In [70]:
phy_feats_freq = ["HCO3", "Urine"]
phy_feats_most_recent = ["HR", "Bilirubin", "BUN", "GCS", "K", "Na", "PaO2", "SysABP", "Temp", "WBC"]
df_phy_freq_v2 = df_phy_freq[phy_feats_freq]
df_phy_most_recent_v2 = df_phy_most_recent[phy_feats_most_recent]
df_phy_v2 = df_phy_freq_v2.join(df_phy_most_recent_v2)
#df_phy_v2
In [71]:
correlation_matrix_phy_v2 = df_phy_v2.corr()
fig, ax = plt.subplots(figsize=(12,12)) 
sns.heatmap(data=correlation_matrix_phy_v2, cmap='seismic', annot=True, vmin=-1, vmax=1)
plt.show()
In [72]:
#This is to retrieve the median values for each feature.
phy_median = pd.DataFrame(data=df_phy_v2.median(), columns=['Median'])
In [73]:
phy_feats_freq = ['RecordID', "HCO3", "Urine"]
phy_feats_most_recent = ["HR", "Bilirubin", "BUN", "GCS", "K", "Na", "PaO2", "SysABP", "Temp", "WBC"]
phy_feats = ["HCO3", "Urine", "HR", "Bilirubin", "BUN", "GCS", "K", "Na", "PaO2", "SysABP", "Temp", "WBC"]
In [74]:
#Merge required frequency and most recent values from respective dictionaries to form design matrix 1.
design_matrix_1 = {}
design_matrix_1_freq = {}
design_matrix_1_most_recent = {}

for key, ids_list in all_temporal_dfs_folds__most_recent.items():
    
    design_matrix_1[key] = pd.DataFrame()

    design_matrix_1_freq[key] = all_temporal_dfs_folds__freq[key][phy_feats_freq]
    design_matrix_1_most_recent[key] = all_temporal_dfs_folds__most_recent[key][phy_feats_most_recent]
    design_matrix_1[key] = design_matrix_1_freq[key].join(design_matrix_1_most_recent[key])

#design_matrix_1
In [75]:
# Replace NaN with median values for each parameter
# Replace impractical values that are out of range

for key, ids_list in design_matrix_1.items():
    for col in phy_feats:
        
        median_value = phy_median.loc[col]['Median']
        design_matrix_1[key][col].fillna(median_value, inplace=True)
        
design_matrix_1
Out[75]:
{'Fold1':      RecordID  HCO3  Urine     HR  Bilirubin   BUN   GCS    K     Na   PaO2  \
 0    132539.0   2.0   38.0   86.0        0.7   8.0  15.0  4.0  136.0  108.0   
 1    132540.0   3.0   41.0   65.0        0.7  21.0  15.0  3.5  135.0  140.0   
 2    132541.0   3.0   41.0   71.0        2.8   3.0   5.0  3.7  138.0  173.0   
 3    132543.0   3.0    6.0   79.0        0.2  10.0  15.0  3.8  137.0  108.0   
 4    132545.0   2.0   38.0   68.0        0.7  25.0  15.0  4.1  139.0  108.0   
 5    132547.0   4.0   30.0   92.0        0.4  16.0   8.0  3.9  136.0  116.0   
 6    132548.0   3.0   34.0   60.0        0.7  36.0  15.0  4.4  138.0  108.0   
 7    132551.0   5.0   37.0   58.0        0.3  58.0   9.0  4.5  137.0   94.0   
 8    132554.0   2.0    5.0  122.0        0.7  23.0  15.0  4.5  139.0  108.0   
 9    132555.0   3.0   47.0   78.0        0.7  22.0  15.0  4.1  139.0  102.0   
 10   132556.0   3.0   35.0   91.0        0.1  55.0  15.0  3.8  136.0  108.0   
 11   132567.0   2.0   47.0   95.0        0.7   9.0  15.0  4.1  135.0  102.0   
 12   132568.0   2.0   32.0   93.0        0.7  16.0  15.0  3.7  138.0  108.0   
 13   132570.0   3.0   28.0   73.0        0.1  89.0  15.0  3.5  137.0  108.0   
 14   132573.0   2.0   30.0   68.0        0.7  40.0  15.0  4.1  137.0  108.0   
 15   132575.0   3.0   48.0  106.0        0.7  18.0  15.0  4.1  135.0   83.0   
 16   132577.0   9.0   32.0   88.0        0.7  47.0  15.0  4.4  153.0   80.0   
 17   132582.0   3.0   34.0   83.0        0.7  32.0  15.0  4.1  141.0  108.0   
 18   132584.0   7.0   35.0   73.0        1.6  24.0  11.0  3.7  149.0   86.0   
 19   132585.0   3.0   43.0   92.0        0.7   7.0  15.0  3.5  136.0  113.0   
 20   132588.0   2.0   38.0   78.0        8.0   5.0  15.0  3.8  133.0  108.0   
 21   132590.0   3.0   48.0   88.0        0.7  13.0  15.0  4.1  135.0  124.0   
 22   132591.0   2.0   36.0   61.0        0.7  32.0  15.0  4.3  136.0  108.0   
 23   132592.0   8.0   36.0   82.0        0.7  35.0  15.0  4.0  140.0  108.0   
 24   132595.0   4.0   38.0   85.0        0.7   8.0  15.0  4.7  139.0  165.0   
 25   132597.0   2.0   38.0   65.0        0.7  20.0  15.0  4.4  137.0  108.0   
 26   132598.0   3.0   40.0   72.0        0.7  22.0   8.0  4.0  146.0  126.0   
 27   132599.0   4.0   46.0   94.0        2.0  12.0  14.0  3.7  139.0  184.0   
 28   132601.0   3.0   45.0   95.0        0.7  21.0  15.0  5.6  138.0  152.0   
 29   132602.0   2.0   11.0   85.0        0.7  29.0  15.0  3.8  137.0   61.0   
 ..        ...   ...    ...    ...        ...   ...   ...  ...    ...    ...   
 970  134999.0   4.0   13.0  121.0        0.5  22.0  10.0  3.7  136.0  142.0   
 971  135002.0   3.0   45.0   58.0        0.7  12.0   3.0  5.1  146.0  108.0   
 972  135004.0   5.0   27.0   89.0        1.1  20.0  15.0  5.1  136.0   92.0   
 973  135006.0   3.0   23.0  119.0        0.7  12.0   7.0  3.8  138.0  120.0   
 974  135007.0   5.0   47.0   65.0        0.6  21.0  15.0  4.2  142.0   79.0   
 975  135009.0   5.0   40.0  100.0        0.7  24.0  11.0  3.3  145.0  176.0   
 976  135011.0   3.0   26.0   89.0        0.3  33.0  10.0  4.3  139.0  100.0   
 977  135013.0   4.0   48.0   89.0        0.7  19.0  15.0  4.3  134.0   93.0   
 978  135014.0   4.0   49.0  110.0        0.7  14.0  12.0  3.8  140.0  119.0   
 979  135015.0   3.0   19.0   72.0        0.7  31.0  15.0  3.9  141.0  108.0   
 980  135020.0   6.0   28.0   62.0        0.5  12.0  15.0  3.9  140.0   95.0   
 981  135021.0   3.0   36.0   69.0        0.7  19.0   7.0  4.0  139.0  108.0   
 982  135027.0   3.0   34.0   89.0        0.7  19.0   6.0  3.8  141.0  166.0   
 983  135028.0   3.0   44.0   90.0        0.3  16.0   8.0  4.2  141.0  119.0   
 984  135031.0   3.0   45.0   75.0        0.5   6.0   9.0  4.0  140.0   80.0   
 985  135036.0   3.0   35.0   96.0        0.7   4.0  15.0  3.6  143.0  391.0   
 986  135044.0   4.0   37.0   71.0        0.7  61.0  15.0  5.0  140.0   96.0   
 987  135048.0   7.0   36.0  132.0        0.6  30.0   3.0  4.4  138.0  119.0   
 988  135049.0   3.0   43.0   83.0        0.7   8.0  15.0  3.7  140.0  195.0   
 989  135051.0   2.0   35.0   82.0        0.7   8.0   9.0  3.6  141.0  148.0   
 990  135052.0   4.0   26.0  118.0        0.5  19.0  14.0  3.1  143.0  108.0   
 991  135056.0   3.0    8.0   85.0        0.7  19.0  15.0  4.0  139.0  108.0   
 992  135057.0   3.0   33.0   76.0        0.7  19.0   9.0  4.0  139.0   99.0   
 993  135059.0   3.0   38.0   85.0        0.7  29.0  15.0  4.5  136.0  108.0   
 994  135065.0   3.0   18.0   77.0        0.7  18.0  15.0  3.8  142.0   93.0   
 995  135067.0   3.0   38.0   56.0        0.8  33.0  11.0  4.0  146.0  108.0   
 996  135069.0   3.0   17.0   88.0        0.7  21.0  15.0  4.1  140.0   70.0   
 997  135071.0   6.0   39.0   67.0        0.7  59.0  13.0  5.0  137.0   62.0   
 998  135072.0   3.0   47.0  102.0        0.7  25.0  15.0  4.7  133.0  154.0   
 999  135075.0   2.0   45.0   80.0        0.7  17.0  15.0  4.3  127.0   99.0   
 
      SysABP  Temp   WBC  
 0     123.0  37.8   9.4  
 1     103.0  37.1  13.3  
 2     126.0  37.2   6.2  
 3     123.0  37.0   7.9  
 4     123.0  36.7   4.8  
 5      91.0  37.3  13.3  
 6     148.0  36.4   6.2  
 7     126.0  36.6  23.5  
 8     123.0  36.8  15.2  
 9     134.0  37.4  11.8  
 10    123.0  36.1  21.7  
 11    107.0  37.3   9.0  
 12    123.0  37.0  14.1  
 13    123.0  37.7  12.0  
 14    123.0  37.0  11.6  
 15    128.0  36.5  12.5  
 16    140.0  37.7  10.3  
 17    123.0  36.1   3.3  
 18    112.0  37.2  12.9  
 19     95.0  37.9   9.6  
 20    123.0  37.0   6.5  
 21     97.0  36.8  13.6  
 22    123.0  36.3  12.2  
 23    123.0  36.8   9.7  
 24    123.0  37.0   5.8  
 25    123.0  37.2  13.4  
 26    146.0  36.4  20.2  
 27    123.0  36.7   9.4  
 28    116.0  36.7  15.8  
 29    123.0  36.9  14.1  
 ..      ...   ...   ...  
 970   106.0  39.0   6.1  
 971   123.0  36.8  12.7  
 972   120.0  36.2   9.5  
 973   114.0  37.2  18.2  
 974   113.0  37.1  10.0  
 975   144.0  37.4  14.0  
 976   135.0  36.9   8.0  
 977    97.0  37.3  14.1  
 978   120.0  37.3  19.1  
 979   123.0  35.6  31.2  
 980   131.0  36.9  12.4  
 981   130.0  37.2  10.8  
 982   110.0  37.7   7.5  
 983    95.0  37.9  19.0  
 984   119.0  37.1  17.2  
 985   175.0  38.1   9.3  
 986   126.0  36.0  28.4  
 987   123.0  38.3  15.9  
 988   145.0  37.6  10.4  
 989    89.0  37.1  10.4  
 990   123.0  37.3  27.5  
 991   123.0  36.7  10.8  
 992   113.0  36.8  10.8  
 993   123.0  37.0   5.8  
 994    89.0  35.9  13.8  
 995   123.0  37.1   9.0  
 996    92.0  36.3   5.6  
 997   112.0  36.2   9.4  
 998   125.0  36.3   9.7  
 999   130.0  36.6  16.5  
 
 [1000 rows x 13 columns],
 'Fold2':      RecordID  HCO3  Urine     HR  Bilirubin    BUN   GCS    K     Na   PaO2  \
 0    135076.0   4.0   26.0   64.0        1.7   17.0   9.0  4.2  145.0   74.0   
 1    135077.0   6.0   47.0  100.0        0.7   39.0  15.0  4.8  140.0   76.0   
 2    135079.0   3.0   50.0   86.0        0.7   49.0  15.0  4.9  138.0   61.0   
 3    135080.0   2.0   46.0   62.0        0.7   13.0  15.0  4.0  139.0   91.0   
 4    135081.0   3.0   43.0   68.0        0.7   12.0  15.0  4.3  139.0  182.0   
 5    135083.0   2.0    9.0   65.0        0.3   13.0  15.0  3.7  141.0  108.0   
 6    135084.0   2.0   11.0   72.0        0.7   28.0  15.0  4.6  138.0  108.0   
 7    135086.0   4.0   37.0   80.0        1.0   34.0  10.0  4.2  144.0  159.0   
 8    135087.0   3.0   31.0  148.0        0.7   16.0  15.0  4.2  136.0  249.0   
 9    135088.0   5.0   43.0   80.0        0.9   25.0   9.0  4.9  138.0   68.0   
 10   135089.0   5.0   44.0  106.0        0.3   38.0  15.0  4.0  145.0  108.0   
 11   135092.0   4.0   14.0   95.0        0.7   23.0  15.0  4.2  130.0   78.0   
 12   135093.0   6.0   34.0  118.0        0.9   34.0   6.0  3.8  139.0   96.0   
 13   135098.0   1.0   44.0   79.0        0.7   10.0  15.0  4.9  133.0  129.0   
 14   135102.0   2.0   25.0   55.0        0.5   17.0  10.0  3.8  147.0  108.0   
 15   135103.0   3.0   42.0   85.0        0.7   24.0  15.0  4.5  134.0   82.0   
 16   135104.0   2.0   16.0   93.0        0.7   14.0  15.0  3.8  141.0  108.0   
 17   135105.0   3.0   43.0  109.0        0.7   76.0  15.0  4.1  140.0   82.0   
 18   135107.0   2.0   32.0   81.0        0.3   24.0  14.0  3.7  141.0  108.0   
 19   135110.0   3.0   31.0   65.0        0.2   41.0  15.0  3.6  145.0  108.0   
 20   135111.0   1.0    8.0   50.0        0.7   22.0  15.0  4.2  140.0  171.0   
 21   135115.0   3.0   42.0   62.0        0.5   19.0  10.0  4.5  138.0   92.0   
 22   135116.0   3.0   38.0   85.0        0.9    5.0  15.0  4.2  133.0  108.0   
 23   135127.0   2.0   37.0  110.0        0.7   14.0  14.0  4.0  139.0  115.0   
 24   135129.0   2.0   41.0   84.0        0.7   11.0  15.0  4.6  130.0   99.0   
 25   135130.0   3.0   43.0   94.0        0.6   10.0   6.0  3.9  137.0   83.0   
 26   135135.0   1.0    7.0   60.0        0.4   25.0  13.0  4.5  138.0  108.0   
 27   135141.0   3.0    7.0   43.0        0.7   25.0  15.0  3.7  138.0  108.0   
 28   135142.0   3.0   30.0   77.0       10.9    9.0   7.0  3.8  135.0  147.0   
 29   135145.0   4.0   40.0  101.0        1.0  137.0  15.0  4.8  138.0  179.0   
 ..        ...   ...    ...    ...        ...    ...   ...  ...    ...    ...   
 970  137537.0   2.0   31.0   87.0        1.9   30.0  15.0  3.4  141.0  108.0   
 971  137538.0   2.0   31.0  125.0        0.7   28.0  15.0  4.9  138.0  104.0   
 972  137542.0   5.0   49.0   72.0        1.5    7.0   4.0  3.5  142.0   92.0   
 973  137545.0   2.0   43.0   98.0        0.7   14.0  15.0  4.3  135.0   75.0   
 974  137548.0   2.0   21.0   94.0        0.7   13.0  15.0  3.9  142.0  142.0   
 975  137549.0   1.0   30.0  113.0        0.5   23.0   3.0  2.5  144.0  108.0   
 976  137552.0   4.0   25.0  103.0        0.7   20.0  15.0  3.4  139.0   73.0   
 977  137556.0   2.0   15.0   74.0        0.7    8.0  15.0  3.9  141.0  108.0   
 978  137562.0   2.0    8.0  101.0        1.1   21.0  15.0  3.7  140.0   97.0   
 979  137563.0   2.0   46.0   76.0        0.7   37.0   6.0  4.5  129.0  109.0   
 980  137564.0   3.0   27.0   59.0        0.8   10.0   7.0  4.3  144.0   98.0   
 981  137567.0   2.0   43.0   80.0        0.7   18.0  15.0  4.4  134.0   82.0   
 982  137568.0   2.0   23.0   74.0        0.7    6.0  15.0  3.1  138.0  108.0   
 983  137569.0   6.0   45.0  110.0        0.8   21.0   8.0  4.5  136.0   75.0   
 984  137570.0   2.0   29.0  116.0        0.7    9.0  15.0  3.9  135.0  105.0   
 985  137573.0   2.0   46.0   55.0        0.7   17.0  15.0  4.3  140.0   86.0   
 986  137576.0   5.0   50.0  110.0        0.7    6.0   3.0  3.8  146.0  192.0   
 987  137577.0   8.0   40.0   87.0        0.4   51.0   3.0  4.1  141.0   87.0   
 988  137578.0   4.0   46.0   93.0        1.4   37.0  15.0  3.4  146.0  113.0   
 989  137579.0   4.0   41.0   73.0        0.6   20.0  15.0  3.9  136.0   78.0   
 990  137580.0   3.0   39.0   91.0        0.7   32.0  15.0  5.3  136.0  142.0   
 991  137581.0   3.0   32.0   69.0        1.3   16.0  15.0  3.1  131.0  108.0   
 992  137583.0   3.0   25.0  101.0        0.7   16.0  15.0  4.1  139.0  108.0   
 993  137584.0   6.0   42.0   81.0        2.5   15.0  10.0  3.6  142.0   61.0   
 994  137586.0   2.0   41.0   75.0        0.7   18.0  15.0  3.6  142.0  108.0   
 995  137587.0   3.0   45.0   62.0        0.5   16.0  11.0  4.0  131.0  150.0   
 996  137588.0   4.0   37.0   66.0        0.7   48.0  15.0  3.2  137.0  108.0   
 997  137589.0   3.0   42.0   70.0        0.7   23.0  15.0  4.3  133.0   68.0   
 998  137590.0   2.0   44.0   56.0        0.7   14.0  15.0  3.8  134.0  211.0   
 999  137592.0   3.0   19.0  102.0        0.7   14.0  15.0  4.2  138.0  108.0   
 
      SysABP  Temp    WBC  
 0     144.0  37.6  16.00  
 1     162.0  37.7  13.20  
 2     148.0  36.6   6.60  
 3      71.0  37.3   8.60  
 4     109.0  35.8  10.80  
 5     123.0  37.1   6.70  
 6     123.0  36.6  11.20  
 7     102.0  37.4  28.00  
 8     134.0  38.4   9.40  
 9     133.0  39.0  31.00  
 10    119.0  35.6   7.50  
 11    123.0  37.4   9.00  
 12      0.0  36.8   3.20  
 13    132.0  36.1  16.90  
 14    123.0  35.8   3.20  
 15    101.0  37.2  16.20  
 16    123.0  36.8   8.70  
 17    111.0  36.7  16.80  
 18    123.0  36.7  20.50  
 19    123.0  37.4  15.00  
 20    137.0  36.7  21.10  
 21     94.0  36.6  14.40  
 22    123.0  37.0   0.75  
 23    123.0  37.2  11.50  
 24    159.0  37.0   8.00  
 25    149.0  38.2  35.60  
 26    123.0  35.9   6.90  
 27    123.0  36.9   8.50  
 28    107.0  37.2  10.60  
 29    123.0  35.7  14.70  
 ..      ...   ...    ...  
 970   123.0  37.2  56.40  
 971   114.0  37.3   7.60  
 972   136.0  37.1  17.00  
 973   120.0  37.2  11.10  
 974    97.0  36.6   7.10  
 975   123.0  36.1   4.50  
 976   123.0  37.6   8.20  
 977   123.0  36.9   7.80  
 978   173.0  37.3  10.70  
 979   119.0  37.1  15.20  
 980   114.0  36.3   5.10  
 981   123.0  37.4  10.20  
 982   123.0  36.6   4.90  
 983    99.0  38.3  12.00  
 984   119.0  36.1   8.80  
 985   120.0  36.5   8.50  
 986   148.0  36.6  12.60  
 987    81.0  36.4  23.30  
 988   123.0  36.3  15.20  
 989   143.0  36.3  23.60  
 990    81.0  36.9  13.40  
 991   123.0  37.4  13.70  
 992   123.0  36.7   9.70  
 993   114.0  38.9   7.80  
 994   154.0  36.9   5.90  
 995   137.0  37.6  11.40  
 996   123.0  36.4   9.10  
 997   122.0  37.1  14.80  
 998   150.0  36.4  14.20  
 999   123.0  36.8   5.50  
 
 [1000 rows x 13 columns],
 'Fold3':      RecordID  HCO3  Urine     HR  Bilirubin   BUN   GCS    K     Na   PaO2  \
 0    137593.0   4.0   45.0   93.0        0.7  17.0  15.0  3.8  137.0   97.0   
 1    137594.0   3.0   31.0   61.0        0.7  16.0  11.0  3.8  140.0  108.0   
 2    137595.0   4.0   49.0   89.0        0.6  12.0  11.0  4.3  137.0   88.0   
 3    137598.0   3.0   38.0   95.0        0.7  57.0  11.0  4.4  130.0   98.0   
 4    137600.0   2.0   38.0   88.0        0.7  23.0  14.0  4.2  136.0   96.0   
 5    137602.0   4.0   42.0   96.0        0.4   9.0   9.0  3.7  140.0  130.0   
 6    137604.0   4.0   36.0   69.0        0.3  61.0  14.0  4.4  149.0  122.0   
 7    137606.0   3.0   45.0   89.0        0.7  47.0  15.0  3.5  139.0  108.0   
 8    137609.0   4.0   10.0   58.0        0.7   9.0  15.0  4.3  131.0  108.0   
 9    137619.0   4.0   39.0   93.0        0.7  12.0   7.0  4.0  125.0  140.0   
 10   137624.0   1.0    7.0   72.0        0.7  12.0  15.0  4.1  133.0  108.0   
 11   137626.0   6.0   34.0   79.0        0.3  48.0  15.0  4.6  146.0   50.0   
 12   137627.0   3.0   46.0   70.0        0.7  16.0  15.0  4.5  138.0  137.0   
 13   137628.0   4.0   38.0   63.0        0.4  41.0  15.0  3.6  140.0  107.0   
 14   137630.0   3.0   49.0   94.0        0.7  15.0  15.0  4.6  132.0  178.0   
 15   137631.0   2.0   14.0   80.0        0.7  13.0  14.0  4.1  136.0  108.0   
 16   137633.0   3.0   30.0   79.0        0.7  23.0  11.0  3.9  142.0  108.0   
 17   137635.0   3.0   35.0   94.0        0.7  14.0   9.0  5.3  145.0   91.0   
 18   137636.0   3.0   50.0   61.0        0.6  22.0  14.0  4.5  140.0  114.0   
 19   137637.0   3.0   31.0   94.0        1.0  41.0  15.0  3.9  142.0   86.0   
 20   137638.0   3.0   38.0  104.0        0.7  22.0  14.0  4.4  132.0  108.0   
 21   137639.0   3.0   41.0  110.0        0.7  22.0  15.0  5.2  128.0   96.0   
 22   137640.0   2.0   32.0   78.0        0.7  30.0  15.0  4.4  131.0  180.0   
 23   137642.0   2.0   41.0   45.0        0.9  32.0   3.0  4.2  148.0   96.0   
 24   137643.0   3.0   41.0   78.0        2.9  28.0  14.0  4.0  137.0  108.0   
 25   137648.0   2.0   48.0   77.0        0.7  39.0  15.0  4.6  135.0  117.0   
 26   137649.0   4.0   36.0   71.0        0.8  23.0   3.0  4.0  146.0  367.0   
 27   137656.0   4.0   32.0   72.0        0.7  42.0  10.0  5.3  130.0   89.0   
 28   137657.0   2.0   45.0  112.0        0.7  31.0  15.0  3.9  144.0  113.0   
 29   137658.0   2.0   44.0   72.0        0.7  23.0  15.0  5.2  139.0   72.0   
 ..        ...   ...    ...    ...        ...   ...   ...  ...    ...    ...   
 970  140033.0   3.0   42.0   54.0        1.2  19.0  15.0  4.0  139.0  198.0   
 971  140034.0   3.0   38.0   80.0        0.7  17.0  15.0  4.0  140.0  125.0   
 972  140035.0   3.0   51.0   63.0        0.7  23.0  15.0  4.4  136.0  107.0   
 973  140037.0   2.0   39.0   80.0        0.3   8.0  15.0  3.6  140.0  108.0   
 974  140038.0   3.0   51.0   70.0        0.7  23.0  15.0  4.0  139.0   88.0   
 975  140041.0   3.0   33.0  102.0        0.2  23.0  15.0  4.1  144.0  102.0   
 976  140048.0   3.0   28.0   92.0        0.2  23.0  15.0  4.4  135.0   68.0   
 977  140049.0   8.0   37.0   86.0        0.7  30.0  11.0  3.3  141.0  119.0   
 978  140050.0   2.0   40.0   73.0        0.7  32.0  15.0  4.1  142.0  124.0   
 979  140054.0   3.0   30.0   99.0        0.7   6.0  14.0  3.8  138.0  108.0   
 980  140055.0   4.0   40.0   80.0        0.7  26.0  12.0  4.1  141.0   74.0   
 981  140060.0   5.0   24.0   85.0        0.7  75.0  15.0  4.3  130.0  108.0   
 982  140063.0   2.0   21.0   73.0        0.7   8.0  14.0  3.2  139.0  108.0   
 983  140065.0   2.0   34.0   85.0        0.3   6.0  15.0  3.0  139.0  108.0   
 984  140068.0   8.0   45.0   84.0        0.7   7.0   8.0  4.2  140.0  113.0   
 985  140070.0   4.0   32.0   82.0        1.1  22.0  10.0  4.5  136.0   98.0   
 986  140071.0   2.0    6.0   76.0        0.3   9.0  15.0  3.5  141.0  108.0   
 987  140072.0   3.0   47.0   89.0        0.7  13.0   9.0  4.9  135.0   99.0   
 988  140073.0   3.0    9.0   90.0        0.4  10.0   3.0  4.4  140.0  290.0   
 989  140074.0   2.0   30.0   88.0        0.7  26.0  15.0  4.7  133.0   98.0   
 990  140077.0   3.0   49.0   89.0        0.7  14.0  15.0  4.8  139.0   71.0   
 991  140080.0   5.0   44.0   67.0        0.6  34.0  10.0  3.7  148.0   77.0   
 992  140085.0   4.0   29.0  132.0        0.7  47.0   9.0  5.1  138.0   97.0   
 993  140086.0   2.0   39.0   89.0        0.7   7.0  12.0  3.5  140.0  108.0   
 994  140088.0   4.0   41.0  107.0        0.9  21.0  15.0  4.0  137.0   90.0   
 995  140091.0   1.0   41.0   97.0        0.7  12.0  15.0  4.2  135.0  129.0   
 996  140095.0   3.0   25.0   79.0        1.1   7.0  15.0  3.8  139.0   45.0   
 997  140097.0   3.0   47.0   75.0        0.7  32.0  15.0  4.8  139.0   82.0   
 998  140099.0   3.0   41.0   53.0        0.7  11.0  14.0  4.5  135.0  195.0   
 999  140100.0   3.0   43.0   61.0        0.7   7.0  15.0  3.6  138.0  107.0   
 
      SysABP  Temp   WBC  
 0     118.0  37.4  15.9  
 1     151.0  36.4  11.1  
 2     128.0  37.7   9.2  
 3     123.0  36.7  26.0  
 4     105.0  37.0  10.8  
 5     110.0  37.7  13.5  
 6     164.0  36.9  13.8  
 7     123.0  36.8  10.2  
 8     123.0  37.4   2.8  
 9     149.0  38.6  12.3  
 10    112.0  37.4  13.4  
 11    123.0  36.6   8.1  
 12    125.0  37.0  10.6  
 13    177.0  36.2   5.4  
 14    105.0  36.9  14.8  
 15    123.0  36.7  10.4  
 16    123.0  38.1  11.4  
 17    137.0  37.6   9.7  
 18     96.0  36.8  13.2  
 19    132.0  37.1  37.3  
 20    123.0  36.4  11.5  
 21    137.0  37.3  17.4  
 22     91.0  36.9  25.6  
 23     82.0  36.1  16.0  
 24    136.0  37.1   9.8  
 25    123.0  37.8   9.4  
 26    155.0  36.7  17.0  
 27     92.0  35.3  20.4  
 28    129.0  37.8  13.1  
 29    118.0  36.6  16.0  
 ..      ...   ...   ...  
 970    99.0  36.6  14.2  
 971   154.0  37.7  15.8  
 972   144.0  36.9  13.8  
 973   123.0  38.0  10.0  
 974   118.0  37.0  10.7  
 975   195.0  36.6  35.8  
 976   123.0  37.6  14.6  
 977   122.0  37.1  13.0  
 978   123.0  37.0  17.3  
 979   123.0  37.4  12.7  
 980   142.0  37.0   9.4  
 981   123.0  36.4  15.8  
 982   123.0  35.0   7.6  
 983   123.0  37.1   7.8  
 984   116.0  37.6  21.0  
 985   111.0  37.8   7.5  
 986   123.0  37.1   9.2  
 987   102.0  38.3  20.3  
 988    99.0  36.0  22.3  
 989    90.0  37.3  16.0  
 990   131.0  36.3   3.0  
 991   123.0  37.3   9.6  
 992   107.0  37.3   7.8  
 993   123.0  37.2   4.8  
 994   167.0  37.6  14.9  
 995   101.0  36.4  20.2  
 996   127.0  37.3  15.2  
 997   114.0  36.9   9.6  
 998   148.0  37.4  11.1  
 999   108.0  36.7   8.8  
 
 [1000 rows x 13 columns],
 'Fold4':      RecordID  HCO3  Urine     HR  Bilirubin    BUN   GCS    K     Na   PaO2  \
 0    140101.0   2.0   44.0   97.0        0.7   13.0  10.0  3.9  142.0  145.0   
 1    140102.0   2.0   34.0   89.0        0.7   26.0  11.0  4.1  142.0   88.0   
 2    140104.0   3.0   40.0   98.0        0.7   18.0  15.0  4.1  138.0  143.0   
 3    140106.0   3.0   47.0  106.0        0.6   14.0  15.0  4.5  135.0   74.0   
 4    140107.0   3.0   35.0   84.0        0.7   22.0  11.0  3.3  140.0  109.0   
 5    140112.0   2.0   38.0   87.0        0.5   29.0  13.0  4.3  138.0  175.0   
 6    140115.0   2.0   46.0  114.0        0.7   18.0   9.0  3.5  143.0  108.0   
 7    140116.0   3.0   44.0   90.0        0.7   11.0  14.0  4.4  137.0  102.0   
 8    140117.0   2.0   38.0   85.0        0.7    9.0  15.0  4.0  142.0  167.0   
 9    140118.0   3.0    4.0   74.0        0.7   11.0  15.0  4.3  135.0  164.0   
 10   140124.0   8.0   16.0  107.0        1.0   20.0  15.0  3.1  146.0  131.0   
 11   140133.0   5.0   34.0   79.0        0.7   20.0  13.0  4.1  138.0   93.0   
 12   140139.0   3.0   44.0   62.0        0.7   15.0   7.0  3.9  135.0  102.0   
 13   140142.0   2.0   33.0   92.0        0.7   10.0  15.0  3.7  138.0   90.0   
 14   140149.0   3.0   49.0   72.0        0.7   14.0   3.0  4.1  134.0   86.0   
 15   140152.0   3.0   52.0   90.0        0.7   22.0  15.0  4.8  144.0  117.0   
 16   140155.0   2.0   31.0   99.0        0.3   35.0  15.0  4.7  146.0   68.0   
 17   140156.0   2.0   26.0   74.0        0.7   13.0  14.0  3.8  140.0  117.0   
 18   140161.0   5.0   38.0   77.0        0.6   46.0   8.0  4.6  137.0  158.0   
 19   140165.0   9.0   36.0   83.0        0.7   39.0   4.0  4.5  143.0  131.0   
 20   140166.0   3.0   45.0   77.0        0.7   31.0  14.0  4.5  137.0   91.0   
 21   140170.0   3.0   42.0   85.0        0.7   26.0  15.0  4.3  137.0   77.0   
 22   140171.0   4.0   35.0   92.0        0.7   15.0   7.0  3.7  139.0  103.0   
 23   140175.0   2.0   12.0   75.0        0.7   31.0  15.0  3.8  140.0   99.0   
 24   140183.0   2.0   46.0   80.0        0.7   25.0  15.0  3.6  138.0  114.0   
 25   140185.0   3.0   42.0   70.0        0.7   14.0  12.0  4.2  135.0  132.0   
 26   140190.0   5.0    5.0   66.0       10.4   25.0  15.0  3.3  141.0  108.0   
 27   140192.0   4.0   46.0   76.0        0.7   21.0  15.0  3.8  139.0  108.0   
 28   140193.0   3.0    5.0  108.0        2.3   56.0  13.0  4.0  141.0  108.0   
 29   140194.0   4.0   31.0  106.0        0.7   68.0   9.0  4.4  132.0  123.0   
 ..        ...   ...    ...    ...        ...    ...   ...  ...    ...    ...   
 970  142591.0   3.0   41.0   84.0        0.7   22.0  15.0  4.2  136.0  193.0   
 971  142595.0   5.0   25.0   78.0        0.7    9.0  15.0  3.6  132.0  108.0   
 972  142601.0   3.0   48.0   92.0        0.7   10.0  11.0  3.8  132.0   96.0   
 973  142603.0   3.0   46.0   90.0        0.7   26.0   6.0  3.3  147.0  183.0   
 974  142607.0   3.0   11.0   74.0        0.6   17.0  15.0  4.2  139.0  108.0   
 975  142609.0   3.0   33.0   80.0        0.7    7.0   6.0  3.5  141.0   99.0   
 976  142612.0   2.0   39.0   72.0        0.3   23.0   9.0  4.3  137.0   66.0   
 977  142618.0   4.0   38.0  106.0        0.6   10.0   6.0  3.6  141.0   90.0   
 978  142621.0   3.0   39.0   60.0        0.3   40.0  15.0  3.9  143.0  108.0   
 979  142626.0   4.0   37.0   86.0        0.7   48.0   7.0  4.9  136.0   91.0   
 980  142634.0   3.0   28.0   62.0        1.3   13.0  14.0  3.3  143.0  108.0   
 981  142635.0   7.0   40.0   71.0        0.7   27.0   8.0  4.0  149.0  142.0   
 982  142637.0   3.0   47.0   67.0        0.7   29.0  15.0  4.2  143.0  104.0   
 983  142638.0   2.0   27.0   75.0        0.7   59.0  15.0  4.3  140.0   67.0   
 984  142640.0   3.0    9.0  108.0        5.4    6.0  15.0  3.2  130.0  108.0   
 985  142641.0   2.0   35.0   75.0        0.2    6.0  15.0  3.4  141.0  108.0   
 986  142646.0   3.0   39.0   65.0        0.9    9.0  15.0  4.1  138.0   80.0   
 987  142649.0   3.0   21.0   74.0        0.7   20.0  15.0  3.4  147.0  108.0   
 988  142653.0   4.0   46.0   84.0        0.7   16.0  13.0  4.1  140.0  114.0   
 989  142654.0   5.0   11.0   56.0       34.7   96.0  15.0  3.8  135.0   92.0   
 990  142655.0   3.0   41.0   97.0        0.3   20.0  15.0  4.1  142.0   75.0   
 991  142659.0   2.0   36.0   60.0        9.2   39.0  15.0  3.7  136.0   80.0   
 992  142661.0   3.0   33.0   83.0        0.7   14.0  11.0  3.8  136.0  108.0   
 993  142662.0   2.0   25.0   70.0        0.9   69.0  15.0  4.7  136.0  108.0   
 994  142664.0   3.0   46.0  106.0        0.7   15.0  10.0  3.7  143.0  118.0   
 995  142665.0   2.0   39.0   89.0        0.7   18.0  15.0  3.9  136.0  123.0   
 996  142667.0   5.0    8.0   80.0        0.7    7.0  15.0  3.4  142.0  108.0   
 997  142670.0   4.0   36.0   86.0        0.5    6.0   5.0  3.9  142.0  127.0   
 998  142671.0   4.0   13.0   82.0        0.6  114.0   3.0  5.0  144.0  190.0   
 999  142673.0   6.0   48.0   84.0        0.5   24.0  14.0  4.9  145.0  122.0   
 
      SysABP  Temp   WBC  
 0     123.0  37.8  10.7  
 1     120.0  37.1  11.8  
 2      82.0  37.1  17.3  
 3     101.0  37.5   8.4  
 4     130.0  38.0  10.8  
 5     139.0  36.9   8.3  
 6     123.0  38.4  12.1  
 7      96.0  37.7   7.9  
 8     123.0  37.0  12.8  
 9     113.0  37.0   9.1  
 10    136.0  37.6  12.1  
 11    124.0  37.0   8.4  
 12     90.0  37.5  14.7  
 13    117.0  36.4  32.9  
 14    116.0  37.3  12.6  
 15    153.0  36.5  17.2  
 16    140.0  36.3  25.6  
 17    101.0  37.3  13.5  
 18    125.0  36.7  12.7  
 19    141.0  36.1  15.4  
 20    141.0  36.1   7.3  
 21    120.0  37.1  11.9  
 22    108.0  38.9  22.1  
 23    123.0  37.7  14.6  
 24    141.0  37.1   6.1  
 25    152.0  37.2   6.6  
 26    123.0  35.7   6.5  
 27    122.0  36.3  16.0  
 28    123.0  36.8  22.0  
 29    149.0  36.9  15.4  
 ..      ...   ...   ...  
 970   105.0  36.7   9.6  
 971   123.0  36.2   2.7  
 972   108.0  37.0  17.0  
 973   157.0  37.8  10.3  
 974   111.0  37.5  13.4  
 975   139.0  37.8  12.4  
 976   150.0  37.3  12.0  
 977   103.0  38.0   5.0  
 978   123.0  36.1  12.5  
 979   124.0  37.4  23.5  
 980   123.0  36.6   8.0  
 981   108.0  37.7  12.9  
 982   104.0  37.1  16.6  
 983    95.0  36.3  10.5  
 984   123.0  36.9   4.2  
 985   123.0  36.6   9.0  
 986   113.0  37.2   5.7  
 987   123.0  37.3   7.0  
 988   158.0  36.2  13.6  
 989   123.0  36.2   7.1  
 990   157.0  38.6  15.2  
 991   123.0  36.4  15.9  
 992   123.0  36.8  11.4  
 993   123.0  36.1  11.0  
 994   112.0  37.2   8.0  
 995   152.0  36.9  17.5  
 996   123.0  36.8   3.0  
 997   113.0  38.4  10.6  
 998   145.0  37.4  11.5  
 999   129.0  37.3  11.0  
 
 [1000 rows x 13 columns]}

Design Matrix 2

Vital Signs:

  • MechVent
  • TempGCS
  • PaO2
  • FiO2
  • HR
  • MAP

The most recent value of the respective vital signs variables are taken as it has a high significance on a patients' condition at the end of the 48 hours.

Nan values for respective vital feature variables will be replaced with the normal range and assume that patient in the particular variable is good, thus, measurements are not taken. The NaN values are not replaced with mean values because the condition of a patient in a particular variable (ie MAP) does not depend on the condition on the rest of the patients.

The data are carefully evaulated based on each fold, where missing data are replaced and the data are categories on wehther they are in the normal range.

temp is created to make a copy on the most recent of the temporal data.

In [76]:
temp = all_temporal_dfs_folds__most_recent.copy()
MechVent

Reason for choosing MechVent: Patients are unable to breathe normally on their own. They require the mechanical means to assist or replace their spontaneous breathing. The usuage of MechVent is related to the mortality rate of a patient.

  • Pranikoff, T. M., Hirschl, R. B., Steimle, C. N., Anderson, H. L., & Bartlett, R. H. (n.d.). Mortality is directly related to the duration of mechanical ventilation before the initiation of extracorporeal life support for severe respiratory failure. Retrieved from doi: 10.1097/00003246-199701000-00008

NaN values are replace with 0.

Patients who were on MechVent are categorized as 1 and those without are categories as 0.

In [77]:
#Fold 1
mechvent_list_fold1=[]
for i in temp['Fold1']['MechVent']:
    if i == 1.0:
        mechvent_list_fold1.append(1)
    else:
        mechvent_list_fold1.append(0)

#Fold 2
mechvent_list_fold2=[]
for i in temp['Fold2']['MechVent']:
    if i == 1.0:
        mechvent_list_fold2.append(1)
    else:
        mechvent_list_fold2.append(0)

#Fold 3
mechvent_list_fold3=[]
for i in temp['Fold3']['MechVent']:
    if i == 1.0:
        mechvent_list_fold3.append(1)
    else:
        mechvent_list_fold3.append(0)

#Fold 4
mechvent_list_fold4=[]
for i in temp['Fold4']['MechVent']:
    if i == 1.0:
        mechvent_list_fold4.append(1)
    else:
        mechvent_list_fold4.append(0)
Temp

Reason for choosing Temp: A body temperature that is too high can cause malfunction and ultimately failure of most organs, which eventually cause death.

Hypothermia: <35.0 °C, Normal: 36.5–37.5 °C, Fever: 37.5 or 38.3 °C, Hyperthermia: >37.5 or 38.3 °C, Hyperpyrexia: >40.0 or 41.0 °C

NaN values are replace with 37.0.

Patients who are Hypothermia are categorised as 1, Normal as 2, Fever as 3 and Hyperpyrexia as 4.

In [78]:
#Fold 1
temp['Fold1']['Temp'].fillna('37.0', inplace=True)
temp_list1=[]
for i in temp['Fold1']['Temp']:
    if float(i)<35:
        temp_list1.append(1)
    elif float(i)<=37.5:
        temp_list1.append(2)
    elif float(i)<=38.3:
        temp_list1.append(3)
    elif float(i)>38.3:
        temp_list1.append(4)

#Fold 2
temp['Fold2']['Temp'].fillna('37.0', inplace=True)
temp_list2=[]
for i in temp['Fold2']['Temp']:
    if float(i)<35:
        temp_list2.append(1)
    elif float(i)<=37.5:
        temp_list2.append(2)
    elif float(i)<=38.3:
        temp_list2.append(3)
    elif float(i)>38.3:
        temp_list2.append(4)

#Fold 3 
temp['Fold3']['Temp'].fillna('37.0', inplace=True)
temp_list3=[]
for i in temp['Fold3']['Temp']:
    if float(i)<35:
        temp_list3.append(1)
    elif float(i)<=37.5:
        temp_list3.append(2)
    elif float(i)<=38.3:
        temp_list3.append(3)
    elif float(i)>38.3:
        temp_list3.append(4)

#Fold 4 
temp['Fold4']['Temp'].fillna('37.0', inplace=True)
temp_list4=[]
for i in temp['Fold3']['Temp']:
    if float(i)<35:
        temp_list4.append(1)
    elif float(i)<=37.5:
        temp_list4.append(2)
    elif float(i)<=38.3:
        temp_list4.append(3)
    elif float(i)>38.3:
        temp_list4.append(4)
GCS (Glasgow Coma Score)

Reason for choosing GCS: A GCS score of 3 is the lowest possible score and is associated with an extremely high mortality rate, with low chance of survival.

Range of GCS is 3-15. Level 3-8: Coma, 8>good chance of recovery

  • Demetriades D, Kuncir E, Velmahos GC, Rhee P, Alo K, Chan LS. Outcome and Prognostic Factors in Head Injuries With an Admission Glasgow Coma Scale Score of 3. Arch Surg. 2004;139(10):1066–1068. doi:10.1001/archsurg.139.10.1066

NaN values are replace with 13.

Patients who are coma are categorized as 1 and those who are safe are categories as 0.

In [79]:
#Fold 1
GCS_list1=[]
temp['Fold1']['GCS'].fillna('13', inplace=True)
for i in temp['Fold1']['GCS']:
    if float(i)<=8:
        GCS_list1.append(1)
    elif float(i)>8:
        GCS_list1.append(0)

#Fold 2
GCS_list2=[]
temp['Fold2']['GCS'].fillna('13', inplace=True)
for i in temp['Fold2']['GCS']:
    if float(i)<=8:
        GCS_list2.append(1)
    elif float(i)>8:
        GCS_list2.append(0)
        
#Fold 3
GCS_list3=[]
temp['Fold3']['GCS'].fillna('13', inplace=True)
for i in temp['Fold3']['GCS']:
    if float(i)<=8:
        GCS_list3.append(1)
    elif float(i)>8:
        GCS_list3.append(0)
        
#Fold 4
GCS_list4=[]
temp['Fold4']['GCS'].fillna('13', inplace=True)
for i in temp['Fold4']['GCS']:
    if float(i)<=8:
        GCS_list4.append(1)
    elif float(i)>8:
        GCS_list4.append(0)
PaO2/ FiO2 Level

Reason for choosing PaO2/ FiO2 Level: The level of PaO2/ FiO2 level is used to compare between the oxygen level in the blood and the oxygen concentration that is breathed It is associated with the chance of mortalty.

There are a total of 3 classfications.

Mild: 200-300. Mortality rate of Mild is 27%.

Moderate: 100-200. Mortality rate of Moderate is 32%.

Severe: <100. Mortality rate of Severe is 45%

  • Villar, J., Blanco, J., Campo, R. d., Andaluz-Ojeda, D., Díaz-Domínguez, F. J., Muriel, A., et al. (2015). Assessment of PaO2/FiO2 for stratification of patients with moderate and severe acute respiratory distress syndrome. Retrieved from https://bmjopen.bmj.com/content/bmjopen/5/3/e006812.full.pdf

NaN values for PaO2 are replace with 80 and NaN values for FiO2 are replaced with 0.5

Patients who are on mild level are categorized as 1, on moderate level as 2 and on severe level at 3.

In [80]:
#Replace all NaN in PaO2 with 80
temp['Fold1']['PaO2'].fillna('80', inplace=True)
temp['Fold2']['PaO2'].fillna('80', inplace=True)
temp['Fold3']['PaO2'].fillna('80', inplace=True)
temp['Fold4']['PaO2'].fillna('80', inplace=True)

#Replace all NaN in FiO2 with 0.5
temp['Fold1']['FiO2'].fillna('0.5', inplace=True)
temp['Fold2']['FiO2'].fillna('0.5', inplace=True)
temp['Fold3']['FiO2'].fillna('0.5', inplace=True)
temp['Fold4']['FiO2'].fillna('0.5', inplace=True)

#To find the PaO2/FiO2 level
PaO2_FiO2_ratio1 = (temp['Fold1']['PaO2'].astype(float))/ (temp['Fold1']['FiO2'].astype(float))
PaO2_FiO2_ratio2 = (temp['Fold2']['PaO2'].astype(float))/ (temp['Fold2']['FiO2'].astype(float))
PaO2_FiO2_ratio3 = (temp['Fold3']['PaO2'].astype(float))/ (temp['Fold3']['FiO2'].astype(float))
PaO2_FiO2_ratio4 = (temp['Fold4']['PaO2'].astype(float))/ (temp['Fold4']['FiO2'].astype(float))


#1: mild, 2: Moderate, 3: Severe
PaO2_FiO2_ratio_list1=[]
for i in PaO2_FiO2_ratio1:
    if float(i)<100:
        PaO2_FiO2_ratio_list1.append(3)
    elif float(i)<=200:
        PaO2_FiO2_ratio_list1.append(2)
    elif float(i)<=300 or float(i)>300:
        PaO2_FiO2_ratio_list1.append(1)
        
PaO2_FiO2_ratio_list2=[]
for i in PaO2_FiO2_ratio2:
    if float(i)<100:
        PaO2_FiO2_ratio_list2.append(3)
    elif float(i)<=200:
        PaO2_FiO2_ratio_list2.append(2)
    elif float(i)<=300 or float(i)>300:
        PaO2_FiO2_ratio_list2.append(1)

PaO2_FiO2_ratio_list3=[]
for i in PaO2_FiO2_ratio3:
    if float(i)<100:
        PaO2_FiO2_ratio_list3.append(3)
    elif float(i)<=200:
        PaO2_FiO2_ratio_list3.append(2)
    elif float(i)<=300 or float(i)>300:
        PaO2_FiO2_ratio_list3.append(1)        
        
PaO2_FiO2_ratio_list4=[]
for i in PaO2_FiO2_ratio4:
    if float(i)<100:
        PaO2_FiO2_ratio_list4.append(3)
    elif float(i)<=200:
        PaO2_FiO2_ratio_list4.append(2)
    elif float(i)<=300 or float(i)>300:
        PaO2_FiO2_ratio_list4.append(1)
HR (Heart Rate)

Reason for choosing HR: There is an association of heart rate with mortality.

The normal range is: 50 -100 bpm

  • Jensen MT, Suadicani P, Hein HO, et al Elevated resting heart rate, physical fitness and all-cause mortality: a 16-year follow-up in the Copenhagen Male Study Heart 2013;99:882-887.

NaN values are replace with 80.

Patients who are on normal HR are categorized as 0 and those who are not are categories as 1.

In [81]:
#Fold 1
temp['Fold1']['HR'].fillna('80', inplace=True)
HR_list1=[]
for i in temp['Fold1']['Temp']:
    if float(i)>=50 and float(i)<=100:
        HR_list1.append(0)
    elif float(i) <50 or float(i)>100:
        HR_list1.append(1)

#Fold 2
temp['Fold2']['HR'].fillna('80', inplace=True)
HR_list2=[]
for i in temp['Fold2']['Temp']:
    if float(i)>=50 and float(i)<=100:
        HR_list2.append(0)
    elif float(i) <50 or float(i)>100:
        HR_list2.append(1)
        
#Fold 3
temp['Fold3']['HR'].fillna('80', inplace=True)
HR_list3=[]
for i in temp['Fold3']['Temp']:
    if float(i)>=50 and float(i)<=100:
        HR_list3.append(0)
    elif float(i) <50 or float(i)>100:
        HR_list3.append(1)

#Fold 4        
temp['Fold4']['HR'].fillna('80', inplace=True)
HR_list4=[]
for i in temp['Fold4']['Temp']:
    if float(i)>=50 and float(i)<=100:
        HR_list4.append(0)
    elif float(i) <50 or float(i)>100:
        HR_list4.append(1)
MAP (Invasive mean arterial blood pressure (mmHg))

Reason for choosing MAP: Normally range is between 65 and 110. Below normal range is bad, where vital organs will not get enough oxygen perfusion and will become hypoxic

MAP is a better factor in determining a patient's mortality as compared to systolic blood pressure

  • Lehman, Li-wei & Saeed, Mohammed & Talmor, Daniel & Mark, Roger & Malhotra, Atul. (2013). Methods of Blood Pressure Measurement in the ICU. Critical care medicine. 41. 34-40. 10.1097/CCM.0b013e318265ea46.

On top of that, MAP range is associated with mortality

  • Ascha, E. J., Yang, D. M., Weiss, S. M., & Sessler, D. I. (2015, July). Anesthesiology. Retrieved from Intraoperative Mean Arterial Pressure Variability and 30-day Mortality in Patients Having Noncardiac Surgery: doi: 10.1097/ALN.0000000000000686

NaN values are replace with 80.

Patients who are on normal MAP range are categorized as 0 and those who are not are categories as 1.

In [82]:
MAP_list1=[]
temp['Fold1']['MAP'].fillna('80', inplace=True)
for i in temp['Fold1']['MAP']:
    if float(i)>=65 and float(i)<=110:
        MAP_list1.append(0)
    else:
        MAP_list1.append(1) 

MAP_list2=[]
temp['Fold2']['MAP'].fillna('80', inplace=True)
for i in temp['Fold2']['MAP']:
    if float(i)>=65 and float(i)<=110:
        MAP_list2.append(0)
    else:
        MAP_list2.append(1) 
        
MAP_list3=[]
temp['Fold3']['MAP'].fillna('80', inplace=True)
for i in temp['Fold3']['MAP']:
    if float(i)>=65 and float(i)<=110:
        MAP_list3.append(0)
    else:
        MAP_list3.append(1) 
        
MAP_list4=[]
temp['Fold4']['MAP'].fillna('80', inplace=True)
for i in temp['Fold4']['MAP']:
    if float(i)>=65 and float(i)<=110:
        MAP_list4.append(0)
    else:
        MAP_list4.append(1) 

Now, collated a dataframe for each of the folds which contains the 6 variables above. Then, combine the dataframes and form design matrix 2.

In [83]:
vital_sign_list1 =  list(zip(list(all_static_dfs_folds['Fold1']['RecordID']), mechvent_list_fold1, temp_list1, GCS_list1, PaO2_FiO2_ratio_list1, HR_list1, MAP_list1))        
vital_sign_list2 =  list(zip(list(all_static_dfs_folds['Fold2']['RecordID']), mechvent_list_fold2, temp_list2, GCS_list2, PaO2_FiO2_ratio_list2, HR_list2, MAP_list2))        
vital_sign_list3 =  list(zip(list(all_static_dfs_folds['Fold3']['RecordID']), mechvent_list_fold3, temp_list3, GCS_list3, PaO2_FiO2_ratio_list3, HR_list3, MAP_list3))        
vital_sign_list4 =  list(zip(list(all_static_dfs_folds['Fold4']['RecordID']), mechvent_list_fold4, temp_list4, GCS_list4, PaO2_FiO2_ratio_list4, HR_list4, MAP_list4))        
        
vital_sign_list1_df = pd.DataFrame(vital_sign_list1, columns = ['RecordID', 'MechVent', 'Temp', 'GCS', 'PaO2_FiO2_ratio', 'HR', 'MAP']) 
vital_sign_list2_df = pd.DataFrame(vital_sign_list2, columns = ['RecordID', 'MechVent', 'Temp', 'GCS', 'PaO2_FiO2_ratio','HR', 'MAP']) 
vital_sign_list3_df = pd.DataFrame(vital_sign_list3, columns = ['RecordID', 'MechVent', 'Temp', 'GCS', 'PaO2_FiO2_ratio', 'HR', 'MAP']) 
vital_sign_list4_df = pd.DataFrame(vital_sign_list4, columns = ['RecordID', 'MechVent', 'Temp', 'GCS', 'PaO2_FiO2_ratio', 'HR', 'MAP']) 
  
design_matrix_2 = {}
design_matrix_2['Fold1'] = vital_sign_list1_df
design_matrix_2['Fold2'] = vital_sign_list2_df
design_matrix_2['Fold3'] = vital_sign_list3_df
design_matrix_2['Fold4'] = vital_sign_list4_df

design_matrix_2
Out[83]:
{'Fold1':      RecordID  MechVent  Temp  GCS  PaO2_FiO2_ratio  HR  MAP
 0    132539.0         0     3    0                2   1    0
 1    132540.0         1     2    0                1   1    0
 2    132541.0         1     2    1                1   1    0
 3    132543.0         0     2    0                2   1    0
 4    132545.0         0     2    0                2   1    0
 5    132547.0         1     2    1                1   1    0
 6    132548.0         0     2    0                2   1    0
 7    132551.0         1     2    0                1   1    1
 8    132554.0         0     2    0                2   1    0
 9    132555.0         1     2    0                1   1    0
 10   132556.0         0     2    0                2   1    0
 11   132567.0         1     2    0                1   1    0
 12   132568.0         0     2    0                2   1    0
 13   132570.0         0     3    0                2   1    0
 14   132573.0         0     2    0                2   1    0
 15   132575.0         1     2    0                2   1    0
 16   132577.0         0     3    0                2   1    0
 17   132582.0         0     2    0                2   1    0
 18   132584.0         1     2    0                2   1    0
 19   132585.0         1     3    0                1   1    1
 20   132588.0         0     2    0                2   1    0
 21   132590.0         1     2    0                1   1    0
 22   132591.0         0     2    0                1   1    0
 23   132592.0         0     2    0                2   1    0
 24   132595.0         0     2    0                1   1    0
 25   132597.0         0     2    0                2   1    0
 26   132598.0         1     2    1                1   1    0
 27   132599.0         1     2    0                1   1    0
 28   132601.0         1     2    0                1   1    0
 29   132602.0         1     2    0                3   1    0
 ..        ...       ...   ...  ...              ...  ..  ...
 970  134999.0         1     4    0                1   1    0
 971  135002.0         1     2    1                2   1    0
 972  135004.0         0     2    0                2   1    0
 973  135006.0         1     2    1                1   1    0
 974  135007.0         1     2    0                2   1    0
 975  135009.0         1     2    0                1   1    0
 976  135011.0         1     2    0                2   1    0
 977  135013.0         1     2    0                2   1    0
 978  135014.0         1     2    0                1   1    0
 979  135015.0         0     2    0                2   1    0
 980  135020.0         0     2    0                2   1    0
 981  135021.0         1     2    1                2   1    0
 982  135027.0         1     3    1                1   1    0
 983  135028.0         1     3    1                1   1    0
 984  135031.0         1     2    0                2   1    0
 985  135036.0         1     3    0                1   1    1
 986  135044.0         0     2    0                2   1    0
 987  135048.0         1     3    1                2   1    0
 988  135049.0         1     3    0                1   1    0
 989  135051.0         1     2    0                1   1    0
 990  135052.0         0     2    0                1   1    0
 991  135056.0         0     2    0                2   1    0
 992  135057.0         1     2    0                2   1    0
 993  135059.0         0     2    0                2   1    0
 994  135065.0         1     2    0                2   1    0
 995  135067.0         0     2    0                2   1    0
 996  135069.0         1     2    0                2   1    1
 997  135071.0         0     2    0                3   1    0
 998  135072.0         1     2    0                1   1    0
 999  135075.0         1     2    0                1   1    0
 
 [1000 rows x 7 columns],
 'Fold2':      RecordID  MechVent  Temp  GCS  PaO2_FiO2_ratio  HR  MAP
 0    135076.0         1     3    0                3   1    0
 1    135077.0         1     3    0                2   1    0
 2    135079.0         1     2    0                2   1    0
 3    135080.0         1     2    0                2   1    0
 4    135081.0         1     2    0                1   1    0
 5    135083.0         0     2    0                2   1    0
 6    135084.0         0     2    0                2   1    0
 7    135086.0         1     2    0                1   1    0
 8    135087.0         1     4    0                1   1    0
 9    135088.0         1     4    0                2   1    0
 10   135089.0         0     2    0                2   1    0
 11   135092.0         0     2    0                1   1    0
 12   135093.0         1     2    1                2   1    0
 13   135098.0         1     2    0                1   1    0
 14   135102.0         1     2    0                2   1    0
 15   135103.0         1     2    0                2   1    0
 16   135104.0         0     2    0                2   1    0
 17   135105.0         0     2    0                3   1    0
 18   135107.0         0     2    0                2   1    0
 19   135110.0         1     2    0                1   1    0
 20   135111.0         0     2    0                1   1    0
 21   135115.0         1     2    0                1   1    0
 22   135116.0         0     2    0                2   1    0
 23   135127.0         1     2    0                2   1    0
 24   135129.0         1     2    0                1   1    0
 25   135130.0         1     3    1                2   1    0
 26   135135.0         0     2    0                2   1    0
 27   135141.0         0     2    0                2   1    0
 28   135142.0         1     2    1                1   1    1
 29   135145.0         0     2    0                1   1    0
 ..        ...       ...   ...  ...              ...  ..  ...
 970  137537.0         0     2    0                3   1    0
 971  137538.0         1     2    0                1   1    0
 972  137542.0         1     2    1                1   1    0
 973  137545.0         1     2    0                2   1    0
 974  137548.0         1     2    0                1   1    1
 975  137549.0         0     2    1                2   1    0
 976  137552.0         0     3    0                2   1    0
 977  137556.0         0     2    0                2   1    0
 978  137562.0         0     2    0                2   1    1
 979  137563.0         1     2    1                2   1    0
 980  137564.0         1     2    1                1   1    0
 981  137567.0         1     2    0                2   1    0
 982  137568.0         0     2    0                2   1    0
 983  137569.0         1     3    1                2   1    0
 984  137570.0         1     2    0                1   1    0
 985  137573.0         1     2    0                2   1    0
 986  137576.0         1     2    1                1   1    0
 987  137577.0         1     2    1                2   1    1
 988  137578.0         1     2    0                2   1    0
 989  137579.0         0     2    0                2   1    0
 990  137580.0         1     2    0                1   1    1
 991  137581.0         0     2    0                2   1    0
 992  137583.0         0     2    0                2   1    0
 993  137584.0         1     4    0                2   1    0
 994  137586.0         0     2    0                2   1    0
 995  137587.0         1     3    0                1   1    0
 996  137588.0         0     2    0                2   1    0
 997  137589.0         1     2    0                2   1    0
 998  137590.0         0     2    0                1   1    0
 999  137592.0         0     2    0                2   1    0
 
 [1000 rows x 7 columns],
 'Fold3':      RecordID  MechVent  Temp  GCS  PaO2_FiO2_ratio  HR  MAP
 0    137593.0         1     2    0                2   1    0
 1    137594.0         0     2    0                2   1    0
 2    137595.0         1     3    0                2   1    0
 3    137598.0         1     2    0                1   1    0
 4    137600.0         1     2    0                2   1    0
 5    137602.0         1     3    0                1   1    0
 6    137604.0         1     2    0                1   1    0
 7    137606.0         0     2    0                2   1    0
 8    137609.0         0     2    0                2   1    0
 9    137619.0         1     4    1                1   1    0
 10   137624.0         0     2    0                2   1    0
 11   137626.0         1     2    0                2   1    0
 12   137627.0         1     2    0                1   1    0
 13   137628.0         1     2    0                1   1    0
 14   137630.0         0     2    0                1   1    0
 15   137631.0         0     2    0                2   1    0
 16   137633.0         0     3    0                1   1    0
 17   137635.0         1     3    0                2   1    0
 18   137636.0         1     2    0                2   1    0
 19   137637.0         1     2    0                2   1    0
 20   137638.0         0     2    0                2   1    0
 21   137639.0         1     2    0                2   1    0
 22   137640.0         1     2    0                1   1    1
 23   137642.0         1     2    1                2   1    1
 24   137643.0         0     2    0                2   1    0
 25   137648.0         1     3    0                1   1    1
 26   137649.0         1     2    1                1   1    0
 27   137656.0         1     2    0                2   1    1
 28   137657.0         0     3    0                1   1    0
 29   137658.0         1     2    0                2   1    0
 ..        ...       ...   ...  ...              ...  ..  ...
 970  140033.0         1     2    0                1   1    1
 971  140034.0         0     3    0                1   1    0
 972  140035.0         1     2    0                1   1    0
 973  140037.0         0     3    0                2   1    0
 974  140038.0         1     2    0                1   1    0
 975  140041.0         0     2    0                1   1    1
 976  140048.0         1     3    0                3   1    0
 977  140049.0         1     2    0                1   1    0
 978  140050.0         0     2    0                1   1    0
 979  140054.0         0     2    0                2   1    0
 980  140055.0         1     2    0                2   1    0
 981  140060.0         0     2    0                2   1    0
 982  140063.0         0     2    0                2   1    0
 983  140065.0         0     2    0                2   1    0
 984  140068.0         1     3    1                1   1    0
 985  140070.0         1     3    0                1   1    0
 986  140071.0         0     2    0                2   1    0
 987  140072.0         1     3    0                1   1    1
 988  140073.0         1     2    1                1   1    0
 989  140074.0         0     2    0                2   1    1
 990  140077.0         1     2    0                2   1    0
 991  140080.0         1     2    0                2   1    0
 992  140085.0         1     2    0                2   1    0
 993  140086.0         0     2    0                2   1    0
 994  140088.0         1     3    0                2   1    0
 995  140091.0         1     2    0                1   1    0
 996  140095.0         1     2    0                2   1    0
 997  140097.0         1     2    0                2   1    0
 998  140099.0         1     2    0                1   1    1
 999  140100.0         1     2    0                1   1    0
 
 [1000 rows x 7 columns],
 'Fold4':      RecordID  MechVent  Temp  GCS  PaO2_FiO2_ratio  HR  MAP
 0    140101.0         1     2    0                1   1    0
 1    140102.0         1     2    0                2   1    0
 2    140104.0         1     3    0                1   1    0
 3    140106.0         1     2    0                2   1    0
 4    140107.0         1     2    0                1   1    0
 5    140112.0         0     3    0                1   1    0
 6    140115.0         1     2    0                2   1    0
 7    140116.0         1     2    0                2   1    0
 8    140117.0         0     2    0                1   1    0
 9    140118.0         1     4    0                1   1    1
 10   140124.0         0     2    0                1   1    0
 11   140133.0         0     2    0                2   1    0
 12   140139.0         1     2    1                1   1    0
 13   140142.0         1     2    0                2   1    0
 14   140149.0         1     2    1                2   1    0
 15   140152.0         1     2    0                1   1    0
 16   140155.0         1     3    0                2   1    0
 17   140156.0         0     3    0                1   1    0
 18   140161.0         1     2    1                1   1    0
 19   140165.0         1     2    1                1   1    0
 20   140166.0         1     2    0                2   1    0
 21   140170.0         1     2    0                2   1    0
 22   140171.0         1     2    1                1   1    0
 23   140175.0         0     2    0                2   1    0
 24   140183.0         1     2    0                1   1    0
 25   140185.0         1     3    0                1   1    0
 26   140190.0         0     2    0                2   1    0
 27   140192.0         0     2    0                2   1    1
 28   140193.0         0     3    0                2   1    0
 29   140194.0         1     2    0                1   1    0
 ..        ...       ...   ...  ...              ...  ..  ...
 970  142591.0         1     2    0                1   1    0
 971  142595.0         0     3    0                2   1    0
 972  142601.0         1     2    0                1   1    0
 973  142603.0         1     3    1                1   1    1
 974  142607.0         0     2    0                2   1    0
 975  142609.0         1     2    1                2   1    0
 976  142612.0         1     3    0                2   1    0
 977  142618.0         1     2    1                1   1    0
 978  142621.0         0     2    0                2   1    0
 979  142626.0         0     2    1                3   1    0
 980  142634.0         0     2    0                2   1    0
 981  142635.0         1     2    1                1   1    0
 982  142637.0         1     2    0                2   1    1
 983  142638.0         0     2    0                2   1    0
 984  142640.0         0     3    0                2   1    0
 985  142641.0         0     3    0                2   1    0
 986  142646.0         0     2    0                2   1    0
 987  142649.0         0     3    0                2   1    0
 988  142653.0         1     2    0                2   1    0
 989  142654.0         0     2    0                2   1    0
 990  142655.0         1     2    0                2   1    0
 991  142659.0         1     2    0                3   1    0
 992  142661.0         0     2    0                2   1    0
 993  142662.0         0     2    0                2   1    0
 994  142664.0         1     3    0                1   1    0
 995  142665.0         1     2    0                1   1    0
 996  142667.0         0     2    0                2   1    0
 997  142670.0         1     2    1                1   1    0
 998  142671.0         1     2    1                1   1    0
 999  142673.0         1     2    0                1   1    0
 
 [1000 rows x 7 columns]}

Design Matrix 3

Objective: To use the most recent data as a whole to see how feature performs in the model and iterate to improve performance

all_temporal_dfs_folds__most_recent is used for design matrix. Those are the features that are within the 48 hours and the features can be further selected by forward stepwise regression model in refine after that.

In [84]:
all_temporal_dfs_folds__most_recent = getAllTemporalDataFrameByAggregationTypeInFolds(cv_fold, all_patients, "most_recent")
all_temporal_dfs_folds__most_recent
Fold1 has started extracting temporal data
Fold1 has completed

Fold2 has started extracting temporal data
Fold2 has completed

Fold3 has started extracting temporal data
Fold3 has completed

Fold4 has started extracting temporal data
Fold4 has completed

4 folds of patients' Temporals data has been extracted with aggregator most_recent
Out[84]:
{'Fold1':       Age   BUN  Creatinine   GCS  Gender  Glucose  HCO3   HCT     HR  Height  \
 0    54.0   8.0         0.7  15.0     0.0    115.0  28.0  30.3   86.0    -1.0   
 1    76.0  21.0         1.3  15.0     1.0    146.0  24.0  29.4   65.0   175.3   
 2    44.0   3.0         0.3   5.0     0.0    143.0  25.0  29.4   71.0    -1.0   
 3    68.0  10.0         0.7  15.0     1.0    117.0  28.0  36.3   79.0   180.3   
 4    88.0  25.0         1.0  15.0     0.0     92.0  20.0  30.9   68.0    -1.0   
 5    64.0  16.0         0.7   8.0     1.0    153.0  21.0  35.5   92.0   180.3   
 6    68.0  36.0         4.1  15.0     0.0    115.0  26.0  30.0   60.0   162.6   
 7    78.0  58.0         0.6   9.0     0.0    116.0  12.0  33.0   58.0   162.6   
 8    64.0  23.0         0.7  15.0     0.0    112.0  25.0  28.3  122.0    -1.0   
 9    74.0  22.0         1.3  15.0     1.0    114.0  26.0  28.4   78.0   175.3   
 10   64.0  55.0         1.2  15.0     0.0     81.0  18.0  28.7   91.0    -1.0   
 11   71.0   9.0         0.6  15.0     0.0    138.0  27.0  27.4   95.0   157.5   
 12   66.0  16.0         1.3  15.0     0.0    110.0  25.0  30.0   93.0   157.5   
 13   84.0  89.0         3.3  15.0     1.0    167.0  31.0  28.0   73.0   170.2   
 14   77.0  40.0         1.1  15.0     1.0    151.0  31.0  31.8   68.0   162.6   
 15   78.0  18.0         1.1  15.0     1.0    148.0  22.0  27.7  106.0   167.6   
 16   65.0  47.0         2.4  15.0     1.0    110.0  20.0  30.2   88.0    -1.0   
 17   84.0  32.0         1.1  15.0     1.0    182.0  27.0  29.7   83.0   182.9   
 18   78.0  24.0         1.4  11.0     0.0    137.0  17.0  33.8   73.0    -1.0   
 19   40.0   7.0         0.5  15.0     0.0     96.0  28.0  21.5   92.0   165.1   
 20   48.0   5.0         2.2  15.0     0.0    110.0  23.0  23.6   78.0   154.9   
 21   58.0  13.0         0.6  15.0     1.0     91.0  27.0  24.9   88.0   188.0   
 22   81.0  32.0         1.2  15.0     1.0    129.0  22.0  28.4   61.0    -1.0   
 23   35.0  35.0         1.4  15.0     0.0     68.0  17.0  25.3   82.0    -1.0   
 24   26.0   8.0         0.6   NaN     0.0     95.0  24.0  26.9    NaN    -1.0   
 25   66.0  20.0         4.7  15.0     0.0    104.0  21.0  31.5   65.0   137.2   
 26   80.0  22.0         0.7   8.0     0.0    129.0  21.0  32.8   72.0    -1.0   
 27   53.0  12.0         0.5  14.0     0.0     94.0  23.0  23.2   94.0   177.8   
 28   74.0  21.0         1.4  15.0     1.0    170.0  26.0  26.6   95.0   177.8   
 29   80.0  29.0         1.3  15.0     1.0    106.0  29.0  39.9   85.0   180.3   
 ..    ...   ...         ...   ...     ...      ...   ...   ...    ...     ...   
 970  59.0  22.0         1.8  10.0     1.0    143.0  23.0  31.1  121.0   167.6   
 971  80.0  12.0         0.8   3.0     0.0    243.0  22.0  24.5   58.0    -1.0   
 972  81.0  20.0         1.0  15.0     1.0    108.0  28.0  39.5   89.0   180.3   
 973  43.0  12.0         0.9   7.0     1.0     95.0  22.0  39.4  119.0   177.8   
 974  69.0  21.0         0.9  15.0     0.0    117.0  25.0  29.1   65.0   157.5   
 975  84.0  24.0         1.0  11.0     0.0    142.0  27.0  32.8  100.0    -1.0   
 976  60.0  33.0         5.2  10.0     1.0    123.0  26.0  31.2   89.0    -1.0   
 977  82.0  19.0         0.8  15.0     0.0    135.0  25.0  25.7   89.0   152.4   
 978  83.0  14.0         0.6  12.0     1.0    110.0  22.0  29.1  110.0   170.2   
 979  80.0  31.0         1.1  15.0     1.0     89.0  21.0  34.6   72.0    -1.0   
 980  84.0  12.0         0.6  15.0     0.0     79.0  28.0  32.6   62.0    -1.0   
 981  71.0   NaN         NaN   7.0     1.0      NaN   NaN   NaN   69.0   180.3   
 982  89.0  19.0         0.6   6.0     0.0    147.0  26.0  34.2   89.0    -1.0   
 983  65.0  16.0         0.7   8.0     1.0     24.0  22.0  32.7   90.0   180.3   
 984  69.0   6.0         0.7   9.0     1.0    118.0  22.0  30.7   75.0   172.7   
 985  50.0   4.0         0.5  15.0     0.0     73.0  20.0  24.7   96.0   162.6   
 986  82.0  61.0         1.8  15.0     0.0    193.0  14.0  35.5   71.0    -1.0   
 987  59.0  30.0         1.0   3.0     0.0    185.0  32.0  28.2  132.0    -1.0   
 988  19.0   8.0         0.8  15.0     1.0     96.0  26.0  38.8   83.0    -1.0   
 989  79.0   8.0         0.5   9.0     0.0    115.0  23.0  27.8   82.0   152.4   
 990  84.0  19.0         0.7  14.0     0.0     77.0  23.0  33.9  118.0    -1.0   
 991  66.0   NaN         NaN  15.0     0.0      NaN   NaN   NaN   85.0    -1.0   
 992  84.0   NaN         NaN   9.0     0.0      NaN   NaN   NaN   76.0    -1.0   
 993  90.0  29.0         1.1   NaN     0.0    184.0  21.0  30.4    NaN    -1.0   
 994  71.0  18.0         0.6  15.0     0.0    118.0  22.0  36.2   77.0   160.0   
 995  35.0  33.0         1.1  11.0     0.0     82.0  19.0  29.6   56.0    -1.0   
 996  73.0  21.0         0.7  15.0     1.0    127.0  24.0  39.2   88.0    -1.0   
 997  81.0  59.0         2.1  13.0     1.0    123.0  26.0  26.0   67.0    -1.0   
 998  63.0  25.0         1.4  15.0     1.0    229.0  26.0  25.7  102.0   172.7   
 999  82.0  17.0         0.6  15.0     0.0    139.0  27.0  29.8   80.0   162.6   
 
      ...    pH    ALP     ALT     AST  Albumin  Bilirubin  Lactate  \
 0    ...   NaN    NaN     NaN     NaN      NaN        NaN      NaN   
 1    ...  7.37    NaN     NaN     NaN      NaN        NaN      NaN   
 2    ...  7.47  105.0    75.0   164.0      2.3        2.8      0.9   
 3    ...   NaN  105.0    12.0    15.0      4.4        0.2      NaN   
 4    ...   NaN    NaN     NaN     NaN      3.3        NaN      NaN   
 5    ...  7.46  101.0    60.0   162.0      NaN        0.4      NaN   
 6    ...   NaN    NaN     NaN     NaN      NaN        NaN      NaN   
 7    ...  7.37   47.0    46.0    82.0      1.9        0.3      1.8   
 8    ...   NaN    NaN     NaN     NaN      NaN        NaN      NaN   
 9    ...  7.38    NaN     NaN     NaN      NaN        NaN      NaN   
 10   ...   NaN  402.0    36.0    47.0      2.7        0.1      5.9   
 11   ...  7.41    NaN     NaN     NaN      NaN        NaN      NaN   
 12   ...   NaN    NaN     NaN     NaN      NaN        NaN      NaN   
 13   ...   NaN   19.0    15.0    20.0      NaN        0.1      NaN   
 14   ...   NaN    NaN     NaN    57.0      2.9        NaN      NaN   
 15   ...  7.49    NaN     NaN     NaN      NaN        NaN      1.5   
 16   ...  7.36    NaN     NaN     NaN      NaN        NaN      NaN   
 17   ...   NaN    NaN     NaN     NaN      2.6        NaN      NaN   
 18   ...  7.39   51.0    10.0    20.0      2.5        1.6      0.8   
 19   ...  7.44    NaN     NaN     NaN      NaN        NaN      NaN   
 20   ...   NaN  173.0    63.0   152.0      2.0        8.0      NaN   
 21   ...  7.44    NaN     NaN     NaN      NaN        NaN      4.0   
 22   ...   NaN    NaN     NaN     NaN      NaN        NaN      1.6   
 23   ...   NaN    NaN     NaN     NaN      NaN        NaN      NaN   
 24   ...  7.37    NaN     NaN     NaN      NaN        NaN      0.7   
 25   ...   NaN    NaN     NaN     NaN      NaN        NaN      NaN   
 26   ...  7.48    NaN     NaN     NaN      NaN        NaN      NaN   
 27   ...  7.45  112.0    13.0    20.0      2.0        2.0      1.3   
 28   ...  7.42    NaN     NaN     NaN      NaN        NaN      NaN   
 29   ...  7.53    NaN     NaN     NaN      NaN        NaN      NaN   
 ..   ...   ...    ...     ...     ...      ...        ...      ...   
 970  ...  7.34   65.0    19.0    18.0      3.6        0.5      NaN   
 971  ...   NaN    NaN     NaN     NaN      NaN        NaN      NaN   
 972  ...  7.49   71.0    29.0    27.0      3.5        1.1      NaN   
 973  ...  7.43   64.0    78.0   280.0      3.6        0.7      NaN   
 974  ...  7.42   47.0    23.0    41.0      2.8        0.6      1.1   
 975  ...  7.45    NaN     NaN     NaN      3.9        NaN      1.4   
 976  ...  7.47  113.0    17.0    45.0      3.6        0.3      1.3   
 977  ...  7.37    NaN     NaN     NaN      NaN        NaN      3.9   
 978  ...  7.47    NaN     NaN     NaN      NaN        NaN      1.8   
 979  ...   NaN    NaN     NaN     NaN      NaN        NaN      NaN   
 980  ...  7.35  108.0   116.0    48.0      2.7        0.5      0.9   
 981  ...   NaN    NaN     NaN     NaN      NaN        NaN      NaN   
 982  ...  7.39    NaN     NaN     NaN      3.6        NaN      2.0   
 983  ...  7.35   65.0    20.0    30.0      NaN        0.3      NaN   
 984  ...  7.41  188.0    21.0    24.0      2.4        0.5      NaN   
 985  ...  7.48    NaN     NaN     NaN      NaN        NaN      2.1   
 986  ...  7.21    NaN     NaN     NaN      NaN        NaN      1.6   
 987  ...  7.32   42.0    31.0    26.0      2.8        0.6      1.6   
 988  ...  7.39    NaN     NaN     NaN      NaN        NaN      0.8   
 989  ...  7.46   48.0    37.0    68.0      3.0        0.7      1.0   
 990  ...   NaN  110.0    65.0    57.0      3.2        0.5      1.5   
 991  ...   NaN    NaN     NaN     NaN      NaN        NaN      NaN   
 992  ...  7.40    NaN     NaN     NaN      NaN        NaN      NaN   
 993  ...   NaN    NaN     NaN     NaN      NaN        NaN      NaN   
 994  ...  7.35    NaN     NaN     NaN      NaN        NaN      NaN   
 995  ...   NaN   61.0  1343.0  1785.0      3.0        0.8      1.2   
 996  ...  7.35    NaN     NaN     NaN      NaN        NaN      1.4   
 997  ...  7.38    NaN     NaN     NaN      NaN        NaN      2.7   
 998  ...  7.43    NaN     NaN     NaN      NaN        NaN      NaN   
 999  ...  7.37    NaN     NaN     NaN      NaN        NaN      NaN   
 
      Cholesterol  TroponinI  TroponinT  
 0            NaN        NaN        NaN  
 1            NaN        NaN        NaN  
 2            NaN        NaN        NaN  
 3            NaN        NaN        NaN  
 4            NaN        NaN        NaN  
 5          212.0        1.3        NaN  
 6            NaN        0.8        NaN  
 7            NaN        3.1        NaN  
 8            NaN        NaN        NaN  
 9            NaN        NaN        NaN  
 10           NaN        NaN        NaN  
 11           NaN        NaN        NaN  
 12           NaN        NaN        NaN  
 13           NaN        NaN        NaN  
 14           NaN        6.6        NaN  
 15           NaN        NaN        NaN  
 16           NaN        NaN        NaN  
 17           NaN        NaN        NaN  
 18          84.0        NaN       0.31  
 19           NaN        NaN        NaN  
 20           NaN        NaN        NaN  
 21           NaN        NaN        NaN  
 22           NaN        NaN       0.13  
 23           NaN        NaN       0.37  
 24           NaN        NaN        NaN  
 25           NaN        1.8        NaN  
 26           NaN        NaN        NaN  
 27           NaN        NaN       0.02  
 28           NaN        NaN        NaN  
 29           NaN        NaN        NaN  
 ..           ...        ...        ...  
 970          NaN        NaN        NaN  
 971          NaN        NaN       0.08  
 972        190.0        NaN        NaN  
 973        150.0        NaN        NaN  
 974          NaN        NaN       0.59  
 975          NaN        NaN        NaN  
 976          NaN        NaN       4.77  
 977          NaN        NaN        NaN  
 978          NaN        NaN       0.05  
 979          NaN        NaN       0.02  
 980          NaN        1.2        NaN  
 981          NaN        NaN        NaN  
 982          NaN        NaN        NaN  
 983          NaN        NaN       1.38  
 984         96.0        4.6        NaN  
 985          NaN        NaN        NaN  
 986          NaN        NaN        NaN  
 987          NaN        NaN        NaN  
 988          NaN        NaN        NaN  
 989          NaN        NaN        NaN  
 990          NaN        NaN       0.29  
 991          NaN        NaN        NaN  
 992          NaN        NaN        NaN  
 993          NaN        NaN        NaN  
 994          NaN        NaN        NaN  
 995          NaN        NaN       0.39  
 996          NaN        NaN        NaN  
 997          NaN        NaN        NaN  
 998          NaN        NaN        NaN  
 999          NaN        NaN        NaN  
 
 [1000 rows x 42 columns],
 'Fold2':        ALP     ALT     AST   Age    BUN  Bilirubin  Creatinine  DiasABP  FiO2  \
 0     66.0    26.0    77.0  56.0   17.0        1.7         0.7     80.0  1.00   
 1      NaN     NaN     NaN  72.0   39.0        NaN         5.0     69.0  0.50   
 2      NaN     NaN     NaN  68.0   49.0        NaN         3.7     70.0  0.40   
 3      NaN     NaN     NaN  77.0   13.0        NaN         1.1     59.0  0.50   
 4      NaN     NaN     NaN  66.0   12.0        NaN         0.8     47.0   NaN   
 5     31.0   122.0    95.0  35.0   13.0        0.3         0.6      NaN   NaN   
 6      NaN     NaN     NaN  79.0   28.0        NaN         1.0      NaN   NaN   
 7     46.0    67.0   123.0  44.0   34.0        1.0         0.9     52.0  0.50   
 8      NaN     NaN     NaN  21.0   16.0        NaN         1.0     62.0  0.40   
 9     81.0  3633.0  4146.0  71.0   25.0        0.9         2.5     59.0  0.60   
 10    45.0    31.0    85.0  90.0   38.0        0.3         1.3     62.0   NaN   
 11     NaN     NaN     NaN  53.0   23.0        NaN         1.1      NaN  0.35   
 12     NaN     NaN     NaN  70.0   34.0        0.9         1.4      0.0  0.50   
 13     NaN     NaN     NaN  70.0   10.0        NaN         0.7     53.0  0.40   
 14     NaN     NaN     NaN  47.0   17.0        0.5         0.6      NaN  0.40   
 15     NaN     NaN     NaN  47.0   24.0        NaN         1.3     54.0  0.50   
 16     NaN     NaN     NaN  57.0   14.0        NaN         0.5      NaN   NaN   
 17   103.0    28.0    34.0  88.0   76.0        0.7         4.2     54.0  0.95   
 18     NaN    95.0    90.0  90.0   24.0        0.3         0.8      NaN   NaN   
 19    97.0     6.0     7.0  68.0   41.0        0.2         1.5      NaN  0.30   
 20     NaN     NaN     NaN  51.0   22.0        NaN         0.8     58.0   NaN   
 21    75.0    55.0    77.0  52.0   19.0        0.5         1.1     47.0  0.40   
 22    89.0    40.0    32.0  49.0    5.0        0.9         0.1      NaN   NaN   
 23     NaN     NaN     NaN  66.0   14.0        NaN         1.1      NaN  1.00   
 24     NaN     NaN     NaN  78.0   11.0        NaN         0.7     60.0  0.40   
 25   102.0    66.0    52.0  45.0   10.0        0.6         0.6     79.0  0.60   
 26    66.0    16.0    22.0  90.0   25.0        0.4         1.2      NaN   NaN   
 27     NaN     NaN     NaN  83.0   25.0        NaN         1.3      NaN   NaN   
 28    99.0    21.0    85.0  51.0    9.0       10.9         0.6     42.0  0.50   
 29    94.0   392.0   158.0  49.0  137.0        1.0         9.6      NaN   NaN   
 ..     ...     ...     ...   ...    ...        ...         ...      ...   ...   
 970   65.0    13.0    27.0  85.0   30.0        1.9         0.9      NaN  1.00   
 971    NaN     NaN     NaN  60.0   28.0        NaN         1.0     70.0  0.40   
 972   41.0    45.0    61.0  23.0    7.0        1.5         0.7     78.0  0.40   
 973    NaN     NaN     NaN  63.0   14.0        NaN         0.7     70.0   NaN   
 974    NaN     NaN     NaN  65.0   13.0        NaN         0.9     40.0   NaN   
 975   48.0    14.0    14.0  74.0   23.0        0.5         1.2      NaN   NaN   
 976    NaN     NaN     NaN  24.0   20.0        NaN         1.8      NaN   NaN   
 977    NaN     NaN     NaN  32.0    8.0        NaN         0.7      NaN   NaN   
 978   87.0    41.0    31.0  40.0   21.0        1.1         1.4    105.0   NaN   
 979    NaN     NaN     NaN  43.0   37.0        NaN         2.1     65.0  0.60   
 980   56.0    26.0    57.0  79.0   10.0        0.8         0.4     62.0  0.30   
 981    NaN     NaN     NaN  48.0   18.0        NaN         0.6     56.0  0.60   
 982    NaN     NaN     NaN  41.0    6.0        NaN         0.6      NaN   NaN   
 983   25.0     7.0    17.0  81.0   21.0        0.8         0.9     63.0  0.60   
 984    NaN     NaN     NaN  67.0    9.0        NaN         0.7     75.0  0.40   
 985    NaN     NaN     NaN  55.0   17.0        NaN         0.7     69.0  0.50   
 986    NaN     NaN     NaN  22.0    6.0        NaN         0.8     80.0  0.40   
 987   35.0    43.0    69.0  80.0   51.0        0.4         1.9     52.0  0.60   
 988  289.0   414.0   216.0  90.0   37.0        1.4         1.4      NaN  0.80   
 989   73.0    35.0    22.0  65.0   20.0        0.6         0.5     56.0   NaN   
 990    NaN     NaN     NaN  63.0   32.0        NaN         1.0     45.0  0.70   
 991  318.0   129.0    28.0  63.0   16.0        1.3         0.9      NaN   NaN   
 992    NaN     NaN     NaN  64.0   16.0        NaN         0.8      NaN   NaN   
 993   41.0    16.0    25.0  40.0   15.0        2.5         0.6     75.0  0.40   
 994    NaN     NaN     NaN  80.0   18.0        NaN         1.2     66.0   NaN   
 995   62.0    35.0   119.0  87.0   16.0        0.5         0.6     39.0  0.40   
 996   72.0    19.0    53.0  90.0   48.0        NaN         2.0      NaN   NaN   
 997    NaN     NaN     NaN  79.0   23.0        NaN         0.9     50.0  0.40   
 998    NaN     NaN     NaN  88.0   14.0        NaN         1.3     65.0   NaN   
 999    NaN     NaN     NaN  61.0   14.0        NaN         0.9      NaN   NaN   
 
       GCS  ...    WBC  Weight    pH  Albumin  Lactate  TroponinT  SaO2  \
 0     9.0  ...  16.00  108.10  7.43      NaN      NaN        NaN   NaN   
 1    15.0  ...  13.20  100.00  7.32      2.7      NaN        NaN   NaN   
 2    15.0  ...   6.60  104.90  7.34      NaN      1.5       1.40   NaN   
 3    15.0  ...   8.60   87.60  7.37      NaN      2.8        NaN  98.0   
 4    15.0  ...    NaN   73.40  7.40      NaN      NaN        NaN  99.0   
 5    15.0  ...   6.70   -1.00   NaN      3.6      NaN        NaN   NaN   
 6    15.0  ...  11.20   81.70   NaN      NaN      NaN        NaN   NaN   
 7    10.0  ...  28.00   70.00  7.51      NaN      1.1        NaN  94.0   
 8    15.0  ...   9.40   84.00  7.45      NaN      1.9        NaN   NaN   
 9     9.0  ...  31.00  123.10  7.49      2.4      5.7        NaN  95.0   
 10   15.0  ...   7.50   55.50   NaN      2.5      2.9       0.53   NaN   
 11   15.0  ...   9.00   74.80  7.44      NaN      1.3        NaN   NaN   
 12    6.0  ...   3.20   78.40  7.36      NaN      1.4       0.98  98.0   
 13   15.0  ...  16.90   99.40  7.42      NaN      NaN        NaN  98.0   
 14   10.0  ...   3.20   56.00   NaN      NaN      NaN        NaN   NaN   
 15   15.0  ...  16.20   82.20  7.41      NaN      NaN        NaN  97.0   
 16   15.0  ...   8.70   68.80   NaN      NaN      NaN        NaN   NaN   
 17   15.0  ...  16.80   77.20  7.38      2.4      1.5        NaN   NaN   
 18   14.0  ...  20.50   56.40   NaN      3.3      1.6       0.03   NaN   
 19   15.0  ...  15.00  135.20   NaN      2.6      NaN        NaN   NaN   
 20   15.0  ...  21.10   75.10  7.36      NaN      1.4        NaN   NaN   
 21   10.0  ...  14.40   71.00  7.40      3.5      3.5        NaN   NaN   
 22    NaN  ...   0.75   -1.00   NaN      NaN      0.7        NaN   NaN   
 23   14.0  ...  11.50   96.40  7.40      NaN      2.4        NaN  95.0   
 24   15.0  ...   8.00  102.50  7.42      NaN      NaN        NaN  97.0   
 25    6.0  ...  35.60  115.80  7.43      2.7      1.2        NaN  97.0   
 26   13.0  ...   6.90  128.60   NaN      3.6      NaN       0.01   NaN   
 27   15.0  ...   8.50   80.00   NaN      NaN      NaN       0.10   NaN   
 28    7.0  ...  10.60   73.90  7.44      3.0      NaN        NaN  97.0   
 29   15.0  ...  14.70   88.60  7.44      2.4      5.1       2.49   NaN   
 ..    ...  ...    ...     ...   ...      ...      ...        ...   ...   
 970  15.0  ...  56.40   -1.00   NaN      3.3      NaN        NaN   NaN   
 971  15.0  ...   7.60   91.57  7.46      NaN      NaN        NaN  97.0   
 972   4.0  ...  17.00   -1.00  7.52      3.7      1.9        NaN   NaN   
 973  15.0  ...  11.10  146.60  7.42      NaN      NaN        NaN  96.0   
 974  15.0  ...   7.10   70.80  7.39      NaN      NaN        NaN   NaN   
 975   3.0  ...   4.50   59.20   NaN      1.7      5.0        NaN   NaN   
 976  15.0  ...   8.20  134.20  7.53      NaN      NaN        NaN   NaN   
 977  15.0  ...   7.80  102.00   NaN      NaN      NaN        NaN   NaN   
 978  15.0  ...  10.70   72.00  7.48      NaN      NaN        NaN  97.0   
 979   6.0  ...  15.20  107.10  7.27      1.5      1.3       0.48  96.0   
 980   7.0  ...   5.10   88.60  7.42      2.9      NaN        NaN  97.0   
 981  15.0  ...  10.20   79.10  7.39      NaN      NaN        NaN   NaN   
 982  15.0  ...   4.90  143.80   NaN      NaN      NaN        NaN   NaN   
 983   8.0  ...  12.00   70.00  7.35      1.8      0.9       0.02  96.0   
 984  15.0  ...   8.80   70.00  7.44      NaN      1.3       0.14   NaN   
 985  15.0  ...   8.50   91.60  7.33      NaN      NaN        NaN  92.0   
 986   3.0  ...  12.60   75.00  7.53      NaN      2.0        NaN   NaN   
 987   3.0  ...  23.30   71.00  7.35      2.2      3.4       0.03  97.0   
 988  15.0  ...  15.20   65.00  7.24      3.1      1.4       1.31   NaN   
 989  15.0  ...  23.60   67.00  7.43      2.1      1.3       0.04  95.0   
 990  15.0  ...  13.40   98.00  7.32      NaN      NaN        NaN  98.0   
 991  15.0  ...  13.70   86.60   NaN      2.6      NaN        NaN   NaN   
 992  15.0  ...   9.70   -1.00   NaN      NaN      NaN        NaN   NaN   
 993  10.0  ...   7.80  101.00  7.45      2.7      0.8        NaN  97.0   
 994  15.0  ...   5.90   -1.00   NaN      3.1      2.0       0.02   NaN   
 995  11.0  ...  11.40   39.20  7.46      3.4      1.7       1.14   NaN   
 996  15.0  ...   9.10   83.30   NaN      NaN      NaN        NaN   NaN   
 997  15.0  ...  14.80   75.80  7.41      NaN      1.5        NaN  94.0   
 998  15.0  ...  14.20   59.00  7.43      NaN      NaN        NaN   NaN   
 999  15.0  ...   5.50   71.00   NaN      NaN      NaN        NaN   NaN   
 
      RespRate  Cholesterol  TroponinI  
 0         NaN          NaN        NaN  
 1         NaN          NaN        NaN  
 2         NaN          NaN        NaN  
 3         NaN          NaN        NaN  
 4         NaN          NaN        NaN  
 5        21.0          NaN        NaN  
 6         NaN          NaN        NaN  
 7         NaN          NaN        NaN  
 8         NaN          NaN        NaN  
 9         NaN          NaN        NaN  
 10       24.0         91.0        NaN  
 11        NaN          NaN        NaN  
 12        NaN          NaN        NaN  
 13        NaN          NaN        NaN  
 14        NaN          NaN        NaN  
 15        NaN          NaN        NaN  
 16       20.0        243.0        NaN  
 17        NaN          NaN        NaN  
 18       25.0          NaN        NaN  
 19        NaN          NaN        NaN  
 20       12.0          NaN        NaN  
 21        NaN          NaN        NaN  
 22        NaN          NaN        NaN  
 23        NaN          NaN        NaN  
 24        NaN          NaN        NaN  
 25        NaN          NaN        NaN  
 26       10.0          NaN        NaN  
 27       15.0          NaN        NaN  
 28        NaN          NaN        NaN  
 29        NaN          NaN        NaN  
 ..        ...          ...        ...  
 970      20.0          NaN        NaN  
 971       NaN          NaN        NaN  
 972       NaN          NaN        NaN  
 973       NaN          NaN        NaN  
 974       NaN        207.0        NaN  
 975      30.0          NaN        NaN  
 976      18.0          NaN        NaN  
 977      17.0          NaN        NaN  
 978      21.0          NaN        NaN  
 979       NaN          NaN        NaN  
 980       NaN          NaN        NaN  
 981       NaN          NaN        NaN  
 982      16.0          NaN        NaN  
 983       NaN          NaN        NaN  
 984       NaN          NaN        NaN  
 985       NaN          NaN        NaN  
 986       NaN          NaN        NaN  
 987       NaN          NaN        NaN  
 988       NaN          NaN        NaN  
 989       NaN          NaN        NaN  
 990       NaN        217.0       10.2  
 991      25.0          NaN        NaN  
 992      18.0          NaN        NaN  
 993       NaN          NaN        NaN  
 994      22.0          NaN        NaN  
 995       NaN        158.0        NaN  
 996      19.0          NaN        NaN  
 997       NaN          NaN        NaN  
 998      19.0          NaN        NaN  
 999       NaN          NaN        NaN  
 
 [1000 rows x 42 columns],
 'Fold3':       Age   BUN  Creatinine  DiasABP  FiO2   GCS  Gender  Glucose  HCO3   HCT  \
 0    57.0  17.0         0.7     69.0  0.70  15.0     1.0    186.0  30.0  30.5   
 1    87.0  16.0         0.8     55.0   NaN  11.0     1.0     92.0  27.0  36.3   
 2    73.0  12.0         0.8     62.0  0.50  11.0     0.0    162.0  23.0  32.4   
 3    72.0  57.0         4.3      NaN  0.40  11.0     0.0    207.0  17.0  23.2   
 4    76.0  23.0         1.4     50.0  0.50  14.0     1.0    143.0  23.0  25.1   
 5    59.0   9.0         0.4     62.0  0.40   9.0     0.0    128.0  24.0  34.3   
 6    76.0  61.0         4.0     57.0  0.40  14.0     1.0    215.0  25.0  28.5   
 7    43.0  47.0         5.0      NaN   NaN  15.0     0.0     82.0  27.0  30.3   
 8    60.0   9.0         0.7      NaN   NaN  15.0     0.0    163.0  28.0  27.5   
 9    60.0  12.0         0.6     60.0  0.35   7.0     1.0    131.0  23.0  27.5   
 10   60.0  12.0         0.9     55.0   NaN  15.0     1.0    127.0  25.0  36.4   
 11   69.0  48.0         2.2      NaN  0.50  15.0     1.0    113.0  31.0  35.2   
 12   74.0  16.0         0.8     50.0  0.50  15.0     0.0    119.0  24.0  31.4   
 13   78.0  41.0         5.5     62.0  0.35  15.0     0.0    141.0  30.0  31.4   
 14   82.0  15.0         0.7     99.0  0.60  15.0     0.0    118.0  23.0  29.6   
 15   24.0  13.0         0.7      NaN   NaN  14.0     0.0    166.0  25.0  36.1   
 16   87.0  23.0         1.1      NaN  0.35  11.0     1.0    170.0  25.0  21.7   
 17   90.0  14.0         1.1     83.0  0.60   9.0     1.0    109.0  21.0  32.2   
 18   68.0  22.0         1.5     88.0  1.00  14.0     1.0    118.0  24.0  35.2   
 19   72.0  41.0         0.7     62.0  0.50  15.0     0.0    161.0  31.0  29.6   
 20   81.0  22.0         0.8      NaN   NaN  14.0     0.0    114.0  34.0  37.0   
 21   75.0  22.0         1.4     71.0  0.70  15.0     1.0    176.0  24.0  32.6   
 22   70.0  30.0         1.1     45.0  0.35  15.0     1.0    200.0  20.0  30.1   
 23   77.0  32.0         3.2     34.0  0.50   3.0     1.0     96.0  26.0  28.1   
 24   72.0  28.0         0.5     59.0  0.70  14.0     0.0    202.0  24.0  27.1   
 25   68.0  39.0         2.5     40.0  0.50  15.0     1.0    121.0  19.0  24.9   
 26   63.0  23.0         1.1     78.0   NaN   3.0     1.0    156.0  23.0  34.6   
 27   68.0  42.0         3.0     47.0  0.50  10.0     1.0    219.0  20.0  25.8   
 28   76.0  31.0         2.1     76.0   NaN  15.0     1.0    131.0  21.0  27.1   
 29   49.0  23.0         0.9     40.0  0.40  15.0     1.0    118.0  24.0  25.5   
 ..    ...   ...         ...      ...   ...   ...     ...      ...   ...   ...   
 970  68.0  19.0         0.9     47.0  0.50  15.0     1.0    126.0  28.0  34.8   
 971  70.0  17.0         1.0     65.0   NaN  15.0     1.0    144.0  24.0  33.7   
 972  74.0  23.0         0.9     62.0  0.40  15.0     0.0    112.0  27.0  30.1   
 973  79.0   8.0         0.6      NaN   NaN  15.0     0.0     74.0  24.0  32.4   
 974  55.0  23.0         1.0     49.0  0.40  15.0     0.0      NaN  27.0  24.4   
 975  48.0  23.0         0.8     98.0   NaN  15.0     0.0    123.0  27.0  26.4   
 976  74.0  23.0         1.1      NaN  0.70  15.0     1.0     97.0  25.0  30.4   
 977  69.0  30.0         1.2     69.0  0.40  11.0     1.0     70.0  23.0  23.7   
 978  79.0  32.0         1.2      NaN   NaN  15.0     0.0    106.0  21.0  29.5   
 979  25.0   6.0         0.6      NaN   NaN  14.0     0.0    132.0  28.0  27.1   
 980  76.0  26.0         1.1     61.0  0.40  12.0     1.0    203.0  25.0  38.7   
 981  73.0  75.0         3.1      NaN   NaN  15.0     1.0    143.0  14.0  35.1   
 982  90.0   8.0         0.5      NaN   NaN  14.0     0.0    111.0  28.0  29.9   
 983  51.0   6.0         0.6      NaN   NaN  15.0     0.0     89.0  27.0  31.0   
 984  75.0   7.0         0.6     52.0  0.40   8.0     0.0    167.0  21.0  31.3   
 985  57.0  22.0         0.7     53.0  0.40  10.0     1.0    235.0  27.0  29.2   
 986  37.0   9.0         0.9      NaN   NaN  15.0     0.0    108.0  25.0  24.8   
 987  35.0  13.0         1.1     47.0  0.40   9.0     1.0    110.0  24.0  24.4   
 988  26.0  10.0         0.8     66.0  0.50   3.0     1.0    110.0  24.0  27.0   
 989  84.0  26.0         0.9     47.0   NaN  15.0     0.0     90.0  20.0  35.4   
 990  76.0  14.0         0.8     51.0  0.50  15.0     1.0    146.0   NaN  30.9   
 991  90.0  34.0         1.3      NaN  0.50  10.0     0.0    187.0  29.0  30.3   
 992  59.0  47.0         6.9     55.0  0.50   9.0     1.0    118.0  30.0  23.0   
 993  55.0   7.0         0.7      NaN   NaN  12.0     1.0    125.0  20.0  28.4   
 994  66.0  21.0         0.9     80.0  0.50  15.0     1.0    167.0  19.0  36.0   
 995  63.0  12.0         0.7     51.0  0.35  15.0     1.0    152.0  20.0  33.6   
 996  26.0   7.0         0.7     77.0  0.40  15.0     1.0     79.0  26.0  31.4   
 997  78.0  32.0         1.4     53.0  0.50  15.0     1.0     92.0  27.0  30.9   
 998  77.0  11.0         0.8    135.0  0.40  14.0     1.0     75.0  18.0  35.0   
 999  38.0   7.0         0.5     57.0  0.50  15.0     0.0    108.0  23.0  27.2   
 
      ...  RespRate    ALP    ALT    AST  Bilirubin  Lactate  Albumin  \
 0    ...       NaN    NaN    NaN    NaN        NaN      NaN      NaN   
 1    ...      10.0    NaN    NaN    NaN        NaN      NaN      NaN   
 2    ...       NaN   28.0   17.0   29.0        0.6     1.80      NaN   
 3    ...       NaN    NaN    NaN    NaN        NaN     2.60      NaN   
 4    ...       NaN    NaN    NaN    NaN        NaN      NaN      NaN   
 5    ...       NaN   66.0   19.0   38.0        0.4     1.50      3.4   
 6    ...       NaN  118.0   68.0   84.0        0.3      NaN      3.4   
 7    ...      16.0    NaN    NaN    NaN        NaN      NaN      NaN   
 8    ...      41.0    NaN    NaN    NaN        NaN      NaN      NaN   
 9    ...       NaN    NaN    NaN    NaN        NaN     2.30      3.3   
 10   ...      16.0    NaN    NaN    NaN        NaN      NaN      NaN   
 11   ...       NaN   57.0   10.0   16.0        0.3     1.20      3.7   
 12   ...       NaN    NaN    NaN    NaN        NaN      NaN      NaN   
 13   ...       NaN  101.0    9.0   17.0        0.4     1.80      3.6   
 14   ...      19.0    NaN    NaN    NaN        NaN     1.90      NaN   
 15   ...      20.0    NaN    NaN    NaN        NaN      NaN      NaN   
 16   ...      22.0   67.0   13.0   20.0        0.7     7.60      3.0   
 17   ...       NaN    NaN    NaN    NaN        NaN     1.70      NaN   
 18   ...       NaN   25.0   27.0   45.0        0.6     1.60      NaN   
 19   ...       NaN   98.0   17.0   21.0        1.0     0.90      2.7   
 20   ...      21.0    NaN    NaN    NaN        NaN      NaN      NaN   
 21   ...       NaN    NaN    NaN    NaN        NaN     0.90      NaN   
 22   ...       NaN    NaN    NaN    NaN        NaN     1.91      NaN   
 23   ...       NaN   43.0   24.0  458.0        0.9     3.50      NaN   
 24   ...      15.0   36.0   31.0   17.0        2.9      NaN      5.3   
 25   ...       NaN    NaN    NaN    NaN        NaN     1.60      NaN   
 26   ...       NaN   60.0   21.0   72.0        0.8     2.90      3.1   
 27   ...       NaN  190.0  124.0   33.0        0.7     1.60      2.1   
 28   ...       NaN    NaN    NaN    NaN        NaN      NaN      NaN   
 29   ...       NaN    NaN    NaN    NaN        NaN      NaN      NaN   
 ..   ...       ...    ...    ...    ...        ...      ...      ...   
 970  ...       NaN   41.0   76.0   85.0        1.2     3.30      NaN   
 971  ...      27.0    NaN    NaN    NaN        NaN     1.40      NaN   
 972  ...       NaN    NaN    NaN    NaN        NaN      NaN      NaN   
 973  ...      20.0   75.0   12.0   17.0        0.3      NaN      3.0   
 974  ...       NaN    NaN    NaN    NaN        NaN      NaN      NaN   
 975  ...       NaN   64.0   10.0   17.0        0.2     2.70      2.2   
 976  ...       NaN    NaN    NaN    NaN        0.2     1.60      2.6   
 977  ...       NaN    NaN    NaN    NaN        NaN     1.80      NaN   
 978  ...      13.0    NaN    NaN    NaN        NaN      NaN      NaN   
 979  ...      12.0    NaN    NaN    NaN        NaN      NaN      NaN   
 980  ...       NaN    NaN   14.0   16.0        NaN      NaN      3.4   
 981  ...       NaN    NaN    NaN    NaN        NaN      NaN      NaN   
 982  ...      27.0    NaN    NaN    NaN        NaN      NaN      NaN   
 983  ...      17.0   66.0   74.0   55.0        0.3      NaN      3.7   
 984  ...       NaN    NaN    NaN    NaN        NaN     1.50      NaN   
 985  ...       NaN  132.0  146.0  188.0        1.1     1.60      2.2   
 986  ...      21.0  127.0   68.0   18.0        0.3      NaN      NaN   
 987  ...       NaN    NaN    NaN    NaN        NaN     1.90      NaN   
 988  ...       NaN   68.0    9.0   15.0        0.4     2.70      NaN   
 989  ...       NaN    NaN    NaN    NaN        NaN      NaN      2.6   
 990  ...       NaN    NaN    NaN    NaN        NaN      NaN      NaN   
 991  ...       NaN   49.0    9.0   10.0        0.6     1.20      2.7   
 992  ...       NaN    NaN    NaN    NaN        NaN     1.50      NaN   
 993  ...      23.0    NaN    NaN    NaN        NaN      NaN      NaN   
 994  ...       NaN  217.0  468.0  464.0        0.9     2.20      1.9   
 995  ...       NaN    NaN    NaN    NaN        NaN      NaN      NaN   
 996  ...       NaN   90.0   12.0   54.0        1.1     0.90      3.6   
 997  ...       NaN    NaN    NaN    NaN        NaN     1.40      NaN   
 998  ...       NaN    NaN    NaN    NaN        NaN      NaN      NaN   
 999  ...       NaN    NaN    NaN    NaN        NaN      NaN      NaN   
 
      TroponinT  TroponinI  Cholesterol  
 0          NaN        NaN          NaN  
 1          NaN        NaN          NaN  
 2          NaN        NaN          NaN  
 3          NaN        NaN          NaN  
 4          NaN        NaN          NaN  
 5         0.67        NaN          NaN  
 6          NaN        NaN          NaN  
 7          NaN        NaN          NaN  
 8          NaN        NaN          NaN  
 9          NaN        NaN          NaN  
 10         NaN        NaN          NaN  
 11        0.08        NaN          NaN  
 12         NaN        NaN          NaN  
 13        0.06        NaN          NaN  
 14         NaN        NaN          NaN  
 15         NaN        NaN          NaN  
 16        0.18        NaN          NaN  
 17         NaN        NaN          NaN  
 18         NaN        NaN          NaN  
 19        0.03        NaN          NaN  
 20        0.03        NaN          NaN  
 21         NaN        NaN          NaN  
 22         NaN        NaN          NaN  
 23       11.58        NaN          NaN  
 24         NaN        NaN          NaN  
 25         NaN        NaN          NaN  
 26         NaN        NaN          NaN  
 27         NaN        NaN          NaN  
 28         NaN       13.1          NaN  
 29         NaN        NaN          NaN  
 ..         ...        ...          ...  
 970        NaN       15.0          NaN  
 971        NaN        NaN          NaN  
 972        NaN        NaN          NaN  
 973        NaN        NaN          NaN  
 974        NaN        NaN          NaN  
 975        NaN        NaN          NaN  
 976       0.50        NaN        116.0  
 977       0.11        NaN          NaN  
 978        NaN        NaN          NaN  
 979        NaN        NaN          NaN  
 980       0.01        NaN        237.0  
 981        NaN        NaN          NaN  
 982        NaN        NaN          NaN  
 983        NaN        NaN          NaN  
 984       0.14        NaN          NaN  
 985        NaN        NaN          NaN  
 986        NaN        NaN          NaN  
 987        NaN        NaN          NaN  
 988        NaN        NaN          NaN  
 989        NaN        NaN          NaN  
 990        NaN        NaN          NaN  
 991        NaN        NaN          NaN  
 992        NaN        NaN          NaN  
 993        NaN        NaN          NaN  
 994        NaN        NaN          NaN  
 995        NaN        NaN          NaN  
 996       0.21        NaN          NaN  
 997        NaN        NaN          NaN  
 998        NaN        NaN          NaN  
 999        NaN        NaN          NaN  
 
 [1000 rows x 42 columns],
 'Fold4':       Age    BUN  Creatinine  FiO2   GCS  Gender  Glucose  HCO3   HCT     HR  \
 0    39.0   13.0         0.5  0.40  10.0     0.0     86.0  33.0  32.9   97.0   
 1    70.0   26.0         0.5  0.50  11.0     0.0    144.0  30.0  29.4   89.0   
 2    61.0   18.0         0.9  0.50  15.0     1.0     99.0  30.0  28.8   98.0   
 3    64.0   14.0         1.0  0.60  15.0     1.0    157.0  22.0  26.3  106.0   
 4    45.0   22.0         0.6  0.40  11.0     1.0    139.0  28.0  27.6   84.0   
 5    77.0   29.0         1.7   NaN  13.0     1.0     98.0  23.0  29.9   87.0   
 6    90.0   18.0         0.8  0.40   9.0     0.0     65.0  24.0  29.6  114.0   
 7    66.0   11.0         1.0  0.60  14.0     1.0    110.0  23.0  30.6   90.0   
 8    54.0    9.0         0.7   NaN   NaN     1.0    103.0  26.0  39.7    NaN   
 9    74.0   11.0         0.8  0.40  15.0     1.0    142.0  27.0  24.7   74.0   
 10   73.0   20.0         3.6   NaN  15.0     1.0    141.0  14.0  32.8  107.0   
 11   62.0   20.0         0.7   NaN  13.0     1.0    135.0  26.0  21.6   79.0   
 12   56.0   15.0         1.1  0.40   7.0     1.0    129.0  15.0  39.3   62.0   
 13   57.0   10.0         0.4  0.50  15.0     0.0    106.0  30.0  24.8   92.0   
 14   74.0   14.0         0.8  0.70   3.0     0.0     91.0  23.0  30.2   72.0   
 15   74.0   22.0         1.1  0.50  15.0     1.0     65.0  25.0  30.6   90.0   
 16   67.0   35.0         0.8  0.35  15.0     1.0    173.0  33.0  36.0   99.0   
 17   49.0   13.0         0.9   NaN  14.0     1.0    167.0  24.0  35.5   74.0   
 18   58.0   46.0         2.5  0.50   8.0     0.0    140.0  16.0  24.3   77.0   
 19   72.0   39.0         1.0  0.40   4.0     0.0    159.0  21.0  30.0   83.0   
 20   79.0   31.0         1.3  0.50  14.0     1.0    107.0  19.0  27.1   77.0   
 21   82.0   26.0         1.1  0.40  15.0     1.0    117.0  23.0  31.5   85.0   
 22   45.0   15.0         0.8  0.50   7.0     1.0    208.0  22.0  24.7   92.0   
 23   68.0   31.0         1.3   NaN  15.0     1.0    124.0  23.0  31.7   75.0   
 24   59.0   25.0         1.4  0.40  15.0     0.0    102.0  26.0  26.1   80.0   
 25   24.0   14.0         0.9  0.40  12.0     1.0    141.0  27.0  20.4   70.0   
 26   52.0   25.0         1.0   NaN  15.0     1.0    106.0  15.0  23.4   66.0   
 27   52.0   21.0         0.8   NaN  15.0     0.0    114.0  23.0  29.0   76.0   
 28   85.0   56.0         1.5   NaN  13.0     0.0    108.0  16.0  34.0  108.0   
 29   59.0   68.0         5.9  0.40   9.0     0.0    100.0  19.0  29.1  106.0   
 ..    ...    ...         ...   ...   ...     ...      ...   ...   ...    ...   
 970  69.0   22.0         1.0  0.50  15.0     1.0    120.0  27.0  24.9   84.0   
 971  67.0    9.0         0.6   NaN  15.0     1.0    147.0  18.0  26.2   78.0   
 972  78.0   10.0         0.5  0.40  11.0     0.0     66.0  27.0  30.5   92.0   
 973  61.0   26.0         0.7  0.40   6.0     1.0    136.0  25.0  27.7   90.0   
 974  60.0   17.0         1.0   NaN  15.0     1.0    112.0  26.0  33.0   74.0   
 975  38.0    7.0         0.5  0.50   6.0     0.0    138.0  21.0  25.0   80.0   
 976  55.0   23.0         0.7  0.60   9.0     0.0    140.0  21.0  27.9   72.0   
 977  57.0   10.0         0.5  0.40   6.0     1.0    126.0  25.0  24.3  106.0   
 978  85.0   40.0         1.1   NaN  15.0     0.0    101.0  20.0  33.9   60.0   
 979  83.0   48.0         3.2  1.00   7.0     1.0    156.0  21.0  40.1   86.0   
 980  80.0   13.0         0.6   NaN  14.0     1.0    103.0  17.0  26.9   62.0   
 981  67.0   27.0         0.6  0.50   8.0     0.0    107.0  26.0  28.3   71.0   
 982  73.0   29.0         1.7  0.60  15.0     0.0    111.0  32.0  27.5   67.0   
 983  74.0   59.0         5.5   NaN  15.0     0.0    181.0  17.0  31.4   75.0   
 984  65.0    6.0         0.5   NaN  15.0     1.0    220.0  23.0  36.3  108.0   
 985  50.0    6.0         0.5   NaN  15.0     0.0     88.0  21.0  40.1   75.0   
 986  34.0    9.0         0.8   NaN  15.0     1.0    118.0  24.0  33.4   65.0   
 987  75.0   20.0         1.1   NaN  15.0     1.0    109.0  21.0  28.4   74.0   
 988  72.0   16.0         0.6  0.70  13.0     0.0    149.0  31.0  35.0   84.0   
 989  66.0   96.0         6.5   NaN  15.0     1.0    131.0  12.0  60.3   56.0   
 990  43.0   20.0         1.0  0.40  15.0     1.0     95.0  25.0  35.6   97.0   
 991  88.0   39.0         1.6  1.00  15.0     1.0    112.0  21.0  32.6   60.0   
 992  89.0   14.0         1.0   NaN  11.0     1.0     96.0  23.0  36.3   83.0   
 993  86.0   69.0         2.2   NaN  15.0     1.0    102.0  20.0  31.7   70.0   
 994  51.0   15.0         0.5  0.40  10.0     0.0    111.0  27.0  29.1  106.0   
 995  70.0   18.0         1.0  0.50  15.0     0.0    106.0  22.0  30.3   89.0   
 996  25.0    7.0         0.7   NaN  15.0     1.0     88.0  28.0  31.9   80.0   
 997  44.0    6.0         1.0  0.40   5.0     1.0    132.0  25.0  37.8   86.0   
 998  37.0  114.0        11.7  0.50   3.0     1.0    118.0  21.0  27.1   82.0   
 999  78.0   24.0         1.5  0.50  14.0     0.0    126.0  19.0  30.7   84.0   
 
      ...  SysABP   SaO2    ALP     ALT     AST  Bilirubin  Cholesterol  \
 0    ...     NaN    NaN    NaN     NaN     NaN        NaN          NaN   
 1    ...   120.0    NaN    NaN     NaN     NaN        NaN          NaN   
 2    ...    82.0   98.0    NaN     NaN     NaN        NaN          NaN   
 3    ...   101.0   94.0   71.0    83.0    60.0        0.6          NaN   
 4    ...   130.0   97.0   46.0    34.0    43.0        0.7          NaN   
 5    ...   139.0  100.0  112.0    39.0    87.0        0.5          NaN   
 6    ...     NaN    NaN   88.0    19.0    22.0        0.7        105.0   
 7    ...    96.0    NaN    NaN     NaN     NaN        NaN          NaN   
 8    ...     NaN    NaN    NaN     NaN     NaN        NaN          NaN   
 9    ...   113.0   98.0    NaN     NaN     NaN        NaN          NaN   
 10   ...   136.0    NaN   84.0    55.0    77.0        1.0          NaN   
 11   ...   124.0    NaN    NaN     NaN     NaN        NaN          NaN   
 12   ...    90.0  100.0    NaN     NaN     NaN        0.7        218.0   
 13   ...   117.0   95.0    NaN     NaN     NaN        NaN          NaN   
 14   ...   116.0   98.0    NaN     NaN     NaN        NaN          NaN   
 15   ...   153.0   98.0    NaN     NaN     NaN        NaN          NaN   
 16   ...   140.0    NaN   93.0    63.0    22.0        0.3        101.0   
 17   ...   101.0   97.0    NaN     NaN     NaN        NaN        204.0   
 18   ...   125.0   98.0   84.0     7.0    34.0        0.6          NaN   
 19   ...   141.0   99.0    NaN     NaN     NaN        NaN          NaN   
 20   ...   141.0   97.0    NaN     NaN     NaN        NaN          NaN   
 21   ...   120.0   96.0    NaN     NaN     NaN        NaN          NaN   
 22   ...   108.0    NaN    NaN     NaN     NaN        NaN          NaN   
 23   ...     NaN    NaN    NaN     NaN     NaN        NaN          NaN   
 24   ...   141.0    NaN    NaN     NaN     NaN        NaN          NaN   
 25   ...   152.0    NaN    NaN     NaN     NaN        NaN          NaN   
 26   ...     NaN    NaN  255.0    35.0    50.0       10.4          NaN   
 27   ...   122.0    NaN    NaN     NaN     NaN        NaN          NaN   
 28   ...     NaN    NaN  322.0    50.0    26.0        2.3          NaN   
 29   ...   149.0    NaN    NaN     NaN     NaN        NaN          NaN   
 ..   ...     ...    ...    ...     ...     ...        ...          ...   
 970  ...   105.0    NaN    NaN     NaN     NaN        NaN          NaN   
 971  ...     NaN    NaN   55.0    18.0    21.0        0.7          NaN   
 972  ...   108.0   97.0    NaN     NaN     NaN        NaN          NaN   
 973  ...   157.0   99.0    NaN     NaN     NaN        NaN          NaN   
 974  ...   111.0    NaN   98.0    54.0   223.0        0.6          NaN   
 975  ...   139.0   98.0    NaN    29.0    46.0        NaN          NaN   
 976  ...   150.0   94.0   71.0    38.0    49.0        0.3          NaN   
 977  ...   103.0   98.0    NaN    30.0    22.0        0.6          NaN   
 978  ...     NaN    NaN   51.0    12.0    17.0        0.3          NaN   
 979  ...   124.0   97.0    NaN     NaN     NaN        NaN          NaN   
 980  ...     NaN    NaN   41.0    13.0    16.0        1.3          NaN   
 981  ...   108.0   99.0    NaN     NaN     NaN        NaN          NaN   
 982  ...   104.0   98.0    NaN     NaN     NaN        NaN          NaN   
 983  ...    95.0   91.0    NaN     NaN     NaN        NaN          NaN   
 984  ...     NaN    NaN  215.0    91.0    44.0        5.4          NaN   
 985  ...     NaN    NaN   57.0     9.0    13.0        0.2        132.0   
 986  ...   113.0    NaN   37.0    39.0    46.0        0.9          NaN   
 987  ...     NaN    NaN    NaN     NaN     NaN        NaN          NaN   
 988  ...   158.0    NaN    NaN     NaN     NaN        NaN          NaN   
 989  ...     NaN    NaN  144.0    36.0   205.0       34.7          NaN   
 990  ...   157.0   96.0   48.0   114.0    40.0        0.3          NaN   
 991  ...     NaN   95.0  163.0   115.0    80.0        9.2          NaN   
 992  ...     NaN    NaN    NaN     NaN     NaN        NaN          NaN   
 993  ...     NaN    NaN  155.0    28.0    35.0        0.9          NaN   
 994  ...   112.0   98.0    NaN     NaN     NaN        NaN          NaN   
 995  ...   152.0    NaN    NaN     NaN     NaN        NaN          NaN   
 996  ...     NaN    NaN    NaN     NaN     NaN        NaN        117.0   
 997  ...   113.0    NaN   51.0    20.0    20.0        0.5          NaN   
 998  ...   145.0    NaN  158.0  1513.0  1277.0        0.6          NaN   
 999  ...   129.0   98.0   42.0     9.0    99.0        0.5          NaN   
 
      RespRate  TroponinT  TroponinI  
 0         NaN        NaN        NaN  
 1         NaN        NaN        NaN  
 2         NaN        NaN        NaN  
 3         NaN        NaN        NaN  
 4         NaN        NaN        NaN  
 5         NaN        NaN        NaN  
 6         NaN        NaN        NaN  
 7         NaN        NaN        NaN  
 8         NaN        NaN        NaN  
 9         NaN        NaN        NaN  
 10       22.0        NaN        NaN  
 11       20.0        NaN        NaN  
 12        NaN      11.18        NaN  
 13        NaN        NaN        NaN  
 14        NaN        NaN        NaN  
 15        NaN        NaN        NaN  
 16        NaN       0.03        NaN  
 17       19.0        NaN        NaN  
 18        NaN       1.80        NaN  
 19        NaN       0.38        NaN  
 20        NaN        NaN        NaN  
 21        NaN        NaN        NaN  
 22        NaN        NaN        NaN  
 23       18.0       0.61        NaN  
 24        NaN        NaN        NaN  
 25        NaN        NaN        NaN  
 26       19.0        NaN        NaN  
 27       20.0        NaN        NaN  
 28       20.0        NaN        NaN  
 29        NaN       0.38        NaN  
 ..        ...        ...        ...  
 970       NaN        NaN        NaN  
 971       NaN        NaN        NaN  
 972       NaN        NaN        NaN  
 973       NaN        NaN       11.7  
 974      16.0       4.63        NaN  
 975       NaN        NaN        6.3  
 976       NaN        NaN        NaN  
 977       NaN        NaN        NaN  
 978      15.0        NaN        NaN  
 979      24.0       0.11        NaN  
 980      21.0        NaN        NaN  
 981       NaN       0.03        NaN  
 982       NaN        NaN        NaN  
 983       NaN        NaN        NaN  
 984      13.0        NaN        NaN  
 985      22.0        NaN        NaN  
 986       NaN        NaN        NaN  
 987      24.0        NaN        0.4  
 988       NaN        NaN        NaN  
 989      17.0        NaN        NaN  
 990       NaN       0.02        NaN  
 991       NaN        NaN        NaN  
 992      23.0        NaN        NaN  
 993       NaN        NaN        NaN  
 994       NaN        NaN        NaN  
 995       NaN        NaN        NaN  
 996      18.0        NaN        NaN  
 997       NaN        NaN        NaN  
 998       NaN        NaN        NaN  
 999       NaN        NaN        NaN  
 
 [1000 rows x 42 columns]}
In [85]:
design_matrix_3 = all_temporal_dfs_folds__most_recent.copy()

Design Matrix 4

Objectives: Based on "ICU Mortality Prediction: A Classification

Algorithm for Imbalanced Datasets" by Bhattacharya, S., Rajan, V, Shrivastava, H. (2017), the features have the least number of missing data.

In [86]:
def getVariableStats(row, characteristics='mean'):
    
    # Mean - If average value exceed healthy threshold, more likelihood of mortality?
    if characteristics == 'mean':
        return np.nanmean(row)
    
    # Median - If median exceed healthy threshold, more likelihood of mortality?
    if characteristics == 'median':
        arr = np.asarray(row.sort(), dtype=np.float64)
        return np.nanmedian(arr)
    
    # Mode - If high occurrences exceed healthy threshold, more likelihood of mortality?
    if characteristics == 'mode':
        mode = max(set(row), key=row.count)
        return mode

    # Standard deviation - If large spread, is there more likelihood of mortality?
    if characteristics == 'sd':
        if len(row) >= 2:
            mean = np.nanmean(row)
            return np.nanstd(row, xbar=mean)
        else:
            return 0
In [87]:
# given temporal data of a patient and selected temporal features, return a dictionary of a row
def createDesignMatrix4(data, features, characteristics):
    record_id = data.loc[data['Time'] == '00:00', :]['Value'][0]
    age = data.loc[data['Time'] == '00:00', :]['Value'][1] 
    gender = data.loc[data['Time'] == '00:00', :]['Value'][2] 
    icu_type = data.loc[data['Time'] == '00:00', :]['Value'][4] 

    data = data[['Time', 'Parameter', 'Value']].groupby(['Time','Parameter']).median().reset_index()
    data_pivot = data.pivot(index='Time', columns='Parameter', values='Value')
    
    row_dict = {}

    for idx, row in data_pivot.T.iterrows():
        if idx in features:
            row_dict[idx] = getVariableStats(list(row.dropna()), characteristics="mean")

    row_dict['RecordID'] = record_id
    row_dict['Age'] = age
    row_dict['Gender'] = gender
    row_dict['ICUType'] = icu_type
    return row_dict
In [88]:
# return mostly numerical features
# features are selected from a research paper by Bhattacharya,S., Rajan,V. and Shrivastava, H. (2017). ICU Mortality Prediction: A Classification Algorithm for Imbalanced Datasets.
# They have the least amount of missing values of below 2%
features = ['RecordID', 'Age', 'Gender', 'ICUType', 'HR', 'MAP', 'Temp', 'Na', 'K', 'Mg']

design_matrix_4 = {}
for key, ids_list in cv_fold.items():
    
    design_matrix_4[key] = pd.DataFrame()
    print(key, "has started extracting temporal data")

    for patient_id in ids_list:
        design_matrix_4[key] = design_matrix_4[key].append(createDesignMatrix4(all_patients[patient_id], features, "mean"), ignore_index=True)

    print(key, "has completed\n")

print(len(design_matrix_4), "patients' Temporals data has been extracted")
Fold1 has started extracting temporal data
Fold1 has completed

Fold2 has started extracting temporal data
Fold2 has completed

Fold3 has started extracting temporal data
Fold3 has completed

Fold4 has started extracting temporal data
Fold4 has completed

4 patients' Temporals data has been extracted

Design Matrix Training and Testing Set Preparation

In [89]:
# Preprocessing
from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer

from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import StandardScaler

# Model Building
from sklearn.pipeline import Pipeline
from statsmodels.formula.api import ols
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestClassifier

from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn import tree

from sklearn.feature_selection import VarianceThreshold, SelectKBest
from sklearn.decomposition import PCA


# Model Evaluation
from sklearn.model_selection import GridSearchCV, cross_val_score

from sklearn.metrics import mean_squared_error, r2_score
from sklearn.metrics import precision_score, recall_score, accuracy_score, confusion_matrix
from statsmodels.tools.eval_measures import rmse

Objective: Store design matrix train and test sets into their respective folds and iterations for CV in model building.

  • design_matrices - Store all design matrices across 4 folds.
  • initializeDesignMatrixTrainTestSet function - To group the 4 folds into training and test sets accordingly into 4 iterations for each design matrix.
In [90]:
design_matrices = {} 
In [91]:
# For Design Matrix into training and test
def initializeDesignMatrixTrainTestSet(design_matrix):
    design_matrices_iter = {} 
    
    # Iter1
    design_matrix_X_train = pd.DataFrame()
    design_matrix_X_train = design_matrix_X_train.append(design_matrix['Fold1'])
    design_matrix_X_train = design_matrix_X_train.append(design_matrix['Fold2'])
    design_matrix_X_train = design_matrix_X_train.append(design_matrix['Fold3'])
    print('X_train', design_matrix_X_train.shape)

    design_matrix_X_test = design_matrix['Fold4']
    print('X_test', design_matrix_X_test.shape)

    design_matrix_Y_train = pd.DataFrame()
    design_matrix_Y_train = design_matrix_Y_train.append(all_outcome_dfs_folds['Fold1'])
    design_matrix_Y_train = design_matrix_Y_train.append(all_outcome_dfs_folds['Fold2'])
    design_matrix_Y_train = design_matrix_Y_train.append(all_outcome_dfs_folds['Fold3'])
    print('Y_train', design_matrix_Y_train.shape)

    design_matrix_Y_test = all_outcome_dfs_folds['Fold4']
    print('Y_test',design_matrix_Y_test.shape)
    
    design_matrices_iter['Iter1'] = {'X_train': design_matrix_X_train, 
                        'X_test': design_matrix_X_test,
                        'Y_train': design_matrix_Y_train,
                        'Y_test': design_matrix_Y_test}

    # Iter2
    design_matrix_X_train = pd.DataFrame()
    design_matrix_X_train = design_matrix_X_train.append(design_matrix['Fold1'])
    design_matrix_X_train = design_matrix_X_train.append(design_matrix['Fold2'])
    design_matrix_X_train = design_matrix_X_train.append(design_matrix['Fold4'])
    print('X_train', design_matrix_X_train.shape)

    design_matrix_X_test = design_matrix['Fold3']
    print('X_test', design_matrix_X_test.shape)

    design_matrix_Y_train = pd.DataFrame()
    design_matrix_Y_train = design_matrix_Y_train.append(all_outcome_dfs_folds['Fold1'])
    design_matrix_Y_train = design_matrix_Y_train.append(all_outcome_dfs_folds['Fold2'])
    design_matrix_Y_train = design_matrix_Y_train.append(all_outcome_dfs_folds['Fold4'])
    print('Y_train', design_matrix_Y_train.shape)

    design_matrix_Y_test = all_outcome_dfs_folds['Fold3']
    print('Y_test',design_matrix_Y_test.shape)
    
    design_matrices_iter['Iter2'] = {'X_train': design_matrix_X_train, 
                        'X_test': design_matrix_X_test,
                        'Y_train': design_matrix_Y_train,
                        'Y_test': design_matrix_Y_test}
    
    # Iter3
    design_matrix_X_train = pd.DataFrame()
    design_matrix_X_train = design_matrix_X_train.append(design_matrix['Fold1'])
    design_matrix_X_train = design_matrix_X_train.append(design_matrix['Fold3'])
    design_matrix_X_train = design_matrix_X_train.append(design_matrix['Fold4'])
    print('X_train', design_matrix_X_train.shape)

    design_matrix_X_test = design_matrix['Fold2']
    print('X_test', design_matrix_X_test.shape)

    design_matrix_Y_train = pd.DataFrame()
    design_matrix_Y_train = design_matrix_Y_train.append(all_outcome_dfs_folds['Fold1'])
    design_matrix_Y_train = design_matrix_Y_train.append(all_outcome_dfs_folds['Fold3'])
    design_matrix_Y_train = design_matrix_Y_train.append(all_outcome_dfs_folds['Fold4'])
    print('Y_train', design_matrix_Y_train.shape)

    design_matrix_Y_test = all_outcome_dfs_folds['Fold2']
    print('Y_test',design_matrix_Y_test.shape)
    
    design_matrices_iter['Iter3'] = {'X_train': design_matrix_X_train, 
                        'X_test': design_matrix_X_test,
                        'Y_train': design_matrix_Y_train,
                        'Y_test': design_matrix_Y_test}
    
    # Iter4
    design_matrix_X_train = pd.DataFrame()
    design_matrix_X_train = design_matrix_X_train.append(design_matrix['Fold2'])
    design_matrix_X_train = design_matrix_X_train.append(design_matrix['Fold3'])
    design_matrix_X_train = design_matrix_X_train.append(design_matrix['Fold4'])
    print('X_train', design_matrix_X_train.shape)
    
    design_matrix_X_test = design_matrix['Fold1']
    print('X_test', design_matrix_X_test.shape)

    design_matrix_Y_train = pd.DataFrame()
    design_matrix_Y_train = design_matrix_Y_train.append(all_outcome_dfs_folds['Fold2'])
    design_matrix_Y_train = design_matrix_Y_train.append(all_outcome_dfs_folds['Fold3'])
    design_matrix_Y_train = design_matrix_Y_train.append(all_outcome_dfs_folds['Fold4'])
    print('Y_train', design_matrix_Y_train.shape)

    design_matrix_Y_test = all_outcome_dfs_folds['Fold1']
    print('Y_test',design_matrix_Y_test.shape)
    
    design_matrices_iter['Iter4'] = {'X_train': design_matrix_X_train, 
                        'X_test': design_matrix_X_test,
                        'Y_train': design_matrix_Y_train,
                        'Y_test': design_matrix_Y_test}
    
    return design_matrices_iter

Design Matrix 1 - Training and Test Data Preprocessing

In [92]:
design_matrices['1'] = initializeDesignMatrixTrainTestSet(design_matrix_1)
X_train (3000, 13)
X_test (1000, 13)
Y_train (3000, 3)
Y_test (1000, 3)
X_train (3000, 13)
X_test (1000, 13)
Y_train (3000, 3)
Y_test (1000, 3)
X_train (3000, 13)
X_test (1000, 13)
Y_train (3000, 3)
Y_test (1000, 3)
X_train (3000, 13)
X_test (1000, 13)
Y_train (3000, 3)
Y_test (1000, 3)

Design Matrix 2 - Training and Test Data Preprocessing

In [93]:
design_matrices['2'] = initializeDesignMatrixTrainTestSet(design_matrix_2)
X_train (3000, 7)
X_test (1000, 7)
Y_train (3000, 3)
Y_test (1000, 3)
X_train (3000, 7)
X_test (1000, 7)
Y_train (3000, 3)
Y_test (1000, 3)
X_train (3000, 7)
X_test (1000, 7)
Y_train (3000, 3)
Y_test (1000, 3)
X_train (3000, 7)
X_test (1000, 7)
Y_train (3000, 3)
Y_test (1000, 3)

Design Matrix 3 - Training and Test Data Preprocessing

In [94]:
design_matrices['3'] = initializeDesignMatrixTrainTestSet(design_matrix_3)
X_train (3000, 42)
X_test (1000, 42)
Y_train (3000, 3)
Y_test (1000, 3)
X_train (3000, 42)
X_test (1000, 42)
Y_train (3000, 3)
Y_test (1000, 3)
X_train (3000, 42)
X_test (1000, 42)
Y_train (3000, 3)
Y_test (1000, 3)
X_train (3000, 42)
X_test (1000, 42)
Y_train (3000, 3)
Y_test (1000, 3)
In [95]:
features_dm3 = ['ALP', 'ALT', 'AST', 'Age', 'Albumin', 'BUN', 'Bilirubin',
       'Cholesterol', 'Creatinine', 'DiasABP', 'FiO2', 'GCS', 'Gender',
       'Glucose', 'HCO3', 'HCT', 'HR', 'Height', 'ICUType', 'K', 'Lactate',
       'MAP', 'MechVent', 'Mg', 'NIDiasABP', 'NIMAP', 'NISysABP', 'Na',
       'PaCO2', 'PaO2', 'Platelets', 'RecordID', 'RespRate', 'SaO2', 'SysABP',
       'Temp', 'TroponinI', 'TroponinT', 'Urine', 'WBC', 'Weight', 'pH']
In [96]:
# for design matrix 3 to impute the missing rows for all features
imp_dm3 = SimpleImputer(missing_values=-1, strategy='most_frequent')

simple_imp_dm3 = Pipeline(steps=[('imputer', imp_dm3)])
In [97]:
for key, design_matrix in design_matrices['3'].items():
    design_matrix_3_X_train = design_matrix['X_train'].fillna(value=-1)
    print('Before', design_matrix_3_X_train.shape)
    simple_imp_dm3.fit(design_matrix_3_X_train)
    design_matrix_3_X_train_preprocessed = simple_imp_dm3.transform(design_matrix_3_X_train)

    print("Total number of output features:", design_matrix_3_X_train_preprocessed.shape)
    design_matrix_3_X_train_preprocessed = pd.DataFrame.from_records(design_matrix_3_X_train_preprocessed)
    design_matrix_3_X_train_preprocessed.columns = design_matrix_3_X_train.columns
    
    design_matrix['X_train'] = design_matrix_3_X_train_preprocessed
    
    design_matrix_3_X_test = design_matrix['X_test'].fillna(value=-1)
    print('Before',design_matrix_3_X_test.shape)
    design_matrix_3_X_test_preprocessed =  simple_imp_dm3.fit_transform(design_matrix_3_X_test)

    print("Total number of output features:", design_matrix_3_X_test_preprocessed.shape)
    design_matrix_3_X_test_preprocessed = pd.DataFrame.from_records(design_matrix_3_X_test_preprocessed)
    design_matrix_3_X_test_preprocessed.columns = design_matrix_3_X_test.columns
    design_matrix['X_test'] = design_matrix_3_X_test_preprocessed
Before (3000, 42)
Total number of output features: (3000, 42)
Before (1000, 42)
Total number of output features: (1000, 42)
Before (3000, 42)
Total number of output features: (3000, 42)
Before (1000, 42)
Total number of output features: (1000, 42)
Before (3000, 42)
Total number of output features: (3000, 42)
Before (1000, 42)
Total number of output features: (1000, 42)
Before (3000, 42)
Total number of output features: (3000, 42)
Before (1000, 42)
Total number of output features: (1000, 42)

Design Matrix 4 - Training and Test Data Preprocessing

In [98]:
design_matrices['4'] = initializeDesignMatrixTrainTestSet(design_matrix_4)
X_train (3000, 10)
X_test (1000, 10)
Y_train (3000, 3)
Y_test (1000, 3)
X_train (3000, 10)
X_test (1000, 10)
Y_train (3000, 3)
Y_test (1000, 3)
X_train (3000, 10)
X_test (1000, 10)
Y_train (3000, 3)
Y_test (1000, 3)
X_train (3000, 10)
X_test (1000, 10)
Y_train (3000, 3)
Y_test (1000, 3)
In [99]:
features_dm4 = ['RecordID', 'Age', 'Gender', 'ICUType', 'HR', 'MAP', 'Temp', 'Na', 'K', 'Mg']
In [100]:
# for design matrix 4 to impute the missing rows for all features
imp_dm4 = SimpleImputer(missing_values=-1, strategy='mean')

simple_imp_dm4 = Pipeline(steps=[('imputer', imp_dm4)])
In [101]:
for key, design_matrix in design_matrices['4'].items():
    design_matrix_4_X_train = design_matrix['X_train'].fillna(value=-1)
    print('Before', design_matrix_4_X_train.shape)
    design_matrix_4_X_train_preprocessed =  simple_imp_dm4.fit_transform(design_matrix_4_X_train)

    print("Total number of output features:", design_matrix_4_X_train_preprocessed.shape)
    design_matrix_4_X_train_preprocessed = pd.DataFrame.from_records(design_matrix_4_X_train_preprocessed)
    design_matrix_4_X_train_preprocessed.columns = design_matrix_4_X_train.columns
    
    design_matrix['X_train'] = design_matrix_4_X_train_preprocessed
    
    
    design_matrix_4_X_test = design_matrix['X_test'].fillna(value=-1)
    print('Before',design_matrix_4_X_test.shape)
    design_matrix_4_X_test_preprocessed =  simple_imp_dm4.fit_transform(design_matrix_4_X_test)

    print("Total number of output features:", design_matrix_4_X_test_preprocessed.shape)
    design_matrix_4_X_test_preprocessed = pd.DataFrame.from_records(design_matrix_4_X_test_preprocessed)
    design_matrix_4_X_test_preprocessed.columns = design_matrix_4_X_test.columns
    
    design_matrix['X_test'] = design_matrix_4_X_test_preprocessed
Before (3000, 10)
Total number of output features: (3000, 10)
Before (1000, 10)
Total number of output features: (1000, 10)
Before (3000, 10)
Total number of output features: (3000, 10)
Before (1000, 10)
Total number of output features: (1000, 10)
Before (3000, 10)
Total number of output features: (3000, 10)
Before (1000, 10)
Total number of output features: (1000, 10)
Before (3000, 10)
Total number of output features: (3000, 10)
Before (1000, 10)
Total number of output features: (1000, 10)

3. Model Building & Evaluation

  • Must Have: Two models for each task (mortality & LoS prediction) must be created in the following way
  • Model 1: Must be evaluated on all the design matrices created
  • Model 2: Must outperform Model 1

Models:

  • Model 1a (regression) - Forward StepWise Multiple Linear Regression
  • Model 1b (regression) - Elastic_net Regression
  • Model 2 (regression) - Multiple Linear Regression
  • Model 1 (classification) - Decision Tree Classifier
  • Model 2 (classification) - Logistic Regression

Regression Model Building and Evaluation

Model 1a (regression)

A foward stepwise regression model to help us identify which sets of features would give us the most explainatory power in predicting Length of Stay reflected by the metrics "Adjusted R-Squared" while having low RSME.

In [102]:
# Model 1a (regression)
# Forward StepWise Multi linear Regression
def forward_selected(data, response):
    remaining = set(data.columns)
    remaining.remove(response)
    selected = []
    current_score, best_new_score = 0.0, 0.0
    t1 = datetime.now()
    while remaining and current_score == best_new_score:
        scores_with_candidates = []
        for candidate in remaining:
            formula = "{} ~ {}".format(response,
                                           ' + '.join(selected + [candidate]))
            score = ols(formula, data).fit().rsquared_adj
            scores_with_candidates.append((score, candidate))
        scores_with_candidates.sort()
        best_new_score, best_candidate = scores_with_candidates.pop()
        if current_score < best_new_score:
            remaining.remove(best_candidate)
            selected.append(best_candidate)
            current_score = best_new_score
        t2 = datetime.now()
        diff = t2-t1
        #to break out of an infinity loop
        if diff.total_seconds() > 30:
            break
    formula = "{} ~ {} ".format(response,
                                   ' + '.join(selected))
    model = ols(formula, data).fit()
    t3 = datetime.now()
    diff = t3-t1
    print("Took about", diff.seconds, "seconds from start")
    return model
In [103]:
# Extract features from formula for usage in the stepwise regression for each iteration
def processFormulaToFeatures(formula):
    return formula[17:len(formula)].replace(" ", "").split("+")
In [104]:
### from statsmodels.formula.api import ols
regression_rsme_2a = {}
for key, val in design_matrices.items():
    print("Design Matrix", key)
    
    regression_rsme_2a[key] = {'train': [], 'test':[], 'regression':[], 'formula': [], 'adjusted_rsquared':[], 'subfeatures': []}
    
    for iternum, value in val.items():
        df_train_ols = pd.merge(value['X_train'], value['Y_train'], how='left', on='RecordID')
        if key == '1':
            features = ['Length_of_stay', "HCO3", "Urine", "HR", "Bilirubin", "BUN", "GCS", "K", "Na", "PaO2", "SysABP", "Temp", "WBC"]

        if key == '2':
            features = ['Length_of_stay', 'MechVent', 'Temp', 'GCS', 'PaO2_FiO2_ratio', 'HR', 'MAP']

        if key == '3':
            features = ['Length_of_stay', 'ALP', 'ALT', 'AST', 'Age', 'Albumin', 'BUN', 'Bilirubin', 'Cholesterol', 'Creatinine', 'DiasABP', 'FiO2', 'GCS', 'Gender','Glucose', 'HCO3', 'HCT', 'HR', 'Height', 'ICUType', 'K', 'Lactate','MAP', 'MechVent', 'Mg', 'NIDiasABP', 'NIMAP', 'NISysABP', 'Na','PaCO2', 'PaO2', 'Platelets', 'RespRate', 'SaO2', 'SysABP','Temp', 'TroponinI', 'TroponinT', 'Urine', 'WBC', 'Weight', 'pH']

        if key == '4':
            features = ['Length_of_stay','HR', 'MAP', 'Temp', 'Na', 'K', 'Mg']

        regression_3 = forward_selected(df_train_ols[features], 'Length_of_stay')

        Y_pred = regression_3.predict(value['X_test'])
        RMSE = rmse(value['Y_test']['Length_of_stay'], Y_pred)
        
        regression_rsme_2a[key]['train'].append(math.sqrt((regression_3.resid**2).mean()))
        regression_rsme_2a[key]['test'].append(RMSE)   
        regression_rsme_2a[key]['regression'].append(regression_3)
        regression_rsme_2a[key]['formula'].append(regression_3.model.formula)
        regression_rsme_2a[key]['adjusted_rsquared'].append(regression_3.rsquared_adj)
        regression_rsme_2a[key]['subfeatures'].append(processFormulaToFeatures(regression_3.model.formula))
        
    print("Mean Training RSME:", round(mean(regression_rsme_2a[key]['train']),3))
    print("Mean Test RSME:", round(mean(regression_rsme_2a[key]['test']),3))
    print("Mean Adjusted R-Squared:", round(mean(regression_rsme_2a[key]['adjusted_rsquared']),3))
    print("")
Design Matrix 1
Took about 2 seconds from start
Took about 1 seconds from start
Took about 1 seconds from start
Took about 1 seconds from start
Mean Training RSME: 11.645
Mean Test RSME: 11.649
Mean Adjusted R-Squared: 0.085

Design Matrix 2
Took about 30 seconds from start
Took about 30 seconds from start
Took about 30 seconds from start
Took about 30 seconds from start
Mean Training RSME: 11.888
Mean Test RSME: 11.883
Mean Adjusted R-Squared: 0.047

Design Matrix 3
Took about 31 seconds from start
Took about 30 seconds from start
Took about 30 seconds from start
Took about 31 seconds from start
Mean Training RSME: 11.443
Mean Test RSME: 17.912
Mean Adjusted R-Squared: 0.112

Design Matrix 4
Took about 0 seconds from start
Took about 0 seconds from start
Took about 0 seconds from start
Took about 0 seconds from start
Mean Training RSME: 12.093
Mean Test RSME: 12.084
Mean Adjusted R-Squared: 0.014

In [106]:
for key, iteration_dict in regression_rsme_2a.items():
    
    #find the highest r-square
    for idx, val in enumerate(iteration_dict['regression']):
        print('************************************************************\n')
        print("Design Matrix", key)
        print("--For Iteration ", idx+1)
        print(regression_rsme_2a[key]['regression'][idx].summary())
    print('************************************************************\n')
************************************************************

Design Matrix 1
--For Iteration  1
                            OLS Regression Results                            
==============================================================================
Dep. Variable:         Length_of_stay   R-squared:                       0.087
Model:                            OLS   Adj. R-squared:                  0.084
Method:                 Least Squares   F-statistic:                     31.68
Date:                Fri, 01 Nov 2019   Prob (F-statistic):           1.66e-53
Time:                        15:53:11   Log-Likelihood:                -11721.
No. Observations:                3000   AIC:                         2.346e+04
Df Residuals:                    2990   BIC:                         2.352e+04
Df Model:                           9                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     27.4505      7.611      3.607      0.000      12.527      42.374
GCS           -0.8017      0.065    -12.423      0.000      -0.928      -0.675
BUN            0.0528      0.011      4.697      0.000       0.031       0.075
Urine          0.0851      0.020      4.172      0.000       0.045       0.125
Bilirubin      0.1852      0.077      2.419      0.016       0.035       0.335
HCO3           0.3204      0.142      2.259      0.024       0.042       0.599
HR             0.0253      0.013      1.953      0.051      -0.000       0.051
Na            -0.0945      0.052     -1.801      0.072      -0.197       0.008
SysABP         0.0163      0.009      1.736      0.083      -0.002       0.035
WBC           -0.0418      0.036     -1.173      0.241      -0.112       0.028
==============================================================================
Omnibus:                     2349.743   Durbin-Watson:                   2.047
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            69140.018
Skew:                           3.490   Prob(JB):                         0.00
Kurtosis:                      25.459   Cond. No.                     7.24e+03
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 7.24e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
************************************************************

Design Matrix 1
--For Iteration  2
                            OLS Regression Results                            
==============================================================================
Dep. Variable:         Length_of_stay   R-squared:                       0.091
Model:                            OLS   Adj. R-squared:                  0.089
Method:                 Least Squares   F-statistic:                     42.96
Date:                Fri, 01 Nov 2019   Prob (F-statistic):           4.06e-58
Time:                        15:53:11   Log-Likelihood:                -11650.
No. Observations:                3000   AIC:                         2.332e+04
Df Residuals:                    2992   BIC:                         2.336e+04
Df Model:                           7                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     30.9554      7.442      4.160      0.000      16.364      45.547
GCS           -0.8054      0.062    -12.929      0.000      -0.928      -0.683
Urine          0.0949      0.020      4.777      0.000       0.056       0.134
BUN            0.0454      0.011      4.183      0.000       0.024       0.067
HCO3           0.4180      0.135      3.092      0.002       0.153       0.683
Na            -0.1133      0.051     -2.230      0.026      -0.213      -0.014
HR             0.0252      0.013      1.993      0.046       0.000       0.050
Bilirubin      0.0964      0.067      1.445      0.149      -0.034       0.227
==============================================================================
Omnibus:                     2348.483   Durbin-Watson:                   2.046
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            70855.349
Skew:                           3.477   Prob(JB):                         0.00
Kurtosis:                      25.770   Cond. No.                     5.88e+03
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 5.88e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
************************************************************

Design Matrix 1
--For Iteration  3
                            OLS Regression Results                            
==============================================================================
Dep. Variable:         Length_of_stay   R-squared:                       0.084
Model:                            OLS   Adj. R-squared:                  0.081
Method:                 Least Squares   F-statistic:                     34.24
Date:                Fri, 01 Nov 2019   Prob (F-statistic):           4.07e-52
Time:                        15:53:11   Log-Likelihood:                -11666.
No. Observations:                3000   AIC:                         2.335e+04
Df Residuals:                    2991   BIC:                         2.341e+04
Df Model:                           8                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     33.6907      8.183      4.117      0.000      17.646      49.735
GCS           -0.7801      0.063    -12.359      0.000      -0.904      -0.656
BUN            0.0484      0.011      4.236      0.000       0.026       0.071
Urine          0.0789      0.020      3.932      0.000       0.040       0.118
HCO3           0.3404      0.139      2.444      0.015       0.067       0.613
HR             0.0277      0.013      2.161      0.031       0.003       0.053
Bilirubin      0.1339      0.075      1.788      0.074      -0.013       0.281
Na            -0.1092      0.053     -2.069      0.039      -0.213      -0.006
K             -0.7514      0.451     -1.664      0.096      -1.637       0.134
==============================================================================
Omnibus:                     2280.993   Durbin-Watson:                   2.044
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            61085.882
Skew:                           3.371   Prob(JB):                         0.00
Kurtosis:                      24.053   Cond. No.                     6.43e+03
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 6.43e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
************************************************************

Design Matrix 1
--For Iteration  4
                            OLS Regression Results                            
==============================================================================
Dep. Variable:         Length_of_stay   R-squared:                       0.086
Model:                            OLS   Adj. R-squared:                  0.083
Method:                 Least Squares   F-statistic:                     28.17
Date:                Fri, 01 Nov 2019   Prob (F-statistic):           4.09e-52
Time:                        15:53:11   Log-Likelihood:                -11441.
No. Observations:                3000   AIC:                         2.290e+04
Df Residuals:                    2989   BIC:                         2.297e+04
Df Model:                          10                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     23.4227      9.458      2.477      0.013       4.878      41.967
GCS           -0.6649      0.059    -11.245      0.000      -0.781      -0.549
BUN            0.0527      0.011      4.912      0.000       0.032       0.074
Urine          0.0910      0.019      4.870      0.000       0.054       0.128
HCO3           0.4165      0.136      3.068      0.002       0.150       0.683
Bilirubin      0.1684      0.071      2.356      0.019       0.028       0.309
Na            -0.1101      0.049     -2.249      0.025      -0.206      -0.014
HR             0.0198      0.012      1.649      0.099      -0.004       0.043
WBC           -0.0511      0.032     -1.595      0.111      -0.114       0.012
Temp           0.2262      0.155      1.459      0.145      -0.078       0.530
K             -0.5528      0.412     -1.342      0.180      -1.360       0.255
==============================================================================
Omnibus:                     1987.992   Durbin-Watson:                   2.069
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            34597.532
Skew:                           2.902   Prob(JB):                         0.00
Kurtosis:                      18.592   Cond. No.                     8.22e+03
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 8.22e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
************************************************************

************************************************************

Design Matrix 2
--For Iteration  1
                            OLS Regression Results                            
==============================================================================
Dep. Variable:         Length_of_stay   R-squared:                       0.051
Model:                            OLS   Adj. R-squared:                  0.049
Method:                 Least Squares   F-statistic:                     32.03
Date:                Fri, 01 Nov 2019   Prob (F-statistic):           6.75e-32
Time:                        15:53:11   Log-Likelihood:                -11779.
No. Observations:                3000   AIC:                         2.357e+04
Df Residuals:                    2994   BIC:                         2.361e+04
Df Model:                           5                                         
Covariance Type:            nonrobust                                         
===================================================================================
                      coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------------------------------------------------
Intercept          10.5281      1.284      8.199      0.000       8.010      13.046
GCS                 4.8230      0.638      7.559      0.000       3.572       6.074
MechVent            2.2615      0.536      4.222      0.000       1.211       3.312
MAP                 1.5135      0.690      2.192      0.028       0.160       2.867
Temp                0.8997      0.440      2.046      0.041       0.038       1.762
PaO2_FiO2_ratio    -0.8218      0.428     -1.920      0.055      -1.661       0.018
==============================================================================
Omnibus:                     2332.958   Durbin-Watson:                   2.035
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            64869.478
Skew:                           3.475   Prob(JB):                         0.00
Kurtosis:                      24.695   Cond. No.                         18.5
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
************************************************************

Design Matrix 2
--For Iteration  2
                            OLS Regression Results                            
==============================================================================
Dep. Variable:         Length_of_stay   R-squared:                       0.057
Model:                            OLS   Adj. R-squared:                  0.055
Method:                 Least Squares   F-statistic:                     35.90
Date:                Fri, 01 Nov 2019   Prob (F-statistic):           8.34e-36
Time:                        15:53:11   Log-Likelihood:                -11706.
No. Observations:                3000   AIC:                         2.342e+04
Df Residuals:                    2994   BIC:                         2.346e+04
Df Model:                           5                                         
Covariance Type:            nonrobust                                         
===================================================================================
                      coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------------------------------------------------
Intercept           9.8588      1.258      7.834      0.000       7.391      12.326
GCS                 4.8765      0.615      7.931      0.000       3.671       6.082
MechVent            2.4777      0.518      4.783      0.000       1.462       3.493
MAP                 2.3706      0.687      3.452      0.001       1.024       3.717
Temp                0.8477      0.422      2.007      0.045       0.020       1.676
PaO2_FiO2_ratio    -0.6499      0.415     -1.567      0.117      -1.463       0.163
==============================================================================
Omnibus:                     2316.539   Durbin-Watson:                   2.050
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            65452.533
Skew:                           3.430   Prob(JB):                         0.00
Kurtosis:                      24.830   Cond. No.                         18.6
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
************************************************************

Design Matrix 2
--For Iteration  3
                            OLS Regression Results                            
==============================================================================
Dep. Variable:         Length_of_stay   R-squared:                       0.045
Model:                            OLS   Adj. R-squared:                  0.043
Method:                 Least Squares   F-statistic:                     28.08
Date:                Fri, 01 Nov 2019   Prob (F-statistic):           6.79e-28
Time:                        15:53:11   Log-Likelihood:                -11729.
No. Observations:                3000   AIC:                         2.347e+04
Df Residuals:                    2994   BIC:                         2.351e+04
Df Model:                           5                                         
Covariance Type:            nonrobust                                         
===================================================================================
                      coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------------------------------------------------
Intercept          11.0613      1.274      8.682      0.000       8.563      13.560
GCS                 4.6966      0.625      7.516      0.000       3.471       5.922
MechVent            1.9770      0.529      3.735      0.000       0.939       3.015
MAP                 2.0944      0.695      3.014      0.003       0.732       3.457
PaO2_FiO2_ratio    -0.6450      0.422     -1.530      0.126      -1.472       0.182
Temp                0.5165      0.422      1.224      0.221      -0.311       1.344
==============================================================================
Omnibus:                     2250.450   Durbin-Watson:                   2.047
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            55864.167
Skew:                           3.333   Prob(JB):                         0.00
Kurtosis:                      23.062   Cond. No.                         18.7
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
************************************************************

Design Matrix 2
--For Iteration  4
                            OLS Regression Results                            
==============================================================================
Dep. Variable:         Length_of_stay   R-squared:                       0.042
Model:                            OLS   Adj. R-squared:                  0.041
Method:                 Least Squares   F-statistic:                     32.91
Date:                Fri, 01 Nov 2019   Prob (F-statistic):           6.83e-27
Time:                        15:53:11   Log-Likelihood:                -11512.
No. Observations:                3000   AIC:                         2.303e+04
Df Residuals:                    2995   BIC:                         2.306e+04
Df Model:                           4                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      9.8265      0.910     10.796      0.000       8.042      11.611
GCS            4.5947      0.580      7.917      0.000       3.457       5.733
MechVent       1.9559      0.457      4.280      0.000       1.060       2.852
MAP            1.0165      0.650      1.563      0.118      -0.258       2.291
Temp           0.5583      0.387      1.443      0.149      -0.200       1.317
==============================================================================
Omnibus:                     1965.349   Durbin-Watson:                   2.060
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            31630.078
Skew:                           2.886   Prob(JB):                         0.00
Kurtosis:                      17.823   Cond. No.                         12.7
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
************************************************************

************************************************************

Design Matrix 3
--For Iteration  1
                            OLS Regression Results                            
==============================================================================
Dep. Variable:         Length_of_stay   R-squared:                       0.123
Model:                            OLS   Adj. R-squared:                  0.116
Method:                 Least Squares   F-statistic:                     18.19
Date:                Fri, 01 Nov 2019   Prob (F-statistic):           5.52e-69
Time:                        15:53:12   Log-Likelihood:                -11660.
No. Observations:                3000   AIC:                         2.337e+04
Df Residuals:                    2976   BIC:                         2.351e+04
Df Model:                          23                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.7584     33.096      0.023      0.982     -64.136      65.653
GCS           -0.7530      0.066    -11.364      0.000      -0.883      -0.623
Albumin       -2.5807      0.517     -4.995      0.000      -3.594      -1.568
ICUType        1.2743      0.233      5.480      0.000       0.818       1.730
BUN            0.0588      0.012      4.878      0.000       0.035       0.082
HCT           -0.1695      0.049     -3.449      0.001      -0.266      -0.073
Lactate       -0.7212      0.198     -3.649      0.000      -1.109      -0.334
DiasABP        0.0527      0.017      3.159      0.002       0.020       0.085
Bilirubin      0.2152      0.078      2.774      0.006       0.063       0.367
Na            -0.1016      0.053     -1.917      0.055      -0.206       0.002
SaO2          -0.2082      0.087     -2.387      0.017      -0.379      -0.037
Weight         0.0199      0.010      2.030      0.042       0.001       0.039
Urine         -0.0029      0.001     -2.179      0.029      -0.006      -0.000
Height        -0.0241      0.014     -1.710      0.087      -0.052       0.004
HR             0.0229      0.013      1.754      0.080      -0.003       0.049
pH             8.7500      4.216      2.075      0.038       0.483      17.017
PaO2           0.0082      0.005      1.630      0.103      -0.002       0.018
Glucose        0.0083      0.005      1.504      0.133      -0.003       0.019
NISysABP      -0.0125      0.009     -1.331      0.183      -0.031       0.006
RespRate      -0.1039      0.077     -1.352      0.176      -0.255       0.047
ALP            0.0042      0.003      1.384      0.166      -0.002       0.010
AST           -0.0005      0.000     -1.314      0.189      -0.001       0.000
Age           -0.0168      0.014     -1.206      0.228      -0.044       0.010
Platelets      0.0022      0.002      1.003      0.316      -0.002       0.006
==============================================================================
Omnibus:                     2332.457   Durbin-Watson:                   2.033
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            70740.713
Skew:                           3.437   Prob(JB):                         0.00
Kurtosis:                      25.774   Cond. No.                     1.02e+05
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.02e+05. This might indicate that there are
strong multicollinearity or other numerical problems.
************************************************************

Design Matrix 3
--For Iteration  2
                            OLS Regression Results                            
==============================================================================
Dep. Variable:         Length_of_stay   R-squared:                       0.123
Model:                            OLS   Adj. R-squared:                  0.117
Method:                 Least Squares   F-statistic:                     20.92
Date:                Fri, 01 Nov 2019   Prob (F-statistic):           6.82e-71
Time:                        15:53:12   Log-Likelihood:                -11597.
No. Observations:                3000   AIC:                         2.324e+04
Df Residuals:                    2979   BIC:                         2.336e+04
Df Model:                          20                                         
Covariance Type:            nonrobust                                         
===============================================================================
                  coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------
Intercept      75.5846     12.876      5.870      0.000      50.339     100.831
GCS            -0.7791      0.063    -12.278      0.000      -0.904      -0.655
Albumin        -2.5783      0.506     -5.096      0.000      -3.570      -1.586
ICUType         1.2870      0.230      5.598      0.000       0.836       1.738
BUN             0.0525      0.011      4.713      0.000       0.031       0.074
Na             -0.1175      0.052     -2.259      0.024      -0.219      -0.016
HCT            -0.1431      0.049     -2.947      0.003      -0.238      -0.048
Lactate        -0.6505      0.174     -3.743      0.000      -0.991      -0.310
RespRate       -0.2222      0.070     -3.195      0.001      -0.359      -0.086
DiasABP         0.0429      0.017      2.536      0.011       0.010       0.076
NISysABP       -0.0195      0.009     -2.084      0.037      -0.038      -0.001
SaO2           -0.2406      0.111     -2.173      0.030      -0.458      -0.024
HR              0.0229      0.013      1.791      0.073      -0.002       0.048
Bilirubin       0.1235      0.067      1.857      0.063      -0.007       0.254
Glucose         0.0106      0.005      2.027      0.043       0.000       0.021
Urine          -0.0028      0.001     -2.015      0.044      -0.006   -7.64e-05
Age            -0.0172      0.013     -1.291      0.197      -0.043       0.009
PaCO2          -0.0422      0.029     -1.446      0.148      -0.099       0.015
Cholesterol    -0.0157      0.013     -1.227      0.220      -0.041       0.009
TroponinI      -0.1076      0.087     -1.233      0.218      -0.279       0.064
Platelets       0.0026      0.002      1.230      0.219      -0.002       0.007
==============================================================================
Omnibus:                     2333.061   Durbin-Watson:                   2.030
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            72716.594
Skew:                           3.428   Prob(JB):                         0.00
Kurtosis:                      26.124   Cond. No.                     2.38e+04
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 2.38e+04. This might indicate that there are
strong multicollinearity or other numerical problems.
************************************************************

Design Matrix 3
--For Iteration  3
                            OLS Regression Results                            
==============================================================================
Dep. Variable:         Length_of_stay   R-squared:                       0.119
Model:                            OLS   Adj. R-squared:                  0.114
Method:                 Least Squares   F-statistic:                     21.21
Date:                Fri, 01 Nov 2019   Prob (F-statistic):           9.63e-69
Time:                        15:53:12   Log-Likelihood:                -11608.
No. Observations:                3000   AIC:                         2.326e+04
Df Residuals:                    2980   BIC:                         2.338e+04
Df Model:                          19                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     78.8420     11.564      6.818      0.000      56.168     101.516
GCS           -0.7138      0.065    -10.950      0.000      -0.842      -0.586
Albumin       -3.9431      0.552     -7.142      0.000      -5.026      -2.861
ICUType        0.9368      0.228      4.112      0.000       0.490       1.384
BUN            0.0515      0.012      4.459      0.000       0.029       0.074
HCT           -0.1171      0.048     -2.453      0.014      -0.211      -0.023
Age           -0.0491      0.013     -3.674      0.000      -0.075      -0.023
Urine         -0.0038      0.001     -2.775      0.006      -0.007      -0.001
Lactate       -0.5052      0.202     -2.499      0.012      -0.902      -0.109
SaO2          -0.2152      0.082     -2.615      0.009      -0.377      -0.054
Na            -0.1144      0.053     -2.151      0.032      -0.219      -0.010
Bilirubin      0.1505      0.074      2.034      0.042       0.005       0.295
PaO2           0.0072      0.005      1.450      0.147      -0.003       0.017
SysABP         0.0153      0.009      1.652      0.099      -0.003       0.034
HR             0.0225      0.013      1.744      0.081      -0.003       0.048
NIMAP         -0.0200      0.016     -1.285      0.199      -0.051       0.011
Height        -0.0185      0.014     -1.311      0.190      -0.046       0.009
AST           -0.0004      0.000     -1.083      0.279      -0.001       0.000
RespRate      -0.0759      0.070     -1.090      0.276      -0.212       0.061
K             -0.4540      0.448     -1.013      0.311      -1.332       0.424
==============================================================================
Omnibus:                     2250.051   Durbin-Watson:                   2.047
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            60211.632
Skew:                           3.300   Prob(JB):                         0.00
Kurtosis:                      23.931   Cond. No.                     3.33e+04
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 3.33e+04. This might indicate that there are
strong multicollinearity or other numerical problems.
************************************************************

Design Matrix 3
--For Iteration  4
                            OLS Regression Results                            
==============================================================================
Dep. Variable:         Length_of_stay   R-squared:                       0.108
Model:                            OLS   Adj. R-squared:                  0.102
Method:                 Least Squares   F-statistic:                     17.97
Date:                Fri, 01 Nov 2019   Prob (F-statistic):           4.22e-60
Time:                        15:53:12   Log-Likelihood:                -11406.
No. Observations:                3000   AIC:                         2.285e+04
Df Residuals:                    2979   BIC:                         2.298e+04
Df Model:                          20                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     23.2663      5.947      3.912      0.000      11.606      34.927
GCS           -0.6593      0.060    -11.031      0.000      -0.776      -0.542
Albumin       -1.9552      0.486     -4.025      0.000      -2.908      -1.003
BUN            0.0486      0.011      4.613      0.000       0.028       0.069
ICUType        1.1535      0.212      5.447      0.000       0.738       1.569
HCT           -0.1452      0.044     -3.270      0.001      -0.232      -0.058
ALP            0.0095      0.003      3.156      0.002       0.004       0.015
Weight         0.0208      0.009      2.422      0.015       0.004       0.038
Na            -0.1076      0.049     -2.219      0.027      -0.203      -0.013
Lactate       -0.4777      0.197     -2.429      0.015      -0.863      -0.092
Urine         -0.0020      0.001     -1.612      0.107      -0.004       0.000
RespRate      -0.1202      0.064     -1.866      0.062      -0.247       0.006
HR             0.0218      0.012      1.809      0.071      -0.002       0.045
Bilirubin      0.1223      0.072      1.704      0.089      -0.018       0.263
SaO2          -0.1135      0.077     -1.473      0.141      -0.265       0.038
SysABP         0.0123      0.009      1.426      0.154      -0.005       0.029
WBC           -0.0451      0.032     -1.409      0.159      -0.108       0.018
FiO2           1.8464      1.460      1.264      0.206      -1.017       4.710
Temp           0.1927      0.154      1.250      0.211      -0.110       0.495
NISysABP      -0.0097      0.009     -1.122      0.262      -0.027       0.007
Height        -0.0146      0.014     -1.070      0.285      -0.041       0.012
MechVent      23.2663      5.947      3.912      0.000      11.606      34.927
==============================================================================
Omnibus:                     1989.780   Durbin-Watson:                   2.056
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            35745.669
Skew:                           2.892   Prob(JB):                         0.00
Kurtosis:                      18.890   Cond. No.                     2.91e+18
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The smallest eigenvalue is 4.86e-29. This might indicate that there are
strong multicollinearity problems or that the design matrix is singular.
************************************************************

************************************************************

Design Matrix 4
--For Iteration  1
                            OLS Regression Results                            
==============================================================================
Dep. Variable:         Length_of_stay   R-squared:                       0.013
Model:                            OLS   Adj. R-squared:                  0.012
Method:                 Least Squares   F-statistic:                     19.22
Date:                Fri, 01 Nov 2019   Prob (F-statistic):           5.06e-09
Time:                        15:53:12   Log-Likelihood:                -11838.
No. Observations:                3000   AIC:                         2.368e+04
Df Residuals:                    2997   BIC:                         2.370e+04
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      7.0032      2.127      3.293      0.001       2.833      11.174
HR             0.0963      0.016      6.096      0.000       0.065       0.127
MAP           -0.0207      0.020     -1.058      0.290      -0.059       0.018
==============================================================================
Omnibus:                     2344.530   Durbin-Watson:                   2.019
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            64995.321
Skew:                           3.503   Prob(JB):                         0.00
Kurtosis:                      24.699   Cond. No.                     1.11e+03
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.11e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
************************************************************

Design Matrix 4
--For Iteration  2
                            OLS Regression Results                            
==============================================================================
Dep. Variable:         Length_of_stay   R-squared:                       0.018
Model:                            OLS   Adj. R-squared:                  0.017
Method:                 Least Squares   F-statistic:                     13.99
Date:                Fri, 01 Nov 2019   Prob (F-statistic):           2.61e-11
Time:                        15:53:12   Log-Likelihood:                -11766.
No. Observations:                3000   AIC:                         2.354e+04
Df Residuals:                    2995   BIC:                         2.357e+04
Df Model:                           4                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept    -27.6645     12.436     -2.225      0.026     -52.048      -3.281
HR             0.0959      0.016      6.110      0.000       0.065       0.127
Temp           0.8992      0.334      2.689      0.007       0.243       1.555
MAP           -0.0263      0.019     -1.402      0.161      -0.063       0.010
Mg             0.8212      0.741      1.109      0.268      -0.631       2.274
==============================================================================
Omnibus:                     2376.999   Durbin-Watson:                   2.021
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            70465.592
Skew:                           3.551   Prob(JB):                         0.00
Kurtosis:                      25.656   Cond. No.                     6.98e+03
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 6.98e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
************************************************************

Design Matrix 4
--For Iteration  3
                            OLS Regression Results                            
==============================================================================
Dep. Variable:         Length_of_stay   R-squared:                       0.018
Model:                            OLS   Adj. R-squared:                  0.016
Method:                 Least Squares   F-statistic:                     10.96
Date:                Fri, 01 Nov 2019   Prob (F-statistic):           1.78e-10
Time:                        15:53:12   Log-Likelihood:                -11771.
No. Observations:                3000   AIC:                         2.355e+04
Df Residuals:                    2994   BIC:                         2.359e+04
Df Model:                           5                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept    -15.8113     12.751     -1.240      0.215     -40.813       9.191
HR             0.1014      0.016      6.366      0.000       0.070       0.133
Temp           0.6361      0.337      1.890      0.059      -0.024       1.296
MAP           -0.0324      0.019     -1.720      0.086      -0.069       0.005
Mg             1.1163      0.763      1.464      0.143      -0.379       2.612
K             -0.6373      0.488     -1.306      0.192      -1.594       0.319
==============================================================================
Omnibus:                     2258.869   Durbin-Watson:                   2.026
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            55691.132
Skew:                           3.355   Prob(JB):                         0.00
Kurtosis:                      23.013   Cond. No.                     7.15e+03
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 7.15e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
************************************************************

Design Matrix 4
--For Iteration  4
                            OLS Regression Results                            
==============================================================================
Dep. Variable:         Length_of_stay   R-squared:                       0.013
Model:                            OLS   Adj. R-squared:                  0.012
Method:                 Least Squares   F-statistic:                     13.49
Date:                Fri, 01 Nov 2019   Prob (F-statistic):           9.60e-09
Time:                        15:53:12   Log-Likelihood:                -11556.
No. Observations:                3000   AIC:                         2.312e+04
Df Residuals:                    2996   BIC:                         2.314e+04
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      7.5487      2.606      2.896      0.004       2.438      12.659
HR             0.0849      0.014      5.862      0.000       0.057       0.113
MAP           -0.0443      0.019     -2.301      0.021      -0.082      -0.007
Mg             0.9288      0.696      1.334      0.182      -0.437       2.294
==============================================================================
Omnibus:                     1979.821   Durbin-Watson:                   2.038
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            31688.277
Skew:                           2.919   Prob(JB):                         0.00
Kurtosis:                      17.813   Cond. No.                     1.52e+03
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.52e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
************************************************************

Model 1b (regression)

The key features are identified and run in an elastic net regression with regularization in the training in attempt to reduce further the RSME and the adjusted R-Squared

In [107]:
# For Elastic_net Regression
regression_rsme_2b = {}

for key, val in design_matrices.items():
    print("Design Matrix", key)
    
    regression_rsme_2b[key] = {'train': [], 'test':[], 'regression':[], 'adjusted_rsquared':[]}
    
    for iternum, value in val.items():
        df_train_ols = pd.merge(value['X_train'], value['Y_train'], how='left', on='RecordID')
        
        idx = -1
        if iternum == 'Iter1':
            idx = 0
        if iternum == 'Iter2':
            idx = 1
        if iternum == 'Iter3':
            idx = 2
        if iternum == 'Iter4':
            idx = 3
        
        if key == '1':
            features = ['Length_of_stay', "HCO3", "Urine", "HR", "Bilirubin", "BUN", "GCS", "K", "Na", "PaO2", "SysABP", "Temp", "WBC"]
            new_reduced_features = regression_rsme_2a[key]['subfeatures'][idx]

        if key == '2':
            features = ['Length_of_stay', 'MechVent', 'Temp', 'GCS', 'PaO2_FiO2_ratio', 'HR', 'MAP']
            new_reduced_features = regression_rsme_2a[key]['subfeatures'][idx]
        if key == '3':
            features = ['Length_of_stay', 'ALP', 'ALT', 'AST', 'Age', 'Albumin', 'BUN', 'Bilirubin', 'Cholesterol', 'Creatinine', 'DiasABP', 'FiO2', 'GCS', 'Gender','Glucose', 'HCO3', 'HCT', 'HR', 'Height', 'ICUType', 'K', 'Lactate','MAP', 'MechVent', 'Mg', 'NIDiasABP', 'NIMAP', 'NISysABP', 'Na','PaCO2', 'PaO2', 'Platelets', 'RespRate', 'SaO2', 'SysABP','Temp', 'TroponinI', 'TroponinT', 'Urine', 'WBC', 'Weight', 'pH']
            new_reduced_features = regression_rsme_2a[key]['subfeatures'][idx]

        if key == '4':
            features = ['Length_of_stay','HR', 'MAP', 'Temp', 'Na', 'K', 'Mg']
            new_reduced_features = regression_rsme_2a[key]['subfeatures'][idx]


        regression_3b = ols(formula='Length_of_stay ~'+ ' + '.join(new_reduced_features), data=df_train_ols[features]).fit_regularized(method='elastic_net', refit=True)

        Y_pred = regression_3b.predict(value['X_test'])
        RMSE = rmse(value['Y_test']['Length_of_stay'], Y_pred)
        
        regression_rsme_2b[key]['train'].append(math.sqrt((regression_3b.resid**2).mean()))
        regression_rsme_2b[key]['test'].append(RMSE)  
        regression_rsme_2b[key]['regression'].append(regression_3b)
        regression_rsme_2b[key]['adjusted_rsquared'].append(regression_3b.rsquared_adj)

    print("Mean Training RSME:", round(mean(regression_rsme_2b[key]['train']),3))
    print("Mean Test RSME:", round(mean(regression_rsme_2b[key]['test']),3))
    print("Mean Adjusted R-Squared:", round(mean(regression_rsme_2b[key]['adjusted_rsquared']),3))
    print("")   
Design Matrix 1
Mean Training RSME: 11.645
Mean Test RSME: 11.649
Mean Adjusted R-Squared: 0.085

Design Matrix 2
Mean Training RSME: 11.888
Mean Test RSME: 11.883
Mean Adjusted R-Squared: 0.047

Design Matrix 3
Mean Training RSME: 11.443
Mean Test RSME: 17.912
Mean Adjusted R-Squared: 0.112

Design Matrix 4
Mean Training RSME: 12.093
Mean Test RSME: 12.084
Mean Adjusted R-Squared: 0.014

In [108]:
for key, iteration_dict in regression_rsme_2b.items():
    
    #find the highest r-square
    for idx, val in enumerate(iteration_dict['regression']):
        print('************************************************************\n')
        print("Design Matrix", key)
        print("--For Iteration ", idx+1)
        print(regression_rsme_2b[key]['regression'][idx].summary())
    print('************************************************************\n')
************************************************************

Design Matrix 1
--For Iteration  1
                            OLS Regression Results                            
==============================================================================
Dep. Variable:         Length_of_stay   R-squared:                       0.087
Model:                            OLS   Adj. R-squared:                  0.084
Method:                 Least Squares   F-statistic:                     28.51
Date:                Fri, 01 Nov 2019   Prob (F-statistic):           9.21e-53
Time:                        15:53:19   Log-Likelihood:                -11721.
No. Observations:                3000   AIC:                         2.346e+04
Df Residuals:                    2990   BIC:                         2.353e+04
Df Model:                          10                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     27.4505      7.611      3.607      0.000      12.527      42.374
GCS           -0.8017      0.065    -12.423      0.000      -0.928      -0.675
BUN            0.0528      0.011      4.697      0.000       0.031       0.075
Urine          0.0851      0.020      4.172      0.000       0.045       0.125
Bilirubin      0.1852      0.077      2.419      0.016       0.035       0.335
HCO3           0.3204      0.142      2.259      0.024       0.042       0.599
HR             0.0253      0.013      1.953      0.051      -0.000       0.051
Na            -0.0945      0.052     -1.801      0.072      -0.197       0.008
SysABP         0.0163      0.009      1.736      0.083      -0.002       0.035
WBC           -0.0418      0.036     -1.173      0.241      -0.112       0.028
==============================================================================
Omnibus:                     2349.743   Durbin-Watson:                   2.047
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            69140.018
Skew:                           3.490   Prob(JB):                         0.00
Kurtosis:                      25.459   Cond. No.                     7.24e+03
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 7.24e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
************************************************************

Design Matrix 1
--For Iteration  2
                            OLS Regression Results                            
==============================================================================
Dep. Variable:         Length_of_stay   R-squared:                       0.091
Model:                            OLS   Adj. R-squared:                  0.089
Method:                 Least Squares   F-statistic:                     37.59
Date:                Fri, 01 Nov 2019   Prob (F-statistic):           2.64e-57
Time:                        15:53:19   Log-Likelihood:                -11650.
No. Observations:                3000   AIC:                         2.332e+04
Df Residuals:                    2992   BIC:                         2.337e+04
Df Model:                           8                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     30.9554      7.442      4.160      0.000      16.364      45.547
GCS           -0.8054      0.062    -12.929      0.000      -0.928      -0.683
Urine          0.0949      0.020      4.777      0.000       0.056       0.134
BUN            0.0454      0.011      4.183      0.000       0.024       0.067
HCO3           0.4180      0.135      3.092      0.002       0.153       0.683
Na            -0.1133      0.051     -2.230      0.026      -0.213      -0.014
HR             0.0252      0.013      1.993      0.046       0.000       0.050
Bilirubin      0.0964      0.067      1.445      0.149      -0.034       0.227
==============================================================================
Omnibus:                     2348.483   Durbin-Watson:                   2.046
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            70855.349
Skew:                           3.477   Prob(JB):                         0.00
Kurtosis:                      25.770   Cond. No.                     5.88e+03
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 5.88e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
************************************************************

Design Matrix 1
--For Iteration  3
                            OLS Regression Results                            
==============================================================================
Dep. Variable:         Length_of_stay   R-squared:                       0.084
Model:                            OLS   Adj. R-squared:                  0.081
Method:                 Least Squares   F-statistic:                     30.44
Date:                Fri, 01 Nov 2019   Prob (F-statistic):           2.36e-51
Time:                        15:53:19   Log-Likelihood:                -11666.
No. Observations:                3000   AIC:                         2.335e+04
Df Residuals:                    2991   BIC:                         2.341e+04
Df Model:                           9                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     33.6907      8.183      4.117      0.000      17.646      49.735
GCS           -0.7801      0.063    -12.359      0.000      -0.904      -0.656
BUN            0.0484      0.011      4.236      0.000       0.026       0.071
Urine          0.0789      0.020      3.932      0.000       0.040       0.118
HCO3           0.3404      0.139      2.444      0.015       0.067       0.613
HR             0.0277      0.013      2.161      0.031       0.003       0.053
Bilirubin      0.1339      0.075      1.788      0.074      -0.013       0.281
Na            -0.1092      0.053     -2.069      0.039      -0.213      -0.006
K             -0.7514      0.451     -1.664      0.096      -1.637       0.134
==============================================================================
Omnibus:                     2280.993   Durbin-Watson:                   2.044
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            61085.882
Skew:                           3.371   Prob(JB):                         0.00
Kurtosis:                      24.053   Cond. No.                     6.43e+03
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 6.43e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
************************************************************

Design Matrix 1
--For Iteration  4
                            OLS Regression Results                            
==============================================================================
Dep. Variable:         Length_of_stay   R-squared:                       0.086
Model:                            OLS   Adj. R-squared:                  0.083
Method:                 Least Squares   F-statistic:                     25.61
Date:                Fri, 01 Nov 2019   Prob (F-statistic):           2.14e-51
Time:                        15:53:19   Log-Likelihood:                -11441.
No. Observations:                3000   AIC:                         2.291e+04
Df Residuals:                    2989   BIC:                         2.298e+04
Df Model:                          11                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     23.4227      9.458      2.477      0.013       4.878      41.967
GCS           -0.6649      0.059    -11.245      0.000      -0.781      -0.549
BUN            0.0527      0.011      4.912      0.000       0.032       0.074
Urine          0.0910      0.019      4.870      0.000       0.054       0.128
HCO3           0.4165      0.136      3.068      0.002       0.150       0.683
Bilirubin      0.1684      0.071      2.356      0.019       0.028       0.309
Na            -0.1101      0.049     -2.249      0.025      -0.206      -0.014
HR             0.0198      0.012      1.649      0.099      -0.004       0.043
WBC           -0.0511      0.032     -1.595      0.111      -0.114       0.012
Temp           0.2262      0.155      1.459      0.145      -0.078       0.530
K             -0.5528      0.412     -1.342      0.180      -1.360       0.255
==============================================================================
Omnibus:                     1987.992   Durbin-Watson:                   2.069
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            34597.532
Skew:                           2.902   Prob(JB):                         0.00
Kurtosis:                      18.592   Cond. No.                     8.22e+03
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 8.22e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
************************************************************

************************************************************

Design Matrix 2
--For Iteration  1
                            OLS Regression Results                            
==============================================================================
Dep. Variable:         Length_of_stay   R-squared:                       0.051
Model:                            OLS   Adj. R-squared:                  0.049
Method:                 Least Squares   F-statistic:                     26.69
Date:                Fri, 01 Nov 2019   Prob (F-statistic):           3.94e-31
Time:                        15:53:19   Log-Likelihood:                -11779.
No. Observations:                3000   AIC:                         2.357e+04
Df Residuals:                    2994   BIC:                         2.361e+04
Df Model:                           6                                         
Covariance Type:            nonrobust                                         
===================================================================================
                      coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------------------------------------------------
Intercept          10.5281      1.284      8.199      0.000       8.010      13.046
GCS                 4.8230      0.638      7.559      0.000       3.572       6.074
MechVent            2.2615      0.536      4.222      0.000       1.211       3.312
MAP                 1.5135      0.690      2.192      0.028       0.160       2.867
Temp                0.8997      0.440      2.046      0.041       0.038       1.762
PaO2_FiO2_ratio    -0.8218      0.428     -1.920      0.055      -1.661       0.018
==============================================================================
Omnibus:                     2332.958   Durbin-Watson:                   2.035
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            64869.478
Skew:                           3.475   Prob(JB):                         0.00
Kurtosis:                      24.695   Cond. No.                         18.5
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
************************************************************

Design Matrix 2
--For Iteration  2
                            OLS Regression Results                            
==============================================================================
Dep. Variable:         Length_of_stay   R-squared:                       0.057
Model:                            OLS   Adj. R-squared:                  0.055
Method:                 Least Squares   F-statistic:                     29.92
Date:                Fri, 01 Nov 2019   Prob (F-statistic):           5.13e-35
Time:                        15:53:19   Log-Likelihood:                -11706.
No. Observations:                3000   AIC:                         2.343e+04
Df Residuals:                    2994   BIC:                         2.347e+04
Df Model:                           6                                         
Covariance Type:            nonrobust                                         
===================================================================================
                      coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------------------------------------------------
Intercept           9.8588      1.258      7.834      0.000       7.391      12.326
GCS                 4.8765      0.615      7.931      0.000       3.671       6.082
MechVent            2.4777      0.518      4.783      0.000       1.462       3.493
MAP                 2.3706      0.687      3.452      0.001       1.024       3.717
Temp                0.8477      0.422      2.007      0.045       0.020       1.676
PaO2_FiO2_ratio    -0.6499      0.415     -1.567      0.117      -1.463       0.163
==============================================================================
Omnibus:                     2316.539   Durbin-Watson:                   2.050
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            65452.533
Skew:                           3.430   Prob(JB):                         0.00
Kurtosis:                      24.830   Cond. No.                         18.6
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
************************************************************

Design Matrix 2
--For Iteration  3
                            OLS Regression Results                            
==============================================================================
Dep. Variable:         Length_of_stay   R-squared:                       0.045
Model:                            OLS   Adj. R-squared:                  0.043
Method:                 Least Squares   F-statistic:                     23.40
Date:                Fri, 01 Nov 2019   Prob (F-statistic):           3.73e-27
Time:                        15:53:19   Log-Likelihood:                -11729.
No. Observations:                3000   AIC:                         2.347e+04
Df Residuals:                    2994   BIC:                         2.351e+04
Df Model:                           6                                         
Covariance Type:            nonrobust                                         
===================================================================================
                      coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------------------------------------------------
Intercept          11.0613      1.274      8.682      0.000       8.563      13.560
GCS                 4.6966      0.625      7.516      0.000       3.471       5.922
MechVent            1.9770      0.529      3.735      0.000       0.939       3.015
MAP                 2.0944      0.695      3.014      0.003       0.732       3.457
PaO2_FiO2_ratio    -0.6450      0.422     -1.530      0.126      -1.472       0.182
Temp                0.5165      0.422      1.224      0.221      -0.311       1.344
==============================================================================
Omnibus:                     2250.450   Durbin-Watson:                   2.047
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            55864.167
Skew:                           3.333   Prob(JB):                         0.00
Kurtosis:                      23.062   Cond. No.                         18.7
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
************************************************************

Design Matrix 2
--For Iteration  4
                            OLS Regression Results                            
==============================================================================
Dep. Variable:         Length_of_stay   R-squared:                       0.042
Model:                            OLS   Adj. R-squared:                  0.041
Method:                 Least Squares   F-statistic:                     26.33
Date:                Fri, 01 Nov 2019   Prob (F-statistic):           4.11e-26
Time:                        15:53:19   Log-Likelihood:                -11512.
No. Observations:                3000   AIC:                         2.304e+04
Df Residuals:                    2995   BIC:                         2.307e+04
Df Model:                           5                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      9.8265      0.910     10.796      0.000       8.042      11.611
GCS            4.5947      0.580      7.917      0.000       3.457       5.733
MechVent       1.9559      0.457      4.280      0.000       1.060       2.852
MAP            1.0165      0.650      1.563      0.118      -0.258       2.291
Temp           0.5583      0.387      1.443      0.149      -0.200       1.317
==============================================================================
Omnibus:                     1965.349   Durbin-Watson:                   2.060
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            31630.078
Skew:                           2.886   Prob(JB):                         0.00
Kurtosis:                      17.823   Cond. No.                         12.7
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
************************************************************

************************************************************

Design Matrix 3
--For Iteration  1
                            OLS Regression Results                            
==============================================================================
Dep. Variable:         Length_of_stay   R-squared:                       0.123
Model:                            OLS   Adj. R-squared:                  0.116
Method:                 Least Squares   F-statistic:                     17.43
Date:                Fri, 01 Nov 2019   Prob (F-statistic):           2.24e-68
Time:                        15:53:19   Log-Likelihood:                -11660.
No. Observations:                3000   AIC:                         2.337e+04
Df Residuals:                    2976   BIC:                         2.352e+04
Df Model:                          24                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.7584     33.096      0.023      0.982     -64.136      65.653
GCS           -0.7530      0.066    -11.364      0.000      -0.883      -0.623
Albumin       -2.5807      0.517     -4.995      0.000      -3.594      -1.568
ICUType        1.2743      0.233      5.480      0.000       0.818       1.730
BUN            0.0588      0.012      4.878      0.000       0.035       0.082
HCT           -0.1695      0.049     -3.449      0.001      -0.266      -0.073
Lactate       -0.7212      0.198     -3.649      0.000      -1.109      -0.334
DiasABP        0.0527      0.017      3.159      0.002       0.020       0.085
Bilirubin      0.2152      0.078      2.774      0.006       0.063       0.367
Na            -0.1016      0.053     -1.917      0.055      -0.206       0.002
SaO2          -0.2082      0.087     -2.387      0.017      -0.379      -0.037
Weight         0.0199      0.010      2.030      0.042       0.001       0.039
Urine         -0.0029      0.001     -2.179      0.029      -0.006      -0.000
Height        -0.0241      0.014     -1.710      0.087      -0.052       0.004
HR             0.0229      0.013      1.754      0.080      -0.003       0.049
pH             8.7500      4.216      2.075      0.038       0.483      17.017
PaO2           0.0082      0.005      1.630      0.103      -0.002       0.018
Glucose        0.0083      0.005      1.504      0.133      -0.003       0.019
NISysABP      -0.0125      0.009     -1.331      0.183      -0.031       0.006
RespRate      -0.1039      0.077     -1.352      0.176      -0.255       0.047
ALP            0.0042      0.003      1.384      0.166      -0.002       0.010
AST           -0.0005      0.000     -1.314      0.189      -0.001       0.000
Age           -0.0168      0.014     -1.206      0.228      -0.044       0.010
Platelets      0.0022      0.002      1.003      0.316      -0.002       0.006
==============================================================================
Omnibus:                     2332.457   Durbin-Watson:                   2.033
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            70740.713
Skew:                           3.437   Prob(JB):                         0.00
Kurtosis:                      25.774   Cond. No.                     1.02e+05
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.02e+05. This might indicate that there are
strong multicollinearity or other numerical problems.
************************************************************

Design Matrix 3
--For Iteration  2
                            OLS Regression Results                            
==============================================================================
Dep. Variable:         Length_of_stay   R-squared:                       0.123
Model:                            OLS   Adj. R-squared:                  0.117
Method:                 Least Squares   F-statistic:                     19.92
Date:                Fri, 01 Nov 2019   Prob (F-statistic):           2.98e-70
Time:                        15:53:19   Log-Likelihood:                -11597.
No. Observations:                3000   AIC:                         2.324e+04
Df Residuals:                    2979   BIC:                         2.337e+04
Df Model:                          21                                         
Covariance Type:            nonrobust                                         
===============================================================================
                  coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------
Intercept      75.5846     12.876      5.870      0.000      50.339     100.831
GCS            -0.7791      0.063    -12.278      0.000      -0.904      -0.655
Albumin        -2.5783      0.506     -5.096      0.000      -3.570      -1.586
ICUType         1.2870      0.230      5.598      0.000       0.836       1.738
BUN             0.0525      0.011      4.713      0.000       0.031       0.074
Na             -0.1175      0.052     -2.259      0.024      -0.219      -0.016
HCT            -0.1431      0.049     -2.947      0.003      -0.238      -0.048
Lactate        -0.6505      0.174     -3.743      0.000      -0.991      -0.310
RespRate       -0.2222      0.070     -3.195      0.001      -0.359      -0.086
DiasABP         0.0429      0.017      2.536      0.011       0.010       0.076
NISysABP       -0.0195      0.009     -2.084      0.037      -0.038      -0.001
SaO2           -0.2406      0.111     -2.173      0.030      -0.458      -0.024
HR              0.0229      0.013      1.791      0.073      -0.002       0.048
Bilirubin       0.1235      0.067      1.857      0.063      -0.007       0.254
Glucose         0.0106      0.005      2.027      0.043       0.000       0.021
Urine          -0.0028      0.001     -2.015      0.044      -0.006   -7.64e-05
Age            -0.0172      0.013     -1.291      0.197      -0.043       0.009
PaCO2          -0.0422      0.029     -1.446      0.148      -0.099       0.015
Cholesterol    -0.0157      0.013     -1.227      0.220      -0.041       0.009
TroponinI      -0.1076      0.087     -1.233      0.218      -0.279       0.064
Platelets       0.0026      0.002      1.230      0.219      -0.002       0.007
==============================================================================
Omnibus:                     2333.061   Durbin-Watson:                   2.030
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            72716.594
Skew:                           3.428   Prob(JB):                         0.00
Kurtosis:                      26.124   Cond. No.                     2.38e+04
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 2.38e+04. This might indicate that there are
strong multicollinearity or other numerical problems.
************************************************************

Design Matrix 3
--For Iteration  3
                            OLS Regression Results                            
==============================================================================
Dep. Variable:         Length_of_stay   R-squared:                       0.119
Model:                            OLS   Adj. R-squared:                  0.114
Method:                 Least Squares   F-statistic:                     20.15
Date:                Fri, 01 Nov 2019   Prob (F-statistic):           4.24e-68
Time:                        15:53:19   Log-Likelihood:                -11608.
No. Observations:                3000   AIC:                         2.326e+04
Df Residuals:                    2980   BIC:                         2.338e+04
Df Model:                          20                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     78.8420     11.564      6.818      0.000      56.168     101.516
GCS           -0.7138      0.065    -10.950      0.000      -0.842      -0.586
Albumin       -3.9431      0.552     -7.142      0.000      -5.026      -2.861
ICUType        0.9368      0.228      4.112      0.000       0.490       1.384
BUN            0.0515      0.012      4.459      0.000       0.029       0.074
HCT           -0.1171      0.048     -2.453      0.014      -0.211      -0.023
Age           -0.0491      0.013     -3.674      0.000      -0.075      -0.023
Urine         -0.0038      0.001     -2.775      0.006      -0.007      -0.001
Lactate       -0.5052      0.202     -2.499      0.012      -0.902      -0.109
SaO2          -0.2152      0.082     -2.615      0.009      -0.377      -0.054
Na            -0.1144      0.053     -2.151      0.032      -0.219      -0.010
Bilirubin      0.1505      0.074      2.034      0.042       0.005       0.295
PaO2           0.0072      0.005      1.450      0.147      -0.003       0.017
SysABP         0.0153      0.009      1.652      0.099      -0.003       0.034
HR             0.0225      0.013      1.744      0.081      -0.003       0.048
NIMAP         -0.0200      0.016     -1.285      0.199      -0.051       0.011
Height        -0.0185      0.014     -1.311      0.190      -0.046       0.009
AST           -0.0004      0.000     -1.083      0.279      -0.001       0.000
RespRate      -0.0759      0.070     -1.090      0.276      -0.212       0.061
K             -0.4540      0.448     -1.013      0.311      -1.332       0.424
==============================================================================
Omnibus:                     2250.051   Durbin-Watson:                   2.047
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            60211.632
Skew:                           3.300   Prob(JB):                         0.00
Kurtosis:                      23.931   Cond. No.                     3.33e+04
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 3.33e+04. This might indicate that there are
strong multicollinearity or other numerical problems.
************************************************************

Design Matrix 3
--For Iteration  4
                            OLS Regression Results                            
==============================================================================
Dep. Variable:         Length_of_stay   R-squared:                       0.108
Model:                            OLS   Adj. R-squared:                  0.101
Method:                 Least Squares   F-statistic:                     16.33
Date:                Fri, 01 Nov 2019   Prob (F-statistic):           7.23e-59
Time:                        15:53:19   Log-Likelihood:                -11406.
No. Observations:                3000   AIC:                         2.286e+04
Df Residuals:                    2978   BIC:                         2.300e+04
Df Model:                          22                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     23.2663      5.948      3.912      0.000      11.604      34.929
GCS           -0.6593      0.060    -11.029      0.000      -0.776      -0.542
Albumin       -1.9552      0.486     -4.024      0.000      -2.908      -1.003
BUN            0.0486      0.011      4.612      0.000       0.028       0.069
ICUType        1.1535      0.212      5.446      0.000       0.738       1.569
HCT           -0.1452      0.044     -3.269      0.001      -0.232      -0.058
ALP            0.0095      0.003      3.155      0.002       0.004       0.015
Weight         0.0208      0.009      2.422      0.015       0.004       0.038
Na            -0.1076      0.049     -2.218      0.027      -0.203      -0.012
Lactate       -0.4777      0.197     -2.429      0.015      -0.863      -0.092
Urine         -0.0020      0.001     -1.612      0.107      -0.004       0.000
RespRate      -0.1202      0.064     -1.865      0.062      -0.247       0.006
HR             0.0218      0.012      1.808      0.071      -0.002       0.045
Bilirubin      0.1223      0.072      1.703      0.089      -0.018       0.263
SaO2          -0.1135      0.077     -1.472      0.141      -0.265       0.038
SysABP         0.0123      0.009      1.425      0.154      -0.005       0.029
WBC           -0.0451      0.032     -1.409      0.159      -0.108       0.018
FiO2           1.8464      1.460      1.264      0.206      -1.017       4.710
Temp           0.1927      0.154      1.250      0.212      -0.110       0.495
NISysABP      -0.0097      0.009     -1.122      0.262      -0.027       0.007
Height        -0.0146      0.014     -1.069      0.285      -0.041       0.012
MechVent      23.2663      5.948      3.912      0.000      11.604      34.929
==============================================================================
Omnibus:                     1989.780   Durbin-Watson:                   2.056
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            35745.669
Skew:                           2.892   Prob(JB):                         0.00
Kurtosis:                      18.890   Cond. No.                     7.25e+09
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The smallest eigenvalue is 7.85e-12. This might indicate that there are
strong multicollinearity problems or that the design matrix is singular.
************************************************************

************************************************************

Design Matrix 4
--For Iteration  1
                            OLS Regression Results                            
==============================================================================
Dep. Variable:         Length_of_stay   R-squared:                       0.013
Model:                            OLS   Adj. R-squared:                  0.012
Method:                 Least Squares   F-statistic:                     12.82
Date:                Fri, 01 Nov 2019   Prob (F-statistic):           2.55e-08
Time:                        15:53:19   Log-Likelihood:                -11838.
No. Observations:                3000   AIC:                         2.368e+04
Df Residuals:                    2997   BIC:                         2.371e+04
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      7.0032      2.127      3.293      0.001       2.833      11.174
HR             0.0963      0.016      6.096      0.000       0.065       0.127
MAP           -0.0207      0.020     -1.058      0.290      -0.059       0.018
==============================================================================
Omnibus:                     2344.530   Durbin-Watson:                   2.019
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            64995.321
Skew:                           3.503   Prob(JB):                         0.00
Kurtosis:                      24.699   Cond. No.                     1.11e+03
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.11e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
************************************************************

Design Matrix 4
--For Iteration  2
                            OLS Regression Results                            
==============================================================================
Dep. Variable:         Length_of_stay   R-squared:                       0.018
Model:                            OLS   Adj. R-squared:                  0.017
Method:                 Least Squares   F-statistic:                     11.19
Date:                Fri, 01 Nov 2019   Prob (F-statistic):           1.05e-10
Time:                        15:53:19   Log-Likelihood:                -11766.
No. Observations:                3000   AIC:                         2.354e+04
Df Residuals:                    2995   BIC:                         2.358e+04
Df Model:                           5                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept    -27.6645     12.436     -2.225      0.026     -52.048      -3.281
HR             0.0959      0.016      6.110      0.000       0.065       0.127
Temp           0.8992      0.334      2.689      0.007       0.243       1.555
MAP           -0.0263      0.019     -1.402      0.161      -0.063       0.010
Mg             0.8212      0.741      1.109      0.268      -0.631       2.274
==============================================================================
Omnibus:                     2376.999   Durbin-Watson:                   2.021
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            70465.592
Skew:                           3.551   Prob(JB):                         0.00
Kurtosis:                      25.656   Cond. No.                     6.98e+03
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 6.98e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
************************************************************

Design Matrix 4
--For Iteration  3
                            OLS Regression Results                            
==============================================================================
Dep. Variable:         Length_of_stay   R-squared:                       0.018
Model:                            OLS   Adj. R-squared:                  0.016
Method:                 Least Squares   F-statistic:                     9.136
Date:                Fri, 01 Nov 2019   Prob (F-statistic):           6.25e-10
Time:                        15:53:19   Log-Likelihood:                -11771.
No. Observations:                3000   AIC:                         2.356e+04
Df Residuals:                    2994   BIC:                         2.360e+04
Df Model:                           6                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept    -15.8113     12.751     -1.240      0.215     -40.813       9.191
HR             0.1014      0.016      6.366      0.000       0.070       0.133
Temp           0.6361      0.337      1.890      0.059      -0.024       1.296
MAP           -0.0324      0.019     -1.720      0.086      -0.069       0.005
Mg             1.1163      0.763      1.464      0.143      -0.379       2.612
K             -0.6373      0.488     -1.306      0.192      -1.594       0.319
==============================================================================
Omnibus:                     2258.869   Durbin-Watson:                   2.026
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            55691.132
Skew:                           3.355   Prob(JB):                         0.00
Kurtosis:                      23.013   Cond. No.                     7.15e+03
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 7.15e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
************************************************************

Design Matrix 4
--For Iteration  4
                            OLS Regression Results                            
==============================================================================
Dep. Variable:         Length_of_stay   R-squared:                       0.013
Model:                            OLS   Adj. R-squared:                  0.012
Method:                 Least Squares   F-statistic:                     10.12
Date:                Fri, 01 Nov 2019   Prob (F-statistic):           3.90e-08
Time:                        15:53:19   Log-Likelihood:                -11556.
No. Observations:                3000   AIC:                         2.312e+04
Df Residuals:                    2996   BIC:                         2.315e+04
Df Model:                           4                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      7.5487      2.606      2.896      0.004       2.438      12.659
HR             0.0849      0.014      5.862      0.000       0.057       0.113
MAP           -0.0443      0.019     -2.301      0.021      -0.082      -0.007
Mg             0.9288      0.696      1.334      0.182      -0.437       2.294
==============================================================================
Omnibus:                     1979.821   Durbin-Watson:                   2.038
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            31688.277
Skew:                           2.919   Prob(JB):                         0.00
Kurtosis:                      17.813   Cond. No.                     1.52e+03
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.52e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
************************************************************

Unfortunately, the model 1b does not give us any improvements in performance. Instead, it becomes worst than the model 1a.

Model 2 (regression)

As such, we retried will a multi linear regression model as follows:

In [109]:
# Model 2 (regression)
# Multiple Linear Regression
regression_rsme = {}
for key, val in design_matrices.items():
    print("Design Matrix", key)
    
    regression_rsme[key] = {'train': [], 'test':[], 'regression': [], 'adjusted_rsquared':[]}
    
    for iternum, value in val.items():      
        df_train_ols = pd.merge(value['X_train'], value['Y_train'], how='left', on='RecordID')

        if key == '1':
            features = ['Length_of_stay', "HCO3", "Urine", "HR", "Bilirubin", "BUN", "GCS", "K", "Na", "PaO2", "SysABP", "Temp", "WBC"]
            sub_features = ["HCO3", "Urine", "HR", "Bilirubin", "BUN", "GCS", "K", "Na", "PaO2", "SysABP", "Temp", "WBC"]

        if key == '2':
            features = ['Length_of_stay', 'MechVent', 'Temp', 'GCS', 'PaO2_FiO2_ratio', 'HR', 'MAP']
            sub_features = ['MechVent', 'Temp', 'GCS', 'PaO2_FiO2_ratio', 'HR', 'MAP']

        if key == '3':
            features = ['Length_of_stay', 'ALP', 'ALT', 'AST', 'Age', 'Albumin', 'BUN', 'Bilirubin', 'Cholesterol', 'Creatinine', 'DiasABP', 'FiO2', 'GCS', 'Gender','Glucose', 'HCO3', 'HCT', 'HR', 'Height', 'ICUType', 'K', 'Lactate','MAP', 'MechVent', 'Mg', 'NIDiasABP', 'NIMAP', 'NISysABP', 'Na','PaCO2', 'PaO2', 'Platelets', 'RespRate', 'SaO2', 'SysABP','Temp', 'TroponinI', 'TroponinT', 'Urine', 'WBC', 'Weight', 'pH']
            sub_features = ['ALP', 'ALT', 'AST', 'Age', 'Albumin', 'BUN', 'Bilirubin', 'Cholesterol', 'Creatinine', 'DiasABP', 'FiO2', 'GCS', 'Gender','Glucose', 'HCO3', 'HCT', 'HR', 'Height', 'ICUType', 'K', 'Lactate','MAP', 'MechVent', 'Mg', 'NIDiasABP', 'NIMAP', 'NISysABP', 'Na','PaCO2', 'PaO2', 'Platelets', 'RespRate', 'SaO2', 'SysABP','Temp', 'TroponinI', 'TroponinT', 'Urine', 'WBC', 'Weight', 'pH']

        if key == '4':
            features = ['Length_of_stay','HR', 'MAP', 'Temp', 'Na', 'K', 'Mg']
            sub_features = ['HR', 'MAP', 'Temp', 'Na', 'K', 'Mg']

        multlin = ols(formula='Length_of_stay ~'+ ' + '.join(sub_features), data=df_train_ols[features]).fit()

        Y_pred = multlin.predict(value['X_test'])
        RMSE = rmse(value['Y_test']['Length_of_stay'], Y_pred)
        
        regression_rsme[key]['train'].append(math.sqrt((multlin.resid**2).mean()))
        regression_rsme[key]['test'].append(RMSE)
        regression_rsme[key]['regression'].append(multlin)
        regression_rsme[key]['adjusted_rsquared'].append(multlin.rsquared_adj)
        
    print("Mean Training RSME:", round(mean(regression_rsme[key]['train']),3))
    print("Mean Test RSME:", round(mean(regression_rsme[key]['test']),3))
    print("Mean Adjusted R-Squared:", round(mean(regression_rsme[key]['adjusted_rsquared']),3))
    print("")
Design Matrix 1
Mean Training RSME: 11.643
Mean Test RSME: 11.647
Mean Adjusted R-Squared: 0.084

Design Matrix 2
Mean Training RSME: 11.888
Mean Test RSME: 11.881
Mean Adjusted R-Squared: 0.047

Design Matrix 3
Mean Training RSME: 11.432
Mean Test RSME: 19.795
Mean Adjusted R-Squared: 0.108

Design Matrix 4
Mean Training RSME: 12.092
Mean Test RSME: 12.078
Mean Adjusted R-Squared: 0.014

Model 2's multilinear regression model performed better than the combination of Model 1a and 1b (Forward Stepwise Regression + Elastic Net Regression)

Classification Model Building and Evaluation

In [110]:
from sklearn.impute import SimpleImputer
from sklearn.feature_selection import SelectKBest, VarianceThreshold
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score, roc_auc_score
In [111]:
estimators = [] #Store the list of estimators
hyperparams_list = [] #Store the list of hyperparams
auc_scores = {} #Store a dictionary of auc scores over the 4 design matrices

Model 1 (classification) - model building

Try decision tree model for classfication.

In [112]:
model_1_preprocessor = Pipeline(steps=[('f_scaler', StandardScaler(with_mean=True))])
In [113]:
# Model 1 (classification) - building
# Decision Tree Classifier
classifer_1_dtc = Pipeline([('preprocessor', model_1_preprocessor),
                ('dtc', DecisionTreeClassifier())])

hparam_grid_classifer_1_dtc = {'dtc__random_state':[0,1,2,3,4,5]} 

estimators.append(classifer_1_dtc)
hyperparams_list.append(hparam_grid_classifer_1_dtc)

Model 2 (classification) - model building

A basic logistic regresison pipeline preprocess will standard scalar to normalise the data, followed by select K best to retain features with the highest scores and PCA to optimise the features that can explain the most of the data.

In [114]:
# Model 2 (classification) - building
# Logistic Regression
classifer_1_lr_1 = Pipeline([('preprocessor', model_1_preprocessor),
                             ('select_best', SelectKBest(k='all')),
                             ('pca', PCA()),
                             ('classifier_LR', LogisticRegression(solver='lbfgs', random_state=0))])
                
hparam_grid_classifer_1_lr_1 = {'classifier_LR__C': [0.5, 1.0, 1.5, 2.0, 2.5, 3.0],'pca__tol': [0.5, 1.0, 1.5, 2.0], 'pca__random_state': [0,1,2,3,4,5]}

estimators.append(classifer_1_lr_1)
hyperparams_list.append(hparam_grid_classifer_1_lr_1)

Other models (classification) - model building

Other models we tried to outperform the models above:

In [115]:
# K-Neighbors Classifier
classifer_1_knc_1 = Pipeline([('preprocessor', model_1_preprocessor),
                             ('select_best', SelectKBest(k='all')),
                             ('pca', PCA()),
                             ('knc', KNeighborsClassifier(n_neighbors=3))])

hparam_grid_classifer_1_knc_1 = {'knc__n_neighbors': [25,50], 'select_best__k': [1, 2, 3, 4, 5],'pca__tol': [0.5, 1.0, 1.5, 2.0], 'pca__random_state': [0,1,2,3,4,5]}


estimators.append(classifer_1_knc_1)
hyperparams_list.append(hparam_grid_classifer_1_knc_1)
In [116]:
# Random Forest Classifier
classifer_4 = Pipeline(steps=[('f_selecter_kbest', SelectKBest(k='all')),
                       ('pca', PCA()),
                       ('rf', RandomForestClassifier())])

hparam_grid_classifer_4 = {'rf__max_depth': [2,3,4,5], 'rf__n_estimators': [10, 20, 30, 40, 50], 'rf__random_state': [0,1,2,3,4,5]}

estimators.append(classifer_4)
hyperparams_list.append(hparam_grid_classifer_4)

Classification Model - Evaluation

In [117]:
t1 = datetime.now()
print("Started Data Extraction at", t1.strftime('%Y-%m-%d %H:%M:%S'))

for key, val in design_matrices.items():
    print('For Design Matrix', key)
    
    auc_scores[key] = {}
    if key == '1':
        features = ["HCO3", "Urine", "HR", "Bilirubin", "BUN", "GCS", "K", "Na", "PaO2", "SysABP", "Temp", "WBC"]

    if key == '2':
        features = ['MechVent', 'Temp', 'GCS', 'PaO2_FiO2_ratio', 'HR', 'MAP']

    if key == '3':
        features = ['ALP', 'ALT', 'AST', 'Age', 'Albumin', 'BUN', 'Bilirubin', 'Cholesterol', 'Creatinine', 'DiasABP', 'FiO2', 'GCS', 'Gender','Glucose', 'HCO3', 'HCT', 'HR', 'Height', 'ICUType', 'K', 'Lactate','MAP', 'MechVent', 'Mg', 'NIDiasABP', 'NIMAP', 'NISysABP', 'Na','PaCO2', 'PaO2', 'Platelets', 'RespRate', 'SaO2', 'SysABP','Temp', 'TroponinI', 'TroponinT', 'Urine', 'WBC', 'Weight', 'pH']

    if key == '4':
        features = ['HR', 'MAP', 'Temp', 'Na', 'K', 'Mg']

    for index, est in enumerate(estimators):
        auc_scores[key][index] = {'train': [], 'test': [], 'accuracy': [], 'precision': [], 'recall': [], 'classifier':[]}
        
        for iternum, value in val.items():
            estimator = GridSearchCV(est, hyperparams_list[index], cv=3, scoring='roc_auc')
            estimator.fit(value['X_train'][features], value['Y_train']['In-hospital_death'])
            #print("\nFor Classifier #", index+1)
            #print('Best hyperparameter:', estimator.best_params_)
            #print('Best Training AUC score:', round(estimator.best_score_,3))
            best_est = estimator.best_estimator_
            scores = cross_val_score(best_est, value['X_train'][features], value['Y_train']['In-hospital_death'], cv=3, scoring='roc_auc')

            Y_pred = best_est.predict(value['X_test'][features])

            #Y_pred = estimator.predict(value['X_test'][features])
            #print("Test AUC Score ", round(roc_auc_score(value['Y_test']['In-hospital_death'], Y_pred),3))
            #print("Test accuracy = {:.2%}".format(accuracy_score(value['Y_test']['In-hospital_death'], Y_pred)))
            #print("Precision: {:.2%}; Recall: {:.2%}".format(precision_score(value['Y_test']['In-hospital_death'], Y_pred), 
            #                                             recall_score(value['Y_test']['In-hospital_death'], Y_pred)))
            
            auc_scores[key][index]['train'].append(np.mean(scores))
            #auc_scores[key][index]['train'].append(round(estimator.best_score_,3))
            auc_scores[key][index]['test'].append(roc_auc_score(value['Y_test']['In-hospital_death'], Y_pred))
            auc_scores[key][index]['accuracy'].append(accuracy_score(value['Y_test']['In-hospital_death'], Y_pred))
            auc_scores[key][index]['precision'].append(precision_score(value['Y_test']['In-hospital_death'], Y_pred))
            auc_scores[key][index]['recall'].append(recall_score(value['Y_test']['In-hospital_death'], Y_pred))
            
            auc_scores[key][index]['classifier'].append(estimator)   
            t2 = datetime.now()
            diff = t2-t1
            
        print("Classifier", index+1, "- Mean Training AUC: ", round(mean(auc_scores[key][index]['train']), 3))
        print("Classifier", index+1, "- Mean Test AUC: ", round(mean(auc_scores[key][index]['test']),3))
        print("Classifier", index+1, "- Mean Accuracy Score: {:.2%}".format(mean(auc_scores[key][index]['accuracy'])))
        print("Classifier", index+1, "- Mean Precision Score: {:.2%}".format(mean(auc_scores[key][index]['precision'])))
        print("Classifier", index+1, "- Mean Recall Score: {:.2%}".format(mean(auc_scores[key][index]['recall'])))                                  
    t2 = datetime.now()
    diff = t2-t1
    #print("Design Matrix", key, "is done.\nTook about", diff.seconds, "seconds from start")
    print('************************************************************\n')

t2 = datetime.now()
diff = t2-t1
#print("\nEntire Data Extraction completed after", diff.seconds, "seconds from start")
Started Data Extraction at 2019-11-01 15:53:41
For Design Matrix 1
Classifier 1 - Mean Training AUC:  0.61
Classifier 1 - Mean Test AUC:  0.625
Classifier 1 - Mean Accuracy Score: 81.50%
Classifier 1 - Mean Precision Score: 33.93%
Classifier 1 - Mean Recall Score: 36.15%
Classifier 2 - Mean Training AUC:  0.805
Classifier 2 - Mean Test AUC:  0.583
Classifier 2 - Mean Accuracy Score: 86.92%
Classifier 2 - Mean Precision Score: 59.72%
Classifier 2 - Mean Recall Score: 18.77%
Classifier 3 - Mean Training AUC:  0.794
Classifier 3 - Mean Test AUC:  0.532
Classifier 3 - Mean Accuracy Score: 86.80%
Classifier 3 - Mean Precision Score: 80.87%
Classifier 3 - Mean Recall Score: 6.68%
Classifier 4 - Mean Training AUC:  0.809
Classifier 4 - Mean Test AUC:  0.528
Classifier 4 - Mean Accuracy Score: 86.30%
Classifier 4 - Mean Precision Score: 54.05%
Classifier 4 - Mean Recall Score: 6.47%
************************************************************

For Design Matrix 2
Classifier 1 - Mean Training AUC:  0.686
Classifier 1 - Mean Test AUC:  0.529
Classifier 1 - Mean Accuracy Score: 85.97%
Classifier 1 - Mean Precision Score: 46.10%
Classifier 1 - Mean Recall Score: 7.18%
Classifier 2 - Mean Training AUC:  0.689
Classifier 2 - Mean Test AUC:  0.509
Classifier 2 - Mean Accuracy Score: 86.20%
Classifier 2 - Mean Precision Score: 64.55%
Classifier 2 - Mean Recall Score: 2.10%
Classifier 3 - Mean Training AUC:  0.688
Classifier 3 - Mean Test AUC:  0.506
Classifier 3 - Mean Accuracy Score: 86.00%
Classifier 3 - Mean Precision Score: 18.75%
Classifier 3 - Mean Recall Score: 1.68%
Classifier 4 - Mean Training AUC:  0.704
Classifier 4 - Mean Test AUC:  0.51
Classifier 4 - Mean Accuracy Score: 85.85%
Classifier 4 - Mean Precision Score: 32.33%
Classifier 4 - Mean Recall Score: 2.73%
************************************************************

For Design Matrix 3
Classifier 1 - Mean Training AUC:  0.624
Classifier 1 - Mean Test AUC:  0.631
Classifier 1 - Mean Accuracy Score: 81.05%
Classifier 1 - Mean Precision Score: 33.85%
Classifier 1 - Mean Recall Score: 38.33%
Classifier 2 - Mean Training AUC:  0.817
Classifier 2 - Mean Test AUC:  0.613
Classifier 2 - Mean Accuracy Score: 87.45%
Classifier 2 - Mean Precision Score: 61.67%
Classifier 2 - Mean Recall Score: 25.16%
Classifier 3 - Mean Training AUC:  0.797
Classifier 3 - Mean Test AUC:  0.539
Classifier 3 - Mean Accuracy Score: 86.85%
Classifier 3 - Mean Precision Score: 68.69%
Classifier 3 - Mean Recall Score: 8.38%
Classifier 4 - Mean Training AUC:  0.805
Classifier 4 - Mean Test AUC:  0.512
Classifier 4 - Mean Accuracy Score: 86.40%
Classifier 4 - Mean Precision Score: 80.21%
Classifier 4 - Mean Recall Score: 2.48%
************************************************************

For Design Matrix 4
Classifier 1 - Mean Training AUC:  0.532
Classifier 1 - Mean Test AUC:  0.528
Classifier 1 - Mean Accuracy Score: 75.83%
Classifier 1 - Mean Precision Score: 18.22%
Classifier 1 - Mean Recall Score: 20.92%
Classifier 2 - Mean Training AUC:  0.587
Classifier 2 - Mean Test AUC:  0.499
Classifier 2 - Mean Accuracy Score: 86.02%
Classifier 2 - Mean Precision Score: 0.00%
Classifier 2 - Mean Recall Score: 0.00%
Classifier 3 - Mean Training AUC:  0.613
Classifier 3 - Mean Test AUC:  0.5
Classifier 3 - Mean Accuracy Score: 86.15%
Classifier 3 - Mean Precision Score: 0.00%
Classifier 3 - Mean Recall Score: 0.00%
Classifier 4 - Mean Training AUC:  0.663
Classifier 4 - Mean Test AUC:  0.51
Classifier 4 - Mean Accuracy Score: 86.38%
Classifier 4 - Mean Precision Score: 70.00%
Classifier 4 - Mean Recall Score: 2.04%
************************************************************

4. Deployment Workflow

  • Must Have: Pictorially show how each of your models will be deployed in an ICU
  • The workflow starts with the input and ends with two predictions.

  • Intermediate components will be from the pipeline of your best performing model.

  • Differences if any between training and prediction should be explained (no implementation is required).

  • Input should be 48 hours of patient data (assume the same file format) for a single patient

  • Output should be prediction of both mortality and LoS

Deployment Workflow

Deployment Workflow:

  • The patients' data will be first extracted from their patients' file.

  • The temporal variables would be the latest data from the patients' file within the 48 hours of patients' stay in the hospital assuming that they survive the 48 hours period.

  • The Design Matrix 3's features are all the static and temporal data.

  • The data is preprocessed with a simple imputer, replacing missing data represented by -1, with the most frequent values across the respective folds

  • The data is then divided into 4 folds into X and Y whereby X are the features while Y are the target variables.

  • These X and Y are further divided into Training and Test.

  • These datas are prepared for 4-fold Cross Validation (CV) whereby there are 4 iterations of training and testing the model performance with 3 folds being the training set and 1 remaining fold being the test set.

  • The models are then evaluated by their respective performance metrics - mainly Root Mean Squared Error (RSME) for regression model and Area Under the Curve (AUC) score.

  • They are taken as the average of the 4 iterations for each combination of Design Matrix with the model.

  • Additional Metrics being evaluated are as follows:

For Regression:

  • the Adjusted R-Squared value (the proportion of total variance explained by the model - which determine how useful the model is to predict and explain)

For Classifiers:

  • the accuracy scores - how accuracy the model is in making the predictions

  • the precision - the proportion of actual true in-hospital death cases out of all the predicted in-hospital death cases.

  • the recall - the proportion of actual true in-hospital death cases out of all the predictions made.

The best model after the cross validation for each tasks are then deployed into the system.

Each time there is a new patient's data in every 48 hours interval, it will be extracted and preprocessesd as an observation and the respective models will be run for a prediction - The regression model will predict the length of stays while the classifier will predict the patients' mortality - either "Will Die", "Will not die"

This predictions will allow the hospital managers to better allocate resources to achieve their objective of saving lives.

For length of stay prediction - allow hospital managers to better schedule the ICU units in advanced based on possible future demands.

For the mortality prediction - more allocation would be likely to those who are predicted to be "Will Die" so as to increase the chance of their survival.

Regression Model Performance Summary

In our evaluation of the regression models:

  • we have the multilinear regression model performing the best with Design Matrix 3
    • with a training RSME of 11.432 and a test RSME of 19.795. It is normal that the test will underperform the training.
    • The Adjusted R-squared is 0.108 is the second highest model.

Interpretation:

  • The model with the given features of design matrix 3 has the explainatory power of 0.108 to predict the length of stay of each patient.

  • The model has about RSME = 19.795 of the discrepancy between the observed values and the values expected under the model in question.

Classifier Model Performance Summary

In our evaluation of classifier model:

  • Logistic Regression Pipeline with the StandardScalar, SelectKBest and PCA has the best predictive power of ROC_AUC score of 0.817 in training and 0.613 in the test

  • It also has the highest accuracy score of 87.45% along with a precision of 61.67% and 25.16%

Interpretation :

  • Accuracy score: the model is able to predict correctly classify the labels 4 out of every 5 patients.

  • each prediction of either "Will Die" or "Will not die" would be realised truely about 61.67% of the time.

  • each prediction of "Will Die" has about 1/4 of a chance of being correct.