Just a little example of how to use Decision Trees and Random Forests in Python. Basically – a decision tree is a kind of “flow-chart”, with nodes/edges you follow to reach a likely outcome. A single tree tends to overfit and can change drastically with small changes in the data, so a Random Forest builds multiple decision trees on random subsets of the rows and features and then averages (votes on) their predictions. This is a very brief and loose explanation – these posts are meant to be quick shots of how-to code, not to teach the theory behind the method. In this example we’re utilizing a small healthcare dataset to predict whether spinal surgery for an individual successfully helped a particular condition (Kyphosis). Both methods are shown to see differences in accuracy. The dataset can be found on my github here.
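(Side note, not part of the original notebook: if the "build many trees and vote" idea feels abstract, here's a minimal self-contained sketch of the mechanics on toy data from make_classification – not the Kyphosis set.)

```#A hand-rolled "random forest" sketch on toy data (illustrative only)
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X_toy, y_toy = make_classification(n_samples=200, random_state=0)

rng = np.random.RandomState(0)
tree_preds = []
for _ in range(25):
    #Bootstrap sample: draw rows with replacement
    idx = rng.choice(len(X_toy), size=len(X_toy), replace=True)
    tree = DecisionTreeClassifier(max_features='sqrt', random_state=0)
    tree.fit(X_toy[idx], y_toy[idx])
    tree_preds.append(tree.predict(X_toy))

#Majority vote across the 25 trees = the "forest" prediction
forest_pred = (np.mean(tree_preds, axis=0) > 0.5).astype(int)
```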

## Decision Trees and Random Forests - Simple Example

### i.e., predict values based on a "flow-chart" of nodes/edges

In this example we're utilizing a healthcare dataset to predict whether spinal surgery for an individual successfully helped a particular condition (Kyphosis).

In :
```#Imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
```
In :
```#Read in the data
#NOTE: the filename below is an assumption - point this at wherever you saved the CSV
df = pd.read_csv('kyphosis.csv')
```
In :
```df.head()
#Kyphosis = Was the condition absent or present after the surgery?
#Age = Age of person in months
#Number = Number of vertebrae involved
#Start = The number of the first (topmost) vertebra operated on
```
Out:
```  Kyphosis  Age  Number  Start
0   absent   71       3      5
1   absent  158       3     14
2  present  128       4      5
3   absent    2       5      1
4   absent    1       4     15
```
In :
```df.info()
#You'll see this is a very small dataset
```
```<class 'pandas.core.frame.DataFrame'>
RangeIndex: 81 entries, 0 to 80
Data columns (total 4 columns):
#   Column    Non-Null Count  Dtype
---  ------    --------------  -----
0   Kyphosis  81 non-null     object
1   Age       81 non-null     int64
2   Number    81 non-null     int64
3   Start     81 non-null     int64
dtypes: int64(3), object(1)
memory usage: 2.7+ KB
```
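One thing worth checking on a dataset this small is the class balance. In the standard Kyphosis data, "absent" heavily outnumbers "present" (roughly 64 vs. 17 of the 81 rows – an assumption based on the well-known version of this dataset), which helps explain the weak "present" scores in the reports below.

```#Check the class balance
df['Kyphosis'].value_counts()
#Expect 'absent' to dominate (~64 absent vs ~17 present in the standard dataset)
```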

### Decision Tree - Split data into test/train and predict

In :
```#Import splitting library
from sklearn.model_selection import train_test_split
```
In :
```#Set X,Y
X = df.drop('Kyphosis',axis=1)
y = df['Kyphosis']
```
In :
```#Choose the test size
#Test size = % of dataset allocated for testing (.3 = 30%)
#Random state = seed for the random split, so results are reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=101)
```
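Quick sanity check on the split: with 81 rows and test_size=0.3, scikit-learn should put 25 rows in the test set and 56 in training, which matches the support column in the reports below.

```#Confirm the split sizes
print(X_train.shape, X_test.shape)
#(56, 3) (25, 3)
```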
In :
```#Import library
from sklearn.tree import DecisionTreeClassifier
```
In :
```#Create object
dtree = DecisionTreeClassifier()
```
In :
```#Fit
dtree.fit(X_train,y_train)
```
Out:
`DecisionTreeClassifier()`
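(Not in the original post: if you want to actually see the "flow-chart" the intro describes, scikit-learn can draw the fitted tree – this assumes scikit-learn 0.21+, where plot_tree was added.)

```#Optional: visualize the fitted tree
from sklearn.tree import plot_tree
plt.figure(figsize=(12,8))
plot_tree(dtree, feature_names=['Age','Number','Start'], class_names=['absent','present'], filled=True)
plt.show()
```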
In :
```#Predict
predictions = dtree.predict(X_test)
```
In :
```#Evaluate the model - print the confusion matrix and classification report (not particularly wonderful here...)
from sklearn.metrics import classification_report, confusion_matrix
```
In :
```print(confusion_matrix(y_test,predictions))
```
```[[12  5]
 [ 6  2]]
```
In :
```print(classification_report(y_test,predictions))
```
```              precision    recall  f1-score   support

      absent       0.67      0.71      0.69        17
     present       0.29      0.25      0.27         8

    accuracy                           0.56        25
   macro avg       0.48      0.48      0.48        25
weighted avg       0.54      0.56      0.55        25

```
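As a sanity check, the accuracy line in the report is just the confusion matrix diagonal over the total: (12 + 2) / 25 = 0.56.

```#Accuracy = correct predictions / total predictions
from sklearn.metrics import accuracy_score
print(accuracy_score(y_test, predictions))
#0.56 - i.e. (12 + 2) / 25 from the matrix above
```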

### Random Forest - Split data into test/train and predict

You'll notice this performs much better (in this instance) than the single decision tree above – 0.76 vs. 0.56 accuracy. Exact numbers may vary slightly between runs, since no random_state was set for the forest.

In :
```#Import library
from sklearn.ensemble import RandomForestClassifier
```
In :
```#Create object - n_estimators = number of trees in the forest (200 here)
rfc = RandomForestClassifier(n_estimators=200)
```
In :
```#Fit
rfc.fit(X_train,y_train)
```
Out:
`RandomForestClassifier(n_estimators=200)`
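Quick aside (not in the original notebook): the fitted forest is literally a list of individual decision trees under the hood, which is what the averaging/voting described in the intro operates over.

```#The forest stores its individual trees in .estimators_
print(len(rfc.estimators_))      #200 - one per n_estimators
print(type(rfc.estimators_[0]))  #each one is a DecisionTreeClassifier
```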
In :
```#Predict
rfc_predictions = rfc.predict(X_test)
```
In :
```#Evaluate the model - print reports (MUCH better here)
print(confusion_matrix(y_test,rfc_predictions))
```
```[[17  0]
 [ 6  2]]
```
In :
```print(classification_report(y_test,rfc_predictions))
```
```              precision    recall  f1-score   support

      absent       0.74      1.00      0.85        17
     present       1.00      0.25      0.40         8

    accuracy                           0.76        25
   macro avg       0.87      0.62      0.62        25
weighted avg       0.82      0.76      0.71        25

```
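To close the loop on the accuracy comparison promised up top, here are both scores side by side (again, the forest's exact numbers can shift a little between runs, since no random_state was set on it):

```#Side-by-side accuracy (values match the reports above)
from sklearn.metrics import accuracy_score
print('Decision Tree:', accuracy_score(y_test, predictions))      #0.56
print('Random Forest:', accuracy_score(y_test, rfc_predictions))  #0.76
```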