Just a little example of how to use a Support Vector Machine (SVM) model in Python. An SVM classifies data by finding a boundary, called a “hyperplane”, that separates the groups: a data point on one side of the hyperplane is assigned to class ‘A’, and a point on the other side to class ‘B’. Again, a very simplistic description, but it’s not terribly complex to understand at its highest level. A good description in detail can be located here. It’s interesting to note that the boundary can be linear or non-linear; in the non-linear case the data is, in effect, mapped into a higher dimension (for example from 2D into 3D) where a flat hyperplane can separate it. In this example we’re utilizing a breast cancer dataset that is provided within scikit-learn, and we’re going to predict the “target” field therein. The dataset can be imported as shown in the code.
Support Vector Machines - Simple Example
I.e. predict classes by separating the data with a "hyperplane"
In this example we're utilizing a breast cancer dataset already present in the scikit-learn library. We're going to predict the 'target' field, a binary label (0 = malignant, 1 = benign).
#Imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
#Import data
from sklearn.datasets import load_breast_cancer
cancer = load_breast_cancer()
#Some info on this dataset
print(cancer['DESCR'])
#Convert to dataframe
df_feat = pd.DataFrame(cancer['data'],columns=cancer['feature_names'])
df_feat.head()
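Before modelling, it can help to confirm what the two target labels mean and how many samples fall under each. The snippet below is a minimal sketch of that check; the `class_counts` name is just an illustrative variable, not part of the original notebook.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()

# target_names is indexed by label: position 0 names label 0, position 1 names label 1
print(cancer['target_names'])

# Count how many samples carry each label
labels, counts = np.unique(cancer['target'], return_counts=True)
class_counts = {
    str(cancer['target_names'][label]): int(count)
    for label, count in zip(labels, counts)
}
print(class_counts)
```

If one class heavily outnumbered the other, accuracy alone would be a misleading score, which is one reason the classification report further down is worth reading per class.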
Split data into test/train and predict
#Import splitting library
from sklearn.model_selection import train_test_split
#Set X,Y
X = df_feat
y = cancer['target']
#Choose the test size
#Test size = fraction of the dataset held out for testing (.3 = 30%)
#Random state = seed for the shuffle, so the same split is reproduced on every run
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=101)
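As a quick check of what `random_state` does in the split above, the sketch below runs the split twice with the same seed and once with a different one. The seed `7` and the `same_split` / `different_split` names are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)

# Same seed: the shuffle is identical, so both calls return the same test rows
_, X_test_a, _, _ = train_test_split(X, y, test_size=0.3, random_state=101)
_, X_test_b, _, _ = train_test_split(X, y, test_size=0.3, random_state=101)
same_split = np.array_equal(X_test_a, X_test_b)

# Different seed: the rows are shuffled differently before splitting
_, X_test_c, _, _ = train_test_split(X, y, test_size=0.3, random_state=7)
different_split = not np.array_equal(X_test_a, X_test_c)

print(same_split, different_split)
print(len(X_test_a), "test rows out of", len(X))
```

Fixing the seed is what makes the scores in the rest of the example repeatable from run to run.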
#Import library
from sklearn.svm import SVC
#Create object
model = SVC()
#Fit
model.fit(X_train,y_train)
#Predict
predictions = model.predict(X_test)
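To tie the predictions back to the hyperplane idea, a fitted `SVC` also exposes `decision_function`, which returns a signed score per sample: the sign says which side of the boundary the point falls on. The following is a rough sketch under the same split as above; the `side` variable is only an illustrative name.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=101
)

model = SVC()
model.fit(X_train, y_train)

# One signed score per test sample; larger magnitude means further from the boundary
scores = model.decision_function(X_test)

# For a two-class SVC, a positive score maps to model.classes_[1],
# a negative score to model.classes_[0]
side = (scores > 0).astype(int)
predictions = model.predict(X_test)
print(np.array_equal(model.classes_[side], predictions))
```

Samples with scores close to zero sit near the boundary and are the ones most likely to be misclassified.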
#See if the model worked, print reports (worked very well)
from sklearn.metrics import classification_report, confusion_matrix
print(confusion_matrix(y_test,predictions))
print(classification_report(y_test,predictions))
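SVMs are sensitive to the scale of the input features, and the intro notes that the boundary can be linear or non-linear. As a possible follow-up experiment (not part of the original notebook), the sketch below standardizes the features in a pipeline and compares a linear kernel with the default RBF kernel on the same split. The `accuracy` dictionary is an illustrative name.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=101
)

accuracy = {}
for kernel in ('linear', 'rbf'):
    # The pipeline fits the scaler on the training data only,
    # then applies the same scaling to the test data
    clf = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    clf.fit(X_train, y_train)
    accuracy[kernel] = clf.score(X_test, y_test)

print(accuracy)
```

Wrapping the scaler and the model in one pipeline keeps test data out of the scaling step, so the reported accuracy is not inflated by leakage.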