Principal Component Analysis in Python – Simple Example

PCA projects the data onto a new set of orthogonal axes: the first axis captures the greatest variance, the second axis the next greatest, and so on. This allows us to reduce the number of variables used in an analysis. Taking this a step further, the idea extends to any number of dimensions, with each new axis shown as a "component". If we have a dataset with a large number of variables, this lets us condense most of the variation into a small number of components, though those components can be tough to interpret. A much more detailed walk-through of the theory can be found here. I'm going to show how this analysis can be done using scikit-learn in Python. The dataset we're going to be utilizing can be loaded directly within sklearn as shown below. ...
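A minimal sketch of the workflow, assuming the built-in breast cancer dataset stands in for the one used in the post (scaling first, since PCA is variance-based):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Load a built-in dataset (assumed stand-in for the post's data).
data = load_breast_cancer()

# Scale features so high-variance columns don't dominate the components.
X_scaled = StandardScaler().fit_transform(data.data)

# Reduce 30 original features down to 2 principal components.
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

print(X_pca.shape)                      # (569, 2)
print(pca.explained_variance_ratio_)   # variance captured by each component
```

The `explained_variance_ratio_` attribute shows how much of the total variation each component retains, which is how you'd decide whether 2 components are enough.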
Read More

K Means Clustering in Python – Simple Example

Just a little example of how to use a K-Means clustering model in Python, or how to take data and predict likely assignments among a given number of clusters. In this example we're utilizing a fairly generic dataset of universities, in which we're going to predict clusters of Public or Private universities (2 clusters). In the case of this data we already know whether each university is Public or Private, so we can actually evaluate the accuracy of the model, which is rarely possible in real-world clustering applications. The dataset can be found on my github here. ...
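The core fitting step can be sketched like this; since the universities CSV lives on GitHub, a synthetic two-cluster dataset stands in for it here:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for the universities data (the real CSV is on GitHub).
X, y_true = make_blobs(n_samples=200, centers=2, random_state=42)

# Ask K-Means for 2 clusters, mirroring the Public/Private split.
km = KMeans(n_clusters=2, n_init=10, random_state=42)
labels = km.fit_predict(X)

print(km.cluster_centers_)  # one centroid per cluster
```

Note the cluster labels (0/1) are arbitrary: K-Means doesn't know which cluster means "Public", so comparing against the known labels requires checking both label orderings.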
Read More

Grid Searching in Python – Machine Learning

So what happens if your ML model is obviously flawed, e.g. it predicts the flip of a coin to always be heads? One potential problem is that the model's parameters need to be tweaked from their defaults. A way to fix this is to utilize a grid search, in which we test multiple combinations of parameters to see which produces the most accurate results. In this example we're utilizing a breast cancer dataset already present in the scikit-learn library. We're going to predict whether values fall in the 'target' field or not (present as a binary 0 or 1). ...
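A short sketch of the idea, assuming an SVM classifier as the model being tuned (the estimator and parameter values here are illustrative, not necessarily the post's):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, random_state=42
)

# Candidate parameter combinations to try (illustrative values).
param_grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.001]}

# GridSearchCV fits the model once per combination, cross-validating each.
grid = GridSearchCV(SVC(), param_grid, cv=3)
grid.fit(X_train, y_train)

print(grid.best_params_)              # the winning combination
print(grid.score(X_test, y_test))     # accuracy on held-out data
```

After fitting, `grid.best_estimator_` is a ready-to-use model refit on the full training set with the winning parameters.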
Read More

Support Vector Machines in Python – Simple Example

Just a little example of how to use the Support Vector Machines model in Python. Support Vector Machines separate or classify data into groups by dividing the feature space with "hyperplanes" (if a data point falls on side 'A' of the hyperplane, it's most likely related to that cluster instead of the cluster on side 'B'). Again, a very simplistic description, but it's not terribly complex to understand at its highest level. A good detailed description can be located here. It's interesting to note the separating boundary can be linear or non-linear (via kernels that implicitly project the data into higher dimensions). In this example we're utilizing a cancer dataset that is provided within scikit-learn, and we're going to predict values based on the "target" field therein. The dataset can be imported as shown in the code. ...
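A minimal sketch of fitting an SVM classifier on the built-in cancer dataset (the default RBF kernel is assumed; the post may tune further):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Built-in cancer dataset; the 'target' field is the 0/1 label.
data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, random_state=42
)

# SVC with its default (non-linear RBF) kernel.
model = SVC()
model.fit(X_train, y_train)

preds = model.predict(X_test)
print(accuracy_score(y_test, preds))
```

Swapping in `SVC(kernel="linear")` gives the linear variant mentioned above, which can be useful for comparison.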
Read More

K-Nearest Neighbor in Python – Simple Example

Just a little example of how to use the K-Nearest Neighbors model in Python. In this example we're utilizing an anonymized dataset with unknown context. All we know is that there is a TARGET CLASS of 0/1, and we want to predict whether a row belongs to that class or not. K-Nearest Neighbors classifies a point by looking at a chosen number (k) of the closest points around it and taking a majority vote. It's pretty simple to implement. The dataset can be found on my github here. ...
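A sketch of the approach with a synthetic stand-in for the anonymized CSV (which lives on GitHub); scaling matters here because KNN is distance-based:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the anonymized 0/1 TARGET CLASS data.
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# Standardize so no single feature dominates the distance calculation.
X = StandardScaler().fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# k=5: each prediction is a majority vote among the 5 nearest points.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

print(knn.score(X_test, y_test))
```

A common next step is looping over several values of k and plotting the error rate to pick the best one.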
Read More

Decision Trees and Random Forests in Python – Simple Example

Just a little example of how to use Decision Trees and Random Forests in Python. Basically, a decision tree is a type of "flow chart", with nodes and edges that lead to a likely outcome. Because a single tree can easily overfit the training data, we can utilize a Random Forest, which builds many decision trees on random subsets of the features/rows and averages their results. This is a very brief and vague explanation, as these posts are meant to be quick shots of how-to code and not to teach the theory behind the method. In this example we're utilizing a small healthcare dataset that predicts whether spinal surgery for an individual was successful in correcting a particular condition (Kyphosis). Both methods are shown to compare their accuracy. The dataset can be found on my github here. ...
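A side-by-side sketch of the two models; since the Kyphosis CSV is on GitHub, a small synthetic dataset stands in here:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the small Kyphosis dataset.
X, y = make_classification(n_samples=300, n_features=4, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Single decision tree: one flow chart, prone to overfitting.
tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Random forest: 100 trees on random subsets, results averaged.
forest = RandomForestClassifier(n_estimators=100, random_state=42).fit(
    X_train, y_train
)

print(tree.score(X_test, y_test), forest.score(X_test, y_test))
```

On small datasets the gap between the two can be modest, but the forest's averaging usually makes it the more stable choice.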
Read More

Logistic Regression in Python – Simple Example

A simple example of how to deal with Logistic Regression in Python utilizing Matplotlib, Seaborn, and scikit-learn. The data is pulled from Kaggle.com and found here. Normally this comes as a test set and a train set, but I'm analyzing the training set only to see the accuracy of the model. The data here is a little dirty and necessitated some cleansing first, probably worthy of another short tutorial / post on other methods of initial cleansing before working on data. ...
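A sketch of the cleanse-then-fit pattern described above; the Kaggle data isn't bundled here, so a small synthetic frame with a missing-value column (a hypothetical `age`/`fare` setup) stands in:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the Kaggle training set, with missing values
# injected into 'age' to mimic the cleaning step. Column names are
# illustrative, not the post's actual schema.
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "age": rng.normal(35, 10, 400),
    "fare": rng.exponential(30, 400),
})
df.loc[rng.choice(400, 60, replace=False), "age"] = np.nan
df["survived"] = (df["fare"] > df["fare"].median()).astype(int)

# Cleansing: simple median imputation for the missing ages.
df["age"] = df["age"].fillna(df["age"].median())

X_train, X_test, y_train, y_test = train_test_split(
    df[["age", "fare"]], df["survived"], random_state=42
)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(model.score(X_test, y_test))
```

Median imputation is only one option; the post hints at richer cleaning approaches (e.g. imputing by group) that would slot in at the same step.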
Read More