Kaggle Submission: Titanic

I've already done some brief work with this dataset in my tutorial for Logistic Regression, but never worked through it in its entirety. I decided to re-evaluate it using Random Forest and submit the results to Kaggle. Here we're using training and testing datasets of passengers on the Titanic, and we need to predict whether each passenger survived or not (1 or 0). ...
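
As a rough sketch of the modeling step, here is a Random Forest producing 0/1 predictions with scikit-learn. The Kaggle files are not used here: the features come from make_classification as a synthetic stand-in, and the split sizes and n_estimators value are assumptions for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the passenger features (assumption: not the Kaggle data).
X, y = make_classification(n_samples=500, n_features=8, n_informative=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Fit the forest on the training rows.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

# Each prediction is a 0 or a 1, the same shape of answer Kaggle asks for.
predictions = forest.predict(X_test)
print(predictions[:10])
print(f"accuracy: {forest.score(X_test, y_test):.2f}")
```

With the real data, the same fit/predict calls would run on the training and testing files instead.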

Including Jupyter Notebooks on WordPress – Part II

An update to a post I wrote in July of 2020 about embedding Jupyter Notebooks on WordPress. The original post is located here. I continued using the nbconvert shortcode for a while and ran into a large number of problems with https://nbviewer.jupyter.org/. Essentially, I believe there is some sort of bug in the system. Normally it takes 1-2 days for nbviewer to "recognize" something that is pushed to GitHub, which isn't a huge deal but is a big annoyance. In addition, I've had multiple notebooks go 1-2 weeks without being recognized, and a few that were fine and then started returning a 404 error on the site. Either way, I'm looking at using a great little utility called "nb2wp", located on GitHub here. All it really does is perform the same conversion, but it saves the result locally as an .html file with inline CSS, along with saving all of the images (and embeds the links to those images as...

Principal Component Analysis in Python – Simple Example

The greatest variance lies along the first axis (the first principal component). Likewise, the second greatest variance lies along a second axis orthogonal to the first, and so on. This allows us to reduce the number of variables used in an analysis. Taking this a step further, we can extend the idea to higher numbers of dimensions, with each new axis called a "component". If we have a dataset with a large number of variables, this helps us condense most of the variation into a small number of components, but these can be tough to interpret. A much more detailed walk-through on the theory can be found here. I'm going to show how this analysis can be done using scikit-learn in Python. The dataset we're going to be using can be loaded directly within sklearn as shown below. ...
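
A minimal sketch of the reduction with scikit-learn's PCA. The excerpt doesn't say which built-in dataset is used, so load_breast_cancer is an assumption here, as are the scaling step and the choice of two components.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Assumption: the breast cancer data stands in for "a dataset that ships with sklearn".
X, y = load_breast_cancer(return_X_y=True)

# PCA is sensitive to scale, so standardize each feature first.
X_scaled = StandardScaler().fit_transform(X)

# Keep the two components that capture the most variance.
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

print(X.shape, "->", X_pca.shape)
# Share of the total variance captured by each component, largest first.
print(pca.explained_variance_ratio_)
```

The pca.components_ attribute holds the weight of each original variable in each component, which is the place to start when trying to interpret them.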

K Means Clustering in Python – Simple Example

Just a little example of how to use a K Means Clustering model in Python. In other words, how to take data and predict which of a given number of clusters each observation most likely belongs to. In this example we're using a fairly generic dataset of universities, which we're going to group into two clusters: Public or Private. In this case we know whether each university is Public or Private, so we can actually evaluate the accuracy of the model, which is not usually possible in real-world applications. The dataset can be found on my github here. ...
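
A small sketch of the idea with scikit-learn's KMeans. The universities data isn't loaded here: make_blobs generates two synthetic groups as a stand-in, so the numbers below say nothing about the real dataset.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for the universities data: two well-separated groups.
X, y_true = make_blobs(
    n_samples=200, centers=[[0, 0], [10, 10]], cluster_std=1.0, random_state=0
)

# Ask for two clusters; the model never sees y_true.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

# Cluster ids are arbitrary (cluster 0 may correspond to true group 1),
# so score both possible mappings and keep the better one.
agreement = np.mean(labels == y_true)
accuracy = max(agreement, 1 - agreement)
print(f"accuracy: {accuracy:.2f}")
```

The label-flipping step matters: K Means only finds groupings, so a raw comparison against known labels can look terrible even when the clusters are right.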

Grid Searching in Python – Machine Learning

So what happens if your ML model is obviously flawed, e.g. it predicts every flip of a coin to be heads? One potential cause is that the model's parameters need to be tweaked from their defaults. One way to fix this is a grid search, in which we test multiple combinations of parameters to see which produces the most accurate results. In this example we're using the breast cancer dataset that ships with the scikit-learn library. We're going to predict the value of the 'target' field (a binary 0 or 1). ...
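
A minimal sketch of a grid search using scikit-learn's GridSearchCV on the built-in breast cancer data. The excerpt doesn't name the model or the parameters being searched, so the SVC estimator, the C/gamma grid, and the split settings are all assumptions for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=101
)

# Assumed estimator and grid; every C/gamma pair gets tried.
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.001, 0.0001]}

# 5-fold cross-validation scores each combination on the training data only.
grid = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
grid.fit(X_train, y_train)

print(grid.best_params_)
# By default the best combination is refit on all training rows,
# so the grid object can score held-out data directly.
test_accuracy = grid.score(X_test, y_test)
print(f"test accuracy: {test_accuracy:.2f}")
```

The cost grows with every value added to the grid (here 12 combinations times 5 folds), so it pays to start coarse and narrow down.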

Support Vector Machines in Python – Simple Example

Just a little example of how to use the Support Vector Machines model in Python. Support Vector Machines classify data by finding boundaries called "hyperplanes" that separate the groups (if a data point falls on side 'A' of the hyperplane, it most likely belongs to group 'A' rather than group 'B'). Again, a very simplistic description, but it's not terribly complex to understand at its highest level. A good description in detail can be located here. Interesting to note that the separation can be linear or non-linear (in the non-linear case, by mapping the data into a higher dimension, such as a third dimension). In this example we're using a cancer dataset that is provided within scikit-learn, and we're going to predict values of the "target" field therein. The dataset can be imported as shown in the code. ...
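
A short sketch comparing a linear and a non-linear (RBF kernel) SVM on scikit-learn's built-in breast cancer data. The scaling step, the split settings, and the two kernels chosen are assumptions added for illustration, not necessarily what the full post does.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=101
)

scores = {}
for kernel in ("linear", "rbf"):
    # Scale features first (an added assumption): SVMs are distance-based,
    # so features with large ranges would otherwise dominate.
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    model.fit(X_train, y_train)
    scores[kernel] = model.score(X_test, y_test)

# Held-out accuracy for the linear and the non-linear boundary.
print(scores)
```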

K-Nearest Neighbor in Python – Simple Example

Just a little example of how to use the K-Nearest Neighbor model in Python. In this example we're using an anonymized dataset with unknown context. All we know is that there is a TARGET CLASS of 0/1 and we want to predict whether a row belongs to that class or not. K-Nearest Neighbor works by comparing a point to a certain number (K) of the points closest to it and assigning the class most common among them. It's pretty simple to implement. The dataset can be found on my github here. ...
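
To make the "compare a point to the points around it" idea concrete, here is a bare-bones version in plain Python. The knn_predict helper and the six training points are made up for illustration; this is not the dataset from the post, and in practice scikit-learn's KNeighborsClassifier does the same job.

```python
from collections import Counter
from math import dist


def knn_predict(train_points, train_labels, query, k=3):
    """Hypothetical helper: classify `query` by majority vote of its k nearest points."""
    # Sort the training points by Euclidean distance to the query point.
    neighbors = sorted(
        zip(train_points, train_labels), key=lambda pair: dist(pair[0], query)
    )
    # Count the labels of the k closest points and return the most common one.
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


# Made-up points: class 0 sits near (1, 1), class 1 sits near (6, 6).
train_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
train_labels = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_points, train_labels, (1.8, 1.2)))  # prints 0
print(knn_predict(train_points, train_labels, (6.2, 6.8)))  # prints 1
```

Using an odd K with two classes avoids tied votes.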