Including Jupyter Notebooks on WordPress – Part II

An update to a post I wrote in July of 2020 about embedding Jupyter Notebooks on WordPress. The original post is located here. I continued using the nbconvert shortcode for a while and ran into a huge number of problems with https://nbviewer.jupyter.org/. Essentially - I believe there is some sort of bug with the system. Normally it takes 1-2 days for nbviewer to "recognize" something that is dropped into GitHub - not a huge deal, but a big annoyance. In addition - I've had multiple workbooks go 1-2 weeks without being recognized, and a few that were fine and then dropped into a 404 error on the site. Either way - I'm now looking at a great little utility called "nb2wp", located on GitHub here. All it really does is perform the same conversion, but it drops the output locally as an .html file with inline CSS, and saves all of the images (and embeds the links to those images as...
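For reference, the underlying conversion step can also be done directly with nbconvert's Python API. This is a minimal sketch, not nb2wp itself (nb2wp adds the image extraction and WordPress-friendly touches on top of this kind of export), and the notebook filename is just a placeholder:

#Minimal sketch: convert a notebook to standalone HTML with nbconvert's Python API
#(illustrative only - nb2wp wraps this kind of conversion and also extracts/relinks images)
import nbformat
from nbconvert import HTMLExporter

nb = nbformat.read('example.ipynb', as_version=4)   #'example.ipynb' is a placeholder filename
exporter = HTMLExporter()
body, resources = exporter.from_notebook_node(nb)

with open('example.html', 'w', encoding='utf-8') as f:
    f.write(body)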
Read More

Recommender System in Python – Simple Example

It's actually very easy to build a simple recommendation system in Python. I'll show you how to do it utilizing a movie dataset with various user ratings. In this case we're just comparing two movies against all others to recommend what a user might like if they were to enjoy Star Wars or Liar Liar.

Recommender System - A Simple Example

This example utilizes a dataset of movie ratings by user. We'll attempt to recommend other movies a user might like based upon the first. We have 'u.data' and 'Movie_Id_Titles' files to read in. u.data is located here. Movie_Id_Titles is located here.

In [ ]:
#Initial Imports
import numpy as np
import pandas as pd
import seaborn as sns
%matplotlib inline

Data Imports

In [81]:
#Set column names
column_names = ['user_id','item_id','rating','timestamp']
#Read in data
df = pd.read_csv('u.data',sep='\t',names=column_names)

In [82]:
df.head()

Out[82]:
   user_id  item_id  rating  timestamp
0        0       50       5  881250949
1        0      172       5  881250949
2        0      133       1  881250949
3      196      242       3  881250949
4      186      302       3  891717742

In [83]:
#Read in data
movie_titles = pd.read_csv('Movie_Id_Titles')

In [84]:
movie_titles.head()

Out[84]:
   item_id              title
0        1   Toy Story (1995)
1        2   GoldenEye (1995)
2        3  Four Rooms (1995)
3        4  Get Shorty (1995)
4        5     Copycat (1995)

In [85]:
#Merge these two datasets on the item id
df = pd.merge(df,movie_titles,on='item_id')

In [86]:
df.head()

Out[86]:
   user_id  item_id  rating  timestamp             title
0        0       50       5  881250949  Star Wars (1977)
1      290       50       5  880473582  Star Wars (1977)
2       79       50       4  891271545  Star Wars (1977)
3        2       50       5  888552084  Star Wars (1977)
4        8       50       5  879362124  Star Wars (1977)

Data Manipulation

First we want to create a ratings dataframe to hold the...
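As a rough sketch of where this goes next (the full post walks through the details - this just illustrates the idea of comparing one movie's ratings against all others, continuing from the merged df above):

#Sketch of the typical next step: pivot to a user x movie ratings matrix and correlate
#(continues from the merged df built above)
import pandas as pd

moviemat = df.pivot_table(index='user_id', columns='title', values='rating')

#Ratings for one movie, correlated against every other movie's ratings
starwars_ratings = moviemat['Star Wars (1977)']
similar_to_starwars = moviemat.corrwith(starwars_ratings)

corr_starwars = pd.DataFrame(similar_to_starwars, columns=['Correlation']).dropna()
print(corr_starwars.sort_values('Correlation', ascending=False).head(10))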
Read More

Principal Component Analysis in Python – Simple Example

In PCA, the greatest variance in the data lies along the first axis (the first principal component); the second greatest variance lies along a second axis orthogonal to the first, and so on. This allows us to reduce the number of variables used in an analysis. Taking this a step further - we can expand to higher numbers of dimensions, shown as "components". If we utilize a dataset with a large number of variables, this helps us reduce that variation down to a small number of components - but these can be tough to interpret. A much more detailed walk-through on the theory can be found here. I'm going to show how this analysis can be done utilizing Scikit-learn in Python. The dataset we're going to be utilizing can be loaded directly within sklearn as shown below.

Principal Component Analysis - Simple Example

We're going to be utilizing a cancer dataset that is found within sklearn. We're going to try to find what components are most important (show the most...
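A minimal sketch of what that looks like in code (the choice of two components here is just for illustration):

#Sketch: PCA on the sklearn breast cancer dataset, reduced to 2 components
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
import pandas as pd

cancer = load_breast_cancer()
df = pd.DataFrame(cancer['data'], columns=cancer['feature_names'])

#Scale the features so no single variable dominates the variance
scaled = StandardScaler().fit_transform(df)

#Reduce the 30 features down to 2 principal components
pca = PCA(n_components=2)
components = pca.fit_transform(scaled)

print(components.shape)                   #(569, 2)
print(pca.explained_variance_ratio_)      #share of variance captured by each component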
Read More

Refresh all views in SQL Server database

One problem I've had recently is a SQL Server instance in which the views were constantly cached and needed to be refreshed. Normally to do this you would run the built-in procedure:

exec sp_refreshview MyView

However - I had the need to run this on all my views daily, and we constantly had new views. You can use the stored proc below to simply loop all the views and run as needed.

create procedure usp_refreshview
as
declare @sqlcmd nvarchar(max) = ''

select @sqlcmd = @sqlcmd + 'exec sp_refreshview ' + '''' + name + '''; '
from sys.objects as so
where so.type = 'V'

print @sqlcmd

if len(@sqlcmd) > 0
    exec(@sqlcmd)

To run, simply exec:

exec usp_refreshview
...
Read More

K Means Clustering in Python – Simple Example

Just a little example on how to use a K Means Clustering model in Python. Or - how to take data and predict likely assignments among groupings of a given number of clusters. In this example we're utilizing a fairly generic dataset of universities in which we're going to predict clusters of Public or Private universities (2 clusters). In the case of this data we already know whether each university is Public or Private, so we can actually evaluate the accuracy of the model - something that usually isn't possible in real-world clustering applications, since clustering is unsupervised. The dataset can be found on my github here. ...
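A minimal sketch of the fit itself, assuming the linked csv has the universities as the index and a 'Private' label column (the filename and column names here are assumptions, not necessarily those used in the post):

#Sketch: fit a 2-cluster KMeans model to the universities data
#(filename and column names below are placeholders for the csv linked above)
import pandas as pd
from sklearn.cluster import KMeans

df = pd.read_csv('College_Data', index_col=0)   #hypothetical filename

#Drop the known Public/Private label before clustering - KMeans is unsupervised
X = df.drop('Private', axis=1)

kmeans = KMeans(n_clusters=2)
kmeans.fit(X)

print(kmeans.cluster_centers_)
print(kmeans.labels_[:10])   #cluster assignment for the first 10 universities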
Read More

Rows to Comma Separated Lists in SQL Server

A useful bit of syntax in the case that you end up with multiple levels of granularity in a dataset, and you want to collapse it back down. One solution would be to turn your rows into columns (pivoting), or to put your row values into a single column separated by commas (or pipes, etc.). Below I'm using the AdventureWorks2017 database to show how to do the latter.

SELECT [CountryRegionCode]
      ,StateProvinceCode
FROM [AdventureWorks2017].[Person].[StateProvince]

This produces a gigantic list of countries with their associated state/provinces. So let's collapse that down...

SELECT outerQ.[CountryRegionCode],
    STUFF((
        SELECT DISTINCT ',' + subQ.StateProvinceCode
        FROM [AdventureWorks2017].[Person].[StateProvince] subQ
        WHERE outerQ.CountryRegionCode = subQ.CountryRegionCode
        FOR XML PATH('')
    ), 1, 1, '') as StateProvinceList
FROM [AdventureWorks2017].[Person].[StateProvince] outerQ
GROUP BY outerQ.[CountryRegionCode];
...
Read More

Grid Searching in Python – Machine Learning

So what happens if your ML model is obviously flawed? I.E. it predicts the flip of a coin to always be heads, etc. One potential cause is that the model's hyperparameters need to be tweaked from their defaults. A way to fix this is to utilize a grid search, in which we test multiple combinations of parameter values to see which produce the most accurate results. In this example we're utilizing a breast cancer dataset already present in the scikit-learn library. We're going to predict if values fall in the 'target' field or not (present as a binary 0 or 1). ...
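A minimal sketch of a grid search on that dataset - note the estimator (an SVC) and the C/gamma grid are just illustrative choices here, not necessarily the ones used in the post:

#Sketch: grid search over SVC hyperparameters on the sklearn breast cancer dataset
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer['data'], cancer['target'], test_size=0.3, random_state=101)

#Candidate hyperparameter combinations to test
param_grid = {'C': [0.1, 1, 10, 100], 'gamma': [1, 0.1, 0.01, 0.001]}

grid = GridSearchCV(SVC(), param_grid, cv=5, verbose=1)
grid.fit(X_train, y_train)

print(grid.best_params_)
print(grid.score(X_test, y_test))   #accuracy with the best combination found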
Read More

Support Vector Machines in Python – Simple Example

Just a little example on how to use the Support Vector Machines model in Python. Support Vector Machines simply separate or classify data into groupings by dividing up the feature space with "hyperplanes" (if a data point falls on side 'A' of the hyperplane, it most likely belongs to that cluster instead of the cluster on side 'B'). Again - a very simplistic description, but it's not terribly complex to understand at its highest level. A good detailed description can be located here. Interesting to note that the separation can be calculated linearly or non-linearly (for example by projecting the data into a higher dimension such as the third). In this example we're utilizing a cancer dataset that is provided within Scikit-learn, and we're going to predict values based on the "target" field therein. The dataset can be imported as shown in the code. ...
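A minimal sketch of that workflow, using sklearn's default SVC settings (the split parameters here are just illustrative):

#Sketch: train and evaluate an SVM classifier on the sklearn breast cancer dataset
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import classification_report

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer['data'], cancer['target'], test_size=0.3, random_state=101)

model = SVC()
model.fit(X_train, y_train)

predictions = model.predict(X_test)
print(classification_report(y_test, predictions))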
Read More

K-Nearest Neighbor in Python – Simple Example

Just a little example on how to use the K-Nearest Neighbor model in Python. In this example we're utilizing an anonymous dataset with unknown context. All we know is that there is a TARGET CLASS of 0/1 and we want to predict if a row belongs to that class or not. K-Nearest Neighbor works by comparing a point to the 'k' closest points around it and assigning the class most common among those neighbors. It's pretty simple to implement. The dataset can be found on my github here. ...
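A minimal sketch of the approach, assuming the linked csv has a 'TARGET CLASS' column as described (the filename and k=5 are placeholders/assumptions):

#Sketch: KNN on an anonymized dataset with a 0/1 'TARGET CLASS' column
#(the filename is a placeholder for the csv linked above)
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report

df = pd.read_csv('classified_data.csv')   #placeholder filename

#KNN is distance-based, so scale the features first
scaler = StandardScaler()
X = scaler.fit_transform(df.drop('TARGET CLASS', axis=1))
y = df['TARGET CLASS']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=101)

knn = KNeighborsClassifier(n_neighbors=5)   #k=5 neighbors; k is worth tuning
knn.fit(X_train, y_train)

print(classification_report(y_test, knn.predict(X_test)))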
Read More

Decision Trees and Random Forests in Python – Simple Example

Just a little example on how to use Decision Trees and Random Forests in Python. Basically - trees are a type of "flow-chart", a decision tree, with nodes/edges used to determine a likely outcome. Given that a single tree can be unstable and easy to overfit, we can utilize Random Forests, which build multiple decision trees on random subsets of the features/variables and then average (or vote on) the results. This is a very very brief and vague explanation - as these posts are meant to be quick shots of how-to code and not to teach the theory behind the method. In this example we're utilizing a small healthcare dataset that predicts whether spinal surgery for an individual was successful in treating a particular condition (Kyphosis). Both methods are shown to see differences in accuracy. The dataset can be found on my github here. ...
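A minimal sketch comparing the two, assuming the linked csv has a 'Kyphosis' outcome column (the filename, column name, and n_estimators value are assumptions for illustration):

#Sketch: single decision tree vs. random forest on the kyphosis dataset
#(filename and 'Kyphosis' column name are placeholders based on the description above)
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

df = pd.read_csv('kyphosis.csv')   #placeholder filename for the csv linked above

X = df.drop('Kyphosis', axis=1)
y = df['Kyphosis']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=101)

#Single decision tree
dtree = DecisionTreeClassifier()
dtree.fit(X_train, y_train)
print(classification_report(y_test, dtree.predict(X_test)))

#Random forest: many trees built on random feature subsets, predictions voted/averaged
rfc = RandomForestClassifier(n_estimators=200)
rfc.fit(X_train, y_train)
print(classification_report(y_test, rfc.predict(X_test)))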
Read More