# Category: Python Tips/Tricks

## Keras and Tensorflow Basics in Python – Part II

Another simple example of using Keras on a multi-class classification problem. Details in the Jupyter notebook below.
...

## Natural Language Processing in Python – Sentiment using VADER

Another little bit of NLP showing how to do a quick-and-dirty sentiment analysis using VADER within NLTK. It doesn't give the best accuracy on every dataset, but it removes a lot of complexity. Details below in the Jupyter Notebook.
...
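For reference, a minimal sketch of the VADER approach described here (illustrative sentence, not the notebook's exact code):

```python
# Quick sentiment scores with VADER via NLTK
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download('vader_lexicon')  # one-time lexicon download

sia = SentimentIntensityAnalyzer()
scores = sia.polarity_scores("The battery life is great, but the camera is disappointing.")
print(scores)  # dict with 'neg', 'neu', 'pos', and a 'compound' score in [-1, 1]
```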

## Natural Language Processing in Python – LDA vs. NMF

Another little bit of NLP comparing two approaches to topic clustering (LDA and NMF) on an unsupervised problem. Details below in the Jupyter Notebook.
...
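For context, a minimal sketch of this kind of LDA vs. NMF comparison using scikit-learn (the toy documents and top-words loop are illustrative, not the notebook's data or code):

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.decomposition import LatentDirichletAllocation, NMF

docs = ["the economy and stock market", "the team won the game",
        "new phone released with better camera", "election results and policy debate"]

# LDA works on raw term counts
counts = CountVectorizer(stop_words='english').fit(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=42).fit(counts.transform(docs))

# NMF is usually paired with TF-IDF weights
tfidf = TfidfVectorizer(stop_words='english').fit(docs)
nmf = NMF(n_components=2, random_state=42).fit(tfidf.transform(docs))

# Print the top words per topic for each model
for name, model, vec in [('LDA', lda, counts), ('NMF', nmf, tfidf)]:
    words = vec.get_feature_names_out()
    for i, topic in enumerate(model.components_):
        print(name, i, [words[j] for j in topic.argsort()[-3:]])
```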

A recent "submission" I completed based upon a Kaggle dataset. I use quotations as this particular dataset didn't include the ability to submit your final dataset. However, I had fun doing it and is a good example of multi-classification.
...

## Project: Social Media Sentiment

A recent project I completed related to Natural Language Processing, embedded below as a Jupyter Notebook.
...

## Frequency Table – Python

A small script I wrote just because I happened to need it. It generates a quick, more "easy-to-read" frequency table for the ranges specified in the bins.
Simple Frequency Table Code
Takes a list (sample), separates the values into bins, and gives a frequency table with a histogram.
In [1]:
#Imports
import pandas as pd
import seaborn as sns
In [2]:
#Give list
sample = [10, 15, 12, 17, 22, 14, 23, 8, 15, 11, 17, 12, 16, 26, 12, 11, 9, 16, 15, 24, 12, 17, 16, 14, 19, 13, 10, 15, 19, 20,
10, 25, 14, 15, 12, 22, 7, 28, 16, 9]
#Put list into df
df = pd.DataFrame(sample, columns=['nums'])
In [3]:
#Set bin sizes
bins = [5, 9, 13, 17, 21, 25, 29]
In [4]:
#Put into dataframe
newdf = pd.DataFrame(pd.cut(df['nums'], bins=bins).value_counts()).sort_index()
newdf.reset_index(inplace=True)
#Convert to String
newdf['index'] = newdf['index'].astype(str)
In [5]:
#Set 'easy-to-read' names for bins
left = newdf['index'].str.split(',').str[0].str.split('(').str[1].astype('int32') + 1
right = newdf['index'].str.split(',').str[1].str.split(']').str[0]
fullname = left.astype(str) + ' -' + right
newdf['index'] = fullname
In [6]:
#Cumulative frequency
newdf['cumfreq'] = newdf['nums'].cumsum()
#Relative frequency
newdf['relfreq'] = newdf['nums'] / newdf['nums'].sum()
#Cumulative relative frequency
newdf['cumrelfreq'] = newdf['relfreq'].cumsum()
#Add column names
newdf.columns = ['Class Interval', 'Frequency', 'Cumulative Frequency', 'Relative Frequency', 'Cumulative Relative Frequency']
In [7]:
#Show frequency table
newdf
Out[7]:
Class...
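For reference, a quick sketch of one way to draw the histogram mentioned above from the binned counts (this assumes the renamed newdf columns from the cell above and isn't necessarily the notebook's plotting code):

```python
# Plot the binned frequencies as a bar chart
sns.barplot(x='Class Interval', y='Frequency', data=newdf, color='steelblue')
```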

## Keras and Tensorflow Pt III – Classification Example

This is a good, simple example of a classification problem utilizing Keras and Tensorflow. In addition, I'm using early stopping in an attempt to avoid overfitting the model. You'll notice this take effect as the model stops training well before the 600 epochs that were set.
Keras / Tensorflow Classification - Example
Here we're going to attempt to utilize Keras/Tensorflow to predict whether or not an individual has cancer.
The data being used can be seen on my github below:
https://github.com/kaledev/PythonSnippets/blob/master/Datasets/Keras/cancer_classification.csv
Data Imports and EDA
In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
In [2]:
df = pd.read_csv('DATA/cancer_classification.csv')
Here we can see the dataset is fairly well balanced in terms of the class labels; if the dataset were unbalanced, we might see issues with overfitting.
In [3]:
sns.countplot(x='benign_0__mal_1',data=df)
Out[3]:
<AxesSubplot:xlabel='benign_0__mal_1', ylabel='count'>
Create Models and Predict
In [4]:
#Set X/y
X = df.drop('benign_0__mal_1', axis=1).values
y = df['benign_0__mal_1'].values
In [5]:
from sklearn.model_selection import train_test_split
In [6]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
We need to scale the data so all features are on the same scale.
In [7]:
from sklearn.preprocessing import MinMaxScaler
In [8]:
scaler = MinMaxScaler()
In [9]:
X_train =...
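For reference, a minimal sketch of the early-stopping setup described above, assuming X_train/X_test have been scaled with the MinMaxScaler; the layer sizes and patience value are illustrative, not necessarily the notebook's:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.callbacks import EarlyStopping

model = Sequential([
    Dense(30, activation='relu'),
    Dense(15, activation='relu'),
    Dense(1, activation='sigmoid'),   # binary output: benign vs. malignant
])
model.compile(loss='binary_crossentropy', optimizer='adam')

# Stop training once validation loss stops improving, even though 600 epochs are allowed
early_stop = EarlyStopping(monitor='val_loss', mode='min', patience=25)
model.fit(x=X_train, y=y_train, epochs=600,
          validation_data=(X_test, y_test),
          callbacks=[early_stop])
```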

## Choosing the right number of layers/neurons for a Neural Network (Python)

This seems to be a very confusing subject for most, and I've had difficulty while learning how to set up Keras NN models, as adding or removing layers and neurons creates vastly different outcomes in the model. Normally I wouldn't just link out to others, but there is a very well written synopsis on StackExchange (linked below) that lays it out in a very simple fashion. Very brief summary:

- Input (first) layer: number of neurons = number of features in the dataset.
- Hidden layer(s): number of neurons = somewhere between 1 and the number in the input layer (take the mean); number of hidden layers: 1 works for *most* applications, sometimes none.
- Output (last) layer: exactly 1 neuron, unless it's a classification problem and you utilize the softmax activation, in which case the number of neurons equals the number of classes you are predicting.

https://stats.stackexchange.com/questions/181/how-to-choose-the-number-of-hidden-layers-and-nodes-in-a-feedforward-neural-netw
Meaning in the case of a dataset with 20 features:
#Example Keras Binary Classification model
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()
model.add(Dense(20, activation='relu'))   #Input layer: one neuron per feature
model.add(Dense(10, activation='relu'))   #Hidden layer: roughly the mean of input and output sizes
model.add(Dense(1, activation='sigmoid')) #Output layer: single neuron for binary classification
model.compile(loss='binary_crossentropy', optimizer='adam')
#Example Keras Multi-Class model
model = Sequential()
model.add(Dense(20, activation='relu'))
model.add(Dense(10, activation='relu'))
model.add(Dense(3, activation='softmax')) #If I...

## Keras and Tensorflow Pt II – Regression Example

This is a more complex example of Keras, utilizing regression. It uses a good-sized dataset from Kaggle, but does require a little bit of data cleansing before we can build out the model. Unfortunately the model we end up building isn't perfect and requires more tuning or some final dataset alterations, but it's a good example nonetheless. More information below.
Keras / Tensorflow Regression - Example
Here we're going to attempt to utilize Keras/Tensorflow to predict the price of homes based upon a set of features.
The data being used comes from Kaggle:
https://www.kaggle.com/harlfoxem/housesalesprediction
Imports
In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
Data Exploration and Cleansing
In [2]:
df = pd.read_csv('DATA/kc_house_data.csv')
Since we're going to predict prices, we can do a quick distribution plot. We can see the vast majority sit around $500k, and we have some outliers all the way out to $7M+ (but very few).
In [3]:
plt.figure(figsize=(15,6))
sns.distplot(df['price'])
Out[3]:
<AxesSubplot:xlabel='price'>
One thing we may want to do is get rid of these outliers (at least to an...
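For reference, one common way to trim the extreme prices is to drop the top percentile of sales; the 1% cutoff below is an assumption, not necessarily the notebook's choice:

```python
# Drop the most expensive ~1% of sales and re-plot the price distribution
non_top_1_perc = df.sort_values('price', ascending=False).iloc[int(len(df) * 0.01):]
plt.figure(figsize=(15, 6))
sns.distplot(non_top_1_perc['price'])
```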

## Keras and Tensorflow Basics in Python – Simple Example

Below is a small example showing how to utilize Keras/Tensorflow 2.0 to predict a value from a small dataset. More explanations follow in the Jupyter notebook below...
Keras / Tensorflow Basics - A Simple Example
The dataset utilized here is fake, for the sake of example only. It contains a price and two "features". We're assuming the dataset is a price listing of gemstones, and based on the features we can predict what the price of a new gemstone added to the list might be.
The data can be found here.
In [1]:
#Imports
import pandas as pd
import numpy as np
import seaborn as sns
Data
In [2]:
df = pd.read_csv('Keras/fake_reg.csv')
In [3]:
df.head()
Out[3]:
|   | price | feature1 | feature2 |
|---|---|---|---|
| 0 | 461.527929 | 999.787558 | 999.766096 |
| 1 | 548.130011 | 998.861615 | 1001.042403 |
| 2 | 410.297162 | 1000.070267 | 998.844015 |
| 3 | 540.382220 | 999.952251 | 1000.440940 |
| 4 | 546.024553 | 1000.446011 | 1000.338531 |
In [4]:
sns.pairplot(df)
Out[4]:
<seaborn.axisgrid.PairGrid at 0x18188b92e48>
This is a very simple dataset, but the pairplot can show us how the two features may correlate with price.
Training the Model
In [5]:
from sklearn.model_selection import train_test_split
In [6]:
#Use .values so we pass NumPy arrays rather than pandas objects, which TensorFlow works best with
X = df[['feature1', 'feature2']].values
y = df['price'].values
In [7]:
#Split into test/train
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
In [8]:
#Scale data to be...
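For reference, a minimal sketch of scaling the features and fitting a small regression network on this data; the layer sizes, optimizer, and epoch count are assumptions, not necessarily what the notebook uses:

```python
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)  # fit the scaler on training data only
X_test = scaler.transform(X_test)        # apply the same scaling to the test set

model = Sequential([
    Dense(4, activation='relu'),
    Dense(4, activation='relu'),
    Dense(4, activation='relu'),
    Dense(1),                            # single neuron: the predicted price
])
model.compile(optimizer='rmsprop', loss='mse')
model.fit(X_train, y_train, epochs=250, verbose=0)
```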