Google Sheets Gantt Chart

Just something I thought was a little interesting: I couldn't find any free, web-based Gantt chart apps that were easily shareable (preferably in Google Docs), so I built one using Google Sheets. It's not perfect and has only very basic functionality, but it's perfect for running through basic timing and milestones (particularly for a research project, which is why it's geared toward organizing a dissertation). Click the link below to make a copy for your own use. It does assume some familiarity with Excel/Sheets and with formulas.

Google Sheets Link

All you need to do is the following:
- Add new rows if needed
- Edit/add dates across the columns if needed
- Copy the formulas down into your new rows/columns from the existing ones (including the hidden column H), and use paste special (formulas only) so the formatting doesn't get thrown off
- Set your start date / due date (a light blue line will appear)
- Set the percentage done in the PCT OF TASK COMPLETE...
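The sheet's own formulas aren't shown in this excerpt, so here is only a hypothetical Python analogue of the idea behind each bar: a date column is part of the bar when it falls between the task's start and due dates, and the completed share of the bar is filled from the left using the percent-complete value. The function name `gantt_row`, the `#`/`-` markers, and the daily columns are all invented for illustration.

```python
from datetime import date, timedelta

def gantt_row(start, due, pct_complete, columns):
    """Hypothetical sketch: return one marker per date column.

    '#' = scheduled and counted as done, '-' = scheduled but not done yet,
    ' ' = outside the task's start/due window.
    """
    in_window = [start <= day <= due for day in columns]
    # Number of scheduled columns to mark as done, filled from the left
    n_done = round(sum(in_window) * pct_complete)
    cells = []
    filled = 0
    for inside in in_window:
        if not inside:
            cells.append(' ')
        else:
            cells.append('#' if filled < n_done else '-')
            filled += 1
    return ''.join(cells)

# Ten daily date columns starting on an arbitrary example date
columns = [date(2024, 1, 1) + timedelta(days=i) for i in range(10)]
row = gantt_row(date(2024, 1, 3), date(2024, 1, 6), 0.5, columns)
print(repr(row))  # prints '  ##--    '
```

In the sheet itself the same comparison would live in conditional formatting or a helper column rather than in code.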

Frequency Table – Python

A small script I wrote just because I happened to need it. It generates a quick, more "easy-to-read" frequency table for the ranges specified in the bins.

Simple Frequency Table Code

Takes a list (sample), separates its values into bins, and gives a frequency table with a histogram.

```python
# Imports
import pandas as pd
import seaborn as sns

# Give list
sample = [10, 15, 12, 17, 22, 14, 23, 8, 15, 11, 17, 12, 16, 26, 12, 11, 9, 16, 15, 24,
          12, 17, 16, 14, 19, 13, 10, 15, 19, 20, 10, 25, 14, 15, 12, 22, 7, 28, 16, 9]

# Put list into df
df = pd.DataFrame(sample, columns=['nums'])

# Set bin edges; pd.cut builds right-closed intervals: (5, 9], (9, 13], ...
bins = [5, 9, 13, 17, 21, 25, 29]

# Count values per bin (in bin order) and put into a dataframe.
# rename/rename_axis pin the column names to 'index' and 'nums', since
# value_counts() names them differently across pandas versions.
counts = pd.cut(df['nums'], bins=bins).value_counts().sort_index()
newdf = counts.rename('nums').rename_axis('index').reset_index()
# Convert the interval labels, e.g. "(5, 9]", to strings
newdf['index'] = newdf['index'].astype(str)

# Set 'easy-to-read' names for bins: "(5, 9]" -> "6 - 9" (assumes integer data)
left = newdf['index'].str.split(',').str[0].str.split('(').str[1].astype('int32') + 1
right = newdf['index'].str.split(',').str[1].str.split(']').str[0]
newdf['index'] = left.astype(str) + ' -' + right

# Cumulative frequency
newdf['cumfreq'] = newdf['nums'].cumsum()
# Relative frequency
newdf['relfreq'] = newdf['nums'] / newdf['nums'].sum()
# Cumulative relative frequency
newdf['cumrelfreq'] = newdf['relfreq'].cumsum()

# Add column names
newdf.columns = ['Class Interval', 'Frequency', 'Cumulative Frequency',
                 'Relative Frequency', 'Cumulative Relative Frequency']

# Show frequency table
newdf
```

...
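The string splitting used for the bin names works, but `pd.cut` (without a `labels` argument) produces `pandas.Interval` categories whose `.left` and `.right` attributes can be read directly. A minimal sketch of the same table built that way, assuming integer data (the `+ 1` on the left edge only makes sense for integers); `frequency_table` is a hypothetical helper name:

```python
import pandas as pd

def frequency_table(sample, bins):
    """Frequency table for integer data over right-closed bins (a, b]."""
    cats = pd.cut(pd.Series(sample), bins=bins)
    # sort_index keeps the bins in interval order rather than by count
    counts = cats.value_counts().sort_index()
    # Each index entry is a pandas Interval such as (5, 9]; show it as "6 - 9"
    labels = [f"{int(iv.left) + 1} - {int(iv.right)}" for iv in counts.index]
    table = pd.DataFrame({'Class Interval': labels,
                          'Frequency': counts.to_numpy()})
    table['Cumulative Frequency'] = table['Frequency'].cumsum()
    table['Relative Frequency'] = table['Frequency'] / table['Frequency'].sum()
    table['Cumulative Relative Frequency'] = table['Relative Frequency'].cumsum()
    return table

sample = [10, 15, 12, 17, 22, 14, 23, 8, 15, 11, 17, 12, 16, 26, 12, 11, 9, 16, 15, 24,
          12, 17, 16, 14, 19, 13, 10, 15, 19, 20, 10, 25, 14, 15, 12, 22, 7, 28, 16, 9]
bins = [5, 9, 13, 17, 21, 25, 29]
table = frequency_table(sample, bins)
print(table)
```

Reading the interval bounds avoids depending on the exact text format of the interval labels.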

Keras and TensorFlow Pt III – Classification Example

This is a good, simple example of a classification problem using Keras and TensorFlow. I'm also using early stopping to try to avoid overfitting the model; you'll notice it take effect as the model stops training well before the 600 epochs it was set to run.

Keras / TensorFlow Classification - Example

Here we're going to use Keras/TensorFlow to predict whether a tumor is benign (0) or malignant (1). The data is available on my GitHub: https://github.com/kaledev/PythonSnippets/blob/master/Datasets/Keras/cancer_classification.csv

Data Imports and EDA

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv('DATA/cancer_classification.csv')
```

Here we can see the dataset is fairly well balanced between the two labels. If it were heavily unbalanced, the model could score well simply by favoring the majority class.

```python
sns.countplot(x='benign_0__mal_1', data=df)
```

Create Models and Predict

```python
from sklearn.model_selection import train_test_split

# Set X (features) and y (label)
X = df.drop('benign_0__mal_1', axis=1).values
y = df['benign_0__mal_1'].values

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
```

We need to scale the data so all features are on the same scale:

```python
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
```

...
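Early stopping works through a patience rule: training halts once the monitored validation loss has failed to improve for a set number of epochs. In Keras this is normally configured with `tensorflow.keras.callbacks.EarlyStopping(monitor='val_loss', patience=...)` passed to `model.fit(..., callbacks=[...])`; the patience value used in this post isn't shown in the excerpt. The sketch below is a simplified, pure-Python illustration of that rule (not the Keras source), using a hypothetical `early_stopping_epoch` helper and invented loss values:

```python
def early_stopping_epoch(val_losses, patience, min_delta=0.0):
    """Return the 0-based epoch at which a patience rule would stop training."""
    best = float('inf')
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss  # improvement: remember it and reset the counter
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch  # patience exhausted: stop here
    return len(val_losses) - 1  # rule never triggered: every epoch ran

# Made-up validation losses that bottom out at epoch 2
losses = [1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.9]
print(early_stopping_epoch(losses, patience=3))  # prints 5
```

This is why a run configured for 600 epochs can finish far earlier: once the validation loss plateaus, only `patience` more epochs are spent before training ends.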

Choosing the right number of layers/neurons for a Neural Network (Python)

This seems to be a very confusing subject for most people, and I've had difficulty with it while learning how to set up Keras NN models, since adding or removing layers and neurons creates vastly different outcomes in the model. Normally I wouldn't just link out to others, but there is a very well-written synopsis on StackExchange (linked below) that lays it out in a very simple fashion.

Very brief summary:
- Input (first) layer: neurons = number of features in the dataset
- Hidden layer(s): neurons = somewhere between the size of the output layer and the size of the input layer (the mean of the two is a reasonable starting point). Number of hidden layers: 1 works for *most* applications, and sometimes none are needed.
- Output (last) layer: exactly 1, unless it's a classification problem using the softmax activation, in which case the number equals the number of classes you are predicting

https://stats.stackexchange.com/questions/181/how-to-choose-the-number-of-hidden-layers-and-nodes-in-a-feedforward-neural-netw

Meaning, in the case of a dataset with 20 features:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Example Keras binary classification model
model = Sequential()
model.add(Dense(20, activation='relu'))
model.add(Dense(10, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam')

# Example Keras multi-class model
model = Sequential()
model.add(Dense(20, activation='relu'))
model.add(Dense(10, activation='relu'))
model.add(Dense(3, activation='softmax'))
# If I...
```
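The rule of thumb can be written down as a tiny helper that returns layer widths. This is a hypothetical sketch: the name `suggest_layer_sizes`, rounding the mean down, and reusing the same width for every hidden layer are all assumptions, and the result is a starting point for tuning rather than an optimum.

```python
def suggest_layer_sizes(n_features, n_outputs=1, n_hidden_layers=1):
    """Rule-of-thumb widths: [input, hidden..., output].

    input  = number of features
    hidden = mean of input and output sizes, rounded down (an assumption)
    output = 1 (regression / sigmoid binary) or the number of softmax classes
    """
    hidden = (n_features + n_outputs) // 2
    return [n_features] + [hidden] * n_hidden_layers + [n_outputs]

print(suggest_layer_sizes(20))               # binary classification
print(suggest_layer_sizes(20, n_outputs=3))  # 3-class softmax
```

For 20 features and one sigmoid output this gives [20, 10, 1], matching the binary model above; for three softmax classes it suggests a slightly wider hidden layer of 11.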

Keras and TensorFlow Pt II – Regression Example

This is a more complex Keras example, using regression. It uses a good-sized dataset from Kaggle, but it needs a little data cleansing before we can build the model. Unfortunately, the model we end up building isn't perfect and requires more tuning or some final dataset alterations, but it's a good example nonetheless. More information below.

Keras / TensorFlow Regression - Example

Here we're going to use Keras/TensorFlow to predict the price of homes based on a set of features. The data comes from Kaggle: https://www.kaggle.com/harlfoxem/housesalesprediction

Imports

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
```

Data Exploration and Cleansing

```python
df = pd.read_csv('DATA/kc_house_data.csv')
```

Since we're going to predict prices, we can take a quick look at their distribution. The vast majority sit around 500k, with a very small number of outliers stretching all the way out to 7m+.

```python
plt.figure(figsize=(15, 6))
# sns.distplot is deprecated in recent seaborn; histplot with kde=True replaces it
sns.histplot(df['price'], kde=True)
```

One thing we may want to do is get rid of these outliers (at least to an...
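The excerpt cuts off before showing how the outliers are handled, so the following is only one common approach, not necessarily the one used in the full post: drop rows whose price lies above a chosen upper quantile. The helper name `drop_upper_outliers` and the 0.99 cutoff are assumptions, and the demo uses made-up prices rather than the Kaggle data.

```python
import pandas as pd

def drop_upper_outliers(df, column, q=0.99):
    """Keep only rows whose value in `column` is at or below its q-quantile."""
    cutoff = df[column].quantile(q)  # linear interpolation by default
    return df[df[column] <= cutoff]

# Made-up prices 1..100; the 0.99 quantile is 99.01, so only 100 is dropped
demo = pd.DataFrame({'price': range(1, 101)})
trimmed = drop_upper_outliers(demo, 'price')
print(len(trimmed), trimmed['price'].max())  # prints 99 99
```

On the house data this would be called as `drop_upper_outliers(df, 'price')`; trimming only the top tail keeps the bulk of the distribution around 500k intact.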