Kaggle Submission: Titanic

I've already briefly done some work in the dataset in my tutorial for Logistic Regression - but never in entirety. I decided to re-evaluate utilizing Random Forest and submit to Kaggle. In this dataset, we're utilizing a testing/training dataset of passengers on the Titanic in which we need to predict if passengers survived or not (1 or 0). Titanic Dataset - Kaggle SubmissionYou can view my Kaggle submissions here. Initial Imports In [1]: import pandas as pd import numpy as np import matplotlib.pyplot as plt import seaborn as sns %matplotlib inline Data In [2]: #Import training data df_test = pd.read_csv('test.csv') df_train = pd.read_csv('train.csv') In [3]: df_test.head() Out[3]: PassengerId Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked 0 892 3 Kelly, Mr. James male 34.5 0 0 330911 7.8292 NaN Q 1 893 3 Wilkes, Mrs. James (Ellen Needs) female 47.0 1 0 363272 7.0000 NaN S 2 894 2 Myles, Mr. Thomas Francis male 62.0 0 0 240276 9.6875 NaN Q 3 895 3 Wirz, Mr. Albert male 27.0 0 0 315154 8.6625 NaN S 4 896 3 Hirvonen, Mrs. Alexander (Helga E Lindqvist) female 22.0 1 1 3101298 12.2875 NaN S In [4]: df_train.head() Out[4]: PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked 0 1 0 3 Braund, Mr. Owen Harris male 22.0 1 0 A/5 21171 7.2500 NaN S 1 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 0 PC 17599 71.2833 C85 C 2 3 1 3 Heikkinen, Miss. Laina female 26.0 0 0 STON/O2. 3101282 7.9250 NaN S 3 4 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 0 113803 53.1000 C123 S 4 5 0 3 Allen, Mr. William Henry male 35.0 0 0 373450 8.0500 NaN S Cleanse Missing Data In [5]: #We have some missing data here, so we're going to map out where NaN exists #We might be able to take care of age, but cabin is probably too bad to save sns.heatmap(df_train.isnull(),yticklabels=False,cbar=False,cmap='viridis') Out[5]: <matplotlib.axes._subplots.AxesSubplot at...
Read More