This is another great little exercise I completed that shows some basic to advanced analysis of stock ticker data. The instructions were written several years ago so the dataset used no longer matches when I pull today. This means that some of the charts are not showing the intended data, and the authors comments on peaks/lows may not be seen. It’s hard to say why the data they originally used is different since I don’t have it. However – this really isn’t about the data anyway, but more of how to perform some more advanced analytics and visualizations with Python.

Finance Data Project

In this data project we will focus on exploratory data analysis of stock prices. Keep in mind, this project is just meant to practice your visualization and pandas skills, it is not meant to be a robust financial analysis or be taken as financial advice.

We'll focus on bank stocks and see how they progressed throughout the financial crisis all the way to early 2016.

Get the Data

In this section we will learn how to use pandas to directly read data from Google finance using pandas!

First we need to start with the proper imports, which we've already laid out for you here.

Note: You'll need to install pandas-datareader for this to work! Pandas datareader allows you to read stock information directly from the internet Use these links for install guidance (pip install pandas-datareader), or just follow along with the video lecture.

The Imports

In [1]:
from pandas_datareader import data, wb
import pandas as pd
import numpy as np
import datetime
%matplotlib inline

Data

We need to get data using pandas datareader. We will get stock information for the following banks:

  • Bank of America
  • CitiGroup
  • Goldman Sachs
  • JPMorgan Chase
  • Morgan Stanley
  • Wells Fargo

Figure out how to get the stock data from Jan 1st 2006 to Jan 1st 2016 for each of these banks. Set each bank to be a separate dataframe, with the variable name for that bank being its ticker symbol. This will involve a few steps:

  1. Use datetime to set start and end datetime objects.
  2. Figure out the ticker symbol for each bank.
  3. Figure out how to use datareader to grab info on the stock.

Use this documentation page for hints and instructions (it should just be a matter of replacing certain values. Use google finance as a source, for example:

# Bank of America
BAC = data.DataReader("BAC", 'yahoo', start, end)


In [2]:
#Define Start/End dates, and ticker names
start = datetime.datetime(2006, 1, 1)
end = datetime.datetime(2016, 1, 1)
tickers = ['BAC', 'C', 'GS', 'JPM', 'MS', 'WFC']
In [3]:
# Read in the stock symbols from the source of 'stooq', acording to https://pandas-datareader.readthedocs.io/en/latest/remote_data.html
bank_stocks = data.DataReader(tickers, 'yahoo', start, end)
# The column indices come in at Attributes / Symbols and I want to reverse them
bank_stocks.columns = bank_stocks.columns.swaplevel(0, 1)
bank_stocks.sort_index(axis=1, level=0, inplace=True)
bank_stocks.head()
Out[3]:
Symbols BAC C ... MS WFC
Attributes Adj Close Close High Low Open Volume Adj Close Close High Low ... High Low Open Volume Adj Close Close High Low Open Volume
Date
2006-01-03 35.054218 47.080002 47.180000 46.150002 46.919998 16296700.0 440.882477 492.899994 493.799988 481.100006 ... 58.490002 56.740002 57.169998 5377000.0 20.408512 31.900000 31.975000 31.195000 31.600000 11016400.0
2006-01-04 34.681938 46.580002 47.240002 46.450001 47.000000 17757900.0 432.742950 483.799988 491.000000 483.500000 ... 59.279999 58.349998 58.700001 7977800.0 20.171795 31.530001 31.820000 31.365000 31.799999 10870000.0
2006-01-05 34.726604 46.639999 46.830002 46.320000 46.580002 14970700.0 434.889679 486.200012 487.799988 484.000000 ... 58.590000 58.020000 58.549999 5778000.0 20.149401 31.495001 31.555000 31.309999 31.500000 10158000.0
2006-01-06 34.674480 46.570000 46.910000 46.349998 46.799999 12599800.0 434.889679 486.200012 489.000000 482.000000 ... 58.849998 58.049999 58.770000 6889800.0 20.267757 31.680000 31.775000 31.385000 31.580000 8403800.0
2006-01-09 34.696808 46.599998 46.970001 46.360001 46.720001 15619400.0 432.832489 483.899994 487.399994 483.000000 ... 59.290001 58.619999 58.630001 4144500.0 20.264565 31.674999 31.825001 31.555000 31.674999 5619600.0

5 rows × 36 columns

Set the column name levels to more appropriate names

In [4]:
bank_stocks.columns.names = ['Bank Ticker', 'Stock Info']

Check the head of the bank_stocks dataframe.

In [5]:
bank_stocks.head()
Out[5]:
Bank Ticker BAC C ... MS WFC
Stock Info Adj Close Close High Low Open Volume Adj Close Close High Low ... High Low Open Volume Adj Close Close High Low Open Volume
Date
2006-01-03 35.054218 47.080002 47.180000 46.150002 46.919998 16296700.0 440.882477 492.899994 493.799988 481.100006 ... 58.490002 56.740002 57.169998 5377000.0 20.408512 31.900000 31.975000 31.195000 31.600000 11016400.0
2006-01-04 34.681938 46.580002 47.240002 46.450001 47.000000 17757900.0 432.742950 483.799988 491.000000 483.500000 ... 59.279999 58.349998 58.700001 7977800.0 20.171795 31.530001 31.820000 31.365000 31.799999 10870000.0
2006-01-05 34.726604 46.639999 46.830002 46.320000 46.580002 14970700.0 434.889679 486.200012 487.799988 484.000000 ... 58.590000 58.020000 58.549999 5778000.0 20.149401 31.495001 31.555000 31.309999 31.500000 10158000.0
2006-01-06 34.674480 46.570000 46.910000 46.349998 46.799999 12599800.0 434.889679 486.200012 489.000000 482.000000 ... 58.849998 58.049999 58.770000 6889800.0 20.267757 31.680000 31.775000 31.385000 31.580000 8403800.0
2006-01-09 34.696808 46.599998 46.970001 46.360001 46.720001 15619400.0 432.832489 483.899994 487.399994 483.000000 ... 59.290001 58.619999 58.630001 4144500.0 20.264565 31.674999 31.825001 31.555000 31.674999 5619600.0

5 rows × 36 columns

Data Exploration

Let's explore the data a bit! Before continuing, I encourage you to check out the documentation on Multi-Level Indexing and Using .xs. Reference the solutions if you can not figure out how to use .xs(), since that will be a major part of this project.

What is the max Close price for each bank's stock throughout the time period?

In [6]:
bank_stocks.xs(key='Close',axis=1,level='Stock Info').max()
Out[6]:
Bank Ticker
BAC     54.900002
C      564.099976
GS     247.919998
JPM     70.080002
MS      89.300003
WFC     58.520000
dtype: float64

Create a new empty DataFrame called returns. This dataframe will contain the returns for each bank's stock. returns are typically defined by:

$$r_t = \frac{p_t - p_{t-1}}{p_{t-1}} = \frac{p_t}{p_{t-1}} - 1$$
In [7]:
returns = pd.DataFrame()

We can use pandas pct_change() method on the Close column to create a column representing this return value. Create a for loop that goes and for each Bank Stock Ticker creates this returns column and set's it as a column in the returns DataFrame.

In [8]:
for ticker in tickers:
    returns[ticker + " Return"] = bank_stocks[ticker]['Close'].pct_change()
In [9]:
returns.dropna().head()
Out[9]:
BAC Return C Return GS Return JPM Return MS Return WFC Return
Date
2006-01-04 -0.010620 -0.018462 -0.013812 -0.014183 0.000686 -0.011599
2006-01-05 0.001288 0.004961 -0.000393 0.003029 0.002742 -0.001110
2006-01-06 -0.001501 0.000000 0.014169 0.007046 0.001025 0.005874
2006-01-09 0.000644 -0.004731 0.012030 0.016242 0.010586 -0.000158
2006-01-10 -0.008369 0.003100 0.012578 0.001475 0.000507 -0.000789

Create a pairplot using seaborn of the returns dataframe. What stock stands out to you? Can you figure out why?

In [10]:
import seaborn as sns
sns.pairplot(returns[1:]) #Exclude first row
Out[10]:
<seaborn.axisgrid.PairGrid at 0x1d2d4d8bec8>

Using this returns DataFrame, figure out on what dates each bank stock had the best and worst single day returns. You should notice that 4 of the banks share the same day for the worst drop, did anything significant happen that day?

In [11]:
returns.idxmin()
#Returns the min index value for every column, date is the index
Out[11]:
BAC Return   2009-01-20
C Return     2009-02-27
GS Return    2009-01-20
JPM Return   2009-01-20
MS Return    2008-10-09
WFC Return   2009-01-20
dtype: datetime64[ns]
In [12]:
returns.idxmax()
#Returns the max index value for every column, date is the index
Out[12]:
BAC Return   2009-04-09
C Return     2008-11-24
GS Return    2008-11-24
JPM Return   2009-01-21
MS Return    2008-10-13
WFC Return   2008-07-16
dtype: datetime64[ns]

Take a look at the standard deviation of the returns, which stock would you classify as the riskiest over the entire time period? Which would you classify as the riskiest for the year 2015?

In [13]:
returns.std()
Out[13]:
BAC Return    0.036647
C Return      0.038672
GS Return     0.025390
JPM Return    0.027667
MS Return     0.037819
WFC Return    0.030238
dtype: float64
In [14]:
returns.loc['2015-01-01':'2015-12-31'].std()
#Same as a between in SQL
Out[14]:
BAC Return    0.016163
C Return      0.015289
GS Return     0.014046
JPM Return    0.014017
MS Return     0.016249
WFC Return    0.012591
dtype: float64

Create a distplot using seaborn of the 2015 returns for Morgan Stanley

In [15]:
sns.distplot(returns.loc['2015-01-01':'2015-12-31']['MS Return'][1:],bins=100)
#Start from 1 to skip NaN
Out[15]:
<matplotlib.axes._subplots.AxesSubplot at 0x1d2d689bcc8>

Create a distplot using seaborn of the 2008 returns for CitiGroup

In [16]:
sns.distplot(returns.loc['2008-01-01':'2008-12-31']['C Return'][1:],bins=100)
#Start from 1 to skip NaN
Out[16]:
<matplotlib.axes._subplots.AxesSubplot at 0x1d2d7f54bc8>

More Visualizations

Imports

In [69]:
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('whitegrid')
%matplotlib inline

Create a line plot showing Close price for each bank for the entire index of time. (Hint: Try using a for loop, or use .xs to get a cross section of the data.)

In [79]:
bank_stocks.xs(key='Close',axis=1,level='Stock Info').plot(figsize=(12,5))
plt.show()

Moving Averages

Let's analyze the moving averages for these stocks in the year 2008.

Plot the rolling 30 day average against the Close Price for Bank Of America's stock for the year 2008

In [80]:
bank_stocks.xs(key='Close',axis=1,level='Stock Info').loc['2008-01-01':'2008-12-31']['BAC'].plot(label='BAC Close', figsize=(12,5))
bank_stocks.xs(key='Close',axis=1,level='Stock Info').loc['2008-01-01':'2008-12-31']['BAC'].rolling(30).mean().plot(label='30 Day Avg', figsize=(12,5))
plt.legend()
plt.show()

#This one is very tricky...
#First I use the multi index function to get the BAC info
#Then the Loc between dates
#And I make use of the rolling function

Create a heatmap of the correlation between the stocks Close Price.

In [81]:
plt.figure(figsize=(8, 5))
sns.heatmap(bank_stocks.xs(key='Close',axis=1,level='Stock Info').loc['2008-01-01':'2008-12-31'].corr(),annot=True)
plt.show()

Optional: Use seaborn's clustermap to cluster the correlations together:

In [82]:
sns.clustermap(bank_stocks.xs(key='Close',axis=1,level='Stock Info').loc['2008-01-01':'2008-12-31'].corr(),annot=True)
plt.show()

Definitely a lot of more specific finance topics here, so don't worry if you didn't understand them all! The only thing you should be concerned with understanding are the basic pandas and visualization oeprations.