This is another great little exercise I completed that shows some basic to advanced analysis of stock ticker data. The instructions were written several years ago so the dataset used no longer matches when I pull today. This means that some of the charts are not showing the intended data, and the authors comments on peaks/lows may not be seen. It’s hard to say why the data they originally used is different since I don’t have it. However – this really isn’t about the data anyway, but more of how to perform some more advanced analytics and visualizations with Python.
Finance Data Project¶
In this data project we will focus on exploratory data analysis of stock prices. Keep in mind, this project is just meant to practice your visualization and pandas skills, it is not meant to be a robust financial analysis or be taken as financial advice.
We'll focus on bank stocks and see how they progressed throughout the financial crisis all the way to early 2016.
Get the Data¶
In this section we will learn how to use pandas to directly read data from Google finance using pandas!
First we need to start with the proper imports, which we've already laid out for you here.
Note: You'll need to install pandas-datareader for this to work! Pandas datareader allows you to read stock information directly from the internet Use these links for install guidance (pip install pandas-datareader), or just follow along with the video lecture.
The Imports¶
from pandas_datareader import data, wb
import pandas as pd
import numpy as np
import datetime
%matplotlib inline
Data¶
We need to get data using pandas datareader. We will get stock information for the following banks:
- Bank of America
- CitiGroup
- Goldman Sachs
- JPMorgan Chase
- Morgan Stanley
- Wells Fargo
Figure out how to get the stock data from Jan 1st 2006 to Jan 1st 2016 for each of these banks. Set each bank to be a separate dataframe, with the variable name for that bank being its ticker symbol. This will involve a few steps:
- Use datetime to set start and end datetime objects.
- Figure out the ticker symbol for each bank.
- Figure out how to use datareader to grab info on the stock.
Use this documentation page for hints and instructions (it should just be a matter of replacing certain values. Use google finance as a source, for example:
# Bank of America
BAC = data.DataReader("BAC", 'yahoo', start, end)
WARNING: MAKE SURE TO CHECK THE LINK ABOVE FOR THE LATEST WORKING API.¶
#Define Start/End dates, and ticker names
start = datetime.datetime(2006, 1, 1)
end = datetime.datetime(2016, 1, 1)
tickers = ['BAC', 'C', 'GS', 'JPM', 'MS', 'WFC']
# Read in the stock symbols from the source of 'stooq', acording to https://pandas-datareader.readthedocs.io/en/latest/remote_data.html
bank_stocks = data.DataReader(tickers, 'yahoo', start, end)
# The column indices come in at Attributes / Symbols and I want to reverse them
bank_stocks.columns = bank_stocks.columns.swaplevel(0, 1)
bank_stocks.sort_index(axis=1, level=0, inplace=True)
bank_stocks.head()
Set the column name levels to more appropriate names
bank_stocks.columns.names = ['Bank Ticker', 'Stock Info']
Check the head of the bank_stocks dataframe.
bank_stocks.head()
Data Exploration¶
Let's explore the data a bit! Before continuing, I encourage you to check out the documentation on Multi-Level Indexing and Using .xs. Reference the solutions if you can not figure out how to use .xs(), since that will be a major part of this project.
What is the max Close price for each bank's stock throughout the time period?
bank_stocks.xs(key='Close',axis=1,level='Stock Info').max()
Create a new empty DataFrame called returns. This dataframe will contain the returns for each bank's stock. returns are typically defined by:
$$r_t = \frac{p_t - p_{t-1}}{p_{t-1}} = \frac{p_t}{p_{t-1}} - 1$$returns = pd.DataFrame()
We can use pandas pct_change() method on the Close column to create a column representing this return value. Create a for loop that goes and for each Bank Stock Ticker creates this returns column and set's it as a column in the returns DataFrame.
for ticker in tickers:
returns[ticker + " Return"] = bank_stocks[ticker]['Close'].pct_change()
returns.dropna().head()
Create a pairplot using seaborn of the returns dataframe. What stock stands out to you? Can you figure out why?
import seaborn as sns
sns.pairplot(returns[1:]) #Exclude first row
Using this returns DataFrame, figure out on what dates each bank stock had the best and worst single day returns. You should notice that 4 of the banks share the same day for the worst drop, did anything significant happen that day?
returns.idxmin()
#Returns the min index value for every column, date is the index
returns.idxmax()
#Returns the max index value for every column, date is the index
Take a look at the standard deviation of the returns, which stock would you classify as the riskiest over the entire time period? Which would you classify as the riskiest for the year 2015?
returns.std()
returns.loc['2015-01-01':'2015-12-31'].std()
#Same as a between in SQL
Create a distplot using seaborn of the 2015 returns for Morgan Stanley
sns.distplot(returns.loc['2015-01-01':'2015-12-31']['MS Return'][1:],bins=100)
#Start from 1 to skip NaN
Create a distplot using seaborn of the 2008 returns for CitiGroup
sns.distplot(returns.loc['2008-01-01':'2008-12-31']['C Return'][1:],bins=100)
#Start from 1 to skip NaN
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('whitegrid')
%matplotlib inline
Create a line plot showing Close price for each bank for the entire index of time. (Hint: Try using a for loop, or use .xs to get a cross section of the data.)
bank_stocks.xs(key='Close',axis=1,level='Stock Info').plot(figsize=(12,5))
plt.show()
Moving Averages¶
Let's analyze the moving averages for these stocks in the year 2008.
Plot the rolling 30 day average against the Close Price for Bank Of America's stock for the year 2008
bank_stocks.xs(key='Close',axis=1,level='Stock Info').loc['2008-01-01':'2008-12-31']['BAC'].plot(label='BAC Close', figsize=(12,5))
bank_stocks.xs(key='Close',axis=1,level='Stock Info').loc['2008-01-01':'2008-12-31']['BAC'].rolling(30).mean().plot(label='30 Day Avg', figsize=(12,5))
plt.legend()
plt.show()
#This one is very tricky...
#First I use the multi index function to get the BAC info
#Then the Loc between dates
#And I make use of the rolling function
Create a heatmap of the correlation between the stocks Close Price.
plt.figure(figsize=(8, 5))
sns.heatmap(bank_stocks.xs(key='Close',axis=1,level='Stock Info').loc['2008-01-01':'2008-12-31'].corr(),annot=True)
plt.show()
Optional: Use seaborn's clustermap to cluster the correlations together:
sns.clustermap(bank_stocks.xs(key='Close',axis=1,level='Stock Info').loc['2008-01-01':'2008-12-31'].corr(),annot=True)
plt.show()
Definitely a lot of more specific finance topics here, so don't worry if you didn't understand them all! The only thing you should be concerned with understanding are the basic pandas and visualization oeprations.