Just a little example of how to use Decision Trees and Random Forests in Python. Basically, a decision tree is a type of "flow chart", with nodes and edges that lead to a likely outcome. Because a single tree tends to overfit its training data, and so can predict poorly on new data, we can use a Random Forest instead: it builds many decision trees, each from a random sample of the rows and a random subset of the features/variables, and then averages (or takes a vote across) their results. This is a very brief and vague explanation, as these posts are meant to be quick shots of how-to code and not to teach the theory behind the method. In this example we're using a small healthcare dataset to predict whether spinal surgery for a particular condition (Kyphosis) was successful for an individual. Both methods are shown to see the differences in accuracy. The dataset can be found on my github here.
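As a rough sketch of that comparison (not the code from the post itself), the snippet below fits one decision tree and one random forest with Scikit-learn and prints the accuracy of each. The data is a synthetic stand-in generated with make_classification, which is an assumption made only so the snippet runs on its own; the Kyphosis dataset would be loaded from its CSV instead.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): 500 rows, 8 numeric features, binary target
X, y = make_classification(n_samples=500, n_features=8, n_informative=4,
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# A single decision tree
tree = DecisionTreeClassifier(random_state=42)
tree.fit(X_train, y_train)

# A random forest made of 100 trees
forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X_train, y_train)

# Accuracy of each model on the held-out rows
tree_acc = accuracy_score(y_test, tree.predict(X_test))
forest_acc = accuracy_score(y_test, forest.predict(X_test))
print(f"Decision tree accuracy: {tree_acc:.3f}")
print(f"Random forest accuracy: {forest_acc:.3f}")
```

On a dataset as small as the Kyphosis one, the gap between the two models can swing quite a bit depending on the train/test split, so it is worth trying a few values of random_state before reading much into the numbers.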
...
A great simple example on how to deal with Logistic Regression in Python utilizing Matplotlib, Seaborn, and Scikit-learn. The data is pulled from Kaggle.com and found here. Normally this comes as a test and train set, but I'm analyzing the training set only to see the accuracy of the model.
The data here is a little dirty and needed some cleansing first, which is probably worthy of another short tutorial/post on other methods of initial cleansing before working on data...
...
A great simple example of how to deal with Linear Regression in Python utilizing Matplotlib, Seaborn, and Scikit-learn. The data is fake, meant only as an example, and can be found here.
...
I normally don't share little strange bits of unfinished code - but I thought this was pretty cool. I was messing around with coding a little "fish tank" game in Python utilizing Pygame. One thing I'm bad about doing is attempting to make everything modular - which is why my Text Adventure game turned out to be a Text Adventure Engine. I wanted to utilize Pygame, but I wanted to load in sprite sheets in a predictable manner. In this case I found some sprites that are generally utilized for RPG Maker that always contain a predictable pattern (Multiple sprites on the same sheet, differing only by color). Given that there are so many available due to its popularity, it also makes things easy to play with. I believe there are licensing problems here, but this is just for personal use as an example. You could utilize any sheets that are uniform in size. Here's a little example of one...
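The idea of cutting a uniform sheet into frames can be sketched without Pygame at all. This is only an illustration, not the code from the post: frame_rects is a hypothetical helper, and the sheet and frame sizes are made up.

```python
def frame_rects(sheet_width, sheet_height, frame_width, frame_height):
    """Return an (x, y, width, height) rectangle for each frame, row by row."""
    if sheet_width % frame_width or sheet_height % frame_height:
        raise ValueError("sheet size must be a multiple of the frame size")
    columns = sheet_width // frame_width
    rows = sheet_height // frame_height
    return [
        (col * frame_width, row * frame_height, frame_width, frame_height)
        for row in range(rows)
        for col in range(columns)
    ]


# Hypothetical sheet: 96x128 pixels holding 32x32 frames (3 columns, 4 rows)
rects = frame_rects(96, 128, 32, 32)
print(len(rects))  # 12
print(rects[4])    # (32, 32, 32, 32)
```

In Pygame, each of these tuples could then be handed to Surface.subsurface() on the loaded sheet to get the individual sprite, assuming the sheet really is uniform in size.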
This is another great little exercise I completed that shows some basic to advanced analysis of stock ticker data. The instructions were written several years ago, so the dataset they used no longer matches what I get when I pull the data today. This means that some of the charts are not showing the intended data, and the peaks/lows that the author's comments point out may not be visible. It's hard to say why the data they originally used is different since I don't have it. However, this really isn't about the data anyway, but more about how to perform some more advanced analytics and visualizations with Python.
...
UPDATE 2020/08/10: I believe there is a more reliable method here. The method below still does work, but I've encountered too many problems to continue on using it at the moment...
This really isn't a standard Python tip, but interesting to those folks running WordPress sites that may want it AND are hitting problems doing so. You can end up embedding a Jupyter notebook out of github directly (sort of) and it looks wonderful - see post below for an example:
https://www.mikekale.com/analytics-use-case-basic-pandas-i-o-matplotlib-seaborn/
There is a great little post here on how to do it:
https://www.eg.bucknell.edu/~brk009/notebook-on-wp/
Ultimately it points to the author of a little plugin here:
https://www.andrewchallis.co.uk/portfolio/php-nbconvert-a-wordpress-plugin-for-jupyter-notebooks/
These instructions will *probably* work for most of you guys, however I host with Nearlyfreespeech.net, which is a great hosting company but very security conscious. If you're looking for WordPress/Themes/Plugins to auto update, it's not the spot. Most of my work on the backend is done in SSH, and getting the right permissions/groups/owners on the files and...
Attached is a great practical exercise I completed that shows you how to perform basic I/O utilizing Pandas, and data visualization utilizing Matplotlib and Seaborn. Along with seeing how to load the data, you can see some great basic functions of utilizing Pandas for data cleansing such as adding columns or groupby's. Several basic lambda functions are also shown, which can be a bit mind bending at first for those who haven't dealt with them before.
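For a quick taste of those Pandas basics, here is a small sketch that adds a computed column, applies a lambda, and runs a groupby. The data and column names are made up for illustration and are not from the exercise.

```python
import pandas as pd

# Made-up sales data (assumption), purely for illustration
df = pd.DataFrame({
    "region": ["East", "West", "East", "West"],
    "units": [10, 4, 6, 8],
    "unit_price": [2.5, 3.0, 2.5, 3.0],
})

# Add a column computed from existing columns
df["revenue"] = df["units"] * df["unit_price"]

# Add a column by applying a lambda to each value of another column
df["order_size"] = df["units"].apply(lambda n: "large" if n >= 8 else "small")

# Group by region and total the revenue for each group
revenue_by_region = df.groupby("region")["revenue"].sum()
print(revenue_by_region)
```

The lambda here is just an inline, unnamed function that runs once per value in the units column; writing it as a regular def and passing the function name to apply() would do the same thing.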
...
Something very important to me in particular as I utilize SQL Server quite a bit. Utilizing pyodbc and Pandas you can bring your data in.
Required Libraries
conda install pyodbc #Anaconda installation
#pip install pyodbc #Alternative
Here, as an example, we read from the AdventureWorks2017 database in SQL Server (replace server_name with your own server):
import pyodbc
import pandas as pd
# Connect using Windows authentication (Trusted_Connection)
conn = pyodbc.connect('Driver={SQL Server};'
                      'Server=server_name;'
                      'Database=AdventureWorks2017;'
                      'Trusted_Connection=yes;')
# Run the query and load the result into a dataframe
sql_query = pd.read_sql_query('SELECT * FROM Person.person',conn)
sql_query.head()
Notice that the result comes back as a dataframe.
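If you don't have a SQL Server instance handy, the same read pattern can be tried against an in-memory SQLite database. This is only a stand-in for illustration: the sqlite3 connection, the person table, and its rows are all assumptions, not part of AdventureWorks2017.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database used as a stand-in (assumption) for SQL Server
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER, first_name TEXT)")
conn.executemany("INSERT INTO person VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
conn.commit()

# Same call as with pyodbc: a query string plus an open connection
people = pd.read_sql_query("SELECT * FROM person", conn)
conn.close()

print(type(people).__name__)  # DataFrame
print(people.shape)           # (2, 2)
```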
Similarly - we could also write back to SQL Server if we wanted to utilizing slightly altered logic (abbreviated insert given how many...
Section breakdown:
Import/Write CSV
Import Excel
Import HTML (scraping)
Import/Write CSV
import pandas as pd
df = pd.read_csv('example') #import csv
df.to_csv('My_output',index=False) #write to csv, don't include the index column
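To see what index=False actually changes, here is a small round trip written to a temporary folder. The dataframe contents and the file names are made up for this sketch.

```python
import os
import tempfile

import pandas as pd

# Made-up data (assumption), just to have something to write
df = pd.DataFrame({"name": ["a", "b"], "value": [1, 2]})

with tempfile.TemporaryDirectory() as tmp_dir:
    # Written without the index column
    no_index_path = os.path.join(tmp_dir, "no_index.csv")
    df.to_csv(no_index_path, index=False)
    round_trip = pd.read_csv(no_index_path)

    # Written with the default index column
    with_index_path = os.path.join(tmp_dir, "with_index.csv")
    df.to_csv(with_index_path)
    with_index = pd.read_csv(with_index_path)

print(list(round_trip.columns))  # ['name', 'value']
print(list(with_index.columns))  # ['Unnamed: 0', 'name', 'value']
```

Leaving the index in means it comes back as an extra unnamed column on the next read, which is usually not what you want for a plain export.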
Import/Write to Excel
import pandas as pd
df = pd.read_excel('Excel_Sample.xlsx',sheet_name='Sheet1') #import Excel
df.to_excel('Excel_Sample.xlsx',sheet_name='Sheet1') #write to Excel
Import HTML
Required Libraries, assuming Anaconda is installed
conda install lxml
conda install html5lib
conda install beautifulsoup4
In this case I'm going to reference a table found below on the FDIC.gov website:
https://www.fdic.gov/resources/resolutions/bank-failures/failed-bank-list/
import pandas as pd
data = pd.read_html('https://www.fdic.gov/resources/resolutions/bank-failures/failed-bank-list/')
Notice that this reads in every table it can find on the page as a list of dataframes. You can explore the tables it picked up by viewing data[0], data[1], etc.
data[0].head()
...