The fourth tutorial reviews functions that are useful for an analyst and have not already been covered. The questions are based on a wine quality data set.
Note: At PandasZoo we use single quotes for our answers.
This is a sample of what the data set looks like:
| | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7.4 | 0.70 | 0.00 | 1.9 | 0.076 | 11.0 | 34.0 | 0.9978 | 3.51 | 0.56 | 9.4 | 5 |
| 1 | 7.8 | 0.88 | 0.00 | 2.6 | 0.098 | 25.0 | 67.0 | 0.9968 | 3.20 | 0.68 | 9.8 | 5 |
| 2 | 7.8 | 0.76 | 0.04 | 2.3 | 0.092 | 15.0 | 54.0 | 0.9970 | 3.26 | 0.65 | 9.8 | 5 |
| 3 | 11.2 | 0.28 | 0.56 | 1.9 | 0.075 | 17.0 | 60.0 | 0.9980 | 3.16 | 0.58 | 9.8 | 6 |
| 4 | 7.4 | 0.70 | 0.00 | 1.9 | 0.076 | 11.0 | 34.0 | 0.9978 | 3.51 | 0.56 | 9.4 | 5 |
Import the Pandas module.
Hint: We put in an example answer that you should try typing in.
Read in the wine quality data set. No need for a full path. The file is called winequality-red.csv
When you read in the data and make a dataframe object, call it wine.
Hint: We refer to Pandas as pd. You can find details in the official Pandas documentation for read_csv.
Use the head function to look at the wine DataFrame.
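As a rough sketch of the import, read, and head steps: the snippet below reads the first two sample rows shown above from an in-memory string, so it runs without the file. The csv_text variable and the comma separator are assumptions for illustration; with the real file you would pass 'winequality-red.csv' to read_csv instead, and add sep=';' if your copy uses semicolons.

```python
import io

import pandas as pd

# Stand-in for winequality-red.csv: the first two sample rows from the table above.
# Assumption: the file is comma-separated.
csv_text = (
    'fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,'
    'free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol,quality\n'
    '7.4,0.70,0.00,1.9,0.076,11.0,34.0,0.9978,3.51,0.56,9.4,5\n'
    '7.8,0.88,0.00,2.6,0.098,25.0,67.0,0.9968,3.20,0.68,9.8,5\n'
)

# With the real file this would be: wine = pd.read_csv('winequality-red.csv')
wine = pd.read_csv(io.StringIO(csv_text))

# head() returns the first five rows by default (here there are only two).
print(wine.head())
```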
Sort the wine DataFrame by alcohol where ascending is false. Overwrite the wine DataFrame with this sorting.
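One way this sort might look, shown on a tiny made-up DataFrame (the values are illustrative, not taken from the file). sort_values returns a new DataFrame, so assigning the result back to wine is what overwrites it.

```python
import pandas as pd

# Toy stand-in for the wine DataFrame (illustrative values only).
wine = pd.DataFrame({'alcohol': [9.4, 14.0, 9.8], 'quality': [5, 6, 5]})

# Sort from strongest to weakest and overwrite the original name.
wine = wine.sort_values('alcohol', ascending=False)
print(wine.head())
```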
Dang, there is wine out there that is 14% alcohol or higher. Let's make a column that flags any wine with alcohol greater than or equal to 14% and call the column strong_wine.
Make the new column hold True/False values, so there is no need for any if logic.
Hint: the data is already multiplied by 100, so compare against 14 rather than 0.14.
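A minimal sketch of the idea, again on made-up values: comparing a column to a number produces a True/False Series directly, which can be assigned as the new column.

```python
import pandas as pd

# Toy stand-in for the wine DataFrame (illustrative values only).
wine = pd.DataFrame({'alcohol': [9.4, 14.0, 13.9, 14.9]})

# The comparison itself yields booleans, so no if logic is needed.
wine['strong_wine'] = wine['alcohol'] >= 14
print(wine)
```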
Let's take a look at wines that are True for the 'strong_wine' column and have a pH of 3.68. Use the head function to see these.
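One possible way to combine the two conditions, sketched on made-up values: each condition is a boolean Series, and & keeps the rows where both are True. The parentheses around the pH comparison matter because & binds more tightly than ==.

```python
import pandas as pd

# Toy stand-in for the wine DataFrame (illustrative values only).
wine = pd.DataFrame({'alcohol': [14.0, 14.0, 9.4], 'pH': [3.68, 3.20, 3.68]})
wine['strong_wine'] = wine['alcohol'] >= 14

# Keep rows that are strong AND have a pH of exactly 3.68.
print(wine[wine['strong_wine'] & (wine['pH'] == 3.68)].head())
```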
Let's use regex filtering to look at only the pH column.
Use the head function so we don't need to save an object.
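A small sketch of regex column filtering with DataFrame.filter, on made-up values. The pattern is case-sensitive, so 'pH' does not match the lowercase 'ph' inside 'sulphates'.

```python
import pandas as pd

# Toy stand-in for the wine DataFrame (illustrative values only).
wine = pd.DataFrame({
    'pH': [3.51, 3.20, 3.26],
    'sulphates': [0.56, 0.68, 0.65],
    'alcohol': [9.4, 9.8, 9.8],
})

# filter(regex=...) keeps the columns whose names match the pattern.
print(wine.filter(regex='pH').head())
```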
Let's use seaborn to plot quality on the x axis and alcohol on the y axis. Make a linear fit using fit_reg=True.
lmplot is the function we want to use.
Hint: Assume I ran the following code already:
import seaborn as sns
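A hedged sketch of what the lmplot call might look like, using a small made-up DataFrame in place of the real data. The grid variable name is only for illustration; lmplot returns a FacetGrid, and in a notebook the figure is displayed automatically.

```python
import pandas as pd
import seaborn as sns

# Toy stand-in for the wine DataFrame (illustrative values only).
wine = pd.DataFrame({
    'quality': [5, 5, 6, 6, 7, 7],
    'alcohol': [9.4, 9.8, 10.5, 11.0, 12.1, 12.8],
})

# Scatter plot of alcohol against quality with a fitted regression line.
grid = sns.lmplot(data=wine, x='quality', y='alcohol', fit_reg=True)
```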