Pandas Python Tutorial: Learn Pandas and Python in stages
The first tutorial uses the Titanic data set, an Excel file called titanic3.xls. The only tab is called titanic3.
The questions we use for this tutorial are based on the titanic3 dataset, which can be found here
Note: At PandasZoo we use single quotes for our answers.
This is a sample of what the titanic3 data set looks like:
| | pclass | survived | name | sex | age | sibsp | parch | ticket | fare | cabin | embarked | boat | body | home.dest |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | Allen, Miss. Elisabeth Walton | female | 29.0000 | 0 | 0 | 24160 | 211.3375 | B5 | S | 2 | NaN | St Louis, MO |
| 1 | 1 | 1 | Allison, Master. Hudson Trevor | male | 0.9167 | 1 | 2 | 113781 | 151.5500 | C22 C26 | S | 11 | NaN | Montreal, PQ / Chesterville, ON |
| 2 | 1 | 0 | Allison, Miss. Helen Loraine | female | 2.0000 | 1 | 2 | 113781 | 151.5500 | C22 C26 | S | NaN | NaN | Montreal, PQ / Chesterville, ON |
| 3 | 1 | 0 | Allison, Mr. Hudson Joshua Creighton | male | 30.0000 | 1 | 2 | 113781 | 151.5500 | C22 C26 | S | NaN | 135.0 | Montreal, PQ / Chesterville, ON |
| 4 | 1 | 0 | Allison, Mrs. Hudson J C (Bessie Waldo Daniels) | female | 25.0000 | 1 | 2 | 113781 | 151.5500 | C22 C26 | S | NaN | NaN | Montreal, PQ / Chesterville, ON |
Import the Pandas module.
Hint: We put in an example answer that you should try typing in.
Import the NumPy module.
Import the matplotlib.pyplot module.
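A minimal sketch of the three imports. The tutorial refers to Pandas as pd; the aliases np and plt are the customary ones and are an assumption here, since the tutorial does not name them.

```python
# Pandas is referred to as pd throughout the tutorial.
import pandas as pd

# np and plt are conventional aliases (assumed, not stated in the tutorial).
import numpy as np
import matplotlib.pyplot as plt
```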
Read in the titanic3.xls data set. The only tab is called titanic3. Make sure NAs are labeled 'NA'.
When you read in the dataframe, call it titanic_df. Also, make sure index_col is set to None.
The order of arguments we want for the read_excel function is: filename, tab name, index_col, and na_values.
Hint: We refer to Pandas as pd. You can find official documentation for read_excel here
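One way the read step might look, as a sketch rather than the official answer. The helper name load_titanic is hypothetical and only wraps the call so the block runs even when the file is absent. The file name and tab name are passed positionally and index_col and na_values by keyword, because newer Pandas releases only accept keywords after the tab name. Reading a .xls file also assumes the xlrd package is installed.

```python
from pathlib import Path

import pandas as pd


def load_titanic(path='titanic3.xls'):
    """Hypothetical helper: read the 'titanic3' tab, treating 'NA' as missing."""
    # Argument order follows the tutorial: filename, tab name, index_col, na_values.
    return pd.read_excel(path, 'titanic3', index_col=None, na_values=['NA'])


# Only read the file if it sits in the working directory.
if Path('titanic3.xls').exists():
    titanic_df = load_titanic()
```

In the tutorial itself you would type the read_excel call directly and assign it to titanic_df.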
Use the head function to look at the titanic_df DataFrame.
Use the describe function to learn more about the titanic_df dataset.
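A short sketch of both calls. So that it runs without the Excel file, titanic_df here is a stand-in built from a few columns of the five sample rows shown earlier, not the full data set, so the numbers it prints differ from the real ones.

```python
import pandas as pd

# Stand-in for titanic_df: some columns of the five sample rows from the table above.
titanic_df = pd.DataFrame({
    'pclass': [1, 1, 1, 1, 1],
    'survived': [1, 1, 0, 0, 0],
    'sex': ['female', 'male', 'female', 'male', 'female'],
    'age': [29.0, 0.9167, 2.0, 30.0, 25.0],
    'fare': [211.3375, 151.55, 151.55, 151.55, 151.55],
})

# head() returns the first five rows by default.
first_rows = titanic_df.head()
print(first_rows)

# describe() summarises the numeric columns (count, mean, std, min, quartiles, max).
summary = titanic_df.describe()
print(summary)
```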
Drop the 'ticket', 'cabin', 'boat', and 'body' columns, in that order, from the titanic_df DataFrame.
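A sketch of the drop step, again on a stand-in titanic_df built from the five sample rows. Whether the expected answer reassigns the result or drops in place is not stated, so the reassignment below is an assumption.

```python
import numpy as np
import pandas as pd

# Stand-in for titanic_df with the columns to be dropped plus two to keep.
titanic_df = pd.DataFrame({
    'pclass': [1, 1, 1, 1, 1],
    'survived': [1, 1, 0, 0, 0],
    'ticket': [24160, 113781, 113781, 113781, 113781],
    'cabin': ['B5', 'C22 C26', 'C22 C26', 'C22 C26', 'C22 C26'],
    'boat': [2, 11, np.nan, np.nan, np.nan],
    'body': [np.nan, np.nan, np.nan, 135.0, np.nan],
})

# axis=1 tells drop that the labels are column names, not row labels.
titanic_df = titanic_df.drop(['ticket', 'cabin', 'boat', 'body'], axis=1)
print(titanic_df.columns.tolist())
```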
Let's use Pandas to make a bar plot of the value counts of the 'survived' column.
Hint: we refer to Pandas as pd
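One possible way to draw the plot, using the value_counts method of the column and the Pandas plotting interface. This is a sketch and may differ from the answer the tutorial expects. The 'survived' values are those of the five sample rows, and the Agg backend is selected only so the block runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; not needed in a notebook

import pandas as pd

# Stand-in for titanic_df: the 'survived' values of the five sample rows.
titanic_df = pd.DataFrame({'survived': [1, 1, 0, 0, 0]})

# Count how often each value occurs, then draw one bar per value.
ax = titanic_df['survived'].value_counts().plot(kind='bar')
```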
Let's have a look at the mean of the 'survived' column, i.e., the proportion of passengers who survived.
Hint: Don't make an object.
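A sketch on the five sample rows. The result is bound to a name here only so it can be printed; in the tutorial you would type the bare expression.

```python
import pandas as pd

# Stand-in for titanic_df: the 'survived' values of the five sample rows.
titanic_df = pd.DataFrame({'survived': [1, 1, 0, 0, 0]})

# Because 'survived' is coded 0/1, its mean is the share of survivors.
survival_rate = titanic_df['survived'].mean()
print(survival_rate)
```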
Let's group our data by the sex of the passenger and see what the means are.
Hint: Don't make an object.
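A sketch of grouping by sex, on a stand-in built from the five sample rows. The numeric_only=True argument is an addition not mentioned in the tutorial: recent Pandas versions raise an error when asked to average text columns such as 'name', so they are excluded here.

```python
import pandas as pd

# Stand-in for titanic_df: selected columns of the five sample rows.
titanic_df = pd.DataFrame({
    'pclass': [1, 1, 1, 1, 1],
    'survived': [1, 1, 0, 0, 0],
    'name': ['Allen, Miss. Elisabeth Walton', 'Allison, Master. Hudson Trevor',
             'Allison, Miss. Helen Loraine', 'Allison, Mr. Hudson Joshua Creighton',
             'Allison, Mrs. Hudson J C (Bessie Waldo Daniels)'],
    'sex': ['female', 'male', 'female', 'male', 'female'],
    'age': [29.0, 0.9167, 2.0, 30.0, 25.0],
})

# One row per sex, one column per numeric variable, each cell a group mean.
by_sex = titanic_df.groupby('sex').mean(numeric_only=True)
print(by_sex)
```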
Let's group the data by sex and class of the passenger in that order and see what the means are.
Hint: Don't make an object.
Another Hint: 'pclass' is the class column.
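Grouping by two columns works the same way, with the column names passed as a list in the requested order. This sketch uses the five sample rows, which are all first class, so each sex yields a single group. As an assumption beyond the tutorial, numeric_only=True is passed so that text columns are skipped.

```python
import pandas as pd

# Stand-in for titanic_df: selected columns of the five sample rows.
titanic_df = pd.DataFrame({
    'pclass': [1, 1, 1, 1, 1],
    'survived': [1, 1, 0, 0, 0],
    'sex': ['female', 'male', 'female', 'male', 'female'],
    'age': [29.0, 0.9167, 2.0, 30.0, 25.0],
    'home.dest': ['St Louis, MO'] + ['Montreal, PQ / Chesterville, ON'] * 4,
})

# The list order sets the index levels: sex first, then pclass.
by_sex_class = titanic_df.groupby(['sex', 'pclass']).mean(numeric_only=True)
print(by_sex_class)
```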
Let's group by sex and class again, but only look at passengers under 18 years old, and see what the means are.
Hint: Don't make an object and do this in one line of code.
Hint: The group by order should be sex then class.
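A sketch of the filter and the grouping chained in a single expression, using a boolean mask on 'age'. The data are the five sample rows, of which two passengers are under 18. The numeric_only=True argument is an assumption added for compatibility with recent Pandas versions.

```python
import pandas as pd

# Stand-in for titanic_df: selected columns of the five sample rows.
titanic_df = pd.DataFrame({
    'pclass': [1, 1, 1, 1, 1],
    'survived': [1, 1, 0, 0, 0],
    'sex': ['female', 'male', 'female', 'male', 'female'],
    'age': [29.0, 0.9167, 2.0, 30.0, 25.0],
})

# Keep rows with age < 18, then group by sex and class and average each group.
minors = titanic_df[titanic_df['age'] < 18].groupby(['sex', 'pclass']).mean(numeric_only=True)
print(minors)
```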