The second tutorial is using two UCI cycing data sets. They are two CSVs.
The questions we use for this tutorial are based on two different years of UCI Men's Cyclo-Cross dataset, which can be found here and clicking individual rankings.
Note: At PandasZoo we use single quotes for our answers.
This is a sample of what the data set looks like:
Rank | UCI ID | Name | Nationality | Team Code | Age | Points | year | type | |
---|---|---|---|---|---|---|---|---|---|
0 | 1 | 10007946203 | VAN DER POEL Mathieu | NETHERLANDS | CCU | 24 | 2600 | 2019 | cyclocross |
1 | 2 | 10007585986 | VAN AERT Wout | BELGIUM | NaN | 25 | 2452 | 2019 | cyclocross |
2 | 3 | 10007586087 | AERTS Toon | BELGIUM | TFL | 26 | 2415 | 2019 | cyclocross |
3 | 4 | 10007155651 | VANTHOURENHOUT Michael | BELGIUM | MNG | 26 | 1610 | 2019 | cyclocross |
4 | 5 | 10006118660 | VAN DER HAAR Lars | NETHERLANDS | TFL | 28 | 1431 | 2019 | cyclocross |
Import the Pandas module.
Hint: We put in an example answer that you should try typing in.
Read in the uci_cyclocross_rankings_historicals.csv data set.
When you read in the dataframe, call it cyclo. No need to include a path to the file.
Hint: We refer to Pandas as pd. You can find official documentation for read_csv here
Use the head function to look at the cyclo DataFrame.
Create a dataframe called cyclo_2018 which subsets cyclo for only rows where the 'year' column is 2018.
Similar to question 4, create a dataframe called cyclo_2019 which subsets cyclo for only rows where the 'year' column is 2019.
Use a left merge to merge cyclo_2018 with cyclo_2019. In other words, merge cyclo_2018 to cyclo_2019 (left side) and call the dataframe cyclo_both
Use 'UCI ID' as the variable to merge on.
Hint: we refer to Pandas as pd and put 'how' before 'on' in the merge function.
Let's append cyclo_2018 with cyclo_2019 into a dataframe called cyclo_append
Hint: Append cyclo_2019 to cyclo_2018 dataframe
Let's use the dataframe cyclo_both from eariler to only view riders whom have Points_x greater than Points_y using the head function.
In other words lets preview the data using the head function to see what riders earned more points in 2019 versus 2018.