Codingassignmenthelper | Home Codingassignmenthelper | University

FIT5145 Assignment1 Data Science

Assignment :

The aim of this assignment is to investigate and visualise data using various data science tools. It will test your ability to: read data files in Python and extract related data from those files; wrangle and process data into the required formats; use various graphical and non-graphical tools to perform exploratory data analysis and visualisation; and communicate your findings in your report.
You will need to submit two files:
A report in PDF containing your answers to all the questions. Note that you can use Word or other word processing software to format your submission; just save the final copy as a PDF before submitting. Make sure to include screenshots/images of the graphs you generate in order to justify your answers to all the questions.
The Python code that you have written to analyse and plot the data, as a Jupyter notebook file.

Tasks:

There are three tasks (A, B and C) in this assignment. Task C is an optional task for higher credit. Students who complete only Tasks A and B can get a maximum grade of Distinction. Students who attempt Task C can achieve a higher grade by demonstrating critical analysis skills and a deeper understanding of the task. You need to use Python to complete the tasks.

Task A: Investigating Population and Gender Equality in Education

In this task, you are required to visualise the relationship between the population, the income and the gender ratio (women % men, 25 to 34 years) in schools of different countries, and to gain insights from how these relations and trends change over time. The data files used in this task were originally downloaded from Gapminder. We have extracted the data from the original files and put it into a simpler format. Please download the data from Moodle:

Population.csv : This file contains yearly data on the estimated resident population, grouped by country, between 1800 and 2018.
GenderEquality.csv : This file contains yearly data on the ratio of female to male number of years in school among 25- to 34-year-olds, including primary, secondary and tertiary education, across different countries around the world, for the period between 1970 and 2015.
Income.csv : This file contains yearly data on income per person, adjusted for differences in purchasing power (in international dollars), across different countries around the world, for the period between 1800 and 2018.

A1. Investigating the Gender Equality Data

Have a look at the gender equality data. Use Python to plot the gender ratio (women % men) in schools for Australia, China and the United States over time. What are the maximum and minimum values of the gender ratio in Australia over the time period? How do the trends in gender ratio (women % men) in schools compare across these three countries over the time period? Which two countries have a similar growth trend?
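
One possible way to approach A1 is sketched below. The layout of GenderEquality.csv is not described in this brief, so the sketch assumes a wide table with one row per country and one column per year. The column name country, the helper names and every number in the sample table are invented for illustration and will need adjusting to the real file.

```python
import io
import pandas as pd

# Synthetic stand-in for GenderEquality.csv.
# ASSUMPTION: wide layout (one row per country, one column per year); values are invented.
SAMPLE_CSV = """country,1970,1971,1972
Australia,90.0,91.0,92.5
China,60.0,61.5,63.0
United States,95.0,96.0,97.5
"""


def to_long(wide, value_name):
    """Reshape a wide country-by-year table into long (country, year, value) rows."""
    long_df = wide.melt(id_vars="country", var_name="year", value_name=value_name)
    long_df["year"] = long_df["year"].astype(int)
    return long_df.sort_values(["country", "year"]).reset_index(drop=True)


def min_max(long_df, country, value_name):
    """Return (minimum, maximum) of value_name for one country."""
    rows = long_df[long_df["country"] == country]
    return rows[value_name].min(), rows[value_name].max()


def plot_countries(long_df, countries, value_name):
    """Draw one line per country; matplotlib is imported only when plotting."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for country in countries:
        rows = long_df[long_df["country"] == country]
        ax.plot(rows["year"], rows[value_name], label=country)
    ax.set_xlabel("Year")
    ax.set_ylabel("Gender ratio in schools (women % men)")
    ax.legend()
    return ax


gender = to_long(pd.read_csv(io.StringIO(SAMPLE_CSV)), "gender_ratio")
lowest, highest = min_max(gender, "Australia", "gender_ratio")
print(f"Australia: min={lowest}, max={highest}")
```

With the real data, passing the file path to pd.read_csv instead of the in-memory sample and then calling plot_countries(gender, ["Australia", "China", "United States"], "gender_ratio") would produce the line chart for the report.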

A2. Visualising the Relationship over Time

Have a look at the relationship between gender ratio in schools and income over time. Use Python to build a Motion Chart comparing the gender ratio in schools, the income, and the population of each country over time. The motion chart should show the gender ratio in schools on the x-axis, the income on the y-axis, and the bubble size should depend on the population. (Hint: A Jupyter notebook containing a tutorial on building motion charts in Python is available here.) Run the visualisation from start to finish. (Hint: In Python, to speed up the animation, set the timer bar next to the play/pause button to the minimum value.) Then answer the following questions:
Which two countries generally have the lowest gender ratio (women % men) in schools?
Select Cape Verde and Bolivia for this question: from which year onwards does Cape Verde start to have a higher gender ratio and a higher income than Bolivia? Please support your answer with relevant Python code and the motion chart.
Is there generally a relationship between the amount of income and the gender ratio (women % men) in schools across all countries during the whole period of time? What kind of relationship? Explain your answer.
Are there any other interesting things you notice in the data? Please support your answer with relevant Python code and/or the motion chart.
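
The Cape Verde and Bolivia question can also be checked numerically. The sketch below assumes the three datasets have already been reshaped into long (country, year, value) tables; the column names, helper names and all numbers are invented. The charting helper uses plotly.express as a stand-in and is not necessarily the library used in the linked tutorial.

```python
import pandas as pd

YEARS = [2000, 2001, 2002, 2003]


def make(country, values, name):
    """Build a small long-format table; every value here is invented."""
    return pd.DataFrame({"country": country, "year": YEARS, name: values})


gender = pd.concat([make("Cape Verde", [80, 86, 84, 90], "gender_ratio"),
                    make("Bolivia", [82, 83, 85, 86], "gender_ratio")])
income = pd.concat([make("Cape Verde", [1000, 1200, 1300, 1500], "income"),
                    make("Bolivia", [1100, 1150, 1250, 1400], "income")])
population = pd.concat([make("Cape Verde", [430, 440, 450, 460], "population"),
                        make("Bolivia", [8300, 8500, 8700, 8900], "population")])


def combine(gender, income, population):
    """Inner-join the three tables so each row has all three measures."""
    keys = ["country", "year"]
    return gender.merge(income, on=keys).merge(population, on=keys).dropna()


def first_year_consistently_higher(df, a, b, columns):
    """First year from which country a exceeds b on every column in all later years."""
    left = df[df["country"] == a].set_index("year")[columns]
    right = df[df["country"] == b].set_index("year")[columns]
    left, right = left.align(right, join="inner")  # keep only years present for both
    both = (left > right).all(axis=1).sort_index()
    start = None
    for year, ok in both.items():
        if not ok:
            start = None      # the lead was interrupted, so start over
        elif start is None:
            start = year
    return start


def build_motion_chart(df):
    """Animated bubble chart; plotly is an optional dependency imported on demand."""
    import plotly.express as px

    return px.scatter(df.sort_values("year"), x="gender_ratio", y="income",
                      size="population", color="country", hover_name="country",
                      animation_frame="year", animation_group="country",
                      log_y=True, size_max=60)


data = combine(gender, income, population)
answer = first_year_consistently_higher(
    data, "Cape Verde", "Bolivia", ["gender_ratio", "income"])
print(answer)
```

In the invented sample, Cape Verde leads on both measures in 2001 but drops behind on gender ratio in 2002, so the helper reports the later year from which the lead is never lost. That is one reading of "from which year onwards", and the motion chart should be used to confirm it.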

Task B: Exploratory Analysis on Big Data

In this part, you are required to do some exploratory analysis on the health insurance marketplace data. The file InsuranceRates.csv.zip contains data on health and dental plans offered to individuals and small businesses through the US Health Insurance Marketplace. This data was originally prepared and released by the Centers for Medicare & Medicaid Services (CMS) and was then published on Kaggle. The file we provide is an extract from the data on Kaggle. Unzipped, the file is over 500MB and contains the following fields:

BusinessYear : Year for which plan provides coverage to enrollees.
StateCode : Two-character state abbreviation indicating the state where the plan is offered.
IssuerId : Five-digit numeric code that identifies the issuer organization in the Health Insurance Oversight System (HIOS).
PlanId : Fourteen-character alpha-numeric code that identifies an insurance plan within HIOS.
Age : Categorical indicator of whether a subscriber's age is used to determine rate eligibility for the insurance plan.
IndividualRate : Dollar value for the monthly insurance premium cost applicable to a non-tobacco user for the insurance plan in a rating area, or to a general subscriber if there is no tobacco preference.
IndividualTobaccoRate : Dollar value for the monthly insurance premium cost applicable to a tobacco user for the insurance plan in a rating area.

Load the InsuranceRates.csv data in Python and answer the following questions:
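
Because the unzipped file is over 500MB, it may help to read only the columns a question needs and to give them compact dtypes. The sketch below assumes the CSV header uses exactly the field names listed above. The sample rows, the COLUMNS and DTYPES choices and the load_rates helper are illustrative only.

```python
import io
import pandas as pd

# Tiny synthetic sample in the assumed layout; all rows are invented.
SAMPLE = """BusinessYear,StateCode,IssuerId,PlanId,Age,IndividualRate,IndividualTobaccoRate
2014,AK,21989,21989AK0010001,21,30.5,
2014,AK,21989,21989AK0010001,Family Option,45.0,
2015,AL,82285,82285AL0010001,40,32.0,35.0
"""

# Only the columns needed for B1 and B2, with memory-friendly dtypes.
COLUMNS = ["BusinessYear", "StateCode", "Age", "IndividualRate"]
DTYPES = {
    "BusinessYear": "int16",
    "StateCode": "category",
    "Age": "category",          # Age is categorical, so its values are kept as text
    "IndividualRate": "float32",
}


def load_rates(source):
    """Read the selected columns from a path or file-like object."""
    return pd.read_csv(source, usecols=COLUMNS, dtype=DTYPES)


rates = load_rates(io.StringIO(SAMPLE))
print(rates.dtypes)
print(rates.memory_usage(deep=True).sum(), "bytes")
```

With the real data, load_rates("InsuranceRates.csv") would replace the in-memory sample.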

B1. How many years does the data cover? (Hint: pandas provides functionality to see 'unique' values.) What are the possible values for 'Age'? What are the average, maximum and minimum values for the monthly insurance premium cost for an individual? Do those values seem reasonable to you?
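
A minimal sketch of the B1 summaries follows, run on a small invented table rather than the real file. The Age labels other than 'Family Option' are placeholders, and the summarise helper is a hypothetical name.

```python
import pandas as pd

# Invented rows standing in for the loaded InsuranceRates data.
rates = pd.DataFrame({
    "BusinessYear": [2014, 2014, 2015, 2016],
    "Age": ["21", "Family Option", "40", "64"],
    "IndividualRate": [30.0, 45.0, 32.0, 53.0],
})


def summarise(rates):
    """Return the distinct years, the distinct Age values and basic rate statistics."""
    years = sorted(rates["BusinessYear"].unique().tolist())
    ages = rates["Age"].unique().tolist()
    stats = rates["IndividualRate"].agg(["mean", "max", "min"])
    return years, ages, stats


years, ages, stats = summarise(rates)
print(len(years), "years:", years)
print("Age values:", ages)
print(stats)

# describe() adds quartiles, which helps when judging whether the extremes look plausible.
print(rates["IndividualRate"].describe())
```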


B2. Variation in Costs over Time and with Age

Generate boxplots (or other plots) of insurance costs versus year and age to answer the following questions: Are insurance policies becoming cheaper or more expensive over time? Is the median insurance cost increasing or decreasing? How do insurance costs vary with the age of the person being insured? (Hint: filter out the value 'Family Option' before plotting the data.) In terms of median cost, do older people pay more or less for insurance than younger people? How much more/less do they pay?
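
The filtering and median comparisons for B2 could look roughly like the sketch below. The table is invented, the Age labels "21" and "64" are placeholders for whatever values the real column holds, and plot_boxplots is a hypothetical helper built on pandas' DataFrame.boxplot.

```python
import pandas as pd

# Invented rows; only the 'Family Option' label comes from the assignment hint.
rates = pd.DataFrame({
    "BusinessYear": [2014, 2014, 2014, 2015, 2015, 2015],
    "Age": ["21", "64", "Family Option", "21", "64", "Family Option"],
    "IndividualRate": [20.0, 60.0, 35.0, 24.0, 66.0, 40.0],
})

# Drop the family rows so that only per-person age categories remain.
individual = rates[rates["Age"] != "Family Option"]

median_by_year = individual.groupby("BusinessYear")["IndividualRate"].median()
median_by_age = individual.groupby("Age")["IndividualRate"].median()

# Absolute difference in median cost between the two placeholder age groups.
difference = median_by_age["64"] - median_by_age["21"]


def plot_boxplots(individual):
    """Boxplots of cost by year and by age; needs matplotlib installed."""
    by_year = individual.boxplot(column="IndividualRate", by="BusinessYear")
    by_age = individual.boxplot(column="IndividualRate", by="Age", rot=90)
    return by_year, by_age


print(median_by_year)
print(median_by_age)
print("Older minus younger median:", difference)
```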

Task C: Exploratory Analysis on Other Data

(Note: This additional task is for those students wishing to get higher grades for their assessment. It is not required to pass the assignment, but it is required to get higher credit.)

Find some publicly available data and repeat some of the analysis performed in Tasks A and B above. Good sources of data are government websites, such as: data.gov.au, data.gov, data.gov.in, data.gov.uk, ...

Please note that your analysis should at least contain visualisation, interpretation of your visualisation and a prediction task.
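
As one simple illustration of a prediction task, the sketch below fits a straight-line trend with numpy.polyfit and extrapolates one year ahead. The series is invented, the helper names are hypothetical, and a linear trend is only one of many models that could suit real data.

```python
import numpy as np

# Invented yearly series standing in for a measure from a public dataset.
years = np.array([2010, 2011, 2012, 2013, 2014], dtype=float)
values = np.array([10.0, 12.0, 14.0, 16.0, 18.0])


def fit_linear_trend(x, y):
    """Least-squares line through (x, y); x is offset by its first value for stability."""
    origin = x[0]
    slope, intercept = np.polyfit(x - origin, y, deg=1)
    return slope, intercept, origin


def predict(x_new, slope, intercept, origin):
    """Evaluate the fitted line at x_new."""
    return slope * (x_new - origin) + intercept


slope, intercept, origin = fit_linear_trend(years, values)
forecast = predict(2015.0, slope, intercept, origin)
print(f"slope per year: {slope:.2f}, forecast for 2015: {forecast:.2f}")
```

Any forecast like this should be interpreted alongside the visualisation, since a trend fitted to past years need not continue.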
