Python Assignment Help | QBUS6860 – Individual Assignment 1




This assignment has been designed to help students develop basic skills in data visualization and to practice techniques learned in lectures and tutorials.

Key Admin Information


a. ONE written report (Word or PDF format, through Canvas - Assignment 1 Report Submission).

b. SEVERAL Python “.py” or Jupyter Notebook “.ipynb” files (through Canvas - Assignment 1 – Upload Your Program Code Files).

2. The late penalty for the assignment is 5% of the assigned mark per calendar day, starting after 4pm on the due date. The closing date, Monday 11 April 2022, 4:00 pm, is the last date on which an assessment will be accepted for marking.

3. Length: The main text of your report (including everything except possible appendices) should be at most 10 pages in a normal 12-point font with single line spacing. For each Task, write a complete report with the necessary plots, covering your visualization, methodology, analysis, insights, and limitations where applicable.

4. Numbers with decimals should be reported to the fourth decimal place in the report.

5. If you wish to include additional material, you can do so by creating an appendix. There is no page limit for the appendix. Keep in mind that making good use of your audience’s time is an essential business skill. Every sentence, table or figure has to count. Extraneous and/or wrong material will potentially affect your mark.

6. Anonymous marking: Given the anonymous marking policy of the University, please only include your student ID (SID) in the submitted report, and do NOT include your name. The file name of your report should follow this format, with “XXXXX” replaced by your SID: for example, QBUS6860_2021S1_SIDXXXXX.pdf

7. Presentation of the assignment is part of the assessment. Markers will assign up to 10% of the marks for clarity of writing and presentation.

8. For Turnitin to check your code, please copy and paste your code into the Appendix. Code should be formatted in a fixed-width font such as Courier New or Consolas.

If your programs are in “.py” files, simply copy and paste them into the report Appendix. If you are using Jupyter Notebook, please follow InstructionPY to convert it to “.py”.
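InstructionPY is not reproduced here, so the following is only a rough alternative sketch, not the course's official procedure. A notebook file is plain JSON, so its code cells can be extracted with the standard library alone. The function name `notebook_to_script` and the hand-made demo notebook are illustrative assumptions.

```python
import json


def notebook_to_script(notebook_json: str) -> str:
    """Return the code cells of a Jupyter notebook (given as JSON text) as one script."""
    notebook = json.loads(notebook_json)
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        # The notebook format stores source as a list of lines or as one string
        if isinstance(source, list):
            source = "".join(source)
        chunks.append(source.rstrip())
    return "\n\n".join(chunks) + "\n"


# Tiny hand-made notebook used only to demonstrate the function
demo = json.dumps({
    "cells": [
        {"cell_type": "markdown", "source": ["# Title"]},
        {"cell_type": "code", "source": ["import math\n", "print(math.pi)"]},
        {"cell_type": "code", "source": "x = 1\n"},
    ],
    "nbformat": 4,
    "nbformat_minor": 5,
})
script = notebook_to_script(demo)
print(script)
```

For a real file, read it with `open(path, encoding="utf-8").read()` and pass the text to the function. Notebook-only syntax such as `%` magics is copied verbatim and would need removing before the script runs as plain Python. The `jupyter nbconvert --to script` command offers a ready-made conversion if Jupyter is installed.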


Key Rules

  • Carefully read the requirements for each part of the assignment.
  • Please follow any further instructions announced on Canvas.
  • You must use Python for the assignment.
  • Reproducibility is fundamental in data analysis, so make sure you submit the correct Python .py files or Jupyter Notebook .ipynb files that generate the results in your report. Markers will run your programs to check them.
  • The University of Sydney takes plagiarism very SERIOUSLY. Please be warned that plagiarism between individuals/groups is always obvious to the markers and can be easily detected by Turnitin.
  • Not submitting your code will lead to a loss of 50% of the assignment marks.
  • Failure to read information and follow instructions may lead to a loss of marks. Furthermore, note that it is your responsibility to be informed of the University of Sydney and Business School rules and guidelines, and to follow them.
  • Referencing: The Business School recommends the APA Referencing System. (You may find the details at:  )
  • Feedback will be provided on the marked submission.


Task A (40 Marks)

This task is designed for you to practice your skills in conducting basic Visual Data Analytics (VDA) and Exploratory Data Analysis (EDA).


The COVID-19 pandemic in Australia is part of the ongoing worldwide pandemic of the coronavirus disease 2019 (COVID-19) caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The first confirmed case in Australia was identified on 25 January 2020, in Victoria (from 19_pandemic_in_Australia). Since then the Australian Federal Government has collected data on the COVID-19 pandemic in Australia. The data is useful to agencies of all types in making decisions on public policy.

Resources: a place to get the updated COVID-19 data for Australia, which comes from the Australian Federal Government ( alerts/covid-19) and State Government Health agencies. You can download the dataset as described in, or from Matt Bolton’s GitHub repository

A copy of the dataset has been placed on Canvas for your convenience, but you are encouraged to download the most recently updated data directly from the above GitHub site. The data files are all in CSV format. It is easy to identify the meaning of each column in each file.


You have been assigned two visualisation types at random (e.g., your randomly selected types could be violin plot and scatterplot, or histogram and bubble plot, etc.). Please check the list file QBUS6860_Assignment01_RandomTask.xlsx for your assigned visualisation types by using your Student ID. This file is on Canvas along with this document.

  1. [8 Marks] Explore all the dataset files; report and explain all the statistics, such as the total number of positive COVID-19 cases so far.
  2. [12 Marks] Use your two randomly assigned visualisation types to analyse the data (you may use other types in addition to the types you are assigned, but you must use your assigned types). For example, suppose you were assigned histogram and bubble plot but you think that the data could be better represented using a stream graph. You may use a stream graph in addition to histogram and bubble plot, but you must use at least histogram and bubble plot in your analysis. If an assigned type is not appropriate for this set of data, please explain the reason.

Always keep in mind that the visual presentation should be meaningful and visually pleasing.

  3. [10 Marks] Conduct appropriate analysis and report your insights. You should consider this task challenging.
  4. [5 Marks] Summarise your conclusions, for example whether the data is of good quality and what other information could be collected, and put forward your suggestions.

Note: The other 5 marks are allocated for presentation quality.
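As a minimal sketch of the statistics-reporting step in Task A, the snippet below summarises one numeric column and prints the results to four decimal places, as the report rules require. The column names (`date`, `state`, `new_cases`) and every value in `SAMPLE_CSV` are made up for illustration; the real files have their own columns and numbers.

```python
import csv
import io
import statistics

# Made-up rows for illustration only; not taken from the real dataset.
SAMPLE_CSV = """date,state,new_cases
2022-01-01,NSW,10
2022-01-01,VIC,7
2022-01-02,NSW,12
2022-01-02,VIC,
2022-01-03,NSW,15
"""


def summarise(csv_text, value_column):
    """Return basic statistics for one numeric column, skipping blank cells."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    values = [float(r[value_column]) for r in rows if r[value_column].strip()]
    return {
        "rows": len(rows),
        "missing": len(rows) - len(values),
        "total": sum(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


stats = summarise(SAMPLE_CSV, "new_cases")
for name, value in stats.items():
    # Counts print as integers; other numbers use four decimal places
    text = str(value) if isinstance(value, int) else f"{value:.4f}"
    print(f"{name}: {text}")
```

Reporting the number of missing cells alongside the totals also feeds directly into the data-quality discussion the task asks for.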


Task B (60 Marks)

Finding ICLR2022 Authors’ Affiliation(s) and Email Address(es) from the OpenReview site

This task is designed for you to apply techniques in data management and EDA.


The International Conference on Learning Representations (ICLR) is the premier gathering of professionals dedicated to the advancement of the branch of artificial intelligence called representation learning, but generally referred to as deep learning. ICLR is globally renowned for presenting and publishing cutting-edge research on all aspects of deep learning used in the fields of artificial intelligence, statistics and data science, as well as important application areas such as machine/computer vision, computational biology, speech recognition, text understanding, gaming, and robotics.


You may reuse parts of the tutorial code and revise them for your purposes here.


  1. [5 Marks] Acquire the ICLR2022 author IDs, each of which is either an OpenReview ID or an email address of an author. You may rely on some code snippets from Tutorial 3.
  2. [12 Marks: Challenging] Write Python code to extract all the authors’ profiles. As shown in Tutorial 3, each author has an ID on the OpenReview site (or an email address). You need to get IDs for all ICLR2022 authors. Then an author profile can be accessed like where ~Junbin_Gao1 is called the author ID (username). On a sample page, locate where the Author Affiliation and Email Address are, then try to write your own web crawler to get this information for all ICLR2022 authors.

Warning: be prepared to wait for your crawler to collect all the information after you deploy it.
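Part of that wait comes from pacing requests so the server is not overloaded. The loop below is a hedged sketch of one way to structure such a crawler: it pauses between authors, retries a few times, and records the IDs that never succeeded. The `fetch` argument is a placeholder for your own request code, and `~Missing_Author1` is a hypothetical ID used only to show the failure path.

```python
import time


def crawl_profiles(author_ids, fetch, delay_seconds=1.0, max_retries=3):
    """Fetch one profile per author ID, pausing between requests.

    `fetch` is any callable that takes an author ID and returns the profile.
    It is a placeholder: the real request code depends on the site.
    """
    profiles = {}
    failed = []
    for author_id in author_ids:
        for attempt in range(1, max_retries + 1):
            try:
                profiles[author_id] = fetch(author_id)
                break
            except Exception:
                # Broad catch for the sketch; wait longer after each failure
                time.sleep(delay_seconds * attempt)
        else:
            failed.append(author_id)  # every attempt failed
        time.sleep(delay_seconds)  # be polite to the server between authors
    return profiles, failed


def fake_fetch(author_id):
    """Stand-in for a real request, used only for this demonstration."""
    if author_id == "~Missing_Author1":
        raise ValueError("profile not found")
    return {"id": author_id}


profiles, failed = crawl_profiles(
    ["~Junbin_Gao1", "~Missing_Author1"], fake_fetch, delay_seconds=0
)
print(len(profiles), failed)
```

Keeping the list of failed IDs makes it possible to rerun only those authors later and to report how many profiles could not be retrieved.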

  3. [12 Marks: Challenging] Explore and report some statistics, such as the total number of authors, how many values are missing for their affiliations or emails, how many different affiliations there are, where the authors are from, etc.

Note 1: Generally speaking, each appearance of an author ID means a paper submission. It is possible to tell how many papers an author submitted and how many papers came from a particular organisation.
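The counting idea in Note 1 can be sketched with `collections.Counter`. The input layout (one list of author IDs per submission) and all the IDs shown are hypothetical; adapt them to whatever structure your own data collection produces.

```python
from collections import Counter

# Hypothetical input: one list of author IDs per submission
submissions = [
    ["~Author_A1", "~Author_B1"],
    ["~Author_A1", "author_c@example.edu.au"],
    ["~Author_B1", "~Author_A1"],
]

# set(paper) guards against an ID listed twice on the same paper
papers_per_author = Counter(
    author_id for paper in submissions for author_id in set(paper)
)
print(papers_per_author.most_common(1))
```

The same pattern gives papers per organisation once each author ID has been mapped to an affiliation.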

Note 2: OpenReview captures all the emails for the organisations with which an author is and/or was associated. I suggest you use the first email address in their email list as an author’s current affiliation.

Note 3: As there is no country information collected in the author profile, you may need to rely on the email domain to map to a country; for example, from we know au is the code for Australia. But people may use some common email domains such as [‘’, ‘’, ‘’, ‘’, ‘’, ‘’, ‘’, ‘’, ‘’, ‘msn.com’, ‘’, ‘’, ‘’]. In this case, please take the following strategy: (1) if such a common email address appears as an author’s first email address, then check the second email address to identify the country; (2) if such a common email address is the only email address for the author, you may aggregate them in a group of “unidentifiable”.
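The strategy in Notes 2 and 3 might be sketched as follows. Only `msn.com` survives from the list of common domains in this copy, so `COMMON_DOMAINS` must be extended by hand. The small `COUNTRY_BY_TLD` table, the `"unknown"` label for domains without a recognised country code, and the treatment of an empty email list as "unidentifiable" are all assumptions of this sketch, not part of the brief.

```python
# Only 'msn.com' is legible in the list above; add the remaining entries yourself.
COMMON_DOMAINS = {"msn.com"}

# A few country-code top-level domains; extend as needed.
COUNTRY_BY_TLD = {
    "au": "Australia",
    "cn": "China",
    "uk": "United Kingdom",
    "de": "Germany",
    "jp": "Japan",
}


def country_from_emails(emails):
    """Use the first address, falling back to the second if the first is a common domain."""
    for email in emails[:2]:
        domain = email.rsplit("@", 1)[-1].strip().lower()
        if domain in COMMON_DOMAINS:
            continue  # common provider: try the next address
        tld = domain.rsplit(".", 1)[-1]
        # Assumption: domains such as .com or .edu carry no country code
        return COUNTRY_BY_TLD.get(tld, "unknown")
    return "unidentifiable"


print(country_from_emails(["someone@example.edu.au"]))
print(country_from_emails(["someone@msn.com", "someone@example.ac.uk"]))
print(country_from_emails(["someone@msn.com"]))
```

How to treat generic endings such as `.com` or `.edu` is a design decision worth stating in the report, since it affects how many authors end up without a country.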

  4. [8 Marks] Visually present the statistical information you have discovered in Task 3.
  5. [10 Marks] Identify or discuss whether there is any missing information in Task 2. What is your suggestion regarding this?
  6. [8 Marks] (Challenging!) Segment authors into three major groups: University, IT Company (e.g., Google, Tencent, etc.), and Others.

Note: The other 5 marks are allocated for presentation quality.
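One possible starting point for the segmentation into University, IT Company, and Others is simple keyword matching on the affiliation text and email domain. The keyword lists below are hypothetical (only Google and Tencent are named in the task), and substring matching is crude, so treat this as a first pass that needs checking against the affiliations you actually collect.

```python
# Hypothetical keyword lists; tune them against your own collected data.
UNIVERSITY_HINTS = ("university", "institute of technology", "college", ".edu", ".ac.")
COMPANY_HINTS = ("google", "tencent")  # the two companies named in the task


def segment(affiliation, email_domain=""):
    """Classify an author as 'University', 'IT Company', or 'Others' by keyword match."""
    text = f"{affiliation} {email_domain}".lower()
    if any(hint in text for hint in COMPANY_HINTS):
        return "IT Company"
    if any(hint in text for hint in UNIVERSITY_HINTS):
        return "University"
    return "Others"


print(segment("The University of Sydney", "sydney.edu.au"))
print(segment("Google Research", "google.com"))
print(segment("Independent Researcher"))
```

Checking company hints first is a deliberate choice here, so that a company research lab is not misread as a university; the opposite order may suit other data better.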