This is a Python assignment from Australia, mainly about data extraction.
1. General instructions
1.1. Define the year and quarter, then open the EDGAR index at
https://www.sec.gov/Archives/edgar/full-index/%s/QTR%s/company.idx
where the first %s is the year and the second %s (after QTR) is the quarter number.
For example, all filings made in 2022 QTR1 are listed in
https://www.sec.gov/Archives/edgar/full-index/2022/QTR1/company.idx
The index looks like the listing in Figure 1.
Figure 1.
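Step 1.1 can be sketched in Python as below. This is only a minimal illustration: the function names build_index_url and fetch_text are hypothetical, the User-Agent value is a placeholder that should be replaced with real contact details (the SEC asks automated clients to identify themselves), and decoding the response as latin-1 is an assumption chosen because it cannot fail on unexpected bytes.

```python
from urllib.request import Request, urlopen

# URL template taken from step 1.1: first %s = year, second %s = quarter.
INDEX_URL = "https://www.sec.gov/Archives/edgar/full-index/%s/QTR%s/company.idx"


def build_index_url(year, quarter):
    """Return the company.idx URL for a given year and quarter (1 to 4)."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be 1, 2, 3 or 4")
    return INDEX_URL % (year, quarter)


def fetch_text(url, user_agent="Your Name your.email@example.com"):
    """Download a URL and return its body as text.

    The default User-Agent is a placeholder (assumption), not a real identity.
    """
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request) as response:
        # Assumption: latin-1 is used only because it never raises on decode.
        return response.read().decode("latin-1")


print(build_index_url(2022, 1))
# https://www.sec.gov/Archives/edgar/full-index/2022/QTR1/company.idx
```

Calling fetch_text(build_index_url(2022, 1)) would then return the raw index text for that quarter.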
1.2. Filter rows by “Form Type” (defined later in Sections 2 – 4) and access each filing's text file by
concatenating strings to build its URL = “https://www.sec.gov/Archives/” + File Name.
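Step 1.2 could look roughly like the sketch below. It assumes that company.idx is a fixed-width text file whose data columns line up with the column names in its header row, so the column offsets are read from the header instead of being hard-coded. The helper names (parse_index, select_rows, filing_url) are hypothetical, and SAMPLE is synthetic data made up for illustration, not real EDGAR content.

```python
ARCHIVE_ROOT = "https://www.sec.gov/Archives/"
COLUMNS = ["Company Name", "Form Type", "CIK", "Date Filed", "File Name"]


def parse_index(text):
    """Parse company.idx text into a list of dicts keyed by column name."""
    lines = text.splitlines()
    # The header row is the first line that contains every column name.
    header_no = next(i for i, line in enumerate(lines)
                     if all(name in line for name in COLUMNS))
    header = lines[header_no]
    # Assumption: each data column starts where its name starts in the header.
    starts = [header.index(name) for name in COLUMNS]
    ends = starts[1:] + [None]
    rows = []
    for line in lines[header_no + 1:]:
        # Skip blank lines and the dashed separator under the header.
        if not line.strip() or set(line.strip()) == {"-"}:
            continue
        rows.append({name: line[start:end].strip()
                     for name, start, end in zip(COLUMNS, starts, ends)})
    return rows


def select_rows(rows, form_types):
    """Keep only the rows whose Form Type is in form_types."""
    return [row for row in rows if row["Form Type"] in form_types]


def filing_url(row):
    """Build the text-file URL by prefixing File Name with the archive root."""
    return ARCHIVE_ROOT + row["File Name"]


def _row(*fields):
    # Column widths here are only used to build the synthetic sample.
    return "%-62s%-12s%-12s%-12s%s" % fields


SAMPLE = "\n".join([
    _row(*COLUMNS),
    "-" * 120,
    _row("EXAMPLE FUND TRUST", "485APOS", "1234567", "2022-01-05",
         "edgar/data/1234567/0000000000-22-000001.txt"),
    _row("EXAMPLE HOLDINGS INC", "10-K", "7654321", "2022-02-10",
         "edgar/data/7654321/0000000000-22-000002.txt"),
])

for row in select_rows(parse_index(SAMPLE), {"485APOS", "N-1A", "N-1A/A"}):
    print(filing_url(row))
```

Reading offsets from the header keeps company names that contain spaces intact, which a plain whitespace split would not.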
1.3. Extract information from each filtered text file and write the results into csv files, one set per year-quarter.
2. Task A – Specific instructions
▪ Select rows where “Form Type” = “485APOS” or “N-1A” or “N-1A/A”. Although these three form
types are named differently in company.idx, their filings follow a very similar template.
▪ Write selected rows into csv: “A” + “_” + year + “_” + quarter + “.csv”, e.g. A_2011_3.csv,
A_2011_4.csv. Column names are the same as in company.idx.
▪ Write unselected rows into csv: “A” + “_” + year + “_” + quarter + “_” + “rej” + “.csv”, e.g.
A_2011_3_rej.csv, A_2011_4_rej.csv. Column names are the same as in company.idx.
▪ Read each selected text file and extract information for the columns defined in Table 1.
▪ Write the extracted results into csv: “A” + “_” + year + “_” + quarter + “_” + “result” + “.csv”, e.g.
A_2011_3_result.csv, A_2011_4_result.csv.
▪ Enter “N99999A” for missing observations.
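The file naming, the selected/rejected split and the “N99999A” fill described above can be sketched as follows. The helper names (output_names, split_rows, write_csv) are hypothetical, and treating empty strings as missing is an assumption. Table 1 is not reproduced here, so the result columns are left as a fieldnames parameter. Passing “B” as the task prefix gives the Task B file names.

```python
import csv
import os

COLUMNS = ["Company Name", "Form Type", "CIK", "Date Filed", "File Name"]
FORM_TYPES = {"485APOS", "N-1A", "N-1A/A"}
MISSING = "N99999A"  # sentinel required for missing observations


def output_names(task, year, quarter):
    """Return the three csv file names for one task and year-quarter."""
    stem = "%s_%s_%s" % (task, year, quarter)
    return {
        "selected": stem + ".csv",
        "rejected": stem + "_rej.csv",
        "result": stem + "_result.csv",
    }


def split_rows(rows, form_types):
    """Split index rows into (selected, rejected) by Form Type."""
    selected = [row for row in rows if row["Form Type"] in form_types]
    rejected = [row for row in rows if row["Form Type"] not in form_types]
    return selected, rejected


def write_csv(path, fieldnames, rows):
    """Write rows to path, replacing absent or empty values with MISSING."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        for row in rows:
            # Assumption: None and "" both count as missing observations.
            writer.writerow({
                name: row.get(name) if row.get(name) not in (None, "") else MISSING
                for name in fieldnames
            })


print(output_names("A", 2011, 3)["result"])
# A_2011_3_result.csv
```

A typical call would be write_csv(os.path.join(out_dir, names["selected"]), COLUMNS, selected), where out_dir is whatever output folder is in use.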
3. Task B – Specific instructions
▪ Select rows where “Form Type” = “485APOS” or “N-1A” or “N-1A/A”. Although these three form
types are named differently in company.idx, their filings follow a very similar template.
▪ Write selected rows into csv: “B” + “_” + year + “_” + quarter + “.csv”, e.g. B_2011_3.csv,
B_2011_4.csv. Column names are the same as in company.idx.
▪ Write unselected rows into csv: “B” + “_” + year + “_” + quarter + “_” + “rej” + “.csv”, e.g.
B_2011_3_rej.csv, B_2011_4_rej.csv. Column names are the same as in company.idx.
▪ Read each selected text file and extract information for the columns defined in Table 2.
▪ Write the extracted results into csv: “B” + “_” + year + “_” + quarter + “_” + “result” + “.csv”, e.g.
B_2011_3_result.csv, B_2011_4_result.csv.
▪ Enter “N99999A” for missing observations.
Note that, when spelling names, EDGAR filings may use special characters and their character codes interchangeably.
For example, “Global X Farmland & Timberland ETF” and “Global X Farmland &amp; Timberland ETF”
may both appear in the same file.
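One way to handle this is to decode character codes before comparing or storing names. The sketch below uses html.unescape from the Python standard library. The function name normalize_name is hypothetical, and it assumes the codes in question are HTML character references such as &amp; or &#38;; collapsing repeated whitespace is an extra assumption, not something the task requires.

```python
import html


def normalize_name(name):
    """Decode HTML character references and collapse runs of whitespace."""
    return " ".join(html.unescape(name).split())


encoded = normalize_name("Global X Farmland &amp; Timberland ETF")
literal = normalize_name("Global X Farmland & Timberland ETF")
print(encoded == literal)
# True
```

Applying the same normalization to every extracted name keeps the two spellings from being treated as different funds.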