Final Project - IA626


Summary
The final project for IA-626 will be an open-ended project covering the topics below. The project
will be graded primarily on complexity, analysis, and documentation.
Requirements
The project should contain the following:
● ETL - The methods used to fetch and store source data should be clearly outlined and
repeatable. There should be some thought put into the storage format.
○ Example - News stories were scraped from 3 webpages hourly using the
getStories.py script. Stories were stored as raw HTML files. The files were then
parsed by parseHTML.py, which loaded them into JSON objects with the following
schema. We stored the news stories in this JSON schema rather than in MySQL because we
wanted flexibility in the schema.
● Analysis - What is the primary question you are asking? This might be just an initial
question which leads into more in-depth analysis.
○ Example - We looked at the frequency of posts but noticed that the frequency
varies between two cities of the same size in the same timezone. We then looked
at demographic information to see if there was a correlation.
● Two or more data sources - Projects should contain 2 or more data sources. One of
these “sources” can be an API which translates results.
○ Example - We took each post containing a word in our keyword list and sent it to an
API which assigned it a popularity score.
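As a rough illustration of the parse step in the ETL example above, here is a minimal sketch that turns one raw HTML page into a JSON record using only the Python standard library. The names StoryParser and parse_story, the field names (source, fetched_at, title, body), and the sample page are all assumptions made for illustration; they are not the actual parseHTML.py or its schema.

```python
import json
from html.parser import HTMLParser


class StoryParser(HTMLParser):
    """Collects the <title> text and the text of every <p> in one page."""

    def __init__(self):
        super().__init__()
        self._tag = None        # "title" or "p" while inside one of them
        self.title = ""
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "p"):
            self._tag = tag
            if tag == "p":
                self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == self._tag:
            self._tag = None

    def handle_data(self, data):
        if self._tag == "title":
            self.title += data
        elif self._tag == "p":
            self.paragraphs[-1] += data


def parse_story(html, source, fetched_at):
    """Return one story as a plain dict that can be dumped to JSON."""
    parser = StoryParser()
    parser.feed(html)
    parser.close()
    return {
        "source": source,          # hypothetical field: where the page came from
        "fetched_at": fetched_at,  # hypothetical field: when it was scraped
        "title": parser.title.strip(),
        "body": " ".join(p.strip() for p in parser.paragraphs if p.strip()),
    }


# Made-up sample page standing in for a stored raw HTML file.
sample = (
    "<html><head><title>Storm closes roads</title></head>"
    "<body><p>Snow fell overnight.</p><p>Plows are out.</p></body></html>"
)
story = parse_story(sample, "example.com", "2019-12-07T01:00:00")
print(json.dumps(story, indent=2))
```

Keeping the raw HTML files and producing the JSON in a separate step means the parse can be rerun if the schema changes, which is one way to make the ETL repeatable.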
Waiver of requirements
Some of these requirements can be waived for projects that contain:
● Custom data visualization
● Unique or novel analysis
● UI Application
Included code appendix
All students must supply an appendix of APIs and code they have used.
Deliverable content
● Summary / initial question
● Outline - general approach
○ For multi-step approaches, use diagrams to describe the data flow.
● Python code
● Figures
● Results
● Code / API appendix
https://docs.google.com/document/u/0/d/1xkoDmsR4IWFpc9iKphMEzhLrNCknRl35B7rHdGn3U50/mobilebasic 12/7/19, 01Q02
Here are a few data sources and APIs to consider:
● Files
○ Reddit Comments - 1 month
○ Reddit Comments - 1 year (TBD)
○ Taxi Trips (see me for complete set)
○ Taxi Fares
○ NYS Data
○ NYC Data
● APIs
○ Google Places API
○ Google Geolocation API
○ Forecast.io weather API
○ Energy data - bulk
Deliverable format
The project should be delivered as a PDF including images, figures, code snapshots, etc. If your
project requires another content type, please consult me.