Project 1 (20 Points Total)
Text Mining Twitter Data in R (using “tidytext”)
This is a two-week project spanning Weeks 2 and 3.
All parts are due at the end of Week 3.
Purpose
In this project you will use Twitter data with the tidytext package in R to explore and analyze tweets. The goal is to dig deep into Twitter data to learn more about a topic or event.
Assignment Due Date and Time
Parts 1 and 2 are both due on Sunday of Week 3 at 11:59 p.m. ET.
You will need to install R and obtain Twitter data in order to complete this project.
Part 1 (10 Points)
Twitter represents a fundamentally new instrument for making social measurements. Millions of people voluntarily express opinions on almost any topic. This data source is incredibly valuable for both research and business.
For example, here are some interesting Twitter data analysis studies:
Twitter Study Tracks When We Are :) (Twitter data shows biological rhythms)
https://www.nytimes.com/2011/09/30/science/30twitter.html
Twitter mood predicts the stock market
https://arxiv.org/pdf/1010.3003&embedded=true
Thunderstorm Fest (a map of locations where thunder was mentioned in the context of a storm in summer 2012)
https://cliffmass.blogspot.com/2012/07/thunderstorm-fest.html
Researchers from Northeastern University and Harvard University study the characteristics and dynamics of Twitter to learn how it can be used to analyze moods at a national scale.
http://www.ccs.neu.edu/home/amislove/twittermood/
Analyzing Tweets with R and tidytext (Trump and Obama tweet analysis)
https://medium.com/the-artificial-impostor/analyzing-tweets-with-r-92ff2ef990c6
Your Task
Come up with your own Twitter analysis idea. Find something to compare on a theme of your choice. Decide what data you want to use and what you are looking to find in the data. You can use your own data or data from strangers. You can use a generic theme or a specific one. You must decide on something you are interested in learning about. See the examples above for some ideas.
Write a 1-2 paragraph description of the analysis you will perform. Title this section, “Description.”
After you have performed the analysis in Part 2 (below), provide a 2-3 paragraph description of your conclusion and results. Title this section “Conclusion.” In this section, tell me what you discovered from the data. What did the data tell you? Was it what you expected or predicted? Did you learn anything interesting? What are your concluding thoughts on this analysis?
Save both sections together in a document labeled, “analysis.doc.”
Part 2 (10 Points)
Perform the analysis in R using tidytext. Your Twitter data analysis must include the following steps, all of which are outlined in Chapter 7:
Word Frequency Analysis
Comparison of Word Usage
Changes in Word Use Analysis
Favorites and Retweets Analysis
Chapter 7 of Textbook 2 will guide you through the steps. Save your R source code for the above steps.
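As a rough starting point, the first two steps (word frequency and comparison of word usage) might be sketched as follows. This is a minimal illustration and not the textbook's exact code: the `tweets` data frame, its `person` and `text` columns, and the account labels "A" and "B" are hypothetical stand-ins for whatever Twitter data you choose, and the default word tokenizer is assumed.

```r
library(dplyr)
library(tidyr)
library(tidytext)

# Hypothetical toy data standing in for real tweets from two accounts
tweets <- tibble(
  person = c("A", "A", "B", "B"),
  text = c("I love data science and R",
           "R makes text mining fun",
           "Text mining tweets is fun",
           "I love coffee more than data")
)

# One word per row, lowercased, with common stop words removed
tidy_tweets <- tweets %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word")

# Word frequency: each word's share of its account's remaining words
frequency <- tidy_tweets %>%
  count(person, word, sort = TRUE) %>%
  group_by(person) %>%
  mutate(freq = n / sum(n)) %>%
  ungroup()

# Comparison of word usage: smoothed log ratio of usage, A versus B
word_ratios <- tidy_tweets %>%
  count(word, person) %>%
  pivot_wider(names_from = person, values_from = n, values_fill = 0) %>%
  mutate(across(where(is.numeric), ~ (.x + 1) / (sum(.x) + 1))) %>%
  mutate(logratio = log(A / B)) %>%
  arrange(desc(logratio))
```

Words with a positive `logratio` are relatively more common for account A, and words with a negative value are more common for account B. With real data you would likely also filter out rare words before computing the ratios.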
Submission Instructions
Upload your Part 1 document (“analysis.doc”) and your Part 2 R source code files to the assignment submission area.
Grading Criteria
The assignment is worth 20 points total, broken out as follows:
Part 1 Analysis (10 points)
Novice (0-5 points): An inappropriate topic was selected that did not make sense, did not require any analysis, or could not be analyzed with the dataset.
Needs Improvement (6-7 points): A good level of analysis was reported; however, there were areas where significant details and observations were missed.
Proficient (8 points): The responses to all questions were reasonably correct; however, some of the reasoning contained unrealistic analysis or results.
Excellent (10 points): An appropriate topic was selected. The responses to the questions adequately analyzed and described the data as observed in the analysis. The data showed interesting results that appeared to be appropriate given the analysis performed.
Part 2 Programming (10 points)
Novice (0-5 points): No working source code was created to address the proposed problem to be solved.
Needs Improvement (6-7 points): The source code that was created did not properly address the content of the questions, although some of it may have worked to produce the correct results.
Proficient (8 points): A majority of the answers were implemented properly, and the source code contained appropriate but not efficient solutions to address most of the questions.
Excellent (10 points): All questions were implemented using efficient and correct R source code syntax. The functions were written properly, and they addressed the questions and provided an adequate response in all cases. The correct libraries were used.
Total
Novice (0-10 points): 0-60% (F-D)
Needs Improvement (12-14 points): 70% (C)
Proficient (16 points): 80% (B)
Excellent (20 points): 100% (A)