STA 032 Winter 2019
R Report I - Due Tuesday, February 5th by 5:00pm.
R Report I
FORMAT
* Use complete sentences and proper grammar to answer all questions.
* Use R Markdown to create an html document.
* Code should not appear in the body of the report, so be sure to add echo = FALSE to the options of your R chunks. All code
should be included at the end of the homework, as an appendix.
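One possible way to meet the last formatting requirement is sketched below. It assumes the knitr package that R Markdown already relies on; the setup-chunk approach and the appendix chunk header are suggestions, not something the assignment prescribes.

```r
# Place this in a setup chunk near the top of the .Rmd file.
# It sets echo = FALSE as the default for every later chunk,
# so code is hidden in the body of the report.
knitr::opts_chunk$set(echo = FALSE)

# For the appendix, an empty final chunk with a header like the one below
# re-prints the code of all labelled chunks without running it again.
# (Shown as a comment because it is chunk-header syntax, not R code.)
#   {r ref.label = knitr::all_labels(), echo = TRUE, eval = FALSE}
```

Setting the option once avoids repeating echo = FALSE in every chunk header; adding it chunk by chunk works just as well.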
I. Download the dataset “Fitbit.csv” from Canvas. This dataset has the following columns:
Column 1: Steps - The total number of steps for that day.
Column 2: Miles - The total distance walked in miles for that day.
Column 3: Floors - The total number of floors climbed for that day (up or down).
Column 4: Sleep - The total number of hours of sleep for that night.
Column 5: Day - The day of the week.
Column 6: Month - The month of the year.
Load this dataset into R, and use R to complete the following:
(a) List the names of the columns.
(b) Find the number of rows in the dataset.
(c) Use the function summary on the dataset and display the results. Describe how this function treats categorical
columns, and how it treats numeric columns.
(d) Find the mean of the column Steps.
(e) Find and display the average and standard deviation for steps taken for every day of the week.
(f) Find and display the average hours and standard deviation of sleep for every day of the week.
(g) Create a boxplot of the total number of steps for every day of the week (there should be 7 sub-plots). Does it
appear one day is less active than the rest? Explain.
(h) Create a boxplot of the total hours of sleep for every day of the week (there should be 7 sub-plots). Does it appear
one day is less restful than the rest? Explain.
(i) Calculate the number of days where the total steps were above 10000.
(j) Calculate the average number of steps taken when total sleep was below 7 hours.
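The base R tools these parts rely on can be tried first on a small stand-in data frame. The sketch below uses made-up toy values, not the Fitbit data, and the names toy, mean_by_day, sd_by_day, n_over_10k, and mean_short_sleep are hypothetical. With the real file, something like read.csv("Fitbit.csv") would replace the data.frame call, assuming the file sits in the working directory.

```r
# Toy data standing in for Fitbit.csv (values are invented for illustration).
toy <- data.frame(
  Steps = c(12000, 8000, 9500, 15000, 7000, 11000),
  Sleep = c(6.5, 7.5, 8.0, 6.0, 7.0, 6.8),
  Day   = c("Mon", "Tue", "Mon", "Tue", "Mon", "Tue")
)

names(toy)    # column names
nrow(toy)     # number of rows
summary(toy)  # per-column summary

# Group-wise statistics: apply a function to Steps within each Day.
mean_by_day <- tapply(toy$Steps, toy$Day, mean)
sd_by_day   <- tapply(toy$Steps, toy$Day, sd)

# Logical indexing: count rows meeting a condition, and average a subset.
n_over_10k       <- sum(toy$Steps > 10000)
mean_short_sleep <- mean(toy$Steps[toy$Sleep < 7])
```

For the grouped boxplots, the formula interface, for example boxplot(Steps ~ Day, data = toy), draws one box per level of Day.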
II. Create functions which perform the following tasks:
(a) Takes in a vector, subtracts the mean, and divides by the standard deviation (i.e., for every $x_i$ finds $(x_i - \bar{x})/s$).
Then returns the standard deviation of the result. Test the function on the following vector: X = 1:100.
(b) Takes in a vector and finds the endpoints of the interval $(\bar{x} - 2s, \bar{x} + 2s)$, where s is the sample standard deviation, and
returns both values, with labels of “lower” and “upper” respectively. Test the function on the following vector: X
= 1:100
(c) Takes in a vector, and calculates the mean after removing any observations that are more than 3 standard deviations
from the mean. Test the function on the following vector: X =
c(1:100,200,300)
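As a rough illustration of the function syntax involved, the sketch below defines two small helpers. The names standardize and two_sd_bounds are hypothetical, and the exact return values your own functions should produce are set by the task statements above, not by this sketch.

```r
# Standardize a vector: subtract the sample mean, divide by the sample sd.
standardize <- function(x) {
  (x - mean(x)) / sd(x)
}

# Return a named vector holding mean - 2*sd and mean + 2*sd.
two_sd_bounds <- function(x) {
  m <- mean(x)
  s <- sd(x)  # sd() uses the sample (n - 1) formula
  c(lower = m - 2 * s, upper = m + 2 * s)
}

z      <- standardize(1:100)
bounds <- two_sd_bounds(1:100)
```

Naming the elements inside c() is what attaches the labels to the returned values; a list with named components would be another reasonable choice.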
III. The purpose of this problem is to simulate a fair coin flip, and to see how many flips it takes for the probability of a
head to be approximately 0.50.
(a) Use the function sample to flip a fair coin 20 times, and find the probability that you flipped a “head” based on
the 20 flips.
(b) Use sapply to repeat (a) for the following values of n: 10, 100, 1000, 10000, 100000. Show the probabilities for
all 5 values of n.
(c) The error of a coin flip is the absolute value of the estimated probability minus the true probability, i.e.,
error = $|0.50 - \hat{P}(\text{head})|$
Find the error for each of your simulations from (b).
(d) What happens to the error as n increases, and why? Explain your answer.
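The simulation described in this problem could be organized along the following lines. This is a minimal sketch under stated assumptions: the helper name flip_prop, the "H"/"T" coding of the coin, and the seed value are all arbitrary choices, not part of the assignment.

```r
set.seed(1)  # arbitrary seed, used only so the run is reproducible

# Flip a fair coin n times and return the proportion of heads.
flip_prop <- function(n) {
  flips <- sample(c("H", "T"), size = n, replace = TRUE)
  mean(flips == "H")
}

# Repeat the experiment for each sample size with sapply.
ns    <- c(10, 100, 1000, 10000, 100000)
p_hat <- sapply(ns, flip_prop)

# Absolute error of each estimate relative to the true probability 0.50.
err <- abs(0.50 - p_hat)
```

Because sample draws at random, the individual values change with the seed; p_hat and err each hold one entry per value of n.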