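Task: simulate a roster of statistics majors (each student identified by a student ID), record their scores in three courses (Mathematical Analysis, Linear Algebra, and Probability & Statistics), and then carry out some statistical analyses.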
> num=seq(10378001,10378100)
> num
[1] 10378001 10378002 10378003 10378004 10378005 10378006 10378007 10378008
[9] 10378009 10378010 10378011 10378012 10378013 10378014 10378015 10378016
[17] 10378017 10378018 10378019 10378020 10378021 10378022 10378023 10378024
[25] 10378025 10378026 10378027 10378028 10378029 10378030 10378031 10378032
[33] 10378033 10378034 10378035 10378036 10378037 10378038 10378039 10378040
[41] 10378041 10378042 10378043 10378044 10378045 10378046 10378047 10378048
[49] 10378049 10378050 10378051 10378052 10378053 10378054 10378055 10378056
[57] 10378057 10378058 10378059 10378060 10378061 10378062 10378063 10378064
[65] 10378065 10378066 10378067 10378068 10378069 10378070 10378071 10378072
[73] 10378073 10378074 10378075 10378076 10378077 10378078 10378079 10378080
[81] 10378081 10378082 10378083 10378084 10378085 10378086 10378087 10378088
[89] 10378089 10378090 10378091 10378092 10378093 10378094 10378095 10378096
[97] 10378097 10378098 10378099 10378100
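Generate the scores with runif (uniform random numbers) and rnorm (normally distributed random numbers), rounded to whole points: x1 (Mathematical Analysis) is uniform on [80, 100], x2 (Linear Algebra) is normal with mean 80 and sd 7, and x3 (Probability & Statistics) is normal with mean 83 and sd 18.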
> x1=round(runif(100,min=80,max=100))
> x1
[1] 81 94 98 86 86 95 88 90 93 86 87 93 93 85 85 87 84 93
[19] 99 85 99 80 88 93 82 86 89 83 96 99 89 92 87 87 83 86
[37] 89 88 85 92 86 84 87 86 88 94 89 93 95 99 99 92 89 100
[55] 92 98 82 88 83 83 94 91 84 81 88 92 98 83 94 95 99 95
[73] 81 82 86 94 85 83 81 87 98 90 81 81 90 85 80 92 98 82
[91] 96 96 91 95 80 88 84 87 93 96
> x2=round(rnorm(100,mean=80,sd=7))
> x2
[1] 72 67 83 81 82 81 73 73 74 84 72 86 87 79 85 70 76 93 73 85 89 77 75 72 82
[26] 83 85 82 79 88 86 87 83 72 76 90 85 77 81 77 94 74 61 76 92 77 77 74 87 94
[51] 87 81 66 76 73 75 81 84 89 70 73 86 81 80 79 81 82 74 75 65 77 75 75 87 90
[76] 74 84 71 85 89 79 80 79 77 90 77 83 80 78 94 85 81 83 82 87 84 86 89 83 75
> x3=round(rnorm(100,mean=83,sd=18))
> x3
[1] 85 107 96 83 82 60 68 106 52 78 114 78 74 80 76 121 84 90
[19] 66 105 104 110 94 68 80 84 84 103 99 98 101 82 91 71 96 74
[37] 82 115 77 70 84 82 74 88 83 100 92 70 77 98 103 58 79 85
[55] 45 63 101 66 60 70 77 67 83 90 79 100 105 76 103 95 82 78
[73] 72 54 64 83 85 92 93 120 100 98 82 73 93 110 90 102 81 98
[91] 91 53 103 74 59 91 110 71 76 92
> x3[which(x3>100)]=100   # cap every score above 100 at 100
> x3
[1] 85 100 96 83 82 60 68 100 52 78 100 78 74 80 76 100 84 90
[19] 66 100 100 100 94 68 80 84 84 100 99 98 100 82 91 71 96 74
[37] 82 100 77 70 84 82 74 88 83 100 92 70 77 98 100 58 79 85
[55] 45 63 100 66 60 70 77 67 83 90 79 100 100 76 100 95 82 78
[73] 72 54 64 83 85 92 93 100 100 98 82 73 93 100 90 100 81 98
[91] 91 53 100 74 59 91 100 71 76 92
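The commands above set no random seed, so every run produces a different class. A minimal sketch, assuming an arbitrary seed of 2024, that makes the simulation reproducible and caps x3 with pmin(), which has the same effect as the which() assignment:

```r
# Fix the random number generator; the seed 2024 is an arbitrary assumption
set.seed(2024)

num <- seq(10378001, 10378100)                # student IDs
x1  <- round(runif(100, min = 80, max = 100)) # Mathematical Analysis: uniform on [80, 100]
x2  <- round(rnorm(100, mean = 80, sd = 7))   # Linear Algebra: normal, mean 80, sd 7
x3  <- round(rnorm(100, mean = 83, sd = 18))  # Probability & Statistics: normal, mean 83, sd 18

# pmin() takes the element-wise minimum, so every score above 100 becomes 100,
# the same effect as x3[which(x3 > 100)] = 100
x3 <- pmin(x3, 100)

# Resetting the same seed and repeating the same call reproduces the same draws
set.seed(2024)
x1_again <- round(runif(100, min = 80, max = 100))
identical(x1, x1_again)
```

The exact numbers depend on the seed and will not match the transcript above.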
Combine the vectors into a data frame and save it to disk
> x=data.frame(num,x1,x2,x3)
> x
         num  x1 x2  x3
1   10378001  81 72  85
2   10378002  94 67 100
3   10378003  98 83  96
4   10378004  86 81  83
5   10378005  86 82  82
6   10378006  95 81  60
7   10378007  88 73  68
8   10378008  90 73 100
9   10378009  93 74  52
10  10378010  86 84  78
11  10378011  87 72 100
12  10378012  93 86  78
13  10378013  93 87  74
14  10378014  85 79  80
15  10378015  85 85  76
16  10378016  87 70 100
17  10378017  84 76  84
18  10378018  93 93  90
19  10378019  99 73  66
20  10378020  85 85 100
21  10378021  99 89 100
22  10378022  80 77 100
23  10378023  88 75  94
24  10378024  93 72  68
25  10378025  82 82  80
26  10378026  86 83  84
27  10378027  89 85  84
28  10378028  83 82 100
29  10378029  96 79  99
30  10378030  99 88  98
31  10378031  89 86 100
32  10378032  92 87  82
33  10378033  87 83  91
34  10378034  87 72  71
35  10378035  83 76  96
36  10378036  86 90  74
37  10378037  89 85  82
38  10378038  88 77 100
39  10378039  85 81  77
40  10378040  92 77  70
41  10378041  86 94  84
42  10378042  84 74  82
43  10378043  87 61  74
44  10378044  86 76  88
45  10378045  88 92  83
46  10378046  94 77 100
47  10378047  89 77  92
48  10378048  93 74  70
49  10378049  95 87  77
50  10378050  99 94  98
51  10378051  99 87 100
52  10378052  92 81  58
53  10378053  89 66  79
54  10378054 100 76  85
55  10378055  92 73  45
56  10378056  98 75  63
57  10378057  82 81 100
58  10378058  88 84  66
59  10378059  83 89  60
60  10378060  83 70  70
61  10378061  94 73  77
62  10378062  91 86  67
63  10378063  84 81  83
64  10378064  81 80  90
65  10378065  88 79  79
66  10378066  92 81 100
67  10378067  98 82 100
68  10378068  83 74  76
69  10378069  94 75 100
70  10378070  95 65  95
71  10378071  99 77  82
72  10378072  95 75  78
73  10378073  81 75  72
74  10378074  82 87  54
75  10378075  86 90  64
76  10378076  94 74  83
77  10378077  85 84  85
78  10378078  83 71  92
79  10378079  81 85  93
80  10378080  87 89 100
81  10378081  98 79 100
82  10378082  90 80  98
83  10378083  81 79  82
84  10378084  81 77  73
85  10378085  90 90  93
86  10378086  85 77 100
87  10378087  80 83  90
88  10378088  92 80 100
89  10378089  98 78  81
90  10378090  82 94  98
91  10378091  96 85  91
92  10378092  96 81  53
93  10378093  91 83 100
94  10378094  95 82  74
95  10378095  80 87  59
96  10378096  88 84  91
97  10378097  84 86 100
98  10378098  87 89  71
99  10378099  93 83  76
100 10378100  96 75  92
> write.table(x,file="mark.txt",col.names=FALSE,row.names=FALSE,sep=" ")   # no header line, no row names
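Because the file is written with col.names=FALSE, it has no header line, so the column names must be supplied when reading it back. A minimal sketch using the first three rows of the table above, with tempfile() standing in for mark.txt so the example does not touch the working directory:

```r
# First three rows of the table above, so the example is self-contained
x <- data.frame(num = c(10378001, 10378002, 10378003),
                x1  = c(81, 94, 98),
                x2  = c(72, 67, 83),
                x3  = c(85, 100, 96))

# tempfile() replaces "mark.txt" for this sketch
path <- tempfile(fileext = ".txt")
write.table(x, file = path, col.names = FALSE, row.names = FALSE, sep = " ")

# No header was written, so pass the names explicitly when reading back
y <- read.table(path, header = FALSE, sep = " ",
                col.names = c("num", "x1", "x2", "x3"))
print(y)
```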
Average score for each course
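> mean(x)   # fails: mean() expects a numeric vector, not a data frame
[1] NA
Warning message:
In mean.default(x) : argument is not numeric or logical: returning NA
> colMeans(x)
        num          x1          x2          x3 
10378050.50       89.24       80.25       83.68 
> colMeans(x)[c("x1","x2","x3")]   # keep only the courses; the mean of the ID column is meaningless
   x1    x2    x3 
89.24 80.25 83.68 
> apply(x,2,mean)
        num          x1          x2          x3 
10378050.50       89.24       80.25       83.68 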
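Selecting the score columns first avoids averaging the IDs at all. A minimal sketch on a re-created table (seed 2024 is an arbitrary assumption, so the means differ from the transcript), showing that colMeans() and sapply(..., mean) agree:

```r
# Hypothetical re-creation of the score table; seed 2024 is an arbitrary assumption
set.seed(2024)
x <- data.frame(num = seq(10378001, 10378100),
                x1  = round(runif(100, min = 80, max = 100)),
                x2  = round(rnorm(100, mean = 80, sd = 7)),
                x3  = pmin(round(rnorm(100, mean = 83, sd = 18)), 100))

scores <- x[c("x1", "x2", "x3")]  # drop the ID column before averaging

colMeans(scores)      # one mean per course
sapply(scores, mean)  # same result: mean() applied to each column
```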
Highest and lowest score in each course
> apply(x,2,max)
     num       x1       x2       x3 
10378100      100       94      100 
> apply(x,2,min)
     num       x1       x2       x3 
10378001       80       61       45 
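range() returns the minimum and maximum together, so one sapply() call can build a small min/max table per course. A sketch on a re-created table (seed 2024 is an arbitrary assumption):

```r
# Hypothetical re-creation of the score table; seed 2024 is an arbitrary assumption
set.seed(2024)
x <- data.frame(num = seq(10378001, 10378100),
                x1  = round(runif(100, min = 80, max = 100)),
                x2  = round(rnorm(100, mean = 80, sd = 7)),
                x3  = pmin(round(rnorm(100, mean = 83, sd = 18)), 100))

scores <- x[c("x1", "x2", "x3")]

# range() gives c(min, max); sapply() binds the three results into a 2 x 3 matrix
rng <- sapply(scores, range)
rownames(rng) <- c("min", "max")
rng
```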
Total score for each student
> apply(x[c("x1","x2","x3")],1,sum)
[1] 238 261 277 250 250 236 229 263 219 248 259 257 254 244 246 257 244 276
[19] 238 270 288 257 257 233 244 253 258 265 274 285 275 261 261 230 255 250
[37] 256 265 243 239 264 240 222 250 263 271 258 237 259 291 286 231 234 261
[55] 210 236 263 238 232 223 244 244 248 251 246 273 280 233 269 255 258 248
[73] 228 223 240 251 254 246 259 276 277 268 242 231 273 262 253 272 257 274
[91] 272 230 274 251 226 263 270 247 252 263
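rowSums() is the vectorised counterpart of apply(..., 1, sum) and gives the same totals. A sketch on a re-created table (seed 2024 is an arbitrary assumption):

```r
# Hypothetical re-creation of the score table; seed 2024 is an arbitrary assumption
set.seed(2024)
x <- data.frame(num = seq(10378001, 10378100),
                x1  = round(runif(100, min = 80, max = 100)),
                x2  = round(rnorm(100, mean = 80, sd = 7)),
                x3  = pmin(round(rnorm(100, mean = 83, sd = 18)), 100))

total      <- rowSums(x[c("x1", "x2", "x3")])       # vectorised row totals
total_loop <- apply(x[c("x1", "x2", "x3")], 1, sum) # the approach used above

# unname() ignores any row-name labels so only the values are compared
all.equal(unname(total), unname(total_loop))
```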
Student with the highest total score
> which.max(apply(x[c("x1","x2","x3")],1,sum))
[1] 50
> x$num[which.max(apply(x[c("x1","x2","x3")],1,sum))]
[1] 10378050
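which.max() returns only the first position of the maximum, so a tie for top score would hide the other students. A sketch that returns everyone tied for the highest total, on a re-created table (seed 2024 is an arbitrary assumption):

```r
# Hypothetical re-creation of the score table; seed 2024 is an arbitrary assumption
set.seed(2024)
x <- data.frame(num = seq(10378001, 10378100),
                x1  = round(runif(100, min = 80, max = 100)),
                x2  = round(rnorm(100, mean = 80, sd = 7)),
                x3  = pmin(round(rnorm(100, mean = 83, sd = 18)), 100))

total <- rowSums(x[c("x1", "x2", "x3")])

# which() on an equality test keeps every row that reaches the maximum
top <- which(total == max(total))
x[top, ]  # all students tied for the highest total
```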
Histogram of x1 (Mathematical Analysis)
> hist(x$x1)
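For integer scores, explicit break points make every bin cover the same whole-number range. A sketch on re-created x1 values (seed 2024 and the bin width of 2 are assumptions); hist() also returns the bin counts:

```r
# Hypothetical re-creation of x1; seed 2024 is an arbitrary assumption
set.seed(2024)
x1 <- round(runif(100, min = 80, max = 100))

# Bins of width 2: right = FALSE makes them [80,82), [82,84), ...,
# and include.lowest = TRUE (the default) closes the last bin so 100 is counted
h <- hist(x1, breaks = seq(80, 100, by = 2), right = FALSE,
          main = "Mathematical Analysis", xlab = "Score")
h$counts  # number of students in each bin
```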
Explore relationships between the course scores
> plot(x1,x2)       # uses the standalone vectors x1 and x2
> plot(x$x1,x$x2)   # same plot, using the columns of the data frame
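A correlation matrix puts a number on the relationships the scatter plot shows, and pairs() draws all pairwise scatter plots at once. A sketch on a re-created table (seed 2024 is an arbitrary assumption); because the three courses are simulated independently, the off-diagonal correlations should be close to 0:

```r
# Hypothetical re-creation of the score table; seed 2024 is an arbitrary assumption
set.seed(2024)
x <- data.frame(num = seq(10378001, 10378100),
                x1  = round(runif(100, min = 80, max = 100)),
                x2  = round(rnorm(100, mean = 80, sd = 7)),
                x3  = pmin(round(rnorm(100, mean = 83, sd = 18)), 100))

scores <- x[c("x1", "x2", "x3")]

r <- cor(scores)  # Pearson correlation matrix, 3 x 3
round(r, 2)
pairs(scores)     # scatter plot for every pair of courses
```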
Frequency table (one-way contingency table)
> table(x$x1)

 80  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96  98  99 100 
  3   6   4   6   4   6   8   7   7   5   3   2   6   7   5   5   4   5   6   1 
> barplot(table(x$x1))
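A table of two variables gives a genuine two-way contingency table. A sketch, assuming hypothetical bands (below 90 vs. 90 and above) for x1 and x3 on a re-created table (seed 2024 is an arbitrary assumption); chisq.test() then tests whether the two banded scores are independent:

```r
# Hypothetical re-creation of the score table; seed 2024 is an arbitrary assumption
set.seed(2024)
x <- data.frame(num = seq(10378001, 10378100),
                x1  = round(runif(100, min = 80, max = 100)),
                x2  = round(rnorm(100, mean = 80, sd = 7)),
                x3  = pmin(round(rnorm(100, mean = 83, sd = 18)), 100))

# Illustrative bands [0, 90) and [90, 100]; the cut point 90 is an assumption
band <- function(s) cut(s, breaks = c(0, 90, 100), labels = c("<90", "90+"),
                        right = FALSE, include.lowest = TRUE)

ct <- table(MathAnalysis = band(x$x1), ProbStat = band(x$x3))  # 2 x 2 table
addmargins(ct)   # add row and column totals
chisq.test(ct)   # chi-squared test of independence
```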
Pie chart
> pie(table(x$x1))
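With one slice per distinct score (20 in the transcript), the pie is hard to read. A sketch that groups the scores into bands first and labels each slice with its percentage; the band edges and seed 2024 are assumptions:

```r
# Hypothetical re-creation of x1; seed 2024 is an arbitrary assumption
set.seed(2024)
x1 <- round(runif(100, min = 80, max = 100))

# Bands [80,85), [85,90), [90,95), [95,100]; the edges are illustrative
tab <- table(cut(x1, breaks = c(80, 85, 90, 95, 100),
                 right = FALSE, include.lowest = TRUE))
pct <- round(100 * prop.table(tab))  # share of students in each band
pie(tab, labels = paste0(names(tab), ": ", pct, "%"))
```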
Box plots
> boxplot(x$x1,x$x2,x$x3)
> boxplot(x[2:4],col=c("red","green","blue"),notch=T)   # one fill color per box; notch=T adds notches around the medians
> boxplot(x$x1,x$x2,x$x3,horizontal=T)   # horizontal boxes
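boxplot() also returns, invisibly, the statistics it draws, which is a convenient way to list outliers. A sketch on a re-created table (seed 2024 is an arbitrary assumption), with readable box labels:

```r
# Hypothetical re-creation of the score table; seed 2024 is an arbitrary assumption
set.seed(2024)
x <- data.frame(num = seq(10378001, 10378100),
                x1  = round(runif(100, min = 80, max = 100)),
                x2  = round(rnorm(100, mean = 80, sd = 7)),
                x3  = pmin(round(rnorm(100, mean = 83, sd = 18)), 100))

b <- boxplot(x[c("x1", "x2", "x3")],
             names = c("Math Analysis", "Linear Algebra", "Prob & Stat"))
b$stats  # 5 x 3: lower whisker, lower hinge, median, upper hinge, upper whisker
b$out    # points drawn as outliers (more than 1.5 x IQR beyond the box)
b$group  # which box each outlier belongs to
```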
Star plots
> stars(x[c("x1","x2","x3")])
> stars(x[c("x1","x2","x3")],full=T,draw.segments=T)   # segment diagram (radar-style), full circle
> stars(x[c("x1","x2","x3")],full=F,draw.segments=T)   # segment diagram, semicircle
Chernoff faces
> library(aplpack)
Loading required package: tcltk
> faces(x[c("x1","x2","x3")])
effect of variables:
 modified item       Var 
 "height of face   " "x1"
 "width of face    " "x2"
 "structure of face" "x3"
 "height of mouth  " "x1"
 "width of mouth   " "x2"
 "smiling          " "x3"
 "height of eyes   " "x1"
 "width of eyes    " "x2"
 "height of hair   " "x3"
 "width of hair    " "x1"
 "style of hair    " "x2"
 "height of nose   " "x3"
 "width of nose    " "x1"
 "width of ear     " "x2"
 "height of ear    " "x3"
Another style of face plot
> library(TeachingDemos)

Attaching package: ‘TeachingDemos’

The following objects are masked from ‘package:aplpack’:

    faces, slider

> faces2(x)   # note: x still includes the num (ID) column, which also drives a facial feature
Stem-and-leaf plot
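> stem(x$x1)

  The decimal point is at the |

   80 | 000000000
   82 | 0000000000
   84 | 0000000000
   86 | 000000000000000
   88 | 000000000000
   90 | 00000
   92 | 0000000000000
   94 | 0000000000
   96 | 0000
   98 | 00000000000
  100 | 0

Because the scores are integers, every leaf (the first digit after the decimal point) is 0, and each stem pools two consecutive scores: for example, the 80 stem holds the 3 scores of 80 and the 6 scores of 81.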
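The scale argument of stem() controls the plot length; a larger value asks for a longer plot, so each stem covers a narrower range of scores. A sketch on re-created x1 values (seed 2024 is an arbitrary assumption):

```r
# Hypothetical re-creation of x1; seed 2024 is an arbitrary assumption
set.seed(2024)
x1 <- round(runif(100, min = 80, max = 100))

stem(x1)             # default length, as in the transcript
stem(x1, scale = 2)  # longer plot with narrower stems
```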
Q-Q plots
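A normal Q-Q plot helps judge whether data are approximately normally distributed.
For normal data, the reference line drawn by qqline (through the first and third quartiles) has a slope close to the standard deviation and an intercept close to the mean.
The closer the points lie to the line, the closer the data are to a normal distribution.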
> qqnorm(x1)
> qqline(x1)
> qqnorm(x3)
> qqline(x3)
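The slope and intercept claim can be checked by computing the qqline() line by hand: it passes through the first and third sample quartiles plotted against the matching standard normal quantiles. A sketch on re-created x2 values (seed 2024 is an arbitrary assumption), with shapiro.test() as a formal complement to the visual check:

```r
# Hypothetical re-creation of x2; seed 2024 is an arbitrary assumption
set.seed(2024)
x2 <- round(rnorm(100, mean = 80, sd = 7))

# Same construction as qqline() with its default probs = c(0.25, 0.75)
q <- quantile(x2, c(0.25, 0.75), names = FALSE)  # sample quartiles
z <- qnorm(c(0.25, 0.75))                        # standard normal quartiles
slope     <- diff(q) / diff(z)
intercept <- q[1] - slope * z[1]

c(slope = slope, sd = sd(x2))              # slope estimates the standard deviation
c(intercept = intercept, mean = mean(x2))  # intercept estimates the mean

qqnorm(x2)
qqline(x2)
shapiro.test(x2)  # a small p-value is evidence against normality
```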
Further customization of the scatter plot
plot(x$x1, x$x2,
     main="Mathematical Analysis vs. Linear Algebra",
     xlab="Mathematical Analysis",
     ylab="Linear Algebra",
     xlim=c(0,100),
     ylim=c(0,100),
     xaxs="i",   # x axis style "internal": the axis spans exactly xlim, without the default 4% padding
     yaxs="i",   # same for the y axis
     col="red",  # color of the plotting symbol
     pch=19)     # plotting symbol: filled dots
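A fitted least-squares line makes the trend easier to see, and opening a file device saves the figure instead of only showing it on screen. A sketch on a re-created table; seed 2024 and the output file name are assumptions, and xlim/ylim are left unset so R fits the axes to the data range:

```r
# Hypothetical re-creation of the score table; seed 2024 is an arbitrary assumption
set.seed(2024)
x <- data.frame(num = seq(10378001, 10378100),
                x1  = round(runif(100, min = 80, max = 100)),
                x2  = round(rnorm(100, mean = 80, sd = 7)),
                x3  = pmin(round(rnorm(100, mean = 83, sd = 18)), 100))

# Write to a PDF in the temporary directory; the file name is hypothetical
path <- file.path(tempdir(), "x1_vs_x2.pdf")
pdf(path, width = 6, height = 6)
plot(x$x1, x$x2,
     main = "Mathematical Analysis vs. Linear Algebra",
     xlab = "Mathematical Analysis", ylab = "Linear Algebra",
     col = "red", pch = 19)
fit <- lm(x2 ~ x1, data = x)  # least-squares regression of x2 on x1
abline(fit, col = "blue")     # draw the fitted line
dev.off()                     # close the device so the file is written
```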