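I. Data Analysis
1. The general data analysis process
Data collection, data storage, data analysis, data mining, data visualization, decision making
(1) Data storage: the data are stored in a data warehouse (entered into the computer and stored as files).
(2) Data mining: the process of using algorithms to search large amounts of data for the information hidden in them. It is exploratory: you do not know in advance what will be found, so the outcome cannot be specified beforehand.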
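II. Introduction to R
1. Features of R
(1) Efficient facilities for handling and storing data
(2) A full suite of operators for calculations on arrays and matrices
(3) A coherent, integrated collection of intermediate tools for data analysis
(4) Graphical facilities for analyzing and displaying data directly, usable with many graphics devices
(5) A well-developed, simple, and efficient programming language
(6) R is a thoroughly object-oriented statistical programming language: everything in R, including data and functions, is an object
(7) R has good interfaces to other programming languages and to databases
(8) R is free software, so it can be used without worry, yet it is no less capable than any comparable software
(9) R has abundant online resources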
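Feature (2) can be illustrated with a few base R operators. This is a minimal sketch. The vector `v`, the matrix `A` and the vector `b` are arbitrary illustrative values, not taken from the original notes.

```r
# Arithmetic on vectors is element-wise, with no explicit loop
v <- c(1, 2, 3)
doubled <- v * 2               # 2 4 6
summed  <- v + c(10, 20, 30)   # 11 22 33

# matrix() fills column by column, so A is [2 1; 1 3]
A <- matrix(c(2, 1, 1, 3), nrow = 2)
b <- c(1, 2)

prod_Ab <- A %*% b     # matrix product: a 2 x 1 matrix holding 4 and 7
At      <- t(A)        # transpose
sol     <- solve(A, b) # solves A %*% x == b, giving 0.2 and 0.6

print(doubled); print(summed); print(prod_Ab); print(sol)
```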
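2. Drawbacks of R
(1) R is not very consistent and is hard to get started with; learning it takes considerable effort and time
(2) There are a great many extension packages, and learning them is demanding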
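III. Basic Operations in RStudio
1. Working directory
(1) getwd() returns the current working directory
(2) setwd() sets the current working directory
Note: the path must be enclosed in quotation marks, and each backslash (\) in a Windows folder path must be changed to a forward slash (/), e.g. setwd("D:/data"). Doubling the backslashes ("D:\\data") also works.
(3) dir() lists the files in the current working directory
Note: to change the default working directory, open Tools > Global Options in the RStudio menu bar and change it there (under General).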
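The following is a short session with the three functions above. It is a sketch that uses a throwaway folder from tempdir() and a made-up file name, example.txt, so nothing in your real working directory is touched. The original directory is restored at the end.

```r
old_wd <- getwd()          # remember where we started
tmp <- tempdir()           # a temporary folder that always exists

setwd(tmp)                 # on Windows, write e.g. setwd("D:/data") or setwd("D:\\data")
file.create("example.txt") # hypothetical file, created only for the demo
files_here <- dir()        # list the files in the new working directory
print(files_here)

setwd(old_wd)              # go back to the original working directory
```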
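2. Assignment
(1) <- is the usual assignment operator. Note: never put a space between < and -, because x < -3 is read as the comparison "is x less than -3?"
> x <- 3
> x
[1] 3
(2) <<- assigns to a variable outside the current function: it searches the enclosing environments for an existing variable of that name and, if none is found, assigns in the global environment instead of creating a local variable. It is used when writing functions.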
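The sketch below shows why the two characters of <- must stay together, and how <<- differs from <- inside a function. The names counter, increment and local_value are illustrative only.

```r
x <- 3        # assignment: x now holds 3
x < -3        # with a space this is a comparison, "is x less than -3?", which is FALSE

counter <- 0
increment <- function() {
  counter <<- counter + 1  # updates the counter defined outside the function
  local_value <- counter   # plain <- creates a variable local to this call
  invisible(local_value)
}

increment()
increment()
print(counter)         # 2
exists("local_value")  # FALSE: the local variable vanished when the function returned
```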
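3. Common functions
(1) sum(...) returns the sum of all elements
> sum(1, 3, 5, 6, 9)
[1] 24
(2) mean(...) returns the arithmetic mean of all elements
> x <- c(1, 2, 3, 4, 5, 6, 7, 8)
> mean(x)
[1] 4.5
> mean(1:8)
[1] 4.5
(3) ls() lists the names of the variables currently in the workspace
> ls()
[1] "x"
(4) ls.str() lists every variable in the workspace together with its structure
> ls.str()
x : num [1:8] 1 2 3 4 5 6 7 8
(5) str(varname) shows the detailed structure of one variable
> str(x)
num [1:8] 1 2 3 4 5 6 7 8
(6) rm(varname) deletes a variable
> rm(x)
(7) rm(list = ls()) deletes all variables
(8) save.image() saves the workspace, by default to a file named .RData in the current working directory (plots you have drawn are not saved)
> save.image()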
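The functions above can be tried together in one session. This is a sketch with one deviation: it saves to a file in tempdir() through the file argument of save.image(), so that an existing .RData in your working directory is not overwritten. The file name demo.RData is made up. Because rm(list = ls()) clears the whole workspace, run this only in a scratch session.

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- "hello"

sum(x)    # 36: sum() adds the elements
max(x)    # 8:  max() returns the largest element
mean(x)   # 4.5

ls()      # in a fresh session: "x" "y"
ls.str()  # each variable with its structure
str(x)    # num [1:8] 1 2 3 4 5 6 7 8

# Save the whole workspace to a file in a temporary folder
save.image(file = file.path(tempdir(), "demo.RData"))

rm(y)            # delete one variable
rm(list = ls())  # delete all variables
print(ls())      # character(0)

# Restore the saved variables; x and y are back
load(file.path(tempdir(), "demo.RData"))
print(ls())
```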
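IV. Installing R Packages
(1) .libPaths() shows the location of the package library (or libraries)
> .libPaths()
[1] "D:/R-3.6.2/library"
(2) library() with no arguments lists the packages installed in the library
(3) library(pkgname) loads a package. require(pkgname) also loads a package, but if the package is not installed it returns FALSE with a warning instead of stopping with an error.
> library("grid")
(4) R's base packages are loaded automatically when R starts
(5) help(package = "pkgname") opens the help documentation of a package
> help(package = "grid")
(6) library(help = "pkgname") lists basic information about a package and its contents
> library(help = "grid")
(7) ls("package:pkgname") lists all functions in a package (the package must already be loaded with library())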
> ls("package:grid")
  [1] "absolute.size" "addGrob" "applyEdit"
  [4] "applyEdits" "arcCurvature" "arrow"
  [7] "arrowsGrob" "ascentDetails" "bezierGrob"
 [10] "bezierPoints" "calcStringMetric" "childNames"
 [13] "circleGrob" "clipGrob" "convertHeight"
 [16] "convertNative" "convertUnit" "convertWidth"
 [19] "convertX" "convertY" "current.parent"
 [22] "current.rotation" "current.transform" "current.viewport"
 [25] "current.vpPath" "current.vpTree" "curveGrob"
 [28] "dataViewport" "delayGrob" "depth"
 [31] "descentDetails" "deviceDim" "deviceLoc"
 [34] "downViewport" "draw.details" "drawDetails"
 [37] "editDetails" "editGrob" "emptyCoords"
 [40] "engine.display.list" "explode" "forceGrob"
 [43] "frameGrob" "functionGrob" "gEdit"
 [46] "gEditList" "get.gpar" "getGrob"
 [49] "getNames" "gList" "gpar"
 [52] "gPath" "grid.abline" "grid.add"
 [55] "grid.arrows" "grid.bezier" "grid.cap"
 [58] "grid.circle" "grid.clip" "grid.collection"
 [61] "grid.convert" "grid.convertHeight" "grid.convertWidth"
 [64] "grid.convertX" "grid.convertY" "grid.copy"
 [67] "grid.curve" "grid.delay" "grid.display.list"
 [70] "grid.DLapply" "grid.draw" "grid.edit"
 [73] "grid.force" "grid.frame" "grid.function"
 [76] "grid.gedit" "grid.get" "grid.gget"
 [79] "grid.grab" "grid.grabExpr" "grid.gremove"
 [82] "grid.grep" "grid.grill" "grid.grob"
 [85] "grid.layout" "grid.legend" "grid.line.to"
 [88] "grid.lines" "grid.locator" "grid.ls"
 [91] "grid.move.to" "grid.multipanel" "grid.newpage"
 [94] "grid.null" "grid.pack" "grid.panel"
 [97] "grid.path" "grid.place" "grid.plot.and.legend"
[100] "grid.points" "grid.polygon" "grid.polyline"
[103] "grid.pretty" "grid.raster" "grid.record"
[106] "grid.rect" "grid.refresh" "grid.remove"
[109] "grid.reorder" "grid.revert" "grid.roundrect"
[112] "grid.segments" "grid.set" "grid.show.layout"
[115] "grid.show.viewport" "grid.strip" "grid.text"
[118] "grid.xaxis" "grid.xspline" "grid.yaxis"
[121] "grob" "grobAscent" "grobCoords"
[124] "grobDescent" "grobHeight" "grobName"
[127] "grobPathListing" "grobPoints" "grobTree"
[130] "grobWidth" "grobX" "grobY"
[133] "gTree" "heightDetails" "is.grob"
[136] "is.unit" "isEmptyCoords" "layout.heights"
[139] "layout.torture" "layout.widths" "layoutRegion"
[142] "legendGrob" "linesGrob" "lineToGrob"
[145] "makeContent" "makeContext" "moveToGrob"
[148] "nestedListing" "nullGrob" "packGrob"
[151] "pathGrob" "pathListing" "placeGrob"
[154] "plotViewport" "pointsGrob" "polygonGrob"
[157] "polylineGrob" "pop.viewport" "popViewport"
[160] "postDrawDetails" "preDrawDetails" "push.viewport"
[163] "pushViewport" "rasterGrob" "recordGrob"
[166] "rectGrob" "removeGrob" "reorderGrob"
[169] "resolveHJust" "resolveRasterSize" "resolveVJust"
[172] "roundrectGrob" "seekViewport" "segmentsGrob"
[175] "setChildren" "setGrob" "showGrob"
[178] "showViewport" "stringAscent" "stringDescent"
[181] "stringHeight" "stringWidth" "textGrob"
[184] "unit" "unit.c" "unit.length"
[187] "unit.pmax" "unit.pmin" "unit.rep"
[190] "upViewport" "valid.just" "validDetails"
[193] "viewport" "viewport.layout" "viewport.transform"
[196] "vpList" "vpPath" "vpStack"
[199] "vpTree" "widthDetails" "xaxisGrob"
[202] "xDetails" "xsplineGrob" "xsplinePoints"
[205] "yaxisGrob" "yDetails"
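(8) data(package = "pkgname") lists all datasets in a package
> data(package = "base")
Warning message:
In data(package = "base") :
  datasets have been moved from package 'base' to package 'datasets'
(9) detach("package:pkgname") removes a loaded package from the search path
(10) remove.packages("pkgname") deletes an installed package
(11) install.packages("pkgname") installs a package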
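Here is a sketch of a typical package workflow using grid, which ships with every R installation, so nothing needs to be downloaded. The line that installs a missing package is commented out because it needs network access. The name somePkg in that line is a placeholder, not a real package.

```r
print(.libPaths())   # library folder(s) where packages are installed

# Is grid installed in one of those libraries?
grid_installed <- "grid" %in% rownames(installed.packages())

# require() attaches the package and returns TRUE (FALSE with a warning if it is missing)
loaded <- require("grid")
attached <- "package:grid" %in% search()   # TRUE once the package is attached
print(head(ls("package:grid")))            # the first few objects in grid

# Names of the datasets in the datasets package
ds_names <- data(package = "datasets")$results[, "Item"]
print(head(ds_names))

detach("package:grid")                     # remove grid from the search path
still_attached <- "package:grid" %in% search()

# Install a package only if it is missing ("somePkg" is a placeholder name):
# if (!requireNamespace("somePkg", quietly = TRUE)) install.packages("somePkg")
```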
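V. Data Structures
1. Definition: a data structure is the way a computer stores and organizes data
2. Data types in R
Numeric: numbers can be used directly in arithmetic (addition, subtraction, multiplication, division)
Character (string): strings can be concatenated, converted, have substrings extracted, and so on
Logical: TRUE or FALSE
Date, and other types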
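Each type listed above is shown with one value below, and class() reports the type. This is a minimal sketch. The values and the date 2020-01-15 are arbitrary.

```r
n <- 3.5 * 2                           # numeric: arithmetic works directly, n is 7
s <- paste("R", "language")            # character: paste() joins strings -> "R language"
first <- substr(s, 1, 1)               # extract the first character -> "R"
num_from_text <- as.numeric("12") + 1  # convert text to a number -> 13
b <- 5 > 3                             # logical: TRUE
d <- as.Date("2020-01-15")             # Date
later <- d + 30                        # date arithmetic -> "2020-02-14"

print(c(class(n), class(s), class(b), class(d)))  # "numeric" "character" "logical" "Date"
```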
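3. Common data structures: vectors, scalars, lists, arrays, multidimensional arrays
Special, language-specific data structures: hashes in Perl, dictionaries in Python, pointers in C, etc.
4. R objects: an object is anything that can be assigned to a variable, including constants, data structures, functions, and even graphs. Every object has a mode, which describes how it is stored, and a class, which tells generic functions such as print() how to handle it.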
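mode() and class() can be queried on any object. In this sketch (illustrative values only), a factor is stored as numbers, so its mode is "numeric". Its class, "factor", is what makes print() show the labels, and unclass() drops the class to expose the underlying integer codes.

```r
v  <- c(1, 2, 3)
f  <- factor(c("low", "high", "low"))
df <- data.frame(a = 1:2, b = c("x", "y"))

mode(v);   class(v)     # "numeric" "numeric"
mode(f);   class(f)     # "numeric" "factor"
mode(df);  class(df)    # "list" "data.frame"
mode(sum); class(sum)   # "function" "function"

print(f)           # the print method for factors shows the labels
print(unclass(f))  # without its class, f is an integer vector with a "levels" attribute
```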
5. Data structures in R: vectors, scalars, matrices, arrays, lists, data frames, factors, time series, etc.
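The sketch below creates one object of each structure in the list above, with arbitrary contents. A scalar in R is simply a vector of length 1.

```r
v  <- c(10, 20, 30)                    # vector
s  <- 42                               # "scalar": a numeric vector of length 1
m  <- matrix(1:6, nrow = 2)            # matrix: 2 rows x 3 columns
a  <- array(1:24, dim = c(2, 3, 4))    # array: 2 x 3 x 4
l  <- list(name = "Tom", scores = v)   # list: elements may have different types
df <- data.frame(id = 1:3, group = c("a", "b", "a"))  # data frame: columns of equal length
f  <- factor(df$group)                 # factor: categorical values with levels "a" and "b"
q  <- ts(c(5, 7, 9, 11), start = c(2020, 1), frequency = 4)  # quarterly time series

str(l)
print(q)
```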