read.table从文件读取数据,输出数据框。
格式要求
第一行为数据框的name,第二行第一列为行号,其他列为数据。
如果第二行的行号省略,需要使用参数header=TRUE。
scan()从文件读取数据,输出指定类型。
或者可以为dummy列表指定name。
dummy类型为数值类型,输出到一个向量,同时使用matrix构造一个矩阵。
R还内置了很多测试数据集合,在package:datasets包里面。
使用以下方法可以查看这个包中的对象:
解绑之后,就不能直接使用这里面的对象了。
使用data()函数也可以查看包中的测试数据对象。
查看所有已安装的包的测试数据:
从其他包导入测试数据:
[参考]
Input file form with names and row labels:
Price Floor Area Rooms Age Cent.heat
01 52.00 111.0 830 5 6.2 no
02 54.75 128.0 710 5 7.5 no
03 57.50 101.0 1000 5 4.2 no
04 57.50 131.0 690 6 8.8 no
05 59.75 93.0 900 5 1.9 yes
...
> HousePrice <- read.table("houses.data")
如果第二行的行号省略,需要使用参数header=TRUE。
Input file form without row labels:
Price Floor Area Rooms Age Cent.heat
52.00 111.0 830 5 6.2 no
54.75 128.0 710 5 7.5 no
57.50 101.0 1000 5 4.2 no
57.50 131.0 690 6 8.8 no
59.75 93.0 900 5 1.9 yes
...
HousePrice <- read.table("houses.data", header=TRUE)
scan()从文件读取数据,输出指定类型。
第二个参数为一个dummy类型,例如输出到列表。
inp <- scan("input.dat", list("",0,0))
label <- inp[[1]]; x <- inp[[2]]; y <- inp[[3]]
或者可以为dummy列表指定name。
inp <- scan("input.dat", list(id="", x=0, y=0))
label <- inp$id; x <- inp$x; y <- inp$y
dummy类型为数值类型,输出到一个向量,同时使用matrix构造一个矩阵。
X <- matrix(scan("light.dat", 0), ncol=5, byrow=TRUE)
R还内置了很多测试数据集合,在package:datasets包里面。
可以使用attach将这个包加载到搜索路径中以供使用。
默认可能已经绑定了,
> search()
[1] ".GlobalEnv" "package:stats" "package:graphics"
[4] "package:grDevices" "package:utils" "package:datasets"
[7] "package:methods" "Autoloads" "package:base"
使用以下方法可以查看这个包中的对象:
> ls(6)
> ls('package:datasets')
> objects(6)
> objects('package:datasets')
[1] "ability.cov" "airmiles" "AirPassengers"
[4] "airquality" "anscombe" "attenu"
[7] "attitude" "austres" "beaver1"
[10] "beaver2" "BJsales" "BJsales.lead"
[13] "BOD" "cars" "ChickWeight"
[16] "chickwts" "co2" "CO2"
[19] "crimtab" "discoveries" "DNase"
[22] "esoph" "euro" "euro.cross"
[25] "eurodist" "EuStockMarkets" "faithful"
[28] "fdeaths" "Formaldehyde" "freeny"
[31] "freeny.x" "freeny.y" "HairEyeColor"
[34] "Harman23.cor" "Harman74.cor" "Indometh"
[37] "infert" "InsectSprays" "iris"
[40] "iris3" "islands" "JohnsonJohnson"
[43] "LakeHuron" "ldeaths" "lh"
[46] "LifeCycleSavings" "Loblolly" "longley"
[49] "lynx" "mdeaths" "morley"
[52] "mtcars" "nhtemp" "Nile"
[55] "nottem" "npk" "occupationalStatus"
[58] "Orange" "OrchardSprays" "PlantGrowth"
[61] "precip" "presidents" "pressure"
[64] "Puromycin" "quakes" "randu"
[67] "rivers" "rock" "Seatbelts"
[70] "sleep" "stack.loss" "stack.x"
[73] "stackloss" "state.abb" "state.area"
[76] "state.center" "state.division" "state.name"
[79] "state.region" "state.x77" "sunspot.month"
[82] "sunspot.year" "sunspots" "swiss"
[85] "Theoph" "Titanic" "ToothGrowth"
[88] "treering" "trees" "UCBAdmissions"
[91] "UKDriverDeaths" "UKgas" "USAccDeaths"
[94] "USArrests" "USJudgeRatings" "USPersonalExpenditure"
[97] "uspop" "VADeaths" "volcano"
[100] "warpbreaks" "women" "WorldPhones"
[103] "WWWusage"
使用测试数据:
例如uspop是一个时间序列
> uspop
Time Series:
Start = 1790
End = 1970
Frequency = 0.1
[1] 3.93 5.31 7.24 9.64 12.90 17.10 23.20 31.40 39.80 50.20
[11] 62.90 76.00 92.00 105.70 122.80 131.70 151.30 179.30 203.20
解绑之后,就不能直接使用这里面的对象了。
> detach('package:datasets')
使用data()函数也可以查看包中的测试数据对象。
> data()
Data sets in package ‘datasets’:
AirPassengers Monthly Airline Passenger Numbers 1949-1960
BJsales Sales Data with Leading Indicator
BJsales.lead (BJsales)
Sales Data with Leading Indicator
BOD Biochemical Oxygen Demand
......
查看所有已安装的包的测试数据:
> data(package = .packages(all.available = TRUE))
Data sets in package ‘boot’:
acme Monthly Excess Returns
aids Delay in AIDS Reporting in England and Wales
aircondit Failures of Air-conditioning Equipment
aircondit7 Failures of Air-conditioning Equipment
amis Car Speeding and Warning Signs
aml Remission Times for Acute Myelogenous Leukaemia
......
从其他包导入测试数据:
data(package="rpart")
data(Puromycin, package="datasets")
或者使用library加载包后,测试数据会自动加载到搜索路径。