R 文本数据输入read.table(),scan(), 如何使用自带测试数据集合package:datasets

read.table从文件读取数据,输出数据框。
格式要求
第一行为数据框的name,第二行第一列为行号,其他列为数据。
Input file form with names and row labels:

     Price    Floor     Area   Rooms     Age  Cent.heat
01   52.00    111.0      830     5       6.2      no
02   54.75    128.0      710     5       7.5      no
03   57.50    101.0     1000     5       4.2      no
04   57.50    131.0      690     6       8.8      no
05   59.75     93.0      900     5       1.9     yes
...

> HousePrice <- read.table("houses.data")


如果第二行的行号省略,需要使用参数header=TRUE。
Input file form without row labels:

Price    Floor     Area   Rooms     Age  Cent.heat
52.00    111.0      830     5       6.2      no
54.75    128.0      710     5       7.5      no
57.50    101.0     1000     5       4.2      no
57.50    131.0      690     6       8.8      no
59.75     93.0      900     5       1.9     yes
...

HousePrice <- read.table("houses.data", header=TRUE)


scan()从文件读取数据,输出指定类型。
第二个参数为一个dummy类型,例如输出到列表。

inp <- scan("input.dat", list("",0,0))
label <- inp[[1]]; x <- inp[[2]]; y <- inp[[3]]

或者可以为dummy列表指定name。
inp <- scan("input.dat", list(id="", x=0, y=0))
label <- inp$id; x <- inp$x; y <- inp$y

dummy类型为数值类型,输出到一个向量,同时使用matrix构造一个矩阵。
X <- matrix(scan("light.dat", 0), ncol=5, byrow=TRUE)


R还内置了很多测试数据集合,在package:datasets包里面。
可以使用attach将这个包加载到搜索路径中以供使用。
默认可能已经绑定了,

> search()
[1] ".GlobalEnv"        "package:stats"     "package:graphics" 
[4] "package:grDevices" "package:utils"     "package:datasets" 
[7] "package:methods"   "Autoloads"         "package:base"     

使用以下方法可以查看这个包中的对象:
> ls(6)
> ls('package:datasets')
> objects(6)
> objects('package:datasets')
  [1] "ability.cov"           "airmiles"              "AirPassengers"        
  [4] "airquality"            "anscombe"              "attenu"               
  [7] "attitude"              "austres"               "beaver1"              
 [10] "beaver2"               "BJsales"               "BJsales.lead"         
 [13] "BOD"                   "cars"                  "ChickWeight"          
 [16] "chickwts"              "co2"                   "CO2"                  
 [19] "crimtab"               "discoveries"           "DNase"                
 [22] "esoph"                 "euro"                  "euro.cross"           
 [25] "eurodist"              "EuStockMarkets"        "faithful"             
 [28] "fdeaths"               "Formaldehyde"          "freeny"               
 [31] "freeny.x"              "freeny.y"              "HairEyeColor"         
 [34] "Harman23.cor"          "Harman74.cor"          "Indometh"             
 [37] "infert"                "InsectSprays"          "iris"                 
 [40] "iris3"                 "islands"               "JohnsonJohnson"       
 [43] "LakeHuron"             "ldeaths"               "lh"                   
 [46] "LifeCycleSavings"      "Loblolly"              "longley"              
 [49] "lynx"                  "mdeaths"               "morley"               
 [52] "mtcars"                "nhtemp"                "Nile"                 
 [55] "nottem"                "npk"                   "occupationalStatus"   
 [58] "Orange"                "OrchardSprays"         "PlantGrowth"          
 [61] "precip"                "presidents"            "pressure"             
 [64] "Puromycin"             "quakes"                "randu"                
 [67] "rivers"                "rock"                  "Seatbelts"            
 [70] "sleep"                 "stack.loss"            "stack.x"              
 [73] "stackloss"             "state.abb"             "state.area"           
 [76] "state.center"          "state.division"        "state.name"           
 [79] "state.region"          "state.x77"             "sunspot.month"        
 [82] "sunspot.year"          "sunspots"              "swiss"                
 [85] "Theoph"                "Titanic"               "ToothGrowth"          
 [88] "treering"              "trees"                 "UCBAdmissions"        
 [91] "UKDriverDeaths"        "UKgas"                 "USAccDeaths"          
 [94] "USArrests"             "USJudgeRatings"        "USPersonalExpenditure"
 [97] "uspop"                 "VADeaths"              "volcano"              
[100] "warpbreaks"            "women"                 "WorldPhones"          
[103] "WWWusage"   

使用测试数据:
例如uspop是一个时间序列

> uspop
Time Series:
Start = 1790 
End = 1970 
Frequency = 0.1 
 [1]   3.93   5.31   7.24   9.64  12.90  17.10  23.20  31.40  39.80  50.20
[11]  62.90  76.00  92.00 105.70 122.80 131.70 151.30 179.30 203.20


解绑之后,就不能直接使用这里面的对象了。
> detach('package:datasets')

使用data()函数也可以查看包中的测试数据对象。
> data()
Data sets in package ‘datasets’:

AirPassengers           Monthly Airline Passenger Numbers 1949-1960
BJsales                 Sales Data with Leading Indicator
BJsales.lead (BJsales)
                        Sales Data with Leading Indicator
BOD                     Biochemical Oxygen Demand
......


查看所有已安装的包的测试数据:
> data(package = .packages(all.available = TRUE))
Data sets in package ‘boot’:

acme                    Monthly Excess Returns
aids                    Delay in AIDS Reporting in England and Wales
aircondit               Failures of Air-conditioning Equipment
aircondit7              Failures of Air-conditioning Equipment
amis                    Car Speeding and Warning Signs
aml                     Remission Times for Acute Myelogenous Leukaemia
......


从其他包导入测试数据:
data(package="rpart")
data(Puromycin, package="datasets")

或者使用library加载包后,测试数据会自动加载到搜索路径。

[参考]
上一篇:进制转换 | 手把手教你入门Python之十七


下一篇:PKI架构的简介,如何使用OPENSSL完成加密与解密,如何自建CA完成证书的签署