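1 Introduction to the tibble package

- Package: tibble
- Encoding: UTF-8
- Latest version: 1.2
- Title: Simple Data Frames
- Description: Provides a 'tbl_df' class that offers stricter checking and better printing than the traditional R data frame.
- Authors: Hadley Wickham, Romain Francois, Kirill Müller, RStudio
- URL: https://github.com/hadley/tibble
- Requires: R (>= …)
- GitHub: https://github.com/hadley/tibble

tibble is a lightweight package that reimagines the data.frame. It keeps the parts of data.frame that have proven themselves in practice and borrows core ideas from dplyr, the package dedicated to data manipulation. Compared with a plain data.frame, a tibble behaves better when printing, when subsetting, and in how it handles factors (it never converts strings to factors).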
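Here is a small sketch of the subsetting and factor differences described above. It assumes a recent tibble release, and warning texts vary between versions. The data frame is built with `stringsAsFactors = TRUE` on purpose, because R 4.0.0 changed the `data.frame()` default to `FALSE`. `tb`, `df`, and the other object names are only illustrative.

```r
library(tibble)

# The same single character column, stored two ways
tb <- tibble(abc = c("x", "y", "z"))
df <- data.frame(abc = c("x", "y", "z"), stringsAsFactors = TRUE)

# tibble() never turns strings into factors
class(tb$abc)      # "character"
class(df$abc)      # "factor"

# Selecting one column with [ keeps a tibble; a data.frame drops to a vector
one_tb <- tb[, "abc"]
one_df <- df[, "abc"]
class(one_tb)      # "tbl_df" "tbl" "data.frame"
class(one_df)      # "factor"

# $ on a tibble does not partially match column names: it warns and returns NULL
partial <- suppressWarnings(tb$ab)
is.null(partial)   # TRUE
```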
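Main functions provided by the tibble package:

| Function | Purpose |
|---|---|
| as_tibble | Coerce lists, matrices, and other objects to tibbles (data frames) |
| tibble | Build a data frame (tibble) or list |
| tribble | Create a tibble row by row (row-wise) |
| obj_sum / type_sum / tbl_sum | Concise summary of an object: its type and, for data frames, its size |
| rownames | Tools for working with row names (very useful): move row names into a column, or a column into row names |
| has_name | Check whether a named element exists, e.g. `has_name(iris, "Species")` |
| repair_names | Repair object names (unnamed elements are named V1, V2, …) |
| all_equal | Flexible equality comparison of data frames that ignores row and column order |
| glimpse | Similar to str(): shows the structure of a data set |
| enframe | Convert a vector to a two-column data frame |
| print.tbl_df | `print(x, n)` prints the first n rows of x (10 by default), somewhat like head() |
| add_column | Add columns to a data frame |
| add_row | Add rows to a data frame |
| is.tibble | Test whether an object is a tibble |
| knit_print.trunc_mat | Truncated display (used for knitr output) |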
2 Installation and usage

2.1 Installation

Install from CRAN:

```r
install.packages("tibble")
```

Install from GitHub:

```r
# install.packages("devtools")
devtools::install_github("hadley/tibble")
```
2.2 Creating tibbles

Use as_tibble() to coerce an existing object (a data.frame, list, matrix, or table) into a tibble:

```r
library(tibble)
as_tibble(iris)
#> # A tibble: 150 × 5
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>           <dbl>       <dbl>        <dbl>       <dbl>  <fctr>
#> 1           5.1         3.5          1.4         0.2  setosa
#> 2           4.9         3.0          1.4         0.2  setosa
#> 3           4.7         3.2          1.3         0.2  setosa
#> 4           4.6         3.1          1.5         0.2  setosa
#> 5           5.0         3.6          1.4         0.2  setosa
#> 6           5.4         3.9          1.7         0.4  setosa
#> 7           4.6         3.4          1.4         0.3  setosa
#> 8           5.0         3.4          1.5         0.2  setosa
#> 9           4.4         2.9          1.4         0.2  setosa
#> 10          4.9         3.1          1.5         0.1  setosa
#> # ... with 140 more rows
```
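`as_tibble()` also accepts inputs other than data frames. This minimal sketch uses two made-up inputs. It assumes the list elements all have the same length and the matrix has column names, so no automatic renaming is involved. `lst_in` and `m` are illustrative names.

```r
library(tibble)

# A list of equal-length vectors: each element becomes a column
lst_in <- list(id = 1:3, label = c("a", "b", "c"))
tb_from_list <- as_tibble(lst_in)

# A matrix with column names: each matrix column becomes a tibble column
m <- matrix(1:6, nrow = 3, dimnames = list(NULL, c("u", "v")))
tb_from_matrix <- as_tibble(m)

dim(tb_from_list)     # 3 2
tb_from_matrix$v      # 4 5 6 (matrix() fills column by column)
```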
You can also create one with tibble():

```r
tibble(x = 1:5, y = 1, z = x ^ 2 + y)
#> # A tibble: 5 × 3
#>       x     y     z
#>   <int> <dbl> <dbl>
#> 1     1     1     2
#> 2     2     1     5
#> 3     3     1    10
#> 4     4     1    17
#> 5     5     1    26

a <- 1:5
tibble(a, b = a * 2)
#> # A tibble: 5 × 2
#>       a     b
#>   <int> <dbl>
#> 1     1     2
#> 2     2     4
#> 3     3     6
#> 4     4     8
#> 5     5    10

tibble(a, b = a * 2, c = 1)
#> # A tibble: 5 × 3
#>       a     b     c
#>   <int> <dbl> <dbl>
#> 1     1     2     1
#> 2     2     4     1
#> 3     3     6     1
#> 4     4     8     1
#> 5     5    10     1

tibble(x = runif(10), y = x * 2)  # random numbers: your values will differ
#> # A tibble: 10 × 2
#>            x         y
#>        <dbl>     <dbl>
#> 1  0.7098188 1.4196377
#> 2  0.2790267 0.5580533
#> 3  0.2655437 0.5310874
#> 4  0.1237294 0.2474587
#> 5  0.9018147 1.8036293
#> 6  0.1594413 0.3188827
#> 7  0.2592028 0.5184056
#> 8  0.6570324 1.3140648
#> 9  0.8955551 1.7911102
#> 10 0.1940897 0.3881794

tibble(x = letters)
#> # A tibble: 26 × 1
#>        x
#>    <chr>
#> 1      a
#> 2      b
#> 3      c
#> 4      d
#> 5      e
#> 6      f
#> 7      g
#> 8      h
#> 9      i
#> 10     j
#> # ... with 16 more rows

tibble(x = 1:3, y = list(1:5, 1:10, 1:20))
#> # A tibble: 3 × 2
#>       x          y
#>   <int>     <list>
#> 1     1  <int [5]>
#> 2     2 <int [10]>
#> 3     3 <int [20]>
```
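The examples above follow two rules. Arguments are evaluated in order, so later columns can use earlier ones. Only length-1 inputs are recycled. This small sketch checks both rules and assumes a recent tibble release. The object names are illustrative.

```r
library(tibble)

# Arguments are evaluated in order, so z can use x and y;
# the length-1 value y = 10 is recycled to the 4 rows
tb <- tibble(x = 1:4, y = 10, z = x + y)
tb$z                    # 11 12 13 14

# Any other length mismatch is an error instead of silent recycling
res <- tryCatch(tibble(a = 1:4, b = 1:2), error = function(e) e)
inherits(res, "error")  # TRUE

# data.frame() recycles b = 1:2 silently here, because 4 is a multiple of 2
nrow(data.frame(a = 1:4, b = 1:2))  # 4
```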
You can also define a tibble row by row with tribble():

```r
tribble(
  ~x,  ~y, ~z,
  "a", 2,  3.6,
  "b", 1,  8.5
)
#> # A tibble: 2 × 3
#>       x     y     z
#>   <chr> <dbl> <dbl>
#> 1     a     2   3.6
#> 2     b     1   8.5
```
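`tribble()` mainly makes small hand-written tables easier to read. As a quick check, assuming a recent tibble release, the row-wise and column-wise versions of the same table produce the same columns:

```r
library(tibble)

# Row-wise: ~x and ~y name the columns, then values follow one row at a time
by_row <- tribble(
  ~x,  ~y,
  "a", 2,
  "b", 1
)

# The same table written column-wise with tibble()
by_col <- tibble(x = c("a", "b"), y = c(2, 1))

identical(by_row$x, by_col$x)  # TRUE
identical(by_row$y, by_col$y)  # TRUE
```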
Checking the class shows that, underneath, a tibble is still a data.frame:

```r
class(as_tibble(iris))
#> [1] "tbl_df"     "tbl"        "data.frame"
```
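2.3 Adding rows and columns

2.3.1 Adding rows

```r
add_row(.data, ..., .before = NULL, .after = NULL)
```

- `.data`: the data frame to add rows to
- `...`: name-value pairs giving the new values; variables that are not supplied are filled with NA
- `.before`, `.after`: the row before or after which the new data is inserted (by default, rows are appended at the end)

```r
df <- tibble(x = 1:3, y = 3:1)
df
#> # A tibble: 3 × 2
#>       x     y
#>   <int> <int>
#> 1     1     3
#> 2     2     2
#> 3     3     1
```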
```r
library(dplyr)  # provides the %>% pipe used below
df %>% add_row(x = 4, y = 0, .before = 2)
#> # A tibble: 4 × 2
#>       x     y
#>   <dbl> <dbl>
#> 1     1     3
#> 2     4     0
#> 3     2     2
#> 4     3     1
df %>% add_row(x = 4:5, y = 0:-1)
#> # A tibble: 5 × 2
#>       x     y
#>   <int> <int>
#> 1     1     3
#> 2     2     2
#> 3     3     1
#> 4     4     0
#> 5     5    -1
add_row(df, x = 4)
#> # A tibble: 4 × 2
#>       x     y
#>   <dbl> <int>
#> 1     1     3
#> 2     2     2
#> 3     3     1
#> 4     4    NA
```
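This sketch shows two details of `add_row()` using the same `df`, assuming a recent tibble release. `.after` takes a row position, and any column left out of the call becomes `NA`. Integer literals such as `4L` keep the column types integer.

```r
library(tibble)

df <- tibble(x = 1:3, y = 3:1)

# .after = 3 places the new row after the current last row (same as appending)
df_after <- add_row(df, x = 4L, y = 0L, .after = 3)
df_after$x    # 1 2 3 4

# Columns left out of the call are filled with NA
df_na <- add_row(df, x = 5L)
df_na$y       # 3 2 1 NA
```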
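2.3.2 Adding columns

```r
add_column(.data, ..., .before = NULL, .after = NULL)
```

- `.data`: the data frame to add columns to
- `.before`, `.after`: the column before or after which the new data is inserted

```r
df %>% add_column(z = -1:1, w = 0)
#> # A tibble: 3 × 4
#>       x     y     z     w
#>   <int> <int> <int> <dbl>
#> 1     1     3    -1     0
#> 2     2     2     0     0
#> 3     3     1     1     0
df %>% add_column(z = -1:1, .after = 1)
#> # A tibble: 3 × 3
#>       x     z     y
#>   <int> <int> <int>
#> 1     1    -1     3
#> 2     2     0     2
#> 3     3     1     1
df %>% add_column(w = 0:2, .before = "x")
#> # A tibble: 3 × 3
#>       w     x     y
#>   <int> <int> <int>
#> 1     0     1     3
#> 2     1     2     2
#> 3     2     3     1
```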
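`add_column()` refuses to overwrite an existing column and only recycles values of length 1. This sketch demonstrates both rules and assumes a recent tibble release. The exact error messages differ between versions, so it only checks that an error occurs.

```r
library(tibble)

df <- tibble(x = 1:3, y = 3:1)

# A new column placed before y; its length matches the 3 rows
ok <- add_column(df, z = c(-1L, 0L, 1L), .before = "y")
names(ok)                    # "x" "z" "y"

# Re-using an existing column name is an error
err_dup <- tryCatch(add_column(df, x = 4:6), error = function(e) e)

# A length other than 1 or nrow(df) is an error too
err_len <- tryCatch(add_column(df, z = 1:5), error = function(e) e)

inherits(err_dup, "error")   # TRUE
inherits(err_len, "error")   # TRUE
```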
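2.4 Working with names

2.4.1 rownames: tools for working with row names

- `df`: a data frame
- `var`: the name of the column to use for row names
- `has_rownames(df)`: test whether a data frame has row names
- `remove_rownames(df)`: remove the row names of a data frame

```r
library(tibble)
head(mtcars)
#>                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
#> Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
#> Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
#> Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
#> Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
#> Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
head(iris)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1          5.1         3.5          1.4         0.2  setosa
#> 2          4.9         3.0          1.4         0.2  setosa
#> 3          4.7         3.2          1.3         0.2  setosa
#> 4          4.6         3.1          1.5         0.2  setosa
#> 5          5.0         3.6          1.4         0.2  setosa
#> 6          5.4         3.9          1.7         0.4  setosa
has_rownames(mtcars)
#> [1] TRUE
has_rownames(iris)
#> [1] FALSE
has_rownames(remove_rownames(mtcars))
#> [1] FALSE
head(remove_rownames(mtcars))
#>    mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> 1 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
#> 2 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
#> 3 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
#> 4 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
#> 5 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
#> 6 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
```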
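- `rownames_to_column(df, var = "rowname")`: turn the row names into a column, named "rowname" by default
- `column_to_rownames(df, var = "rowname")`: turn a column of the data frame into its row names

```r
head(rownames_to_column(mtcars, "row2col"))
#>             row2col  mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> 1         Mazda RX4 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
#> 2     Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
#> 3        Datsun 710 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
#> 4    Hornet 4 Drive 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
#> 5 Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
#> 6           Valiant 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

mtcars_tbl <- as_tibble(rownames_to_column(mtcars))
mtcars_tbl
#> # A tibble: 32 × 12
#>              rowname   mpg   cyl  disp    hp  drat    wt  qsec    vs    am
#>                <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1          Mazda RX4  21.0     6 160.0   110  3.90 2.620 16.46     0     1
#> 2      Mazda RX4 Wag  21.0     6 160.0   110  3.90 2.875 17.02     0     1
#> 3         Datsun 710  22.8     4 108.0    93  3.85 2.320 18.61     1     1
#> 4     Hornet 4 Drive  21.4     6 258.0   110  3.08 3.215 19.44     1     0
#> 5  Hornet Sportabout  18.7     8 360.0   175  3.15 3.440 17.02     0     0
#> 6            Valiant  18.1     6 225.0   105  2.76 3.460 20.22     1     0
#> 7         Duster 360  14.3     8 360.0   245  3.21 3.570 15.84     0     0
#> 8          Merc 240D  24.4     4 146.7    62  3.69 3.190 20.00     1     0
#> 9           Merc 230  22.8     4 140.8    95  3.92 3.150 22.90     1     0
#> 10          Merc 280  19.2     6 167.6   123  3.92 3.440 18.30     1     0
#> # ... with 22 more rows, and 2 more variables: gear <dbl>, carb <dbl>

head(column_to_rownames(as.data.frame(mtcars_tbl)))
#>                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
#> Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
#> Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
#> Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
#> Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
#> Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

df <- rownames_to_column(mtcars, "row2col")
column_to_rownames(df, "row2col")
```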
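`rownames_to_column()` and `column_to_rownames()` are inverses, so a round trip should return the original row names. This sketch runs the round trip on `mtcars`. It uses an illustrative column name, `model`, instead of the default `rowname`, and assumes a recent tibble release.

```r
library(tibble)

# Move the row names into an ordinary column called "model"
cars_df <- rownames_to_column(mtcars, var = "model")
names(cars_df)[1]          # "model"

# ...and move that column back into the row names
cars_back <- column_to_rownames(cars_df, var = "model")
has_rownames(cars_back)    # TRUE
identical(rownames(cars_back), rownames(mtcars))  # TRUE
```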
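2.4.2 has_name: check whether a data frame or other object contains an element with a given name; returns a logical value (TRUE or FALSE)

```r
has_name(x, name)
```

- `x`: a data frame or another named object
- `name`: the element name to look for

```r
has_name(iris, "Species")
#> [1] TRUE
has_name(mtcars, "gears")  # the column is called "gear"
#> [1] FALSE
```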
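`has_name()` does an exact, case-sensitive lookup in `names(x)`, so it also works on named lists. A few checks, assuming a recent tibble release:

```r
library(tibble)

has_name(mtcars, "gear")           # TRUE
has_name(mtcars, "gears")          # FALSE: no partial or fuzzy matching
has_name(iris, "species")          # FALSE: the column is "Species"
has_name(list(a = 1, b = 2), "b")  # TRUE: any named object works
```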
2.4.3 repair_names: repair object names (unnamed elements are named V1, V2, …)

```r
repair_names(x, prefix = "V", sep = "")
```

- `x`: a named vector
- `prefix`: a string, the prefix used for new names
- `sep`: a string, the separator placed before the number

```r
list(3, 4, 5)
#> [[1]]
#> [1] 3
#>
#> [[2]]
#> [1] 4
#>
#> [[3]]
#> [1] 5
repair_names(list(3, 4, 5))  # works for lists, too
#> $V1
#> [1] 3
#>
#> $V2
#> [1] 4
#>
#> $V3
#> [1] 5
tbl <- as_tibble(structure(list(3, 4, 5), class = "data.frame"), validate = FALSE)
tbl
#> # A tibble: 0 × 3
#> # ... with 3 variables: <dbl>, <dbl>, <dbl>
repair_names(tbl)
#> # A tibble: 0 × 3
#> # ... with 3 variables: V1 <dbl>, V2 <dbl>, V3 <dbl>
repair_names(list(3, 4, 5), prefix = "new", sep = "-")
#> $`new-1`
#> [1] 3
#>
#> $`new-2`
#> [1] 4
#>
#> $`new-3`
#> [1] 5
```
2.5 Other functions
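2.5.1 obj_sum / type_sum / tbl_sum: concise summaries of an object (its type and, for data frames, its size)

```r
obj_sum(x)       # type of x, plus its size when is_vector_s3(x) is TRUE (x is an S3 vector)
type_sum(x)      # brief summary of the object's type
tbl_sum(x)       # brief text description of a table-like object: dimensions,
                 # data source, and possible grouping (for dplyr)
is_vector_s3(x)  # is x an S3 vector?
```

```r
obj_sum(1:10)
#> [1] "int [10]"
obj_sum(matrix(1:10))
#> [1] "int [10 × 1]"
obj_sum(Sys.Date())
#> [1] "date [1]"
obj_sum(Sys.time())
#> [1] "dttm [1]"
obj_sum(mean)
#> [1] "fun"
```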
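2.5.2 all_equal: flexible comparison of data frames that ignores row and column order

When all.equal() is used to compare two tbl_df objects, row and column order are ignored by default, and types are not coerced.

```r
all_equal(target, current, ignore_col_order = TRUE, ignore_row_order = TRUE,
          convert = FALSE, ...)
# S3 method for class 'tbl_df'
all.equal(target, current, ignore_col_order = TRUE, ignore_row_order = TRUE,
          convert = FALSE, ...)
```

Arguments:

- `target`, `current`: the two data frames to compare
- `ignore_col_order`: whether to ignore column order; default TRUE
- `ignore_row_order`: whether to ignore row order; default TRUE
- `convert`: whether to convert similar types before comparing; default FALSE. If TRUE, factors are converted to character and integers to double
- `...`: other arguments (ignored; present for compatibility with all.equal())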
```r
# Shuffle the rows and the columns of a data frame by random sampling
scramble <- function(x) x[sample(nrow(x)), sample(ncol(x))]
# Convert to a tbl_df
mtcars_df <- as_tibble(mtcars)
# By default, row and column order are ignored
all.equal(mtcars_df, scramble(mtcars_df))
#> [1] TRUE
# The defaults can be overridden so that order is not ignored
all.equal(mtcars_df, scramble(mtcars_df), ignore_col_order = FALSE)
#> [1] TRUE
# (the differences below depend on the random shuffle)
all.equal(mtcars_df, scramble(mtcars_df), ignore_row_order = FALSE)
#>  [1] "Component “mpg”: Mean relative difference: 0.3503521"
#>  [2] "Component “cyl”: Mean relative difference: 0.4912281"
#>  [3] "Component “disp”: Mean relative difference: 0.5690846"
#>  [4] "Component “hp”: Mean relative difference: 0.5386953"
#>  [5] "Component “drat”: Mean relative difference: 0.1387415"
#>  [6] "Component “wt”: Mean relative difference: 0.3286861"
#>  [7] "Component “qsec”: Mean relative difference: 0.1222072"
#>  [8] "Component “vs”: Mean relative difference: 2"
#>  [9] "Component “am”: Mean relative difference: 2"
#> [10] "Component “gear”: Mean relative difference: 0.32"
#> [11] "Component “carb”: Mean relative difference: 0.8"
# By default, all.equal() is sensitive to differences in variable types
df1 <- tibble(x = "a")
df2 <- tibble(x = factor("a"))
all.equal(df1, df2)
#> [1] "Incompatible type for column x: x character, y factor"
# convert = TRUE asks for similar types to be converted before comparing
all.equal(df1, df2, convert = TRUE)
#> [1] "Factor levels not equal for column x"
#> Warning message:
#> Incompatible type for column x: x character, y factor
```
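This sketch makes "ignore row and column order" concrete with a hypothetical helper, `loose_equal()`, which is not part of tibble. The helper converts both data frames to a canonical form: columns sorted by name, rows sorted by all columns, and row names dropped. It then compares them with base `all.equal()`. It only illustrates the idea and does not reproduce the `convert` handling or the messages of `all_equal()`.

```r
library(tibble)

# Hypothetical helper, not part of tibble: TRUE when two data frames hold the
# same columns and rows regardless of their order
loose_equal <- function(target, current) {
  canonical <- function(x) {
    x <- as.data.frame(x)
    x <- x[sort(names(x))]                                      # columns in name order
    x <- x[do.call(order, unname(as.list(x))), , drop = FALSE]  # rows sorted by every column
    rownames(x) <- NULL                                         # drop row names
    x
  }
  isTRUE(all.equal(canonical(target), canonical(current)))
}

set.seed(1)
shuffled <- mtcars[sample(nrow(mtcars)), sample(ncol(mtcars))]
loose_equal(mtcars, shuffled)                        # TRUE
loose_equal(tibble(x = 1:2), tibble(x = c(1L, 3L)))  # FALSE
```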
2.5.3 glimpse: like str(), shows the structure of a data set

```r
glimpse(x, width = NULL, ...)
```

- `x`: the object to glimpse at
- `width`: width of the output; defaults to the `tibble.width` option (if finite) or the width of the console

```r
glimpse(mtcars)
#> Observations: 32
#> Variables: 11
#> $ mpg  <dbl> 21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2, 17.8,...
#> $ cyl  <dbl> 6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8, 8, 4, 4, 4, 4, 8,...
#> $ disp <dbl> 160.0, 160.0, 108.0, 258.0, 360.0, 225.0, 360.0, 146.7, 140.8,...
#> $ hp   <dbl> 110, 110, 93, 110, 175, 105, 245, 62, 95, 123, 123, 180, 180, 180,...
#> $ drat <dbl> 3.90, 3.90, 3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.92, 3.92, 3.92,...
#> $ wt   <dbl> 2.620, 2.875, 2.320, 3.215, 3.440, 3.460, 3.570, 3.190, 3.150,...
#> $ qsec <dbl> 16.46, 17.02, 18.61, 19.44, 17.02, 20.22, 15.84, 20.00, 22.90,...
#> $ vs   <dbl> 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0,...
#> $ am   <dbl> 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0,...
#> $ gear <dbl> 4, 4, 4, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 4, 4, 4, 3, 3,...
#> $ carb <dbl> 4, 4, 1, 1, 2, 1, 4, 2, 2, 4, 4, 3, 3, 3, 4, 4, 4, 1, 2, 1, 1, 2,...

# install.packages("nycflights13")
if (!requireNamespace("nycflights13", quietly = TRUE))
  stop("Please install the nycflights13 package to run the rest of this example")
glimpse(nycflights13::flights)
#> Observations: 336,776
#> Variables: 19
#> $ year           <int> 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013,...
#> $ month          <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
#> $ day            <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
#> $ dep_time       <int> 517, 533, 542, 544, 554, 554, 555, 557, 557, 558,...
#> $ sched_dep_time <int> 515, 529, 540, 545, 600, 558, 600, 600, 600, 600,...
#> $ dep_delay      <dbl> 2, 4, 2, -1, -6, -4, -5, -3, -3, -2,...
#> $ arr_time       <int> 830, 850, 923, 1004, 812, 740, 913, 709, 838, 753,...
#> $ sched_arr_time <int> 819, 830, 850, 1022, 837, 728, 854, 723, 846, 745,...
#> $ arr_delay      <dbl> 11, 20, 33, -18, -25, 12, 19, -14, -8, 8,...
#> $ carrier        <chr> "UA", "UA", "AA", "B6", "DL", "UA", "B6", "EV", "B6",...
#> $ flight         <int> 1545, 1714, 1141, 725, 461, 1696, 507, 5708, 79, 301,...
#> $ tailnum        <chr> "N14228", "N24211", "N619AA", "N804JB", "N668DN",...
#> $ origin         <chr> "EWR", "LGA", "JFK", "JFK", "LGA", "EWR", "EWR", "LGA",...
#> $ dest           <chr> "IAH", "IAH", "MIA", "BQN", "ATL", "ORD", "FLL", "IAD",...
#> $ air_time       <dbl> 227, 227, 160, 183, 116, 150, 158, 53, 140, 138,...
#> $ distance       <dbl> 1400, 1416, 1089, 1576, 762, 719, 1065, 229, 944, 733,...
#> $ hour           <dbl> 5, 5, 5, 5, 6, 5, 6, 6, 6, 6,...
#> $ minute         <dbl> 15, 29, 40, 45, 0, 58, 0, 0, 0, 0,...
#> $ time_hour      <dttm> 2013-01-01 05:00:00, 2013-01-01 05:00:00,...
```
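2.5.4 enframe: convert a vector to a data frame

Converts an atomic vector or a list into a two-column data frame. If the vector has no names, the natural sequence 1, 2, 3, ... is used for the name column.

```r
enframe(x, name = "name", value = "value")
```

- `x`: an atomic vector
- `name`, `value`: names of the two columns; "name" and "value" by default

```r
enframe(1:3)
#> # A tibble: 3 × 2
#>    name value
#>   <int> <int>
#> 1     1     1
#> 2     2     2
#> 3     3     3
enframe(c(a = 5, b = 7))
#> # A tibble: 2 × 2
#>    name value
#>   <chr> <dbl>
#> 1     a     5
#> 2     b     7
```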
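This short sketch shows the `name` and `value` arguments and assumes a recent tibble release. `scores` is a made-up named vector.

```r
library(tibble)

scores <- c(alice = 90, bob = 75)

# Named vector: the names fill the first column, the values the second
by_name <- enframe(scores, name = "student", value = "score")
by_name$student   # "alice" "bob"
by_name$score     # 90 75

# Unnamed vector: the first column holds the positions 1, 2, 3
by_pos <- enframe(c(10, 20, 30))
by_pos$name       # 1 2 3
```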
2.5.5 print.tbl_df
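print(x, n) prints the first n rows of x (10 by default), somewhat like head().

Tools for describing matrices:

```r
# S3 method for class 'tbl_df'
print(x, ..., n = NULL, width = NULL, n_extra = NULL)
trunc_mat(x, n = NULL, width = NULL, n_extra = NULL)
```

- `x`: the object to show
- `n`: number of rows to show. If `NULL` (the default), all rows are printed when there are fewer than `tibble.print_max`; otherwise `tibble.print_min` rows are printed
- `width`: width of the generated text. The default `NULL` means `getOption("tibble.width")` is used, or `getOption("width")` if that is also unset; the latter shows only the columns that fit on one screen. Set `options(tibble.width = Inf)` to always print all columns
- `n_extra`: number of extra columns to summarize briefly when the tibble is too wide to print in full. The default `NULL` shows information on at most `tibble.max_extra_cols` extra columns

```r
trunc_mat(mtcars)
#> # data.frame [32 × 11]
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#> *  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1   21.0     6 160.0   110  3.90 2.620 16.46     0     1     4     4
#> 2   21.0     6 160.0   110  3.90 2.875 17.02     0     1     4     4
#> 3   22.8     4 108.0    93  3.85 2.320 18.61     1     1     4     1
#> 4   21.4     6 258.0   110  3.08 3.215 19.44     1     0     3     1
#> 5   18.7     8 360.0   175  3.15 3.440 17.02     0     0     3     2
#> 6   18.1     6 225.0   105  2.76 3.460 20.22     1     0     3     1
#> 7   14.3     8 360.0   245  3.21 3.570 15.84     0     0     3     4
#> 8   24.4     4 146.7    62  3.69 3.190 20.00     1     0     4     2
#> 9   22.8     4 140.8    95  3.92 3.150 22.90     1     0     4     2
#> 10  19.2     6 167.6   123  3.92 3.440 18.30     1     0     4     4
#> # ... with 22 more rows
print(as_tibble(mtcars))
#> # A tibble: 32 × 11
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#> *  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1   21.0     6 160.0   110  3.90 2.620 16.46     0     1     4     4
#> 2   21.0     6 160.0   110  3.90 2.875 17.02     0     1     4     4
#> 3   22.8     4 108.0    93  3.85 2.320 18.61     1     1     4     1
#> 4   21.4     6 258.0   110  3.08 3.215 19.44     1     0     3     1
#> 5   18.7     8 360.0   175  3.15 3.440 17.02     0     0     3     2
#> 6   18.1     6 225.0   105  2.76 3.460 20.22     1     0     3     1
#> 7   14.3     8 360.0   245  3.21 3.570 15.84     0     0     3     4
#> 8   24.4     4 146.7    62  3.69 3.190 20.00     1     0     4     2
#> 9   22.8     4 140.8    95  3.92 3.150 22.90     1     0     4     2
#> 10  19.2     6 167.6   123  3.92 3.440 18.30     1     0     4     4
#> # ... with 22 more rows
print(as_tibble(mtcars), n = 1)
#> # A tibble: 32 × 11
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#> *  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     21     6   160   110   3.9  2.62 16.46     0     1     4     4
#> # ... with 31 more rows
print(as_tibble(mtcars), n = 3)
#> # A tibble: 32 × 11
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#> *  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1   21.0     6   160   110  3.90 2.620 16.46     0     1     4     4
#> 2   21.0     6   160   110  3.90 2.875 17.02     0     1     4     4
#> 3   22.8     4   108    93  3.85 2.320 18.61     1     1     4     1
#> # ... with 29 more rows
print(as_tibble(mtcars), n = 100)  # n exceeds the 32 rows, so everything is printed

if (!requireNamespace("nycflights13", quietly = TRUE))
  stop("Please install the nycflights13 package to run the rest of this example")
print(nycflights13::flights, n_extra = 2)
print(nycflights13::flights, width = Inf)
```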
2.5.6 is.tibble: test whether an object is a tibble

```r
is.tibble(x)
is_tibble(x)
```

The two spellings perform the same test.
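This quick sketch runs the test on a plain data frame, a tibble, and a list. It assumes a recent tibble release, where `is_tibble()` is the preferred spelling.

```r
library(tibble)

is_tibble(iris)             # FALSE: a plain data.frame
is_tibble(as_tibble(iris))  # TRUE
is_tibble(list(a = 1))      # FALSE: a list is not a tibble
```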
Reference: http://www.rdocumentation.org/packages/tibble/versions/1.2