shell编程系列21--文本处理三剑客之awk中数组的用法及模拟生产环境数据统计 shell中的数组的用法:
shell数组中的下标是从0开始的 array=("Allen" "Mike" "Messi" "Jerry" "Hanmeimei" "Wang")
打印元素: echo ${array[]}
打印元素个数: echo ${#array[@]}
打印某个元素长度: echo ${#array[]} 给元素赋值: array[]=ui;
删除元素: unset array[];unset array # 删除数组
分片访问: echo ${array[@]::}
元素内容替换: ${array[@]/e/E} 只替换第一个e;${array[@]//e/E} 替换所有的e 数组的遍历:
for a in ${array[@]}
do
echo $a
done awk中数组的用法:
在awk中,使用数组时,不仅可以使用1...n作为数组小标,也可以使用字符串作为数组下标 典型常用例子:
、统计主机上所有的tcp链接状态数,按照每个tcp状态分类
# netstat -an | grep tcp | awk '{arr[$6]++}END{for (i in arr) print i,arr[i]}'
LAST_ACK
LISTEN
SYN_RECV
ESTABLISHED
FIN_WAIT1
FIN_WAIT2
CLOSING
TIME_WAIT 、计算横向数据综合,计算纵向数据总和 Allen
Mike
Zhang
Jerry
Han
Li # 代码如下:
[root@localhost shell]# awk -f statics.awk student.txt
Name Yuwen Math English Physical total
Allen
Mike
Zhang
Jerry
Han
Li
every_total
[root@localhost shell]# cat statics.awk
BEGIN{
printf "%-30s%-30s%-30s%-30s%-30s%-30s\n","Name","Yuwen","Math","English","Physical","total"
}
{
total=$+$+$+$
yuwen_sum+=$
math_sum+=$
english_sum+=$
physical_sum+=$
printf "%-30s%-30d%-30d%-30d%-30d%-30d\n",$,$,$,$,$,total
}
END{
printf "%-30s%-30d%-30d%-30d%-30d\n","every_total",yuwen_sum,math_sum,english_sum,physical_sum
} 计算字符串的长度:
[root@localhost shell]# str="test string"
[root@localhost shell]# echo $str
test string
[root@localhost shell]# echo ${#str} # 修改数组元素
array=("Allen" "Mike" "Messi" "Jerry" "Hanmeimei" "Wang")
[root@localhost shell]# array[]="Jerry"
[root@localhost shell]# echo ${array[@]}
Allen Jerry Messi Jerry Hanmeimei Wang # 删除第3个元素
[root@localhost shell]# echo ${array[@]}
Allen Jerry Messi Jerry Hanmeimei Wang
[root@localhost shell]#
[root@localhost shell]# unset array[];
[root@localhost shell]# echo ${array[@]}
Allen Jerry Jerry Hanmeimei Wang # 在数组中删除下标为1的元素,即Mike被删除,再次删除下标为1的元素,发现数组不变,说明数组虽然删除了元素,下标还是不变保存在内存中
[root@localhost shell]# array=("Allen" "Mike" "Messi" "Jerry" "Hanmeimei" "Wang")
[root@localhost shell]# unset array[]
[root@localhost shell]# echo ${array[*]}
Allen Messi Jerry Hanmeimei Wang
[root@localhost shell]# unset array[]
[root@localhost shell]# echo ${array[*]}
Allen Messi Jerry Hanmeimei Wang # 分片访问,数组为1的开始遍历3个元素
[root@localhost shell]# array=("Allen" "Mike" "Messi" "Jerry" "Hanmeimei" "Wang")
[root@localhost shell]# echo ${array[@]::}
Mike Messi Jerry
# 1到最后
[root@localhost shell]# echo ${array[@]:}
Mike Messi Jerry Hanmeimei Wang #替换1个,替换所有
[root@localhost shell]# echo ${array[@]}
Allen Mike Messi Jerry Hanmeimei Wang
[root@localhost shell]# echo ${array[@]/e/E}
AllEn MikE MEssi JErry HanmEimei Wang
[root@localhost shell]# echo ${array[@]//e/E}
AllEn MikE MEssi JErry HanmEimEi Wang # 遍历数组
[root@localhost shell]# for a in ${array[@]};do echo $a;done
Allen
Mike
Messi
Jerry
Hanmeimei
Wang 计算横向和、纵向和 Allen
Mike
Zhang
Jerry
Han
Li [root@localhost shell]# awk -f stu.awk student.txt
Name Yuwen Math English Physical total
Allen
Mike
Zhang
Jerry
Han
Li
every_total
[root@localhost shell]# cat stu.awk
BEGIN{
printf "%-20s%-20s%-20s%-20s%-20s%-20s\n","Name","Yuwen","Math","English","Physical","total"
} { total=$+$+$+$
yunwen_sum+=$
math_sum+=$
english_sum+=$
physical_sum+=$
printf "%-20s%-20d%-20d%-20d%-20d%-20d\n",$,$,$,$,$,total
}
END{
printf "%-20s%-20d%-20d%-20d%-20d\n","every_total",yunwen_sum,math_sum,english_sum,physical_sum
} # 模拟生产环境数据脚本
[root@localhost shell]# cat insert.sh
#!/bin/bash
# function create_random()
{
min=$
max=$(($-$min+))
num=$(date +%s%N)
echo $(($num%$max+$min))
} INDEX= while true
do
for user in Allen Mike Jerry Tracy Hanmeimei Lilei
do
COUNT=$RANDOM
NUM1=`create_random $COUNT`
NUM2=`expr $COUNT - $NUM1`
echo "`date '+%Y-%m-%d %H:%M:%S'` $INDEX Batches: user:$user insert $COUNT records into datebase:product table:detail, insert $NUM1 records successfully,
failed $NUM2 records" >> ./db.log.`date +%Y%m%d`
INDEX=`expr $INDEX + `
done
done 数据格式如下:
db.log. -- :: Batches: user Jerry insert records into datebase:product table:detail, insert records successfully,failed records
-- :: Batches: user Tracy insert records into datebase:product table:detail, insert records successfully,failed records
-- :: Batches: user Hanmeimei insert records into datebase:product table:detail, insert records successfully,failed records
-- :: Batches: user Lilei insert records into datebase:product table:detail, insert records successfully,failed records
-- :: Batches: user Allen insert records into datebase:product table:detail, insert records successfully,failed records
... 、统计每个人分别插入了多少条record进数据库
输出结果示例:
Name totalrecords
allen
mike [root@localhost shell]# awk -f exam1.awk db.log.
User Total records
Jerry
Mike
Lilei
Hanmeimei
Tracy
Allen
[root@localhost shell]# cat exam1.awk
BEGIN{
printf "%-20s%-20s\n","User","Total records"
} {
USER[$]+=$
} END{
for(u in USER)
printf "%-20s%-20d\n",u,USER[u]
} 、统计每个人分别插入成功了多少record,失败了多少record 输出结果:
User Success_record failed_record
jerry success $
failed $ [root@localhost shell]# cat exam2.awk
BEGIN{
printf "%-30s%-30s%-30s\n","User","Success records","Failed records"
} {
SUCCESS[$]+=$
FAILED[$]+=$
} END{
for(u in SUCCESS)
printf "%-30s%-30d%-30d\n",u,SUCCESS[u],FAILED[u]
}
[root@localhost shell]# awk -f exam2.awk db.log.
User Success records Failed records
Jerry
Mike
Lilei
Hanmeimei
Tracy
Allen 、将例子1和例子2结合起来,一起输出,输出每个人分别插入多少条数据,多少成功,多少失败,并且要格式化输出,加上标题
输出结果:
User Total success failed
tracy
allen 代码:
[root@localhost shell]# cat exam3.awk
BEGIN{
printf "%-30s%-30s%-30s%-30s\n","Name","total records","success records","failed records"
} {
TOTAL_RECORDS[$]+=$
SUCCESS[$]+=$
FAILED[$]+=$
} END{
for(u in TOTAL_RECORDS)
printf "%-30s%-30d%-30d%-30d\n",u,TOTAL_RECORDS[u],SUCCESS[u],FAILED[u]
}
[root@localhost shell]# awk -f exam3.awk db.log.
Name total records success records failed records
Jerry
Mike
Lilei
Hanmeimei
Tracy
Allen 、在例子3的基础上,加上结尾,统计全部插入记录数,成功记录数,失败记录数
输出结果:
User Total success failed
tracy
allen 方法1:
[root@localhost shell]# cat exam4_b.awk
BEGIN{
printf "%-30s%-30s%-30s%-30s\n","Name","total records","success records","failed records"
} {
TOTAL_RECORDS[$]+=$
SUCCESS[$]+=$
FAILED[$]+=$
} END{
for(u in TOTAL_RECORDS)
{
# 在统计出的结果数组中进行累加
records_sum+=TOTAL_RECORDS[u]
success_sum+=SUCCESS[u]
failed_sum+=FAILED[u]
printf "%-30s%-30d%-30d%-30d\n",u,TOTAL_RECORDS[u],SUCCESS[u],FAILED[u]
} printf "%-30s%-30d%-30d%-30d\n","",records_sum,success_sum,failed_sum
}
[root@localhost shell]# awk -f exam4_b.awk db.log.
Name total records success records failed records
Jerry
Mike
Lilei
Hanmeimei
Tracy
Allen 方法2:
[root@localhost shell]# cat exam4.awk
BEGIN{
printf "%-30s%-30s%-30s%-30s\n","Name","total records","success records","failed records"
} {
RECORDS[$]+=$
SUCCESS[$]+=$
FAILED[$]+=$ # 在原始数据中进行汇总计算
records_sum+=$
success_sum+=$
failed_sum+=$
} END{
for(u in RECORDS)
printf "%-30s%-30d%-30d%-30d\n",u,RECORDS[u],SUCCESS[u],FAILED[u] printf "%-30s%-30d%-30d%-30d\n","total",records_sum,success_sum,failed_sum
}
[root@localhost shell]# awk -f exam4.awk db.log.
Name total records success records failed records
Jerry
Mike
Lilei
Hanmeimei
Tracy
Allen
total 、查找丢失数据的现象,也就是成功+失败的记录数不等于一共插入的记录数,找出这些数据并显示行号和对应行的日志信息
输出结果: 代码:
[root@localhost shell]# awk '{if($8!=$14+$17) print NR,$0}' db.log.
-- :: Batches: user Hanmeimei insert records into datebase:product table:detail, insert records successfully,failed records
-- :: Batches: user Mike insert records into datebase:product table:detail, insert records successfully,failed records # 写入文件的方式
[root@localhost shell]# awk -f exam5.awk db.log.
-- :: Batches: user Hanmeimei insert records into datebase:product table:detail, insert records successfully,failed records
-- :: Batches: user Mike insert records into datebase:product table:detail, insert records successfully,failed records
[root@localhost shell]# cat exam5.awk
BEGIN{
}
{
if($!=$+$)
print NR,$
}