I. Basic Usage
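awk: a report-generation tool. It reads a file line by line, splits each line into fields, formats those fields, and then displays the result.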
[Linux85]#awk -h
Usage: awk [POSIX or GNU style options] -f progfile [--] file ...
Usage: awk [POSIX or GNU style options] [--] 'program' file ...
POSIX options:          GNU long options:
    -f progfile             --file=progfile
    -F fs                   --field-separator=fs    # field separator
    -v var=val              --assign=var=val
    -m[fr] val
General forms:
awk [options] 'script' FILE ...
awk [options] '/pattern/{action}' FILE ...
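Four separators (an input form and an output form of each of the following):
Record (line) separator: a newline by default, i.e. the end of the line ($); RS on input, ORS on output
Field separator: runs of whitespace by default on input (FS), a single space on output (OFS)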
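Patterns
Address range | /pattern1/,/pattern2/ |
/pattern/ | can be negated with ! |
expression | an expression using >, >=, <, <=, ==, !=, ~ |
BEGIN{} | executed once, before any input is read |
END{} | executed once, after all input has been read and before awk exits |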
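The following is a minimal sketch of the pattern types in the table above. It runs on made-up inline input instead of a system file. The function name pattern_demo and the sample lines are illustrative assumptions, and it assumes any POSIX-compatible awk.

```shell
# pattern_demo: hypothetical helper showing a range pattern, a negated
# regex pattern, and BEGIN/END on a few inline sample lines.
pattern_demo() {
  printf '# header\nstart\nx\nend\ny\n' |
    awk 'BEGIN { print "begin" }                 # runs before any input is read
         /start/,/end/ { print "in range:", $0 } # from "start" through "end"
         !/^#/ { n++ }                           # lines NOT starting with #
         END { print "non-comment lines:", n }'  # runs after all input
}
pattern_demo
```

The range stays active from the line matching /start/ through the line matching /end/, inclusive. The negated pattern counts the four lines that do not begin with #.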
[Linux85]#awk '/^soul/{print $0}' /etc/passwd /etc/shadow /etc/group
soul:x:501:501::/home/soul:/bin/bash
soul:!!:16166:0:99999:7:::
soul:x:501:
# Users whose UID is 500 or greater
[Linux85]#awk -F : '$3>=500{print $1}' /etc/passwd
nfsnobody
gentoo
soul
Using BEGIN to print a header before any input is processed:
[Linux85]#awk -F : 'BEGIN{print "UserName\n***********"}$3>=500{print $1}' /etc/passwd
UserName
***********
nfsnobody
gentoo
soul
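awk built-in variables:
NF | number of fields in the current record (The number of fields in the current input record.) |
FS | field separator used when reading input |
RS | record separator: the line terminator used when reading input (a newline by default) |
OFS | field separator used on output; a single space by default (output field separator) |
ORS | record separator used on output; a newline by default (output record separator) |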
[Linux85]#awk -F : '/^soul/{print $1,$7}' /etc/passwd
soul /bin/bash
[Linux85]#awk 'BEGIN{FS=":"}/^soul/{print $1,$7}' /etc/passwd
soul /bin/bash
[Linux85]#awk 'BEGIN{FS=":";OFS=":"}/^soul/{print $1,$7}' /etc/passwd
soul:/bin/bash
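The commands above change only FS and OFS. The sketch below shows the other two separators, RS and ORS, on an assumed comma-separated input string. The helper name records_demo is hypothetical.

```shell
# records_demo: hypothetical helper. Each comma-separated item becomes one
# record (RS=","), and output records are joined with " | " instead of "\n" (ORS).
records_demo() {
  printf 'a,b,c' | awk 'BEGIN { RS = ","; ORS = " | " } { print NR ":" $0 }'
}
records_demo; echo   # echo adds the final newline that ORS no longer supplies
```

The input deliberately has no trailing newline. With one, the last record would be "c" plus a newline character.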
# Print the first field of every line that is neither blank nor a comment
[Linux85]#awk '!/^$|^#/{print $1}' /etc/sysctl.conf
net.ipv4.ip_forward
net.ipv4.conf.default.rp_filter
net.ipv4.conf.default.accept_source_route
kernel.sysrq
kernel.core_uses_pid
net.ipv4.tcp_syncookies
net.bridge.bridge-nf-call-ip6tables
net.bridge.bridge-nf-call-iptables
net.bridge.bridge-nf-call-arptables
kernel.msgmnb
kernel.msgmax
kernel.shmmax
kernel.shmall
# Extract the non-loopback IPv4 address from the ifconfig output
[Linux85]#ifconfig | awk '/inet addr/{print $2}' | awk -F : '!/127/{print $2}'
172.16.251.85
II. Advanced Usage of awk
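1. print output: print item1, item2, ...
Items are separated by commas in the command; on output they are separated by OFS (a single space by default).
An item can be a string or a number, a field of the current record (e.g. $1), a variable, or an awk expression; numbers are converted to strings before they are output.
The items after print may be omitted, in which case print is equivalent to print $0; therefore, to output a blank line you must use print "".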
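A short sketch of the print rules above, run on one assumed input line. The function name print_demo is hypothetical. The example contrasts a comma (joined by OFS) with plain juxtaposition (string concatenation).

```shell
# print_demo: hypothetical helper illustrating the print rules on "a b c".
print_demo() {
  echo "a b c" | awk '{
    print $1, $2      # comma: items joined by OFS (a single space)
    print $1 $2       # no comma: items are concatenated
    print             # no items: same as print $0
    print ""          # the way to output an empty line
    print 1.50, 1e3   # numbers are converted to strings before output
  }'
}
print_demo
```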
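2. printf output: printf format, item1, item2, ...
The main difference from print is that printf requires a format string.
format specifies how each of the items that follow it is output.
printf does not print a newline automatically; add \n to the format wherever one is needed.
Each format specifier starts with % followed by a conversion character.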
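%c | a single character: for a numeric item, the character with that code (e.g. 65 gives A); for a string, its first character |
%d, %i | decimal integer |
%e, %E | number in scientific notation |
%f | floating-point number |
%g, %G | number in scientific or floating-point notation, whichever is shorter |
%s | string |
%u | unsigned integer |
%% | a literal % sign |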
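[Linux85]#awk 'BEGIN{num1=20;num2=30; printf "%d %d\n",num1,num2}'
20 30
# The items are not printed directly: each format specifier is filled in by the item in the same position, so specifiers and items must correspond one to one.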
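The sketch below exercises several specifiers from the table in one printf call, using assumed literal values. The helper name printf_demo is hypothetical. It relies only on POSIX printf behaviour: %d truncates a non-integer toward zero, and %c of a string prints its first character.

```shell
# printf_demo: hypothetical helper; each specifier consumes the item in the
# same position: %c<-65, %c<-"hello", %d<-3.9, %s<-"awk", %5.2f<-3.14159, %e<-1234.5
printf_demo() {
  awk 'BEGIN { printf "%c %c %d %s %5.2f %e %%\n", 65, "hello", 3.9, "awk", 3.14159, 1234.5 }'
}
printf_demo
```

The output is `A h 3 awk  3.14 1.234500e+03 %`. The double space comes from %5.2f padding "3.14" to width 5.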
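Modifiers (placed between % and the conversion character)
N | display width |
- | left-justify |
+ | always show the sign of a number (+ or -) |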
[Linux85]#awk -F: '{printf "%-14s %s\n",$1,$NF}' /etc/passwd
root           /bin/bash
bin            /sbin/nologin
daemon         /sbin/nologin
adm            /sbin/nologin
lp             /sbin/nologin
sync           /bin/sync
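A minimal sketch of the three modifiers on assumed constant values, with brackets added so the padding is visible. The function name modifiers_demo is hypothetical.

```shell
# modifiers_demo: hypothetical helper; width 5 right-aligned, width 5
# left-aligned (-), forced sign (+), and a left-aligned string of width 8.
modifiers_demo() {
  awk 'BEGIN { printf "[%5d][%-5d][%+d][%+d][%-8s]\n", 42, 42, 42, -42, "awk" }'
}
modifiers_demo
```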
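3. awk built-in variables: data variables
NR | The number of input records: the number of records awk has processed so far; with several files, the lines of all the files are counted together |
NF | Number of Fields: the number of fields in the current record |
FNR | unlike NR, the record number within the file currently being processed (it restarts at 1 for each file) |
ARGV | array holding the command-line arguments; e.g. in awk '{print $0}' a.txt b.txt, ARGV[0] holds awk and ARGV[1] holds a.txt |
ARGC | number of command-line arguments, i.e. the number of elements in ARGV |
FILENAME | name of the file awk is currently processing |
ENVIRON | associative array of the current shell's environment variables and their values |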
[Linux85]#awk '{print NR,$0}' 1.txt
1 one line
2 two line
3 three line
4 four line
5 five line
[Linux85]#awk '{print NR,$0}' 2.txt
1 six line
2 seven line
3 eight line
4 nine line
5 ten line
[Linux85]#awk '{print NR,$0}' 1.txt 2.txt
1 one line
2 two line
3 three line
4 four line
5 five line
6 six line
7 seven line
8 eight line
9 nine line
10 ten line
[Linux85]#awk '{print FNR,$0}' 1.txt 2.txt
1 one line
2 two line
3 three line
4 four line
5 five line
1 six line
2 seven line
3 eight line
4 nine line
5 ten line
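A common consequence of the NR/FNR difference shown above: NR == FNR holds only while awk reads the first file, so the first file can be loaded into a lookup table. The sketch below uses that idiom. The helper name filter_by_keys and its two-file interface are hypothetical. It assumes the key file is not empty, because with an empty first file NR == FNR would also hold for the second file.

```shell
# filter_by_keys KEYS_FILE DATA_FILE (hypothetical helper):
# print the lines of DATA_FILE whose first field appears in KEYS_FILE.
filter_by_keys() {
  awk 'NR == FNR { keep[$1] = 1; next }   # first file: only remember the keys
       ($1 in keep)                       # second file: print lines with a known key
      ' "$1" "$2"
}
```

Usage: `filter_by_keys users.txt data.txt`, where both file names are placeholders.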
[Linux85]#awk -F: '/root/{print $1,"is a user in",ARGV[1]}' /etc/passwd
root is a user in /etc/passwd
operator is a user in /etc/passwd
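[Linux85]#awk 'BEGIN{print ARGC}' /etc/passwd /etc/group /etc/shadow
4
# ARGC is 4 because ARGV[0] (awk) is counted along with the three file names; the program text 'BEGIN{print ARGC}' itself is not stored in ARGV.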
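To see exactly what ARGC counts, the sketch below prints every element of ARGV. The helper name show_args is hypothetical. ARGV[0] depends on the implementation (for example awk or mawk), so only the later elements are predictable. Because the program consists only of BEGIN, awk never tries to open the arguments as files.

```shell
# show_args: hypothetical helper that lists ARGV[0] .. ARGV[ARGC-1].
show_args() {
  awk 'BEGIN { for (i = 0; i < ARGC; i++) printf "ARGV[%d] = %s\n", i, ARGV[i] }' "$@"
}
show_args /etc/passwd /etc/group
```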
[Linux85]#awk '{print $0,"in", FILENAME}' 1.txt 2.txt
one line in 1.txt
two line in 1.txt
three line in 1.txt
four line in 1.txt
five line in 1.txt
six line in 2.txt
seven line in 2.txt
eight line in 2.txt
nine line in 2.txt
ten line in 2.txt
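ENVIRON has no example above. The sketch below reads one variable from it. The variable name GREETING is an arbitrary assumption, set only for this single awk invocation.

```shell
# env_demo: hypothetical helper; awk sees the environment through ENVIRON.
env_demo() {
  GREETING="hello" awk 'BEGIN { print "GREETING is", ENVIRON["GREETING"] }'
}
env_demo
```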
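4. Output redirection
print items > output-file    (the file is truncated when it is first opened; later prints in the same run append to it)
print items >> output-file   (append to the file)
print items | command        (send the items to the standard input of command)
Special file names:
/dev/stdin: standard input
/dev/stdout: standard output
/dev/stderr: standard error
/dev/fd/N: a specific file descriptor; e.g. /dev/stdin is equivalent to /dev/fd/0.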
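A minimal sketch of the redirections above: a file, a pipe and /dev/stderr. The helper name redirect_demo and its output-file argument are hypothetical. All prints to the same command string share one process, and close() is used so that sort finishes and the file is flushed before awk exits.

```shell
# redirect_demo FILE (hypothetical helper): copy input lines to FILE, pipe them
# through one shared "sort -n" process, and report a summary on stderr.
redirect_demo() {
  printf '3\n1\n2\n' | awk -v f="$1" '
    { print $1 > f }              # ">" truncates f on first open, then keeps appending
    { print $1 | "sort -n" }      # every line goes to the same sort process
    END {
      close(f)                    # flush and close the output file
      close("sort -n")            # wait for sort to finish writing its output
      print "processed " NR " lines" > "/dev/stderr"
    }'
}
```

Running `redirect_demo /tmp/out.txt` prints 1, 2 and 3 (sorted) on stdout. The placeholder file keeps the original order 3, 1, 2.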
5. awk operators
Arithmetic operators | Assignment operators | Comparison operators |
-x: negation | =: assignment | x < y True if x is less than y. |
+x: conversion to a number | += | x <= y True if x is less than or equal to y. |
x^y: exponentiation | -= | x > y True if x is greater than y. |
x**y: exponentiation (alternative spelling, not POSIX) | *= | x >= y True if x is greater than or equal to y. |
x*y | /= | x == y True if x is equal to y. |
x/y | %= | x != y True if x is not equal to y. |
x+y | ^= | x ~ y True if the string x matches the regexp denoted by y. |
x-y | **= | x !~ y True if the string x does not match the regexp denoted by y. |
x%y | ++ | subscript in array True if the array array has an element with the subscript subscript. |
 | -- | |
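In awk, any nonzero numeric value or non-empty string value is true; zero and the empty string are false.
Conditional expression:
condition ? if-true-exp : if-false-exp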
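The sketch below puts several operators from the table together with the truth rules and the conditional expression. It runs on two assumed "name uid" lines. The helper name ops_demo is hypothetical.

```shell
# ops_demo: hypothetical helper illustrating ?:, ~, "in", arithmetic, and truthiness.
ops_demo() {
  printf 'root 0\nsoul 501\n' | awk '
    {
      kind = ($2 == 0) ? "admin" : "user"      # condition ? if-true : if-false
      starts_r = ($1 ~ /^r/) ? "yes" : "no"    # ~ : does $1 match the regexp?
      seen[$1] = 1
      print $1, kind, starts_r, $2 * 2, 2 ^ 3
    }
    END {
      in1 = ("soul" in seen); in2 = ("nobody" in seen)
      print in1, in2                           # membership tests yield 1 or 0
      t1 = "" ? "true" : "false"               # empty string is false
      t2 = (0) ? "true" : "false"              # numeric zero is false
      t3 = "abc" ? "true" : "false"            # non-empty string is true
      print t1, t2, t3
    }'
}
ops_demo
```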
6. Patterns and common pattern types
Patterns:
awk 'program' input-file1 input-file2 ...
where program has the form:
pattern { action }
pattern { action }
...
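Common patterns:
Regexp | a regular expression, written as /regular expression/ |
expression | an expression; the pattern matches when its value is nonzero or a non-empty string, e.g. $1 ~ /foo/ or $1 == "soul"; use the operators ~ (matches) and !~ (does not match) for regex tests |
Ranges | a range of records, written as pat1,pat2: from a record matching pat1 through the next record matching pat2 |
BEGIN/END | special patterns whose actions run once before awk reads any input, or once after all input has been read |
Empty (empty pattern) | matches every input line |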
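A minimal sketch of expression patterns (~, !~, ==) and the empty pattern. It runs on assumed colon-separated sample lines. The function name match_demo is hypothetical.

```shell
# match_demo: hypothetical helper; each pattern is tested against every line.
match_demo() {
  printf 'foo:1\nbar:2\nsoul:3\n' | awk -F: '
    $1 ~ /^f/      { print "regex match:", $1 }
    $1 !~ /o/      { print "no o in:", $1 }
    $1 == "soul"   { print "exact match:", $2 }
                   { n++ }                  # empty pattern: runs for every line
    END            { print "lines:", n }'
}
match_demo
```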
Common actions:
Expressions
Control statements
Compound statements
Input statements
Output statements
7. Control statements
if-else
Syntax: if (condition) {then-body} [else {else-body}]
[Linux85]#awk -F : 'BEGIN{OFS=":"}{if ($3==0) {print $1,"Administrator";} else {print $1,"Common User"}}' /etc/passwd
root:Administrator
bin:Common User
daemon:Common User
adm:Common User
lp:Common User
sync:Common User
shutdown:Common User
[Linux85]#awk -F: '{if ($1=="root") printf "%-15s: %s\n",$1,"Admin";else printf "%-15s: %s\n",$1,"Common User"}' /etc/passwd
root           : Admin
bin            : Common User
daemon         : Common User
adm            : Common User
lp             : Common User
sync           : Common User
shutdown       : Common User
halt           : Common User
mail           : Common User
uucp           : Common User
operator       : Common User
games          : Common User
gopher         : Common User
ftp            : Common User
nobody         : Common User
dbus           : Common User
usbmuxd        : Common User
# Count the users whose UID is 500 or greater
[Linux85]#awk -F: -v sum=0 '{if ($3>=500) sum++}END{print sum}' /etc/passwd
3
while
Syntax: while (condition) {statement1; statement2; ...}
# Print the first three fields of each line of /etc/passwd, one per line
[Linux85]#awk -F : '{i=1;while (i<=3) {print $i;i++}}' /etc/passwd
root
x
0
bin
x
1
# Print every field that is at least 4 characters long
[Linux85]#awk -F: '{i=1;while (i<=NF) { if (length($i)>=4) {print $i}; i++ }}' /etc/passwd
root
root
/root
/bin/bash
/bin
/sbin/nologin
do-while: the loop body runs at least once, whether or not the condition holds
Syntax: do {statement1; statement2; ...} while (condition)
[Linux85]#awk -F: '{i=1;do {print $i;i++}while(i<=3)}' /etc/passwd
root
x
0
bin
x
1
daemon
x
2
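# The condition i>4 is false from the start, yet the body still runs once per line and prints field 4 (the GID)
[Linux85]#awk -F: '{i=4;do {print $i;i--}while(i>4)}' /etc/passwd
0
1
2
4
7
0
0
0
12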
for
Syntax: for (initialization; condition; increment) {statement1; statement2; ...}
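[Linux85]#awk -F: '{for(i=1;i<=3;i++) if (i<3){printf "%s:",$i} print $i}' /etc/passwd
root:x:0
bin:x:1
daemon:x:2
adm:x:4
lp:x:7
sync:x:0
shutdown:x:0
# Note: the loop body is only the if statement, so print $i runs after the loop ends, when i is 4; the last column is therefore field 4 (the GID), not the UID. To print the first three fields, put both branches inside the loop:
# awk -F: '{for(i=1;i<=3;i++) {if (i<3) printf "%s:",$i; else print $i}}' /etc/passwd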
Iterating over array elements with for
Syntax: for (i in array) {statement1; statement2; ...}
# Count how many times each value of the last field (the login shell) occurs
[Linux85]#awk -F: '$NF!~/^$/{BASH[$NF]++}END{for(A in BASH){printf "%15s:%i\n",A,BASH[A]}}' /etc/passwd
 /sbin/shutdown:1
       /bin/csh:1
      /bin/bash:2
  /sbin/nologin:29
     /sbin/halt:1
      /bin/sync:1
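switch (case), a gawk extension
Syntax: switch (expression) { case VALUE or /REGEXP/: statement1; statement2; ... [break;] default: statement1; ... }
Without break, gawk falls through into the next case, as in C.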
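break and continue
break leaves the innermost enclosing loop; continue skips the rest of the current iteration and starts the next one.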
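A minimal sketch of break and continue in a for loop over assumed constants. The function name loop_demo is hypothetical.

```shell
# loop_demo: hypothetical helper; continue skips even numbers, break stops after 7.
loop_demo() {
  awk 'BEGIN {
    for (i = 1; i <= 10; i++) {
      if (i % 2 == 0) continue   # skip the rest of this iteration
      if (i > 7) break           # leave the loop entirely
      printf "%d ", i
    }
    print ""
  }'
}
loop_demo
```

It prints `1 3 5 7 ` (with a trailing space). At i = 9 the break fires before anything is printed.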
next
Stops processing the current line immediately and moves on to the next input line.
# Skip users with an even UID; print the name and UID of the others
[Linux85]#awk -F: '{if($3%2==0) next;print $1,$3}' /etc/passwd
bin 1
adm 3
sync 5
halt 7
operator 11
gopher 13
nobody 99
dbus 81
usbmuxd 113
vcsa 69
rtkit 499
abrt 173
postfix 89
rpcuser 29
pulse 497
soul 501
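8. Arrays
array[index-expression]
index-expression can be any string. Note that if an element does not exist yet, merely referencing it makes awk create it and initialize it to the empty string; therefore, to test whether an array contains an element, use index in array.
To iterate over every element of an array, use this special construct:
for (var in array) { statement1; ... }
Here var is set to each subscript of the array in turn, not to the element's value; the order of traversal is unspecified.
To delete an array element: delete array[index]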
# Count TCP connections by state
[Linux85]#netstat -ant | awk '/^tcp/ {++S[$NF]} END {for(a in S) print a, S[a]}'
ESTABLISHED 2
LISTEN 10
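The array text above warns that referencing a missing element creates it. The sketch below demonstrates that, along with the side-effect-free `in` test and delete. The array a and the helper name array_demo are hypothetical.

```shell
# array_demo: hypothetical helper contrasting "in" with a plain reference.
array_demo() {
  awk 'BEGIN {
    r1 = ("x" in a) ? "x present" : "x absent"   # "in" does not create a["x"]
    if (a["y"] == "") r2 = "y read as empty"     # this reference creates a["y"]
    r3 = ("y" in a) ? "y now exists" : "y missing"
    a["z"] = 1
    delete a["z"]                                # remove a single element
    r4 = ("z" in a) ? "z still exists" : "z deleted"
    print r1; print r2; print r3; print r4
  }'
}
array_demo
```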
9. awk built-in functions
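split(string, array [, fieldsep [, seps ] ])
Splits string on fieldsep and stores the pieces in the array named array, with subscripts 1, 2, 3, ...; it returns the number of pieces. (The seps argument is a gawk extension.)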
# Show the filesystems whose usage is 10% or higher
[Linux85]#df -lh | awk '!/^File/{split($5,percent,"%");if(percent[1]>=10){print $1}}'
/dev/sda1
/dev/mapper/vg0-usr
length([string]): returns the number of characters in string ($0 if string is omitted).
[Linux85]#awk -F: '{for(i=1;i<=NF;i++) { if (length($i)>=4) {print $i}}}' /etc/passwd
root
root
/root
/bin/bash
/bin
/sbin/nologin
daemon
daemon
/sbin
/sbin/nologin
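substr(string, start [, length ])
Returns the substring of string that starts at position start and is length characters long (through the end of string if length is omitted); positions are counted from 1.
system(command): runs command and returns its exit status to awk; the command's own output goes straight to standard output rather than into awk
systime(): returns the current time as seconds since the Epoch (a gawk extension, not part of POSIX awk)
tolower(s): converts all letters in s to lowercase
toupper(s): converts all letters in s to uppercase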
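A sketch that calls the portable string functions above on one assumed input line. The helper name string_demo is hypothetical. systime() is left out because it is gawk-only. The system() call uses a throwaway shell command, `exit 3`, to show that the return value is the exit status rather than the command's output.

```shell
# string_demo: hypothetical helper exercising split, substr, length,
# toupper, tolower and system on the line "Hello,World,Awk".
string_demo() {
  echo "Hello,World,Awk" | awk '{
    n = split($0, parts, ",")              # n = number of pieces, parts[1..n]
    print n, parts[2]
    print substr(parts[1], 2, 3)           # 3 characters starting at position 2
    print substr(parts[1], 4)              # from position 4 to the end
    print length(parts[3]), toupper(parts[3]), tolower(parts[1])
    rc = system("exit 3")                  # returns the exit status, not output
    print "status", rc
  }'
}
string_demo
```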
10. User-defined functions
User-defined functions are declared with the function keyword, in this form:
function F_NAME([variable])
{
statements
}
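The section gives the function syntax but no function example. The sketch below defines two illustrative functions, max and sum_to; both names are hypothetical. It also uses the standard awk convention that extra parameters listed after the real ones (conventionally separated by extra spaces) act as local variables.

```shell
# fn_demo: hypothetical helper defining and calling two user-defined functions.
fn_demo() {
  printf '3 9\n7 2\n' | awk '
    function max(a, b) { return a > b ? a : b }
    function sum_to(n,    i, total) {     # i and total are local variables
      total = 0
      for (i = 1; i <= n; i++) total += i
      return total
    }
    { print max($1, $2), sum_to($1) }'
}
fn_demo
```

For the two input lines this prints `9 6` and `7 28`. When calling a user-defined function, no space is allowed between the function name and the opening parenthesis.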
More examples:
# For each client IP, count the connections in the ESTABLISHED state
[Linux85]#netstat -tn | awk '/ESTABLISHED\>/{split($5,ip,":");num[ip[1]]++}END{for (i in num) printf "%s %d\n", i, num[i]}'
172.16.254.28 2
# Count the processes in each state, as reported by ps aux
[Linux85]#ps aux | awk '!/^USER/{state[$8]++}END{for (i in state) printf "%s %d\n",i,state[i]}'
S< 2
S<sl 1
Ss 18
SN 1
S 69
Ss+ 6
Ssl 2
R+ 1
S+ 2
Sl 2
S<s 1
# Count the processes belonging to each user, as reported by ps aux
[Linux85]#ps aux | awk '!/^USER/{state[$1]++}END{for (i in state) printf "%s %d\n",i,state[i]}'
rpc 1
dbus 1
68 2
postfix 2
rpcuser 1
root 96
gentoo 2
# Show the PID and command of each process whose VSZ (virtual memory size, in KiB) is greater than 10000
[Linux85]#ps aux | awk '!/USER/{if($5>10000) print $2,$11}'
1 /sbin/init
397 /sbin/udevd
1184 auditd
1209 /sbin/rsyslogd
1251 rpcbind
1282 dbus-daemon
1292 NetworkManager
1297 /usr/sbin/modem-manager
1311 rpc.statd
1344 cupsd
1354 /usr/sbin/wpa_supplicant
1392 hald
This article is from the “Soul” blog; please retain this attribution: http://chenpipi.blog.51cto.com/8563610/1391178