Dynamic probing with kprobes in Linux

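Kprobes is a lightweight mechanism in Linux that inserts breakpoints into a running kernel. It can collect debugging information such as processor registers and global data structures, and it can even be used to modify register values and global data structures.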

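Kprobes inserts a probe by writing a breakpoint instruction at a given address in the running kernel. Executing the probed instruction raises a breakpoint exception. Kprobes hooks into the breakpoint handler and collects debugging information; it can also single-step the probed instruction.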

Build the kernel with CONFIG_KPROBE_EVENTS=y (which depends on CONFIG_KPROBES=y) to be able to add kprobes dynamically.
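Before adding probes it can help to confirm that the running kernel was built with these options. The following is a minimal sketch: the function name kprobe_config_ok is made up for illustration, and the config file location is an assumption (many distributions install /boot/config-<release>, others only expose /proc/config.gz).

```shell
#!/bin/sh
# Report whether a kernel config file enables the options that kprobe
# events rely on. kprobe_config_ok is an illustrative helper name.
kprobe_config_ok() {
    cfg=$1
    [ -r "$cfg" ] || { echo "cannot read $cfg" >&2; return 2; }
    for opt in CONFIG_KPROBES CONFIG_KPROBE_EVENTS; do
        # Match only a built-in setting, e.g. CONFIG_KPROBES=y
        if ! grep -q "^${opt}=y\$" "$cfg"; then
            echo "$opt is not set to y"
            return 1
        fi
    done
    echo "kprobe events are enabled"
}

# Assumed location of the config for the running kernel; the call is
# allowed to fail quietly on systems that do not ship this file.
kprobe_config_ok "/boot/config-$(uname -r)" || true
```

The function returns 0 only when both options are built in, so it can also guard a script that goes on to write to kprobe_events.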

 

1.   How it works

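The user specifies a probe point and associates a user-defined handler with it. When the kernel reaches the probe point, the associated handler runs, and execution then continues along the normal code path.

Kprobes implements three kinds of probe points: kprobes, jprobes and kretprobes (also called return probes). A kprobe can be inserted at almost any instruction in the kernel, a jprobe can only be inserted at the entry of a kernel function, and a kretprobe fires when the specified kernel function returns. (jprobes have since been removed from mainline kernels.)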

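-   When a kprobe is installed, kprobes first saves a copy of the probed instruction and then replaces its first byte or bytes with a breakpoint instruction.

-   When execution reaches the probe point, the breakpoint instruction causes a trap: the CPU registers are saved and the corresponding trap handler is called.

-   The trap handler calls every notifier function registered on the corresponding notifier_call_chain. Kprobes implements probe handling by registering the handler associated with the probe point on the notifier_call_chain for that trap.

-   The pre_handler associated with the probe point runs first, receiving the kprobe struct and the saved registers as arguments. Kprobes then single-steps the saved copy of the probed instruction and calls post_handler. Once all of this has completed, execution resumes with the instruction stream that follows the probed instruction.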

[Figure: kprobes execution flow]

 

2.   kprobe initialization

The kprobes subsystem is initialized by the function init_kprobes, which is defined in kernel/kprobes.c.

 


3.   Using kprobes through the ftrace interface

Probe events are defined by writing to

/sys/kernel/debug/tracing/kprobe_events

and enabled through

/sys/kernel/debug/tracing/events/kprobes/<EVENT>/enable

Syntax

An event definition has the following syntax:

p[:[GRP/]EVENT] [MOD:]SYM[+offs]|MEMADDR [FETCHARGS]  : Set a probe

 r[MAXACTIVE][:[GRP/]EVENT] [MOD:]SYM[+0] [FETCHARGS]  : Set a return probe

 -:[GRP/]EVENT                                         : Clear a probe

 

GRP            : Group name. If omitted, use "kprobes" for it.

EVENT          : Event name. If omitted, the event name is generated

                 based on SYM+offs or MEMADDR.

MOD            : Module name which has given SYM.

SYM[+offs]     : Symbol+offset where the probe is inserted.

MEMADDR        : Address where the probe is inserted.

MAXACTIVE      : Maximum number of instances of the specified function that

                 can be probed simultaneously, or 0 for the default value

                 as defined in Documentation/kprobes.txt section 1.3.1.

 

FETCHARGS      : Arguments. Each probe can have up to 128 args.

 %REG          : Fetch register REG

 @ADDR         : Fetch memory at ADDR (ADDR should be in kernel)

 @SYM[+|-offs] : Fetch memory at SYM +|- offs (SYM should be a data symbol)

 $stackN       : Fetch Nth entry of stack (N >= 0)

 $stack        : Fetch stack address.

 $retval       : Fetch return value.(*)

 $comm         : Fetch current task comm.

 +|-offs(FETCHARG) : Fetch memory at FETCHARG +|- offs address.(**)

 NAME=FETCHARG : Set NAME as the argument name of FETCHARG.

 FETCHARG:TYPE : Set TYPE as the type of FETCHARG. Currently, basic types

                 (u8/u16/u32/u64/s8/s16/s32/s64), hexadecimal types

                 (x8/x16/x32/x64), "string" and bitfield are supported.

 

 (*) only for return probe.

 (**) this is useful for fetching a field of data structures.
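To make the grammar above concrete, here is a small sketch that assembles a definition line from its parts. The helper name kprobe_spec is hypothetical; it only builds the string and leaves the write to kprobe_events to the caller, so it can be tried without root privileges. GRP and MAXACTIVE are left out for brevity.

```shell
#!/bin/sh
# Assemble one kprobe_events definition line from its parts.
#   $1     probe kind: p (probe at SYM[+offs] or MEMADDR) or r (return probe)
#   $2     event name (EVENT); the group is left at its default
#   $3     probe location (SYM[+offs] or MEMADDR)
#   $4...  optional FETCHARGS, one per argument
kprobe_spec() {
    kind=$1
    event=$2
    where=$3
    shift 3
    case $kind in
        p|r) ;;
        *) echo "kind must be p or r" >&2; return 1 ;;
    esac
    spec="$kind:$event $where"
    for arg in "$@"; do
        spec="$spec $arg"
    done
    printf '%s\n' "$spec"
}

# Single quotes keep the shell from expanding $stack and $retval.
kprobe_spec p myprobe do_sys_open 'dfd=%ax' 'filename=%dx' 'flags=%cx' 'mode=+4($stack)'
kprobe_spec r myretprobe do_sys_open '$retval'
```

The printed lines have the form that is written to kprobe_events, for example with kprobe_spec ... >> /sys/kernel/debug/tracing/kprobe_events.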

Adding a kprobe event

For example, add a new event named myprobe at the entry of do_sys_open:

echo 'p:myprobe do_sys_open dfd=%ax filename=%dx flags=%cx mode=+4($stack)' > /sys/kernel/debug/tracing/kprobe_events

The definition can be read back from the same file:

# cat /sys/kernel/debug/tracing/kprobe_events

p:kprobes/myprobe do_sys_open dfd=%ax filename=%dx flags=%cx mode=+4($stack)

In the kernel source, do_sys_open is defined as follows:

long do_sys_open(int dfd, const char __user *filename, int flags, umode_t mode)

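So dfd=%ax filename=%dx flags=%cx mode=+4($stack) fetch the four arguments of do_sys_open. This works because the probe sits at the function entry, where the arguments are still in the locations set by the calling convention. These particular registers follow the 32-bit x86 kernel convention (first three arguments in ax, dx and cx, the rest on the stack). On x86-64 the first four arguments arrive in %di, %si, %dx and %cx instead, so the definition has to be adapted to the target architecture. The trace shown later in this article was captured on a 64-bit kernel (note do_syscall_64), which is why its field labelled filename actually holds the open flags (for example 0x8241) and its field labelled flags holds the mode (0x1b6, octal 0666).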
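As a sketch of how the fetch arguments could be chosen on x86-64, the helper below maps an argument position to the register or stack slot that holds it at function entry under the System V calling convention. The names x86_64_arg and myprobe64 are made up for illustration, and the mapping covers integer and pointer arguments only.

```shell
#!/bin/sh
# Map the Nth integer/pointer argument (1-based) of a function to the
# FETCHARG that reads it at function entry on x86-64. Arguments beyond
# the sixth are on the stack; $stack0 is the return address at entry,
# so the seventh argument is $stack1.
x86_64_arg() {
    case $1 in
        ''|*[!0-9]*|0)
            echo "argument index must be a positive integer" >&2
            return 1 ;;
        1) echo '%di' ;;
        2) echo '%si' ;;
        3) echo '%dx' ;;
        4) echo '%cx' ;;
        5) echo '%r8' ;;
        6) echo '%r9' ;;
        *) echo "\$stack$(( $1 - 6 ))" ;;
    esac
}

# Build an x86-64 variant of the do_sys_open probe (myprobe64 is a
# hypothetical event name).
echo "p:myprobe64 do_sys_open dfd=$(x86_64_arg 1) filename=$(x86_64_arg 2) flags=$(x86_64_arg 3) mode=$(x86_64_arg 4)"
```

Other architectures use different registers, so a table like this has to be written per architecture.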

Adding a kretprobe event

Define an event on function return as follows:

echo 'r:myretprobe do_sys_open $retval' >> /sys/kernel/debug/tracing/kprobe_events

Reading the file again now shows two events:

# cat /sys/kernel/debug/tracing/kprobe_events

p:kprobes/myprobe do_sys_open dfd=%ax filename=%dx flags=%cx mode=+4($stack)

r:kprobes/myretprobe do_sys_open arg1=$retval

Viewing the format

The format of a defined event can be inspected as follows:

# cat /sys/kernel/debug/tracing/events/kprobes/myprobe/format

name: myprobe

ID: 1844

format:

      field:unsigned short common_type; offset:0;  size:2;    signed:0;

      field:unsigned char common_flags; offset:2;  size:1;    signed:0;

      field:unsigned char common_preempt_count;    offset:3;  size:1;   signed:0;

      field:int common_pid; offset:4;  size:4;    signed:1;

 

      field:unsigned long __probe_ip;   offset:8;  size:8;    signed:0;

      field:u64 dfd;   offset:16; size:8;    signed:0;

      field:u64 filename;   offset:24; size:8;    signed:0;

      field:u64 flags; offset:32; size:8;    signed:0;

      field:u64 mode;  offset:40; size:8;    signed:0;

 

print fmt: "(%lx) dfd=0x%Lx filename=0x%Lx flags=0x%Lx mode=0x%Lx", REC->__probe_ip, REC->dfd, REC->filename, REC->flags, REC->mode

The format lists the four arguments defined for the event: dfd, filename, flags and mode.

Enabling event tracing

A newly defined event is disabled by default. To enable the events, run:

echo 1 > /sys/kernel/debug/tracing/events/kprobes/myprobe/enable

echo 1 > /sys/kernel/debug/tracing/events/kprobes/myretprobe/enable

Once enabled, the events can be read from the trace buffer:

# cat /sys/kernel/debug/tracing/trace

#                              _-----=> irqs-off

#                             / _----=> need-resched

#                            | / _---=> hardirq/softirq

#                            || / _--=> preempt-depth

#                            ||| /     delay

#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION

#              | |       |   ||||       |         |

            bash-4024  [001] ....  7507.712770: myprobe: (do_sys_open+0x0/0x210) dfd=0x2 filename=0x8241 flags=0x1b6 mode=0xffffffff

             awk-5016  [000] ....  7508.140821: myprobe: (do_sys_open+0x0/0x210) dfd=0x2 filename=0x88000 flags=0x1 mode=0xffffffff

             awk-5016  [000] d...  7508.140829: myretprobe: (do_syscall_64+0x6e/0x1a0 <- do_sys_open) arg1=0x3

             awk-5016  [000] ....  7508.140851: myprobe: (do_sys_open+0x0/0x210) dfd=0x2 filename=0x88000 flags=0xe148 mode=0xffffffff

             awk-5016  [000] d...  7508.140856: myretprobe: (do_syscall_64+0x6e/0x1a0 <- do_sys_open) arg1=0x3

             awk-5016  [000] ....  7508.140908: myprobe: (do_sys_open+0x0/0x210) dfd=0x2 filename=0x88000 flags=0xe148 mode=0xffffffff

             awk-5016  [000] d...  7508.140913: myretprobe: (do_syscall_64+0x6e/0x1a0 <- do_sys_open) arg1=0x3

             awk-5016  [000] ....  7508.140962: myprobe: (do_sys_open+0x0/0x210) dfd=0x2 filename=0x88000 flags=0xe148 mode=0xffffffff

             awk-5016  [000] d...  7508.140966: myretprobe: (do_syscall_64+0x6e/0x1a0 <- do_sys_open) arg1=0x3

             awk-5016  [000] ....  7508.141351: myprobe: (do_sys_open+0x0/0x210) dfd=0x2 filename=0x88000 flags=0x768 mode=0xffffffff

             awk-5016  [000] d...  7508.141357: myretprobe: (do_syscall_64+0x6e/0x1a0 <- do_sys_open) arg1=0x3

             awk-5016  [000] ....  7508.141451: myprobe: (do_sys_open+0x0/0x210) dfd=0x2 filename=0x8000 flags=0x0 mode=0xffffffff

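Each line records one event. In the myretprobe lines the <- symbol shows where the function returned to: do_sys_open returned to do_syscall_64+0x6e.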
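Output like this grows quickly, so a per-event summary is often easier to read. The sketch below counts trace lines per event name; count_events is an illustrative name, and the parsing assumes the default trace layout shown above, where the event name follows the timestamp field.

```shell
#!/bin/sh
# Count trace lines per event name. Reads trace text on stdin, skips the
# '#' header lines, and takes the field right after the timestamp
# (a field of the form 1234.567890:) as the event name.
count_events() {
    awk '
        /^#/ { next }
        {
            for (i = 1; i < NF; i++) {
                if ($i ~ /^[0-9]+\.[0-9]+:$/) {   # timestamp field
                    name = $(i + 1)
                    sub(/:$/, "", name)          # drop the trailing colon
                    count[name]++
                    break
                }
            }
        }
        END { for (n in count) print n, count[n] }
    ' | sort
}

# Sample input taken from the trace output above.
count_events <<'EOF'
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
            bash-4024  [001] ....  7507.712770: myprobe: (do_sys_open+0x0/0x210) dfd=0x2 filename=0x8241 flags=0x1b6 mode=0xffffffff
             awk-5016  [000] ....  7508.140821: myprobe: (do_sys_open+0x0/0x210) dfd=0x2 filename=0x88000 flags=0x1 mode=0xffffffff
             awk-5016  [000] d...  7508.140829: myretprobe: (do_syscall_64+0x6e/0x1a0 <- do_sys_open) arg1=0x3
EOF
```

On a live system the same function can be fed from the trace file: count_events < /sys/kernel/debug/tracing/trace.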

 

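Removing kprobe events

An event has to be disabled before it can be removed:

echo 0 > /sys/kernel/debug/tracing/events/kprobes/myprobe/enable

echo 0 > /sys/kernel/debug/tracing/events/kprobes/myretprobe/enable

It can then be removed with:

echo -:myprobe >> /sys/kernel/debug/tracing/kprobe_events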
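The disable and remove steps can be wrapped in one function. This is a sketch rather than a standard tool: remove_kprobe_event is a made-up name, and the optional second parameter for the tracing directory is an addition that allows a dry run against an ordinary directory.

```shell
#!/bin/sh
# Disable and then delete one kprobe event.
#   $1  event name in the default "kprobes" group
#   $2  tracing directory (defaults to the debugfs path used in this article)
remove_kprobe_event() {
    event=$1
    tracing=${2:-/sys/kernel/debug/tracing}
    enable="$tracing/events/kprobes/$event/enable"
    # An enabled event cannot be deleted, so switch it off first.
    if [ -e "$enable" ]; then
        echo 0 > "$enable" || return 1
    fi
    # Append with '>>'. A plain '>' truncates kprobe_events, which
    # clears every existing definition.
    echo "-:$event" >> "$tracing/kprobe_events"
}
```

Run as root, remove_kprobe_event myprobe followed by remove_kprobe_event myretprobe removes both events defined in this article.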

4.   Using a kernel module

The following module shows the approach:

/*
 * NOTE: This example works on x86.
 * Here's a sample kernel module showing the use of kprobes to dump a
 * stack trace and selected registers when do_fork() is called.
 *
 * For more information on theory of operation of kprobes, see
 * Documentation/kprobes.txt
 *
 * You will see the trace data in /var/log/messages and on the console
 * whenever do_fork() is invoked to create a new process.
 */

#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/kprobes.h>

/* For each probe you need to allocate a kprobe structure */
static struct kprobe kp = {
    .symbol_name    = "do_fork",
};

 

/* kprobe pre_handler: called just before the probed instruction is executed */
static int handler_pre(struct kprobe *p, struct pt_regs *regs)
{
#ifdef CONFIG_X86
    printk(KERN_INFO "pre_handler: p->addr = 0x%p, ip = %lx,"
            " flags = 0x%lx\n",
        p->addr, regs->ip, regs->flags);
#endif
#ifdef CONFIG_PPC
    printk(KERN_INFO "pre_handler: p->addr = 0x%p, nip = 0x%lx,"
            " msr = 0x%lx\n",
        p->addr, regs->nip, regs->msr);
#endif
#ifdef CONFIG_MIPS
    printk(KERN_INFO "pre_handler: p->addr = 0x%p, epc = 0x%lx,"
            " status = 0x%lx\n",
        p->addr, regs->cp0_epc, regs->cp0_status);
#endif

    /* A dump_stack() here will give a stack backtrace */
    return 0;
}

 

/* kprobe post_handler: called after the probed instruction is executed */
static void handler_post(struct kprobe *p, struct pt_regs *regs,
                unsigned long flags)
{
#ifdef CONFIG_X86
    printk(KERN_INFO "post_handler: p->addr = 0x%p, flags = 0x%lx\n",
        p->addr, regs->flags);
#endif
#ifdef CONFIG_PPC
    printk(KERN_INFO "post_handler: p->addr = 0x%p, msr = 0x%lx\n",
        p->addr, regs->msr);
#endif
#ifdef CONFIG_MIPS
    printk(KERN_INFO "post_handler: p->addr = 0x%p, status = 0x%lx\n",
        p->addr, regs->cp0_status);
#endif
}

 

/*
 * fault_handler: this is called if an exception is generated for any
 * instruction within the pre- or post-handler, or when Kprobes
 * single-steps the probed instruction.
 */
static int handler_fault(struct kprobe *p, struct pt_regs *regs, int trapnr)
{
    printk(KERN_INFO "fault_handler: p->addr = 0x%p, trap #%d\n",
        p->addr, trapnr);
    /* Return 0 because we don't handle the fault. */
    return 0;
}

 

static int __init kprobe_init(void)
{
    int ret;

    /* Attach the three handlers before registering the probe. */
    kp.pre_handler = handler_pre;
    kp.post_handler = handler_post;
    kp.fault_handler = handler_fault;

    ret = register_kprobe(&kp);
    if (ret < 0) {
        printk(KERN_INFO "register_kprobe failed, returned %d\n", ret);
        return ret;
    }
    printk(KERN_INFO "Planted kprobe at %p\n", kp.addr);
    return 0;
}

 

static void __exit kprobe_exit(void)
{
    unregister_kprobe(&kp);
    printk(KERN_INFO "kprobe at %p unregistered\n", kp.addr);
}

module_init(kprobe_init)
module_exit(kprobe_exit)
MODULE_LICENSE("GPL");

Add a Makefile like the following, assuming the module source is saved as pr.c (recipe lines must start with a tab):

obj-m := pr.o

KDIR := /lib/modules/$(shell uname -r)/build
PWD := $(shell pwd)

default:
	$(MAKE) -C $(KDIR) M=$(PWD) modules

clean:
	rm -rf *.o .*.cmd *.ko *.mod.c .tmp_versions

After building and loading pr.ko (for example with insmod pr.ko), the output of the handlers can be viewed with the dmesg command.

 

5.   References

https://blog.csdn.net/luckyapple1028/article/details/52972315

Documentation/kprobes.txt

Documentation/trace/ftrace.txt

 

 

 
