recover.panic.defer.2021.03.03

Defer, Panic, and Recover

In Go, how are recover and panic related?

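Let's start with a basic example: main starts a goroutine, and that goroutine calls panic. The program is terminated, which raises a question: why does a panic in another goroutine bring down the main goroutine as well? The short answer is that an unrecovered panic in any goroutine terminates the whole process, not just the goroutine that panicked.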

func main() {
	go func() {
		panic("call panic")
	}()

	for{}
}

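To deal with this, we bring in recover. The code below is deliberately wrong. What happens when it runs? Does recover stop the panic?

The program is still terminated and recover has no effect, because recover is not called inside a deferred function. recover only intercepts a panic when it is called from a deferred function, and only in the same goroutine as the panic.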

func main() {
	go func() {
	    
		// added code
		if r := recover(); r != nil {
			fmt.Println(r)
		}
		
		panic("call panic")
	}()

	for{}
}

The correct version follows. The Go blog post "Defer, Panic, and Recover" covers this in detail.

func main() {
	go func() {
		defer func() {
			if r := recover(); r != nil {
				fmt.Println(r)
			}
		}()

		panic("call panic")
	}()

	fmt.Println("come on")
}
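One caveat about the snippet above: main does not wait for the goroutine, so the program may exit before the panic ever happens. The following is a minimal runnable sketch of the same pattern that waits with a sync.WaitGroup. The function name runAndRecover is made up for this illustration and is not part of the original example.

```go
package main

import (
	"fmt"
	"sync"
)

// runAndRecover (hypothetical name) starts a goroutine that panics,
// recovers inside that same goroutine, and returns what recover reported.
func runAndRecover() interface{} {
	var (
		wg        sync.WaitGroup
		recovered interface{}
	)
	wg.Add(1)
	go func() {
		defer wg.Done() // registered first, so it runs last
		defer func() {
			// recover is called directly by a deferred function,
			// in the goroutine that panics.
			recovered = recover()
		}()
		panic("call panic")
	}()
	wg.Wait() // without this, main could return before the goroutine runs
	return recovered
}

func main() {
	fmt.Println("recovered:", runAndRecover())
	fmt.Println("come on")
}
```

Because the deferred recover runs before wg.Done, the value is safely visible to the caller after wg.Wait returns.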

How Panic and Recover Are Connected

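During a panic, the argument passed to panic becomes the return value of recover.

The example below declares a struct type named inner. The argument passed to panic is an inner value whose Msg field is "Thank". Because recover returns interface{}, we apply a type assertion to its result and assert it directly to the inner value type.

A slice index out of range is a panic we often run into at work. For this kind of panic, Go passes a boundsError from the runtime package (its doc comment reads: "A boundsError represents an indexing or slicing operation gone wrong.").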

type inner struct {
	Msg string
}

func main() {

	defer func() {
		if r := recover(); r != nil {
			fmt.Print(r.(inner))
		}
	}()

	panic(inner{Msg: "Thank"})
}
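The out-of-range case mentioned above can be handled the same way. boundsError is unexported, so the sketch below asserts the recovered value to the exported runtime.Error interface instead. The helper name indexAt is an assumption made for this illustration.

```go
package main

import (
	"fmt"
	"runtime"
)

// indexAt (hypothetical helper) returns s[i] and converts an
// out-of-range panic into an ordinary error value.
func indexAt(s []int, i int) (v int, err error) {
	defer func() {
		if r := recover(); r != nil {
			// Run-time panics such as index out of range carry a
			// value that implements runtime.Error.
			if re, ok := r.(runtime.Error); ok {
				err = re
				return
			}
			panic(r) // not a run-time error: panic again with the same value
		}
	}()
	return s[i], nil
}

func main() {
	_, err := indexAt([]int{1, 2, 3}, 5)
	fmt.Println(err)
}
```

Using the two-value form of the type assertion avoids a second panic when the recovered value has a different type, which the single-value form r.(inner) would trigger.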

Nested panics

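After a program panics, a deferred function that runs during that panic can itself panic again. In that case the crash output lists the messages of all three panics.

First we use no recover at all and look at the panic output. The comment at the end of the code shows that all three panics fired and that the output contains the message of each.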

func main() {
    go func() {

        // defer 1
        defer func() {

            // defer 2
            defer func() {
                panic("call panic 3")
            }()

            panic("call panic 2")
        }()

        panic("call panic 1")
    }()

    for{}
}

//output:
//panic: call panic 1
//        panic: call panic 2
//        panic: call panic 3
//
//goroutine 18 [running]:
//main.main.func1.1.1()
//        /Users/fuhui/Desktop/panic/main.go:10 +0x39

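Next we add recover to the code and watch the output. In the example above, the program triggered panic 1, 2, and 3 in turn. Now we change the code to catch panic 3. Does the program still panic?

We nest a third defer that catches panic 3. The output shows that the program still panics.

We have not looked at the implementation yet, but one thing is already clear: recovering panic 3 ends only panic 3. Control returns from defer 2 as if it had returned normally, panic 2 is still in progress, and no deferred function is left to recover it, so the program terminates abnormally. Handling only the last panic in the chain is not enough here.

What if we tweak the code and write three recover calls inside defer 3? That does not work either: the first call recovers panic 3, and the following calls return nil. defer, panic, and recover have to be matched up as a unit. You can verify this yourself.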

func main() {
    go func() {

        // defer 1
        defer func() {

            // defer 2
            defer func() {

                // defer 3
                defer func() {
                    if r := recover(); r != nil{
                        fmt.Println("recover", r)
                    }
                }()

                panic("call panic 3")
            }()

            panic("call panic 2")
        }()

        panic("call panic 1")
    }()

    for{}
}

//output:
//recover call panic 3
//panic: call panic 1
//        panic: call panic 2
//
//goroutine 18 [running]:
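For contrast with the failing example above, here is a small sketch of an arrangement that does survive a nested panic. It assumes a simpler shape than the original: two panics, with the recover placed in the outermost deferred function so that it runs after the second panic has replaced the first. The function name latestPanic is made up for this illustration.

```go
package main

import "fmt"

// latestPanic (hypothetical name) panics twice, the second time inside a
// deferred call, and recovers in the deferred function registered first.
func latestPanic() (first, second interface{}) {
	defer func() {
		first = recover()  // reports the most recent panic
		second = recover() // no panic is in progress any more: nil
	}()
	defer func() {
		panic("call panic 2") // raised while panic 1 is running the defers
	}()
	panic("call panic 1")
}

func main() {
	first, second := latestPanic()
	fmt.Println(first, second) // call panic 2 <nil>
}
```

The second recover call returning nil is the same effect that makes "three recover calls in one deferred function" useless.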

Source Code

Go source version

Confirm the Go version whose source we are reading:

➜  server go version
go version go1.15.1 darwin/amd64

gopanic

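Let's look at the struct that represents a panic:

arg is the panic's argument, that is, the value we pass when calling panic. A later recover returns this value.

link is a pointer to another _panic. This tells us that within a goroutine the _panic values form a linked list. One goroutine may have several panics in flight, and the information for each of them is kept.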

// A _panic holds information about an active panic.
//
// This is marked go:notinheap because _panic values must only ever
// live on the stack.
//
// The argp and link fields are stack pointers, but don't need special
// handling during stack growth: because they are pointer-typed and
// _panic values only live on the stack, regular stack pointer
// adjustment takes care of them.
//
//go:notinheap
type _panic struct {
	argp      unsafe.Pointer // pointer to arguments of deferred call run during panic; cannot move - known to liblink
	arg       interface{}    // argument to panic
	link      *_panic        // link to earlier panic
	pc        uintptr        // where to return to in runtime if this panic is bypassed
	sp        unsafe.Pointer // where to return to in runtime if this panic is bypassed
	recovered bool           // whether this panic is over
	aborted   bool           // the panic was aborted
	goexit    bool
}
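To make the linked-list idea concrete, here is a toy model. It is not the runtime's code: panicRecord, push, and oldestFirst are names invented for this sketch, and the records are ordinary heap values, whereas real _panic values live on the stack.

```go
package main

import "fmt"

// panicRecord is a hypothetical, simplified stand-in for runtime._panic.
type panicRecord struct {
	arg  interface{}  // argument to panic
	link *panicRecord // link to earlier panic
}

// push makes a new record the head of the list, in the spirit of
// "p.link = gp._panic; gp._panic = &p" in gopanic.
func push(head *panicRecord, arg interface{}) *panicRecord {
	return &panicRecord{arg: arg, link: head}
}

// oldestFirst walks the list from the head and returns the arguments
// starting with the earliest panic.
func oldestFirst(head *panicRecord) []interface{} {
	if head == nil {
		return nil
	}
	return append(oldestFirst(head.link), head.arg)
}

func main() {
	var head *panicRecord
	for _, msg := range []string{"call panic 1", "call panic 2", "call panic 3"} {
		head = push(head, msg)
	}
	fmt.Println(head.arg)          // the newest panic sits at the head
	fmt.Println(oldestFirst(head)) // earliest first, like the crash output
}
```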

The body of gopanic is fairly long, so we annotate and analyze it directly in the comments.

// The implementation of the predeclared function panic.
func gopanic(e interface{}) {
	gp := getg()
	if gp.m.curg != gp {
		print("panic: ")
		printany(e)
		print("\n")
		throw("panic on system stack")
	}

	if gp.m.mallocing != 0 {
		print("panic: ")
		printany(e)
		print("\n")
		throw("panic during malloc")
	}
	if gp.m.preemptoff != "" {
		print("panic: ")
		printany(e)
		print("\n")
		print("preempt off reason: ")
		print(gp.m.preemptoff)
		print("\n")
		throw("panic during preemptoff")
	}
	if gp.m.locks != 0 {
		print("panic: ")
		printany(e)
		print("\n")
		throw("panic holding locks")
	}
    
	// Create the panic value and point its link field at the goroutine's current _panic list.
	// Put simply, this is a linked-list insert: the new panic becomes the head of the goroutine's panic list.
	var p _panic
	p.arg = e
	p.link = gp._panic
	gp._panic = (*_panic)(noescape(unsafe.Pointer(&p)))

	atomic.Xadd(&runningPanicDefers, 1)

	// By calculating getcallerpc/getcallersp here, we avoid scanning the
	// gopanic frame (stack scanning is slow...)
	addOneOpenDeferFrame(gp, getcallerpc(), unsafe.Pointer(getcallersp()))

	for {
	    
		// Walk gp's deferred calls one by one. We do not expand on it here, but _defer records are stored in a linked list just like _panic records.
		d := gp._defer
		if d == nil {
			break
		}

		// If defer was started by earlier panic or Goexit (and, since we're back here, that triggered a new panic),
		// take defer off list. An earlier panic will not continue running, but we will make sure below that an
		// earlier Goexit does continue running.
		if d.started {
			if d._panic != nil {
				d._panic.aborted = true
			}
			d._panic = nil
			if !d.openDefer {
				// For open-coded defers, we need to process the
				// defer again, in case there are any other defers
				// to call in the frame (not including the defer
				// call that caused the panic).
				d.fn = nil
				gp._defer = d.link
				freedefer(d)
				continue
			}
		}

		// Mark defer as started, but keep on list, so that traceback
		// can find and update the defer's argument frame if stack growth
		// or a garbage collection happens before reflectcall starts executing d.fn.
		d.started = true

		// Record the panic that is running the defer.
		// If there is a new panic during the deferred call, that panic
		// will find d in the list and will mark d._panic (this panic) aborted.
		d._panic = (*_panic)(noescape(unsafe.Pointer(&p)))

		done := true
		if d.openDefer {
			done = runOpenDeferFrame(gp, d)
			if done && !d._panic.recovered {
				addOneOpenDeferFrame(gp, 0, nil)
			}
		} else {
			p.argp = unsafe.Pointer(getargp(0))
			reflectcall(nil, unsafe.Pointer(d.fn), deferArgs(d), uint32(d.siz), uint32(d.siz))
		}
		p.argp = nil

		// reflectcall did not panic. Remove d.
		if gp._defer != d {
			throw("bad defer entry in panic")
		}
		d._panic = nil

		// trigger shrinkage to test stack copy. See stack_test.go:TestStackPanic
		//GC()

		pc := d.pc
		sp := unsafe.Pointer(d.sp) // must be pointer so it gets adjusted during stack copy
		if done {
			d.fn = nil
			gp._defer = d.link
			freedefer(d)
		}
		if p.recovered {
			gp._panic = p.link
			if gp._panic != nil && gp._panic.goexit && gp._panic.aborted {
				// A normal recover would bypass/abort the Goexit.  Instead,
				// we return to the processing loop of the Goexit.
				gp.sigcode0 = uintptr(gp._panic.sp)
				gp.sigcode1 = uintptr(gp._panic.pc)
				mcall(recovery)
				throw("bypassed recovery failed") // mcall should not return
			}
			atomic.Xadd(&runningPanicDefers, -1)

			if done {
				// Remove any remaining non-started, open-coded
				// defer entries after a recover, since the
				// corresponding defers will be executed normally
				// (inline). Any such entry will become stale once
				// we run the corresponding defers inline and exit
				// the associated stack frame.
				d := gp._defer
				var prev *_defer
				for d != nil {
					if d.openDefer {
						if d.started {
							// This defer is started but we
							// are in the middle of a
							// defer-panic-recover inside of
							// it, so don't remove it or any
							// further defer entries
							break
						}
						if prev == nil {
							gp._defer = d.link
						} else {
							prev.link = d.link
						}
						newd := d.link
						freedefer(d)
						d = newd
					} else {
						prev = d
						d = d.link
					}
				}
			}

			gp._panic = p.link
			// Aborted panics are marked but remain on the g.panic list.
			// Remove them from the list.
			for gp._panic != nil && gp._panic.aborted {
				gp._panic = gp._panic.link
			}
			if gp._panic == nil { // must be done with signal
				gp.sig = 0
			}
			// Pass information about recovering frame to recovery.
			gp.sigcode0 = uintptr(sp)
			gp.sigcode1 = pc
			mcall(recovery)
			throw("recovery failed") // mcall should not return
		}
	}

	// ran out of deferred calls - old-school panic now
	// Because it is unsafe to call arbitrary user code after freezing
	// the world, we call preprintpanics to invoke all necessary Error
	// and String methods to prepare the panic strings before startpanic.
	preprintpanics(gp._panic)

	fatalpanic(gp._panic) // should not return
	*(*int)(nil) = 0      // not reached
}
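The loop above is hard to observe directly, but its visible effect is easy to check from user code: deferred calls run newest first, and once one of them recovers, the function returns normally and the remaining deferred calls still run. The sketch below only demonstrates that behavior; the function name run and the recorded strings are made up for this illustration.

```go
package main

import "fmt"

// run (hypothetical name) panics with three deferred calls pending and
// records the order in which they execute.
func run() (order []string) {
	defer func() { order = append(order, "defer 1") }()
	defer func() {
		if r := recover(); r != nil {
			order = append(order, "recovered "+fmt.Sprint(r))
		}
	}()
	defer func() { order = append(order, "defer 3") }()

	panic("boom")
}

func main() {
	fmt.Println(run()) // [defer 3 recovered boom defer 1]
}
```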

gorecover

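In the source, getg() returns the current goroutine, and the next line fetches that goroutine's current panic. Then comes the if statement: when its conditions hold, the panic's recovered field is set to true, which marks it as handled, and the panic's argument is returned. When the conditions do not hold, no panic has been caught and nil is returned.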

// The implementation of the predeclared function recover.
// Cannot split the stack because it needs to reliably
// find the stack segment of its caller.
//
// TODO(rsc): Once we commit to CopyStackAlways,
// this doesn't need to be nosplit.
//go:nosplit
func gorecover(argp uintptr) interface{} {
	// Must be in a function running as part of a deferred call during the panic.
	// Must be called from the topmost function of the call
	// (the function used in the defer statement).
	// p.argp is the argument pointer of that topmost deferred function call.
	// Compare against argp reported by caller.
	// If they match, the caller is the one who can recover.
	gp := getg()
	p := gp._panic
	if p != nil && !p.goexit && !p.recovered && argp == uintptr(p.argp) {
		p.recovered = true
		return p.arg
	}
	return nil
}

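Note: recover only works one level below the deferred call. It must be called directly by the deferred function itself, which is what the argp comparison in gorecover checks. If the deferred function calls a helper and the helper calls recover, there is an extra stack frame in between and recover returns nil.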
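The frame requirement in the note can be checked with a short sketch. The names helper and tryRecover are invented for this illustration.

```go
package main

import "fmt"

// helper calls recover, but helper itself is not the deferred function.
func helper() interface{} {
	return recover()
}

// tryRecover (hypothetical name) reports what an indirect and a direct
// recover call return while the same panic is in progress.
func tryRecover() (indirect, direct interface{}) {
	defer func() {
		direct = recover() // called directly by the deferred function
	}()
	defer func() {
		indirect = helper() // one frame too deep, so recover returns nil
	}()
	panic("boom")
}

func main() {
	indirect, direct := tryRecover()
	fmt.Println("indirect:", indirect) // indirect: <nil>
	fmt.Println("direct:", direct)     // direct: boom
}
```

Writing defer helper() instead would work, because helper would then be the deferred function itself.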
