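Investigating a JVM Metaspace OOM Caused by retransformClasses

Preface

This article analyzes a problem in Arthas 3.3.0 through 3.4.1 where running trace on a large method can cause a JVM Metaspace OOM. By examining the enhanced bytecode that the trace command generates and by debugging how the JVM handles retransformClasses, it identifies the cause of the Metaspace OOM and then presents a fix.

Problem Description

The test case is a large method, demo.BigMethod250.test(), which contains roughly 500 method calls and more than 250 string constants.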

package demo;

public class BigMethod250
{
   public static void test()
   {
      final String someString = "Dustin";

      if (someString == null || someString.isEmpty())
      {
         print("The String is null or empty.");
      }
      else if (someString.equals("a0"))
      {
         print("You found me!");
      }
      else if (someString.equals("a1"))
      {
         print("You found me!");
      }
      else if (someString.equals("a2"))
      {
         print("You found me!");
      }
      ...
      else if (someString.equals("a249"))
      {
         print("You found me!");
      }
      else if (someString.equals("a250"))
      {
         print("You found me!");
      }
      else
      {
         print("No matching string found.");
      }
   }
}
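
Writing 250 branches by hand is tedious, so a small generator helps when reproducing the test. The following is a minimal sketch that emits source in the shape of the listing above. The class name BigMethodGenerator, the generate method and the body of print are hypothetical; the original article does not show how print is implemented.

```java
final class BigMethodGenerator {
    /** Builds the source text of demo.BigMethod250 with branches "a0" .. "a<lastIndex>". */
    static String generate(int lastIndex) {
        StringBuilder sb = new StringBuilder();
        sb.append("package demo;\n\n");
        sb.append("public class BigMethod250 {\n");
        sb.append("    public static void test() {\n");
        sb.append("        final String someString = \"Dustin\";\n");
        sb.append("        if (someString == null || someString.isEmpty()) {\n");
        sb.append("            print(\"The String is null or empty.\");\n");
        sb.append("        }");
        for (int i = 0; i <= lastIndex; i++) {
            sb.append(" else if (someString.equals(\"a").append(i).append("\")) {\n");
            sb.append("            print(\"You found me!\");\n");
            sb.append("        }");
        }
        sb.append(" else {\n");
        sb.append("            print(\"No matching string found.\");\n");
        sb.append("        }\n");
        sb.append("    }\n\n");
        // Assumed helper: the article does not show print(), so this body is a guess.
        sb.append("    private static void print(String s) {\n");
        sb.append("        System.out.println(s);\n");
        sb.append("    }\n");
        sb.append("}\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints the generated source; redirect it to demo/BigMethod250.java to compile it.
        System.out.print(generate(250));
    }
}
```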

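The JDK version is 11.0.7. The comparison covers Arthas 3.2.0 and 3.4.1, and the command under test is: trace demo.BigMethod250 test

1) Metaspace change when running the command with Arthas 3.2.0

After attaching and starting Arthas 3.2.0, committed Metaspace is 20.8 MB: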

% jcmd arthas-demo VM.metaspace
14780:

Total Usage ( 134 loaders):

  Non-Class:  474 chunks,     18.17 MB capacity,    17.91 MB ( 99%) used,   228.27 KB (  1%) free,   488 bytes ( <1%) waste,    29.62 KB ( <1%) overhead, deallocated: 153 blocks with 80.33 KB
      Class:  199 chunks,      2.28 MB capacity,     2.19 MB ( 96%) used,    76.82 KB (  3%) free,    16 bytes ( <1%) waste,    12.44 KB ( <1%) overhead, deallocated: 64 blocks with 30.07 KB
       Both:  673 chunks,     20.44 MB capacity,    20.10 MB ( 98%) used,   305.09 KB (  1%) free,   504 bytes ( <1%) waste,    42.06 KB ( <1%) overhead, deallocated: 217 blocks with 110.40 KB

Virtual space:
  Non-class space:       20.00 MB reserved,      18.42 MB ( 92%) committed
      Class space:     1016.00 MB reserved,       2.38 MB ( <1%) committed
             Both:        1.01 GB reserved,      20.80 MB (  2%) committed
.......

The trace command succeeds, and committed Metaspace is 21.5 MB, an increase of less than 1 MB:

[arthas@14780]$ trace demo.BigMethod250 test
Press Q or Ctrl+C to abort.
Affect(class-cnt:1 , method-cnt:1) cost in 129 ms.
% jcmd arthas-demo VM.metaspace
14780:

Total Usage ( 134 loaders):

  Non-Class:  486 chunks,     18.92 MB capacity,    18.67 MB ( 99%) used,   225.09 KB (  1%) free,   488 bytes ( <1%) waste,    30.38 KB ( <1%) overhead, deallocated: 338 blocks with 83.47 KB
      Class:  202 chunks,      2.34 MB capacity,     2.24 MB ( 96%) used,    87.20 KB (  4%) free,    16 bytes ( <1%) waste,    12.62 KB ( <1%) overhead, deallocated: 67 blocks with 31.26 KB
       Both:  688 chunks,     21.26 MB capacity,    20.91 MB ( 98%) used,   312.29 KB (  1%) free,   504 bytes ( <1%) waste,    43.00 KB ( <1%) overhead, deallocated: 405 blocks with 114.73 KB

Virtual space:
  Non-class space:       20.00 MB reserved,      19.17 MB ( 96%) committed
      Class space:     1016.00 MB reserved,       2.38 MB ( <1%) committed
             Both:        1.01 GB reserved,      21.55 MB (  2%) committed
......
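
The committed figure that jcmd reports can also be sampled from inside the JVM through the standard java.lang.management API, which is convenient for logging the value before and after a trace. A minimal sketch: the class name MetaspaceProbe is hypothetical, and the pool name "Metaspace" is the one HotSpot uses, so other JVMs may not expose a pool with that name.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

final class MetaspaceProbe {
    /** Returns committed Metaspace bytes, or -1 if no usable pool named "Metaspace" exists. */
    static long committedBytes() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if ("Metaspace".equals(pool.getName())) {
                MemoryUsage usage = pool.getUsage();
                // getUsage() may return null if the pool is no longer valid.
                return usage == null ? -1L : usage.getCommitted();
            }
        }
        return -1L;
    }

    public static void main(String[] args) {
        long bytes = committedBytes();
        System.out.printf("Metaspace committed: %.2f MB%n", bytes / (1024.0 * 1024.0));
    }
}
```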

2) Metaspace change when running the command with Arthas 3.4.1

After attaching Arthas and before running the trace command, committed Metaspace is 21 MB:

% jcmd arthas-demo VM.metaspace
15090:

Total Usage ( 134 loaders):

  Non-Class:  476 chunks,     18.42 MB capacity,    18.16 MB ( 99%) used,   226.62 KB (  1%) free,   504 bytes ( <1%) waste,    29.75 KB ( <1%) overhead, deallocated: 473 blocks with 136.73 KB
      Class:  201 chunks,      2.34 MB capacity,     2.26 MB ( 97%) used,    69.33 KB (  3%) free,     0 bytes (  0%) waste,    12.56 KB ( <1%) overhead, deallocated: 65 blocks with 28.10 KB
       Both:  677 chunks,     20.75 MB capacity,    20.42 MB ( 98%) used,   295.95 KB (  1%) free,   504 bytes ( <1%) waste,    42.31 KB ( <1%) overhead, deallocated: 538 blocks with 164.84 KB

Virtual space:
  Non-class space:       20.00 MB reserved,      18.67 MB ( 93%) committed
      Class space:      492.00 MB reserved,       2.38 MB ( <1%) committed
             Both:      512.00 MB reserved,      21.05 MB (  4%) committed
......

The trace command fails, and committed Metaspace grows to 462 MB, an increase of about 441 MB:

[arthas@15090]$ trace demo.BigMethod250 test
Affect(class count: 1 , method count: 1) cost in 10144 ms, listenerId: 1
Enhance error! exception: java.lang.InternalError
error happens when enhancing class: null, check arthas log: /Users/xxx/logs/arthas/arthas.log
% jcmd arthas-demo VM.metaspace
15090:

Total Usage ( 140 loaders):

  Non-Class: 1751 chunks,    449.91 MB capacity,   449.35 MB (>99%) used,   234.66 KB ( <1%) free,   229.11 KB ( <1%) waste,   109.44 KB ( <1%) overhead, deallocated: 2558 blocks with 340.38 MB
      Class:  212 chunks,      2.47 MB capacity,     2.40 MB ( 97%) used,    60.89 KB (  2%) free,     0 bytes (  0%) waste,    13.25 KB ( <1%) overhead, deallocated: 70 blocks with 29.30 KB
       Both: 1963 chunks,    452.38 MB capacity,   451.74 MB (>99%) used,   295.55 KB ( <1%) free,   229.11 KB ( <1%) waste,   122.69 KB ( <1%) overhead, deallocated: 2628 blocks with 340.41 MB

Virtual space:
  Non-class space:      558.00 MB reserved,     459.63 MB ( 82%) committed
      Class space:      492.00 MB reserved,       2.50 MB ( <1%) committed
             Both:        1.03 GB reserved,     462.13 MB ( 44%) committed

......

Waste (percentages refer to total committed size 462.13 MB):
              Committed unused:    198.00 KB ( <1%)
        Waste in chunks in use:    229.11 KB ( <1%)
         Free in chunks in use:    295.55 KB ( <1%)
     Overhead in chunks in use:    122.69 KB ( <1%)
                In free chunks:      9.56 MB (  2%)
Deallocated from chunks in use:    340.41 MB ( 74%) (2628 blocks)
                       -total-:    350.79 MB ( 76%)

After the trace command is run a second time, committed Metaspace hits the configured MaxMetaspaceSize of 500 MB and an OOM error occurs:

% jcmd arthas-demo VM.metaspace
15090:

Total Usage ( 99 loaders):

  Non-Class: 1717 chunks,    487.71 MB capacity,   487.17 MB (>99%) used,   198.55 KB ( <1%) free,   251.93 KB ( <1%) waste,   107.31 KB ( <1%) overhead, deallocated: 2872 blocks with 465.54 MB
      Class:  171 chunks,      2.43 MB capacity,     2.38 MB ( 98%) used,    41.80 KB (  2%) free,     0 bytes (  0%) waste,    10.69 KB ( <1%) overhead, deallocated: 71 blocks with 29.81 KB
       Both: 1888 chunks,    490.14 MB capacity,   489.55 MB (>99%) used,   240.34 KB ( <1%) free,   251.93 KB ( <1%) waste,   118.00 KB ( <1%) overhead, deallocated: 2943 blocks with 465.56 MB

Virtual space:
  Non-class space:      596.00 MB reserved,     497.50 MB ( 83%) committed
      Class space:      492.00 MB reserved,       2.50 MB ( <1%) committed
             Both:        1.06 GB reserved,     500.00 MB ( 46%) committed

......

Waste (percentages refer to total committed size 500.00 MB):
              Committed unused:     42.00 KB ( <1%)
        Waste in chunks in use:    251.93 KB ( <1%)
         Free in chunks in use:    240.34 KB ( <1%)
     Overhead in chunks in use:    118.00 KB ( <1%)
                In free chunks:      9.82 MB (  2%)
Deallocated from chunks in use:    465.56 MB ( 93%) (2943 blocks)
                       -total-:    476.02 MB ( 95%)


MaxMetaspaceSize: 500.00 MB
InitialBootClassLoaderMetaspaceSize: 4.00 MB
UseCompressedClassPointers: true
CompressedClassSpaceSize: 492.00 MB
Exception in thread "as-shutdown-hooker" java.lang.OutOfMemoryError: Metaspace
    at java.base/java.lang.ClassLoader.defineClass1(Native Method)
    at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1016)
    at java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:174)
    at java.base/java.net.URLClassLoader.defineClass(URLClassLoader.java:550)
    at java.base/java.net.URLClassLoader$1.run(URLClassLoader.java:458)
    at java.base/java.net.URLClassLoader$1.run(URLClassLoader.java:452)

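Bytecode Analysis

The class files were dumped with javap -verbose and compared:

class file             file size (bytes)    constants                                               StackMapTable
original class         10035 (9.8 KB)       555                                                     254 frames total (248 with frame_type = 17)
3.2.0 enhanced class   69367 (67.7 KB)      606 (first 555 identical to the original class)         760 frames total (including frame_type = 251 and frame_type = 247 frames; count given as 253)
3.4.1 enhanced class   1109038 (1083 KB)    1598 (order differs greatly from the original class)    1522 frames total (1269 with frame_type = 255)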

The main StackMapTable frame types seen here:

frame_type description
17 same
247 same_locals_1_stack_item_frame_extended
251 same_frame_extended
255 full_frame
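
The frame_type byte encodes the frame kind by value range, as defined for the StackMapTable attribute in the JVM specification (section 4.7.4). The sketch below maps a frame_type value to its kind; the class name StackMapFrameKind is hypothetical and only serves the illustration.

```java
final class StackMapFrameKind {
    /** Maps a StackMapTable frame_type byte (0..255) to the frame kind it denotes. */
    static String of(int frameType) {
        if (frameType < 0 || frameType > 255) {
            throw new IllegalArgumentException("frame_type out of range: " + frameType);
        }
        if (frameType <= 63) return "same_frame";                      // offset_delta is the tag itself
        if (frameType <= 127) return "same_locals_1_stack_item_frame"; // offset_delta is tag - 64
        if (frameType <= 246) return "reserved";
        if (frameType == 247) return "same_locals_1_stack_item_frame_extended";
        if (frameType <= 250) return "chop_frame";                     // drops 251 - tag locals
        if (frameType == 251) return "same_frame_extended";
        if (frameType <= 254) return "append_frame";                   // adds tag - 251 locals
        return "full_frame";                                           // lists every local and stack entry
    }

    public static void main(String[] args) {
        for (int tag : new int[] {17, 247, 251, 255}) {
            System.out.println(tag + " -> " + of(tag));
        }
    }
}
```

Only full_frame spells out the complete locals and stack arrays, which is why a class dominated by frame_type = 255 carries so much more StackMapTable data than one dominated by the compact kinds.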

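The full_frame entries in the 3.4.1 enhanced class carry noticeably more data than the frames in the other two classes and contain a great many top entries, for example: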

frame_type = 255 /* full_frame */
  offset_delta = 34
  locals = [ class java/lang/String, null, class java/lang/Class, class java/lang/String, 
    class "[Ljava/lang/Object;", top, top, top, top, top, top, top, top, top, top, top, top, top, 
    top, top, top, top, top, top, top, top, top, top, top, top, top, top, top, top, top, top, top, 
    top, top, top, top, top, top, top, top, top, top, top, top, top, top, top, top, top, top, top, 
    top, top, top, top, top, top, top, top, top, top, top, top, top, top, top, top, top, top, top, 
    top, top, top, top, null, class java/lang/Class, class java/lang/String ]
  stack = [ class java/lang/String ]

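Comparing the bytecode shows that the two biggest differences are the constant pool and the StackMapTable. As an experiment, the 3.4.1 code was modified to drop the StackMapTable from the enhanced class. This only slowed the Metaspace growth, which remained far higher than with 3.2.0.

Exploring JVM retransformClasses

Comparing bytecode alone offered no obvious entry point, so the next step was to go back to the JVM itself and debug the retransformClasses process, in the hope of finding the branch where the two Arthas versions diverge.

With 3.4.1 the trace command requests a large amount of Metaspace, so a conditional breakpoint (chunk_word_size >= 8000) was set in the allocation method SpaceManager::get_new_chunk. The following call stack hit it many times: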

metaspace::SpaceManager::get_new_chunk(unsigned long) spaceManager.cpp:387
metaspace::SpaceManager::grow_and_allocate(unsigned long) spaceManager.cpp:197
metaspace::SpaceManager::allocate_work(unsigned long) spaceManager.cpp:450
metaspace::SpaceManager::allocate(unsigned long) spaceManager.cpp:421
ClassLoaderMetaspace::allocate(unsigned long, Metaspace::MetadataType) metaspace.cpp:1480
Metaspace::allocate(ClassLoaderData*, unsigned long, MetaspaceObj::Type, Thread*) metaspace.cpp:1282
MetaspaceObj::operator new(unsigned long, ClassLoaderData*, unsigned long, MetaspaceObj::Type, Thread*) allocation.cpp:83
ConstMethod::allocate(ClassLoaderData*, int, InlineTableSizes*, ConstMethod::MethodType, Thread*) constMethod.cpp:46
Method::allocate(ClassLoaderData*, int, AccessFlags, InlineTableSizes*, ConstMethod::MethodType, Thread*) method.cpp:87
Method::clone_with_new_data(methodHandle const&, unsigned char*, int, unsigned char*, int, Thread*) method.cpp:1523
Relocator::insert_space_at(int, int, unsigned char*, Thread*) relocator.cpp:159
VM_RedefineClasses::rewrite_cp_refs_in_method(methodHandle, methodHandle*, Thread*) jvmtiRedefineClasses.cpp:2133
VM_RedefineClasses::rewrite_cp_refs_in_methods(InstanceKlass*, Thread*) jvmtiRedefineClasses.cpp:2041
VM_RedefineClasses::rewrite_cp_refs(InstanceKlass*, Thread*) jvmtiRedefineClasses.cpp:1876
VM_RedefineClasses::merge_cp_and_rewrite(InstanceKlass*, InstanceKlass*, Thread*) jvmtiRedefineClasses.cpp:1834
VM_RedefineClasses::load_new_class_versions(Thread*) jvmtiRedefineClasses.cpp:1405
VM_RedefineClasses::doit_prologue() jvmtiRedefineClasses.cpp:170
VMThread::execute(VM_Operation*) vmThread.cpp:534
JvmtiEnv::RetransformClasses(int, _jclass* const*) jvmtiEnv.cpp:451
jvmti_RetransformClasses(_jvmtiEnv*, int, _jclass* const*) jvmtiEnter.cpp:3969
retransformClasses JPLISAgent.c:1183
Java_sun_instrument_InstrumentationImpl_retransformClasses0 InstrumentationImplNativeMethods.c:109
......
thread_entry(JavaThread*, Thread*) jvm.cpp:3004
JavaThread::thread_main_inner() thread.cpp:2010
JavaThread::run() thread.cpp:1993
Thread::call_run() thread.cpp:395
thread_native_entry(Thread*) os_bsd.cpp:702
_pthread_start 0x00007fff6c6b6109
thread_start 0x00007fff6c6b1b8b

Widening the ldc Instruction

Walking through each method in this call stack reveals one suspicious spot:

VM_RedefineClasses::rewrite_cp_refs_in_method() -> Relocator::insert_space_at()

The relevant fragment of rewrite_cp_refs_in_method is:

// Rewrite constant pool references in the specific method. This code
// was adapted from Rewriter::rewrite_method().
void VM_RedefineClasses::rewrite_cp_refs_in_method(methodHandle method,
       methodHandle *new_method_p, TRAPS) {
    ......
    for (int bci = 0; bci < code_length; bci += bc_length) {
        address bcp = code_base + bci;
        Bytecodes::Code c = (Bytecodes::Code)(*bcp);
        ......
        switch (c) {
          case Bytecodes::_ldc:
          {
            int cp_index = *(bcp + 1);
            int new_index = find_new_index(cp_index);
    
            if (StressLdcRewrite && new_index == 0) {
              // If we are stressing ldc -> ldc_w rewriting, then we
              // always need a new_index value.
              new_index = cp_index;
            }
            if (new_index != 0) {
              // the original index is mapped so we have more work to do
              if (!StressLdcRewrite && new_index <= max_jubyte) {
                // The new value can still use ldc instead of ldc_w
                // unless we are trying to stress ldc -> ldc_w rewriting
                *(bcp + 1) = new_index;
              } else {
                // the new value needs ldc_w instead of ldc
                u_char inst_buffer[4]; // max instruction size is 4 bytes
                bcp = (address)inst_buffer;
                // construct new instruction sequence
                *bcp = Bytecodes::_ldc_w;
                bcp++;
                // Rewriter::rewrite_method() does not rewrite ldc -> ldc_w.
                // See comment below for difference between put_Java_u2()
                // and put_native_u2().
                Bytes::put_Java_u2(bcp, new_index);
            
                Relocator rc(method, NULL /* no RelocatorListener needed */);
                methodHandle m;
                {
                  PauseNoSafepointVerifier pnsv(&nsv);
            
                  // ldc is 2 bytes and ldc_w is 3 bytes
                  m = rc.insert_space_at(bci, 3, inst_buffer, CHECK);
                }
            
                // return the new method so that the caller can update
                // the containing class
                *new_method_p = method = m;
                // switch our bytecode processing loop from the old method
                // to the new method
                code_base = method->code_base();
                code_length = method->code_size();
                bcp = code_base + bci;
                ......

Here max_jubyte is a constant with the value 0xFF:

const jubyte  max_jubyte  = (jubyte)-1;  // 0xFF       largest jubyte

The method reached in the call stack is rc.insert_space_at():

      // ldc is 2 bytes and ldc_w is 3 bytes
      m = rc.insert_space_at(bci, 3, inst_buffer, CHECK);

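Going by the code comments, rewrite_cp_refs_in_method rewrites a method's constant pool references. Its main logic is as follows (unrelated steps omitted):

1) Iterate over the method's bytecode and decode each opcode (Bytecodes::Code c); the opcode determines the length of the operand data that follows it.
2) If the ldc operand cp_index has a mapped value new_index (merging the new class's constant pool maps identical entries onto the entries of the old constant pool) and new_index is greater than max_jubyte (0xFF), the ldc instruction must be widened to ldc_w.
3) The ldc operand is a single byte while the ldc_w operand is 2 bytes, so the instruction lengths differ and rewriting requires inserting space into the bytecode (rc.insert_space_at).
4) After modifying the bytecode, Relocator::insert_space_at() copies the current Java method's bytecode and other data (such as the parameter table, exception table and StackMapTable) every time.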
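
The steps above can be sketched as a toy rewriter over a raw code array. This is an illustration under simplifying assumptions, not HotSpot's implementation: the class name LdcWidenSketch is hypothetical, every opcode other than ldc (0x12) and ldc_w (0x13) is treated as one byte long, and branch offsets, exception tables and stack maps (which the real Relocator also has to fix up) are ignored. What it does preserve is the key behavior: every widening allocates and fills a fresh copy of the whole array.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

final class LdcWidenSketch {
    static final int LDC = 0x12;    // ldc: opcode + 1-byte constant pool index
    static final int LDC_W = 0x13;  // ldc_w: opcode + 2-byte constant pool index

    /** Rewrites ldc operands through indexMap, widening to ldc_w when the new index exceeds 0xFF. */
    static byte[] rewrite(byte[] original, Map<Integer, Integer> indexMap) {
        byte[] code = original.clone();
        int bci = 0;
        while (bci < code.length) {
            int opcode = code[bci] & 0xFF;
            if (opcode == LDC_W) { bci += 3; continue; }
            if (opcode != LDC) { bci += 1; continue; } // toy assumption: 1-byte instruction
            Integer mapped = indexMap.get(code[bci + 1] & 0xFF);
            if (mapped == null) {
                bci += 2;                              // index not remapped, nothing to do
            } else if (mapped <= 0xFF) {
                code[bci + 1] = (byte) mapped.intValue(); // still fits in one byte, patch in place
                bci += 2;
            } else {
                int newIndex = mapped;
                byte[] wider = new byte[code.length + 1]; // one full copy per widening
                System.arraycopy(code, 0, wider, 0, bci);
                wider[bci] = (byte) LDC_W;
                wider[bci + 1] = (byte) (newIndex >>> 8); // big-endian u2 operand
                wider[bci + 2] = (byte) newIndex;
                System.arraycopy(code, bci + 2, wider, bci + 3, code.length - (bci + 2));
                code = wider;
                bci += 3;
            }
        }
        return code;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> indexMap = new HashMap<>();
        indexMap.put(2, 262); // forces ldc -> ldc_w
        indexMap.put(5, 7);   // patched in place
        byte[] before = {0x12, 2, 0x00, 0x12, 2, 0x12, 5, (byte) 0xb1};
        byte[] after = rewrite(before, indexMap);
        System.out.println(Arrays.toString(after));
        System.out.println("full copies made: " + (after.length - before.length));
    }
}
```

Because each widening adds exactly one byte, the length difference between the input and output arrays equals the number of full copies made.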

The ldc and ldc_w instructions:

ldc
Push a single word constant.

ldc_w
Push a single word constant. (16-bit ref in constant pool)

Cloning the Method Data

Next, the code of Relocator::insert_space_at():

// size is the new size of the instruction at bci. Hence, if size is less than the current
// instruction size, we will shrink the code.
methodHandle Relocator::insert_space_at(int bci, int size, u_char inst_buffer[], TRAPS) {
  _changes = new GrowableArray<ChangeItem*> (10);
  _changes->push(new ChangeWiden(bci, size, inst_buffer));

  ......

  if (!handle_code_changes()) return methodHandle();

  // Construct the new method
  methodHandle new_method = Method::clone_with_new_data(method(),
                              code_array(), code_length(),
                              compressed_line_number_table(),
                              compressed_line_number_table_size(),
                              CHECK_(methodHandle()));

  // Deallocate the old Method* from metadata
  ClassLoaderData* loader_data = method()->method_holder()->class_loader_data();
  loader_data->add_to_deallocate_list(method()());

  set_method(new_method);

  ......

  return new_method;
}

The code of Method::clone_with_new_data:

methodHandle Method::clone_with_new_data(const methodHandle& m, 
        u_char* new_code, 
        int new_code_length, 
        u_char* new_compressed_linenumber_table, 
        int new_compressed_linenumber_size, 
        TRAPS) {
  // Code below does not work for native methods - they should never get rewritten anyway
  assert(!m->is_native(), "cannot rewrite native methods");
  // Allocate new Method*
  AccessFlags flags = m->access_flags();

  ConstMethod* cm = m->constMethod();
  int checked_exceptions_len = cm->checked_exceptions_length();
  int localvariable_len = cm->localvariable_table_length();
  int exception_table_len = cm->exception_table_length();
  int method_parameters_len = cm->method_parameters_length();
  int method_annotations_len = cm->method_annotations_length();
  int parameter_annotations_len = cm->parameter_annotations_length();
  int type_annotations_len = cm->type_annotations_length();
  int default_annotations_len = cm->default_annotations_length();
  ...
  ClassLoaderData* loader_data = m->method_holder()->class_loader_data();
  Method* newm_oop = Method::allocate(loader_data,
                                      new_code_length,
                                      flags,
                                      &sizes,
                                      m->method_type(),
                                      CHECK_(methodHandle()));
  methodHandle newm (THREAD, newm_oop);

  // Create a shallow copy of Method part, but be careful to preserve the new ConstMethod*
  ConstMethod* newcm = newm->constMethod();
  int new_const_method_size = newm->constMethod()->size();

  // This works because the source and target are both Methods. Some compilers
  // (e.g., clang) complain that the target vtable pointer will be stomped,
  // so cast away newm()'s and m()'s Methodness.
  memcpy((void*)newm(), (void*)m(), sizeof(Method));

  // Create shallow copy of ConstMethod.
  memcpy(newcm, m->constMethod(), sizeof(ConstMethod));

  // Reset correct method/const method, method size, and parameter info
  newm->set_constMethod(newcm);
  newm->constMethod()->set_code_size(new_code_length);
  newm->constMethod()->set_constMethod_size(new_const_method_size);

  // Copy new byte codes
  memcpy(newm->code_base(), new_code, new_code_length);
  // Copy line number table
  if (new_compressed_linenumber_size > 0) {
    memcpy(newm->compressed_linenumber_table(),
           new_compressed_linenumber_table,
           new_compressed_linenumber_size);
  }
  // Copy method_parameters
  if (method_parameters_len > 0) {
    memcpy(newm->method_parameters_start(),
           m->method_parameters_start(),
           method_parameters_len * sizeof(MethodParametersElement));
  }
  // Copy checked_exceptions
  if (checked_exceptions_len > 0) {
    memcpy(newm->checked_exceptions_start(),
           m->checked_exceptions_start(),
           checked_exceptions_len * sizeof(CheckedExceptionElement));
  }
  // Copy exception table
  if (exception_table_len > 0) {
    memcpy(newm->exception_table_start(),
           m->exception_table_start(),
           exception_table_len * sizeof(ExceptionTableElement));
  }
  // Copy local variable number table
  if (localvariable_len > 0) {
    memcpy(newm->localvariable_table_start(),
           m->localvariable_table_start(),
           localvariable_len * sizeof(LocalVariableTableElement));
  }
  // Copy stackmap table
  if (m->has_stackmap_table()) {
    int code_attribute_length = m->stackmap_data()->length();
    Array<u1>* stackmap_data =
      MetadataFactory::new_array<u1>(loader_data, code_attribute_length, 0, CHECK_(methodHandle()));
    memcpy((void*)stackmap_data->adr_at(0),
           (void*)m->stackmap_data()->adr_at(0), code_attribute_length);
    newm->set_stackmap_data(stackmap_data);
  }

  // copy annotations over to new method
  newcm->copy_annotations_from(loader_data, cm, CHECK_(methodHandle()));
  return newm;
}

The two large objects here, with the sizes observed in the debugger, are:

  • ConstMethod: 35729 bytes (34.9 KB)
  • stackmap_data: 1037645 bytes (1013 KB)

The Key Divergent Branch

Debugging shows that 3.2.0 and 3.4.1 diverge at the if (_index_map_count == 0) check in merge_cp_and_rewrite.

  • With 3.4.1, _index_map_count = 1597, so execution enters rewrite_cp_refs and the ldc instructions have to be widened.
  • With 3.2.0, _index_map_count = 0, so rewrite_cp_refs is never reached and no ldc widening is needed. This is easy to understand: if the new class's constant pool only appends entries after the original class's constant pool, no entries are remapped and no ldc index can grow past the one-byte limit.
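
The difference between an appended pool and a regenerated one can be illustrated with a toy merge. This is a simplified model, not HotSpot's merge_constant_pools: entries are plain strings compared by equality, the merged pool starts as a copy of the old pool, and an index-map entry is recorded whenever a new-pool entry ends up at a different index in the merged pool. The class name CpMergeSketch and the sample entries are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class CpMergeSketch {
    /** Returns new-pool index -> merged-pool index (1-based) for every entry whose index changes. */
    static Map<Integer, Integer> indexMap(List<String> oldPool, List<String> newPool) {
        List<String> merged = new ArrayList<>(oldPool); // the merged pool keeps old entries in place
        Map<Integer, Integer> map = new LinkedHashMap<>();
        for (int i = 0; i < newPool.size(); i++) {
            String entry = newPool.get(i);
            int j = merged.indexOf(entry);
            if (j < 0) {                                 // unknown entry: append to the merged pool
                merged.add(entry);
                j = merged.size() - 1;
            }
            if (j != i) {
                map.put(i + 1, j + 1);                   // index moved, bytecode must be rewritten
            }
        }
        return map;
    }

    public static void main(String[] args) {
        List<String> oldPool = Arrays.asList("Utf8 a", "Utf8 b", "Class demo/BigMethod250");
        // Enhanced class that keeps the old pool and appends to it.
        List<String> appended = Arrays.asList("Utf8 a", "Utf8 b", "Class demo/BigMethod250", "Utf8 SpyAPI");
        // Enhanced class whose pool was regenerated in a different order.
        List<String> regenerated = Arrays.asList("Class demo/BigMethod250", "Utf8 SpyAPI", "Utf8 a", "Utf8 b");
        System.out.println("appended:    " + indexMap(oldPool, appended));
        System.out.println("regenerated: " + indexMap(oldPool, regenerated));
    }
}
```

In this model the appended pool produces an empty map, while the regenerated pool remaps every entry, which mirrors the _index_map_count = 0 versus _index_map_count = 1597 contrast.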

The code of merge_cp_and_rewrite is:

// Merge constant pools between the_class and scratch_class and
// potentially rewrite bytecodes in scratch_class to use the merged
// constant pool.
jvmtiError VM_RedefineClasses::merge_cp_and_rewrite(
             InstanceKlass* the_class, InstanceKlass* scratch_class,
             TRAPS) {
  // worst case merged constant pool length is old and new combined
  int merge_cp_length = the_class->constants()->length()
        + scratch_class->constants()->length();

  // Constant pools are not easily reused so we allocate a new one
  // each time.
  // merge_cp is created unsafe for concurrent GC processing.  It
  // should be marked safe before discarding it. Even though
  // garbage,  if it crosses a card boundary, it may be scanned
  // in order to find the start of the first complete object on the card.
  ClassLoaderData* loader_data = the_class->class_loader_data();
  ConstantPool* merge_cp_oop =
    ConstantPool::allocate(loader_data,
                           merge_cp_length,
                           CHECK_(JVMTI_ERROR_OUT_OF_MEMORY));
  MergeCPCleaner cp_cleaner(loader_data, merge_cp_oop);

  HandleMark hm(THREAD);  // make sure handles are cleared before
                          // MergeCPCleaner clears out merge_cp_oop
  constantPoolHandle merge_cp(THREAD, merge_cp_oop);

  // Get constants() from the old class because it could have been rewritten
  // while we were at a safepoint allocating a new constant pool.
  constantPoolHandle old_cp(THREAD, the_class->constants());
  constantPoolHandle scratch_cp(THREAD, scratch_class->constants());

  // If the length changed, the class was redefined out from under us. Return
  // an error.
  if (merge_cp_length != the_class->constants()->length()
         + scratch_class->constants()->length()) {
    return JVMTI_ERROR_INTERNAL;
  }

  // Update the version number of the constant pools (may keep scratch_cp)
  merge_cp->increment_and_save_version(old_cp->version());
  scratch_cp->increment_and_save_version(old_cp->version());

  ResourceMark rm(THREAD);
  _index_map_count = 0;
  _index_map_p = new intArray(scratch_cp->length(), scratch_cp->length(), -1);

  _operands_cur_length = ConstantPool::operand_array_length(old_cp->operands());
  _operands_index_map_count = 0;
  int operands_index_map_len = ConstantPool::operand_array_length(scratch_cp->operands());
  _operands_index_map_p = new intArray(operands_index_map_len, operands_index_map_len, -1);

  // reference to the cp holder is needed for copy_operands()
  merge_cp->set_pool_holder(scratch_class);
  bool result = merge_constant_pools(old_cp, scratch_cp, &merge_cp,
                  &merge_cp_length, THREAD);
  merge_cp->set_pool_holder(NULL);

  if (!result) {
    // The merge can fail due to memory allocation failure or due
    // to robustness checks.
    return JVMTI_ERROR_INTERNAL;
  }

  // Save fields from the old_cp.
  merge_cp->copy_fields(old_cp());
  scratch_cp->copy_fields(old_cp());

  log_info(redefine, class, constantpool)("merge_cp_len=%d, index_map_len=%d", merge_cp_length, _index_map_count);

  // key branch
  if (_index_map_count == 0) {
    // there is nothing to map between the new and merged constant pools

    if (old_cp->length() == scratch_cp->length()) {
      // The old and new constant pools are the same length and the
      // index map is empty. This means that the three constant pools
      // are equivalent (but not the same). Unfortunately, the new
      // constant pool has not gone through link resolution nor have
      // the new class bytecodes gone through constant pool cache
      // rewriting so we can't use the old constant pool with the new
      // class.

      // toss the merged constant pool at return
    } else if (old_cp->length() < scratch_cp->length()) { // ** 3.2.0 reaches this branch **
      // The old constant pool has fewer entries than the new constant
      // pool and the index map is empty. This means the new constant
      // pool is a superset of the old constant pool. However, the old
      // class bytecodes have already gone through constant pool cache
      // rewriting so we can't use the new constant pool with the old
      // class.

      // toss the merged constant pool at return
    } else {
      // The old constant pool has more entries than the new constant
      // pool and the index map is empty. This means that both the old
      // and merged constant pools are supersets of the new constant
      // pool.

      // Replace the new constant pool with a shrunken copy of the
      // merged constant pool
      set_new_constant_pool(loader_data, scratch_class, merge_cp, merge_cp_length,
                            CHECK_(JVMTI_ERROR_OUT_OF_MEMORY));
      // The new constant pool replaces scratch_cp so have cleaner clean it up.
      // It can't be cleaned up while there are handles to it.
      cp_cleaner.add_scratch_cp(scratch_cp());
    }
  } else {
    if (log_is_enabled(Trace, redefine, class, constantpool)) {
      // don't want to loop unless we are tracing
      int count = 0;
      for (int i = 1; i < _index_map_p->length(); i++) {
        int value = _index_map_p->at(i);

        if (value != -1) {
          log_trace(redefine, class, constantpool)("index_map[%d]: old=%d new=%d", count, i, value);
          count++;
        }
      }
    }

    // We have entries mapped between the new and merged constant pools
    // so we have to rewrite some constant pool references.
    if (!rewrite_cp_refs(scratch_class, THREAD)) {  // ** 3.4.1 reaches this branch **
      return JVMTI_ERROR_INTERNAL;
    }

    // Replace the new constant pool with a shrunken copy of the
    // merged constant pool so now the rewritten bytecodes have
    // valid references; the previous new constant pool will get
    // GCed.
    set_new_constant_pool(loader_data, scratch_class, merge_cp, merge_cp_length,
                          CHECK_(JVMTI_ERROR_OUT_OF_MEMORY));
    // The new constant pool replaces scratch_cp so have cleaner clean it up.
    // It can't be cleaned up while there are handles to it.
    cp_cleaner.add_scratch_cp(scratch_cp());
  }

  return JVMTI_ERROR_NONE;
} // end merge_cp_and_rewrite()

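Looking back at the bytecode in this case, one constant pool mapping occurs extremely often:

  • In the original class:
  #262 = Class              #544          // demo/BigMethod250
  ......
  #544 = Utf8               demo/BigMethod250
  • In the new class:
     #1 = Utf8               demo/BigMethod250
     #2 = Class              #1           // demo/BigMethod250

In the new class this entry sits at cp_index #2, while in the original class it sits at #262. The merged pool keeps the original position, so #2 is mapped to #262, and because #262 > 0xFF every ldc that loads it must be widened to ldc_w.

ldc #2 appears 761 times in the new class, which means 761 ldc widenings. The debugging data shows that a single widening can request more than 1 MB, so several hundred widenings add up to the roughly 441 MB of extra Metaspace observed in this case.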
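
A back-of-the-envelope bound makes the scale concrete. The sketch below multiplies the two object sizes observed in the debugger (ConstMethod 35729 bytes, stackmap_data 1037645 bytes) by the 761 widenings. It is a rough upper bound under the assumption that every clone has those sizes and that no freed block is ever reused, not a prediction of the measured growth; the class name RelocationCostEstimate is hypothetical.

```java
final class RelocationCostEstimate {
    static final long CONST_METHOD_BYTES = 35_729L;  // ConstMethod size seen in the debugger
    static final long STACKMAP_BYTES = 1_037_645L;   // stackmap_data size seen in the debugger
    static final int LDC_WIDENINGS = 761;            // occurrences of ldc #2 in the new class

    /** Bytes allocated for one clone, assuming both large objects are copied in full. */
    static long bytesPerClone() {
        return CONST_METHOD_BYTES + STACKMAP_BYTES;
    }

    /** Upper bound on bytes requested if no freed block is reused between clones. */
    static long worstCaseBytes() {
        return bytesPerClone() * LDC_WIDENINGS;
    }

    public static void main(String[] args) {
        double mb = 1024.0 * 1024.0;
        System.out.printf("per clone:  %d bytes (%.2f MB)%n", bytesPerClone(), bytesPerClone() / mb);
        System.out.printf("worst case: %d bytes (%.0f MB)%n", worstCaseBytes(), worstCaseBytes() / mb);
    }
}
```

Each clone comes to 1073374 bytes, just over 1 MB, and the no-reuse bound is about 779 MB, so the growth actually measured is of the order this mechanism would predict.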

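The new class contains so many ldc #2 instructions because, when the Arthas trace command enhances the bytecode, it inserts three callbacks (atBeforeInvoke, atInvokeException and atAfterInvoke) around every method call, and each of them takes the class itself as an argument. The decompiled code looks like this: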

    if ("Dustin".equals("a0")) {
        var10000 = "You found me!";
        String var21 = "demo/BigMethod250|print|(Ljava/lang/String;)V|15";
        Class var20 = BigMethod250.class;
        Object var19 = null;
        SpyAPI.atBeforeInvoke(var20, var21, var19);
        
        try {
            print(var10000);
        } catch (Throwable var2797) {
            String var1540 = "demo/BigMethod250|print|(Ljava/lang/String;)V|17";
            Class var1539 = BigMethod250.class;
            Object var1538 = null;
            SpyAPI.atInvokeException(var1539, var1540, var1538, var2797);
            throw var2797;
        }
        
        String var780 = "demo/BigMethod250|print|(Ljava/lang/String;)V|17";
        Class var779 = BigMethod250.class;
        Object var778 = null;
        SpyAPI.atAfterInvoke(var779, var780, var778);
    } else if ("Dustin".equals("a1")) {

Statements such as Class var20 = BigMethod250.class; are decompiled from the instruction ldc #2 // class demo/BigMethod250.

Block Reuse

When widening the method's bytecode triggers a method copy, the old method is added to the pending-deallocation list (deallocate_list):

  // Deallocate the old Method* from metadata
  ClassLoaderData* loader_data = method()->method_holder()->class_loader_data();
  loader_data->add_to_deallocate_list(method()());

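However, the space held by the methods in deallocate_list cannot be reused immediately. At the next GC it is returned to the block_freelists of the classloader's SpaceManager. Blocks in block_freelists are not released directly; instead, the next allocation searches block_freelists first and reuses a block if one is large enough.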
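
The effect of GC frequency on reuse can be shown with a toy model of this two-stage recycling. It is only a sketch under simplifying assumptions: blocks are modeled as plain sizes, a reused block is consumed whole, and "GC" simply moves everything from the deallocate list to the free list. The class name BlockReuseSketch and its methods are hypothetical and do not correspond to HotSpot code.

```java
import java.util.ArrayList;
import java.util.List;

final class BlockReuseSketch {
    private final List<Long> deallocateList = new ArrayList<>(); // freed, not yet reusable
    private final List<Long> blockFreeList = new ArrayList<>();  // reusable after a GC
    private long freshBytes;                                     // bytes taken from new space

    void allocate(long size) {
        for (int i = 0; i < blockFreeList.size(); i++) {
            if (blockFreeList.get(i) >= size) {  // first fit: reuse a free block if large enough
                blockFreeList.remove(i);
                return;
            }
        }
        freshBytes += size;                      // nothing reusable, request fresh space
    }

    void deallocate(long size) {
        deallocateList.add(size);
    }

    void gc() {
        blockFreeList.addAll(deallocateList);    // only now do freed blocks become reusable
        deallocateList.clear();
    }

    /** Clones a method `clones` times, running GC every `gcEvery` clones (0 = never). */
    static long simulate(int clones, long size, int gcEvery) {
        BlockReuseSketch space = new BlockReuseSketch();
        space.allocate(size);                    // the initial method
        for (int i = 1; i <= clones; i++) {
            space.allocate(size);                // the new copy
            space.deallocate(size);              // the old copy goes to the deallocate list
            if (gcEvery > 0 && i % gcEvery == 0) {
                space.gc();
            }
        }
        return space.freshBytes;
    }

    public static void main(String[] args) {
        System.out.println("never GC:        " + simulate(100, 1, 0));
        System.out.println("GC every 10:     " + simulate(100, 1, 10));
        System.out.println("GC every clone:  " + simulate(100, 1, 1));
    }
}
```

In this model the fresh space requested is bounded by the number of clones made between two GCs, so the rarer the GC, the closer the total gets to one full allocation per clone.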

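The JVM log shows that Metaspace GC is triggered only rarely during retransformClasses, so much of the method space waiting in deallocate_list is never reused effectively and a large amount of new Metaspace has to be requested.

The following commands turn on JVM logging so that Metaspace GC activity can be observed: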

% jcmd arthas-demo VM.log what="metaspace*=info,stackmap*=info"
88451:
Command executed successfully
% jcmd arthas-demo VM.log list
88451:
Available log levels: off, trace, debug, info, warning, error
Available log decorators: time (t), utctime (utc), uptime (u), ....
Described tag sets:
 logging: Logging for the log framework itself
Log output configuration:
 #0: stdout all=warning,metaspace*=info,stackmap*=info uptime,level,tags (reconfigured)
 #1: stderr all=off uptime,level,tags

By default the JVM log goes to stdout, where it is interleaved with the demo application's own output:

[48.318s][info][gc,metaspace] GC(1) Metaspace: 20736K->20736K(524288K)
illegalArgumentCount: 25, number is: -100438, need >= 2
illegalArgumentCount: 26, number is: -154116, need >= 2
[50.278s][info][gc,metaspace] GC(3) Metaspace: 22190K->22190K(524288K)
[50.550s][info][gc,metaspace] GC(4) Metaspace: 22428K->22428K(526336K)
[50.780s][info][gc,metaspace] GC(5) Metaspace: 22457K->22457K(526336K)
[51.040s][info][gc,metaspace] GC(6) Metaspace: 34393K->34393K(540672K)
illegalArgumentCount: 27, number is: -111050, need >= 2
[51.534s][info][gc,metaspace] GC(8) Metaspace: 70774K->70774K(585728K)
[51.589s][info][gc,metaspace] GC(9) Metaspace: 76161K->76161K(593920K)
[51.643s][info][gc,metaspace] GC(10) Metaspace: 81547K->81547K(600064K)
illegalArgumentCount: 28, number is: -45880, need >= 2
[52.878s][info][gc,metaspace] GC(12) Metaspace: 149700K->149700K(684032K)
14847=3*7*7*101
[55.057s][info][gc,metaspace] GC(14) Metaspace: 263767K->263767K(825344K)
illegalArgumentCount: 29, number is: -96539, need >= 2
[58.963s][info][gc,metaspace] GC(16) Metaspace: 451906K->451906K(1060864K)
illegalArgumentCount: 30, number is: -93735, need >= 2

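The Fix

Copy the original class's constant pool

The enhanced class should copy the constant pool of the original class rather than generate a new one. This keeps the indexes of existing entries unchanged, so no mapping is produced and no ldc instruction needs to be widened.

A sketch of the approach with ASM: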

    // Parse the bytecode
    ClassReader classReader = new ClassReader(classBytes);
    ClassNode classNode = new ClassNode();
    classReader.accept(classNode, ClassReader.SKIP_FRAMES);
    
    // Enhance the class
    ......
    
    // Generate the bytecode
    int flags = ClassWriter.COMPUTE_FRAMES | ClassWriter.COMPUTE_MAXS;
    // Passing the original ClassReader to the ClassWriter constructor
    // copies the original class's constant pool automatically
    ClassWriter writer = new ClassWriter(classReader, flags);
    classNode.accept(writer);
    return writer.toByteArray();

Note: to specify a ClassLoader, see com.taobao.arthas.bytekit.asm.ClassLoaderAwareClassWriter.

The ASM ClassWriter documentation describes the constant pool behavior:

public ClassWriter(@Nullable ClassReader classReader, int flags)

Constructs a new ClassWriter object and enables optimizations for "mostly add" bytecode transformations. These optimizations are the following:

  • The constant pool and bootstrap methods from the original class are copied as is in the new class, which saves time. New constant pool entries and new bootstrap methods will be added at the end if necessary, but unused constant pool entries or bootstrap methods won't be removed.
  • Methods that are not transformed are copied as is in the new class, directly from the original class bytecode (i.e. without emitting visit events for all the method instructions), which saves a lot of time. Untransformed methods are detected by the fact that the ClassReader receives MethodVisitor objects that come from a ClassWriter (and not from any other ClassVisitor instance).

Params:

  • classReader – the ClassReader used to read the original class. It will be used to copy the entire constant pool and bootstrap methods from the original class and also to copy other fragments of original bytecode where applicable.
  • flags – option flags that can be used to modify the default behavior of this class. Must be zero or more of COMPUTE_MAXS and COMPUTE_FRAMES. These option flags do not affect methods that are copied as is in the new class. This means that neither the maximum stack size nor the stack frames will be computed for these methods.

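Other Issues

1) When the enhanced method contains many statements, the StackMapTable regenerated by ASM becomes very large and increases Metaspace consumption. In testing, the class also loaded and ran correctly without a StackMapTable. Whether the StackMapTable can safely be omitted, and whether doing so causes compatibility problems across JDK versions, has not been studied in depth.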

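    // Generate the bytecode; without ClassWriter.COMPUTE_FRAMES no StackMapTable is generated
    int flags = ClassWriter.COMPUTE_MAXS;
    // Passing the original ClassReader to the ClassWriter constructor
    // copies the original class's constant pool automatically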
    ClassWriter writer = new ClassWriter(classReader, flags);
    classNode.accept(writer);
    return writer.toByteArray();

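2) After modifying the bytecode, Relocator::insert_space_at() clones the method data every time, without accounting for the cost of a very large number of calls in extreme cases.

When Relocator widens an instruction, it first makes a copy of the method's bytecode (code_array) and modifies that copy rather than the original method's bytecode.
The new method created by Method::clone_with_new_data() copies the bytecode modified by Relocator (code_array) together with the original method's parameter table, exception table, stackmap and so on.
In this case the method's stackmap is very large (about 1 MB), so every widening wastes a large amount of memory.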

  // Construct the new method
  methodHandle new_method = Method::clone_with_new_data(method(),
                              code_array(), code_length(),
                              compressed_line_number_table(),
                              compressed_line_number_table_size(),
                              CHECK_(methodHandle()));

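Summary

This was a mess caused by the class constant pool: simple to fix, but hard to figure out.

Analyzing how the JVM handles retransformClasses gives a basic understanding of JVM class hot-swapping and a better grasp of the class constant pool. It also introduces several commands for inspecting JVM Metaspace and VM logs. For similar problems in the future, jcmd is worth exploring further; newer JDKs (15/16) offer more functionality.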
