Baoye's Debug Notes: A Low-Level Bug in Cocos2d-x (Versions Before 3.13) Causes Corrupted Spine Rendering

2023-08-05 21:36:10

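Recently I ran into quite a few thorny bugs at work. One of them was corrupted rendering of Spine skeletons: after a large number of soldiers had been deployed in battle, the screen would sometimes flicker with garbage (see the screenshot below). Bugs like this, baffling and hard to reproduce, are the most painful kind.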


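Some time ago, to speed up the loading of Spine skeletal animations, we upgraded the Spine runtime. The new runtime supports the binary .skel format, which loads more than five times faster than JSON.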

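This was a big job: every skeletal animation in the game had to be re-exported with a newer version of the Spine editor. Some artists had not put their source files under version control and had lost them, so several animations had to be rebuilt from scratch, which cost a lot of time. We keep our code under strict version control and have benefited greatly from it, but artists' source files are easy to overlook, and that is where we paid the price. After the upgrade, some skeletons that used flipping also broke, and the artists had to check them one by one, set the flip again, and re-export.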

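Besides binary format support, the new Spine runtime also optimizes rendering: it replaces the old CustomCommand with TrianglesCommand, which lets multiple skeletal animations be batched together, whereas in the old version every skeleton took at least one draw call. The new runtime's vertex shader changed as well, so the vertex shaders of our existing custom shaders had to be updated to match.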
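To make the batching point concrete, here is a minimal sketch of merging by material. `FakeTriCmd` and `countDrawCalls` are hypothetical names, and the model ignores buffer-size flushes; it only mirrors the rule in Cocos2d-x's Renderer::drawBatchedTriangles() that consecutive commands sharing a material ID end up in one glDrawElements call, while a command marked to skip batching always gets its own.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a TrianglesCommand: only the fields that
// matter for batching are modeled.
struct FakeTriCmd {
    int materialId;     // equal IDs mean same texture, shader and blend state
    bool skipBatching;  // true forces a batch of its own
};

// Counts the draw calls a renderer would issue if it merged consecutive
// commands that share a material ID (buffer-size flushes are ignored).
std::size_t countDrawCalls(const std::vector<FakeTriCmd>& cmds) {
    std::size_t drawCalls = 0;
    int prevMaterial = -1;
    for (const auto& cmd : cmds) {
        // The first command, a non-batchable one, or a material change starts a new batch.
        if (drawCalls == 0 || cmd.skipBatching || cmd.materialId != prevMaterial) {
            ++drawCalls;
        }
        // A non-batchable command must not be joined by the next one.
        prevMaterial = cmd.skipBatching ? -1 : cmd.materialId;
    }
    return drawCalls;
}
```

For example, three soldiers that share one atlas and shader produce `countDrawCalls({{7, false}, {7, false}, {7, false}}) == 1`, whereas a renderer that issues one CustomCommand per skeleton would need at least three draw calls.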

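Now let's start debugging. First I ruled out the skeletal animations themselves. On the same level, I asked a tester to deploy troops at a very high rate, but only one unit type at a time, to see whether one particular unit's rendering caused the corruption. The result: every unit type triggered the problem once enough of them were on screen, but at different points. The big treant unit caused corruption after only 6 had been deployed, while the other units triggered it far less readily.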

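So what makes the treant skeleton different from the others? The artists told me the treant uses a lot of meshes, Spine's mesh feature, which lets a 2D image deform smoothly for effects such as flowing hair or fluttering clothes.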

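Since Spine meshes seemed to be involved, could it be a version issue, i.e. a mismatch between the editor version used for export and the Spine runtime version? Following the documentation, I had the artists export the skeletons with Spine editor versions 3.3.07, 3.5.35 and 3.5.51, and tested them with the 3.5.35 and 3.5.51 runtimes. The problem occurred in every combination.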

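Next I compared the Spine rendering code with the previous version (the Spine runtime before the upgrade, i.e. the one shipped before Cocos2d-x 3.13.1). The previous version used its own batch renderer, while the new one uses TrianglesCommand. I tried switching back, but the code and data structures had changed substantially, and after forcing the old approach in, the rendering got even worse.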

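After reading Spine's rendering code, I tried skipping Spine's mesh rendering. I added a static variable for testing, set a breakpoint while the game was running, and changed the variable's value in the debugger to steer the program flow, skipping one Spine attachment type at a time. It turned out that as long as mesh rendering was skipped, no number of treants would corrupt the screen. I suspect some programmers lacking a proper programmer's spirit would close the case right here and tell the artists to remove all meshes and re-export the assets. I decided instead to work out properly why mesh rendering caused the corruption.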

 // Debug switch: -1 matches no attachment type; change it in the debugger to skip one type
 static int skiptype = -1;

 void SkeletonRenderer::draw (Renderer* renderer, const Mat4& transform, uint32_t transformFlags) {
     SkeletonBatch* batch = SkeletonBatch::getInstance();
     // Release the triangles allocated in the previous frame
     for (auto t : _curTriangles)
     {
         TrianglesMgr::getInstance()->freeTriangles(t);
     }
     _curTriangles.clear();
     _triCmds.clear();

     Color3B nodeColor = getColor();
     _skeleton->r = nodeColor.r / (float)255;
     _skeleton->g = nodeColor.g / (float)255;
     _skeleton->b = nodeColor.b / (float)255;
     _skeleton->a = getDisplayedOpacity() / (float)255;

     Color4F color;
     AttachmentVertices* attachmentVertices = nullptr;
     for (int i = 0, n = _skeleton->slotsCount; i < n; ++i) {
         spSlot* slot = _skeleton->drawOrder[i];
         if (!slot->attachment) continue;
         // Skip every attachment whose type equals skiptype
         if (slot->attachment->type == skiptype) continue;

         switch (slot->attachment->type) {
         case SP_ATTACHMENT_REGION: {
             spRegionAttachment* attachment = (spRegionAttachment*)slot->attachment;
             spRegionAttachment_computeWorldVertices(attachment, slot->bone, _worldVertices);
             attachmentVertices = getAttachmentVertices(attachment);
             color.r = attachment->r;
             color.g = attachment->g;
             color.b = attachment->b;
             color.a = attachment->a;
             break;
         }
         case SP_ATTACHMENT_MESH: {
             spMeshAttachment* attachment = (spMeshAttachment*)slot->attachment;
             spMeshAttachment_computeWorldVertices(attachment, slot, _worldVertices);
             attachmentVertices = getAttachmentVertices(attachment);
             color.r = attachment->r;
             color.g = attachment->g;
             color.b = attachment->b;
             color.a = attachment->a;
             break;
         }
         default:
             continue;
         }
         // ... triangle submission shared by both attachment types (omitted) ...
     }
     // ... rest of draw() omitted ...
 }
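Stripped of the Spine types, the debugger-controlled switch used in draw() boils down to the sketch below. The enum values and the names `g_skipType` and `drawAttachments` are illustrative stand-ins, not the spine-c definitions; the point is that one static variable, edited at a breakpoint, drops one attachment type at a time without recompiling.

```cpp
#include <cassert>
#include <vector>

// Illustrative attachment types; the real enum is spine-c's spAttachmentType.
enum AttachmentType { ATTACHMENT_REGION, ATTACHMENT_MESH, ATTACHMENT_OTHER };

// Hypothetical debug switch: -1 skips nothing. Edit it in the debugger.
static int g_skipType = -1;

// Returns how many attachments would be submitted for drawing.
int drawAttachments(const std::vector<AttachmentType>& slots) {
    int drawn = 0;
    for (AttachmentType type : slots) {
        if (type == g_skipType) continue;  // bisect by dropping one type
        ++drawn;                           // stand-in for the real vertex work
    }
    return drawn;
}
```

If the symptom disappears when `g_skipType` equals the mesh type and persists for every other value, the search is narrowed to the mesh path, which is exactly the conclusion reached above.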

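Both attachment types go through the same processing afterwards; the only difference is the vertex computation in the switch above. I read the old Spine mesh vertex computation and then the new one, and nearly spat blood: what used to be a few lines had become several hundred lines of intricate calculations with poor readability. I tried applying the old mesh vertex code to the new Spine runtime, and the result was just as bad.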

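Next I decided to move to a simpler environment to pin down the problem, which would rule out other interference. I modified the SpineTest in the TestCpp project of Cocos2d-x 3.13 for a quick test and found something interesting: when I added the twelfth treant, the rendering started to go wrong. (The artists had given me the small treant, which has fewer vertices, so it took twelve of them to trigger the problem.)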

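Checking the rendering code again, I suddenly noticed the vertex count in the lower-left corner: when I added the twelfth treant, it went past 65535! I recalled that in Cocos2d-x's low-level renderer, 65535 is the upper limit of the VBO vertex buffer, so I turned my attention to Cocos2d-x's rendering. I reread the Renderer code, especially the TrianglesCommand path, and while debugging found that the batch contained just over 20,000 vertices but more than 70,000 indices. Could it be that the index count must not exceed 65535? I replaced INDEX_VBO_SIZE with VBO_SIZE in the code so that neither the index count nor the vertex count of a single batch could exceed 65535, and sure enough the problem went away. Was that the end of it? I felt it deserved a deeper look to pin down the root cause for certain.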
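The effect of swapping INDEX_VBO_SIZE for VBO_SIZE can be reproduced with a small simulation of the flush test in processRenderCommand. It assumes the Cocos2d-x 3.x constants VBO_SIZE = 65536 and INDEX_VBO_SIZE = VBO_SIZE * 6 / 4 (check them against your engine version), and hypothetical per-treant sizes of 1,700 vertices and 5,900 indices, chosen only to match the rough totals observed with twelve treants.

```cpp
#include <cassert>
#include <vector>

// Assumed capacities, mirroring Cocos2d-x 3.x.
constexpr int kVboSize = 65536;
constexpr int kIndexVboSizeDefault = kVboSize * 6 / 4;  // 98304

struct CmdSize { int vertices; int indices; };

// Mirrors the "flush own queue when buffer is full" test and returns the
// index count of every batch that would reach drawBatchedTriangles().
std::vector<int> batchIndexCounts(const std::vector<CmdSize>& cmds, int indexLimit) {
    std::vector<int> batches;
    int filledVertex = 0;
    int filledIndex = 0;
    for (const auto& c : cmds) {
        // Flush only if something is queued and this command would overflow a buffer.
        if (filledIndex > 0 &&
            (filledVertex + c.vertices > kVboSize || filledIndex + c.indices > indexLimit)) {
            batches.push_back(filledIndex);
            filledVertex = 0;
            filledIndex = 0;
        }
        filledVertex += c.vertices;
        filledIndex += c.indices;
    }
    if (filledIndex > 0) batches.push_back(filledIndex);  // final flush
    return batches;
}
```

With the default index limit, twelve such treants land in one batch of 70,800 indices; with VBO_SIZE as the index limit the same twelve split into batches of 64,900 and 5,900, so no single batch ever holds more than 65,535 indices. That explains why the workaround hid the bug without yet explaining it.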

 void Renderer::processRenderCommand(RenderCommand* command)
 {
     auto commandType = command->getType();
     if( RenderCommand::Type::TRIANGLES_COMMAND == commandType)
     {
         // flush other queues
         flush3D();

         auto cmd = static_cast<TrianglesCommand*>(command);

         // flush own queue when buffer is full
         if(_filledVertex + cmd->getVertexCount() > VBO_SIZE || _filledIndex + cmd->getIndexCount() > INDEX_VBO_SIZE)
         {
             CCASSERT(cmd->getVertexCount()>= 0 && cmd->getVertexCount() < VBO_SIZE, "VBO for vertex is not big enough, please break the data down or use customized render command");
             CCASSERT(cmd->getIndexCount()>= 0 && cmd->getIndexCount() < INDEX_VBO_SIZE, "VBO for index is not big enough, please break the data down or use customized render command");
             drawBatchedTriangles();
         }

         // queue it
         _queuedTriangleCommands.push_back(cmd);
         _filledIndex += cmd->getIndexCount();
         _filledVertex += cmd->getVertexCount();
     }
     // ... handling of the other command types omitted ...
 }

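Can the index count really not exceed 65535? I searched Google extensively and queried GL_MAX_ELEMENTS_INDICES with glGet, which returned more than 100,000. I carefully read the chapters on buffers in the OpenGL SuperBible, and nothing said the index count cannot exceed 65535. Cocos2d-x's VBOs were also allocated with enough room. Could the vertices or indices be misaligned somehow? So I stopped the animations, placed all the treants at the same position, printed every vertex and index of each treant at the very bottom of the Renderer, and compared the vertex and index data rendered with 1, 11 and 12 treants.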
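Identical treants land at different offsets in the shared buffer, so their raw indices differ even when their geometry is the same. The debug code that follows rebases each dump onto the treant's first vertex and first index (the `- __vexCount` and `- __idxCount` terms) so the dumps can be diffed directly. A standalone sketch of that idea, with a hypothetical `dumpIndices` helper:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: formats one command's slice of a batched index buffer,
// subtracting the command's base vertex so the output does not depend on
// where the command landed in the batch.
std::string dumpIndices(const std::vector<std::uint16_t>& batchIndices,
                        std::size_t firstIndex, std::size_t indexCount,
                        int baseVertex) {
    std::string out;
    for (std::size_t i = 0; i < indexCount; ++i) {
        int local = static_cast<int>(batchIndices[firstIndex + i]) - baseVertex;
        out += "index " + std::to_string(i) + " is " + std::to_string(local) + "\n";
    }
    return out;
}
```

Two identical quads batched back to back (indices `0,1,2,2,3,0` and `4,5,6,6,7,4`) produce the same dump once the second is rebased by 4, so any difference that shows up in a diff points at genuinely different data.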

 // Static variables added for debugging
 static bool __dbg = false;      // log a batch summary (enable in the debugger)
 static bool __deepDbg = false;  // also log every vertex and index
 static int __cmdCount = 1;      // commands per treant; set in the debugger, must be non-zero
 static int __curCmdCount = 0;
 static int __idxCount = 0;      // index offset of the current treant within the batch
 static int __vexCount = 0;      // vertex offset of the current treant within the batch
 static int __maxidx = 0;        // largest index value written so far

 void Renderer::fillVerticesAndIndices(const TrianglesCommand* cmd)
 {
     memcpy(&_verts[_filledVertex], cmd->getVertices(), sizeof(V3F_C4B_T2F) * cmd->getVertexCount());

     // fill vertex, and convert them to world coordinates
     const Mat4& modelView = cmd->getModelView();
     for(ssize_t i=0; i < cmd->getVertexCount(); ++i)
     {
         modelView.transformPoint(&(_verts[i + _filledVertex].vertices));
         // Print every vertex's xyz position and texture uv
         if(__dbg && __deepDbg)
         {
             CCLOG("vertex %d is xyz %.2f,%.2f,%.2f uv %.2f,%.2f", (int)(i + _filledVertex - __vexCount), _verts[i + _filledVertex].vertices.x,
                 _verts[i + _filledVertex].vertices.y, _verts[i + _filledVertex].vertices.z,
                 _verts[i + _filledVertex].texCoords.u, _verts[i + _filledVertex].texCoords.v);
         }
     }

     // fill index
     const unsigned short* indices = cmd->getIndices();
     for(ssize_t i=0; i< cmd->getIndexCount(); ++i)
     {
         _indices[_filledIndex + i] = _filledVertex + indices[i];
         if (__dbg)
         {
             if (__maxidx < _indices[_filledIndex + i])
             {
                 __maxidx = _indices[_filledIndex + i];
             }
             if (__deepDbg)
             {
                 CCLOG("index %d is %d", (int)(_filledIndex + i - __idxCount), _indices[_filledIndex + i] - __vexCount);
             }
         }
     }

     _filledVertex += cmd->getVertexCount();
     _filledIndex += cmd->getIndexCount();
 }

 void Renderer::drawBatchedTriangles()
 {
     if(_queuedTriangleCommands.empty())
         return;

     CCGL_DEBUG_INSERT_EVENT_MARKER("RENDERER_BATCH_TRIANGLES");

     if (__dbg)
     {
         __vexCount = 0;
         __idxCount = 0;
         __curCmdCount = 0;
     }

     _filledVertex = 0;
     _filledIndex = 0;

     /************** 1: Setup up vertices/indices *************/
     _triBatchesToDraw[0].offset = 0;
     _triBatchesToDraw[0].indicesToDraw = 0;
     _triBatchesToDraw[0].cmd = nullptr;

     int batchesTotal = 0;
     int prevMaterialID = -1;
     bool firstCommand = true;

     for(auto it = std::begin(_queuedTriangleCommands); it != std::end(_queuedTriangleCommands); ++it)
     {
         const auto& cmd = *it;
         auto currentMaterialID = cmd->getMaterialID();
         const bool batchable = !cmd->isSkipBatching();

         if (__dbg)
         {
             // Record where each treant's commands begin so its dump can be rebased
             if (__curCmdCount % __cmdCount == 0)
             {
                 CCLOG("begin %d =====================================", __curCmdCount / __cmdCount);
                 __vexCount = _filledVertex;
                 __idxCount = _filledIndex;
             }
             ++__curCmdCount;
         }

         fillVerticesAndIndices(cmd);

         // in the same batch ?
         if (batchable && (prevMaterialID == currentMaterialID || firstCommand))
         {
             CC_ASSERT(firstCommand || _triBatchesToDraw[batchesTotal].cmd->getMaterialID() == cmd->getMaterialID() && "argh... error in logic");
             _triBatchesToDraw[batchesTotal].indicesToDraw += cmd->getIndexCount();
             _triBatchesToDraw[batchesTotal].cmd = cmd;
         }
         else
         {
             // is this the first one?
             if (!firstCommand) {
                 batchesTotal++;
                 _triBatchesToDraw[batchesTotal].offset = _triBatchesToDraw[batchesTotal-1].offset + _triBatchesToDraw[batchesTotal-1].indicesToDraw;
             }
             _triBatchesToDraw[batchesTotal].cmd = cmd;
             _triBatchesToDraw[batchesTotal].indicesToDraw = (int) cmd->getIndexCount();

             // is this a single batch ? Prevent creating a batch group then
             if (!batchable)
                 currentMaterialID = -1;
         }

         // capacity full ?
         if (batchesTotal + 1 >= _triBatchesToDrawCapacity) {
             _triBatchesToDrawCapacity *= 1.4;
             _triBatchesToDraw = (TriBatchToDraw*) realloc(_triBatchesToDraw, sizeof(_triBatchesToDraw[0]) * _triBatchesToDrawCapacity);
         }

         prevMaterialID = currentMaterialID;
         firstCommand = false;
     }
     batchesTotal++;

     if (__dbg)
     {
         CCLOG("MAX IDX %d", __maxidx);
     }
     __dbg = false;

     // ... upload of _verts/_indices to the VBOs and buffer binding omitted ...

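After adding the first treant, I hit a breakpoint and switched on __dbg and __deepDbg, which print the details of every treant rendered in that frame; I did the same after adding the eleventh and the twelfth. Comparing the logs with Beyond Compare showed that the data was entirely correct: every treant's vertices and indices were identical, and nothing had been modified or misaligned. So why did correct data not render correctly? I moved on to the glDrawElements call that follows. Breaking on it while rendering twelve treants and inspecting all of its arguments, I found the problem in the second argument! It is the number of indices to draw. When everything is drawn in a single batch, _triBatchesToDraw[i].indicesToDraw should equal _filledIndex, but the debugger showed a value far smaller than _filledIndex. Looking up every reference to indicesToDraw, I found that each time a command is merged into a batch, that command's index count is added to it, and the variable's type is GLushort! The truth was finally out: the variable overflowed as it kept growing, so the wrong indices were drawn, which ultimately corrupted the screen.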
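The overflow is easy to reproduce outside the engine. The sketch below adds up per-command index counts the way drawBatchedTriangles() grows indicesToDraw, once with a 16-bit unsigned counter (the width of GLushort) and once with a 32-bit signed one (the width of GLsizei). `accumulateIndices` is a hypothetical helper, and the 5,900 indices per treant is an assumed figure chosen to match the "more than 70,000 indices" observed with twelve treants.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Accumulates per-command index counts into a counter of type CounterT,
// as indicesToDraw += cmd->getIndexCount() does for every merged command.
template <typename CounterT>
CounterT accumulateIndices(const std::vector<int>& indexCounts) {
    CounterT total = 0;
    for (int count : indexCounts) {
        // For unsigned 16-bit counters the conversion wraps modulo 65536, silently.
        total = static_cast<CounterT>(total + count);
    }
    return total;
}
```

With eleven treants both counters agree on 64,900. With twelve, the 32-bit counter holds 70,800 while the 16-bit one has wrapped to 70,800 - 65,536 = 5,264, which matches the symptom of an indicesToDraw value far smaller than _filledIndex.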

     for (int i=0; i<batchesTotal; ++i)
     {
         CC_ASSERT(_triBatchesToDraw[i].cmd && "Invalid batch");
         _triBatchesToDraw[i].cmd->useMaterial();
         // 2nd argument: number of indices to draw, taken from indicesToDraw
         glDrawElements(GL_TRIANGLES, (GLsizei) _triBatchesToDraw[i].indicesToDraw, GL_UNSIGNED_SHORT, (GLvoid*) (_triBatchesToDraw[i].offset*sizeof(_indices[0])) );
         _drawnBatches++;
         _drawnVertices += _triBatchesToDraw[i].indicesToDraw;
     }
     // ... cleanup omitted ...
 }

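The proper fix is to change the type of indicesToDraw to GLsizei. After the fix passed testing, I happily set out to submit a pull request, only to discover that the bug had already been fixed in the next release, 3.14... I guess I really should upgrade the engine more often.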
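One way to keep this class of bug from coming back is to turn the counter's width into a compile-time check. The sketch below is not the engine's 3.14 code: the `GLsizei` alias, the `TriBatchToDraw` layout, the widened `offset` field and `kIndexVboSize` are assumptions made so it compiles without GL headers. Had `indicesToDraw` been declared as a 16-bit type, the static_assert would have failed, because 65,535 is smaller than the assumed INDEX_VBO_SIZE of 98,304.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Stand-in for the OpenGL typedef, which the GL specification defines
// as a 32-bit signed integer.
using GLsizei = std::int32_t;

// Assumed capacity, mirroring Cocos2d-x 3.x: INDEX_VBO_SIZE = VBO_SIZE * 6 / 4.
constexpr int kIndexVboSize = 65536 * 6 / 4;  // 98304

struct TrianglesCommand;  // opaque here; only a pointer is stored

// Sketch of a batch record whose counters can hold a full index buffer.
struct TriBatchToDraw {
    TrianglesCommand* cmd;  // command whose material is used for the batch
    GLsizei indicesToDraw;  // grows by getIndexCount() for each merged command
    GLsizei offset;         // first index of the batch, also counted in indices
};

// Compile-time guards: the build breaks if a counter is narrowed to 16 bits again.
static_assert(std::numeric_limits<decltype(TriBatchToDraw::indicesToDraw)>::max() >= kIndexVboSize,
              "indicesToDraw cannot hold a full index buffer");
static_assert(std::numeric_limits<decltype(TriBatchToDraw::offset)>::max() >= kIndexVboSize,
              "offset cannot address a full index buffer");
```

The guard ties the counter type to the buffer capacity it must describe, so enlarging INDEX_VBO_SIZE or shrinking the field later would be caught at compile time rather than as flickering on screen.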

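Looking back on this bug: some bizarre bugs come down to just one or two lines of code in the end. The investigation looks roundabout, but it actually followed a clear path: first confirm and reproduce the problem, then narrow it down step by step from where it surfaced (Spine) all the way to the lowest-level rendering logic. Working backwards from the renderer might have located the problem right away, but at first I did not suspect Cocos2d-x's rendering at all, because we had not upgraded Cocos2d-x for some time, whereas Spine had just been upgraded.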

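So even if you don't upgrade the engine, keep an eye on its changelog to know which bugs have been fixed. Besides the code itself, the artists' heavy use of meshes was also a major trigger for this bug: overusing meshes slows down the loading of Spine animations, enlarges the asset files, and hurts performance.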

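While analyzing the Spine rendering code, I also spotted a possible optimization: every time a render command is added, a new block of memory is allocated to store its vertices. Why not use the vertex pointer that is passed in directly? Probably because the vertices are transformed to world coordinates later on, so the same vertices could end up being transformed more than once. In that case, a simple memory pool would still be an effective optimization.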
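A minimal sketch of such a pool, under the assumption that buffers are handed back all at once at the start of each frame (the author's draw() does something similar through `TrianglesMgr::freeTriangles`, whose implementation is not shown). `Vertex` and `VertexBufferPool` are hypothetical names, and `Vertex` only approximates the layout of cocos2d::V3F_C4B_T2F.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical vertex layout approximating V3F_C4B_T2F (position, color, uv).
struct Vertex { float x, y, z; unsigned char r, g, b, a; float u, v; };

// Free-list pool: buffers released at the end of one frame are reused by the
// next frame instead of being freshly allocated for every render command.
class VertexBufferPool {
public:
    std::vector<Vertex>* acquire(std::size_t vertexCount) {
        std::unique_ptr<std::vector<Vertex>> buf;
        if (!free_.empty()) {
            buf = std::move(free_.back());  // reuse a previously released buffer
            free_.pop_back();
        } else {
            buf = std::make_unique<std::vector<Vertex>>();
        }
        buf->resize(vertexCount);  // keeps existing capacity when it is large enough
        std::vector<Vertex>* raw = buf.get();
        inUse_.push_back(std::move(buf));
        return raw;
    }

    // Returns every buffer handed out since the last call, e.g. at the start of draw().
    void releaseAll() {
        for (auto& buf : inUse_) free_.push_back(std::move(buf));
        inUse_.clear();
    }

    std::size_t freeCount() const { return free_.size(); }

private:
    std::vector<std::unique_ptr<std::vector<Vertex>>> free_;
    std::vector<std::unique_ptr<std::vector<Vertex>>> inUse_;
};
```

Because each buffer keeps its capacity across frames, a skeleton whose vertex count is stable stops allocating after the first frame; the trade-off is that memory is only returned to the system when the pool itself is destroyed.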

码农公寓
