Engine Design Tracking (9.14.2f) Latest updates: OpenGL ES & tools

2023-01-04 20:24:10

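The IK work for skeletal animation is on hold for now; lately I have been working on the GLES implementation. Apart from GLES, which was not implemented, the Android port of the code is already done: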

[Original] Cross-platform programming notes (3): porting from Windows to Android

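Overall, the changes in that port were small: mostly adjustments and adaptation between DLL and .so, plus some compile errors related to the C++ standard. Loading of data packages, initialization, and loading of config files and plugins have been tested and work. But GLES was not implemented, so the port could only run on a real device without rendering anything.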

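Recently I have been trying to fill in the GLES gap in my spare time. The interface adjustments are mostly done, and the GLES runtime implementation is being filled in.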

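1. First, a brief look at how Tile Based Rendering GPUs work and what to watch out for

A TBR GPU divides screen space into a number of tiles, each much smaller than the screen, for example 32x32 pixels.
It bins the geometry into tiles in screen space and then renders tile by tile. A piece of geometry may span many tiles, so the geometry data has to be kept around until the tiles are rendered; the more geometry the draw calls submit, the more memory this costs.
In a TBR architecture the GPU has fast on-chip memory dedicated to the current tile (fast memory; I'll call it the tile cache for now), which is very quick to access. Video memory, however, is usually not dedicated on-board memory but ordinary system memory, and transfers between video memory and the tile cache are relatively slow.
Thanks to the tile cache, reading and writing depth and color is fast, which is different from modern PC GPUs. Blending, depth write/test and multisampling are all relatively cheap on TBR. Pixels at different depths may be shaded repeatedly, but that happens only in the tile cache, and the result is written to video memory once at the end (in practice, though, I found that alpha blending is still a fairly slow operation).
Because transfers between the tile cache and video memory are slow, GLES provides a hint, glInvalidateFramebuffer (GLES 3.0), that lets this kind of architecture skip unnecessary transfers between cache and memory.
If the GPU has hidden surface removal (PVR GPUs), it sorts the geometry and shades only the visible parts in the tile cache, so the pixel load is much lower. The app therefore does not need to sort solid objects by distance when drawing, but discard/texkill disables the feature. When it is disabled, or on GPUs without it, early z is still available: use a traditional pre-z pass to write depth first.
The geometry load (triangle count) a tile based GPU can handle is much lower than that of a modern PC GPU. A few million triangles are nothing for a modern PC, but a tile based GPU has to store that geometry for rendering each tile, so both memory and runtime costs are high.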
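
To make the binning cost concrete, here is a minimal C++ sketch that counts how many tiles have to keep a reference to one triangle. It is only an illustration under assumptions: the tile size of 32, the conservative bounding-box test, and the names Vec2, TileRange, binTriangle and tileCount are all hypothetical, and real hardware bins more precisely and culls off-screen geometry first.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical types for illustration; not part of Blade or any GPU driver.
struct Vec2 { float x, y; };
struct TileRange { int minX, minY, maxX, maxY; };  // inclusive tile indices

// Convert a screen-space coordinate to a tile index along one axis.
inline int toTileIndex(float v, int screenExtent, int tileSize)
{
    // Clamp to the screen first so off-screen vertices map to border tiles.
    const float clamped =
        std::max(0.0f, std::min(v, static_cast<float>(screenExtent - 1)));
    return static_cast<int>(clamped) / tileSize;
}

// Conservatively bin a screen-space triangle by its bounding box.
// Every tile in the returned range keeps a reference to the triangle,
// which is why geometry-heavy scenes cost memory on tile based GPUs.
inline TileRange binTriangle(Vec2 a, Vec2 b, Vec2 c,
                             int screenW, int screenH, int tileSize = 32)
{
    assert(tileSize > 0 && screenW > 0 && screenH > 0);
    const float minX = std::min({a.x, b.x, c.x});
    const float maxX = std::max({a.x, b.x, c.x});
    const float minY = std::min({a.y, b.y, c.y});
    const float maxY = std::max({a.y, b.y, c.y});
    TileRange r;
    r.minX = toTileIndex(minX, screenW, tileSize);
    r.maxX = toTileIndex(maxX, screenW, tileSize);
    r.minY = toTileIndex(minY, screenH, tileSize);
    r.maxY = toTileIndex(maxY, screenH, tileSize);
    return r;
}

// Number of tiles that must store this triangle.
inline int tileCount(const TileRange& r)
{
    return (r.maxX - r.minX + 1) * (r.maxY - r.minY + 1);
}
```

With these assumptions, a triangle with vertices (10,10), (40,10), (10,70) on a 128x128 target touches 2 tile columns and 3 tile rows, so it is referenced by 6 tiles, while a triangle that fits inside one tile is stored once.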

2. Unifying the GLES and D3D interfaces

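The rendering interfaces are basically similar and have equivalent implementations; the main difference is in the shader interface:

GL/ES links the program at runtime, and its shader objects are only intermediate objects. D3D shaders are usually compiled offline and loaded directly at runtime.

GLES3 has glGetProgramBinary and glProgramBinary, which can save and load a compiled program. But compiling and saving still have to be done on the target device.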

Blade's previous interface was: IShader => D3D9VertexShader : has a IDirect3DVertexShader9

=> D3D9FragementShader : has a IDirect3DPixelShader9

The current interface merges the shader types: there are no longer separate shader objects per type; instead, one shader contains the vs, the fs and so on

IShader => D3D9Shader : has a (IDirect3DVertexShader9 & IDirect3DPixelShader9 )

=> GLESShader : has a gl program

At the same time, IRenderDevice::setShader(EShaderType, HSHADER&) becomes setShader(HSHADER&)
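
The merged interface can be sketched roughly as follows. This is not Blade's actual code: the class bodies, getAPIName, the void* placeholders for the D3D9 COM pointers, the const reference on setShader, and RecordingDevice are all assumptions made so the example compiles on its own.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Simplified, hypothetical stand-ins: the real Blade declarations are not shown in this post.
class IShader {
public:
    virtual ~IShader() {}
    virtual std::string getAPIName() const = 0;
};
typedef std::shared_ptr<IShader> HSHADER;

// D3D9: one shader object owns both stages. void* stands in for
// IDirect3DVertexShader9* / IDirect3DPixelShader9* so this compiles without d3d9.h.
class D3D9Shader : public IShader {
public:
    D3D9Shader(void* vs, void* ps) : mVS(vs), mPS(ps) {}
    std::string getAPIName() const override { return "D3D9"; }
    void* getVertexShader() const { return mVS; }
    void* getPixelShader() const { return mPS; }
private:
    void* mVS;
    void* mPS;
};

// GLES: the linked program already contains every stage.
class GLESShader : public IShader {
public:
    explicit GLESShader(unsigned program) : mProgram(program) {}
    std::string getAPIName() const override { return "GLES"; }
    unsigned getProgram() const { return mProgram; }
private:
    unsigned mProgram;
};

class IRenderDevice {
public:
    virtual ~IRenderDevice() {}
    // Previously: setShader(EShaderType, HSHADER&), one call per stage.
    virtual void setShader(const HSHADER& shader) = 0;
};

// Trivial device that only remembers the bound shader, for demonstration.
class RecordingDevice : public IRenderDevice {
public:
    void setShader(const HSHADER& shader) override
    {
        assert(shader != nullptr);
        mCurrent = shader;
    }
    const HSHADER& current() const { return mCurrent; }
private:
    HSHADER mCurrent;
};
```

The point of the design is that calling code binds one handle per draw and never needs to know whether the backend stores two stage objects or a single linked program.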

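In GLSL/ES shaders, uniforms and vertex input streams (vertex attributes) have no semantics. The user has to bind and set them by name.

For uniforms this is not a problem: Blade's shader resource stores an extra semantic map, used to update engine built-in variables such as WORLD_MATRIX and EYE_POS, so uniform binding and updating already work.

For vertex attributes, the current approach is to replace these variables with fixed names. For example, the GLSL variable corresponding to POSITION0 in HLSL is named blade_position0.

This way glBindAttribLocation can be called at runtime (it has to happen before the program is linked) to bind each name to the attribute index that the VBO data is fed through.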
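
A minimal sketch of the naming rule, assuming it is simply the prefix "blade_" plus the lower-cased HLSL semantic. That rule is inferred from the single POSITION0 example above, and the function name toGLSLAttributeName is hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Map an HLSL input semantic (e.g. "POSITION0") to a fixed GLSL attribute
// name (e.g. "blade_position0"). Assumed rule: "blade_" + lower-case semantic.
inline std::string toGLSLAttributeName(const std::string& hlslSemantic)
{
    assert(!hlslSemantic.empty());
    std::string name = "blade_";
    for (char c : hlslSemantic)
        name += static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return name;
}

// Usage with a live GL context (not compiled here):
//   glBindAttribLocation(program, index, toGLSLAttributeName("POSITION0").c_str());
//   glLinkProgram(program);   // bindings take effect at link time
```

Because the names are fixed, the same table of semantic-to-index bindings can be applied to every program before linking, without querying each shader.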

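3. Tools

The packaging tool BPK already exists, and its runtime has been tested and works on Android. The tools still needed are: a shader compiler and a texture compressor.

The shader compiler uses HLSL2GLSL:

First, the existing shader compiler on Windows: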

offline:

HLSL ==(TexShaderSerializer::load) ==> D3DSoftwareShader : compiled binary == (BinarySerializer::save) ==> binary shader : with semantic map

runtime:

binary shader ==(BinarySerializer::load)==> D3DShader

The shader compiler already done for GLES:

offline:

HLSL ==(TexShaderSerializer::load) ==> D3DSoftwareShader : compiled binary with HLSL text ==(replace with GLSL)==>

binary with GLSL text ==(HybridShaderSerializer::save)==> hybrid shader : text with binary semantic map

runtime:

hybrid shader ==(HybridShaderSerializer::load)==> GLESShader

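For GLES 3.0, the shader (program) can be saved as a binary at startup (only once), so shaders do not need to be compiled again and loading is much faster. This will be done later as well.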

(https://software.intel.com/en-us/articles/opengl-es-30-precompiled-shaders)

GLES2 has glShaderBinary (with vendor-specific binary formats), but it works on shaders before linking, not on the linked program.

starting up precompile: once and for all

hybrid shader ==(HybridShaderSerializer::load)==> GLESShader ==(BinaryShaderSerializer::save) ==> binary shader

runtime:

binary shader ==(BinaryShaderSerializer::load) ==> GLESShader

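Worth noting: IShader is an interface tied to the rendering device/API. Its abstraction lives in the foundation library, and the implementation lives in another DLL/so. ShaderResource and all the ShaderSerializers are reusable and platform independent. The whole Graphics Subsystem is platform independent; platform-specific optimizations (such as for Tile Based GPUs) are handled by the render configuration file (a sample .xml of this file was recorded in an earlier post) and by targeted handling inside the Blade::IRenderDevice implementation.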

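The shader compiler is built on a third-party library, so it is finished for now and can convert to GLSL ES 3.0. Once the runtime is filled in and compressed texture formats are available, it can be tested.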

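The texture compressor compresses textures into the format used by the target platform; Blade plans to use ETC2/EAC here. Previously Blade compressed textures at runtime on Windows, because I saw some engines abroad do that. The main advantage is that storing PNG on disk saves disk space, since PNG's compression ratio is higher than S3TC's. In use, however, loading large textures turned out to be a bit slow, and on mobile, online compression is not a good approach either, as mentioned before. The plan from now on is to compress textures offline, with all platforms using the same precompressed path.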
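
For a rough sense of the in-memory sizes involved, here is a small C++ sketch that computes the size of one mip level for the block-compressed formats mentioned above. The block sizes are the standard ones for 4x4 texel blocks; the enum and function names are hypothetical and not part of Blade.

```cpp
#include <cassert>
#include <cstddef>

// Bytes per 4x4 texel block for the formats mentioned above.
enum BlockBytes {
    ETC2_RGB8_BLOCK      = 8,   // 4 bits per pixel
    ETC2_RGBA8_EAC_BLOCK = 16,  // 8 bits per pixel (ETC2 color + EAC alpha)
    S3TC_DXT1_BLOCK      = 8,   // 4 bits per pixel
    S3TC_DXT5_BLOCK      = 16   // 8 bits per pixel
};

// Size in bytes of one mip level of a block-compressed texture.
// Dimensions are rounded up to whole 4x4 blocks.
inline std::size_t compressedLevelSize(std::size_t width, std::size_t height,
                                       std::size_t bytesPerBlock)
{
    assert(width > 0 && height > 0);
    const std::size_t blocksX = (width + 3) / 4;
    const std::size_t blocksY = (height + 3) / 4;
    return blocksX * blocksY * bytesPerBlock;
}
```

For example, a 1024x1024 level takes 512 KiB as ETC2 RGB8 and 1 MiB as ETC2 RGBA8 with EAC alpha, compared with 4 MiB for uncompressed RGBA8. These sizes are fixed by the format, which is why a precompressed file can be uploaded as-is without a decode and re-encode step at load time.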

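As for the texture compressor, work has been too busy lately and I don't have much spare time. I probably won't have time to write it by hand and will use a third-party library for the compression. It is not done yet and will come later. Also still to do: sort out the data generation/packaging pipeline for the target platform, that is, a build/project script that combines the shader compiler, texture compressor and BPK packager to generate the final data in one go.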

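The other game data is designed to be cross-platform, so in theory it should need no extra processing. Blade's existing x86 and x64 builds use the same data or BPK packages. Android may still need some debugging, though.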

Finally, the uniform semantics for HLSL used to be declared in comments in the file and parsed from there:

 //!BladeShaderHeader

   //![VertexShader]

   //!Entry=TerrainVSMain

   //!Profile=vs_3_0

   //![FragmentShader]

   //!Entry=TerrainPSMain

   //!Profile=ps_3_0

 #include "inc/light.hlsl"

 #include "inc/common.hlsl"

 #include "inc/terrain_common.hlsl"

 //![Semantics]

 //!wvp_matrix = WORLD_VIEWPROJ_MATRIX

 //!world_translate = WORLD_POSITION

 void TerrainVSMain(

     float2 hpos        : POSITION0,

     float2 vpos        : POSITION1,

     float4 normal    : NORMAL0,        //ubyte4-n normal

     uniform float4x4 wvp_matrix,

     uniform float4 world_translate,

     uniform float4 scaleFactor,        //scale

     uniform float4 UVInfo,            //uv information

     out    float4 outPos : POSITION,

     out    float4 outUV  : TEXCOORD0,

     out float4 outBlendUV : TEXCOORD1,

     out float3 outWorldPos : TEXCOORD2,

     out float3 outWorldNormal : TEXCOORD3

     )

 {

     float4 pos = float4(hpos.x, getMorphHeight(vpos, hpos+world_translate.xz, eye_position.xz), hpos.y, );

     pos = pos*scaleFactor;

     float blendOffset = UVInfo[];

     float tileSize = UVInfo[];

     float blockSize = UVInfo[];

     float blockUVMultiple = UVInfo[];

     //normalUV

     outUV.xy = pos.xz*(tileSize-)/(tileSize*tileSize) + 0.5/tileSize;

     //block repeat UV

     outUV.zw = pos.xz*blockUVMultiple/blockSize;

     //blendUV

     outBlendUV.xy = pos.xz*(tileSize-)/(tileSize*tileSize) + blendOffset/tileSize;

     outBlendUV.zw = pos.xz/tileSize;

     //use local normal as world normal, because our terrain has no scale/rotations

     outWorldNormal = expand_vector(normal).xyz;    //ubytes4 normal ranges 0-1, need convert to [-1,1]

     //don't use full transform because our terrain has no scale/rotation

     outWorldPos = pos.xyz+world_translate.xyz;

     outPos = mul(pos, wvp_matrix);

 }

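Now the declarations in the comments are gone, replaced by HLSL's own syntax. Since only D3D Effects support parsing uniform semantics, I had wrongly assumed this syntax was supported only in .FX files and that using D3DCompile directly would report an error.

But I tried it a few days ago: D3DCompile does not report an error for uniform semantics, it simply ignores them. So everything has been switched to this format.

A little extra code is needed to parse the semantics manually; a tokenizer is enough.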
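
A minimal sketch of what such tokenizer-based parsing could look like in C++. It is not Blade's actual parser: the function names tokenize and parseUniformSemantics are hypothetical, only // line comments are skipped, and only the plain form "uniform <type> <name> : <SEMANTIC>" is recognized (no arrays, no block comments, no preprocessor handling).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Split HLSL source into identifier/number tokens and single-character
// punctuation tokens, skipping whitespace and // line comments.
inline std::vector<std::string> tokenize(const std::string& src)
{
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        const unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) {
            ++i;
        } else if (c == '/' && i + 1 < src.size() && src[i + 1] == '/') {
            while (i < src.size() && src[i] != '\n') ++i;  // skip line comment
        } else if (std::isalnum(c) || c == '_') {
            const std::size_t start = i;
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_'))
                ++i;
            tokens.push_back(src.substr(start, i - start));
        } else {
            tokens.push_back(std::string(1, src[i]));  // punctuation such as ':' or ','
            ++i;
        }
    }
    return tokens;
}

// Collect "uniform <type> <name> : <SEMANTIC>" declarations as name -> semantic.
// Uniforms without a semantic and non-uniform parameters are ignored.
inline std::map<std::string, std::string> parseUniformSemantics(const std::string& src)
{
    std::map<std::string, std::string> result;
    const std::vector<std::string> t = tokenize(src);
    for (std::size_t i = 0; i + 4 < t.size(); ++i) {
        if (t[i] == "uniform" && t[i + 3] == ":")
            result[t[i + 2]] = t[i + 4];
    }
    return result;
}
```

Fed a parameter list like the one in the listing that follows, this would map wvp_matrix to WORLD_VIEWPROJ_MATRIX and scaleFactor to _SHADER_, while leaving the vertex inputs (hpos, vpos, normal) alone because they are not declared uniform.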

 //!BladeShaderHeader

 //![Shader]

 //!VSEntry=TerrainVSMain

 //!VSProfile=vs_3_0

 //!FSEntry=TerrainPSMain

 //!FSProfile=ps_3_0

 #include "inc/light.hlsl"

 #include "inc/common.hlsl"

 #include "inc/terrain_common.hlsl"

 void TerrainVSMain(

     float2 hpos        : POSITION0,

     float2 vpos        : POSITION1,

     float4 normal    : NORMAL0,        //ubyte4-n normal

     uniform float4x4 wvp_matrix : WORLD_VIEWPROJ_MATRIX,

     uniform float4 world_translate : WORLD_POSITION,

     uniform float4 scaleFactor : _SHADER_,        //per shader custom variable: scale

     uniform float4 UVInfo : _SHADER_,            //per shader custom variable: uv information

     out    float4 outPos : POSITION,

     out    float4 outUV  : TEXCOORD0,

     out float4 outBlendUV : TEXCOORD1,

     out float3 outWorldPos : TEXCOORD2,

     out float3 outWorldNormal : TEXCOORD3

     )

 {

     float4 pos = float4(hpos.x, getMorphHeight(vpos, hpos+world_translate.xz, eye_position.xz), hpos.y, );

     pos = pos*scaleFactor;

     float blendOffset = UVInfo[];

     float tileSize = UVInfo[];

     float blockSize = UVInfo[];

     float blockUVMultiple = UVInfo[];

     //normalUV

     outUV.xy = pos.xz*(tileSize-)/(tileSize*tileSize) + 0.5/tileSize;

     //block repeat UV

     outUV.zw = pos.xz*blockUVMultiple/blockSize;

     //blendUV

     outBlendUV.xy = pos.xz*(tileSize-)/(tileSize*tileSize) + blendOffset/tileSize;

     outBlendUV.zw = pos.xz/tileSize;

     //use local normal as world normal, because our terrain has no rotations

     outWorldNormal = expand_vector(normal).xyz;    //ubytes4 normal ranges 0-1, need convert to [-1,1]

     //don't use full transform because our terrain has no scale/rotation

     outWorldPos = pos.xyz+world_translate.xyz;

     outPos = mul(pos, wvp_matrix);

 }

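About shader variables: WORLD_VIEWPROJ_MATRIX is a built-in variable of Blade's FX framework, while the "_SHADER_" semantic only indicates that the variable is a custom shader variable defined by a module and not built into the framework. The user module (such as the terrain module in the example) has to set/update the variable directly by name. It must be set at least once; if it does not change, its value need not be updated again. The CPU-side data for the variable is allocated automatically by the material/FX framework according to the variable type, and is kept in the shader/instance/global shader constant table.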

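When I have time I'll do the ETC2/EAC texture compression. The porting itself has been relatively little work so far; adaptation and optimization may take more time. The main issue is still that the platform-independent core features are incomplete. I'll focus on those from now on, otherwise porting doesn't mean much. Once the core features and game code exist, even a new platform should be quick to adapt to. Of course, the amount of work in a game is not on the same order of magnitude as in an engine; I hope to have a chance to collaborate with others in the future.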

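GLES 3.0 has UBOs, which is another optimization point. But I think it is better not to expose a UBO interface and instead keep it inside the IRenderDevice implementation, so that APIs without constant buffers do not have to care about it.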

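Of course an interface could also be abstracted, and for APIs that do not support it (such as Direct3D9) it can be emulated in some way, for example with the array buffering approach from Ogre mentioned before, where everything is committed in one go at the end.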
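
One possible shape for such an emulation, sketched in C++. This is only an illustration of the "buffer in an array, commit once" idea and not Ogre's or Blade's actual code; the class name EmulatedConstantBuffer and its methods are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical emulated constant buffer: values are written into a CPU-side
// array of float4 registers and uploaded with a single call at draw time.
class EmulatedConstantBuffer {
public:
    explicit EmulatedConstantBuffer(std::size_t registerCount)
        : mData(registerCount * 4, 0.0f), mDirtyBegin(registerCount), mDirtyEnd(0) {}

    // Write `count` float4 registers starting at `startRegister`.
    // No graphics API call happens here; only the dirty range grows.
    void set(std::size_t startRegister, const float* values, std::size_t count)
    {
        assert(count > 0 && startRegister + count <= mData.size() / 4);
        std::memcpy(&mData[startRegister * 4], values, count * 4 * sizeof(float));
        mDirtyBegin = std::min(mDirtyBegin, startRegister);
        mDirtyEnd = std::max(mDirtyEnd, startRegister + count);
    }

    // Upload the dirty range with one call and reset it.
    // `upload` is any callable taking (startRegister, data, registerCount);
    // on Direct3D 9 it could wrap IDirect3DDevice9::SetVertexShaderConstantF.
    // Returns false when nothing changed since the last commit.
    template <typename Upload>
    bool commit(Upload upload)
    {
        if (mDirtyBegin >= mDirtyEnd)
            return false;
        upload(mDirtyBegin, &mData[mDirtyBegin * 4], mDirtyEnd - mDirtyBegin);
        mDirtyBegin = mData.size() / 4;
        mDirtyEnd = 0;
        return true;
    }

private:
    std::vector<float> mData;   // 4 floats per register
    std::size_t mDirtyBegin;    // first dirty register
    std::size_t mDirtyEnd;      // one past the last dirty register
};
```

The trade-off in this sketch is that untouched registers between two writes are re-uploaded as part of the single dirty range, in exchange for issuing one API call per draw instead of one per variable.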

I'll shelve this feature for now. When DX11/DX12 is implemented later, I can compare them side by side and see how best to abstract the interface.
