I. x264 Performance Analysis
Test environment
CPU: Intel Pentium 4 3.00 GHz (dual-core), Hyper-Threading enabled
Memory: 1.00 GB DDR
Operating system: Windows Server 2003 Enterprise Edition
Profiler: Intel(R) VTune(TM) Performance Analyzer 8.0 (evaluation license)
Build tools: VC71 + nasm 0.98
Bus speed: 800 MHz
Test program: x264 20060506 encoder
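1. Debug version
Encoding parameters:
X264 -fps -o foreman.264 forman.cif 352x288
Encoding 400 frames ran at about 23 fps with the debug build of libx264 and at 35 fps with the release build. The release build is more than 10 fps faster (roughly 1.5x), which is a substantial gain.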
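2. Assembly optimizations disabled (--no-asm)
Encoding parameters:
X264 -fps --no-asm -o foreman.264 forman.cif 352x288
--no-asm disables all CPU optimizations, that is, none of the MMX, MMXEXT, SSE, SSE2, 3DNow!, 3DNow! Ext, or AltiVec assembly routines are used.
Encoding 400 frames ran at 2.67 fps with the debug build of libx264 and at 12.67 fps with the release build, a gain of 10 fps (roughly 4.7x).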
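Clockticks per Instructions Retired (CPI) is the average number of clock cycles needed to execute one instruction in a given piece of code. A higher CPI means the code contains more expensive operations such as floating-point arithmetic, multiplication, division, I/O, system calls, or file access.
Instructions Retired events is the number of instructions executed; the larger the value, the more instructions the module executed.
Clockticks events is the number of clock cycles the module consumed. In general, Clockticks events = Instructions Retired events * Clockticks per Instructions Retired (CPI); the larger the value, the more time the module took. The Clockticks % column gives the module's share of the total run time of the program.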
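The relation between the three columns can be checked with a few lines of C. This is only a sketch: the function names cpi and clockticks_percent are made up for illustration and are not part of VTune or x264, and the convention of returning 0 when no instructions were sampled is an assumption.

```c
#include <stdint.h>

/* Average clock cycles per retired instruction.
   Returns 0.0 when no retired instructions were sampled
   (an assumed convention; several rows of the table report 0). */
double cpi(uint64_t clockticks, uint64_t instructions_retired)
{
    if (instructions_retired == 0)
        return 0.0;
    return (double)clockticks / (double)instructions_retired;
}

/* Share of the total clockticks, in percent. */
double clockticks_percent(uint64_t clockticks, uint64_t total_clockticks)
{
    if (total_clockticks == 0)
        return 0.0;
    return 100.0 * (double)clockticks / (double)total_clockticks;
}
```

For the refine_subpel row of the table, cpi(3414000000, 1119000000) gives about 3.0509, which matches the CPI column. The counters exceed the range of a signed 32-bit int, which is why the sketch uses uint64_t.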
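One caveat is best explained with an example. Suppose you want to measure the cost of the deblocking filter in the video encoder. The time attributed to x264_frame_deblocking_filter alone is not the whole cost of deblocking in x264. x264_frame_deblocking_filter calls deblock_edge and x264_clip3 (which other functions call as well), and deblock_edge in turn calls x264_deblock_v8_luma_mmxext, x264_deblock_h_luma_mmxext, x264_deblock_h_chroma_mmxext, deblock_luma_intra_c, x264_deblock_v_chroma_mmxext, and others. (These functions are bound through function pointers so that the code adapts to different hardware; Intel and AMD CPUs, for example, support different instruction-set extensions. MPlayer, FFmpeg, T264, and other software use the same kind of redirection so that one code base runs on different architectures and platforms such as ARM, PowerPC, and x86.) To obtain the deblocking filter's share of the run time, you therefore have to add up the time of that function and of every function it calls. The x264_deblock_* functions are called only from deblock_edge, but x264_clip3 is not called only by the deblocking code, so only part of its time belongs to the deblocking filter. How large that part is depends on the call counts of each caller, which are not determined here.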
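The function-pointer redirection mentioned above can be sketched as follows. Everything in this block is hypothetical: the names sad_fn, pixel_functions, pixel_functions_init, and CPU_HAS_FAST_PATH are invented for illustration and are not x264's API, and sad_fast is plain C standing in for a hand-written MMX/SSE2 routine.

```c
#include <stdint.h>

typedef int (*sad_fn)(const uint8_t *a, const uint8_t *b, int n);

static int abs_diff(uint8_t a, uint8_t b)
{
    return a > b ? a - b : b - a;
}

/* Portable reference implementation. */
static int sad_c(const uint8_t *a, const uint8_t *b, int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += abs_diff(a[i], b[i]);
    return sum;
}

/* Stand-in for a platform-specific routine; handles four samples
   per iteration, then the remaining tail. */
static int sad_fast(const uint8_t *a, const uint8_t *b, int n)
{
    int sum = 0;
    int i = 0;
    for (; i + 4 <= n; i += 4)
        sum += abs_diff(a[i], b[i]) + abs_diff(a[i + 1], b[i + 1])
             + abs_diff(a[i + 2], b[i + 2]) + abs_diff(a[i + 3], b[i + 3]);
    for (; i < n; i++)
        sum += abs_diff(a[i], b[i]);
    return sum;
}

enum { CPU_HAS_FAST_PATH = 1 };  /* hypothetical capability flag */

typedef struct {
    sad_fn sad;  /* callers always go through this pointer */
} pixel_functions;

/* Bind the table once at start-up according to the detected CPU. */
void pixel_functions_init(unsigned cpu_flags, pixel_functions *pf)
{
    pf->sad = sad_c;
    if (cpu_flags & CPU_HAS_FAST_PATH)
        pf->sad = sad_fast;
}
```

Because callers only see pf->sad, a profiler reports the time under whichever implementation was bound at start-up, which is why the table lists names such as x264_deblock_v8_luma_mmxext rather than a generic name.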
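The x264 timing data is in the table below. Deblocking accounts for about 4.3%, quantization plus dequantization for about 3.3%, and DCT plus IDCT for about 1.1%. Most of the time goes to motion estimation and motion compensation: the many SAD/SATD computations in ME and the six-tap interpolation filter (tap_filter) in MC are the main costs. I did not tally these in detail, but they come to roughly 20%. Because x264 applies algorithmic optimization, code optimization, and MMX, SSE, and SSE2 instruction-level optimization, components that were originally expensive, such as the deblocking filter, have been sped up considerably.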
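The deblocking figure can be reproduced by adding up the Clockticks % values of x264_frame_deblocking_filter and the callees named earlier. The sketch below copies those values from the table; the type profile_row and the function inclusive_percent are made-up names, and x264_clip3 is deliberately left out of the sum because it is shared with other callers.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *function;
    double clockticks_percent;  /* self time, copied from the table */
} profile_row;

/* Add up the self-time percentages of every row whose name is listed. */
double inclusive_percent(const profile_row *rows, size_t n_rows,
                         const char *const *names, size_t n_names)
{
    double total = 0.0;
    for (size_t i = 0; i < n_rows; i++) {
        for (size_t j = 0; j < n_names; j++) {
            if (strcmp(rows[i].function, names[j]) == 0) {
                total += rows[i].clockticks_percent;
                break;
            }
        }
    }
    return total;
}

static const profile_row deblock_rows[] = {
    { "x264_frame_deblocking_filter", 1.76412748 },
    { "deblock_edge",                 0.931227948 },
    { "x264_clip3",                   0.694082943 },  /* shared, not summed */
    { "x264_deblock_v8_luma_mmxext",  0.647810747 },
    { "x264_deblock_h_luma_mmxext",   0.520562207 },
    { "x264_deblock_h_chroma_mmxext", 0.202440858 },
    { "deblock_luma_intra_c",         0.161952687 },
    { "x264_deblock_v_chroma_mmxext", 0.133032564 },
};

static const char *const deblock_names[] = {
    "x264_frame_deblocking_filter", "deblock_edge",
    "x264_deblock_v8_luma_mmxext", "x264_deblock_h_luma_mmxext",
    "x264_deblock_h_chroma_mmxext", "deblock_luma_intra_c",
    "x264_deblock_v_chroma_mmxext",
};
```

With these seven rows the sum is about 4.36%, in line with the figure of about 4.3% quoted above; whatever part of x264_clip3 belongs to deblocking would come on top of that.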
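Now a word on program performance optimization, which can be approached broadly from three directions.
1. Algorithm and structure optimization. The same functionality can be implemented with many different algorithms and methods. For example, full-search and fast motion estimation in H.264 achieve essentially the same coding efficiency, but the fast algorithms can cut processing time by a factor of 10 to 20, so an efficient algorithm should be chosen. Another example is rewriting recursive algorithms in non-recursive form: recursion gives a clear, readable program structure, but it requires many procedure calls and stack saves, so it runs slowly.
2. Compiler optimization. Many compilers now implement fairly strong code optimization. Most of them build on data-flow analysis to perform alias analysis, variable renaming (which removes data dependences and improves pipeline efficiency), constant folding, common subexpression elimination, redundant code elimination, loop inversion, loop unrolling, and other architecture-independent optimizations; GNU gcc is a good example. Compilers also borrow techniques from parallel programming: they perform dependence analysis and transform the program for better locality to raise the cache hit rate. In GCC, options such as -O, -O2, and -O3 select the optimization level for speed, and -Os optimizes for size. A debug build also carries extra debugging instrumentation that slows the program down, as the comparison of the x264 debug and release builds above shows; in the table below, for instance, the runtime checks RTC_CheckStackVars and _RTC_CheckEsp together take about 5.3% of the clockticks. Compiler optimization is static and is done automatically by the compiler, but the compiler can hardly recover the semantics of the program or the overall flow of the algorithm, so manual optimization is still needed to get the most out of the program.
3. Program (code-level) optimization, including:
   a) Use inline functions. Many compilers support the inline keyword, which reduces function-call overhead at the cost of larger code.
   b) Target the platform the program runs on. For architectures such as x86 (Intel), XScale, ARM, and DSPs, rewrite the hot spots and frequently executed loops with the matching assembly instructions (MMX, SSE, Wireless MMX, ARM/Thumb, DSP assembly, and so on), or use dedicated libraries. For example, on Intel CPUs and XScale-based embedded systems (PXA255, PXA270, etc.) the IPP/GPP libraries can improve efficiency.
   c) On DSP systems, which have several parallel processing units, the compiler schedules the code in parallel, so avoid frequent small loops and jumps and unroll loops. Reducing the number of loops, or of inner loops, also keeps the CPU pipeline full and avoids stalls.
   d) In a switch statement, order the case labels by how often they occur. The compiler may generate nested if-else-if code for a switch statement, so ordering the cases by probability can improve efficiency. (In FPGA/CPLD logic devices, a switch described in VHDL is synthesized into several logic elements that operate fully in parallel.)
   e) Reduce the number of function-call parameters.
   f) Reduce expensive floating-point operations, divisions, and similar operations to lower the CPI.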
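The three levels above can each be illustrated with a small C sketch. None of these functions comes from x264; the names gcd_recursive, gcd_iterative, block_sum_naive, block_sum_hoisted, clip3, and avg_round are chosen for illustration only, and an optimizing compiler may well perform the second transformation by itself.

```c
/* Level 1 (algorithm): a recursive routine rewritten as a loop,
   which avoids one call and one stack frame per step. */
unsigned gcd_recursive(unsigned a, unsigned b)
{
    return b == 0 ? a : gcd_recursive(b, a % b);
}

unsigned gcd_iterative(unsigned a, unsigned b)
{
    while (b != 0) {
        unsigned t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Level 2 (compiler): loop-invariant hoisting written out by hand. */
int block_sum_naive(const unsigned char *img, int stride, int w, int h)
{
    int sum = 0;
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            sum += img[y * stride + x];  /* y * stride recomputed per pixel */
    return sum;
}

int block_sum_hoisted(const unsigned char *img, int stride, int w, int h)
{
    int sum = 0;
    for (int y = 0; y < h; y++) {
        const unsigned char *row = img + y * stride;  /* once per row */
        for (int x = 0; x < w; x++)
            sum += row[x];
    }
    return sum;
}

/* Level 3 (code): a small inline helper, and a shift in place of a
   division by two (valid here for non-negative operands). */
static inline int clip3(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

static inline int avg_round(int a, int b)
{
    return (a + b + 1) >> 1;
}
```

Each pair computes the same result; only the amount of work per call differs, which is the point of all three levels.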
Size | Function | Clockticks per Instructions Retired (CPI) | Instructions Retired events | Clockticks events | Clockticks % | Source File |
4917 | refine_subpel | 3.050938338 | 1119000000 | 3414000000 | 6.582219909 | f:\x264-060506\x264-060506\encoder\me.c |
176 | x264_mc_chroma_mmxext | 1.463709677 | 2232000000 | 3267000000 | 6.298802707 | |
21502 | x264_me_search_ref | 2.515923567 | 942000000 | 2370000000 | 4.569379374 | f:\x264-060506\x264-060506\encoder\me.c |
880 | x264_pixel_satd_8x8_sse2 | 1.43551797 | 1419000000 | 2037000000 | 3.927352652 | |
99 | RTC_CheckStackVars | 3.563157895 | 570000000 | 2031000000 | 3.915784603 | |
3296 | x264_pixel_satd_16x16_sse2 | 1.54047619 | 1260000000 | 1941000000 | 3.742263867 | |
237 | get_ref_mmx | 1.725925926 | 810000000 | 1398000000 | 2.695355428 | f:\x264-060506\x264-060506\common\i386\mc-c.c |
1183 | block_residual_write_cabac | 3.15862069 | 435000000 | 1374000000 | 2.649083232 | f:\x264-060506\x264-060506\encoder\cabac.c |
6480 | x264_macroblock_analyse | 24.05555556 | 54000000 | 1299000000 | 2.504482619 | f:\x264-060506\x264-060506\encoder\analyse.c |
272 | x264_pixel_satd_4x4_mmxext | 1.229850746 | 1005000000 | 1236000000 | 2.383018104 | |
80 | x264_pixel_avg_w16_mmxext | 2.096045198 | 531000000 | 1113000000 | 2.145873099 | |
232 | x264_mb_decimate_score | 1.354085603 | 771000000 | 1044000000 | 2.012840534 | f:\x264-060506\x264-060506\encoder\macroblock.c |
64 | x264_pixel_avg_w8_mmxext | 1.756906077 | 543000000 | 954000000 | 1.839319799 | |
2413 | x264_frame_deblocking_filter | 1.703910615 | 537000000 | 915000000 | 1.76412748 | f:\x264-060506\x264-060506\common\frame.c |
2491 | x264_macroblock_cache_save | 2.152173913 | 414000000 | 891000000 | 1.717855284 | f:\x264-060506\x264-060506\common\macroblock.c |
656 | x264_center_filter_mmxext | 1.211864407 | 708000000 | 858000000 | 1.654231014 | |
146 | quant_4x4 | 2.989247312 | 279000000 | 834000000 | 1.607958818 | f:\x264-060506\x264-060506\encoder\macroblock.c |
5930 | x264_macroblock_cache_load | 2.090225564 | 399000000 | 834000000 | 1.607958818 | f:\x264-060506\x264-060506\common\macroblock.c |
206 | x264_cabac_encode_renorm | 2.125984252 | 381000000 | 810000000 | 1.561686622 | f:\x264-060506\x264-060506\common\cabac.c |
83 | array_non_zero_count | 1.191964286 | 672000000 | 801000000 | 1.544334548 | f:\x264-060506\x264-060506\encoder\macroblock.h |
96 | memset | 9.464285714 | 84000000 | 795000000 | 1.532766499 | F:\VS70Builds\3077\vc\crtbld\crt\src\intel\memset.asm |
363 | predict_16x16_p | 1.095435685 | 723000000 | 792000000 | 1.526982474 | f:\x264-060506\x264-060506\common\predict.c |
184 | x264_cabac_encode_decision | 2.371428571 | 315000000 | 747000000 | 1.440222107 | f:\x264-060506\x264-060506\common\cabac.c |
37 | _RTC_CheckEsp | 1.707142857 | 420000000 | 717000000 | 1.382381861 | |
3693 | x264_macroblock_encode | 2.890243902 | 246000000 | 711000000 | 1.370813812 | f:\x264-060506\x264-060506\encoder\macroblock.c |
47 | x264_clip_uint8 | 1.317365269 | 501000000 | 660000000 | 1.272485395 | f:\x264-060506\x264-060506\common\clip1.h |
304 | x264_quant_4x4_core15_mmx | 1.674796748 | 369000000 | 618000000 | 1.191509052 | |
2091 | x264_mb_analyse_intra | 1.844036697 | 327000000 | 603000000 | 1.162588929 | f:\x264-060506\x264-060506\encoder\analyse.c |
1680 | x264_pixel_satd_8x16_sse2 | 1.144508671 | 519000000 | 594000000 | 1.145236856 | |
1696 | x264_pixel_satd_16x8_sse2 | 1.449612403 | 387000000 | 561000000 | 1.081612586 | |
164 | motion_compensation_chroma_mmxext | 1.459677419 | 372000000 | 543000000 | 1.046908439 | f:\x264-060506\x264-060506\common\mc.c |
328 | deblock_edge | 1.594059406 | 303000000 | 483000000 | 0.931227948 | f:\x264-060506\x264-060506\common\frame.c |
363 | predict_8x8c_p | 1.453703704 | 324000000 | 471000000 | 0.90809185 | f:\x264-060506\x264-060506\common\predict.c |
176 | x264_macroblock_cache_mv | 1.662650602 | 249000000 | 414000000 | 0.798195384 | f:\x264-060506\x264-060506\common\macroblock.h |
71 | x264_clip3 | 1.666666667 | 216000000 | 360000000 | 0.694082943 | f:\x264-060506\x264-060506\common\common.h |
121 | x264_macroblock_cache_ref | 2.333333333 | 153000000 | 357000000 | 0.688298918 | f:\x264-060506\x264-060506\common\macroblock.h |
272 | x264_horizontal_filter_mmxext | 1.216494845 | 291000000 | 354000000 | 0.682514894 | |
1104 | x264_pixel_sad_x4_16x16_sse2 | 4.423076923 | 78000000 | 345000000 | 0.66516282 | |
480 | x264_pixel_satd_8x4_sse2 | 1.430379747 | 237000000 | 339000000 | 0.653594771 | |
496 | x264_deblock_v8_luma_mmxext | 1.066666667 | 315000000 | 336000000 | 0.647810747 | |
432 | x264_pixel_sad_x4_8x8_mmxext | 1.671641791 | 201000000 | 336000000 | 0.647810747 | |
288 | x264_pixel_sad_16x16_sse2 | 4.608695652 | 69000000 | 318000000 | 0.6131066 | |
910 | x264_mb_predict_mv | 2.363636364 | 132000000 | 312000000 | 0.601538551 | f:\x264-060506\x264-060506\common\macroblock.c |
106 | bs_write1 | 2.666666667 | 117000000 | 312000000 | 0.601538551 | f:\x264-060506\x264-060506\common\bs.h |
224 | x264_sub4x4_dct_mmx | 1.16091954 | 261000000 | 303000000 | 0.584186477 | |
211 | scan_zigzag_4x4full | 1.672727273 | 165000000 | 276000000 | 0.532130256 | f:\x264-060506\x264-060506\encoder\macroblock.c |
656 | x264_deblock_h_luma_mmxext | 3.214285714 | 84000000 | 270000000 | 0.520562207 | |
227 | predict_16x16_dc | 2.378378378 | 111000000 | 264000000 | 0.508994158 | f:\x264-060506\x264-060506\common\predict.c |
496 | x264_pixel_satd_4x8_mmxext | 1.242857143 | 210000000 | 261000000 | 0.503210134 | |
960 | x264_pixel_ssd_16x16_sse2 | 4.315789474 | 57000000 | 246000000 | 0.474290011 | |
33 | abs | 1.860465116 | 129000000 | 240000000 | 0.462721962 | f:\vs70builds\3077\vc\crtbld\crt\src\abs.c |
864 | x264_pixel_sad_x3_16x16_sse2 | 3.391304348 | 69000000 | 234000000 | 0.451153913 | |
962 | x264_mb_analyse_inter_p8x8 | 1.948717949 | 117000000 | 228000000 | 0.439585864 | f:\x264-060506\x264-060506\encoder\analyse.c |
3064 | x264_macroblock_write_cabac | 2.62962963 | 81000000 | 213000000 | 0.410665741 | f:\x264-060506\x264-060506\encoder\cabac.c |
1209 | x264_mb_encode_8x8_chroma | 2.379310345 | 87000000 | 207000000 | 0.399097692 | f:\x264-060506\x264-060506\encoder\macroblock.c |
829 | memcpy | 11 | 18000000 | 198000000 | 0.381745619 | F:\VS70Builds\3077\vc\crtbld\crt\src\intel\memcpy.asm |
386 | predict_8x8c_dc | 2.52 | 75000000 | 189000000 | 0.364393545 | f:\x264-060506\x264-060506\common\predict.c |
202 | bs_write | 1.909090909 | 99000000 | 189000000 | 0.364393545 | f:\x264-060506\x264-060506\common\bs.h |
352 | x264_pixel_sad_x3_8x8_mmxext | 2.172413793 | 87000000 | 189000000 | 0.364393545 | |
144 | x264_pixel_sad_8x8_mmxext | 2.384615385 | 78000000 | 186000000 | 0.358609521 | |
156 | predict_16x16_h | 2 | 90000000 | 180000000 | 0.347041471 | f:\x264-060506\x264-060506\common\predict.c |
178 | predict_16x16_v | 2.52173913 | 69000000 | 174000000 | 0.335473422 | f:\x264-060506\x264-060506\common\predict.c |
128 | x264_mc_copy_w16_mmx | 9.666666667 | 18000000 | 174000000 | 0.335473422 | |
405 | x264_cabac_mb_mvd_cpn | 2.192307692 | 78000000 | 171000000 | 0.329689398 | f:\x264-060506\x264-060506\encoder\cabac.c |
161 | x264_cabac_putbit | 1.4 | 120000000 | 168000000 | 0.323905373 | f:\x264-060506\x264-060506\common\cabac.c |
304 | x264_dequant_4x4_mmx | 2.545454545 | 66000000 | 168000000 | 0.323905373 | |
592 | x264_pixel_sad_x4_16x8_sse2 | 2.291666667 | 72000000 | 165000000 | 0.318121349 | |
103 | x264_median | 2.6 | 60000000 | 156000000 | 0.300769275 | f:\x264-060506\x264-060506\common\common.h |
398 | predict_4x4_ddl | 1.5625 | 96000000 | 150000000 | 0.289201226 | f:\x264-060506\x264-060506\common\predict.c |
272 | x264_add4x4_idct_mmx | 1.139534884 | 129000000 | 147000000 | 0.283417202 | |
418 | x264_cabac_mb_cbp_luma | 2.666666667 | 54000000 | 144000000 | 0.277633177 | f:\x264-060506\x264-060506\encoder\cabac.c |
414 | predict_4x4_ddr | 2.285714286 | 63000000 | 144000000 | 0.277633177 | f:\x264-060506\x264-060506\common\predict.c |
405 | predict_4x4_vl | 1.777777778 | 81000000 | 144000000 | 0.277633177 | f:\x264-060506\x264-060506\common\predict.c |
1455 | x264_mb_predict_mv_ref16x16 | 3.692307692 | 39000000 | 144000000 | 0.277633177 | f:\x264-060506\x264-060506\common\macroblock.c |
1181 | x264_mb_analyse_inter_p16x16 | 4.6 | 30000000 | 138000000 | 0.266065128 | f:\x264-060506\x264-060506\encoder\analyse.c |
176 | x264_macroblock_cache_mvd | 1.769230769 | 78000000 | 138000000 | 0.266065128 | f:\x264-060506\x264-060506\common\macroblock.h |
816 | x264_pixel_sad_x4_8x16_mmxext | 1.769230769 | 78000000 | 138000000 | 0.266065128 | |
199 | scan_zigzag_4x4 | 2.045454545 | 66000000 | 135000000 | 0.260281104 | f:\x264-060506\x264-060506\encoder\macroblock.c |
446 | predict_4x4_mode_available | 2.25 | 60000000 | 135000000 | 0.260281104 | f:\x264-060506\x264-060506\encoder\analyse.c |
1148 | x264_mb_analyse_inter_p16x8 | 3.142857143 | 42000000 | 132000000 | 0.254497079 | f:\x264-060506\x264-060506\encoder\analyse.c |
1746 | x264_mb_analyse_init | 8.2 | 15000000 | 123000000 | 0.237145005 | f:\x264-060506\x264-060506\encoder\analyse.c |
511 | x264_mb_analyse_intra_chroma | 2.733333333 | 45000000 | 123000000 | 0.237145005 | f:\x264-060506\x264-060506\encoder\analyse.c |
425 | predict_4x4_hd | 1.28125 | 96000000 | 123000000 | 0.237145005 | f:\x264-060506\x264-060506\common\predict.c |
425 | predict_4x4_vr | 1.413793103 | 87000000 | 123000000 | 0.237145005 | f:\x264-060506\x264-060506\common\predict.c |
122 | predict_8x8c_h | 1.952380952 | 63000000 | 123000000 | 0.237145005 | f:\x264-060506\x264-060506\common\predict.c |
425 | x264_mb_encode_i4x4 | 2.105263158 | 57000000 | 120000000 | 0.231360981 | f:\x264-060506\x264-060506\encoder\macroblock.c |
464 | x264_pixel_sad_x3_16x8_sse2 | 5 | 24000000 | 120000000 | 0.231360981 | |
672 | x264_pixel_sad_x3_8x16_mmxext | 2.666666667 | 45000000 | 120000000 | 0.231360981 | |
297 | predict_4x4_hu | 1.772727273 | 66000000 | 117000000 | 0.225576956 | f:\x264-060506\x264-060506\common\predict.c |
120 | predict_8x8c_v | 3.083333333 | 36000000 | 111000000 | 0.214008907 | f:\x264-060506\x264-060506\common\predict.c |
464 | x264_deblock_h_chroma_mmxext | 1.166666667 | 90000000 | 105000000 | 0.202440858 | |
240 | x264_pixel_sad_8x16_mmxext | 1.888888889 | 54000000 | 102000000 | 0.196656834 | |
1104 | x264_mb_analyse_inter_p8x16 | 3 | 33000000 | 99000000 | 0.190872809 | f:\x264-060506\x264-060506\encoder\analyse.c |
176 | x264_pixel_sad_16x8_sse2 | 3.666666667 | 27000000 | 99000000 | 0.190872809 | |
194 | x264_cabac_encode_bypass | 1.192307692 | 78000000 | 93000000 | 0.17930476 | f:\x264-060506\x264-060506\common\cabac.c |
836 | x264_cabac_mb_cbf_ctxidxinc | 1.875 | 48000000 | 90000000 | 0.173520736 | f:\x264-060506\x264-060506\encoder\cabac.c |
80 | x264_mc_copy_w8_mmx | 3 | 30000000 | 90000000 | 0.173520736 | |
1385 | x264_slice_write | 4.833333333 | 18000000 | 87000000 | 0.167736711 | f:\x264-060506\x264-060506\encoder\encoder.c |
680 | deblock_luma_intra_c | 2.153846154 | 39000000 | 84000000 | 0.161952687 | f:\x264-060506\x264-060506\common\frame.c |
503 | x264_mb_mc_0xywh | 1.857142857 | 42000000 | 78000000 | 0.150384638 | f:\x264-060506\x264-060506\common\macroblock.c |
134 | predict_4x4_dc | 5 | 15000000 | 75000000 | 0.144600613 | f:\x264-060506\x264-060506\common\predict.c |
577 | x264_mb_predict_mv_16x16 | 2.5 | 30000000 | 75000000 | 0.144600613 | f:\x264-060506\x264-060506\common\macroblock.c |
324 | plane_expand_border | 6.25 | 12000000 | 75000000 | 0.144600613 | f:\x264-060506\x264-060506\common\frame.c |
272 | x264_deblock_v_chroma_mmxext | 1.642857143 | 42000000 | 69000000 | 0.133032564 | |
123 | x264_sub8x8_dct_mmx | 1.111111111 | 54000000 | 60000000 | 0.11568049 | f:\x264-060506\x264-060506\common\i386\dct-c.c |
1359 | x264_macroblock_probe_skip | 2.714285714 | 21000000 | 57000000 | 0.109896466 | f:\x264-060506\x264-060506\encoder\macroblock.c |
305 | x264_cabac_mb_mvd | 3.4 | 15000000 | 51000000 | 0.098328417 | f:\x264-060506\x264-060506\encoder\cabac.c |
1880 | x264_analyse_update_cache | 4 | 12000000 | 48000000 | 0.092544392 | f:\x264-060506\x264-060506\encoder\analyse.c |
64 | array_non_zero | 4.666666667 | 9000000 | 42000000 | 0.080976343 | f:\x264-060506\x264-060506\encoder\macroblock.h |
271 | x264_mb_dequant_2x2_dc | 3.5 | 12000000 | 42000000 | 0.080976343 | f:\x264-060506\x264-060506\common\quant.c |
266 | mc_luma_mmx | 2.166666667 | 18000000 | 39000000 | 0.075192319 | f:\x264-060506\x264-060506\common\i386\mc-c.c |
199 | dct2x2dc | 3.25 | 12000000 | 39000000 | 0.075192319 | f:\x264-060506\x264-060506\common\dct.c |
149 | quant_2x2_dc | 1 | 33000000 | 33000000 | 0.06362427 | f:\x264-060506\x264-060506\encoder\macroblock.c |
320 | x264_cabac_mb_cbp_chroma | 2.75 | 12000000 | 33000000 | 0.06362427 | f:\x264-060506\x264-060506\encoder\cabac.c |
61 | _alloca_probe | | 0 | 33000000 | 0.06362427 | F:\VS70Builds\3077\vc\crtbld\crt\src\intel\chkstk.asm |
38 | x264_me_search | 2.5 | 12000000 | 30000000 | 0.057840245 | f:\x264-060506\x264-060506\encoder\analyse.c |
145 | x264_mb_predict_intra4x4_mode | 1 | 30000000 | 30000000 | 0.057840245 | f:\x264-060506\x264-060506\common\macroblock.c |
194 | x264_mb_predict_mv_pskip | | 0 | 30000000 | 0.057840245 | f:\x264-060506\x264-060506\common\macroblock.c |
279 | x264_nal_encode | 2 | 15000000 | 30000000 | 0.057840245 | f:\x264-060506\x264-060506\common\common.c |
172 | x264_cabac_mb_skip | 3 | 9000000 | 27000000 | 0.052056221 | f:\x264-060506\x264-060506\encoder\cabac.c |
59 | predict_4x4_v | 4.5 | 6000000 | 27000000 | 0.052056221 | f:\x264-060506\x264-060506\common\predict.c |
1253 | x264_mb_mc | 3 | 9000000 | 27000000 | 0.052056221 | f:\x264-060506\x264-060506\common\macroblock.c |
74 | x264_deblock_v_luma_mmxext | 1.125 | 24000000 | 27000000 | 0.052056221 | f:\x264-060506\x264-060506\common\frame.c |
227 | predict_8x8chroma_mode_available | 4 | 6000000 | 24000000 | 0.046272196 | f:\x264-060506\x264-060506\encoder\analyse.c |
47 | bs_size_te | | 0 | 24000000 | 0.046272196 | f:\x264-060506\x264-060506\common\bs.h |
1131 | x264_cabac_mb_type | 1 | 21000000 | 21000000 | 0.040488172 | f:\x264-060506\x264-060506\encoder\cabac.c |
80 | predict_4x4_h | 3.5 | 6000000 | 21000000 | 0.040488172 | f:\x264-060506\x264-060506\common\predict.c |
17 | x264_ratecontrol_qp | | 0 | 18000000 | 0.034704147 | f:\x264-060506\x264-060506\encoder\ratecontrol.c |
55 | scan_zigzag_2x2_dc | 2 | 9000000 | 18000000 | 0.034704147 | f:\x264-060506\x264-060506\encoder\macroblock.c |
115 | x264_macroblock_encode_skip | 2 | 9000000 | 18000000 | 0.034704147 | f:\x264-060506\x264-060506\encoder\macroblock.c |
279 | x264_mb_analyse_transform | 6 | 3000000 | 18000000 | 0.034704147 | f:\x264-060506\x264-060506\encoder\analyse.c |
138 | x264_sub16x16_dct_mmx | 1.5 | 12000000 | 18000000 | 0.034704147 | f:\x264-060506\x264-060506\common\i386\dct-c.c |
125 | bs_size_ue | 1 | 18000000 | 18000000 | 0.034704147 | f:\x264-060506\x264-060506\common\bs.h |
142 | x264_me_refine_qpel | 2.5 | 6000000 | 15000000 | 0.028920123 | f:\x264-060506\x264-060506\encoder\me.c |
233 | x264_mb_analyse_load_costs | 1.666666667 | 9000000 | 15000000 | 0.028920123 | f:\x264-060506\x264-060506\encoder\analyse.c |
14 | _security_check_cookie | 2.5 | 6000000 | 15000000 | 0.028920123 | f:\vs70builds\3077\vc\crtbld\crt\src\secchk.c |
444 | x264_cabac_mb8x8_mvd | 4 | 3000000 | 12000000 | 0.023136098 | f:\x264-060506\x264-060506\encoder\cabac.c |
312 | x264_cabac_mb_qp_delta | 4 | 3000000 | 12000000 | 0.023136098 | f:\x264-060506\x264-060506\encoder\cabac.c |
207 | predict_16x16_dc_top | 2 | 6000000 | 12000000 | 0.023136098 | f:\x264-060506\x264-060506\common\predict.c |
213 | predict_8x8c_dc_top | | 0 | 12000000 | 0.023136098 | f:\x264-060506\x264-060506\common\predict.c |
39 | x264_cabac_pos | 4 | 3000000 | 12000000 | 0.023136098 | f:\x264-060506\x264-060506\common\cabac.h |
368 | x264_deblock_h_chroma_intra_mmxext | 1.333333333 | 9000000 | 12000000 | 0.023136098 | |
5058 | x264_encoder_encode | | 0 | 9000000 | 0.017352074 | f:\x264-060506\x264-060506\encoder\encoder.c |
768 | x264_mb_cache_mv_p8x8 | | 0 | 9000000 | 0.017352074 | f:\x264-060506\x264-060506\encoder\analyse.c |
211 | predict_16x16_dc_left | 1.5 | 6000000 | 9000000 | 0.017352074 | f:\x264-060506\x264-060506\common\predict.c |
299 | predict_8x8c_dc_left | 0.75 | 12000000 | 9000000 | 0.017352074 | f:\x264-060506\x264-060506\common\predict.c |
91 | x264_cabac_encode_terminal | | 0 | 9000000 | 0.017352074 | f:\x264-060506\x264-060506\common\cabac.c |
104 | _aulldiv | 3 | 3000000 | 9000000 | 0.017352074 | F:\VS70Builds\3077\vc\crtbld\crt\src\intel\ulldiv.asm |
341 | x264_macroblock_encode_pskip | | 0 | 6000000 | 0.011568049 | f:\x264-060506\x264-060506\encoder\macroblock.c |
130 | x264_psnr | | 0 | 6000000 | 0.011568049 | f:\x264-060506\x264-060506\encoder\encoder.c |
950 | x264_slice_header_write | | 0 | 6000000 | 0.011568049 | f:\x264-060506\x264-060506\encoder\encoder.c |
161 | x264_cabac_encode_ue_bypass | 1 | 6000000 | 6000000 | 0.011568049 | f:\x264-060506\x264-060506\encoder\cabac.c |
100 | x264_add8x8_idct_mmx | 0.333333333 | 18000000 | 6000000 | 0.011568049 | f:\x264-060506\x264-060506\common\i386\dct-c.c |
69 | plane_copy | | 0 | 6000000 | 0.011568049 | f:\x264-060506\x264-060506\common\csp.c |
142 | x264_cabac_context_init | 2 | 3000000 | 6000000 | 0.011568049 | f:\x264-060506\x264-060506\common\cabac.c |
3526 | _output | | 0 | 6000000 | 0.011568049 | f:\vs70builds\3077\vc\crtbld\crt\src\output.c |
52 | _allmul | 2 | 3000000 | 6000000 | 0.011568049 | F:\VS70Builds\3077\vc\crtbld\crt\src\intel\llmul.asm |
173 | __add_12 | | 0 | 6000000 | 0.011568049 | |
600 | _log_pentium4 | | 0 | 6000000 | 0.011568049 | |
96 | x264_quant_2x2_dc_core16_mmxext | | 0 | 6000000 | 0.011568049 | |
802 | x264_ratecontrol_start | | 0 | 3000000 | 0.005784025 | f:\x264-060506\x264-060506\encoder\ratecontrol.c |
1051 | x264_mb_encode_i16x16 | 1 | 3000000 | 3000000 | 0.005784025 | f:\x264-060506\x264-060506\encoder\macroblock.c |
139 | x264_nal_end | | 0 | 3000000 | 0.005784025 | f:\x264-060506\x264-060506\encoder\encoder.c |
426 | x264_slice_init | | 0 | 3000000 | 0.005784025 | f:\x264-060506\x264-060506\encoder\encoder.c |
227 | predict_16x16_mode_available | 1 | 3000000 | 3000000 | 0.005784025 | f:\x264-060506\x264-060506\encoder\analyse.c |
1625 | x264_cqm_init | | 0 | 3000000 | 0.005784025 | f:\x264-060506\x264-060506\common\set.c |
430 | x264_mb_dequant_4x4_dc | | 0 | 3000000 | 0.005784025 | f:\x264-060506\x264-060506\common\quant.c |
137 | predict_16x16_dc_128 | | 0 | 3000000 | 0.005784025 | f:\x264-060506\x264-060506\common\predict.c |
101 | predict_8x8c_dc_128 | | 0 | 3000000 | 0.005784025 | f:\x264-060506\x264-060506\common\predict.c |
575 | x264_macroblock_slice_init | | 0 | 3000000 | 0.005784025 | f:\x264-060506\x264-060506\common\macroblock.c |
929 | x264_mb_mc_8x8 | | 0 | 3000000 | 0.005784025 | f:\x264-060506\x264-060506\common\macroblock.c |
338 | i420_to_i420 | | 0 | 3000000 | 0.005784025 | f:\x264-060506\x264-060506\common\csp.c |
29 | bs_pos | 1 | 3000000 | 3000000 | 0.005784025 | f:\x264-060506\x264-060506\common\bs.h |
172 | bs_write_ue | | 0 | 3000000 | 0.005784025 | f:\x264-060506\x264-060506\common\bs.h |
1100 | _read | | 0 | 3000000 | 0.005784025 | f:\vs70builds\3077\vc\crtbld\crt\src\read.c |
1242 | I10_OUTPUT | 1 | 3000000 | 3000000 | 0.005784025 | |
62 | __addl | | 0 | 3000000 | 0.005784025 | |
332 | __dtold | | 0 | 3000000 | 0.005784025 | |
1007 | __ld12mul | 0.5 | 6000000 | 3000000 | 0.005784025 | |
389 | _cftof | | 0 | 3000000 | 0.005784025 | |
675 | x264_pixel_ssd_wxh | 0 | 6000000 | 0 | 0 | f:\x264-060506\x264-060506\common\pixel.c |
108 | write_string | 0 | 3000000 | 0 | 0 | f:\vs70builds\3077\vc\crtbld\crt\src\output.c |
467 | _filbuf | 0 | 3000000 | 0 | 0 | f:\vs70builds\3077\vc\crtbld\crt\src\_filbuf.c |
|
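3. In set.c, set sps->b_vui = 0; so that VUI information does not appear in the bitstream, and set sps->b_frame_mbs_only = 1; so that every picture is coded as a frame.
4. In cavlc.c, comment out the branches else if( i_mb_type == B_8x8 ), else if( i_mb_type!= B_DIRECT ), else if( i_mb_type == B_DIRECT ), else if( i_mb_type == B_8x8 ), and the related code. The encoder targets the Baseline profile, which has no B-frames.
5. Remove the if(!var)... check in the CHECKED_MALLOC macro in common.h (the check for whether the memory allocation succeeded).
6. In x264_ratecontrol_new in ratecontrol_en.c, comment out if( h->param.rc.i_rc_method == X264_RC_CRF).., if( h->param.rc.b_stat_read )..., and the related code, because i_rc_method == X264_RC_NONE has already been set. The i_rc_method parameter selects the rate-control method, here CQP.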
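x264 Optimization (Part 2)
1. Remove the assert() statements.
2. Remove the x264_param_parse() function in common.c together with its related definitions and calls; it is mainly used to check whether parameter values are valid. Assign X264_RC_CQP to i_rc_method directly.
3. In analyse.c, remove arrays such as static const int i_mb_b_cost_table[19] (used only for B-frames) and the statements that begin with if( h->sh.i_type == SLICE_TYPE_B )....
4. In analyse.c, remove the five functions x264_mb_analyse_inter_direct(), x264_mb_analyse_inter_b16x16(), x264_mb_analyse_inter_b8x8, x264_mb_analyse_inter_b16x8, and x264_mb_analyse_inter_b8x16(). They perform inter prediction for B-frames and are not needed.
5. Remove the statements that contain h->sh.i_type == SLICE_TYPE_B.
6. Remove loops of the form for(i_list = 0;i_list<(h->sh.i_type == SLICE_TYPE_B ? 2 : 1 );i_list++ ). Without B-frames the body runs only once, so no loop is needed, but i_list = 0; must be added to give the variable its initial value.
7. Remove the x264_mb_analyse_b_rd() and x264_refine_bidir() functions in analyse.c.
8. Remove the uint8_t mb_type_b_to_golomb[3][9] and sub_mb_type_b_to_golomb[13] arrays in cavlc_en.c.
9. Remove parse_enum in common.c.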
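Item 6 of the list above can be pictured with a small sketch. The functions and enum values here are hypothetical and only mimic the shape of the loop; they are not taken from the x264 source.

```c
enum { SLICE_TYPE_P = 0, SLICE_TYPE_B = 1 };  /* illustrative values only */

/* Generic form: one pass for P slices, two passes for B slices. */
int sum_list_costs_generic(int slice_type, const int cost[2])
{
    int total = 0;
    for (int i_list = 0; i_list < (slice_type == SLICE_TYPE_B ? 2 : 1); i_list++)
        total += cost[i_list];
    return total;
}

/* P-only form: the loop is gone, but i_list still needs its initial
   value because the former loop body refers to it. */
int sum_list_costs_p_only(const int cost[2])
{
    int i_list = 0;
    return cost[i_list];
}
```

For P slices both functions return the same value; the second one simply drops the comparison and the loop counter update that can never do anything once B-frames are disabled.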
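x264 Optimization (Part 3)
1. Remove the three pieces of parse_zones-related code in ratecontrol.c.
2. Remove the call to x264_ratecontrol_summary() in the x264_encoder_close() function in encoder.c, and the corresponding code in ratecontrol.c (because that function uses if(rc->b_abr)...).
3. Remove the rate_estimate_qscale() and clip_qscale() functions.
4. Remove int x264_me_refine_bidir( x264_t *h, x264_me_t *m0, x264_me_t *m1,int i_weight ) (in me_en.c) and its related functions. It is never called in this program and, because it is inlined, takes up a lot of space.
5. In bs.h, change int data to short, except where a function returns int.
6. Remove the body of if( analysis.i_mbrd >= 2 && h->mb.i_type != I_PCM ).
7. Delete the COST_MV_RD macro in me.c.
8. In analyse.c, delete the x264_intra_rd_refine, x264_intra_rd, x264_mb_analyse_p_rd(), and x264_mb_analyse_transform_rd() functions (consider deleting every function or variable whose name ends in _rd).
9. Delete x264_rd_cost_mb, x264_rd_cost_subpart, x264_rd_cost_part, uint64_t x264_rd_cost_i8x8, x264_rd_cost_i4x4, and x264_rd_cost_i8x8_chroma.
10. In the COST_BIMV_SATD macro in me.c, delete the contents of if(rd)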
x264 Optimization (Part 4)
1. Delete x264_macroblock_encode_p8x8, x264_mb_analyse_inter_p8x8_mixed_ref, x264_mb_cache_mv_b8x8, sub16x16_dct8, sub8x8_dct8, and x264_psy_trellis_init.
2. Delete x264_mb_predict_mv_direct16x16 and static int x264_mb_predict_mv_direct16x16_spatial.
3. Delete x264_mb_mc_01xywh (it probably computes against the backward reference frame, or is otherwise related to B-frames), x264_macroblock_bipred_init, x264_mb_load_mv_direct8x8, and x264_mb_mc_1xywh.
4. Delete the x264_ratecontrol_mb, predict_row_size, and predict_size functions.
5. Delete the x264_predict_8x8_filter, scaling_list_write, and transpose functions.
6. Delete quant_8x8 and dequant_8x8; x264_cqm_parse_file and x264_cqm_parse_jmlist in set.c; and x264_encoder_headers and x264_encoder_reconfig in common.c.
7. Delete x264_frame_expand_border_mod16() in frame.c and x264_denoise_dct() in macroblock_en.c.
8. Delete x264_mb_transform_8x8_allowed, x264_mb_analyse_transform, x264_cabac_mb_transform_size, x264_psy_trellis_init, and x264_mb_cache_fenc_satd (functions related to RD), and remove everything related to b_transform_8x8. i_mb_c
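x264 Optimization (Part 5)
1. Delete the loop related to dequant8_mf. Remove h->mb.pic.p_integral and h->sh.i_num_ref_idx_l1_active, the assignment (m)->integral = &h->mb.pic.p_integral[list][ref][(xoff)+(yoff)*(m)->i_stride[0]], and uint16_t *p_integral[2][16]; in common.h.
2. Delete void x264_rdo_init, static ALWAYS_INLINE int quant_trellis_cabac(), the trellis_node_t struct, and x264_cabac_size_decision_noup2.
3. Delete the files cabac.c, cabac1.c, and cabac.h.
4. Delete x264_macroblock_cache_skip.
5. Remove the CPU-related code.
6. Remove everything related to RDO (rate-distortion optimization).
7. Remove the SSIM-related code. SSIM (structural similarity index) is a metric of the similarity between two images; the higher the value the better, with a maximum of 1. It is widely used in image processing, particularly in image denoising, where it is commonly reported to outperform SNR (signal-to-noise ratio) and PSNR (peak signal-to-noise ratio) as a measure of image similarity.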
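For reference, the usual textbook definitions of the two metrics named in item 7 are given below. The symbols are not from the original text: $\mu_x, \mu_y$ are the local means of the two images, $\sigma_x^2, \sigma_y^2$ their variances, $\sigma_{xy}$ their covariance, $C_1, C_2$ small stabilizing constants, MAX the largest sample value (255 for 8-bit video), and MSE the mean squared error between the images.

```latex
\mathrm{SSIM}(x, y) =
  \frac{(2\mu_x\mu_y + C_1)\,(2\sigma_{xy} + C_2)}
       {(\mu_x^2 + \mu_y^2 + C_1)\,(\sigma_x^2 + \sigma_y^2 + C_2)}
\qquad
\mathrm{PSNR} = 10 \log_{10} \frac{\mathrm{MAX}^2}{\mathrm{MSE}}
```

SSIM equals 1 only when the two images are identical, which is the maximum mentioned in item 7.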
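x264 Optimization (Part 6)
1. Delete the get_diff_limited_q, get_qscale, and parse_zone functions.
2. Remove the structs and code related to zones.
3. Remove variables such as b_have_lowres. Once such a variable has been given its initial value, the outcome of every later if(variable) test is predictable; if the value is always 0, the test can be removed. Also remove some of the if tests related to i_aq_mode.
4. Two-pass (multi-pass) rate control: int b_stat_write; Enable stat writing in psz_stat_out char *psz_stat_out; int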
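x264 Optimization (Part 7)
1. Remove the PSNR computation.
2. Remove the x264_rc_analyse_slice and x264_lowres_context_init functions.
3. DIA, the diamond search, is the fastest of the search algorithms, so keep only the diamond search and delete the others.
4. Remove i_rd16x16bi, i_rd16x16direct, i_rd16x16, i_rd16x8bi, int i_rd8x16bi, and i_rd8x8bi.
5. Delete the x264_slicetype_mb_cost, x264_slicetype_frame_cost, x264_slicetype_path, and x264_slicetype_path_search functions.
6. Delete ssd_mb, ssd_plane, sum_sa8d, and sum_satd.
7. Delete the matroska.h and matroska.c files.
8. Delete the gcd function, and delete the y4m-, mkv-, and thread-related parts of muxer.h and muxer.c, because the only input here is raw YUV data and the encoded output is a raw .264 stream.
9. Remove bs_write32, bs_align_0, and bs_align_1 from bs.h, and x264_predictor_difference from common.h.
10. Remove the conditions, assignments, and other statements that involve SLICE_TYPE_B, B_SKIP, B_BI_BI, B_BI_L1, B_BI_L0, B_L1_BI, B_L1_L1, B_L1_L0, B_L0_L1, B_L0_L0, or B_DIRECT.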
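The diamond search kept in item 3 of the list above works roughly as follows. This is a minimal sketch of a small-diamond search, not x264's implementation: the names mv_cost_fn, mv_result, diamond_search, and l1_cost are invented, and the toy cost function stands in for the SAD an encoder would compute for each candidate vector.

```c
#include <stdlib.h>

/* Cost callback: matching cost of candidate motion vector (mx, my).
   In an encoder this would be a SAD; ctx carries the callback's data. */
typedef int (*mv_cost_fn)(int mx, int my, const void *ctx);

typedef struct {
    int mx;
    int my;
    int cost;
} mv_result;

/* Small-diamond search: test the four neighbours of the current best
   vector, step to the cheapest one, stop when none improves the cost. */
mv_result diamond_search(mv_cost_fn cost, const void *ctx,
                         int start_mx, int start_my, int range, int max_steps)
{
    static const int dx[4] = { 0, 0, -1, 1 };
    static const int dy[4] = { -1, 1, 0, 0 };
    mv_result best = { start_mx, start_my, cost(start_mx, start_my, ctx) };

    for (int step = 0; step < max_steps; step++) {
        int best_dir = -1;
        for (int d = 0; d < 4; d++) {
            int mx = best.mx + dx[d];
            int my = best.my + dy[d];
            if (abs(mx - start_mx) > range || abs(my - start_my) > range)
                continue;  /* stay inside the search window */
            int c = cost(mx, my, ctx);
            if (c < best.cost) {
                best.cost = c;
                best_dir = d;
            }
        }
        if (best_dir < 0)
            break;  /* the centre is a local minimum */
        best.mx += dx[best_dir];
        best.my += dy[best_dir];
    }
    return best;
}

/* Toy cost for demonstration: L1 distance to a target vector {x, y}. */
int l1_cost(int mx, int my, const void *ctx)
{
    const int *target = ctx;
    return abs(mx - target[0]) + abs(my - target[1]);
}
```

Each step evaluates at most four candidates, which is why the method is cheap; the price is that it can stop in a local minimum when the cost surface is not smooth.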
x264 Optimization (Part 8)
1. The enum values D_L1_4x4 = 4, D_L1_8x4 = 5, D_L1_4x8 = 6, D_L1_8x8 = 7, D_BI_4x4 = 8, D_BI_8x4 = 9, D_BI_4x8 = 10, D_BI_8x8 = 11, and D_DIRECT = 12 can be deleted.
2. Delete x264_mb_partition_count_table[] and x264_pixel_ssd_wxh().
3. Remove the i_mbrd variable from x264_mb_analysis_t
x264 Optimization (Part 9)
1. In analyse.c, remove the WEIGHTED_AVG macro, the scenecut() function, and the x264_zigzag_scan2 array.
2. Remove the b_bframe_pyramid, i_bframe, X264_TYPE_B, and X264_TYPE_BREF variables and the related code.
3. In the Encode function, remove the parse_qpfile() call inside the loop for( i_frame = 0, i_file = 0;(i_frame < i_frame_total || i_frame_total == 0); ).
4. Remove the x264_thread_sync_context() function in encode.c.
5. Remove the unnecessary macros from stdint.h.
6. In common.h, remove the dist_scale_factor, bipred_weight, map_col_to_list0_buf, and map_col_to_list0 arrays, and delete b_direct_auto_read, b_direct_auto_write, b_direct_spatial_mv_pred, b_sp_for_swidth, i_qs_delta, i_delay, fenc_dct8, fenc_dct4, fenc_satd, fenc_satd_sum, fenc_sa8d, fenc_sa8d_sum, i_neighbour_transform_size, i_neighbour_interlaced, i_cbp_top, i_cbp_left, i_last_dqp, i_misc_bits, i_direct_score, i_ssd_global, i_ssd, f_slice_qp, i_consecutive_bframes, and i_direct_frames. Remove the if( h->frames.i_input <= h->frames.i_delay ) block. In Encode
x264 Optimization (Part 10)
1. Remove unused local variables.
2. Based on the CCS debugging results, remove the i_update_interval, opterr, and print_errors variables; the i_yuv_size, lambda2_tab[2][52], LAMBDA_BITS, i_left_type, and i_top_type variables; the def_dequant8 and def_quant8 arrays; the square1, hex2, and mod6m1 arrays; quant8_scale, dequant8_scale, and quant8_scan; the x264_mb_cache_mv_b8x16() function; and predict_8x8_vl(), predict_8x8_hd(), predict_8x8_vr(), munge_cavlc_nnz(), restore_cavlc_nnz_row(), munge_cavlc_nnz_row(), x264_atoi(), and x264_atof().
3. In ratecontrol.c, remove the expected_bits_sum, wanted_bits_window, short_term_cplxsum, short_term_cplxcount, rate_factor_constant, last_satd, last_rceq, cplxr_sum, and cbr_decay variables, and the qscale2bits() and qscale2qp() functions.
4. Remove x264_frame_t *last_nonb;.
5. Delete the slicetype.c file.
Original source: "X264性能优化" (x264 Performance Optimization), liuchen1206's column, CSDN blog