heyfluke / libyuv

Automatically exported from code.google.com/p/libyuv

linux top bottlenecks

GoogleCodeExporter opened this issue

Investigate top bottlenecks

LIBYUV_DISABLE_AVX2=1 LIBYUV_WIDTH=1280 LIBYUV_HEIGHT=720 LIBYUV_REPEAT=1000 perf record out/Release/libyuv_unittest --gtest_filter=*
perf report

 13.81%  libyuv_unittest  libyuv_unittest      [.] libyuv::libyuvTest_ScaleTestRoundToByte_Test::T
 13.81%  libyuv_unittest  libyuv_unittest      [.] libyuv::libyuvTest_TestRoundToByte_Test::TestBo
  4.94%  libyuv_unittest  libyuv_unittest      [.] ScaleAddRow_C
  4.07%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBRowDownEvenBox_SSE2
  3.63%  libyuv_unittest  libyuv_unittest      [.] InterpolateRow_SSSE3
  3.57%  libyuv_unittest  libyuv_unittest      [.] I422ToARGBMatrixRow_SSSE3
  3.06%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBFilterCols_SSSE3
  3.02%  libyuv_unittest  libyuv_unittest      [.] ScaleFilterCols_SSSE3
  2.63%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBRowDown2Box_SSE2
  2.58%  libyuv_unittest  libyuv_unittest      [.] libyuv::ScaleARGB(unsigned char const*, int, in
  2.57%  libyuv_unittest  libyuv_unittest      [.] ARGBToRGB565DitherRow_C
  2.45%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBCols_SSE2
  2.44%  libyuv_unittest  libc-2.19.so         [.] __random_r
  2.23%  libyuv_unittest  libyuv_unittest      [.] CopyRow_ERMS
  1.64%  libyuv_unittest  libyuv_unittest      [.] I422ToABGRMatrixRow_SSSE3
  1.46%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBRowDownEven_SSE2
  1.29%  libyuv_unittest  libyuv_unittest      [.] FixedDiv_X86
  1.26%  libyuv_unittest  libyuv_unittest      [.] libyuv::ScaleAddCols1_C(int, int, int, int, uns
  1.24%  libyuv_unittest  libyuv_unittest      [.] ARGBShuffleRow_SSSE3
  1.21%  libyuv_unittest  libyuv_unittest      [.] CumulativeSumToAverageRow_SSE2
  1.14%  libyuv_unittest  libc-2.19.so         [.] __random
  1.08%  libyuv_unittest  libyuv_unittest      [.] ScaleCols_C
  0.99%  libyuv_unittest  libyuv_unittest      [.] ARGBToYRow_SSSE3
  0.75%  libyuv_unittest  libyuv_unittest      [.] ComputeCumulativeSumRow_SSE2
  0.75%  libyuv_unittest  libc-2.19.so         [.] _int_malloc

Original issue reported on code.google.com by fbarch...@google.com on 16 Sep 2015 at 11:36

r1483 removes the redundant scale rounding test.

The rounding test is still the top bottleneck on Linux, though.

 16.52%  libyuv_unittest  libyuv_unittest      [.] libyuv::libyuvTest_TestRoundToByte_Test::TestBody()

Original comment by fbarch...@google.com on 17 Sep 2015 at 5:28

The following is a complete list of the C functions that show up in the profile (there should be none):

LIBYUV_FLAGS=-1 LIBYUV_WIDTH=1280 LIBYUV_HEIGHT=720 LIBYUV_REPEAT=999 perf record out/Release/libyuv_unittest --gtest_filter=*
perf report >out.txt
grep _C out.txt

     5.88%  libyuv_unittest  libyuv_unittest      [.] ScaleAddRow_C
     3.08%  libyuv_unittest  libyuv_unittest      [.] ARGBToRGB565DitherRow_C
     1.38%  libyuv_unittest  libyuv_unittest      [.] libyuv::ScaleAddCols1_C(int, int, int, int, unsigned short const*, unsigned char*)
     1.28%  libyuv_unittest  libyuv_unittest      [.] ScaleCols_C
     0.52%  libyuv_unittest  libyuv_unittest      [.] ARGBToUV411Row_C
     0.25%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBRowDownEven_C
     0.14%  libyuv_unittest  libyuv_unittest      [.] libyuv::ScaleAddCols2_C(int, int, int, int, unsigned short const*, unsigned char*)
     0.07%  libyuv_unittest  libyuv_unittest      [.] ScaleColsUp2_C
     0.03%  libyuv_unittest  libyuv_unittest      [.] MirrorUVRow_C
     0.01%  libyuv_unittest  libyuv_unittest      [.] TransposeWx8_C
     0.01%  libyuv_unittest  libyuv_unittest      [.] TransposeWxH_C
     0.01%  libyuv_unittest  libyuv_unittest      [.] ScaleRowDown34_0_Box_C
     0.00%  libyuv_unittest  libyuv_unittest      [.] ScaleRowDown34_1_Box_C
     0.00%  libyuv_unittest  libyuv_unittest      [.] TransposeUVWx8_C
     0.00%  libyuv_unittest  libyuv_unittest      [.] ScaleRowDown38_3_Box_C
     0.00%  libyuv_unittest  libyuv_unittest      [.] ScaleRowDown2Box_C
     0.00%  libyuv_unittest  libyuv_unittest      [.] ScaleRowDown34_C
     0.00%  libyuv_unittest  libyuv_unittest      [.] ScaleRowDown38_2_Box_C
     0.00%  libyuv_unittest  libyuv_unittest      [.] libyuv::libyuvTest_CropNV12_Test::TestBody()
     0.00%  libyuv_unittest  libyuv_unittest      [.] ScaleRowDown38_C
     0.00%  libyuv_unittest  libyuv_unittest      [.] ARGBToUVJ422Row_C
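
Note that a plain `grep _C` also matches symbols that merely contain `_C`, such as `libyuvTest_CropNV12_Test` in the list above. A minimal sketch of a stricter filter follows, assuming the `perf report` layout shown here (symbol after `[.] `, optionally followed by a C++ argument list). The helper name `IsCRowFunction` is hypothetical and is not part of libyuv or perf.

```cpp
#include <string>

// Hypothetical helper: returns true when a perf report line names a symbol
// whose function name ends in "_C" (the suffix used above for C fallbacks).
// Assumes the layout "<pct>  <cmd>  <dso>  [.] <symbol>[(args)]".
inline bool IsCRowFunction(const std::string& line) {
  const std::string marker = "[.] ";
  const std::size_t pos = line.find(marker);
  if (pos == std::string::npos) return false;

  std::string symbol = line.substr(pos + marker.size());

  // Drop a C++ argument list, e.g. "libyuv::ScaleAddCols1_C(int, int, ...)".
  const std::size_t paren = symbol.find('(');
  if (paren != std::string::npos) symbol.erase(paren);

  // Trim trailing whitespace left over from the report formatting.
  const std::size_t last = symbol.find_last_not_of(" \t\r\n");
  if (last == std::string::npos) return false;
  symbol.erase(last + 1);

  // Keep only names that end in "_C", not names that merely contain it.
  return symbol.size() >= 2 &&
         symbol.compare(symbol.size() - 2, 2, "_C") == 0;
}
```

Applied line by line to `out.txt`, this would keep `ScaleAddRow_C` and `libyuv::ScaleAddCols1_C(...)` while skipping the `CropNV12` test body.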


Original comment by fbarch...@google.com on 17 Sep 2015 at 6:35

LIBYUV_FLAGS=-1 LIBYUV_WIDTH=1280 LIBYUV_HEIGHT=720 LIBYUV_REPEAT=999 perf record out/Release/libyuv_unittest --gtest_filter=*

    18.31%  libyuv_unittest  libyuv_unittest      [.] libyuv::libyuvTest_TestRoundToByte_Test::TestBody()
     6.47%  libyuv_unittest  libyuv_unittest      [.] ScaleAddRow_C
     5.05%  libyuv_unittest  libyuv_unittest      [.] InterpolateRow_AVX2
     4.81%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBRowDownEvenBox_SSE2
     3.64%  libyuv_unittest  libyuv_unittest      [.] ScaleFilterCols_SSSE3
     3.43%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBFilterCols_SSSE3
     3.08%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBRowDown2Box_SSE2
     3.00%  libyuv_unittest  libyuv_unittest      [.] ScaleARGB
     2.86%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBCols_SSE2
     2.83%  libyuv_unittest  libyuv_unittest      [.] ARGBToRGB565DitherRow_C
     2.69%  libyuv_unittest  libyuv_unittest      [.] CopyRow_ERMS
     2.59%  libyuv_unittest  libyuv_unittest      [.] I422ToARGBRow_AVX2
     1.72%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBRowDownEven_SSE2
     1.60%  libyuv_unittest  libyuv_unittest      [.] FixedDiv_X86
     1.48%  libyuv_unittest  libyuv_unittest      [.] ARGBShuffleRow_AVX2
     1.47%  libyuv_unittest  libyuv_unittest      [.] CumulativeSumToAverageRow_SSE2
     1.45%  libyuv_unittest  libyuv_unittest      [.] I422ToABGRRow_AVX2
     1.40%  libyuv_unittest  libyuv_unittest      [.] ScaleCols_C
     1.30%  libyuv_unittest  libyuv_unittest      [.] ScaleAddCols1_C
     1.08%  libyuv_unittest  libyuv_unittest      [.] NV12ToARGBRow_SSSE3

Original comment by fbarch...@google.com on 23 Sep 2015 at 8:27

  • Changed state: Started
NV12ToARGB optimized
    18.25%  libyuv_unittest  libyuv_unittest      [.] libyuv::libyuvTest_TestRoundToByte_Test::TestBody()
     6.50%  libyuv_unittest  libyuv_unittest      [.] ScaleAddRow_C
     5.16%  libyuv_unittest  libyuv_unittest      [.] InterpolateRow_AVX2
     4.83%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBRowDownEvenBox_SSE2
     3.64%  libyuv_unittest  libyuv_unittest      [.] ScaleFilterCols_SSSE3
     3.42%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBFilterCols_SSSE3
     3.15%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBRowDown2Box_SSE2
     3.00%  libyuv_unittest  libyuv_unittest      [.] ScaleARGB
     2.92%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBCols_SSE2
     2.83%  libyuv_unittest  libyuv_unittest      [.] ARGBToRGB565DitherRow_C
     2.69%  libyuv_unittest  libyuv_unittest      [.] CopyRow_ERMS
     2.59%  libyuv_unittest  libyuv_unittest      [.] I422ToARGBRow_AVX2
     1.75%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBRowDownEven_SSE2
     1.61%  libyuv_unittest  libyuv_unittest      [.] FixedDiv_X86
     1.49%  libyuv_unittest  libyuv_unittest      [.] ARGBShuffleRow_AVX2
     1.48%  libyuv_unittest  libyuv_unittest      [.] CumulativeSumToAverageRow_SSE2
     1.45%  libyuv_unittest  libyuv_unittest      [.] I422ToABGRRow_AVX2
     1.40%  libyuv_unittest  libyuv_unittest      [.] ScaleCols_C
     1.26%  libyuv_unittest  libyuv_unittest      [.] ScaleAddCols1_C
     0.93%  libyuv_unittest  libyuv_unittest      [.] libyuv::ARGBClipTestFilter(int, int, int, int, libyuv::FilterMode, int)
     0.92%  libyuv_unittest  libc-2.19.so         [.] _int_malloc
     0.91%  libyuv_unittest  libyuv_unittest      [.] ComputeCumulativeSumRow_SSE2
     0.85%  libyuv_unittest  libyuv_unittest      [.] ARGBToRGB565Row_SSE2
     0.85%  libyuv_unittest  libyuv_unittest      [.] ARGBToYRow_AVX2
     0.83%  libyuv_unittest  libyuv_unittest      [.] I422ToARGBRow_SSSE3
     0.68%  libyuv_unittest  libyuv_unittest      [.] SobelXRow_SSE2
     0.67%  libyuv_unittest  libyuv_unittest      [.] SobelYRow_SSE2
     0.62%  libyuv_unittest  libyuv_unittest      [.] TransposeWx8_Fast_SSSE3
     0.62%  libyuv_unittest  libyuv_unittest      [.] FixedDiv1_X86
     0.61%  libyuv_unittest  libyuv_unittest      [.] ScaleSlope
     0.57%  libyuv_unittest  libyuv_unittest      [.] next_marker
     0.54%  libyuv_unittest  libyuv_unittest      [.] NV12ToARGBRow_SSSE3

Original comment by fbarch...@google.com on 25 Sep 2015 at 7:31

NV12 AVX2
 18.25%  libyuv_unittest  libyuv_unittest      [.] libyuv::libyuvTest_TestRoundToByte_Test::TestBody()
  6.53%  libyuv_unittest  libyuv_unittest      [.] ScaleAddRow_C
  5.08%  libyuv_unittest  libyuv_unittest      [.] InterpolateRow_AVX2
  4.84%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBRowDownEvenBox_SSE2
  3.64%  libyuv_unittest  libyuv_unittest      [.] ScaleFilterCols_SSSE3
  3.42%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBFilterCols_SSSE3
  3.12%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBRowDown2Box_SSE2
  3.00%  libyuv_unittest  libyuv_unittest      [.] ScaleARGB
  2.90%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBCols_SSE2
  2.85%  libyuv_unittest  libyuv_unittest      [.] ARGBToRGB565DitherRow_C
  2.71%  libyuv_unittest  libyuv_unittest      [.] CopyRow_ERMS
  2.38%  libyuv_unittest  libyuv_unittest      [.] I422ToARGBRow_AVX2
  1.76%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBRowDownEven_SSE2
  1.62%  libyuv_unittest  libyuv_unittest      [.] FixedDiv_X86
  1.49%  libyuv_unittest  libyuv_unittest      [.] CumulativeSumToAverageRow_SSE2
  1.49%  libyuv_unittest  libyuv_unittest      [.] ARGBShuffleRow_AVX2
  1.41%  libyuv_unittest  libyuv_unittest      [.] ScaleCols_C
  1.25%  libyuv_unittest  libyuv_unittest      [.] ScaleAddCols1_C
  1.25%  libyuv_unittest  libyuv_unittest      [.] I422ToABGRRow_AVX2
  0.99%  libyuv_unittest  libc-2.19.so         [.] _int_malloc
  0.92%  libyuv_unittest  libyuv_unittest      [.] libyuv::ARGBClipTestFilter(int, int, int, int, libyuv::FilterMode, int)
  0.91%  libyuv_unittest  libyuv_unittest      [.] ComputeCumulativeSumRow_SSE2
  0.87%  libyuv_unittest  libyuv_unittest      [.] ARGBToRGB565Row_SSE2
  0.85%  libyuv_unittest  libyuv_unittest      [.] ARGBToYRow_AVX2
  0.84%  libyuv_unittest  libyuv_unittest      [.] I422ToARGBRow_SSSE3
  0.68%  libyuv_unittest  libyuv_unittest      [.] SobelXRow_SSE2
  0.67%  libyuv_unittest  libyuv_unittest      [.] SobelYRow_SSE2
  0.62%  libyuv_unittest  libyuv_unittest      [.] TransposeWx8_Fast_SSSE3
  0.62%  libyuv_unittest  libyuv_unittest      [.] ScaleSlope
  0.62%  libyuv_unittest  libyuv_unittest      [.] FixedDiv1_X86
  0.55%  libyuv_unittest  libyuv_unittest      [.] next_marker
  0.54%  libyuv_unittest  libyuv_unittest      [.] NV12ToARGBRow_SSSE3
  0.54%  libyuv_unittest  libyuv_unittest      [.] ARGBToUV411Row_C
  0.50%  libyuv_unittest  libyuv_unittest      [.] ARGBToARGB1555Row_SSE2
  0.48%  libyuv_unittest  libyuv_unittest      [.] ARGBScaleClip
  0.47%  libyuv_unittest  libyuv_unittest      [.] ARGBToUVRow_AVX2
  0.46%  libyuv_unittest  libyuv_unittest      [.] ARGBToYJRow_AVX2
  0.45%  libyuv_unittest  libyuv_unittest      [.] InterpolateRow_Any_AVX2
  0.43%  libyuv_unittest  libyuv_unittest      [.] ARGBToUV422Row_SSSE3
  0.42%  libyuv_unittest  libyuv_unittest      [.] I422ToBGRARow_AVX2
  0.41%  libyuv_unittest  libyuv_unittest      [.] I422ToRGBARow_AVX2
  0.40%  libyuv_unittest  libc-2.19.so         [.] _int_free
  0.40%  libyuv_unittest  libyuv_unittest      [.] NV12ToARGBRow_AVX2

Original comment by fbarch...@google.com on 25 Sep 2015 at 11:57

TestRoundToByte is too slow. Improve the rounding and/or the test.

LIBYUV_REPEAT=100 out/Release/libyuv_unittest --gtest_filter=libyuvTest.TestRoundToByte
Note: Google Test filter = libyuvTest.TestRoundToByte
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from libyuvTest
[ RUN      ] libyuvTest.TestRoundToByte
[       OK ] libyuvTest.TestRoundToByte (10731 ms)
[----------] 1 test from libyuvTest (10731 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (10731 ms total)
[  PASSED  ] 1 test.

Original comment by fbarch...@google.com on 2 Oct 2015 at 6:01

LIBYUV_WIDTH=640 LIBYUV_HEIGHT=360 LIBYUV_REPEAT=4000 out/Release/libyuv_unittest --gtest_filter=**TestRoundToByte*
[       OK ] libyuvTest.TestRoundToByte (419442 ms)
[----------] 1 test from libyuvTest (419442 ms total)

Performance of 4 rounding methods on Linux GCC:

#define ROUND(f) static_cast<int>(f + 0.5)
TestRoundToByte (10731 ms)

#define ROUND(f) lrintf(f)
TestRoundToByte (7911 ms)

#define ROUND(f) static_cast<int>(round(f))
TestRoundToByte (12700 ms)

#define ROUND(f) _mm_cvt_ss2si(_mm_load_ss(&f))
TestRoundToByte (10428 ms)
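
The four variants are not interchangeable in their results, which may matter when choosing among them on speed alone. Below is a minimal sketch of the same expressions as functions. The function names are hypothetical, and the tie behaviour noted in the comments assumes the default floating-point rounding mode (round to nearest, ties to even).

```cpp
#include <cmath>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

// Hypothetical wrappers around the four ROUND(f) definitions listed above.

// Adds 0.5 (in double) and truncates toward zero.
// 2.5f -> 3, but -1.4f -> 0 because truncation is toward zero.
inline int RoundAddHalf(float f) { return static_cast<int>(f + 0.5); }

// Uses the current rounding mode; with the default mode ties go to even.
// 2.5f -> 2, 0.5f -> 0, -1.4f -> -1.
inline int RoundLrintf(float f) { return static_cast<int>(std::lrintf(f)); }

// Rounds halfway cases away from zero regardless of rounding mode.
// 2.5f -> 3, -2.5f -> -3, -1.4f -> -1.
inline int RoundStdRound(float f) { return static_cast<int>(std::round(f)); }

#if defined(__SSE__)
// x86 only: converts using the MXCSR rounding mode (ties to even by default).
inline int RoundSse(float f) { return _mm_cvt_ss2si(_mm_load_ss(&f)); }
#endif
```

For non-negative inputs the variants should disagree only on exact .5 ties. Whether that difference is acceptable depends on what the test compares against, which the thread does not state.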

Original comment by fbarch...@google.com on 2 Oct 2015 at 6:19

     7.94%  libyuv_unittest  libyuv_unittest      [.] ScaleAddRow_C
     6.08%  libyuv_unittest  libyuv_unittest      [.] InterpolateRow_AVX2
     6.04%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBRowDownEvenBox_SSE2
     4.46%  libyuv_unittest  libyuv_unittest      [.] ScaleFilterCols_SSSE3
     4.15%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBFilterCols_SSSE3
     3.87%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBRowDown2Box_SSE2
     3.69%  libyuv_unittest  libyuv_unittest      [.] ScaleARGB
     3.63%  libyuv_unittest  libyuv_unittest      [.] ARGBToRGB565DitherRow_C
     3.53%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBCols_SSE2
     3.31%  libyuv_unittest  libyuv_unittest      [.] CopyRow_ERMS
     2.91%  libyuv_unittest  libyuv_unittest      [.] I422ToARGBRow_AVX2
     2.15%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBRowDownEven_SSE2
     1.95%  libyuv_unittest  libyuv_unittest      [.] FixedDiv_X86
     1.83%  libyuv_unittest  libyuv_unittest      [.] ARGBShuffleRow_AVX2
     1.80%  libyuv_unittest  libyuv_unittest      [.] CumulativeSumToAverageRow_SSE2
     1.71%  libyuv_unittest  libyuv_unittest      [.] ScaleCols_C
     1.57%  libyuv_unittest  libyuv_unittest      [.] ScaleAddCols1_C
     1.52%  libyuv_unittest  libyuv_unittest      [.] I422ToABGRRow_AVX2
     1.13%  libyuv_unittest  libc-2.19.so         [.] _int_malloc
     1.12%  libyuv_unittest  libyuv_unittest      [.] libyuv::ARGBClipTestFilter(int, int, int, int, libyuv::FilterMode, int)
     1.12%  libyuv_unittest  libyuv_unittest      [.] ComputeCumulativeSumRow_SSE2
     1.05%  libyuv_unittest  libyuv_unittest      [.] ARGBToRGB565Row_SSE2
     1.03%  libyuv_unittest  libyuv_unittest      [.] ARGBToYRow_AVX2
     1.02%  libyuv_unittest  libyuv_unittest      [.] I422ToARGBRow_SSSE3

Original comment by fbarch...@chromium.org on 2 Oct 2015 at 11:03

r1502 performance
     6.66%  libyuv_unittest  libyuv_unittest      [.] InterpolateRow_AVX2
     6.48%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBRowDownEvenBox_SSE2
     4.77%  libyuv_unittest  libyuv_unittest      [.] ScaleFilterCols_SSSE3
     4.46%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBFilterCols_SSSE3
     4.14%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBRowDown2Box_SSE2
     3.96%  libyuv_unittest  libyuv_unittest      [.] ScaleARGB
     3.76%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBCols_SSE2
     3.71%  libyuv_unittest  libyuv_unittest      [.] ARGBToRGB565DitherRow_C
     3.57%  libyuv_unittest  libyuv_unittest      [.] CopyRow_ERMS
     3.12%  libyuv_unittest  libyuv_unittest      [.] I422ToARGBRow_AVX2
     2.29%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBRowDownEven_SSE2
     2.13%  libyuv_unittest  libyuv_unittest      [.] FixedDiv_X86
     1.95%  libyuv_unittest  libyuv_unittest      [.] ARGBShuffleRow_AVX2
     1.92%  libyuv_unittest  libyuv_unittest      [.] CumulativeSumToAverageRow_SSE2
     1.80%  libyuv_unittest  libyuv_unittest      [.] ScaleAddRow_AVX2
     1.75%  libyuv_unittest  libyuv_unittest      [.] ScaleCols_C
     1.63%  libyuv_unittest  libyuv_unittest      [.] I422ToABGRRow_AVX2
     1.47%  libyuv_unittest  libyuv_unittest      [.] ScaleAddCols1_C
     1.22%  libyuv_unittest  libc-2.19.so         [.] _int_malloc
     1.21%  libyuv_unittest  libyuv_unittest      [.] libyuv::ARGBClipTestFilter(int, int, int, int, libyuv::FilterMode, int)
     1.21%  libyuv_unittest  libyuv_unittest      [.] ComputeCumulativeSumRow_SSE2
     1.11%  libyuv_unittest  libyuv_unittest      [.] ARGBToYRow_AVX2
     1.08%  libyuv_unittest  libyuv_unittest      [.] I422ToARGBRow_SSSE3
     0.90%  libyuv_unittest  libyuv_unittest      [.] SobelXRow_SSE2

Original comment by fbarch...@chromium.org on 7 Oct 2015 at 5:47

fbarchard@fbarchard-linux:~/src/libyuv/libyuv$ runyuv10 | more
LIBYUV_WIDTH=640 LIBYUV_HEIGHT=360 LIBYUV_REPEAT=4000 out/Release/libyuv_unittest --gtest_filter=* | grep ms | sed 's/\(.*(\)\([0-9]*\)\( ms)\)/\2 - \1\2\3/g' | sort -rn | sed 's/.*- \(.*\)/\1/g'
[       OK ] libyuvTest.ARGBScaleClipTo1280x720_Linear (11452 ms)
[  FAILED  ] libyuvTest.ScaleDownBy8_Box (10933 ms)
[       OK ] libyuvTest.ARGBScaleClipTo1280x720_Bilinear (9219 ms)
[       OK ] libyuvTest.ARGBScaleDownClipBy4_Box (6844 ms)
[       OK ] libyuvTest.ARGBScaleDownClipBy3by4_Box (5228 ms)
[       OK ] libyuvTest.ARGBScaleDownClipBy3by4_Bilinear (5218 ms)
[       OK ] libyuvTest.ARGBScaleClipTo1280x720_None (4465 ms)
[       OK ] libyuvTest.ARGBScaleDownClipBy3by8_Box (3887 ms)
[       OK ] libyuvTest.ARGBScaleClipTo569x480_Linear (3768 ms)
[       OK ] libyuvTest.ARGBScaleDownClipBy8_Box (3407 ms)
[       OK ] libyuvTest.ARGBScaleDownClipBy8_Bilinear (3346 ms)
[       OK ] libyuvTest.ARGBScaleClipFrom569x480_Bilinear (3327 ms)
[       OK ] libyuvTest.ARGBScaleClipFrom352x288_Linear (3257 ms)
[       OK ] libyuvTest.ARGBScaleClipTo569x480_Bilinear (3215 ms)
[       OK ] libyuvTest.ARGBScaleClipFrom320x240_Linear (3149 ms)
[       OK ] libyuvTest.ARGBScaleDownClipBy3by8_Bilinear (3067 ms)
[       OK ] libyuvTest.TestFixedDiv (2970 ms)
[       OK ] libyuvTest.TestFixedDiv1_Opt (2970 ms)
[       OK ] libyuvTest.TestFixedDiv_Opt (2966 ms)
[       OK ] libyuvTest.ARGBScaleDownBy4_Box (2903 ms)
[       OK ] libyuvTest.ScaleTo1280x720_Bilinear (2869 ms)
[       OK ] libyuvTest.ScaleTo1280x720_Box (2852 ms)
[       OK ] libyuvTest.ARGBScaleDownBy8_Bilinear (2837 ms)
[       OK ] libyuvTest.ScaleTo1280x720_Linear (2825 ms)
[  FAILED  ] libyuvTest.ScaleDownBy3_Box (2764 ms)
[       OK ] libyuvTest.ARGBScaleDownBy8_Box (2744 ms)
[       OK ] libyuvTest.ARGBScaleClipFrom352x288_Bilinear (2629 ms)
[       OK ] libyuvTest.ARGBScaleClipFrom320x240_Bilinear (2512 ms)
[       OK ] libyuvTest.I420ToRGB565Dither_Any (2412 ms)
[       OK ] libyuvTest.I420ToRGB565Dither_Unaligned (2390 ms)
[       OK ] libyuvTest.I420ToRGB565Dither_Opt (2379 ms)
[       OK ] libyuvTest.I420ToRGB565Dither_Invert (2379 ms)
[       OK ] libyuvTest.ARGBScaleDownClipBy3by8_Linear (2148 ms)
[       OK ] libyuvTest.ARGBScaleClipFrom569x480_Linear (2141 ms)
[       OK ] libyuvTest.ARGBScaleTo1280x720_Bilinear (2138 ms)
[       OK ] libyuvTest.ARGBScaleDownClipBy3by4_Linear (2123 ms)
[       OK ] libyuvTest.ARGBToRGB565Dither_Invert (2040 ms)
[       OK ] libyuvTest.ARGBScaleTo1280x720_Linear (2038 ms)
[       OK ] libyuvTest.ARGBToRGB565Dither_Opt (2019 ms)
[       OK ] libyuvTest.ARGBToRGB565Dither_Unaligned (2017 ms)
[       OK ] libyuvTest.ARGBToRGB565Dither_Any (2007 ms)
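
The pipeline above prefixes each gtest result line with its millisecond count, sorts numerically in descending order, and then strips the prefix again. The same idea can be sketched in C++ as below. The function names are hypothetical, and unlike the `grep ms` step this sketch drops lines that do not end in `(N ms)`, such as the `(... ms total)` summaries.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: extracts N from a "(N ms)" suffix in a gtest result
// line. Returns -1 when the line has no such field.
inline long ParseMilliseconds(const std::string& line) {
  const std::size_t close = line.rfind(" ms)");
  if (close == std::string::npos) return -1;
  const std::size_t open = line.rfind('(', close);
  if (open == std::string::npos || open + 1 >= close) return -1;
  const std::string digits = line.substr(open + 1, close - open - 1);
  if (digits.find_first_not_of("0123456789") != std::string::npos) return -1;
  return std::stol(digits);
}

inline bool SlowerFirst(const std::pair<long, std::string>& a,
                        const std::pair<long, std::string>& b) {
  return a.first > b.first;
}

// Returns the timed lines ordered from slowest to fastest; lines with equal
// times keep their original relative order.
inline std::vector<std::string> SortSlowestFirst(
    const std::vector<std::string>& lines) {
  std::vector<std::pair<long, std::string> > timed;
  for (std::size_t i = 0; i < lines.size(); ++i) {
    const long ms = ParseMilliseconds(lines[i]);
    if (ms >= 0) timed.push_back(std::make_pair(ms, lines[i]));
  }
  std::stable_sort(timed.begin(), timed.end(), SlowerFirst);

  std::vector<std::string> sorted;
  for (std::size_t i = 0; i < timed.size(); ++i) {
    sorted.push_back(timed[i].second);
  }
  return sorted;
}
```

Feeding it the captured test output, one line per element, would give a slowest-first listing similar to the one shown above.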

Original comment by fbarch...@chromium.org on 7 Oct 2015 at 5:24

     6.80%  libyuv_unittest  libyuv_unittest      [.] InterpolateRow_AVX2
     6.59%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBRowDownEvenBox_SSE2
     4.93%  libyuv_unittest  libyuv_unittest      [.] ScaleFilterCols_SSSE3
     4.57%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBFilterCols_SSSE3
     4.30%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBRowDown2Box_SSE2
     4.08%  libyuv_unittest  libyuv_unittest      [.] ScaleARGB
     3.99%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBCols_SSE2
     3.68%  libyuv_unittest  libyuv_unittest      [.] CopyRow_ERMS
     3.22%  libyuv_unittest  libyuv_unittest      [.] I422ToARGBRow_AVX2
     2.37%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBRowDownEven_SSE2
     2.15%  libyuv_unittest  libyuv_unittest      [.] FixedDiv_X86
     2.02%  libyuv_unittest  libyuv_unittest      [.] ARGBShuffleRow_AVX2
     2.00%  libyuv_unittest  libyuv_unittest      [.] CumulativeSumToAverageRow_SSE2
     1.89%  libyuv_unittest  libyuv_unittest      [.] ScaleCols_C
     1.87%  libyuv_unittest  libyuv_unittest      [.] ScaleAddRow_AVX2
     1.83%  libyuv_unittest  libyuv_unittest      [.] ScaleAddCols1_C
     1.68%  libyuv_unittest  libyuv_unittest      [.] I422ToABGRRow_AVX2
     1.25%  libyuv_unittest  libc-2.19.so         [.] _int_malloc
     1.25%  libyuv_unittest  libyuv_unittest      [.] libyuv::ARGBClipTestFilter(int, int, int, int, libyuv::FilterMode, int)
     1.23%  libyuv_unittest  libyuv_unittest      [.] ComputeCumulativeSumRow_SSE2
     1.15%  libyuv_unittest  libyuv_unittest      [.] ARGBToYRow_AVX2
     1.12%  libyuv_unittest  libyuv_unittest      [.] I422ToARGBRow_SSSE3
     0.91%  libyuv_unittest  libyuv_unittest      [.] SobelXRow_SSE2
     0.90%  libyuv_unittest  libyuv_unittest      [.] SobelYRow_SSE2
     0.88%  libyuv_unittest  libyuv_unittest      [.] ARGBToRGB565Row_SSE2
     0.85%  libyuv_unittest  libyuv_unittest      [.] TransposeWx8_Fast_SSSE3
     0.84%  libyuv_unittest  libyuv_unittest      [.] FixedDiv1_X86
     0.81%  libyuv_unittest  libyuv_unittest      [.] ScaleSlope
     0.74%  libyuv_unittest  libyuv_unittest      [.] next_marker
     0.74%  libyuv_unittest  libyuv_unittest      [.] I444ToARGBRow_SSSE3
     0.72%  libyuv_unittest  libyuv_unittest      [.] ARGBToUV411Row_C
     0.67%  libyuv_unittest  libyuv_unittest      [.] ARGBToARGB1555Row_SSE2
     0.65%  libyuv_unittest  libyuv_unittest      [.] ARGBScaleClip
     0.64%  libyuv_unittest  libyuv_unittest      [.] InterpolateRow_Any_AVX2
     0.62%  libyuv_unittest  libyuv_unittest      [.] ARGBToUVRow_AVX2
     0.61%  libyuv_unittest  libyuv_unittest      [.] ARGBToYJRow_AVX2
     0.58%  libyuv_unittest  libyuv_unittest      [.] ARGBToUV422Row_SSSE3
     0.57%  libyuv_unittest  libyuv_unittest      [.] I422ToBGRARow_AVX2
     0.56%  libyuv_unittest  libyuv_unittest      [.] I422ToRGBARow_AVX2
     0.53%  libyuv_unittest  libc-2.19.so         [.] _int_free
     0.49%  libyuv_unittest  libyuv_unittest      [.] libyuv::ARGBTestFilter(int, int, int, int, libyuv::FilterMode, int, int)
     0.49%  libyuv_unittest  libyuv_unittest      [.] ScaleRowDown38_3_Box_SSSE3
     0.49%  libyuv_unittest  libyuv_unittest      [.] ARGB1555ToARGBRow_SSE2
     0.45%  libyuv_unittest  libyuv_unittest      [.] ARGBUnattenuateRow_AVX2
     0.43%  libyuv_unittest  libyuv_unittest      [.] ARGBBlendRow_SSSE3
     0.42%  libyuv_unittest  libyuv_unittest      [.] ARGBToARGB4444Row_SSE2
     0.39%  libyuv_unittest  libyuv_unittest      [.] I411ToARGBRow_SSSE3
     0.39%  libyuv_unittest  libyuv_unittest      [.] ARGBToRGB565DitherRow_AVX2

Original comment by fbarch...@chromium.org on 8 Oct 2015 at 3:16

r1513
  6.88%  libyuv_unittest  libyuv_unittest      [.] InterpolateRow_AVX2
  6.73%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBRowDownEvenBox_SSE2
  4.98%  libyuv_unittest  libyuv_unittest      [.] ScaleFilterCols_SSSE3
  4.69%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBFilterCols_SSSE3
  4.31%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBRowDown2Box_SSE2
  4.12%  libyuv_unittest  libyuv_unittest      [.] ScaleARGB
  3.90%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBCols_SSE2
  3.68%  libyuv_unittest  libyuv_unittest      [.] CopyRow_ERMS
  2.92%  libyuv_unittest  libyuv_unittest      [.] I422ToARGBRow_AVX2
  2.40%  libyuv_unittest  libyuv_unittest      [.] ScaleARGBRowDownEven_SSE2
  2.22%  libyuv_unittest  libyuv_unittest      [.] FixedDiv_X86
  2.04%  libyuv_unittest  libyuv_unittest      [.] ARGBShuffleRow_AVX2
  1.97%  libyuv_unittest  libyuv_unittest      [.] CumulativeSumToAverageRow_SSE2
  1.86%  libyuv_unittest  libyuv_unittest      [.] ScaleAddRow_AVX2
  1.83%  libyuv_unittest  libyuv_unittest      [.] ScaleCols_C
  1.70%  libyuv_unittest  libyuv_unittest      [.] I422ToABGRRow_AVX2
  1.55%  libyuv_unittest  libyuv_unittest      [.] ScaleAddCols1_C
  1.44%  libyuv_unittest  libc-2.19.so         [.] _int_malloc
  1.27%  libyuv_unittest  libyuv_unittest      [.] libyuv::ARGBClipTestFilter(int, int, int, int, libyuv::FilterMode, int)
  1.27%  libyuv_unittest  libyuv_unittest      [.] ComputeCumulativeSumRow_SSE2
  1.16%  libyuv_unittest  libyuv_unittest      [.] ARGBToYRow_AVX2
  1.14%  libyuv_unittest  libyuv_unittest      [.] I422ToARGBRow_SSSE3
  0.93%  libyuv_unittest  libyuv_unittest      [.] SobelXRow_SSE2
  0.92%  libyuv_unittest  libyuv_unittest      [.] SobelYRow_SSE2
  0.89%  libyuv_unittest  libyuv_unittest      [.] ARGBToRGB565Row_SSE2
  0.85%  libyuv_unittest  libyuv_unittest      [.] TransposeWx8_Fast_SSSE3
  0.84%  libyuv_unittest  libyuv_unittest      [.] ScaleSlope
  0.80%  libyuv_unittest  libyuv_unittest      [.] FixedDiv1_X86
  0.75%  libyuv_unittest  libyuv_unittest      [.] next_marker
  0.74%  libyuv_unittest  libyuv_unittest      [.] I444ToARGBRow_SSSE3
  0.72%  libyuv_unittest  libyuv_unittest      [.] ARGBToUV411Row_C

Original comment by fbarch...@chromium.org on 16 Oct 2015 at 6:09

Some performance numbers on ARM:
I   31.227s run_tests_on_device(HT4A2JT03762)  [==========] Running 20 tests from 1 test case.
I   31.227s run_tests_on_device(HT4A2JT03762)  [----------] Global test environment set-up.
I   31.228s run_tests_on_device(HT4A2JT03762)  [----------] 20 tests from LibYUVConvertTest
I   31.228s run_tests_on_device(HT4A2JT03762)  [ RUN      ] LibYUVConvertTest.I420ToI420_Opt
I   31.228s run_tests_on_device(HT4A2JT03762)  [       OK ] LibYUVConvertTest.I420ToI420_Opt (353 ms)
I   31.228s run_tests_on_device(HT4A2JT03762)  [ RUN      ] LibYUVConvertTest.I420ToI422_Opt
I   31.228s run_tests_on_device(HT4A2JT03762)  [       OK ] LibYUVConvertTest.I420ToI422_Opt (407 ms)
I   31.228s run_tests_on_device(HT4A2JT03762)  [ RUN      ] LibYUVConvertTest.I420ToI444_Opt
I   31.228s run_tests_on_device(HT4A2JT03762)  [       OK ] LibYUVConvertTest.I420ToI444_Opt (2681 ms)
I   31.228s run_tests_on_device(HT4A2JT03762)  [ RUN      ] LibYUVConvertTest.I420ToI411_Opt
I   31.228s run_tests_on_device(HT4A2JT03762)  [       OK ] LibYUVConvertTest.I420ToI411_Opt (838 ms)
I   31.228s run_tests_on_device(HT4A2JT03762)  [ RUN      ] LibYUVConvertTest.I420ToI420Mirror_Opt
I   31.228s run_tests_on_device(HT4A2JT03762)  [       OK ] LibYUVConvertTest.I420ToI420Mirror_Opt (423 ms)
I   31.228s run_tests_on_device(HT4A2JT03762)  [ RUN      ] LibYUVConvertTest.I420ToNV12_Opt
I   31.228s run_tests_on_device(HT4A2JT03762)  [       OK ] LibYUVConvertTest.I420ToNV12_Opt (296 ms)
I   31.228s run_tests_on_device(HT4A2JT03762)  [ RUN      ] LibYUVConvertTest.I420ToNV21_Opt
I   31.228s run_tests_on_device(HT4A2JT03762)  [       OK ] LibYUVConvertTest.I420ToNV21_Opt (275 ms)
I   31.229s run_tests_on_device(HT4A2JT03762)  [ RUN      ] LibYUVConvertTest.I420ToARGB_Opt
I   31.229s run_tests_on_device(HT4A2JT03762)  [       OK ] LibYUVConvertTest.I420ToARGB_Opt (1480 ms)
I   31.229s run_tests_on_device(HT4A2JT03762)  [ RUN      ] LibYUVConvertTest.I420ToBGRA_Opt
I   31.229s run_tests_on_device(HT4A2JT03762)  [       OK ] LibYUVConvertTest.I420ToBGRA_Opt (1490 ms)
I   31.229s run_tests_on_device(HT4A2JT03762)  [ RUN      ] LibYUVConvertTest.I420ToABGR_Opt
I   31.229s run_tests_on_device(HT4A2JT03762)  [       OK ] LibYUVConvertTest.I420ToABGR_Opt (1465 ms)
I   31.229s run_tests_on_device(HT4A2JT03762)  [ RUN      ] LibYUVConvertTest.I420ToRGBA_Opt
I   31.229s run_tests_on_device(HT4A2JT03762)  [       OK ] LibYUVConvertTest.I420ToRGBA_Opt (1509 ms)
I   31.229s run_tests_on_device(HT4A2JT03762)  [ RUN      ] LibYUVConvertTest.I420ToRAW_Opt
I   31.229s run_tests_on_device(HT4A2JT03762)  [       OK ] LibYUVConvertTest.I420ToRAW_Opt (1576 ms)
I   31.229s run_tests_on_device(HT4A2JT03762)  [ RUN      ] LibYUVConvertTest.I420ToRGB24_Opt
I   31.229s run_tests_on_device(HT4A2JT03762)  [       OK ] LibYUVConvertTest.I420ToRGB24_Opt (1651 ms)
I   31.229s run_tests_on_device(HT4A2JT03762)  [ RUN      ] LibYUVConvertTest.I420ToRGB565_Opt
I   31.229s run_tests_on_device(HT4A2JT03762)  [       OK ] LibYUVConvertTest.I420ToRGB565_Opt (1563 ms)
I   31.229s run_tests_on_device(HT4A2JT03762)  [ RUN      ] LibYUVConvertTest.I420ToARGB1555_Opt
I   31.230s run_tests_on_device(HT4A2JT03762)  [       OK ] LibYUVConvertTest.I420ToARGB1555_Opt (1566 ms)
I   31.230s run_tests_on_device(HT4A2JT03762)  [ RUN      ] LibYUVConvertTest.I420ToARGB4444_Opt
I   31.230s run_tests_on_device(HT4A2JT03762)  [       OK ] LibYUVConvertTest.I420ToARGB4444_Opt (1533 ms)
I   31.230s run_tests_on_device(HT4A2JT03762)  [ RUN      ] LibYUVConvertTest.I420ToYUY2_Opt
I   31.230s run_tests_on_device(HT4A2JT03762)  [       OK ] LibYUVConvertTest.I420ToYUY2_Opt (348 ms)
I   31.230s run_tests_on_device(HT4A2JT03762)  [ RUN      ] LibYUVConvertTest.I420ToUYVY_Opt
I   31.230s run_tests_on_device(HT4A2JT03762)  [       OK ] LibYUVConvertTest.I420ToUYVY_Opt (350 ms)
I   31.230s run_tests_on_device(HT4A2JT03762)  [ RUN      ] LibYUVConvertTest.I420ToI400_Opt
I   31.230s run_tests_on_device(HT4A2JT03762)  [       OK ] LibYUVConvertTest.I420ToI400_Opt (149 ms)
I   31.230s run_tests_on_device(HT4A2JT03762)  [ RUN      ] LibYUVConvertTest.I420ToRGB565Dither_Opt
I   31.230s run_tests_on_device(HT4A2JT03762)  [       OK ] LibYUVConvertTest.I420ToRGB565Dither_Opt (1962 ms)
I   31.230s run_tests_on_device(HT4A2JT03762)  [----------] 20 tests from LibYUVConvertTest (21920 ms total)
I   31.230s run_tests_on_device(HT4A2JT03762)
I   31.230s run_tests_on_device(HT4A2JT03762)  [----------] Global test environment tear-down
I   31.230s run_tests_on_device(HT4A2JT03762)  [==========] 20 tests from 1 test case ran. (21924 ms total)
I   31.230s run_tests_on_device(HT4A2JT03762)  [  PASSED  ] 20 tests.

Original comment by fbarch...@google.com on 18 Oct 2015 at 7:30