wuffs significantly slower than OpenCV 4.9.0 when decoding PNGs for 7680x4320 image #149

zchrissirhcz · 2024-06-09T07:55:48Z

Problem

When decoding a big image (height=4320, width=7680, channels=4, data type = uint8_t), wuffs is much slower than OpenCV 4.9.0 on an Apple M1 (Mac mini).

Time cost

7680x4320 image

decoder	time cost
opencv 4.9.0	270 ms
wuffs latest("unsupported.c")	370 ms
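For reference, the numbers above are wall-clock times around a single decode call. The timer class used in the listing further down (birch::AutoTimer) is not shown in this issue, so the following is only a minimal sketch of how such a scoped timer could look with std::chrono. The class name ScopedTimer and its interface are hypothetical, not the actual birch API.

```cpp
#include <chrono>
#include <cstdio>
#include <string>
#include <utility>

// Hypothetical stand-in for a scoped timer such as birch::AutoTimer. The real
// class is not shown in this thread, so this interface is an assumption.
class ScopedTimer {
 public:
  explicit ScopedTimer(std::string label)
      : label_(std::move(label)), start_(std::chrono::steady_clock::now()) {}

  // Milliseconds elapsed since construction, measured on a monotonic clock.
  double elapsed_ms() const {
    return std::chrono::duration<double, std::milli>(
               std::chrono::steady_clock::now() - start_)
        .count();
  }

  // Print the elapsed time when the enclosing scope ends.
  ~ScopedTimer() { std::printf("%s: %.3f ms\n", label_.c_str(), elapsed_ms()); }

 private:
  std::string label_;
  std::chrono::steady_clock::time_point start_;
};
```

Wrapping a call in a block such as `{ ScopedTimer t("decode"); decode(); }` would print one line when the block ends. A single run like this includes cold-cache and file I/O effects, so repeating the measurement a few times is probably worthwhile before comparing decoders.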

OpenCV 4.9.0 details

brew install opencv

which is built on libpng 1.6.43:

  Media I/O: 
    ZLib:                        /Library/Developer/CommandLineTools/SDKs/MacOSX14.sdk/usr/lib/libz.tbd (ver 1.2.12)
    JPEG:                        /opt/homebrew/lib/libjpeg.dylib (ver 80)
    WEBP:                        /opt/homebrew/lib/libwebp.dylib (ver encoder: 0x020f)
    PNG:                         /opt/homebrew/lib/libpng.dylib (ver 1.6.43)
    TIFF:                        /opt/homebrew/lib/libtiff.dylib (ver 42 / 4.6.0)
    JPEG 2000:                   OpenJPEG (ver 2.5.2)
    OpenEXR:                     OpenEXR::OpenEXR (ver 3.2.4)
    HDR:                         YES
    SUNRASTER:                   YES
    PXM:                         YES
    PFM:                         YES

What exactly code do I use

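#include <algorithm>  // std::copy_n
#include <cstdio>     // printf
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

#include <opencv2/opencv.hpp>  // cv::Mat, cv::imread, cv::getVersionString

// Note: birch::AutoTimer and the declaration of ncv::read_png presumably come
// from the author's own headers, which are not shown in this issue.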
// Copyright 2023 The Wuffs Authors.
//
// Licensed under the Apache License, Version 2.0 <LICENSE-APACHE or
// https://www.apache.org/licenses/LICENSE-2.0> or the MIT license
// <LICENSE-MIT or https://opensource.org/licenses/MIT>, at your
// option. This file may not be copied, modified, or distributed
// except according to those terms.
//
// SPDX-License-Identifier: Apache-2.0 OR MIT

// ----------------

/*
toy-aux-image demonstrates using the wuffs_aux::DecodeImage C++ function to
decode an in-memory compressed image. In this example, the compressed image is
hard-coded to a specific image: a JPEG encoding of the first frame of the
test/data/muybridge.gif animated image.

To run:

$CXX toy-aux-image.cc && ./a.out; rm -f a.out

for a C++ compiler $CXX, such as clang++ or g++.

The expected output:

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@X@@@@XX@@@@@@@@@@X
XXXXX@@XXX@@@@@@@II@@@X@X@@@@@
XXXXX@@XX@@X@@@XO+XXX@XX@@@X@@
XXXXXXXX@XX@X@XI=I@@XXI+OXX@XX
XXXXXXXXXXXXXXX+=+OXO+=::OXX@X
XXXXXXXXXXXXXXXXXX=+==:::=XXXX
XXXXXXXXO+:::::+OO+===+OI=+XXX
XXXO::=++:::==+++XI+++X@XXO@XX
XXXO=X@X+::=::::+O++=I@XX@XXXX
XXXXX@XXX=:::::::::=+@XXXX@XXX
XXXXXXXX@O::IXO=::::O@@XXXXXXX
XXXXXXXXO=X+X@@XX::O@@XXXXXXXX
XXXXXXXXXOO=X@X@X+OIXXXXXXXXXX
XXXXXXXXXXX+IIXX+X@OX@XXXXXXXX
XXXXXXXXX@XXOI+IIOOOXXXXXXXXXX
XXXXXXXXXXX@XXXXX@XXXXXXXXXXXX
XXXXXXXXXXXXXXXXX@XXXXXXXXXXXX
OOOOXXXXXXXXXXOXXXXXXXXXXXXOOO
=+++IIIIIIIOOOOOOOOOOIIIIIIII+
*/

// Wuffs ships as a "single file C library" or "header file library" as per
// https://github.com/nothings/stb/blob/master/docs/stb_howto.txt
//
// To use that single file as a "foo.c"-like implementation, instead of a
// "foo.h"-like header, #define WUFFS_IMPLEMENTATION before #include'ing or
// compiling it.
#define WUFFS_IMPLEMENTATION

// Defining the WUFFS_CONFIG__STATIC_FUNCTIONS macro is optional, but when
// combined with WUFFS_IMPLEMENTATION, it demonstrates making all of Wuffs'
// functions have static storage.
//
// This can help the compiler ignore or discard unused code, which can produce
// faster compiles and smaller binaries. Other motivations are discussed in the
// "ALLOW STATIC IMPLEMENTATION" section of
// https://raw.githubusercontent.com/nothings/stb/master/docs/stb_howto.txt
#define WUFFS_CONFIG__STATIC_FUNCTIONS

// Defining the WUFFS_CONFIG__MODULE* macros are optional, but it lets users of
// release/c/etc.c choose which parts of Wuffs to build. That file contains the
// entire Wuffs standard library, implementing a variety of codecs and file
// formats. Without this macro definition, an optimizing compiler or linker may
// very well discard Wuffs code for unused codecs, but listing the Wuffs
// modules we use makes that process explicit. Preprocessing means that such
// code simply isn't compiled.
/*
#define WUFFS_CONFIG__MODULES
#define WUFFS_CONFIG__MODULE__AUX__BASE
#define WUFFS_CONFIG__MODULE__AUX__IMAGE
#define WUFFS_CONFIG__MODULE__BASE
#define WUFFS_CONFIG__MODULE__JPEG
*/
#define WUFFS_CONFIG__MODULES
#define WUFFS_CONFIG__MODULE__AUX__BASE
#define WUFFS_CONFIG__MODULE__AUX__IMAGE
#define WUFFS_CONFIG__MODULE__ADLER32
#define WUFFS_CONFIG__MODULE__BASE
#define WUFFS_CONFIG__MODULE__CRC32
#define WUFFS_CONFIG__MODULE__DEFLATE
#define WUFFS_CONFIG__MODULE__PNG
#define WUFFS_CONFIG__MODULE__ZLIB

// Defining the WUFFS_CONFIG__DST_PIXEL_FORMAT__ENABLE_ALLOWLIST (and the
// associated ETC__ALLOW_FOO) macros are optional, but can lead to smaller
// programs (in terms of binary size). By default (without these macros),
// Wuffs' standard library can decode images to a variety of pixel formats,
// such as BGR_565, BGRA_PREMUL or RGBA_NONPREMUL. The destination pixel format
// is selectable at runtime. Using these macros essentially makes the selection
// at compile time, by narrowing the list of supported destination pixel
// formats. The FOO in ETC__ALLOW_FOO should match the pixel format passed (as
// part of the wuffs_base__image_config argument) to the decode_frame method.
//
// If using the wuffs_aux C++ API, without overriding the SelectPixfmt method,
// the implicit destination pixel format is BGRA_PREMUL.
#define WUFFS_CONFIG__DST_PIXEL_FORMAT__ENABLE_ALLOWLIST
#define WUFFS_CONFIG__DST_PIXEL_FORMAT__ALLOW_BGRA_PREMUL

// If building this program in an environment that doesn't easily accommodate
// relative includes, you can use the script/inline-c-relative-includes.go
// program to generate a stand-alone C file.
//#include "wuffs-v0.4.c"
//#include "wuffs-v0.3.c"
#include "wuffs-unsupported-snapshot.c"

//static std::string decode()
cv::Mat ncv::read_png(const std::string filename)
{
  // Call wuffs_aux::DecodeImage, which is the entry point to Wuffs' high-level
  // C++ API for decoding images. This API is easier to use than Wuffs'
  // low-level C API but the low-level one (1) handles animation, (2) handles
  // asynchronous I/O, (3) handles metadata and (4) does no dynamic memory
  // allocation, so it can run under a `SECCOMP_MODE_STRICT` sandbox.
  // Obviously, if you don't need any of those features, then these simple
  // lines of code here suffices.
  //
  // This example program doesn't explicitly use Wuffs' low-level C API but, if
  // you're curious to learn more, the wuffs_aux::DecodeImage implementation in
  // internal/cgen/auxiliary/*.cc uses it, as does the example/convert-to-nia C
  // program. There's also documentation at doc/std/image-decoders.md
  //
  // If you also want metadata like EXIF orientation and ICC color profiles,
  // script/print-image-metadata.cc has some example code. It uses Wuffs'
  // low-level API but it's a C++ program to use Wuffs' shorter convenience
  // methods: `decoder->decode_frame_config(NULL, &src)` instead of C's
  // `wuffs_base__image_decoder__decode_frame_config(decoder, NULL, &src)`.
  std::ifstream file(filename, std::ios::binary | std::ios::ate);
  if (!file.is_open())
  {
    std::cerr << "failed to open file " << filename << "\n";
    return cv::Mat();
  }
  std::streampos filesize = file.tellg();
  file.seekg(0, std::ios::beg);
  std::vector<char> buffer(filesize);
  if (!file.read(buffer.data(), filesize))
  {
    std::cerr << "error: could not read file content.\n";
    return cv::Mat();
  }
  file.close();

  wuffs_aux::DecodeImageCallbacks callbacks;
  wuffs_aux::sync_io::MemoryInput input(buffer.data(), buffer.size());
  wuffs_aux::DecodeImageResult result =
      wuffs_aux::DecodeImage(callbacks, input);
  if (!result.error_message.empty()) {
    std::cerr << "error: " << result.error_message << "\n";
    return cv::Mat();
  }
  // If result.error_message is empty then the DecodeImage call succeeded. The
  // decoded image is held in result.pixbuf, backed by memory that is released
  // when result.pixbuf_mem_owner (a std::unique_ptr) is destroyed. In this
  // example program, this happens at the end of this function.

  wuffs_base__table_u8 table = result.pixbuf.plane(0);
  //printf("table: %p, %zu, %zu, %zu\n", table.ptr, table.width, table.height, table.stride);

  // print result.pixbuf.pixcfg
//   printf("bpp: %d\n", result.pixbuf.pixcfg.pixel_format().bits_per_pixel());
//   printf("human readable: height=%zu, width=%zu, channel=%zu\n",
//     result.pixbuf.pixcfg.height(),
//     result.pixbuf.pixcfg.width(),
//     result.pixbuf.pixcfg.pixel_format().bits_per_pixel() / 8
//   );

  cv::Size size;
  size.height = result.pixbuf.pixcfg.height();
  size.width = result.pixbuf.pixcfg.width();
  int channels = result.pixbuf.pixcfg.pixel_format().bits_per_pixel() / 8;
  cv::Mat image(size, CV_8UC(channels));
  std::copy_n(table.ptr, size.width * size.height * channels, image.data);

  return image;
}





int main()
{
    std::cout << "OpenCV version (runtime): " << cv::getVersionString() << std::endl;

    //const std::string filename = "/Users/zz/data/peppers.png";
    const std::string filename = "/Users/zz/data/ASRDebug_0_7680x4320.png";
    cv::Mat src2;
    {
        birch::AutoTimer timer1("cv::imread");
        src2 = cv::imread(filename);
    }
    printf("src2: rows=%d, cols=%d\n", src2.rows, src2.cols);
    //cv::imwrite("result2.png", src2);

    cv::Mat src1;
    {
        birch::AutoTimer timer1("ncv::read_png");
        src1 = ncv::read_png(filename);
    }
    //cv::imwrite("result1.png", src1);
    printf("src1: rows=%d, cols=%d\n", src1.rows, src1.cols);

    std::cout << cv::getBuildInformation() << std::endl;

    return 0;
}
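One detail in read_png above: the std::copy_n call treats the decoded pixel buffer as tightly packed, i.e. it assumes the row stride equals width * channels. If either side ever has padded rows, a row-by-row copy would be needed. The following is a minimal sketch using plain pointers rather than Wuffs or OpenCV types; the function name copy_rows and its parameters are illustrative only. In the listing above the arguments would presumably map to table.ptr, table.stride, image.data and image.step, but that mapping is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copies `height` rows of `row_bytes` bytes each. Source rows start
// `src_stride` bytes apart and destination rows start `dst_stride` bytes
// apart. All names here are illustrative, not part of Wuffs or OpenCV.
inline void copy_rows(const uint8_t* src, size_t src_stride,
                      uint8_t* dst, size_t dst_stride,
                      size_t row_bytes, size_t height) {
  if (src_stride == row_bytes && dst_stride == row_bytes) {
    // Tightly packed on both sides: a single bulk copy is equivalent.
    std::memcpy(dst, src, row_bytes * height);
    return;
  }
  // Otherwise copy one row at a time, skipping any per-row padding.
  for (size_t y = 0; y < height; ++y) {
    std::memcpy(dst + y * dst_stride, src + y * src_stride, row_bytes);
  }
}
```

Either way, this copy is extra work on top of the decode itself (roughly 7680 * 4320 * 4 bytes for the image in question), and it is included in the ncv::read_png timing.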


zchrissirhcz · 2024-06-09T08:06:07Z

The test image is larger than GitHub's upload limit. For the performance test, we can just generate it from C++ code:

int create_test_7680_4320_png_image()
{
    const std::string image_path = "lena.png";
    cv::Mat image = cv::imread(image_path);
    if (image.empty()) {
        std::cerr << "image file not found" << std::endl;
        return -1;
    }

    int originalWidth = image.cols;
    int originalHeight = image.rows;

    int targetWidth = 7680;
    int targetHeight = 4320;

    int rows = targetHeight / originalHeight;
    int cols = targetWidth / originalWidth;

    cv::Mat result = cv::Mat(targetHeight, targetWidth, CV_8UC4, cv::Scalar(0, 0, 0, 0));

    cv::Mat imageWithAlpha;
    cv::cvtColor(image, imageWithAlpha, cv::COLOR_BGR2BGRA);

    for (int i = 0; i < rows; ++i) {
        for (int j = 0; j < cols; ++j) {
            int x = j * originalWidth;
            int y = i * originalHeight;

            imageWithAlpha.copyTo(result(cv::Rect(x, y, originalWidth, originalHeight)));
        }
    }
    cv::imwrite("result.png", result);

    return 0;
}
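Note that the rows/cols computation above uses integer division, so the tiles may not cover the whole canvas. The size of lena.png is not stated here; assuming a 512x512 source purely for illustration, the layout can be worked out with a small helper like this (TileLayout and tile_layout are hypothetical names):

```cpp
// Result of tiling a source image into a target canvas with integer
// division, mirroring the rows/cols computation in
// create_test_7680_4320_png_image(). Names are illustrative only.
struct TileLayout {
  int cols;         // whole tiles that fit horizontally
  int rows;         // whole tiles that fit vertically
  int uncovered_w;  // pixel columns left blank at the right edge
  int uncovered_h;  // pixel rows left blank at the bottom edge
};

// tile_w and tile_h must be positive.
inline TileLayout tile_layout(int target_w, int target_h,
                              int tile_w, int tile_h) {
  TileLayout t;
  t.cols = target_w / tile_w;
  t.rows = target_h / tile_h;
  t.uncovered_w = target_w - t.cols * tile_w;
  t.uncovered_h = target_h - t.rows * tile_h;
  return t;
}
```

Under the 512x512 assumption, tile_layout(7680, 4320, 512, 512) gives 15 columns and 8 rows of tiles, with 224 pixel rows at the bottom left as the fully transparent fill. A generated image like that may compress differently from the original test file, so timings are not necessarily identical.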

nigeltao · 2024-06-09T10:45:10Z

> What exactly code do I use

What's the command (the compiler invocation) to build that code?

zchrissirhcz · 2024-06-09T12:28:24Z

I use CMake to build:

cmake -S . -B build -DCMAKE_BUILD_TYPE=Release

The compiler is AppleClang:

Apple clang version 15.0.0 (clang-1500.1.0.2.5)
Target: arm64-apple-darwin23.2.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

nigeltao · 2024-06-09T13:12:48Z

I'm not very familiar with cmake (and I don't have an Apple M1). Do you know if -DCMAKE_BUILD_TYPE=Release passes -O2 or -O3 to clang?

Also, do you know, after the #include "wuffs-unsupported-snapshot.c" line, if the WUFFS_BASE__CPU_ARCH__ARM_CRC32 and WUFFS_BASE__CPU_ARCH__ARM_NEON macros are defined?

Specifically, if you do something like

#ifdef WUFFS_BASE__CPU_ARCH__ARM_CRC32
#error "asdf1"
#else
#error "asdf2"
#endif

Do you see asdf1 or asdf2? Ditto for #ifdef WUFFS_BASE__CPU_ARCH__ARM_NEON.
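As an alternative to #error, which stops the build, the same question can be answered at run time by reporting the compiler-predefined macros that the Wuffs macros are derived from. This is only a sketch: the function name arm_feature_report is made up for illustration, and it checks the compiler's own __ARM_* macros rather than the WUFFS_BASE__CPU_ARCH__* ones, so it works without including Wuffs.

```cpp
#include <string>

// Returns a short report of which compiler-predefined ARM feature macros are
// visible in this translation unit. On non-ARM targets every line reads "no".
// The function name is hypothetical and not part of Wuffs.
inline std::string arm_feature_report() {
  std::string out;
#if defined(__ARM_FEATURE_UNALIGNED)
  out += "__ARM_FEATURE_UNALIGNED: yes\n";
#else
  out += "__ARM_FEATURE_UNALIGNED: no\n";
#endif
#if defined(__ARM_FEATURE_CRC32)
  out += "__ARM_FEATURE_CRC32: yes\n";
#else
  out += "__ARM_FEATURE_CRC32: no\n";
#endif
#if defined(__ARM_NEON)
  out += "__ARM_NEON: yes\n";
#else
  out += "__ARM_NEON: no\n";
#endif
  return out;
}
```

Printing the returned string from main (for example with std::cout) shows the state for the flags that file was compiled with. It has to be compiled with the same flags as the file that includes Wuffs for the answer to be meaningful.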

zchrissirhcz · 2024-06-09T13:36:21Z

-O3 is used. I found it in build/compile_commands.json:

{
  "directory": "/Users/zz/work/cppsober/kcv/build",
  "command": "/Library/Developer/CommandLineTools/usr/bin/c++ -DGL_SILENCE_DEPRECATION -isystem /opt/homebrew/Cellar/opencv/4.9.0_8/include/opencv4 -isystem /Users/zz/.arcpkg/birch/autotimer/0.1/mac-arm64-static/inc/birch -O3 -DNDEBUG -std=gnu++17 -arch arm64 -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX14.2.sdk -o CMakeFiles/test_wuffs.dir/imwrite.cpp.o -c /Users/zz/work/cppsober/kcv/imwrite.cpp",
  "file": "/Users/zz/work/cppsober/kcv/imwrite.cpp",
  "output": "CMakeFiles/test_wuffs.dir/imwrite.cpp.o"
},

zchrissirhcz · 2024-06-09T13:38:58Z

WUFFS_BASE__CPU_ARCH__ARM_CRC32 and WUFFS_BASE__CPU_ARCH__ARM_NEON are enabled.

// To simplify Wuffs code, "cpu_arch >= arm_xxx" requires xxx but also
// unaligned little-endian load/stores.
#if defined(__ARM_FEATURE_UNALIGNED) && !defined(__native_client__) && \
    defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)
// Not all gcc versions define __ARM_ACLE, even if they support crc32
// intrinsics. Look for __ARM_FEATURE_CRC32 instead.
#if defined(__ARM_FEATURE_CRC32)
#include <arm_acle.h>
#define WUFFS_BASE__CPU_ARCH__ARM_CRC32
#pragma message "WUFFS_BASE__CPU_ARCH__ARM_CRC32: YES" // newly added
#endif  // defined(__ARM_FEATURE_CRC32)
#if defined(__ARM_NEON)
#include <arm_neon.h>
#define WUFFS_BASE__CPU_ARCH__ARM_NEON
#pragma message "WUFFS_BASE__CPU_ARCH__ARM_NEON: YES" // newly added
#endif  // defined(__ARM_NEON)
#endif  // defined(__ARM_FEATURE_UNALIGNED) etc

The output of compilation:

➜  kcv git:(main) ✗ cmake --build build -j8
[ 56%] Built target glfw
[ 76%] Built target imgui
[ 89%] Built target konacv
[ 94%] Built target test
[ 97%] Building CXX object CMakeFiles/test_wuffs.dir/imwrite.cpp.o
In file included from /Users/zz/work/cppsober/kcv/imwrite.cpp:117:
/Users/zz/work/cppsober/kcv/wuffs-unsupported-snapshot.c:120:9: warning: WUFFS_BASE__CPU_ARCH__ARM_CRC32: YES [-W#pragma-messages]
#pragma message "WUFFS_BASE__CPU_ARCH__ARM_CRC32: YES"
        ^
/Users/zz/work/cppsober/kcv/wuffs-unsupported-snapshot.c:125:9: warning: WUFFS_BASE__CPU_ARCH__ARM_NEON: YES [-W#pragma-messages]
#pragma message "WUFFS_BASE__CPU_ARCH__ARM_NEON: YES"
        ^
2 warnings generated.
[100%] Linking CXX executable test_wuffs
[100%] Built target test_wuffs

nigeltao · 2024-06-10T12:39:33Z

OK, I don't think there's an obvious fix. Still, I don't have an Apple M1 so it might take me a while to make progress on this.

Can you e-mail the image file (or a link to it) to nigeltao golang org? Thanks.

zchrissirhcz · 2024-06-10T12:58:48Z

> OK, I don't think there's an obvious fix. Still, I don't have an Apple M1 so it might take me a while to make progress on this.
>
> Can you e-mail the image file (or a link to it) to nigeltao golang org? Thanks.

Been sent, please check.

nigeltao · 2024-06-16T06:34:55Z

Thanks for sharing your 7680x4320 image. My wuffs bench time-to-decode numbers on x86_64 Intel (i5-10210U Comet Lake), not arm64 Apple (M1):

370ms wuffs latest (clang 14)
334ms wuffs latest (gcc 12)
533ms libpng (Debian 12 Bookworm)

Looks like I'm going to have to find an Apple M1 (or similar)...
