PaddlePaddle / Paddle

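PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (core framework of 『飞桨』 PaddlePaddle: high-performance single-machine and distributed training, and cross-platform deployment, for deep learning and machine learning)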

Home Page: http://www.paddlepaddle.org/

CINN unit tests fail

zlsh80826 opened this issue

Describe the Bug

Running the unit tests on the develop branch with the official image produces multiple CINN test failures.
This issue differs from #62655 in that it uses CINN_ONLY=OFF.

Steps to reproduce

docker run -it --rm --gpus=all registry.baidubce.com/paddlepaddle/paddle:latest-dev-cuda12.0-cudnn8.9-trt8.6-gcc12.2
git clone https://github.com/PaddlePaddle/Paddle.git
cd Paddle
BUILD_DIR=build-cinn
cmake -B${BUILD_DIR} -S. \
    -DCUDA_ARCH_NAME=Manual \
    -DCUDA_ARCH_BIN="80" \
    -DWITH_CINN=ON \
    -DWITH_TESTING=ON \
    -DCINN_ONLY=OFF \
    -DWITH_MKL=OFF \
    -DWITH_GPU=ON \
    -Wno-dev 2>&1 | tee c-${BUILD_DIR}.log

cmake --build ${BUILD_DIR} -j$(nproc) 2>&1 | tee ${BUILD_DIR}.log

pip install ${BUILD_DIR}/python/dist/*.whl
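
The steps above end at installing the wheel; the report does not show how the tests were launched. The `371:` prefixes in the logs below look like CTest verbose output, so the following is a minimal sketch assuming the tests are registered with CTest in the build directory. The function name `run_cinn_tests`, the default name filter `cinn`, and the log file name are assumptions for illustration, not part of the original report.

```shell
# Hypothetical helper: run CTest tests whose names match a regex
# inside a given build directory, and keep a copy of the output.
# Usage: run_cinn_tests <build-dir> [name-regex]
run_cinn_tests() {
    build_dir="${1:?usage: run_cinn_tests <build-dir> [name-regex]}"
    pattern="${2:-cinn}"   # assumption: the affected tests have "cinn" in their names
    (
        cd "${build_dir}" || exit 1
        # -R selects tests by regex; --output-on-failure prints the log of failing tests only.
        ctest -R "${pattern}" --output-on-failure 2>&1 | tee "ctest-${pattern}.log"
    )
}
```

With the build directory from the steps above, this would be invoked as `run_cinn_tests build-cinn`. The subshell keeps the caller's working directory unchanged.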

Error list

All unit tests under ops/ fail with:

371: [libprotobuf ERROR /home/Paddle/third_party/protobuf/src/google/protobuf/descriptor_database.cc:642] File already exists in database: paddle/cinn/frontend/paddle/framework.proto
371: [libprotobuf FATAL /home/Paddle/third_party/protobuf/src/google/protobuf/descriptor.cc:1986] CHECK failed: GeneratedDatabase()->Add(encoded_file_descriptor, size):
371: terminate called after throwing an instance of 'google::protobuf::FatalException'
371:   what():  CHECK failed: GeneratedDatabase()->Add(encoded_file_descriptor, size):
371: Child aborted

CINN unit tests:

The following tests FAILED:
        361 - test_cinn_op_benchmark (Failed)
        362 - test_cinn_fake_resnet (Failed)
        363 - test_cinn_real_resnet18 (Failed)
        364 - test_cinn_real_mobilenetV2 (Failed)
        365 - test_cinn_real_efficientnet (Failed)
        366 - test_cinn_real_mobilenetV1 (Failed)
        367 - test_cinn_real_resnet50 (Failed)
        368 - test_cinn_real_squeezenet (Failed)
        552 - test_cinn_sub_graph_map_expr (Failed)
        554 - test_cinn_broadcast_symbolic (Failed)
        555 - test_cinn_sub_graph_symbolic (Failed)
        905 - cinn_instruction_run_op_test (Failed)
        1008 - test_cinn (Failed)
        1009 - test_cinn_prim (Failed)
        1010 - test_cinn_prim_gelu (Failed)
        1011 - test_cinn_prim_layer_norm (Failed)
        1012 - test_cinn_prim_mean (Failed)
        2402 - test_bert_cinn (Failed)
        2404 - test_bert_prim_cinn (Failed)
        2406 - test_prim_simplenet_cinn (Failed)
        2407 - test_resnet_cinn (Failed)
        2409 - test_resnet_prim_cinn (Failed)
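
To re-run or track just the tests in a summary like the one above, the names can be pulled out of a saved CTest log. This is a small sketch, assuming the log uses the `<number> - <name> (Failed)` layout shown here; the function name `failed_tests` is hypothetical.

```shell
# Hypothetical helper: read a CTest log on stdin and print the name of
# every test listed as "(Failed)", one per line.
failed_tests() {
    # Match lines of the form "   361 - test_name (Failed)" and keep only the name.
    sed -n 's/^[[:space:]]*[0-9][0-9]* - \(.*\) (Failed)[[:space:]]*$/\1/p'
}
```

For example, `failed_tests < ctest.log | paste -sd'|' -` joins the names into an alternation that can be passed to `ctest -R`. CTest's own `--rerun-failed` option is an alternative when the previous run's state is still present in the build directory.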

Additional Supplementary Information

No response

Thanks for the report. I'll pass it on to the CINN maintainers, who will reply as soon as possible.

@zyfncg Please take a look, or forward this to whoever is more directly responsible. Thanks!

CINN is currently being refactored. These unit test failures are all known issues and will be fixed incrementally.