apache / tvm

Open deep learning compiler stack for CPU, GPU and specialized accelerators

Home Page: https://tvm.apache.org/

[Bug] Segmentation fault (core dumped) during convert_weight

ricar0 opened this issue · comments

Thanks for participating in the TVM community! We use https://discuss.tvm.ai for general usage questions and discussions. The issue tracker is used for actionable items such as feature proposal discussions, roadmaps, and bug tracking. You are always welcome to post on the forum first 😸

Issues that are inactive for a period of time may get closed. We adopt this policy so that actionable issues do not fall to the bottom of the pile and get lost. Feel free to open a new issue if an additional problem needs attention after an old one has been closed.

Expected behavior

Weight conversion runs to completion and writes the quantized weights to ./dist/phi-2.

Actual behavior

The process crashes with a segmentation fault (core dumped) during convert_weight:

mlc_llm convert_weight --model-type phi ./dist/models/phi-2 --quantization q4f16_1 -o ./dist/phi-2
[2024-03-23 17:05:01] INFO auto_config.py:115: Found model configuration: dist/models/phi-2/config.json
[2024-03-23 17:05:03] INFO auto_device.py:85: Not found device: cuda:0
[2024-03-23 17:05:04] INFO auto_device.py:85: Not found device: rocm:0
[2024-03-23 17:05:05] INFO auto_device.py:85: Not found device: metal:0
[2024-03-23 17:05:08] INFO auto_device.py:76: Found device: vulkan:0
[2024-03-23 17:05:08] INFO auto_device.py:76: Found device: vulkan:1
[2024-03-23 17:05:08] INFO auto_device.py:76: Found device: vulkan:2
[2024-03-23 17:05:08] INFO auto_device.py:76: Found device: vulkan:3
[2024-03-23 17:05:09] INFO auto_device.py:85: Not found device: opencl:0
[2024-03-23 17:05:09] INFO auto_device.py:33: Using device: vulkan:0
[2024-03-23 17:05:09] INFO auto_weight.py:70: Finding weights in: dist/models/phi-2
[2024-03-23 17:05:09] INFO auto_weight.py:136: Not found Huggingface PyTorch
[2024-03-23 17:05:09] INFO auto_weight.py:143: Found source weight format: huggingface-safetensor. Source configuration: dist/models/phi-2/model.safetensors.index.json
[2024-03-23 17:05:09] INFO auto_weight.py:106: Using source weight configuration: dist/models/phi-2/model.safetensors.index.json. Use `--source` to override.
[2024-03-23 17:05:09] INFO auto_weight.py:110: Using source weight format: huggingface-safetensor. Use `--source-format` to override.
[2024-03-23 17:05:09] INFO auto_config.py:153: Found model type: phi. Use `--model-type` to override.
Weight conversion with arguments:
  --config          dist/models/phi-2/config.json
  --quantization    GroupQuantize(name='q4f16_1', kind='group-quant', group_size=32, quantize_dtype='int4', storage_dtype='uint32', model_dtype='float16', linear_weight_layout='NK', quantize_embedding=True, quantize_final_fc=True, num_elem_per_storage=8, num_storage_per_group=4, max_int_value=7)
  --model-type      phi
  --device          vulkan:0
  --source          dist/models/phi-2/model.safetensors.index.json
  --source-format   huggingface-safetensor
  --output          dist/phi-2
[2024-03-23 17:05:09] INFO phi_model.py:53: context_window_size not found in config.json. Falling back to max_position_embeddings (2048)
Start storing to cache dist/phi-2
[2024-03-23 17:05:16] INFO huggingface_loader.py:182: Loading HF parameters from: dist/models/phi-2/model-00002-of-00002.safetensors                                                                    
[2024-03-23 17:05:18] INFO huggingface_loader.py:172: [Not quantized] Parameter: "lm_head.linear.bias", shape: (51200,), dtype: float16                                                                 
[2024-03-23 17:05:23] INFO group_quantization.py:232: Compiling quantize function for key: ((51200, 2560), float16, vulkan, axis=1, output_transpose=False)                                             
  0%|▌                                                                                                                                                                  | 1/325 [00:06<06:56,  1.29s/it]
Segmentation fault (core dumped)
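
For context on where the crash happens: the GroupQuantize config above uses group_size=32 with int4 values stored in uint32 words, eight elements per word (num_elem_per_storage=8, num_storage_per_group=4). A minimal pure-Python sketch of that bit-packing layout (an illustration of the storage scheme only, not TVM's actual group_quantization kernel):

```python
# Sketch of q4f16_1-style storage: 8 unsigned 4-bit values per uint32 word,
# so a group of 32 elements occupies 4 storage words. This mirrors the
# config printed above; it is NOT the compiled TVM quantize function.

GROUP_SIZE = 32
ELEMS_PER_WORD = 8  # 32 bits / 4 bits per element

def pack_int4(values):
    """Pack ints in [0, 15] into uint32 words, 8 elements per word."""
    assert len(values) % ELEMS_PER_WORD == 0
    words = []
    for i in range(0, len(values), ELEMS_PER_WORD):
        word = 0
        for j, v in enumerate(values[i:i + ELEMS_PER_WORD]):
            assert 0 <= v < 16
            word |= v << (4 * j)  # element j occupies bits [4j, 4j+4)
        words.append(word)
    return words

group = list(range(16)) + list(range(16))  # 32 quantized values
packed = pack_int4(group)
print(len(packed))  # → 4 (num_storage_per_group)
```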

Environment

Operating System: Ubuntu 18.04
TVM: built from source (https://github.com/apache/tvm)
Python: 3.11

python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))"
USE_NVTX: OFF
USE_GTEST: AUTO
SUMMARIZE: OFF
TVM_DEBUG_WITH_ABI_CHANGE: OFF
USE_IOS_RPC: OFF
USE_MSC: OFF
USE_ETHOSU: OFF
CUDA_VERSION: NOT-FOUND
USE_LIBBACKTRACE: OFF
DLPACK_PATH: 3rdparty/dlpack/include
USE_TENSORRT_CODEGEN: OFF
USE_THRUST: OFF
USE_TARGET_ONNX: OFF
USE_AOT_EXECUTOR: ON
BUILD_DUMMY_LIBTVM: OFF
USE_CUDNN: OFF
USE_TENSORRT_RUNTIME: OFF
USE_ARM_COMPUTE_LIB_GRAPH_EXECUTOR: OFF
USE_CCACHE: AUTO
USE_ARM_COMPUTE_LIB: OFF
USE_CPP_RTVM: OFF
USE_OPENCL_GTEST: /path/to/opencl/gtest
USE_MKL: OFF
USE_PT_TVMDSOOP: OFF
MLIR_VERSION: NOT-FOUND
USE_CLML: OFF
USE_STACKVM_RUNTIME: OFF
USE_GRAPH_EXECUTOR_CUDA_GRAPH: OFF
ROCM_PATH: /opt/rocm
USE_DNNL: OFF
USE_VITIS_AI: OFF
USE_MLIR: OFF
USE_RCCL: OFF
USE_LLVM: OFF
USE_VERILATOR: OFF
USE_TF_TVMDSOOP: OFF
USE_THREADS: ON
USE_MSVC_MT: OFF
BACKTRACE_ON_SEGFAULT: OFF
USE_GRAPH_EXECUTOR: ON
USE_NCCL: OFF
USE_ROCBLAS: OFF
GIT_COMMIT_HASH: 89cd74c07d06910990404aab08b3a46bead39d1d
USE_VULKAN: /data2/wangmy/vulkansdk/x86_64
USE_RUST_EXT: OFF
USE_CUTLASS: OFF
USE_CPP_RPC: OFF
USE_HEXAGON: OFF
USE_CUSTOM_LOGGING: OFF
USE_UMA: OFF
USE_FALLBACK_STL_MAP: OFF
USE_SORT: ON
USE_RTTI: ON
GIT_COMMIT_TIME: 2024-03-21 16:36:32 -0400
USE_HEXAGON_SDK: /path/to/sdk
USE_BLAS: none
USE_ETHOSN: OFF
USE_LIBTORCH: OFF
USE_RANDOM: ON
USE_CUDA: OFF
USE_COREML: OFF
USE_AMX: OFF
BUILD_STATIC_RUNTIME: OFF
USE_CMSISNN: OFF
USE_KHRONOS_SPIRV: OFF
USE_CLML_GRAPH_EXECUTOR: OFF
USE_TFLITE: OFF
USE_HEXAGON_GTEST: /path/to/hexagon/gtest
PICOJSON_PATH: 3rdparty/picojson
USE_OPENCL_ENABLE_HOST_PTR: OFF
INSTALL_DEV: OFF
USE_PROFILER: ON
USE_NNPACK: OFF
LLVM_VERSION: NOT-FOUND
USE_MRVL: OFF
USE_OPENCL: OFF
COMPILER_RT_PATH: 3rdparty/compiler-rt
RANG_PATH: 3rdparty/rang/include
USE_SPIRV_KHR_INTEGER_DOT_PRODUCT: OFF
USE_OPENMP: none
USE_BNNS: OFF
USE_FLASHINFER: OFF
USE_CUBLAS: OFF
USE_METAL: OFF
USE_MICRO_STANDALONE_RUNTIME: OFF
USE_HEXAGON_EXTERNAL_LIBS: OFF
USE_ALTERNATIVE_LINKER: AUTO
USE_BYODT_POSIT: OFF
USE_HEXAGON_RPC: OFF
USE_MICRO: OFF
DMLC_PATH: 3rdparty/dmlc-core/include
INDEX_DEFAULT_I64: ON
USE_RELAY_DEBUG: OFF
USE_RPC: ON
USE_TENSORFLOW_PATH: none
TVM_CLML_VERSION: 
USE_MIOPEN: OFF
USE_ROCM: OFF
USE_PAPI: OFF
USE_CURAND: OFF
TVM_CXX_COMPILER_PATH: /usr/bin/c++
HIDE_PRIVATE_SYMBOLS: OFF
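
The libinfo dump above shows USE_LIBBACKTRACE: OFF and BACKTRACE_ON_SEGFAULT: OFF, so the crash exits without a stack trace. One way to make the segfault debuggable is to rebuild TVM with those options enabled (a suggested sketch, assuming a source checkout; the Vulkan SDK path is taken from the USE_VULKAN line above):

```shell
# Reconfigure and rebuild TVM with backtrace support so the next
# segfault prints a stack trace instead of dying silently.
cd tvm/build
cmake .. \
  -DUSE_LIBBACKTRACE=ON \
  -DBACKTRACE_ON_SEGFAULT=ON \
  -DUSE_VULKAN=/data2/wangmy/vulkansdk/x86_64
make -j"$(nproc)"
```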

Steps to reproduce

Run weight conversion for phi-2 with q4f16_1 quantization (the Vulkan device is auto-detected):

mlc_llm convert_weight --model-type phi ./dist/models/phi-2 --quantization q4f16_1 -o ./dist/phi-2

The segmentation fault occurs while compiling the quantize function for the first parameter (1/325 in the progress bar above).

Triage

Please refer to the list of label tags here to find the relevant tags and add them below in a bullet format (example below).

  • needs-triage

tvm will die

tvm will die is real