intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, DeepSpeed, vLLM, FastChat, Axolotl, etc.


Ollama codestral model produces nonsensical output on PVC

tkarna opened this issue · comments

I'm using ollama in langchain. The following code-generation test uses the codestral model and produces a valid response on a standard ollama install running on CPU. If I run with ollama installed via ipex-llm (following the online instructions), I get nonsensical characters in the output.

Brief tests suggest that the length of the input prompt may trigger this. Shorter/simpler queries work fine.
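If prompt length really is the trigger, the threshold can be located without manual trial and error by bisecting over prompt prefixes. A minimal sketch (the `invoke` and `is_garbled` callables here are placeholders for `llm.invoke` and whatever garble check you prefer; they are not part of the original report):

```python
def find_garble_threshold(prompt: str, invoke, is_garbled) -> int:
    """Binary-search the shortest prefix length of `prompt` whose response
    is garbled. Assumes responses are clean below some length and garbled
    at or above it (monotonic), which matches the observed behaviour."""
    lo, hi = 0, len(prompt)  # invariant: prompt[:lo] clean, prompt[:hi] garbled
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_garbled(invoke(prompt[:mid])):
            hi = mid
        else:
            lo = mid
    return hi

# Simulated model for illustration: output corrupts once input exceeds 100 chars.
fake_invoke = lambda p: "<s>" * 30 if len(p) > 100 else "ok"
threshold = find_garble_threshold("x" * 400, fake_invoke, lambda r: "<s>" in r)
# threshold == 101: the shortest prefix length that triggers corruption
```

With the real model, each probe is one `llm.invoke` call, so the search needs only about log2(len(prompt)) requests.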

Tested on a server with Sapphire Rapids CPUs and 2 PVC GPUs. Docker with Ubuntu 22.04.4 LTS, oneAPI 2024.1, Python 3.10.12, langchain 0.2.1.

Valid response on CPU:

```diff
  - ResultData data{
  + ResultData data{
  +     utils::GetPid(),
  +     device_id_,
  +     correlator_.GetTimestamp(),
  +     options_.GetMetricGroup(),
  +     device_props_list_,
  +     kernel_name_list,
  +     std::move(kernel_interval_list)};
```

Invalid response on ipex-llm and PVC:

```diff
-    ResultData data{
+    ResultData data{
        utils<s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s>...
```

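When running this kind of test in a loop, the failure mode above can be flagged automatically by scanning the response for long runs of sentinel tokens. A small heuristic sketch (the `looks_garbled` helper and its threshold are assumptions for illustration, not part of the reproducer):

```python
import re

def looks_garbled(text: str, max_special_run: int = 8) -> bool:
    """Heuristic: flag responses containing long runs of BOS/EOS markers.

    A healthy diff never repeats <s>/</s> tokens, so more than
    `max_special_run` consecutive occurrences is treated as corrupted
    decoder output.
    """
    return re.search(r"(?:</?s>){%d,}" % max_special_run, text) is not None

# The corrupted PVC output trips the check; the valid CPU diff does not.
bad = "utils" + "<s>" * 28
good = "+     utils::GetPid(),"
```

This keeps regression runs unattended: a garbled response fails the check instead of requiring someone to eyeball the diff.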
Reproducer:

reproducer.py
from langchain_community.llms import Ollama

patch_prompt = """You are an experienced programmer. You will be given an issue related to a C++ software project,
some related code blocks for context, and a proposed fix. Your ultimate goal is to write code block that fixes the issue.

To respond you MUST use the following format.
~~~
```diff
[fixed code block in diff format.]
```
~~~

Issue: {input}
"""

issue_text = """
Issue description: COPY_INSTEAD_OF_MOVE
Creating a copy of a variable that is no longer used instead of using std::move().
Context: 
"kernel_interval_list" is copied in call to copy constructor "std::vector >", when it could be moved instead.
Use "std::move"("kernel_interval_list") instead of "kernel_interval_list".
Affected part of source code:
```cpp

    ResultData data{
        utils::GetPid(),
        device_id_,
        correlator_.GetTimestamp(),
        options_.GetMetricGroup(),
        device_props_list_,
        kernel_name_list,
        kernel_interval_list};
```
In:
identifier: Profiler::DumpResultFile
path: pti/tools/oneprof/profiler.h:262
```cpp
  void Profiler::DumpResultFile() {
    std::vector kernel_name_list;
    std::vector kernel_interval_list;

    if (CheckOption(PROF_KERNEL_INTERVALS) ||
        CheckOption(PROF_KERNEL_METRICS) ||
        CheckOption(PROF_AGGREGATION)) {

      if (cl_kernel_collector_ != nullptr) {
        const ClKernelIntervalList& cl_kernel_interval_list =
          cl_kernel_collector_->GetKernelIntervalList();

        std::vector device_list =
          utils::cl::GetDeviceList(CL_DEVICE_TYPE_GPU);
        if (!device_list.empty()) {
          PTI_ASSERT(device_id_ < device_list.size());
          AddKernelIntervals(
              cl_kernel_interval_list,
              device_list[device_id_],
              kernel_name_list,
              kernel_interval_list);
        }
      }

      if (ze_kernel_collector_ != nullptr) {
        const ZeKernelIntervalList& ze_kernel_interval_list =
          ze_kernel_collector_->GetKernelIntervalList();

        std::vector device_list =
          utils::ze::GetDeviceList();
        if (!device_list.empty()) {
          PTI_ASSERT(device_id_ < device_list.size());
          AddKernelIntervals(
              ze_kernel_interval_list,
              device_list[device_id_],
              kernel_name_list,
              kernel_interval_list);
        }
      }
    }

    if (CheckOption(PROF_KERNEL_QUERY)) {
      PTI_ASSERT(metric_query_collector_ != nullptr);
      kernel_name_list = metric_query_collector_->GetKernels();
    }

    ResultStorage* storage = ResultStorage::Create(
        options_.GetRawDataPath(), utils::GetPid());
        PTI_ASSERT(storage != nullptr);

    ResultData data{
        utils::GetPid(),
        device_id_,
        correlator_.GetTimestamp(),
        options_.GetMetricGroup(),
        device_props_list_,
        kernel_name_list,
        kernel_interval_list};

    storage->Dump(&data);

    delete storage;
  }
```
"""

llm = Ollama(
    model="codestral",
    temperature=0.1,
    top_k=10,
    top_p=0.5,
    repeat_penalty=1.03,
    num_thread=28,
)

full_prompt = patch_prompt.format(input=issue_text)
response = llm.invoke(full_prompt)
print(response)

Hi @tkarna ,
I have reproduced this error, and we are trying to figure out the root cause and fix it.
Once it is done, we will update here to let you know.

Hi @tkarna ,
We have fixed this issue; you can try again with `pip install ipex-llm[cpp]==2.1.0b20240603` (which will be released tonight).

Thank you! I confirm that with the latest version `ipex-llm[cpp]==2.1.0b20240603` the example works correctly.