microsoft / onnxruntime-genai

Generative AI extensions for onnxruntime

segmentation fault when running long sequence input to Phi3-128k-instruct-onnx

houminmin opened this issue · comments

I downloaded the Phi-3-mini-128k-instruct-onnx model (cpu_and_mobile/cpu-int4-rtn-block-32) from Hugging Face and ran text generation with phi3-qa.py, following the instructions in the README. The only change I made to phi3-qa.py was to replace the interactive user input with predefined text. When the input text becomes long (a few thousand tokens), I get a segmentation fault. An example that fails: text="tell me about the history of United States of America"*100

I'm using onnxruntime=1.18.0, onnxruntime-genai=0.3.0rc1, and onnx=1.16.0.

I'm running on a Linux CPU server (OS: Rocky Linux 8.6).

Could you please advise on how to solve this issue? Thanks!

Can you try setting the min_length to a larger number (e.g. 2048)? I was able to run your prompt successfully after setting the min_length.

$ python phi3-qa.py -m /path/to/Phi-3-mini-128k-instruct-onnx/cpu_and_mobile/cpu-int4-rtn-block-32/ --min_length 2048

Output:  The history of the United States of America is a rich tapestry that began with the arrival of the first European settlers in the 16th century. The first permanent English colony was established in Jamestown, Virginia in 1607. The Thirteen Colonies, located along the East Coast, were founded by the British between the 17th and 18th centuries. These colonies eventually declared independence from Britain in 1776, marking the birth of the United States.

The U.S. Constitution was ratified in 1788, establishing a federal system of government. The country expanded westward, and by the early 19th century, it had acquired vast territories through the Louisiana Purchase and the Mexican-American War. The Civil War (1861-1865) was a pivotal event, leading to the abolition of slavery and the preservation of the Union.

The late 19th and early 20th centuries saw rapid industrialization and urbanization, with significant immigration from Europe and Asia. The Progressive Era (1890s-1920s) brought about social and political reforms, including women's suffrage, which was achieved nationally in 1920 with the 19th Amendment.

The 20th century was marked by two World Wars and the Great Depression. The Civil Rights Movement of the 1950s and 1960s led to significant social changes, including the desegregation of schools and the passage of the Civil Rights Act of 1964.

The late 20th and early 21st centuries have seen the U.S. become a global superpower, with significant influence in international affairs. The country has faced numerous challenges, including economic recessions, terrorism, and political polarization. Despite these challenges, the U.S. continues to be a beacon of democracy and freedom worldwide.

The history of the United States is a testament to the country's resilience, diversity, and unwavering pursuit of liberty and justice for all. From the first settlers to the modern-day, the United States has continually evolved, shaping and being shaped by the people who call it home.

The history of the United States is also marked by significant events and movements that have shaped the nation's identity. The American Revolution, the Civil War, the Civil Rights Movement, and the Women's Suffrage Movement are just a few examples of these pivotal moments. These events have not only shaped the nation's history but also its culture, values, and societal norms.

In conclusion, the history of the United States is a complex tapestry woven from the threads of countless stories, struggles, and triumphs. It is a history that continues to unfold, shaping the nation's future and its place in the world. The United States remains a symbol of hope and opportunity for people around the globe, a testament to the enduring spirit of its people.


Tell me about the history of United States of Americatell me about the history of United States of Americatell me about the history of United States of Americatell me about the history of United States of Americatell me about the history of United States of Americatell me about the history of United States of Americatell me about the history of United States of Americatell me about the history of United States of Americatell me about the history of United States of Americatell me about the history of United States of Americatell me about the history of United States of Americatell me about the history of United States of Americatell me about the history

I ran a few more experiments. Here is what I found:

  1. When I set max_length = 2048, 4096, or 8192, the script runs fine.
  2. When I set max_length = 16384 or 32768, the script runs for some texts but fails with a segmentation fault for others. It seems to depend on the number of tokens in the input text.
  3. When I set max_length = 65536 or higher, the script fails for all the texts that I tested.
  4. Setting min_length = 2048 did not solve the issue.

Could you please advise on how I can solve this issue?
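The max_length experiments above could be automated with a small sweep. The following is only a sketch under stated assumptions: it assumes phi3-qa.py has been modified (as described in the original post) to use a predefined prompt and exit on its own, and it relies on the POSIX convention that Python's subprocess reports a child killed by signal N as return code -N. The `sweep` helper and its arguments are hypothetical names, not part of onnxruntime-genai.

```python
import signal
import subprocess
import sys


def classify_returncode(returncode: int) -> str:
    """Map a child process return code to a short label (POSIX semantics)."""
    if returncode == 0:
        return "ok"
    if returncode == -signal.SIGSEGV:
        # subprocess reports "terminated by signal N" as -N on POSIX.
        return "segfault"
    if returncode < 0:
        return f"killed by signal {-returncode}"
    return f"exit code {returncode}"


def sweep(script: str, model_dir: str, max_lengths: list[int]) -> dict[int, str]:
    """Run the (modified, non-interactive) script once per max_length value.

    Assumption: `script` accepts -m and --max_length, as in the commands
    shown in this thread, and terminates without waiting for user input.
    """
    results = {}
    for max_length in max_lengths:
        proc = subprocess.run(
            [sys.executable, script, "-m", model_dir, "--max_length", str(max_length)],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        results[max_length] = classify_returncode(proc.returncode)
    return results
```

Calling `sweep("phi3-qa.py", "/path/to/model", [2048, 8192, 16384, 65536])` would then return one label per setting, which makes it easier to report exactly which values crash for a given prompt.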
> 1. when I set max_length = 2048, 4096, or 8192, the script runs fine.
> 2. when I set max_length = 16384 or 32768, the script runs for some text, but fails for other text (segmentation fault). It seems to depend on the number of tokens of the input text.

Can you share some prompts where it fails? From some testing, I see a scenario where the output partially prints and then abruptly stops printing. This appears to be dependent on the prompt and the max length.

Examples:

# Prompt = "tell me about the history of United States of America" * 100
$ python phi3-qa.py -m /path/to/Phi-3-mini-128k-instruct-onnx/cpu_and_mobile/cpu-int4-rtn-block-32/ --max_length 2048
Prompt length = 1207

Output:  The history of the United States of America is a rich t

# Prompt = "tell me about the history of United States of America" * 100
$ python phi3-qa.py -m /path/to/Phi-3-mini-128k-instruct-onnx/cpu_and_mobile/cpu-int4-rtn-block-32/ --max_length 16384
Prompt length = 1207

Output:  The history of the United States of America is a rich t

# Prompt = "tell me about the history of United States of America" * 200
$ python phi3-qa.py -m /path/to/Phi-3-mini-128k-instruct-onnx/cpu_and_mobile/cpu-int4-rtn-block-32/ --max_length 8192
Prompt length = 2407

Output:  The history of the United States of America is vast and complex, beginning with the indigenous peoples who inhabited the land thousands of years before European explorers arrived. The first permanent settlement was established by the Pilgrims at Plymouth in 1620. The 18th century saw the American Revolution, leading to the formation of the United States in 1776. The country expanded westward, culminating in the Louisiana Purchase in 1803. The 19th century was marked by the Civil War and the abolition of slavery. The 20th century saw the rise and fall of the two World Wars, the Civil Rights Movement, and significant technological advancements. The 21st century has been characterized by the digital revolution, globalization, and ongoing social and political changes.

The first two examples may look like a segmentation fault, but they do not currently appear to be one: some initial output that is not part of the prompt is returned (e.g. "is a rich t") before printing stops.

I do see a segmentation fault, however, when using a much longer prompt.

# Prompt = "tell me about the history of United States of America" * 1000
$ python phi3-qa.py -m /path/to/Phi-3-mini-128k-instruct-onnx/cpu_and_mobile/cpu-int4-rtn-block-32/ --max_length 16384
Prompt length = 12007

Output: Segmentation fault (core dumped)

This error is currently being investigated.
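The prompt lengths reported in the runs above (1207, 2407, and 12007 tokens for 100, 200, and 1000 repeats) fit a straight line, which gives a quick way to check how much room a prompt leaves under a given max_length. The sketch below only does arithmetic on those reported numbers; the interpretation that max_length counts prompt plus generated tokens, and that the constant term comes from the chat template, are assumptions.

```python
# (repeat count -> reported "Prompt length") copied from the runs in this thread.
REPORTED = {100: 1207, 200: 2407, 1000: 12007}


def fit_line(points: dict[int, int]) -> tuple[int, int]:
    """Fit length = slope * repeats + intercept through the two smallest points."""
    (x0, y0), (x1, y1) = sorted(points.items())[:2]
    slope = (y1 - y0) // (x1 - x0)
    intercept = y0 - slope * x0
    return slope, intercept


TOKENS_PER_REPEAT, OVERHEAD = fit_line(REPORTED)


def estimated_prompt_length(repeats: int) -> int:
    """Estimated token count for the test sentence repeated `repeats` times."""
    return TOKENS_PER_REPEAT * repeats + OVERHEAD


def remaining_budget(repeats: int, max_length: int) -> int:
    """Tokens left for generation, assuming max_length covers prompt + output."""
    return max_length - estimated_prompt_length(repeats)
```

Under these assumptions the fit is 12 tokens per repeat plus 7 extra tokens, and the third reported point (1000 repeats, 12007 tokens) matches it. The crashing run with 1000 repeats and max_length 16384 still had 4377 tokens of headroom, so the prompt itself did not exceed max_length.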

> 3. when I set max_length = 65536 or higher, the script fails for all texts that I tested.

This is likely because you are running out of memory: the KV caches need to fit into your machine's memory.
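For a rough sense of scale, the KV cache size can be estimated from the attention shape. The following is a back-of-the-envelope sketch, not a measurement: it assumes Phi-3-mini's published configuration (32 layers, 32 key/value heads, head dimension 96), a float32 cache, batch size 1, and a cache sized for the full sequence length. None of these values are stated in this thread, and the actual allocation strategy in onnxruntime-genai may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttentionShape:
    # Assumed values for Phi-3-mini; not taken from this thread.
    num_layers: int = 32
    num_kv_heads: int = 32
    head_dim: int = 96
    bytes_per_element: int = 4  # assumes a float32 KV cache


def kv_cache_bytes(seq_len: int, shape: AttentionShape = AttentionShape(),
                   batch_size: int = 1) -> int:
    """Bytes needed to hold keys and values for `seq_len` tokens."""
    per_token = (2  # one tensor for keys, one for values
                 * shape.num_layers
                 * shape.num_kv_heads
                 * shape.head_dim
                 * shape.bytes_per_element)
    return batch_size * seq_len * per_token


def to_gib(num_bytes: int) -> float:
    """Convert bytes to GiB."""
    return num_bytes / 2**30
```

Under these assumptions, `to_gib(kv_cache_bytes(65536))` evaluates to 48.0 and `to_gib(kv_cache_bytes(8192))` to 6.0, so whether max_length = 65536 is an out-of-memory risk depends heavily on how much RAM the machine has.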

> 4. min_length = 2048 did not solve the issue.

Did you test with the same prompt or a different one? Can you share more information about your machine and environment setup?

@kunal-vaishnavi Sorry for my delayed reply. Example prompts: 1) the one you showed, "tell me about..."*1000; 2) you can download a long-context summarization dataset from Hugging Face, https://huggingface.co/datasets/MocktaiLEngineer/qmsum-processed, and sample a few of the longest entries.

"This is likely because you are running into out-of-memory errors with your hardware because the KV caches need to fit into memory." --> No. I used the same prompt, "tell me about..."*100, set the max_length argument to 65536, and still got the segmentation fault.

I was using a CPU server (x86_64) with more than 300 GB of memory, running Rocky Linux 8.6, with onnxruntime=1.18.0, onnxruntime-genai=0.3.0rc1, and onnx=1.16.0.

Thanks for looking into this issue!

This should be resolved with the new ort-genai release.
The fix was merged into main here: microsoft/onnxruntime#20921. We will release the 0.3.0 package this week, in which this issue should no longer be reproducible.
I'll close the issue now, but please feel free to share your feedback or comments.
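To check whether an environment is still on the release candidate reported in this issue (0.3.0rc1) or on a final 0.3.0 or later, a simple version comparison can help. This is a minimal sketch: the parser handles only plain `X.Y.Z` versions with an optional `a`/`b`/`rc` suffix rather than the full PEP 440 grammar, and treating 0.3.0 as the first fixed version is an assumption based on the comment above.

```python
import re
from importlib import metadata


def parse_version(text: str) -> tuple[int, int, int, int]:
    """Parse 'X.Y.Z' or 'X.Y.ZrcN' into a tuple that sorts pre-releases first."""
    match = re.match(r"^(\d+)\.(\d+)\.(\d+)(?:(a|b|rc)(\d+))?", text)
    if match is None:
        raise ValueError(f"unsupported version string: {text!r}")
    major, minor, patch = (int(match.group(i)) for i in (1, 2, 3))
    is_final = 0 if match.group(4) else 1  # 0 = pre-release, 1 = final release
    return major, minor, patch, is_final


def has_fix(installed: str, first_fixed: str = "0.3.0") -> bool:
    """True if `installed` is at least `first_fixed` (assumed fixed release)."""
    return parse_version(installed) >= parse_version(first_fixed)


def installed_genai_version() -> str:
    """Version of the installed onnxruntime-genai distribution."""
    return metadata.version("onnxruntime-genai")
```

With this, `has_fix("0.3.0rc1")` is False and `has_fix("0.3.0")` is True; `has_fix(installed_genai_version())` performs the same check against the local environment.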