nanovms / ops

ops - build and run nanos unikernels

Home page: https://ops.city


Unikernel Process Not Starting with Memory Size Beyond a Certain Range in Firecracker

Roopsai507 opened this issue.

I'm encountering an issue when trying to start a unikernel process in Firecracker with memory sizes above a certain threshold. Below are the configurations I've tested:

Working configuration (mem_size_mib of 1024 or below):
"machine-config": { "vcpu_count": 1, "mem_size_mib": 1024, "smt": false }

Failing configurations (mem_size_mib above 1024):
"machine-config": { "vcpu_count": 1, "mem_size_mib": 1025, "smt": false }
"machine-config": { "vcpu_count": 1, "mem_size_mib": 2048, "smt": false }

If mem_size_mib is above 1024, my unikernel process does not start. Why?
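For reference, the three configurations above differ only in `mem_size_mib`. If the Firecracker config is generated from Go rather than edited by hand, a minimal sketch might look like the following. The `MachineConfig` type and `machineConfigJSON` helper are hypothetical names, and only the three `machine-config` fields quoted in this issue are modeled.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// MachineConfig mirrors the "machine-config" object quoted in this issue.
// Only vcpu_count, mem_size_mib and smt are modeled (hypothetical type).
type MachineConfig struct {
	VcpuCount  int  `json:"vcpu_count"`
	MemSizeMib int  `json:"mem_size_mib"`
	Smt        bool `json:"smt"`
}

// machineConfigJSON renders the "machine-config" fragment for a given
// memory size in MiB, with one vCPU and SMT disabled as in the report.
func machineConfigJSON(memMib int) (string, error) {
	b, err := json.Marshal(map[string]MachineConfig{
		"machine-config": {VcpuCount: 1, MemSizeMib: memMib, Smt: false},
	})
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	// 1024 worked in the report; 1025 and 2048 did not with the versions in use.
	for _, mib := range []int{1024, 1025, 2048} {
		s, err := machineConfigJSON(mib)
		if err != nil {
			panic(err)
		}
		fmt.Println(s)
	}
}
```

Keeping the memory size in one typed field makes it easy to probe the threshold (1024 vs. 1025 here) without hand-editing JSON.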

Hmm, I tried to reproduce this and couldn't. Are you seeing anything in the Firecracker logs? What happens if you enable '--trace'?

What Nanos kernel version are you using? I can only reproduce the problem with old versions (0.1.41 and older); with kernel version 0.1.42 or newer, the problem doesn't occur. If you still see the problem with a recent kernel version, what is your Firecracker version? I'm testing with Firecracker v1.3.3.

I was using:

Ops kernel version: 0.1.47
Firecracker version: 1.1.2

Updating to the latest Firecracker version (1.6.0) resolved the issue. Thank you!

Hi, with Ops kernel version 0.1.47 and Firecracker version 1.6.0, I am running into problems when starting Firecracker: at times my unikernel process doesn't start successfully, and this happened again across multiple attempts.

FAILED LOGS

2024-01-10T16:09:58.808443225 [9090:main] Running Firecracker v1.6.0
2024-01-10T16:09:58.820577445 [9090:main] Artificially kick devices.
2024-01-10T16:09:58.820646238 [9090:main] Successfully started microvm that was configured from one single json
2024-01-10T16:09:58.820801259 [9090:fc_vcpu 0] Received KVM_EXIT_SHUTDOWN signal
2024-01-10T16:09:58.820835506 [9090:main] Vmm is stopping.
2024-01-10T16:09:58.820978355 [9090:main] Vmm is stopping.
2024-01-10T16:09:58.822355908 [9090:main] Firecracker exiting successfully. exit_code=0

Why do I encounter the failure above? When I retry after some time using the same script (a script file I created to start Firecracker), it works fine, as shown below.

2024-01-10T16:35:44.236402451 [9090:main] Running Firecracker v1.6.0
2024-01-10T16:35:44.255481477 [9090:main] Artificially kick devices.
2024-01-10T16:35:44.255549506 [9090:main] Successfully started microvm that was configured from one single json
warning: ACPI MADT not found, default to 1 processor
en1: assigned 11.244.15.161
Server started
en1: assigned FE80::200:FF:FE00:0
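Because Firecracker reports `exit_code=0` in the failed run, the exit code alone cannot distinguish it from a normal shutdown. One heuristic a start-up script could use is to check whether the vCPU logged `Received KVM_EXIT_SHUTDOWN` before the guest printed anything: that is the pattern in the failed excerpt above, but not in the successful one. This is a minimal Go sketch under two assumptions: Firecracker's log and the guest console go to the same stream (as in these excerpts), and Firecracker's own lines follow the `timestamp [pid:thread] message` shape shown here. The function name `shutdownBeforeGuestOutput` is hypothetical, and it only detects the symptom, not its cause.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// fcLine matches Firecracker's own log lines (assumed format), e.g.
// "2024-01-10T16:09:58.820801259 [9090:fc_vcpu 0] Received KVM_EXIT_SHUTDOWN signal".
var fcLine = regexp.MustCompile(`^\d{4}-\d{2}-\d{2}T\S+ \[\d+:[^\]]+\] `)

// shutdownBeforeGuestOutput reports whether a vCPU hit KVM_EXIT_SHUTDOWN
// before any guest console output (any non-empty line that does not look
// like a Firecracker log line) appeared.
func shutdownBeforeGuestOutput(log string) bool {
	sc := bufio.NewScanner(strings.NewReader(log))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		if !fcLine.MatchString(line) {
			return false // the guest printed something first
		}
		if strings.Contains(line, "Received KVM_EXIT_SHUTDOWN") {
			return true
		}
	}
	return false
}

func main() {
	failed := `2024-01-10T16:09:58.808443225 [9090:main] Running Firecracker v1.6.0
2024-01-10T16:09:58.820646238 [9090:main] Successfully started microvm that was configured from one single json
2024-01-10T16:09:58.820801259 [9090:fc_vcpu 0] Received KVM_EXIT_SHUTDOWN signal
2024-01-10T16:09:58.822355908 [9090:main] Firecracker exiting successfully. exit_code=0`

	ok := `2024-01-10T16:35:44.255549506 [9090:main] Successfully started microvm that was configured from one single json
warning: ACPI MADT not found, default to 1 processor
Server started`

	fmt.Println(shutdownBeforeGuestOutput(failed)) // true
	fmt.Println(shutdownBeforeGuestOutput(ok))     // false
}
```

A script could use this to log and retry the early-shutdown case, but capturing the full log of a failing run is still what would help diagnose it.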

The first log snippet doesn't seem to indicate an error, as Firecracker exits with code 0. Do you have a code sample for your guest that we could look at?

Here is the sample code I have been trying to run in Firecracker: main.zip
Reference for v8go: https://github.com/rogchap/v8go
config.json is used for ops image creation: ops image create -c config.json --imagename test

> ops image tree test
/
|   lib64
|   |   ld-linux-x86-64.so.2
|   lib
|   |   x86_64-linux-gnu
|   |   |   libstdc++.so.6
|   |   |   libnss_dns.so.2
|   |   |   libc.so.6
|   |   |   libgcc_s.so.1
|   |   |   libm.so.6
|   |   |   libpthread.so.0
|   |   |   libresolv.so.2
|   etc
|   |   ssl
|   |   |   certs
|   |   |   |   ca-certificates.crt
|   |   resolv.conf
|   |   passwd
|   main
|   proc
|   |   sys
|   |   |   kernel
|   |   |   |   hostname
|   sample.js

I built a binary from the Go code, created the ops image, and ran it in Firecracker. Now I am facing another issue: the guest runs out of memory.

allocate_table_page error: failed to allocate page table memory
map_level error: failed to allocate page table memory
ra 0xffffffff800b96a5

frame trace: 
ffffc00000e4fec8:   ffffffff800b96a5
ffffc00000e4ff18:   ffffffff800b9786
ffffc00000e4ff38:   ffffffff800f111b
ffffc00000e4ff98:   ffffffff80122add
ffffc00000e4ffe8:   ffffffff800010ed
0000000102403c08:   0000000000ae62c3
0000000102403c38:   0000000000a5ba2f
0000000102403c58:   0000000000a1c07f
0000000102403d88:   0000000000a1abaa
000000c0000b5950:   000000000080c355
000000c0000b5988:   0000000000a15e68
000000c0000b59b0:   0000000000a17e35
000000c0000b59e0:   0000000000a1a217
000000c0000b5ac8:   0000000000a074a9
000000c0000b5af0:   0000000000a08dc2
000000c0000b5b40:   0000000000a09a6e
000000c0000b5b70:   0000000000a062d4
000000c0000b5fb0:   0000000000a0a288
000000c0000b5fd8:   000000000086ea21

loaded klibs: 
map failed for v 0x3c1e00000000, p 0xbb16000, len 0x1000, flags 0x8000000000000206
2024-01-17T12:24:30.929038447 [9090:main] Vmm is stopping.

I hit this issue while load-testing my server running as a unikernel on Firecracker with 256 MB of RAM (if I increase the RAM, a few more requests are executed):

./wrk -d 10s -c 1 -t 1 'http://10.0.0.1:8080' -s body.lua
Running 10s test @ http://10.0.0.1:8080
  1 threads and 1 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     4.42ms  394.93us   6.91ms   70.21%
    Req/Sec   220.33     33.68   270.00     83.33%
  527 requests in 10.02s, 66.90KB read
Requests/sec:     52.58
Transfer/sec:      6.67KB

I tried the same server on CentOS in Firecracker with the same configuration (256 MB of memory):

./wrk -d 30s -c 50 -t 40 'http://10.0.0.1:8080' -s body.lua
Running 30s test @ http://10.0.0.1:8080
  40 threads and 50 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   260.60ms  104.57ms 787.84ms   78.11%
    Req/Sec     4.25      2.38    30.00     89.11%
  4599 requests in 30.04s, 583.86KB read
Requests/sec:    153.07
Transfer/sec:     19.43KB

What is the root cause of this issue, and how can I fix it?

The out of memory issue is fixed by nanovms/nanos#1994. If you want to try out the updated Nanos kernel without waiting for this PR to be merged, you can add the --nanos-version d875bfe option to your ops image create command, and then change the "kernel_image_path" option in the Firecracker config file to "~/.ops/d875bfe/kernel.img".

As for the other issue (the unikernel process not starting successfully), I was unable to reproduce it with the sample code you gave us, either with the latest Nanos release or after applying the changes in the above PR: I tried multiple times with both Firecracker v1.3.3 and v1.6.0, and the process started successfully every time I ran Firecracker.