Concurrent disk creation requests failed with memory budget exceeded error
askfongjojo opened this issue
The disk creation was done with Terraform on rack2, against a TF plan I've been using for at least 6 months without problems. Here is the error I got when the system attempted to create 20 disks, 10 GiB each in size (TF uses 10 concurrent threads by default):
oxide_disk.boot[7]: Creating...
oxide_disk.boot[15]: Creating...
oxide_disk.boot[13]: Creating...
oxide_disk.boot[17]: Creating...
oxide_disk.boot[14]: Creating...
oxide_disk.boot[10]: Creating...
oxide_disk.boot[8]: Creating...
oxide_disk.boot[1]: Creating...
oxide_disk.boot[0]: Creating...
oxide_vpc_subnet.app: Creation complete after 1s [id=ac5d0381-aaa4-4189-8552-c6f4b734f29c]
oxide_disk.boot[11]: Creating...
oxide_disk.boot[18]: Creating...
oxide_disk.boot[4]: Creating...
oxide_disk.boot[6]: Creating...
oxide_disk.boot[12]: Creating...
oxide_disk.boot[5]: Creating...
oxide_disk.boot[2]: Creating...
oxide_disk.boot[16]: Creating...
oxide_disk.boot[9]: Creating...
oxide_disk.boot[3]: Creating...
oxide_disk.boot[19]: Creating...
oxide_disk.boot[5]: Creation complete after 4s [id=f2f26899-75be-417a-8bb6-592dd41d1335]
oxide_disk.boot[18]: Creation complete after 6s [id=471b62f8-5a17-4892-9320-99bdf1c2034e]
╷
│ Error: Error creating disk
│
│ with oxide_disk.boot[10],
│ on app.tf line 59, in resource "oxide_disk" "boot":
│ 59: resource "oxide_disk" "boot" {
│
│ API error: POST https://oxide.sys.rack2.eng.oxide.computer/v1/disks?project=5e49b6de-cb2d-438d-83af-95c415bbb901
│ ----------- RESPONSE -----------
│ Status: 500 Internal
│ Message: Internal Server Error
│ RequestID: 65ec3fc7-ee1c-4acc-95d2-525dc61aab78
│ ------- RESPONSE HEADERS -------
│ Content-Type: [application/json]
│ X-Request-Id: [65ec3fc7-ee1c-4acc-95d2-525dc61aab78]
│ Date: [Thu, 21 Mar 2024 04:43:12 GMT]
│ Content-Length: [124]
│
╵
╷
│ Error: Error creating disk
│
│ with oxide_disk.boot[11],
│ on app.tf line 59, in resource "oxide_disk" "boot":
│ 59: resource "oxide_disk" "boot" {
│
│ API error: POST https://oxide.sys.rack2.eng.oxide.computer/v1/disks?project=5e49b6de-cb2d-438d-83af-95c415bbb901
│ ----------- RESPONSE -----------
│ Status: 500 Internal
│ Message: Internal Server Error
│ RequestID: 9d553973-b41b-4ec7-b35d-3bd82ecea37f
│ ------- RESPONSE HEADERS -------
│ Content-Type: [application/json]
│ X-Request-Id: [9d553973-b41b-4ec7-b35d-3bd82ecea37f]
│ Date: [Thu, 21 Mar 2024 04:43:12 GMT]
│ Content-Length: [124]
│
The errors in the Nexus log all complain about the memory budget being exceeded, e.g.:
root@oxz_nexus_65a11c18:~# grep 'Internal Server' /var/svc/log/oxide-nexus\:default.log | looker
04:43:12.830Z INFO 65a11c18-7f59-41ac-b9e7-680627f996e7 (dropshot_external): request completed
error_message_external = Internal Server Error
error_message_internal = saga ACTION error at node "datasets_and_regions": unexpected database error: scan with start key /Table/434/2/"i\\xf0\\xb8c\\xf7?B\\xb2\\x98\\"\\xb2\u{2d9}\\xf0\\x90\\x03": root: memory budget exceeded: 133120 bytes requested, 134104377 currently allocated, 134217728 bytes in budget
file = /home/build/.cargo/git/checkouts/dropshot-a4a923d29dccc492/29ae98d/dropshot/src/server.rs:837
latency_us = 305395
local_addr = 172.30.2.5:443
method = POST
remote_addr = 172.20.17.42:60877
req_id = 18940d64-d8c1-40a8-9e97-01c03a5cf957
response_code = 500
uri = https://oxide.sys.rack2.eng.oxide.computer/v1/disks?project=5e49b6de-cb2d-438d-83af-95c415bbb901
04:43:12.830Z INFO 65a11c18-7f59-41ac-b9e7-680627f996e7 (dropshot_external): request completed
error_message_external = Internal Server Error
error_message_internal = saga ACTION error at node "datasets_and_regions": unexpected database error: scan with start key /Table/434/2/"\\x13\\x86c\\xad\\xa3\\x82E\\x95\\xba\\xf0\\b\\xf6\\xb0'jg": root: memory budget exceeded: 133120 bytes requested, 134104377 currently allocated, 134217728 bytes in budget
file = /home/build/.cargo/git/checkouts/dropshot-a4a923d29dccc492/29ae98d/dropshot/src/server.rs:837
latency_us = 304769
local_addr = 172.30.2.5:443
method = POST
remote_addr = 172.20.17.42:60877
req_id = 65ec3fc7-ee1c-4acc-95d2-525dc61aab78
response_code = 500
uri = https://oxide.sys.rack2.eng.oxide.computer/v1/disks?project=5e49b6de-cb2d-438d-83af-95c415bbb901
04:43:12.961Z INFO 65a11c18-7f59-41ac-b9e7-680627f996e7 (dropshot_external): request completed
error_message_external = Internal Server Error
error_message_internal = saga ACTION error at node "datasets_and_regions": unexpected database error: root: memory budget exceeded: 40960 bytes requested, 134190929 currently allocated, 134217728 bytes in budget
file = /home/build/.cargo/git/checkouts/dropshot-a4a923d29dccc492/29ae98d/dropshot/src/server.rs:837
latency_us = 435717
local_addr = 172.30.2.5:443
method = POST
remote_addr = 172.20.17.42:60877
req_id = 66546451-fac1-4c58-8a66-217afd0c71fc
response_code = 500
uri = https://oxide.sys.rack2.eng.oxide.computer/v1/disks?project=5e49b6de-cb2d-438d-83af-95c415bbb901
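For reference, the budget in these messages is 134217728 bytes, which is 128 * 1024 * 1024 bytes, i.e. 128 MiB; that happens to match CockroachDB's default --max-sql-memory, though whether the rack2 CRDB nodes actually run with that default is an assumption on my part. In the first two errors, the 133120 bytes being requested on top of the 134104377 bytes already allocated is what tips the node past the budget.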
The error didn't occur when I created disks sequentially, even though these were much bigger disks:
{
"block_size": 512,
"description": "cb6eb1e9-69fd-40ad-9373-83926f8b32d9 test instance ",
"device_path": "/mnt/prov-time-32c-64m",
"id": "0228bbb9-07b3-4fd0-80be-1f80546d7baf",
"image_id": "cb6eb1e9-69fd-40ad-9373-83926f8b32d9",
"name": "prov-time-32c-64m",
"project_id": "5e49b6de-cb2d-438d-83af-95c415bbb901",
"size": 68719476736,
"snapshot_id": null,
"state": {
"state": "creating"
},
"time_created": "2024-03-21T04:48:52.648048Z",
"time_modified": "2024-03-21T04:48:52.648048Z"
}
{
"block_size": 512,
"description": "cb6eb1e9-69fd-40ad-9373-83926f8b32d9 test instance ",
"device_path": "/mnt/prov-time-32c-96m",
"id": "fc8c8f39-bd1d-4d81-94a3-9400c020f554",
"image_id": "cb6eb1e9-69fd-40ad-9373-83926f8b32d9",
"name": "prov-time-32c-96m",
"project_id": "5e49b6de-cb2d-438d-83af-95c415bbb901",
"size": 103079215104,
"snapshot_id": null,
"state": {
"state": "creating"
},
"time_created": "2024-03-21T04:48:57.152238Z",
"time_modified": "2024-03-21T04:48:57.152238Z"
}
{
"block_size": 512,
"description": "cb6eb1e9-69fd-40ad-9373-83926f8b32d9 test instance ",
"device_path": "/mnt/prov-time-32c-128m",
"id": "12e6a8c6-7f46-4a27-a23d-b6536622e196",
"image_id": "cb6eb1e9-69fd-40ad-9373-83926f8b32d9",
"name": "prov-time-32c-128m",
"project_id": "5e49b6de-cb2d-438d-83af-95c415bbb901",
"size": 137438953472,
"snapshot_id": null,
"state": {
"state": "creating"
},
"time_created": "2024-03-21T04:49:02.273872Z",
"time_modified": "2024-03-21T04:49:02.273872Z"
}
{
"block_size": 512,
"description": "cb6eb1e9-69fd-40ad-9373-83926f8b32d9 test instance ",
"device_path": "/mnt/prov-time-32c-256m",
"id": "1dab86ec-e1e2-4d3f-a4f2-40b0718a8661",
"image_id": "cb6eb1e9-69fd-40ad-9373-83926f8b32d9",
"name": "prov-time-32c-256m",
"project_id": "5e49b6de-cb2d-438d-83af-95c415bbb901",
"size": 274877906944,
"snapshot_id": null,
"state": {
"state": "creating"
},
"time_created": "2024-03-21T04:49:09.296486Z",
"time_modified": "2024-03-21T04:49:09.296486Z"
}
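(As an aside, Terraform's default of 10 concurrent operations can be capped per run with -parallelism, e.g. terraform apply -parallelism=1, which should roughly reproduce the sequential behaviour above without changing the plan.)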
Looking at /nexus/db-queries/src/db/queries/region_allocation.rs (in particular, here), which appears to be the code that generates the offending query, I wonder if the inclusion of the inv_zpool table may be causing the much higher memory consumption - it has a fairly large number of rows on rack2:
root@[fd00:1122:3344:105::3]:32221/omicron> select count(*) from inv_zpool;
count
----------
231660
(1 row)
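If the bulk of those rows are just historical copies of the same zpools, comparing distinct zpool IDs against total rows would show it; a diagnostic sketch I have not actually run against rack2:
-- if each zpool shows up once per inventory collection, total_rows should dwarf distinct_zpools
select count(distinct id) as distinct_zpools, count(*) as total_rows from omicron.public.inv_zpool;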
@smklein - Would you kindly take a look?
Just dropping some notes from a quick skim:
error_message_internal = saga ACTION error at node "datasets_and_regions": unexpected database error: root: memory budget exceeded: 40960 bytes requested, 134190929 currently allocated, 134217728 bytes in budget
Details on this error from CRDB: it sounds like this error pops out when overall memory pressure from CRDB queries is too high; it may not be caused by any individual query.
Looking at /nexus/db-queries/src/db/queries/region_allocation.rs (in particular, here), which appears to be the code that generates the offending query, I wonder if the inclusion of the inv_zpool table may be causing the much higher memory consumption - it has a fairly large number of rows on rack2:
The linked query has a ... LIMIT 1 attached, so it's only grabbing a single row. I doubt that's the offender? But I don't know enough about CRDB to know whether that particular query (even with a LIMIT 1) could balloon memory usage.
Couple more observations: https://www.cockroachlabs.com/blog/memory-usage-cockroachdb/ notes that ... LIMIT 1 queries may still cause high memory usage if there's not an index:
SELECT * FROM sometable ORDER BY somecolumn LIMIT 1. Although LIMIT restricts the size of the result set, this can blow up if somecolumn is not indexed and the table contains many rows.
inv_zpool does have an index, and it looks like it is being used in this subquery, and that this query doesn't use very much memory total (50 KiB):
explain analyze SELECT total_size FROM omicron.public.inv_zpool WHERE inv_zpool.id = '01376d9a-434a-42d3-a4c8-dc78edcdfd06' ORDER BY inv_zpool.time_collected DESC LIMIT 1;
info
----------------------------------------------------------------------------------------------------
planning time: 535µs
execution time: 8ms
distribution: local
vectorized: true
rows read from KV: 2 (181 B)
cumulative time spent in KV: 4ms
maximum memory usage: 50 KiB
network usage: 0 B (0 messages)
• index join
│ nodes: n1
│ actual row count: 1
│ KV time: 672µs
│ KV contention time: 0µs
│ KV rows read: 1
│ KV bytes read: 85 B
│ estimated max memory allocated: 20 KiB
│ estimated max sql temp disk usage: 0 B
│ estimated row count: 1
│ table: inv_zpool@inv_zpool_pkey
│
└── • scan
nodes: n1
actual row count: 1
KV time: 4ms
KV contention time: 0µs
KV rows read: 1
KV bytes read: 96 B
estimated max memory allocated: 20 KiB
estimated row count: 1 (<0.01% of the table; stats collected 12 minutes ago)
table: inv_zpool@inv_zpool_by_id_and_time
spans: [/'01376d9a-434a-42d3-a4c8-dc78edcdfd06' - /'01376d9a-434a-42d3-a4c8-dc78edcdfd06']
limit: 1
(33 rows)
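For contrast, the pattern that blog post warns about would be ordering on a column without an index; a hypothetical example against the same table (assuming total_size is not indexed, which I have not checked):
-- with no index on the ORDER BY column, every row has to be scanned before
-- the LIMIT 1 can apply, which is the case the blog post says can blow up
EXPLAIN SELECT total_size FROM omicron.public.inv_zpool ORDER BY total_size DESC LIMIT 1;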
However - the size of inv_zpool has grown significantly even in the 8 hours since you opened the issue:
root@[fd00:1122:3344:109::3]:32221/omicron OPEN> select count(*) from inv_zpool;
count
----------
252648
I'm still not sure this is the root cause of your errors, but we should fix it - it looks like inv_zpool is not getting pruned when inventory collections get pruned. I'll take a look at that shortly.
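One way to confirm the pruning theory would be to look for inv_zpool rows whose parent collection is gone; a rough sketch, assuming inv_zpool carries an inv_collection_id referencing inv_collection.id (I have not double-checked the schema):
-- rows whose collection has already been pruned; a large count here would mean
-- inv_zpool is being left behind when collections are deleted
SELECT count(*)
  FROM omicron.public.inv_zpool z
 WHERE NOT EXISTS (
        SELECT 1
          FROM omicron.public.inv_collection c
         WHERE c.id = z.inv_collection_id
       );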
#4283 is the only other time we hit it. It has something to do with the number of rows processed, and the only change in the dataset placement query is related to inv_zpool, AFAICT. (And the size of the disk doesn't matter at all - it's CRDB.)