panic: failed to delete stream: failed to delete stream data directory
ekbfh opened this issue
Hi!
Sometimes LB fails to start with this error:
/bin/liftbridge --config=/etc/liftbridge/liftbridge.yml
INFO [..12:46:04] Liftbridge Version: v1.3.0
INFO [..12:46:04] Server ID: centos-noc1
INFO [..12:46:04] Namespace: liftbridge-default
INFO [..12:46:04] Default Retention Policy: [Age: 1 day, Compact: false]
INFO [..12:46:04] Default Partition Pausing: disabled
INFO [..12:46:04] Starting server on 172.22.2.229:9292...
INFO [..12:46:05] raft: initial configuration: index=1 servers="[{Suffrage:Voter ID:centos-noc1 Address:centos-noc1}]"
DEBU[..12:46:05] Loaded existing state for metadata Raft group
INFO[..12:46:05] raft: entering follower state: follower="Node at centos-noc1 [Follower]" leader=
DEBU[..12:46:05] api: FetchMetadata []
WARN[..12:46:06] raft: heartbeat timeout reached, starting election: last-leader=
INFO[..12:46:06] raft: entering candidate state: node="Node at centos-noc1 [Candidate]" term=3558
DEBU[..12:46:06] raft: votes: needed=1
DEBU[..12:46:06] raft: vote granted: from=centos-noc1 term=3558 tally=1
INFO[..12:46:06] raft: election won: tally=1
INFO[..12:46:06] raft: entering leader state: leader="Node at centos-noc1 [Leader]"
INFO[..12:46:06] Server became metadata leader, performing leader promotion actions
DEBU[..12:46:07] fsm: Replaying Raft log...
panic: failed to delete stream: failed to delete stream data directory: remove /var/lib/liftbridge/streams/events.default: directory not empty
goroutine 39 [running]:
github.com/liftbridge-io/liftbridge/server.(*Server).Apply(0xc00037c000, 0xc00007c8c0, 0x0, 0x0)
/home/circleci/project/server/fsm.go:111 +0x3e3
github.com/hashicorp/raft.(*Raft).runFSM.func1(0xc0000781d0)
/go/pkg/mod/github.com/hashicorp/raft@v1.1.2/fsm.go:90 +0x2c1
github.com/hashicorp/raft.(*Raft).runFSM.func2(0xc000030200, 0x40, 0x40)
/go/pkg/mod/github.com/hashicorp/raft@v1.1.2/fsm.go:113 +0x75
github.com/hashicorp/raft.(*Raft).runFSM(0xc000488000)
/go/pkg/mod/github.com/hashicorp/raft@v1.1.2/fsm.go:219 +0x42f
github.com/hashicorp/raft.(*raftState).goFunc.func1(0xc000488000, 0xc000079ea0)
/go/pkg/mod/github.com/hashicorp/raft@v1.1.2/state.go:146 +0x55
created by github.com/hashicorp/raft.(*raftState).goFunc
/go/pkg/mod/github.com/hashicorp/raft@v1.1.2/state.go:144 +0x6
The directory is indeed not empty, but why is that a problem?
The only thing that fixes it is removing the entire data_dir.
I can also attach an archive of the LB data if needed.
If you could please attach the contents of the directory that it fails to delete (/var/lib/liftbridge/streams/events.default) after the error occurs, that would help to debug. The directory should be empty when it attempts to delete, so I'm curious what it contains.
It happens in a multi-node cluster (3 nodes) and on a single node too.
Config:
clustering:
  raft.bootstrap.seed: true
  server.id: name111
cursors:
  stream.auto.pause.time: 0
  stream.partitions: 1
data:
  dir: /var/lib/liftbridge
host: 172.24....
logging:
  level: info
  raft: true
nats.servers:
  - nats://172.24.....:4222
streams:
  compact.enabled: false
  retention.max:
    age: 24h
Thanks @ekbfh. I believe I have a fix for the issue. Could you try running this branch? https://github.com/liftbridge-io/liftbridge/tree/fix_issue_297
If you can confirm that this resolves your issue, I can get the patch merged in.
Yes, Liftbridge works now. I hit the error, swapped in the new binary, and restarted, and it works.
Fix merged in #299. Will be part of the next release.