liftbridge-io / liftbridge

Lightweight, fault-tolerant message streams.

Home Page: https://liftbridge.io

panic: failed to delete stream: failed to delete stream data directory

ekbfh opened this issue

commented

Hi!
Sometimes LB fails to start with this error:

/bin/liftbridge --config=/etc/liftbridge/liftbridge.yml
INFO [..12:46:04] Liftbridge Version:        v1.3.0            
INFO [..12:46:04] Server ID:                 centos-noc1       
INFO [..12:46:04] Namespace:                 liftbridge-default 
INFO [..12:46:04] Default Retention Policy:  [Age: 1 day, Compact: false] 
INFO [..12:46:04] Default Partition Pausing: disabled          
INFO [..12:46:04] Starting server on 172.22.2.229:9292...      
INFO [..12:46:05]  raft: initial configuration: index=1 servers="[{Suffrage:Voter ID:centos-noc1 Address:centos-noc1}]" 
DEBU[..12:46:05] Loaded existing state for metadata Raft group 
INFO[..12:46:05]  raft: entering follower state: follower="Node at centos-noc1 [Follower]" leader= 
DEBU[..12:46:05] api: FetchMetadata []                        
WARN[..12:46:06]  raft: heartbeat timeout reached, starting election: last-leader= 
INFO[..12:46:06]  raft: entering candidate state: node="Node at centos-noc1 [Candidate]" term=3558 
DEBU[..12:46:06] raft: votes: needed=1                        
DEBU[..12:46:06] raft: vote granted: from=centos-noc1 term=3558 tally=1 
INFO[..12:46:06]  raft: election won: tally=1                 
INFO[..12:46:06]  raft: entering leader state: leader="Node at centos-noc1 [Leader]" 
INFO[..12:46:06] Server became metadata leader, performing leader promotion actions 
DEBU[..12:46:07] fsm: Replaying Raft log...                   
panic: failed to delete stream: failed to delete stream data directory: remove /var/lib/liftbridge/streams/events.default: directory not empty
 
goroutine 39 [running]:
github.com/liftbridge-io/liftbridge/server.(*Server).Apply(0xc00037c000, 0xc00007c8c0, 0x0, 0x0)
        /home/circleci/project/server/fsm.go:111 +0x3e3
github.com/hashicorp/raft.(*Raft).runFSM.func1(0xc0000781d0)
        /go/pkg/mod/github.com/hashicorp/raft@v1.1.2/fsm.go:90 +0x2c1
github.com/hashicorp/raft.(*Raft).runFSM.func2(0xc000030200, 0x40, 0x40)
        /go/pkg/mod/github.com/hashicorp/raft@v1.1.2/fsm.go:113 +0x75
github.com/hashicorp/raft.(*Raft).runFSM(0xc000488000)
        /go/pkg/mod/github.com/hashicorp/raft@v1.1.2/fsm.go:219 +0x42f
github.com/hashicorp/raft.(*raftState).goFunc.func1(0xc000488000, 0xc000079ea0)
        /go/pkg/mod/github.com/hashicorp/raft@v1.1.2/state.go:146 +0x55
created by github.com/hashicorp/raft.(*raftState).goFunc
        /go/pkg/mod/github.com/hashicorp/raft@v1.1.2/state.go:144 +0x6

The directory is indeed not empty, but what is the problem?

The only way I can fix it is to remove the entire data_dir.
I can also attach an archive with the LB data if needed.

If you could please attach the contents of the directory that it fails to delete (/var/lib/liftbridge/streams/events.default) after the error occurs, that would help with debugging. The directory should be empty when it attempts to delete, so I'm curious what it contains.

commented

events.default.zip

It can happen in a multi-node cluster (3 nodes) and in a single-node setup too.
Config:

clustering:
    raft.bootstrap.seed: true
    server.id: name111
cursors:
    stream.auto.pause.time: 0
    stream.partitions: 1
data:
    dir: /var/lib/liftbridge
host: 172.24....
logging:
    level: info
    raft: true
nats.servers:
- nats://172.24.....:4222
streams:
    compact.enabled: false
    retention.max:
        age: 24h

Thanks @ekbfh. I believe I have a fix for the issue. Could you try running this branch? https://github.com/liftbridge-io/liftbridge/tree/fix_issue_297

If you can confirm that this resolves your issue, I can get the patch merged in.

commented

Yes, Liftbridge works now. I got the error, swapped the binary, restarted, and it works.

Fix merged in #299. Will be part of the next release.