seaweedfs / seaweedfs

SeaweedFS is a fast distributed storage system for blobs, objects, files, and data lake, for billions of files! Blob store has O(1) disk seek, cloud tiering. Filer supports Cloud Drive, cross-DC active-active replication, Kubernetes, POSIX FUSE mount, S3 API, S3 Gateway, Hadoop, WebDAV, encryption, Erasure Coding.

Upgrade 3.62 -> 3.63 - sorted needle write error: write file.sdx: bad file descriptor

SystemZ opened this issue

I've seen data-loss bugs reported in versions before 3.63, so I upgraded, but maybe there is data loss anyway; it's hard to tell.
I expected either no write errors or an easy way to rewrite the volumes to fix the problem, but I'm stuck 😢

I use the official Docker images for all my nodes.
I upgraded my volume servers by setting the container's entrypoint to /bin/sleep 90000 and switching to the 3.63 image.
That way the volume server didn't start before I ran weed fix inside the container shell, if I understood those instructions correctly:
#5348
After weed fix I started the volume server with the newer image.
I repeated this for all 4 of my volume servers, one by one.
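
Something like this, as a sketch (container name and host path are illustrative):

# temporarily keep the container alive without starting the volume server
docker run -d --name seaweedfs-volume \
  --entrypoint /bin/sleep \
  -v /docker/containers/seaweedfs/volume:/data \
  chrislusf/seaweedfs:3.63 90000

# open a shell inside it to run weed fix before the real start
docker exec -it seaweedfs-volume /bin/sh
# once weed fix is done, recreate the container with the normal entrypoint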

I noticed that during volume server startup, some volumes are reported as broken:

W0312 17:58:14.860120 volume_checking.go:121 data file /data/depo_93069.dat actual 10910368424 bytes expected 10908271200 bytes!

I tried restarting the volume servers and the master; no change.
I tried volume.fsck but it didn't solve the problem.

volume.fsck -reallyDeleteFromVolume -verifyNeedles -forcePurging -collection mastodon -volumeId 74
dataNode:192.168.2.2:8072       volume:74       entries:27811   orphan:1        0.00%   14928B
temporarily marked 74 on server 192.168.2.2:8072 writable for forced purge
marked 74 on server 192.168.2.2:8072 writable for forced purge
purging orphan data for volume 74...
error: findExtraChunksInVolumeServers: purging volume 74: delete fileId 74,e183c00000000: sorted needle write error: write /data/mastodon_74.sdx: bad file descriptor

Any idea how I can try to fix this, preferably without data loss?
I think problematic volumes like this one had a lot of data going in and out of trash.

"some volumes are broken" do they fail to load or just become readonly?

3.63 removed incorrect logic that auto-fixed volume data, so the errors now need to be fixed manually.

There should not be any data loss.

W0312 17:58:14.860120 volume_checking.go:121 data file /data/depo_93069.dat actual 10910368424 bytes expected 10908271200 bytes!

For this particular warning, you can truncate the file /data/depo_93069.dat to 10908271200 bytes.

Those volumes are read-only, but they seem to load.
How can I truncate the file?

truncate -s [number of bytes] filename
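
For the warning quoted above, that would look like this, using the "expected" byte count from the log (it is safest to do this while the volume server is not running):

truncate -s 10908271200 /data/depo_93069.dat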

I still have problems with this :/

Case 1

Before truncating, the volume server log showed this at startup:

W0313 05:54:23.722144 volume_checking.go:121 data file /data/depo_55.dat actual 11451781248 bytes expected 11449684024 bytes!

After truncating, there is a needle verification error during fsck:

truncate -s 11449684024 /data/depo_55.dat
# after a volume server restart, it loaded without any errors about volume 55 in the log

> volume.fsck -collection depo -verifyNeedles -volumeId 55
total 5920 directories, 25770 files
failed to read 55:157934 needle status of file REDACTED: rpc error: code = Unknown desc = EOF

Total           entries:5479    orphan:0        0.00%   0B
This could be normal if multiple filers or no filers are used.
no orphan data

and this error appears in the volume server log:

E0313 05:59:54.112599 needle_read.go:45 /data/depo_55.dat read 0 dataSize 2097224 offset 11449684024 fileSize 11449684024: EOF

Case 2

Replication 010, different sizes on the two hosts.
strawberry uses ext4, nas uses XFS.
I haven't modified anything yet.

I0313 06:52:01.357663 volume_loading.go:142 loading memory index /data/danbooru_4.idx to memory                                                                                                                                                     
I0313 06:52:01.360191 disk_location.go:182 data file /data/danbooru_4.dat, replication=010 v=3 size=1754610808 ttl= 

root@strawberry:~# ls -al /docker/containers/seaweedfs/volume/danbooru_4.dat
-rw-r--r-- 1 root root 1754610808 Mar  3 16:33 /docker/containers/seaweedfs/volume/danbooru_4.dat

root@strawberry:~# ls -ls --block-size=1k /docker/containers/seaweedfs/volume/danbooru_4.dat
10485764 -rw-r--r-- 1 root root 1713488 Mar  3 16:33 /docker/containers/seaweedfs/volume/danbooru_4.dat

W0313 06:17:06.842800 volume_checking.go:121 data file /data/danbooru_4.dat actual 1754610800 bytes expected 1754340568 bytes!
I0313 06:17:06.843036 volume_loading.go:128 volumeDataIntegrityChecking failed data file /data/danbooru_4.dat actual 1754610800 bytes expected 1754340568 bytes

root@nas:~# ls -al /mnt/user/seaweedfs/volume/danbooru_4.dat 
-rw-r--r-- 1 root root 1754610800 Mar  3 16:33 /mnt/user/seaweedfs/volume/danbooru_4.dat

root@nas:~# ls -ls --block-size=1k  /mnt/user/seaweedfs/volume/danbooru_4.dat
10485760 -rw-r--r-- 1 root root 1713488 Mar  3 16:33 /mnt/user/seaweedfs/volume/danbooru_4.dat

I had the same problem and weed compact solved it

For case 1, the truncate did not work well. You can use "weed fix" since the .dat file is correct.
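
weed fix rebuilds the .idx index from the .dat file, so for case 1 it would be roughly the following, run against the volume directory while the volume server is stopped (exact flags can differ between versions, check weed fix -h):

# rebuild the index for volume 55 in collection depo from its .dat file
# (flag/argument layout may vary by build - verify with weed fix -h)
weed fix -collection depo -volumeId 55 /data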

Case 1

Seems OK after running weed fix again after the version upgrade.
I had unmarked it as read-only earlier; not sure if that changed anything.

[admin@MikroTik Core] > /log/print where message~"depo_55"

 07:24:23 container,info,debug I0315 06:24:23.995707 volume_loading.go:142 loading memory index /data/depo_55.idx to memory
 07:24:23 container,info,debug I0315 06:24:23.997279 disk_location.go:182 data file /data/depo_55.dat, replication=000 v=3 size=11449684024 ttl=

> volume.fsck -collection depo -verifyNeedles -volumeId 55
total 5920 directories, 25770 files

Total           entries:5479    orphan:0        0.00%   0B
This could be normal if multiple filers or no filers are used.
no orphan data

Case 3

Similar to cases 1 and 2, but it's on a volume server that's easier for me to work on.

I tried compact but it didn't help.
Note: the time inside the container is GMT+0, while the host where weed compact is executed is GMT+1.

root@nas:/mnt/cache/ssd/seaweed# ./weed363 compact -volumeId 93069 -collection depo -dir /mnt/cache/ssd/seaweed/volume/

I0315 07:43:19.386159 volume_loading.go:91 readSuperBlock volume 93069 version 3
W0315 07:43:19.386292 volume_checking.go:121 data file /mnt/cache/ssd/seaweed/volume/depo_93069.dat actual 10910368424 bytes expected 10908271200 bytes!
I0315 07:43:19.386358 volume_loading.go:128 volumeDataIntegrityChecking failed data file /mnt/cache/ssd/seaweed/volume/depo_93069.dat actual 10910368424 bytes expected 10908271200 bytes
I0315 07:43:19.387426 volume_loading.go:91 readSuperBlock volume 93069 version 3

docker logs -f seaweedfs-volume-nvme |& grep depo_93069

W0315 06:44:09.712158 volume_checking.go:121 data file /data/depo_93069.dat actual 10910368424 bytes expected 10908271200 bytes!
I0315 06:44:09.712168 volume_loading.go:128 volumeDataIntegrityChecking failed data file /data/depo_93069.dat actual 10910368424 bytes expected 10908271200 bytes
I0315 06:44:09.713530 disk_location.go:182 data file /data/depo_93069.dat, replication=000 v=3 size=10910368424 ttl=

CLI docs

The default compact method is 0, but it's not listed in the help text:

compactMethod = cmdCompact.Flag.Int("method", 0, "option to choose which compact method. use 0 or 1.")
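
The flag accepts 0 or 1, so to select a method explicitly you pass -method, e.g. reusing the paths from the case 3 run above (just a sketch):

# same volume as in case 3, but explicitly selecting compact method 1
weed compact -dir /mnt/cache/ssd/seaweed/volume/ -collection depo -volumeId 93069 -method 1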

I created a PR to fix it:
#5379

Questions

Is there any way to list files by volume ID from weed shell?
I'd like to check whether the files on the problematic volumes are OK, but I need to locate them first.
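
One local workaround might be dumping a volume's contents with weed export, which lists the needles stored in that volume rather than the filer paths (the output path is just an example; check weed export -h for the exact flags):

# dump the needles of volume 74 (collection mastodon) from its data directory into a tar file
weed export -dir /data -collection mastodon -volumeId 74 -o /tmp/mastodon_74.tar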

I was able to move the files out of all my SeaweedFS collections.
All of my collections except one weren't changed frequently.
Only one ~20 GB file was lost, due to the I/O problems seen in the logs.

The last remaining collection, a very active one (frequent creating and deleting), hit multiple I/O problems, and probably several small files were lost.
My guess is that tailing a volume while a vacuum was running was the problem in 3.62.

I don't use SeaweedFS anymore; I migrated one small bucket to MinIO, so I'm marking this as closed.