openzfs / zfs

OpenZFS on Linux and FreeBSD

Home Page: https://openzfs.github.io/openzfs-docs


unable to import pool with 0.6.2 after a resilver by the 2014-01-16 git version

AndCycle opened this issue

tl;dr

After a resilver under the git version, the pool can no longer be imported by ZoL 0.6.2, FreeBSD, or Solaris; the import reports that the pool metadata is corrupted.

longer version

Here is a raw dd dump and some related info; you can use the dump for detailed diagnosis:
http://www.andcycle.idv.tw/~andcycle/tmp/tmp/zol/20140201/

I hit a similar issue to the one discussed in this thread, though I don't have the label problem described there:
http://comments.gmane.org/gmane.linux.file-systems.zfs.user/12105

Gentoo
sys-kernel/gentoo-sources-3.12.7
sys-kernel/gentoo-sources-3.12.8

timeline

a long time ago

zfs-0.6.2
create zroot
create zmess

2013/12/19

create ztmp

2013/12/21

zfs-0.6.2-r1

2014/01/16

zfs-0.6.2-r2

2014/01/16

zfs-9999

2014/01/23

zroot resilvered

2014/01/27

ztmp resilvered

2014/01/30

create zstor-past

2014/02/01

zfs-0.6.2-r3
zroot state FAULTED, The pool metadata is corrupted.
ztmp state FAULTED, The pool metadata is corrupted.
zmess state ONLINE
zstor-past state ONLINE

2014/02/01

zfs-9999
zroot state ONLINE
ztmp state ONLINE
zmess state ONLINE
zstor-past state ONLINE

That's all, I think. I just messed up my production server, and now I am stuck on the git version. Tell me if any more information is needed.

@AndCycle I'm going to try to recreate this on my own while I'm downloading your sda2.xz image, but I'd like to clarify the steps that caused the corruption. Am I correct in saying that your "ztmp" pool was created to demonstrate the problem and that the problem was caused by running a "zpool scrub" on it? Did the pool import properly under 0.6.2/FreeBSD/Solaris prior to running the scrub? Also, could you please post the exact version of the code you're running (cat /sys/module/zfs/version)?

What did you do with the "ztmp" pool other than creating the 2 filesystems on it before running the scrub? Did you copy any files onto it?

@AndCycle I was able to reproduce the problem rather easily on my test system. No need for your image or anything else. I'm going to look into this issue now that I can reproduce it.

The problem is that in 1421c89 the size of a zbookmark_t was expanded. Unfortunately, the size of a zbookmark_t is, effectively, part of the on-disk format expected by the current ZFS code base. I've worked up a patch in dweeezil/zfs@c5223f5 which should only be considered a work in progress at this point. I have, however, tested it and it seems just fine. Read the commit message for details.
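To make the failure mode concrete, here is a minimal standalone sketch, not actual ZFS code: the old_/new_ type names and the zb_extra fields are hypothetical, while the four original fields mirror the real zbookmark_t. As I understand it, the scan state persisted during a scrub/resilver embeds a bookmark record whose on-disk length is derived from sizeof(), so growing the struct silently changes what gets written:

```c
/*
 * Illustrative sketch only; the old_/new_ names and zb_extra
 * fields are hypothetical.
 */
#include <stdio.h>
#include <stdint.h>

/* Layout written by zfs-0.6.2: 4 x 8 bytes = 32 bytes per record. */
typedef struct old_zbookmark {
	uint64_t zb_objset;
	uint64_t zb_object;
	int64_t  zb_level;
	uint64_t zb_blkid;
} old_zbookmark_t;

/* Layout after the struct was grown (extra fields are illustrative). */
typedef struct new_zbookmark {
	uint64_t zb_objset;
	uint64_t zb_object;
	int64_t  zb_level;
	uint64_t zb_blkid;
	uint64_t zb_extra[2];	/* hypothetical added fields */
} new_zbookmark_t;

int
main(void)
{
	/*
	 * On-disk consumers size their records with sizeof(), so the
	 * record written after a scrub/resilver under the new code is
	 * larger than what 0.6.2 expects, and the older reader rejects
	 * the pool with "pool metadata is corrupted".
	 */
	printf("record written by 0.6.2:      %zu bytes\n",
	    sizeof (old_zbookmark_t));
	printf("record written after 1421c89: %zu bytes\n",
	    sizeof (new_zbookmark_t));
	return (0);
}
```

This also explains the timeline above: only the pools that were resilvered under the git version (zroot, ztmp) faulted under 0.6.2, while pools that never had a scan record rewritten (zmess, zstor-past) stayed importable.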

@dweeezil thanks for looking into this problem; I will probably try out your patch when I get some free time :)

@dweeezil Nice work. This was on the short list of things that I wanted to debug before tagging a new Gentoo patch set. I planned to debug it before filing an issue, but @AndCycle beat me to reporting it and you appear to have beaten me to debugging it. :)

@AndCycle Nice description of the issue. Your report contains information that is far more useful than the information that I had to use. Last Monday, I discovered that HEAD broke backward compatibility. That put this on my short list of things to debug, but I had no idea what the trigger was and would not have spent much time on it until next week.

That being said, I am in a position to test and review this, but I might not have time until Monday.

Merged as:

3965d2e Merge branch 'issue-2094'
4f2dcb3 Add erratum for issue #2094
ffe9d38 Add generic errata infrastructure
731782e Use expected zpool_status_t type
ed9e836 Revert changes to zbookmark_t
a16bc6b Add zimport.sh compatibility test script
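For context on the "generic errata infrastructure" above, here is a conceptual sketch, not the merged implementation; all names and the classification logic are illustrative. The idea is that import code which recognizes damage from a known bug records a per-pool erratum code instead of failing outright, and zpool status can then surface it with a pointer to the recommended fix-up:

```c
/* Conceptual sketch only; names and logic are illustrative. */
#include <stdio.h>
#include <stddef.h>

typedef enum pool_erratum {
	ERRATUM_NONE = 0,
	ERRATUM_2094_SCRUB,	/* scan record written with an oversized bookmark */
} pool_erratum_t;

/* Classify an on-disk scan record by its size. */
static pool_erratum_t
classify_scan_record(size_t ondisk_bytes, size_t expected_bytes)
{
	return (ondisk_bytes == expected_bytes ?
	    ERRATUM_NONE : ERRATUM_2094_SCRUB);
}

int
main(void)
{
	/* A 48-byte record where 32 bytes were expected trips the erratum. */
	if (classify_scan_record(48, 32) == ERRATUM_2094_SCRUB)
		printf("status: pool is affected by a known erratum "
		    "(issue #2094); see the documentation for fix-up steps\n");
	return (0);
}
```

The design choice here is to let fixed code import such pools and tell the administrator exactly what happened, rather than failing with a generic "pool metadata is corrupted" error.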