Silly values on web status interface during 3TB raid1 resync (overflow?)
GoogleCodeExporter opened this issue
What steps will reproduce the problem?
1. Make a dual-disk 3TB RAID 1.
2. It starts resyncing.
3. Look at the web interface.
What is the expected output? What do you see instead?
I get on the web interface:
RAID
Dev. Capacity Level State Status Action Done ETA
md0 2794.0 GB raid1 active OK resync 107% -10.7min
This corresponds (roughly) to /proc/mdstat:
$ cat /proc/mdstat
Personalities : [linear] [raid1]
md0 : active raid1 sda2[1] sdb2[0]
2929740112 blocks super 1.2 [2/2] [UU]
[======>..............] resync = 30.2% (885531648/2929740112) finish=300.7min speed=113284K/sec
bitmap: 16/22 pages [64KB], 65536KB chunk
unused devices: <none>
$
What Alt-F version are you using? Have you flashed it?
Alt-F 0.1RC3 Flashed.
What is the box hardware revision level? A1, B1 or C1? (look at the label
at the box bottom)
N/A
What is your disk configuration? Standard, RAID (what level)...
Raid1
What operating system are you using on your computer? Using what browser?
Chrome linux.
Please provide any additional information below.
Original issue reported on code.google.com by brian.br...@gmail.com
on 1 May 2013 at 1:53
It is an arithmetic issue (big numbers with 3TB disks); probably the awk %d should
be replaced with %f.
The issue must be at /usr/www/cgi-bin/status.cgi, at around line 295 (where
$mdev is md0 in your case)
compl=$(drawbargraph $(awk '{printf "%d", $1 * 100 / $3}' /sys/block/$mdev/md/sync_completed))
speed=$(cat /sys/block/$mdev/md/sync_speed)
exp=$(awk '{printf "%.1fmin", ($3 - $1) * 512 / 1000 / '$speed' / 60}' /sys/block/$mdev/md/sync_completed 2> /dev/null)
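For illustration, the same arithmetic can be run standalone; the sync_completed and sync_speed values below are the ones posted later in this thread, with echo standing in for reading the sysfs file:

```shell
# Feed the script's own awk formulas sample values:
# sync_completed = "310542592 / 1564512928", sync_speed = 89338 K/sec
echo '310542592 / 1564512928' \
    | awk '{printf "%d%%  %.1fmin\n", $1 * 100 / $3, ($3 - $1) * 512 / 1000 / 89338 / 60}'
# prints: 19%  119.8min
```

The percentage and ETA come out plausible here because awk itself is not overflowing; the problem is the values being fed in.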
If it is still resyncing, can you please post the output of
cat /sys/block/md0/md/sync_completed
cat /sys/block/md0/md/sync_speed
Thanks
Original comment by whoami.j...@gmail.com
on 1 May 2013 at 2:46
Still going. The web interface currently says:
md0 2794.0 GB raid1 active OK resync 20% 142.2min
which I would think was OK, except /proc/mdstat sadly disagrees :-(
$ cat /proc/mdstat
Personalities : [linear] [raid1]
md0 : active raid1 sda2[1] sdb2[0]
2929740112 blocks super 1.2 [2/2] [UU]
[===============>.....] resync = 78.6% (2303036672/2929740112) finish=108.9min speed=95859K/sec
bitmap: 6/22 pages [24KB], 65536KB chunk
unused devices: <none>
$ cat /sys/block/md0/md/sync_completed
310542592 / 1564512928
$ cat /sys/block/md0/md/sync_speed
89338
awk saying 20% is about right for those sync_completed numbers.
I also manually tried some big numbers in awk, and it doesn't seem to
overflow, so I think awk must already be using floating point or longs for
those calculations.
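That check is easy to reproduce: awk stores numbers as C doubles, so operands well past 2^31 divide without wrapping. Using the figures from the /proc/mdstat output above:

```shell
# 2303036672 and 2929740112 both exceed 2^31, yet the division is fine:
# this reproduces the 78.6% that /proc/mdstat itself reported.
awk 'BEGIN { printf "%.1f\n", 2303036672 * 100 / 2929740112 }'
# prints: 78.6
```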
So looks like we have a kernel overflow issue here...
Yeah, I just had a look at the kernel source (md.c, the sync_completed_show function).
It uses unsigned long in 2.6.25, and has since been fixed to use long long.
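The posted numbers bear this out. sync_completed counts 512-byte sectors, i.e. twice the 1K block count shown in /proc/mdstat, and on a 32-bit box (the DNS-323 is ARM) an unsigned long wraps at 2^32:

```shell
# 2929740112 blocks (from /proc/mdstat) = 5859480224 sectors; reduced
# modulo 2^32 this is exactly the bogus total sync_completed reports.
awk 'BEGIN {
    sectors = 2929740112 * 2
    printf "%d\n", sectors % 4294967296
}'
# prints: 1564512928
```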
It might be wise to change the web script to parse it out of /proc/mdstat instead!
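Something along those lines might look like this (a rough sketch, not from the shipped script; the sed pattern is my own and only handles a resync line like the one shown above):

```shell
# Extract the resync percentage from an mdstat-style progress line.
line='[======>..............]  resync = 30.2% (885531648/2929740112) finish=300.7min speed=113284K/sec'
pct=$(printf '%s\n' "$line" | sed -n 's/.*resync = *\([0-9.]*\)%.*/\1/p')
echo "$pct"
# prints: 30.2
```

In the real script the line would of course come from grepping /proc/mdstat for the $mdev entry rather than a literal string.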
Original comment by brian.br...@gmail.com
on 1 May 2013 at 6:12
/proc/mdstat contains information of a very different type and format; it is
difficult to parse.
I'm trying to port Alt-F to a more recent kernel, 3.8.11, and perhaps that will
solve the issue.
Original comment by whoami.j...@gmail.com
on 24 May 2013 at 11:48
From what I saw in the kernel source, it was definitely fixed by that
version.
Original comment by brian.br...@gmail.com
on 25 May 2013 at 11:31
I can confirm this on my recently flashed D-Link DNS-323 running Alt-F 0.1RC3. I
built a 2x3TB RAID 1 array and am seeing the same here. Currently:
RAID
Dev. Capacity Level State Status Action Done ETA
md0 2794.0 GB raid1 active OK resync 210% -152.4min
Original comment by crazymac...@gmail.com
on 28 Jun 2013 at 10:47
Original comment by whoami.j...@gmail.com
on 29 Jun 2013 at 4:27
- Changed state: Accepted
Same here, see my ticket on sourceforge for details:
https://sourceforge.net/p/alt-f/tickets/10/
RAID
Dev. Capacity Level State Status Action Done ETA
md0 2794.0 GB raid1 active OK resync 158% -6517.8min
How can I make sure this is a false positive and that the resync has actually
completed?
Stephane
Original comment by stephane...@gmail.com
on 2 Sep 2013 at 6:29
I tried the same commands on my box; our problems look similar:
$ cat /sys/block/md0/md/sync_completed
2564345088 / 1564512928
$ cat /sys/block/md0/md/sync_speed
150809
$ cat /proc/mdstat
Personalities : [linear] [raid1]
md0 : active raid1 sda2[1] sdb2[0]
2929740112 blocks super 1.2 [2/2] [UU]
[========>............] resync = 43.8% (1285270528/2929740112) finish=189.7min speed=144439K/sec
bitmap: 13/22 pages [52KB], 65536KB chunk
unused devices: <none>
Original comment by stephane...@gmail.com
on 2 Sep 2013 at 6:47
Don't worry, cat /proc/mdstat is telling the truth about what's happening; it's
only the other numbers used by the web interface that are overflowing. (We
found it was a since-fixed kernel bug.)
Original comment by brian.br...@gmail.com
on 2 Sep 2013 at 7:32
Yep I realized that. Thanks Brian!
Original comment by stephane...@gmail.com
on 2 Sep 2013 at 9:46