ymnk / jzlib

re-implementation of zlib in pure Java

Home Page: http://www.jcraft.com/jzlib/


ArrayIndexOutOfBoundsException in Tree.d_code

jglick opened this issue.

Filed upstream as JENKINS-19473. We switched to JZlib to avoid an apparent livelock in java.util.zip. Unfortunately this has caused errors when sending certain (large) files over a GZip-compressed stream using JZlib 1.1.2 (or 1.1.1).

The root issue appears to be this exception:

java.lang.ArrayIndexOutOfBoundsException: 677
    at com.jcraft.jzlib.Tree.d_code(Tree.java:149)
    at com.jcraft.jzlib.Deflate.compress_block(Deflate.java:696)
    at com.jcraft.jzlib.Deflate._tr_flush_block(Deflate.java:902)
    at com.jcraft.jzlib.Deflate.flush_block_only(Deflate.java:777)
    at com.jcraft.jzlib.Deflate.deflate_slow(Deflate.java:1200)
    at com.jcraft.jzlib.Deflate.deflate(Deflate.java:1586)
    at com.jcraft.jzlib.Deflater.deflate(Deflater.java:140)
    at com.jcraft.jzlib.DeflaterOutputStream.deflate(DeflaterOutputStream.java:129)
    at com.jcraft.jzlib.DeflaterOutputStream.write(DeflaterOutputStream.java:102)
    at com.jcraft.jzlib.DeflaterOutputStream.write(DeflaterOutputStream.java:85)

Any dist >= 0x8000 causes this, because the 512-entry _dist_code table is indexed as

_dist_code[256+((dist)>>>7)]
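A minimal standalone sketch of that index arithmetic, assuming (as in C zlib) a 512-entry _dist_code table and the d_code form "dist < 256 ? _dist_code[dist] : _dist_code[256 + (dist >>> 7)]". The class and method names are hypothetical; nothing here calls JZlib.

```java
/** Hypothetical standalone model of the index computed by Tree.d_code. */
public class DistCodeIndex {
    // Assumption: _dist_code has 512 entries, as in C zlib (DIST_CODE_LEN).
    static final int DIST_CODE_LEN = 512;

    /** Index into _dist_code for a distance value, mirroring the d_code expression. */
    static int index(int dist) {
        return dist < 256 ? dist : 256 + (dist >>> 7);
    }

    /** True if the lookup for this distance stays inside the table. */
    static boolean inBounds(int dist) {
        return index(dist) < DIST_CODE_LEN;
    }

    public static void main(String[] args) {
        int[] samples = {0x00FF, 0x7FFF, 0x8000, 0xD280, 0xFFFF};
        for (int d : samples) {
            System.out.printf("dist=0x%04X -> index %d (%s)%n",
                    d, index(d), inBounds(d) ? "ok" : "out of bounds");
        }
    }
}
```

For the five samples this prints indices 255, 511, 512, 677 and 767; only the first two are in bounds. So the reported index 677 corresponds to a dist somewhere in 0xD280..0xD2FF, far beyond what a 32K window should produce.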

but the distance is read back from pending_buf as

((pending_buf[d_buf+lx*2]<<8)&0xff00)|
  (pending_buf[d_buf+lx*2+1]&0xff)

which can presumably produce values up to 0xffff. Normally that does not happen; it only happens for certain files.
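The two-byte format itself is not the limit: any 16-bit value survives the round trip, so dist stays below 0x8000 only because deflate is supposed to emit matches within its 32K window. A minimal sketch of the store/load pair, assuming the writer stores the high byte first (the load expression above implies that order); the class and method names are hypothetical.

```java
/** Hypothetical model of how a 16-bit distance is packed into two bytes of pending_buf. */
public class DistanceBytes {
    /** Stores dist high byte first at buf[off], buf[off + 1]. */
    static void store(byte[] buf, int off, int dist) {
        buf[off] = (byte) (dist >>> 8);
        buf[off + 1] = (byte) dist;
    }

    /** Same expression as the load quoted above, with d_buf + lx*2 folded into off. */
    static int load(byte[] buf, int off) {
        // The & masks undo Java's sign extension of byte values.
        return ((buf[off] << 8) & 0xff00) | (buf[off + 1] & 0xff);
    }

    public static void main(String[] args) {
        byte[] buf = new byte[2];
        for (int dist : new int[] {0x7FFF, 0x8000, 0xD280, 0xFFFF}) {
            store(buf, 0, dist);
            System.out.printf("0x%04X -> 0x%04X%n", dist, load(buf, 0));
        }
    }
}
```

Every sample comes back unchanged, including values that d_code cannot handle, so an out-of-range dist would be stored and reloaded silently until compress_block looks it up.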

The originally reported bug is actually a different stack trace:

java.lang.ArrayIndexOutOfBoundsException: 65536
    at com.jcraft.jzlib.Deflate._tr_tally(Deflate.java:635)
    at com.jcraft.jzlib.Deflate.deflate_slow(Deflate.java:1177)
    at com.jcraft.jzlib.Deflate.deflate(Deflate.java:1586)
    at com.jcraft.jzlib.Deflater.deflate(Deflater.java:140)
    at com.jcraft.jzlib.DeflaterOutputStream.deflate(DeflaterOutputStream.java:129)
    at com.jcraft.jzlib.DeflaterOutputStream.write(DeflaterOutputStream.java:102)

which I can reproduce inside Jenkins but not in a standalone test case (perhaps due to differences in buffering?). In this code

pending_buf[l_buf+last_lit] = (byte)lc;

pending_buf has length 0x10000, yet l_buf is 0xc000 and last_lit is 0x4000, so the write goes to index 0xc000 + 0x4000 = 0x10000 (65536), one past the end of the array.
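The numbers above fit C zlib's buffer layout, where pending_buf holds 4 * lit_bufsize bytes and the literal area starts at 3 * lit_bufsize; with lit_bufsize = 0x4000 that gives exactly 0x10000 and 0xc000. Treating JZlib as using the same layout, and the flush rule from C zlib's _tr_tally (return last_lit == lit_bufsize - 1), are assumptions here. A minimal sketch with hypothetical names, showing that honoring the flush signal keeps the index in range while ignoring it reproduces the 65536:

```java
/** Hypothetical model of the literal tally area in pending_buf, using the sizes quoted above. */
public class TallyBuffer {
    static final int PENDING_BUF_LEN = 0x10000; // length of pending_buf reported above
    static final int L_BUF = 0xC000;            // offset of the literal/length area
    static final int LIT_BUFSIZE = 0x4000;      // entries per block (assumed: 0x10000 / 4)

    final byte[] pendingBuf = new byte[PENDING_BUF_LEN];
    int lastLit = 0;

    /** Records one literal; returns true when the block must be flushed (C zlib's rule). */
    boolean tallyLiteral(int lc) {
        pendingBuf[L_BUF + lastLit] = (byte) lc; // same indexing as the quoted line
        lastLit++;
        return lastLit == LIT_BUFSIZE - 1;
    }

    /** Stand-in for emitting the block: only resets the counter. */
    void flushBlock() {
        lastLit = 0;
    }

    public static void main(String[] args) {
        TallyBuffer ok = new TallyBuffer();
        for (int i = 0; i < 100_000; i++) {
            if (ok.tallyLiteral('a')) {
                ok.flushBlock(); // honoring the signal keeps L_BUF + lastLit < PENDING_BUF_LEN
            }
        }
        System.out.println("with flushes: no exception, lastLit=" + ok.lastLit);

        TallyBuffer broken = new TallyBuffer();
        try {
            for (int i = 0; i < 100_000; i++) {
                broken.tallyLiteral('a'); // ignoring the signal
            }
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("without flushes: failed at index " + (L_BUF + broken.lastLit));
        }
    }
}
```

The second loop fails at index 65536, the same value as in the reported stack trace, which suggests a path where the flush signal was missed rather than a wrongly sized buffer.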

The bug only seems to affect certain large, highly repetitive files. For example, ubuntu-13.04-server-amd64.iso (702 MB) ran through fine for as long as I let it go (several minutes).

I am able to reproduce the problem in the form of a JUnit test (in Java, sorry, not brushing up on Scala just for this!):

package com.jcraft.jzlib;

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.util.zip.CheckedOutputStream;
import java.util.zip.Checksum;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;
import org.junit.Test;
import static org.junit.Assert.*;

public class DeflateTest {

    @Test public void JENKINS_19473() throws Exception {
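        // Writer side: bytes -> CRC32 -> GZIPOutputStream -> pipe.
        // Same package, so GZIPOutputStream/GZIPInputStream are the JZlib classes, not java.util.zip.
        PipedOutputStream pos = new PipedOutputStream();
        InputStream pis = new PipedInputStream(pos);
        Checksum csOut = new CRC32();
        OutputStream gos = new GZIPOutputStream(pos);
        final OutputStream cos = new CheckedOutputStream(gos, csOut);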
        Thread t = new Thread() {
            @Override public void run() {
                try {
                    InputStream fis = new FileInputStream("…/jzlib.fail");
                    try {
                        int c;
                        while ((c = fis.read()) != -1) {
                            cos.write(c);
                        }
                    } finally {
                        fis.close();
                    }
                    cos.close();
                } catch (IOException x) {
                    x.printStackTrace();
                }
            }
        };
        t.start();
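        // Reader side: pipe -> GZIPInputStream -> CRC32, read to EOF.
        InputStream gis = new GZIPInputStream(pis);
        Checksum csIn = new CRC32();
        InputStream cis = new CheckedInputStream(gis, csIn);
        while (cis.read() != -1) {/* discard */}
        t.join();
        // The CRC of the original bytes must match the CRC of the decompressed bytes.
        assertEquals(csOut.getValue(), csIn.getValue());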
    }

}

The test file is available in compressed form here. (Decompress it with gunzip before use.)

Thank you for your feedback. I'll investigate this issue.

It seems the following change works around the problem:

// OutputStream gos = new GZIPOutputStream(pos);
OutputStream gos =
  new GZIPOutputStream(
    pos,
    new com.jcraft.jzlib.Deflater(6, 15+16, 9),   // level 6, 15 window bits (+16 = gzip wrapper), memLevel 9 instead of 8
    512,                                          // buffer size
    true                                          // close pos when gos is closed
  );
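memLevel controls how large the deflate buffers are. Assuming JZlib sizes them like C zlib (lit_bufsize = 1 << (memLevel + 6), pending_buf = 4 * lit_bufsize, consistent with the 0x10000 quoted earlier for the failing run), moving from 8 to 9 doubles the number of symbols buffered per block. That plausibly changes where block flushes fall, which would explain why the workaround helps with this file without ruling out others. A small hypothetical helper to see the sizes:

```java
/** Hypothetical helper: deflate buffer sizes for a memLevel, assuming C zlib's formulas. */
public class MemLevelSizes {
    /** Number of tally entries per block. */
    static int litBufsize(int memLevel) {
        if (memLevel < 1 || memLevel > 9) {
            throw new IllegalArgumentException("memLevel must be 1..9");
        }
        return 1 << (memLevel + 6);
    }

    /** Total bytes in pending_buf (distance and literal areas overlay it). */
    static int pendingBufBytes(int memLevel) {
        return 4 * litBufsize(memLevel);
    }

    public static void main(String[] args) {
        for (int memLevel = 8; memLevel <= 9; memLevel++) {
            System.out.printf("memLevel %d: lit_bufsize=0x%X, pending_buf=0x%X bytes%n",
                    memLevel, litBufsize(memLevel), pendingBufBytes(memLevel));
        }
    }
}
```

This prints lit_bufsize=0x4000 and pending_buf=0x10000 for memLevel 8, and 0x8000 and 0x20000 for memLevel 9.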

Please try it. I'll continue the efforts to fix this issue.

Thanks, the workaround helps with the test file mentioned. Do you think it would help with other files triggering the bug as well?

Unfortunately, I suspect other files may still trigger this issue.
I have identified where the bug is, and I hope to fix it in a few days.

I think the issue has been fixed. Please try commit 8b205d6 with

 OutputStream gos = new GZIPOutputStream(pos);

If there are no problems, I'll push it to the Maven Central repository.

I haven't forgotten this; I've just been too busy to test. I will try soon.

I'm having a similar issue, although it seems slightly different.

java.lang.ArrayIndexOutOfBoundsException: -1
    at com.jcraft.jzlib.Deflate.deflate_slow(Deflate.java:1209)
    at com.jcraft.jzlib.Deflate.deflate(Deflate.java:1586)
    at com.jcraft.jzlib.Deflater.deflate(Deflater.java:140)
    at com.jcraft.jzlib.DeflaterOutputStream.deflate(DeflaterOutputStream.java:129)
    at com.jcraft.jzlib.DeflaterOutputStream.write(DeflaterOutputStream.java:102)

I'm going to try this fix and see whether it solves my problem.

It seems it is not related to this issue.


Is it possible to provide data to reproduce that problem?

Indeed, when a build based on 8b205d6 is embedded in Jenkins, I am able to revert my workaround of JENKINS-19473 (in FilePath) and still archive the jzlib.fail test file.

I also tried archiving an Arquillian source checkout complete with build products, all of which gets tarred up and gzipped on the slave before being transferred to the master for unpacking, and that worked as well: about 443 MB on disk, about 365 MB when compressed with GNU tar cz.

So from my perspective the fix should be released.


We have been preparing for the next release. It will appear on the maven central repository soon.

Still see nothing there; I guess you will close this issue when it appears?

Yes, I'll close this issue when it appears.

It has appeared there.

FlxRobin, if the reported problem is reproducible, please open a new issue.