libarchive / libarchive

Multi-format archive and compression library

Home page: http://www.libarchive.org

Read hive default generated deflate compression file failed

MisterRaindrop opened this issue

Hive SQL (my Hive version is apache-hive-3.1.3):

set hive.exec.compress.output=true;
set mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.DefaultCodec;
drop table hive_example;
CREATE TABLE hive_example
(
    id  int,
    name string
)
STORED AS TEXTFILE;
INSERT INTO TABLE hive_example values(1, "aaaaabbbb");

This creates a deflate-compressed file in HDFS:

/usr/hive/hive_example/000000_0.deflate

zlib can read this file, but reading it with libarchive fails.

My example code:

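#include <archive.h>
#include <archive_entry.h>
#include <stdio.h>

int main() {
    struct archive *a;
    struct archive_entry *entry;
    int r;
    int status = 0;
    a = archive_read_new();
    archive_read_support_filter_all(a);
    archive_read_support_format_all(a);

    r = archive_read_open_filename(a, "/opt/share/000000_0.deflate", 10240);
    if (r != ARCHIVE_OK) {
        // Report libarchive's own error message instead of a generic one
        printf("Failed to open archive: %s\n", archive_error_string(a));
        archive_read_free(a);
        return 1;
    }

    while ((r = archive_read_next_header(a, &entry)) == ARCHIVE_OK) {
        printf("File name: %s\n", archive_entry_pathname(entry));

        const void *buff;
        size_t size;
        la_int64_t offset;  // archive_read_data_block() takes la_int64_t*, not off_t*
        while (archive_read_data_block(a, &buff, &size, &offset) == ARCHIVE_OK) {
            // buff is not NUL-terminated, so write exactly `size` bytes
            fwrite(buff, 1, size, stdout);
        }
    }
    // ARCHIVE_EOF means every entry was read; anything else is a failure
    if (r != ARCHIVE_EOF) {
        printf("Read failed: %s\n", archive_error_string(a));
        status = 1;
    }

    archive_read_close(a);
    archive_read_free(a);
    return status;
}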

Build:

g++ example_archive_read.cpp -o example_archive_read -g -O0 -larchive

My environment:

I built the libarchive-3.7.4 release tarball on CentOS 7 and linked it against zlib:

ldd -r libarchive.so.13
	linux-vdso.so.1 =>  (0x00007ffdf45ab000)
	libcrypto.so.10 => /lib64/libcrypto.so.10 (0x00007fc6edb8f000)
	liblzma.so.5 => /lib64/liblzma.so.5 (0x00007fc6ed969000)
	libzstd.so.1 => /lib64/libzstd.so.1 (0x00007fc6ed6ae000)
	liblz4.so.1 => /lib64/liblz4.so.1 (0x00007fc6ed49f000)
	libbz2.so.1 => /lib64/libbz2.so.1 (0x00007fc6ed28f000)
	libz.so.1 => /lib64/libz.so.1 (0x00007fc6ed079000)
	libxml2.so.2 => /lib64/libxml2.so.2 (0x00007fc6ecd0f000)
	libc.so.6 => /lib64/libc.so.6 (0x00007fc6ec941000)
	libdl.so.2 => /lib64/libdl.so.2 (0x00007fc6ec73d000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fc6ec521000)
	libm.so.6 => /lib64/libm.so.6 (0x00007fc6ec21f000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fc6edff2000)

The zlib shared library (libz.so.1) is already linked.

But reading the Hive-generated deflate file still fails. Does anybody know why? Does libarchive not support reading zlib's deflate format?

No, libarchive does not and cannot support the zlib deflate format. Libarchive requires that any format it supports have a distinctive way to identify the file format. Zlib deflate format does not have a "magic value" identifying the format, so there is no reliable way for libarchive to identify this format.
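To make the point about identification concrete, here is a small sketch of what a signature check can see in the first bytes; the function and enum names are hypothetical, and this is an illustration of the reasoning, not libarchive's actual bidding code. A gzip stream (RFC 1952) starts with the fixed bytes 1F 8B 08. A zlib stream (RFC 1950) only has a 2-byte header: the low nibble of the first byte must be 8, the high nibble at most 7, and the two bytes read as a big-endian number must be a multiple of 31. Raw deflate (RFC 1951) has no header at all.

```c
#include <stddef.h>

enum stream_kind { KIND_UNKNOWN, KIND_GZIP, KIND_ZLIB_PLAUSIBLE };

/* Classify a buffer by its leading bytes only. */
enum stream_kind classify(const unsigned char *p, size_t n)
{
    /* gzip: ID1 = 0x1f, ID2 = 0x8b, CM = 8 (deflate) */
    if (n >= 3 && p[0] == 0x1f && p[1] == 0x8b && p[2] == 8)
        return KIND_GZIP;
    /* zlib: CM = 8, CINFO <= 7, and FCHECK makes (CMF*256 + FLG) % 31 == 0 */
    if (n >= 2 &&
        (p[0] & 0x0f) == 8 &&
        (p[0] >> 4) <= 7 &&
        ((p[0] << 8) | p[1]) % 31 == 0)
        return KIND_ZLIB_PLAUSIBLE;
    /* raw deflate always ends up here: there is nothing to check */
    return KIND_UNKNOWN;
}

/* Count how many of the 65536 possible 2-byte prefixes pass the zlib check. */
unsigned count_zlib_plausible_prefixes(void)
{
    unsigned count = 0;
    for (unsigned i = 0; i < 65536; i++) {
        unsigned char p[2] = { (unsigned char)(i >> 8), (unsigned char)i };
        if (classify(p, 2) == KIND_ZLIB_PLAUSIBLE)
            count++;
    }
    return count;
}
```

Counting over all 65,536 two-byte prefixes, 66 pass the zlib check, roughly 1 in 1,000; an ordinary text file starting with "x^" (78 5E) already qualifies. That is far too weak to act as a format signature, and raw deflate offers nothing to check at all.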