Truncated 7-Zip file body (error code: -30) on archive_read_data with 7z archives containing larger (>= 32 MiB) files when skipping entries
mxmlnkn opened this issue
Hello,
Thanks for this widely useful project! I'm trying to incorporate it into ratarmount via python-libarchive-c.
After making the new backend work successfully with smaller archives, I stumbled upon a weird problem with a larger test file.
Create test files:
# Large archive with two files to test seekability and independence of opened files, which reproduces the bug.
> spaces-32-MiB.txt; for i in $( seq $(( 32 * 1024 )) ); do printf '%1024s' $'\n' >> spaces-32-MiB.txt; done
> zeros-32-MiB.txt; for i in $( seq $(( 32 * 1024 )) ); do printf '%01023d\n' 0 >> zeros-32-MiB.txt; done
7z a two-large-files.7z spaces-32-MiB.txt zeros-32-MiB.txt
# Slightly smaller file that I accidentally created before because of a bug, which for some reason works fine!
> spaces-32-MiB.txt; for i in $( seq $(( 32 * 1024 )) ); do printf '%1023s' $'\n' >> spaces-32-MiB.txt; done
> zeros-32-MiB.txt; for i in $( seq $(( 32 * 1024 )) ); do printf '%01023d\n' 0 >> zeros-32-MiB.txt; done
7z a two-slightly-less-large-files.7z spaces-32-MiB.txt zeros-32-MiB.txt
(The slightly smaller version calls printf '%1023s' $'\n' instead of printf '%1024s' $'\n'.)
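For anyone without a POSIX shell at hand, the following is a minimal sketch that produces the same file contents from Python. The helper names (write_padded_lines, spaces_line, zeros_line) are made up for this illustration, and the archives still have to be created with 7z afterwards as shown above.

```python
from pathlib import Path

LINE_COUNT = 32 * 1024  # same loop count as the shell commands above


def spaces_line(width):
    # Equivalent of printf '%<width>s' $'\n':
    # a newline right-aligned in a field of `width` characters.
    return "\n".rjust(width)


def zeros_line():
    # Equivalent of printf '%01023d\n' 0: 1023 zeros followed by a newline.
    return "0" * 1023 + "\n"


def write_padded_lines(path, line, count=LINE_COUNT):
    """Write `line` to `path` `count` times and return the file size in bytes."""
    data = line.encode("ascii")
    with open(path, "wb") as file:
        for _ in range(count):
            file.write(data)
    return Path(path).stat().st_size
```

write_padded_lines("spaces-32-MiB.txt", spaces_line(1024)) should yield a file of exactly 32 MiB (33554432 B), while spaces_line(1023) gives the slightly smaller 33521664 B variant that does not trigger the error.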
Python code triggering the issue
import libarchive


def listFiles(path):
    print("\nList all entries of:", path)
    with libarchive.file_reader(path) as archive:
        for entry in archive:
            print(entry)
            # Read the data of every entry, i.e., nothing is skipped here.
            for block in entry.get_blocks():
                assert len(block) > 0


def readNthEntry(path, entryIndex):
    print(f"\nGet contents of file {entryIndex} of archive: {path}")
    with libarchive.file_reader(path) as archive:
        entryCount = 0
        for entry in archive:
            # All entries before entryIndex are skipped without reading their data.
            if entryCount == entryIndex:
                print(entry)
                readSize = 0
                for block in entry.get_blocks():
                    readSize += len(block)
                print(f" Read file contents: {readSize} B")
            entryCount += 1


filePath = "two-large-files.7z"
filePath2 = "two-slightly-less-large-files.7z"
listFiles(filePath)  # No error
readNthEntry(filePath2, 1)  # No error
# libarchive.exception.ArchiveError: Truncated 7-Zip file body (errno=84, retcode=-30, archive_p=...)
readNthEntry(filePath, 1)
C++ code triggering the issue
#include <array>
#include <iostream>
#include <set>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <archive.h>
#include <archive_entry.h>
class Libarchive
{
public:
    Libarchive( const std::string& path )
    {
        archive_read_support_filter_all( m_archive );
        archive_read_support_format_all( m_archive );
        auto returnCode = archive_read_open_filename( m_archive, path.c_str(), 10240 );
        if ( returnCode != ARCHIVE_OK ) {
            std::stringstream message;
            message << "[Libarchive] Open " << path << " failed with: " << archive_error_string( m_archive )
                    << " (error code: " << std::to_string( returnCode ) << ")";
            throw std::runtime_error( std::move( message ).str() );
        }
    }

    ~Libarchive()
    {
        const auto returnCode = archive_read_free( m_archive );
        if ( returnCode != ARCHIVE_OK ) {
            std::cerr << "Freeing archive failed with: " << returnCode << "\n";
        }
    }

    [[nodiscard]] archive*
    pointer() const noexcept
    {
        return m_archive;
    }

private:
    archive* const m_archive{ archive_read_new() };
};
class LibarchiveEntry
{
public:
    ~LibarchiveEntry()
    {
        archive_entry_free( m_entry );
    }

    [[nodiscard]] archive_entry*
    pointer() const noexcept
    {
        return m_entry;
    }

private:
    archive_entry* const m_entry{ archive_entry_new() };
};
void
listFiles( const std::string& path )
{
    Libarchive archive{ path };
    archive_entry* entry{ nullptr };
    while ( archive_read_next_header( archive.pointer(), &entry ) == ARCHIVE_OK ) {
        std::cout << archive_entry_pathname( entry ) << "\n";
        //archive_read_data_skip(a);  // not necessary as the Wiki says
    }
}
void
readNthEntries( const std::string& path,
                const std::set<size_t>& entryIndexes )
{
    std::cout << "\nGet contents of files";
    for ( const auto i : entryIndexes ) {
        std::cout << " " << i;
    }
    std::cout << " in archive: " << path << "\n";

    Libarchive archive{ path };
    size_t entryCount{ 0 };
    LibarchiveEntry entry;
    while ( true ) {
        /* I also tried with archive_read_next_header, but the bug persists. */
        if ( archive_read_next_header2( archive.pointer(), entry.pointer() ) != ARCHIVE_OK ) {
            break;
        }

        if ( entryIndexes.contains( entryCount ) ) {
            std::cout << archive_entry_pathname( entry.pointer() ) << "\n";
            std::array<char, 32 * 1024> buffer{};
            size_t readSize{ 0 };
            while ( true ) {
                const auto readSizePerCall = archive_read_data( archive.pointer(), buffer.data(), buffer.size() );
                if ( readSizePerCall < 0 ) {
                    std::stringstream message;
                    message << "[Libarchive] Read data failed with: " << archive_error_string( archive.pointer() )
                            << " (error code: " << std::to_string( readSizePerCall ) << ")";
                    //continue;  // Works fine (amount of returned data is correct) to simply ignore the error!?
                    throw std::runtime_error( std::move( message ).str() );
                }
                if ( readSizePerCall == 0 ) {
                    break;
                }
                readSize += readSizePerCall;
            }
            std::cout << " Read file contents: " << readSize << " B\n";
        } else {
            //archive_read_data_skip( archive.pointer() );  // Uncommenting this does not help.
        }
        ++entryCount;
    }
}
int main()
{
    static const std::string filePath = "two-large-files.7z";
    static const std::string filePath2 = "two-slightly-less-large-files.7z";

    std::cout << "\nList all entries of: " << filePath << "\n";
    listFiles( filePath );

    /* Works fine with the slightly smaller file. */
    readNthEntries( filePath2, { 0 } );
    readNthEntries( filePath2, { 1 } );

    /* Works fine when not skipping any entry. */
    readNthEntries( filePath2, { 0, 1 } );
    readNthEntries( filePath, { 0, 1 } );
    readNthEntries( filePath, { 0 } );

    /* Read data failed with: Truncated 7-Zip file body (error code: -30) */
    readNthEntries( filePath, { 1 } );

    return 0;
}
Compiled with:
g++ -Wall -Wextra -Wshadow -std=c++20 -o libarchive-entry-skipping-issue{,.cpp} -larchive && ./libarchive-entry-skipping-issue
Output:
List all entries of: two-large-files.7z
spaces-32-MiB.txt
zeros-32-MiB.txt

Get contents of files 0 in archive: two-slightly-less-large-files.7z
spaces-32-MiB.txt
 Read file contents: 33521664 B

Get contents of files 1 in archive: two-slightly-less-large-files.7z
zeros-32-MiB.txt
 Read file contents: 33554432 B

Get contents of files 0 1 in archive: two-slightly-less-large-files.7z
spaces-32-MiB.txt
 Read file contents: 33521664 B
zeros-32-MiB.txt
 Read file contents: 33554432 B

Get contents of files 0 1 in archive: two-large-files.7z
spaces-32-MiB.txt
 Read file contents: 33554432 B
zeros-32-MiB.txt
 Read file contents: 33554432 B

Get contents of files 0 in archive: two-large-files.7z
spaces-32-MiB.txt
 Read file contents: 33554432 B

Get contents of files 1 in archive: two-large-files.7z
zeros-32-MiB.txt
terminate called after throwing an instance of 'std::runtime_error'
  what():  [Libarchive] Read data failed with: Truncated 7-Zip file body (error code: -30)
Aborted
Observations:
- Note that I was very close to reporting this at python-libarchive-c instead of here because I was unable to reproduce the bug with the C++ code at first. It turns out that I forgot the return code check of archive_read_data, and it also turns out that ignoring that error (see commented-out code) seems to result in the correct amount of data being returned in subsequent archive_read_data calls!
- I had a slightly smaller file at first because of printf peculiarities. Everything works fine with that file, two-slightly-less-large-files.7z. It only happens with two-large-files.7z.
- It also does not happen when not skipping entries, i.e., when calling archive_read_data for all entries.
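Based on the last observation, a possible stopgap is to read and discard the data of every entry in front of the wanted one instead of skipping it. The following is only a sketch under that assumption: read_nth_entry_without_skipping is a hypothetical helper, it has only been reasoned about against the two test archives above, and it pays the cost of decompressing all earlier entries.

```python
def read_nth_entry_without_skipping(entries, entry_index):
    """Return the number of bytes in entry `entry_index`.

    `entries` is any iterable of objects offering get_blocks(), for example
    the archive object yielded by libarchive.file_reader(path).
    The data of all earlier entries is read and thrown away so that no
    entry body is ever skipped.
    """
    for index, entry in enumerate(entries):
        size = 0
        for block in entry.get_blocks():
            size += len(block)  # Drain the entry even if it is not the wanted one.
        if index == entry_index:
            return size
    raise IndexError(f"Archive has no entry with index {entry_index}")


# Intended usage with python-libarchive-c (not executed here):
#
#   import libarchive
#   with libarchive.file_reader("two-large-files.7z") as archive:
#       print(read_nth_entry_without_skipping(archive, 1))
```

This mirrors the readNthEntries( filePath, { 0, 1 } ) call in the C++ reproducer, which reads both entries without error.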
Do you see the same issue with this?
bsdtar -tvf two-large-files.7z
Note: The -t option to bsdtar skips the entry bodies to produce its listing.
@kientzle So, it works the same as my listFiles implementations, i.e., archive_read_data is not even called and therefore this bug should not happen. I tried it, and it works without error, same as my implementations. It only happens when skipping the first and then trying to read the second entry.
bsdtar -tvf two-large-files.7z
# -rwx------ 0 0 0 33554432 Apr 1 13:16 spaces-32-MiB.txt
# -rwx------ 0 0 0 33554432 Mar 31 23:27 zeros-32-MiB.txt
I can reproduce the bug with bsdtar like this:
bsdtar -x --exclude spaces-32-MiB.txt -f two-large-files.7z
# zeros-32-MiB.txt: Truncated 7-Zip file body: File exists
# bsdtar: Error exit delayed from previous errors.
While it works fine when excluding the other file:
bsdtar -x --exclude zeros-32-MiB.txt -f two-large-files.7z
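To check both exclusion orders in one go, the two bsdtar calls above could be wrapped in a small script. This is a rough sketch: bsdtar_extract_command and extraction_succeeds are hypothetical helper names, bsdtar is assumed to be on the PATH, and a non-zero exit status is assumed to accompany the "Error exit delayed from previous errors" message.

```python
import subprocess
import tempfile
from pathlib import Path


def bsdtar_extract_command(archive_path, excluded_name):
    # Mirrors: bsdtar -x --exclude <excluded_name> -f <archive_path>
    return ["bsdtar", "-x", "--exclude", excluded_name, "-f", str(archive_path)]


def extraction_succeeds(archive_path, excluded_name):
    """Extract into a temporary directory and return (succeeded, stderr)."""
    # Resolve the path first because the command runs in a different directory.
    command = bsdtar_extract_command(Path(archive_path).resolve(), excluded_name)
    with tempfile.TemporaryDirectory() as scratch:
        result = subprocess.run(command, cwd=scratch, capture_output=True, text=True)
    return result.returncode == 0, result.stderr
```

With the archives from this report, extraction_succeeds("two-large-files.7z", "spaces-32-MiB.txt") would be expected to report a failure together with the "Truncated 7-Zip file body" message, while excluding zeros-32-MiB.txt would be expected to succeed.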