google / re2

RE2 is a fast, safe, thread-friendly alternative to backtracking regular expression engines like those used in PCRE, Perl, and Python. It is a C++ library.

Three NULL Pointer Dereference bugs found in re2-2023-09-01

TimChan2001 opened this issue

We found three null pointer dereference bugs in re2 version 2023-09-01 by following the fuzzing setup of the Google Fuzzer Test Suite. We compiled against the latest version of abseil-cpp (Abseil LTS 20230802.1), and the testing environment was 64-bit Ubuntu 18.04. We believe these are issues in re2 rather than in Abseil, but we are not sure yet.

  1. The POC can be found here: POC1
======================================================
AddressSanitizer:DEADLYSIGNAL
=================================================================
==59798==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x000000574ade bp 0x7ffc0ea74530 sp 0x7ffc0ea74460 T0)
==59798==The signal is caused by a READ memory access.
==59798==Hint: address points to the zero page.
    #0 0x574ade in HashSetIteratorGenerationInfoEnabled /usr/local/include/absl/container/internal/raw_hash_set.h:876:54
    #1 0x574ade in iterator /usr/local/include/absl/container/internal/raw_hash_set.h:1640:11
    #2 0x574ade in end /usr/local/include/absl/container/internal/raw_hash_set.h:1902:12
    #3 0x574ade in absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashSetPolicy<re2::DFA::State*>, re2::DFA::StateHash, re2::DFA::StateEqual, std::allocator<re2::DFA::State*> >::iterator absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashSetPolicy<re2::DFA::State*>, re2::DFA::StateHash, re2::DFA::StateEqual, std::allocator<re2::DFA::State*> >::find<re2::DFA::State*>(re2::DFA::State* const&, unsigned long) /usr/local/include/absl/container/internal/raw_hash_set.h:2342:52
    #4 0x54d350 in find<re2::DFA::State *> /usr/local/include/absl/container/internal/raw_hash_set.h:2350:12
    #5 0x54d350 in re2::DFA::CachedState(int*, int, unsigned int) /root/fuzzer-test-suite/re2-2023-09-01/build-latest/BUILD/re2/dfa.cc:753:40
    #6 0x54cece in re2::DFA::WorkqToCachedState(re2::DFA::Workq*, re2::DFA::Workq*, unsigned int) /root/fuzzer-test-suite/re2-2023-09-01/build-latest/BUILD/re2/dfa.cc:736:18
    #7 0x56c258 in re2::DFA::AnalyzeSearchHelper(re2::DFA::SearchParams*, re2::DFA::StartInfo*, unsigned int) /root/fuzzer-test-suite/re2-2023-09-01/build-latest/BUILD/re2/dfa.cc:1739:11
    #8 0x56b95b in re2::DFA::AnalyzeSearch(re2::DFA::SearchParams*) /root/fuzzer-test-suite/re2-2023-09-01/build-latest/BUILD/re2/dfa.cc:1693:8
    #9 0x56c762 in re2::DFA::Search(absl::lts_20230802::string_view, absl::lts_20230802::string_view, bool, bool, bool, bool*, char const**, re2::SparseSetT<void>*) /root/fuzzer-test-suite/re2-2023-09-01/build-latest/BUILD/re2/dfa.cc:1772:8
    #10 0x56d5c7 in re2::Prog::SearchDFA(absl::lts_20230802::string_view, absl::lts_20230802::string_view, re2::Prog::Anchor, re2::Prog::MatchKind, absl::lts_20230802::string_view*, bool*, re2::SparseSetT<void>*) /root/fuzzer-test-suite/re2-2023-09-01/build-latest/BUILD/re2/dfa.cc:1884:23
    #11 0x50297d in re2::RE2::Match(absl::lts_20230802::string_view, unsigned long, unsigned long, re2::RE2::Anchor, absl::lts_20230802::string_view*, int) const /root/fuzzer-test-suite/re2-2023-09-01/build-latest/BUILD/re2/re2.cc:838:19
    #12 0x50080f in re2::RE2::DoMatch(absl::lts_20230802::string_view, re2::RE2::Anchor, unsigned long*, re2::RE2::Arg const* const*, int) const /root/fuzzer-test-suite/re2-2023-09-01/build-latest/BUILD/re2/re2.cc:935:8
    #13 0x5003b2 in re2::RE2::FullMatchN(absl::lts_20230802::string_view, re2::RE2 const&, re2::RE2::Arg const* const*, int) /root/fuzzer-test-suite/re2-2023-09-01/build-latest/BUILD/re2/re2.cc:410:13
    #14 0x4f9c40 in Apply<bool (*)(absl::lts_20230802::string_view, const re2::RE2 &, const re2::RE2::Arg *const *, int), absl::lts_20230802::string_view> /root/fuzzer-test-suite/re2-2023-09-01/build-latest/BUILD/re2/re2.h:353:12
    #15 0x4f9c40 in FullMatch<> /root/fuzzer-test-suite/re2-2023-09-01/build-latest/BUILD/re2/re2.h:401:12
    #16 0x4f9c40 in LLVMFuzzerTestOneInput /root/fuzzer-test-suite/re2-2023-09-01/build-latest/../target.cc:27:5
    #17 0x63df89 in main /root/libfuzzer-workshop/libFuzzer/Fuzzer/afl/afl_driver.cpp:287:7
    #18 0x7f0137bddc86 in __libc_start_main /build/glibc-CVJwZb/glibc-2.27/csu/../csu/libc-start.c:310
    #19 0x420e69 in _start (/root/fuzzer-test-suite/re2-2023-09-01/build-latest/test-re2+0x420e69)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV /usr/local/include/absl/container/internal/raw_hash_set.h:876:54 in HashSetIteratorGenerationInfoEnabled
==59798==ABORTING
  2. The POC can be found here: POC2
======================================================
AddressSanitizer:DEADLYSIGNAL
=================================================================
==2776==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x00000054a001 bp 0x7fffeece8c50 sp 0x7fffeece8ac0 T0)
==2776==The signal is caused by a READ memory access.
==2776==Hint: address points to the zero page.
    #0 0x54a001 in HashSetIteratorGenerationInfoEnabled /usr/local/include/absl/container/internal/raw_hash_set.h:876:54
    #1 0x54a001 in iterator /usr/local/include/absl/container/internal/raw_hash_set.h:1631:11
    #2 0x54a001 in iterator_at /usr/local/include/absl/container/internal/raw_hash_set.h:2708:12
    #3 0x54a001 in begin /usr/local/include/absl/container/internal/raw_hash_set.h:1897:15
    #4 0x54a001 in re2::DFA::ClearCache() /root/fuzzer-test-suite/re2-2023-09-01/build-latest/BUILD/re2/dfa.cc:801:43
    #5 0x549d12 in re2::DFA::~DFA() /root/fuzzer-test-suite/re2-2023-09-01/build-latest/BUILD/re2/dfa.cc:469:3
    #6 0x56d119 in re2::Prog::DeleteDFA(re2::DFA*) /root/fuzzer-test-suite/re2-2023-09-01/build-latest/BUILD/re2/dfa.cc:1826:3
    #7 0x5a2f31 in re2::Prog::~Prog() /root/fuzzer-test-suite/re2-2023-09-01/build-latest/BUILD/re2/prog.cc:127:3
    #8 0x4ff049 in re2::RE2::~RE2() /root/fuzzer-test-suite/re2-2023-09-01/build-latest/BUILD/re2/re2.cc:301:3
    #9 0x4f9c52 in LLVMFuzzerTestOneInput /root/fuzzer-test-suite/re2-2023-09-01/build-latest/../target.cc:29:1
    #10 0x63df89 in main /root/libfuzzer-workshop/libFuzzer/Fuzzer/afl/afl_driver.cpp:287:7
    #11 0x7f9df006ac86 in __libc_start_main /build/glibc-CVJwZb/glibc-2.27/csu/../csu/libc-start.c:310
    #12 0x420e69 in _start (/root/fuzzer-test-suite/re2-2023-09-01/build-latest/test-re2+0x420e69)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV /usr/local/include/absl/container/internal/raw_hash_set.h:876:54 in HashSetIteratorGenerationInfoEnabled
==2776==ABORTING
  3. The POC can be found here: POC3
======================================================
AddressSanitizer:DEADLYSIGNAL
=================================================================
==18149==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x000000533961 bp 0x7ffec430a770 sp 0x7ffec430a4e0 T0)
==18149==The signal is caused by a READ memory access.
==18149==Hint: address points to the zero page.
    #0 0x533961 in re2::Compiler::CachedRuneByteSuffix(unsigned char, unsigned char, bool, int) /usr/local/include/absl/container/internal/raw_hash_set.h
    #1 0x536bfa in re2::Compiler::AddRuneRangeUTF8(int, int, bool) /root/fuzzer-test-suite/re2-2023-09-01/build-latest/BUILD/re2/compile.cc:782:14
    #2 0x53cc5b in AddRuneRange /root/fuzzer-test-suite/re2-2023-09-01/build-latest/BUILD/re2/compile.cc:634:7
    #3 0x53cc5b in re2::Compiler::PostVisit(re2::Regexp*, re2::Frag, re2::Frag, re2::Frag*, int) /root/fuzzer-test-suite/re2-2023-09-01/build-latest/BUILD/re2/compile.cc:948:9
    #4 0x5464b1 in re2::Regexp::Walker<re2::Frag>::WalkInternal(re2::Regexp*, re2::Frag, bool) /root/fuzzer-test-suite/re2-2023-09-01/build-latest/BUILD/./re2/walker-inl.h:210:13
    #5 0x53eac9 in WalkExponential /root/fuzzer-test-suite/re2-2023-09-01/build-latest/BUILD/./re2/walker-inl.h:243:10
    #6 0x53eac9 in re2::Compiler::Compile(re2::Regexp*, bool, long) /root/fuzzer-test-suite/re2-2023-09-01/build-latest/BUILD/re2/compile.cc:1129:16
    #7 0x4fb512 in re2::RE2::Init(absl::lts_20230802::string_view, re2::RE2::Options const&) /root/fuzzer-test-suite/re2-2023-09-01/build-latest/BUILD/re2/re2.cc:254:27
    #8 0x4fd5da in re2::RE2::RE2(absl::lts_20230802::string_view, re2::RE2::Options const&) /root/fuzzer-test-suite/re2-2023-09-01/build-latest/BUILD/re2/re2.cc:153:3
    #9 0x4f9ba7 in LLVMFuzzerTestOneInput /root/fuzzer-test-suite/re2-2023-09-01/build-latest/../target.cc:25:7
    #10 0x63df89 in main /root/libfuzzer-workshop/libFuzzer/Fuzzer/afl/afl_driver.cpp:287:7
    #11 0x7fa924ab3c86 in __libc_start_main /build/glibc-CVJwZb/glibc-2.27/csu/../csu/libc-start.c:310
    #12 0x420e69 in _start (/root/fuzzer-test-suite/re2-2023-09-01/build-latest/test-re2+0x420e69)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV /usr/local/include/absl/container/internal/raw_hash_set.h in re2::Compiler::CachedRuneByteSuffix(unsigned char, unsigned char, bool, int)
==18149==ABORTING

Thanks for the report! I'm guessing that poc-re2-2023-09-01-1 et al. are the fuzzer input files, so you will also have to share the fuzzer source code in order for me to reproduce these crashes.

We reused the fuzzing setup for re2-2014-12-09 from the Google fuzzer test suite (https://github.com/google/fuzzer-test-suite/tree/master/re2-2014-12-09) to fuzz re2-2023-09-01. Specifically, we set FUZZING_ENGINE to afl and modified build.sh as follows:

#!/bin/bash
# Copyright 2017 Google Inc. All Rights Reserved.
# Licensed under the Apache License, Version 2.0 (the "License");
. $(dirname $0)/../custom-build.sh $1 $2
. $(dirname $0)/../common.sh

CXXFLAGS="${CXXFLAGS}"

build_lib() {
  rm -rf BUILD
  cp -rf SRC BUILD
  (cd BUILD && make clean &&  make -j $JOBS obj/libre2.a)
}

build_lib
build_fuzzer

if [[ $FUZZING_ENGINE == "hooks" ]]; then
  # Link ASan runtime so we can hook memcmp et al.
  LIB_FUZZING_ENGINE="$LIB_FUZZING_ENGINE -fsanitize=address"
fi
set -x
$CXX $CXXFLAGS ${SCRIPT_DIR}/target.cc -I BUILD/ BUILD/obj/libre2.a -labsl_raw_hash_set -labsl_hash -labsl_city -labsl_low_level_hash -labsl_str_format_internal -labsl_synchronization -labsl_graphcycles_internal -labsl_kernel_timeout_internal -labsl_stacktrace -labsl_symbolize -labsl_debugging_internal -labsl_demangle_internal -labsl_malloc_internal -labsl_time -labsl_strings -labsl_string_view -labsl_base -labsl_spinlock_wait -labsl_int128 -labsl_throw_delegate -labsl_raw_logging_internal -labsl_time_zone -lpthread $LIB_FUZZING_ENGINE -o $EXECUTABLE_NAME_BASE

Thanks for these details. If you aren't building Abseil for fuzzing as well, I believe abseil/abseil-cpp#1524 (comment) applies.
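
For readers wondering why building Abseil without the fuzzing/sanitizer flags could matter: in the build line above, the absl headers under /usr/local/include are compiled into the instrumented binary, while the -labsl_* libraries are linked as prebuilt. If a header declares a different object layout under instrumentation than the prebuilt library was compiled with, the two sides can disagree about where fields live. The following is only a toy sketch of that general failure mode. `TablePlain`, `TableInstrumented`, and both functions are hypothetical stand-ins and not Abseil's real types, and whether this is exactly what the linked Abseil comment describes is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical layout used by a library built WITHOUT sanitizer flags.
struct TablePlain {
  void* slots;
  std::size_t size;
};

// Hypothetical layout the same header declares WITH sanitizer flags:
// one extra bookkeeping pointer is placed in front of the other fields.
struct TableInstrumented {
  const unsigned char* generation;  // exists only in the instrumented build
  void* slots;
  std::size_t size;
};

// Returns true when the two builds disagree on the object's size or on
// the offset of the `slots` field.
bool LayoutsDisagree() {
  return sizeof(TablePlain) != sizeof(TableInstrumented) ||
         offsetof(TablePlain, slots) != offsetof(TableInstrumented, slots);
}

// Copies the bytes of a plain object and reads them back through the
// instrumented layout, the way mismatched code would see the same memory.
// Bytes beyond the plain object are zero-filled here; in a real process
// they would be whatever happens to follow the object.
TableInstrumented ViewAsInstrumented(const TablePlain& plain) {
  unsigned char bytes[sizeof(TableInstrumented)] = {};
  std::memcpy(bytes, &plain, sizeof(plain));
  TableInstrumented view;
  std::memcpy(&view, bytes, sizeof(view));
  return view;
}
```

On a typical 64-bit Linux target, viewing an empty `TablePlain` (size 0) this way yields a `slots` pointer made of the bytes of `size`, so code that then follows `slots` would read from address zero. That is the same kind of symptom as the SEGV reports above, though the sketch does not prove it is the cause here.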