google / syzkaller

syzkaller is an unsupervised coverage-guided kernel fuzzer

pkg/cover: symbolization

tarasmadan opened this issue

Is your feature request related to a problem? Please describe.
The first /cover request with lazy symbolization takes 19s.
Getting updated numbers 5 seconds later takes another 17s.
RAM consumption is 40G.

Describe the solution you'd like
Full symbolization takes 50 seconds, which is comparable to syzkaller startup time (with QEMU).
By symbolizing all callbacks before the first /cover call, we can reduce its generation time to 3 seconds and memory consumption to 0G.

There are 2 potential solutions:

  1. Symbolize everything in the background on syzkaller start.
  2. Symbolize all callbacks after/during the kernel build process and ship the result as a build artefact. The GZIPped data will take ~30M.

The second approach looks better but will cost more.
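
For the second option, here is a minimal sketch of how pre-symbolized frames could be written as a gzipped artefact and read back. The `Frame` type and the `writeFrames`/`readFrames`/`roundTrip` helpers are hypothetical names made up for illustration, and `encoding/gob` is only one possible encoding, not something syzkaller is known to use for this.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/gob"
	"fmt"
	"io"
)

// Frame is a hypothetical record for one symbolized frame of a callback PC.
type Frame struct {
	PC     uint64
	Func   string
	File   string
	Line   int
	Inline bool
}

// writeFrames gob-encodes frames and gzips the result into w.
func writeFrames(w io.Writer, frames []Frame) error {
	zw := gzip.NewWriter(w)
	if err := gob.NewEncoder(zw).Encode(frames); err != nil {
		zw.Close()
		return err
	}
	// Close flushes the remaining compressed data and the gzip footer.
	return zw.Close()
}

// readFrames reverses writeFrames.
func readFrames(r io.Reader) ([]Frame, error) {
	zr, err := gzip.NewReader(r)
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	var frames []Frame
	if err := gob.NewDecoder(zr).Decode(&frames); err != nil {
		return nil, err
	}
	return frames, nil
}

// roundTrip writes frames to an in-memory buffer and reads them back.
func roundTrip(frames []Frame) ([]Frame, error) {
	var buf bytes.Buffer
	if err := writeFrames(&buf, frames); err != nil {
		return nil, err
	}
	return readFrames(&buf)
}

func main() {
	out, err := roundTrip([]Frame{
		{PC: 0x1000, Func: "foo", File: "a.c", Line: 10},
		{PC: 0x1000, Func: "bar", File: "a.c", Line: 5, Inline: true},
	})
	if err != nil {
		panic(err)
	}
	fmt.Printf("round-tripped %d frames\n", len(out))
}
```

In a real setup the writer would run as a build step and the reader at manager start, so /cover would never need to call addr2line.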

@dvyukov proposed a third option: remove the addr2line dependency and parse the DWARF data directly.
His prototype:

package main

import (
	"bufio"
	"debug/dwarf"
	"debug/elf"
	"fmt"
	"io"
	"os"
	"strconv"
	"time"
)

func main() {
	start := time.Now()
	pcs := make(map[uint64]struct{})
	for s := bufio.NewScanner(os.Stdin); s.Scan(); {
		n, err := strconv.ParseUint(s.Text(), 16, 64)
		if err != nil {
			panic(err)
		}
		pcs[n] = struct{}{}
	}
	fmt.Printf("read %v pcs in %v\n", len(pcs), time.Since(start))

	f, err := elf.Open(os.Args[1])
	if err != nil {
		panic(err)
	}
	data, err := f.DWARF()
	if err != nil {
		panic(err)
	}
	matched, total := 0, 0
	for r := data.Reader(); ; {
		ent, err := r.Next()
		if err != nil {
			panic(err)
		}
		if ent == nil {
			break
		}
		if ent.Tag != dwarf.TagCompileUnit {
			panic(fmt.Errorf("found unexpected tag %v on top level", ent.Tag))
		}
		lr, err := data.LineReader(ent)
		if err != nil {
			panic(err)
		}
		var entry dwarf.LineEntry
		for {
			if err := lr.Next(&entry); err != nil {
				if err == io.EOF {
					break
				}
				panic(err)
			}
			total++
			if _, ok := pcs[entry.Address]; !ok {
				continue
			}
			matched++
			//fmt.Printf("pc %x %v:%v:%v\n", entry.Address, entry.File.Name, entry.Line, entry.Column)
		}
		r.SkipChildren()
	}
	fmt.Printf("total %v, matched %v\n", total, matched)
}

It turns out not to be that easy. LineReader has info about inlined frames, but only file:line, not the function name, and we need inlined function names in both pkg/report and pkg/cover.
Inlined function names have something to do with TagInlinedSubroutine, but I have not figured out exactly how these tags should be processed. The llvm-addr2line code can be used as a reference.
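
A rough sketch of one way the TagInlinedSubroutine entries might be processed with `debug/dwarf`. It has not been checked against llvm-addr2line output. The assumption is that each inlined copy carries a PC range plus a DW_AT_abstract_origin reference to the entry that holds the function name, and that entries are visited in tree order, so an enclosing function is always seen before anything inlined into it. `funcRange`, `entryName`, `collectRanges` and `framesForPC` are hypothetical helper names; only the `debug/dwarf` and `debug/elf` calls are real API.

```go
package main

import (
	"debug/dwarf"
	"debug/elf"
	"fmt"
	"os"
	"strconv"
)

// funcRange is one PC range [start, end) of a function or of an inlined copy.
type funcRange struct {
	start, end uint64
	name       string
}

// entryName returns DW_AT_name, following one DW_AT_abstract_origin link if present.
func entryName(data *dwarf.Data, ent *dwarf.Entry) string {
	if off, ok := ent.Val(dwarf.AttrAbstractOrigin).(dwarf.Offset); ok {
		r := data.Reader()
		r.Seek(off)
		if origin, err := r.Next(); err == nil && origin != nil {
			ent = origin
		}
	}
	name, _ := ent.Val(dwarf.AttrName).(string)
	return name
}

// collectRanges walks all entries in tree order and records the PC ranges
// of subprograms and inlined subroutines.
func collectRanges(data *dwarf.Data) ([]funcRange, error) {
	var out []funcRange
	for r := data.Reader(); ; {
		ent, err := r.Next()
		if err != nil {
			return nil, err
		}
		if ent == nil {
			return out, nil
		}
		if ent.Tag != dwarf.TagSubprogram && ent.Tag != dwarf.TagInlinedSubroutine {
			continue
		}
		prs, err := data.Ranges(ent)
		if err != nil {
			return nil, err
		}
		name := entryName(data, ent)
		for _, pr := range prs {
			out = append(out, funcRange{pr[0], pr[1], name})
		}
	}
}

// framesForPC returns the names of all ranges covering pc, innermost first.
// It relies on ranges being in tree order (outer functions before inlined ones).
func framesForPC(ranges []funcRange, pc uint64) []string {
	var names []string
	for i := len(ranges) - 1; i >= 0; i-- {
		if ranges[i].start <= pc && pc < ranges[i].end {
			names = append(names, ranges[i].name)
		}
	}
	return names
}

func check(err error) {
	if err != nil {
		panic(err)
	}
}

func main() {
	if len(os.Args) < 3 {
		fmt.Println("usage: inlines <vmlinux> <hex-pc>")
		return
	}
	f, err := elf.Open(os.Args[1])
	check(err)
	data, err := f.DWARF()
	check(err)
	pc, err := strconv.ParseUint(os.Args[2], 16, 64)
	check(err)
	ranges, err := collectRanges(data)
	check(err)
	fmt.Println(framesForPC(ranges, pc))
}
```

The linear scan in `framesForPC` is only for clarity; a sorted or interval-based index would be needed at kernel scale. The call site of each inlined frame should be available from the DW_AT_call_file and DW_AT_call_line attributes of the same entry, which this sketch does not read.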

Mapping file:line to a function name looks doable given the source code itself.
Any chance of getting StartLine:StartPos - EndLine:EndPos?
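
If only start lines are available (for example from DW_AT_decl_file/DW_AT_decl_line; as far as I can tell DWARF does not record an end line for a function), a fallback could be to attribute a line to the last function that starts at or before it. A minimal sketch under that assumption; `funcDecl`, `funcIndex`, `buildIndex` and `lookup` are hypothetical names:

```go
package main

import (
	"fmt"
	"sort"
)

// funcDecl is a hypothetical record: a function and the line where it starts.
type funcDecl struct {
	name      string
	startLine int
}

// funcIndex maps a file name to its functions sorted by startLine.
type funcIndex map[string][]funcDecl

// buildIndex copies and sorts the declarations of every file.
func buildIndex(decls map[string][]funcDecl) funcIndex {
	idx := make(funcIndex, len(decls))
	for file, ds := range decls {
		sorted := append([]funcDecl(nil), ds...)
		sort.Slice(sorted, func(i, j int) bool {
			return sorted[i].startLine < sorted[j].startLine
		})
		idx[file] = sorted
	}
	return idx
}

// lookup returns the last function starting at or before line.
// Without end lines this is a guess: a line between two functions
// is attributed to the earlier one.
func (idx funcIndex) lookup(file string, line int) (string, bool) {
	ds := idx[file]
	// First function that starts after line; the one before it is the match.
	i := sort.Search(len(ds), func(i int) bool { return ds[i].startLine > line })
	if i == 0 {
		return "", false
	}
	return ds[i-1].name, true
}

func main() {
	idx := buildIndex(map[string][]funcDecl{
		"a.c": {{"second", 40}, {"first", 10}},
	})
	name, ok := idx.lookup("a.c", 25)
	fmt.Println(name, ok)
}
```

With real end positions the lookup could reject lines that fall between functions instead of guessing.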