das-labor / panopticon

This is tracking issue for performance issues in panopticon.

Currently, memory usage is spiraling out of control ;) running panop on libc.so.6 (a 1.9 MB binary), panopticon uses 2.1GB to perform a full disassembly.

This obviously will not scale to larger binaries.

The current hotspots are:

(MAJOR) function.rs:451 - 1.2GB of 2.1GB comes from the clone on this line:

panopticon/core/src/function.rs

Line 451 in b7802d0

let bb = BasicBlock::from_vec(bblock.clone());

i've verified that commenting out this line with the clone reduces memory usage (and runtime) by half, though obviously this is not an appropriate solution
500MB data-flow/ssa.rs:172 -

panopticon/data-flow/src/ssa.rs

Line 172 in b7802d0

let mne = Mnemonic::new(
amd64/disassembler.rs:277 - There's a lot of slowdown allocating the RREIL statement vectors during disassembly, accounting for about 20-30% of peak allocations, specifically, right here:

panopticon/amd64/src/disassembler.rs

Line 277 in b7802d0

Register::EDX => "EDX",

I've been using heaptrack via something like:

cargo build && heaptrack ../target/debug/panop -- /lib/libc.so.6  > /dev/null

with the following patch applied (requires nightlyt):

diff --git a/abstract-interp/src/lib.rs b/abstract-interp/src/lib.rs
index df2d502..d1432e5 100644
--- a/abstract-interp/src/lib.rs
+++ b/abstract-interp/src/lib.rs
@@ -27,6 +27,10 @@
 //! Adding a positive and a negative sign yields an abstract value representing both signs (called
 //! join).
 
+#![feature(alloc_system)]
+extern crate alloc_system;
+
+
 #[macro_use]
 extern crate log;
 
diff --git a/amd64/src/lib.rs b/amd64/src/lib.rs
index b38d736..7d9f889 100644
--- a/amd64/src/lib.rs
+++ b/amd64/src/lib.rs
@@ -35,6 +35,10 @@
 
 #![allow(missing_docs)]
 
+#![feature(alloc_system)]
+extern crate alloc_system;
+
+
 #[macro_use]
 extern crate log;
 
diff --git a/analysis/src/lib.rs b/analysis/src/lib.rs
index 99da66e..8e45a61 100644
--- a/analysis/src/lib.rs
+++ b/analysis/src/lib.rs
@@ -18,6 +18,9 @@
 
 //! Disassembly-Analysis loop.
 
+#![feature(alloc_system)]
+extern crate alloc_system;
+
 #[macro_use]
 extern crate log;
 
diff --git a/avr/src/lib.rs b/avr/src/lib.rs
index cafd96d..520e6c6 100644
--- a/avr/src/lib.rs
+++ b/avr/src/lib.rs
@@ -22,6 +22,9 @@
 
 #![allow(missing_docs)]
 
+#![feature(alloc_system)]
+extern crate alloc_system;
+
 #[macro_use]
 extern crate log;
 
diff --git a/cli/src/main.rs b/cli/src/main.rs
index f0a7261..96ba4d6 100644
--- a/cli/src/main.rs
+++ b/cli/src/main.rs
@@ -1,3 +1,6 @@
+#![feature(alloc_system)]
+extern crate alloc_system;
+
 extern crate structopt;
 #[macro_use]
 extern crate structopt_derive;
diff --git a/core/src/lib.rs b/core/src/lib.rs
index 89d1740..ab1bc59 100644
--- a/core/src/lib.rs
+++ b/core/src/lib.rs
@@ -75,6 +75,9 @@
 #![recursion_limit="100"]
 #![warn(missing_docs)]
 
+#![feature(alloc_system)]
+extern crate alloc_system;
+
 #[macro_use]
 extern crate log;
 
diff --git a/data-flow/src/lib.rs b/data-flow/src/lib.rs
index 52b628f..b24fd50 100644
--- a/data-flow/src/lib.rs
+++ b/data-flow/src/lib.rs
@@ -21,6 +21,9 @@
 //! This module contains algorithms to convert RREIL code into SSA form. Aside from SSA form this
 //! module implements functions to compute liveness sets and basic reverse data flow information.
 
+#![feature(alloc_system)]
+extern crate alloc_system;
+
 extern crate panopticon_core;
 extern crate panopticon_graph_algos;
 
diff --git a/graph-algos/src/lib.rs b/graph-algos/src/lib.rs
index e73eb2f..ca4d3ab 100644
--- a/graph-algos/src/lib.rs
+++ b/graph-algos/src/lib.rs
@@ -16,6 +16,9 @@
  * along with this program.  If not, see <http://www.gnu.org/licenses/>.
  */
 
+#![feature(alloc_system)]
+extern crate alloc_system;
+
 mod traits;
 pub mod search;
 pub mod dominator;

Oh dear; I believe I've just found the issue, and it's right here in this equation:

224 * 3000 * 1800 = 1209600000 = 1.2GB

for libc, I modified cli to output some stats:

sizeof Rvalue: 72
sizeof Lvalue: 64
sizeof Mnemonic: 112
sizeof Operation: 160
sizeof Statement: 224
avg size in bytes of function: 1003.6297 = total funs 3208
avg #mnemonics: 592.8117
avg #statements: 1877.2189

Basically, almost all of the memory in panopticon is used by Statement vectors.

Of course an obvious solution to this problem is to reduce the size_of a statement, tho I'm not sure how.

Thanks for the debugging! I guess we have to come up with a way to encode RREIL in a more space saving manner.

Soooo I already did this cause I got bored one fine Sunday. I'll post the work in progress here, or open a new issue so you can see what I wa thinking. But as of now it does reduce the statement to 24 bytes, which is the minimum target it needs to hit imho.

I recommend if/when we do this, we change the Vec to initially a type alias like Statements = Vec

Why? I've reviewed the required refactor and it's going to suck, mostly because vec of statements is everywhere (and also a string table needs to e added everywhere but that's not so bad)

When we do the refactor, if we change the statement vector to our type alias, or even better a struct wrapper, with a custom accessor, we should be free to explore encoding the vector itself.

Why would we want to do that? Well, if we also encode the vector, I'm pretty sure we can achieve some incredible amortized space savings by using leb128 for most values in the RREIL IR.

Basically this observation comes from the fact that many values are indexed which will be 1-200, say for register names, etc. this will be a single byte instead of 8. This will add up quickly, and it's not just for string table indexes, but addition in operands, etc.

If we use the type alias/wrapper we can then add this additional optimization later on, which essentially should be completely transparent to al other panopticon code except for less memory usage

Good idea. As long as Statements allows iteration, removal and insertion it should be easy to replace all uses of Vec.

Good idea, in fact, when we do the refactor we should make statements have that exact API so any changes underneath are completely invisible to clients. For now using Vec it will just use the Vec impl for those. We can then do crazy optimization with variable length integers later on if memory pressure becomes problematic again

Here is the experimental "binary RREIL" encoding I mentioned, which uses 27 bytes per Statement (compared to previous 220):

use scroll::{Pread, Gread, Pwrite, Gwrite};
use {Result, Rvalue, Lvalue, Statement};
use serde::{Serialize,Deserialize};
use std::fmt::{self, Formatter, Debug};
use std::collections::HashMap;

/// 
pub struct StrTable {
    data: HashMap<String, u32>,
    idx: u32,
}

impl StrTable {
    /// 
    pub fn new() -> Self {
        StrTable { data: HashMap::<String, u32>::new(), idx: 1 }
    }

    ///
    pub fn get(&self, key: &str) -> Option<&u32> {
        self.data.get(key)
    }

    ///
    pub fn find(&self, idx: u32) -> Option<&String> {
        for (name, name_idx) in self.data.iter() {
            if *name_idx == idx {
                return Some(name)
            }
        }
        None
    }
    ///
    pub fn push(&mut self, key: &str) {
        self.data.insert(key.to_owned(), self.idx);
        self.idx += 1;
    }
}

/// 
#[derive(Clone,PartialEq,Eq,Serialize,Deserialize)]
pub struct RvalueBits {
    bits: [u8; 9]
}

impl fmt::Debug for RvalueBits {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        for byte in self.bits.iter() {
            write!(f, "{:08b}", byte)?;
        }
        Ok(())
    }
}

impl RvalueBits {
    /// bits
    pub fn bits(&self) -> &[u8] {
        &self.bits[..]
    }
    ///
    pub fn into(self, strtable: &StrTable) -> Result<Rvalue> {
        let tag = self.bits[0];
        if tag == 0xff {
            Ok(Rvalue::Undefined)
        } else if tag & 0x1 == 0 { // constant
            let size = ((tag & 0xfe) >> 1) as usize;
            let value: u64 = self.bits.pread(1)?;
            Ok(Rvalue::Constant { value, size })
        } else if tag & 0x1 == 1 { // variable
            let size = ((tag & 0xfe) >> 1) as usize;
            let offset = self.bits[1] as usize;
            let idx = self.bits.pread::<u32>(2)?;
            let subscript = {
                let s = self.bits.pread::<u16>(6)?;
                if s == 0 { None } else { Some(s as usize) }
            };
            let name = strtable.find(idx).unwrap();
            Ok(Rvalue::Variable { name: name.clone().into(), subscript, offset, size })
        } else {
            Err(format!("Bad bits {:?}", self.bits).into())
        }
    }
    /// 
    pub fn from_bytes(bytes: &[u8]) -> Result<Self> {
        if bytes.len() < 9 {
            Err("Rvalue requires 9 byte array".into())
        } else {
            let mut bits = [0; 9];
            bytes.gread_inout(&mut 0, &mut bits[..])?;
            Ok(RvalueBits {
                bits
            })
        }
    }
    ///
    pub fn new(rvalue: Rvalue, strtable: &StrTable) -> Result<Self> {
        // s-vvvv_vvvv
        let bits = match rvalue {
            Rvalue::Undefined => [0xff; 9],
            Rvalue::Constant { value, size } => {
                let mut scratch = [0; 9];
                let size = size as u8;
                let o = &mut 0;
                scratch.gwrite((size << 1) & 0xfe, o)?;
                scratch.gwrite(value, o)?;
                scratch
            },
            Rvalue::Variable { name, subscript, offset, size } => {
                let mut scratch = [0; 9];
                let size = size as u8;
                let offset = offset as u8;
                let subscript = subscript.unwrap_or(0) as u16;
                let idx = *strtable.get(&*name).unwrap_or(&0);
                let o = &mut 0;
                scratch.gwrite((size << 1) | 0x1, o)?;
                scratch.gwrite(offset, o)?;
                scratch.gwrite(idx, o)?;
                scratch.gwrite(subscript, o)?;
                scratch
            },
        };
        Ok(RvalueBits { bits })
    }
}

/// 
#[derive(Clone,PartialEq,Eq,Serialize,Deserialize)]
pub struct LvalueBits {
    bits: [u8; 7]
}

impl Debug for LvalueBits {
    fn fmt(&self, f: &mut Formatter) -> fmt::Result {
        for byte in self.bits.iter() {
            write!(f, "{:08b}", byte)?;
        }
        Ok(())
    }
}

impl LvalueBits {
    /// bits
    pub fn bits(&self) -> &[u8] {
        &self.bits[..]
    }
    ///
    pub fn into(self, strtable: &StrTable) -> Result<Lvalue> {
        let tag = self.bits[0];
        if tag == 0xff {
            Ok(Lvalue::Undefined)
        } else if tag & 0x1 == 1 { // variable
            let size = ((tag & 0xfe) >> 1) as usize;
            let idx = self.bits.pread::<u32>(1)?;
            let subscript = {
                let s = self.bits.pread::<u16>(5)?;
                if s == 0 { None } else { Some(s as usize) }
            };
            let name = strtable.find(idx).unwrap();
            Ok(Lvalue::Variable { name: name.clone().into(), subscript, size })
        } else {
            Err(format!("Lvalue bad bits {:?}", self.bits).into())
        }
    }
    /// 
    pub fn from_bytes(bytes: &[u8]) -> Result<Self> {
        if bytes.len() < 7 {
            Err("Lvalue requires 7 byte array".into())
        } else {
            let mut bits = [0; 7];
            bytes.gread_inout(&mut 0, &mut bits[..])?;
            Ok(LvalueBits {
                bits
            })
        }
    }
    ///
    pub fn new(lvalue: Lvalue, strtable: &StrTable) -> Result<Self> {
        let bits = match lvalue {
            Lvalue::Undefined => [0xff; 7],
            Lvalue::Variable { name, subscript, size } => {
                let mut scratch = [0; 7];
                let size = size as u8;
                let subscript = subscript.unwrap_or(0) as u16;
                let idx = *strtable.get(&*name).unwrap_or(&0);
                let o = &mut 0;
                scratch.gwrite((size  << 1) | 0x1, o)?; // always set this to variable tag
                scratch.gwrite(idx, o)?;
                scratch.gwrite(subscript, o)?;
                scratch
            },
        };
        Ok(LvalueBits { bits })
    }
}

///
pub const ADD_OPCODE: u8 = 1;
///
pub const SUBTRACT_OPCODE: u8 = 2;
///
pub const MULTIPLY_OPCODE: u8 = 3;
///
pub const DIVIDEUNSIGNED_OPCODE: u8 = 4;
///
pub const DIVIDESIGNED_OPCODE: u8 = 5;
///
pub const SHIFTLEFT_OPCODE: u8 = 6;
///
pub const SHIFTRIGHTUNSIGNED_OPCODE: u8 = 7;
///
pub const SHIFTRIGHTSIGNED_OPCODE: u8 = 8;
///
pub const MODULO_OPCODE: u8 = 9;
///
pub const AND_OPCODE: u8 = 10;
///
pub const INCLUSIVEOR_OPCODE: u8 = 11;
///
pub const EXCLUSIVEOR_OPCODE: u8 = 12;
///
pub const EQUAL_OPCODE: u8 = 13;
///
pub const LESSOREQUALUNSIGNED_OPCODE: u8 = 14;
///
pub const LESSOREQUALSIGNED_OPCODE: u8 = 15;
///
pub const LESSUNSIGNED_OPCODE: u8 = 16;
///
pub const LESSSIGNED_OPCODE: u8 = 17;
///
pub const ZEROEXTEND_OPCODE: u8 = 18;
///
pub const SIGNEXTEND_OPCODE: u8 = 19;
///
pub const MOVE_OPCODE: u8 = 20;
///
pub const CALL_OPCODE: u8 = 21;
///
pub const SELECT_OPCODE: u8 = 22;
///
pub const LOAD_OPCODE: u8 = 23;
///
pub const STORE_OPCODE: u8 = 24;
///
pub const PHI_OPCODE: u8 = 25;

/// 
#[derive(Clone,PartialEq,Eq,Serialize,Deserialize)]
pub struct StatementBits {
    bits: [u8; 1+7+9+9] // opcode lvalue rvalue rvalue
}

impl Debug for StatementBits {
    fn fmt(&self, f: &mut Formatter) -> fmt::Result {
        for byte in self.bits.iter() {
            write!(f, "{:08b}", byte)?;
        }
        Ok(())
    }
}

impl StatementBits {
    /// into
    pub fn into(self, strtable: &StrTable) -> Result<Statement> {
        use Operation::*;
        let opcode: u8 = self.bits.pread(0)?;
        let lvalue_bits = LvalueBits::from_bytes(&self.bits[1..8])?;
        let assignee = lvalue_bits.into(strtable)?;
        let op = match opcode {
            ADD_OPCODE => {
                let r1 = RvalueBits::from_bytes(&self.bits[8..17])?;
                let r2 = RvalueBits::from_bytes(&self.bits[17..26])?;
                Add(r1.into(strtable)?, r2.into(strtable)?)
            },
            SUBTRACT_OPCODE => {
                let r1 = RvalueBits::from_bytes(&self.bits[8..17])?;
                let r2 = RvalueBits::from_bytes(&self.bits[17..26])?;
                Subtract(r1.into(strtable)?, r2.into(strtable)?)
            },
            // MULTIPLY_OPCODE => {
            // },
            // DIVIDEUNSIGNED_OPCODE => {
            // },
            // DIVIDESIGNED_OPCODE => {
            // },
            // SHIFTLEFT_OPCODE => {
            // },
            // SHIFTRIGHTUNSIGNED_OPCODE => {
            // },
            // SHIFTRIGHTSIGNED_OPCODE => {
            // },
            // MODULO_OPCODE => {
            // },
            // AND_OPCODE => {
            // },
            // INCLUSIVEOR_OPCODE => {
            // },
            // EXCLUSIVEOR_OPCODE => {
            // },
            // EQUAL_OPCODE => {
            // },
            // LESSOREQUALUNSIGNED_OPCODE => {
            // },
            // LESSOREQUALSIGNED_OPCODE => {
            // },
            // LESSUNSIGNED_OPCODE => {
            // },
            // LESSSIGNED_OPCODE => {
            // },
            // ZEROEXTEND_OPCODE => {
            // },
            // SIGNEXTEND_OPCODE => {
            // },
            // MOVE_OPCODE => {
            // },
            // CALL_OPCODE => {
            // },
            // SELECT_OPCODE => {
            // },
            // LOAD_OPCODE => {
            // },
            // STORE_OPCODE => {
            // },
            // PHI_OPCODE => {
            // },
            opcode => return Err(format!("invalid opcode {}", opcode).into())
        };
        Ok(Statement { assignee, op })
    }

    /// 
    pub fn new(statement: Statement, strtable: &StrTable) -> Result<Self> {
        use Operation::*;
        let mut bits = [0; 1+7+9+9];
        let lvalue = LvalueBits::new(statement.assignee, strtable)?;
        bits.pwrite(lvalue.bits(), 1)?;
        match statement.op {
            Add(r1, r2) => {
                bits.pwrite(ADD_OPCODE, 0)?;
                bits.pwrite(r1.into_bits(strtable)?.bits(), 8)?;
                bits.pwrite(r2.into_bits(strtable)?.bits(), 8+9)?;
            },
            Subtract(r1, r2) => {
                bits.pwrite(SUBTRACT_OPCODE, 0)?;
                bits.pwrite(r1.into_bits(strtable)?.bits(), 8)?;
                bits.pwrite(r2.into_bits(strtable)?.bits(), 8+9)?;
            },
            Multiply(r1, r2) => {
                bits.pwrite(MULTIPLY_OPCODE, 0)?;
                bits.pwrite(r1.into_bits(strtable)?.bits(), 8)?;
                bits.pwrite(r2.into_bits(strtable)?.bits(), 8+9)?;
            },
            DivideUnsigned(r1, r2) => {
                bits.pwrite(DIVIDEUNSIGNED_OPCODE, 0)?;
                bits.pwrite(r1.into_bits(strtable)?.bits(), 8)?;
                bits.pwrite(r2.into_bits(strtable)?.bits(), 8+9)?;
            },
            _ => unimplemented!()
        };
        Ok(StatementBits { bits })
    }
}

The StrTable is very much a WIP/prototype for what it might look like, not much thought was put into it; we probably want something like shawshank for string interning, but the returned indices must be stable I think; we should also be able to approximate this behavior with the current model.

perf tracking issues