gdshaw / codemancer

Publish-only mirror for Codemancer, a retargettable disassembler for reverse engineering

Home Page: http://www.codemancer.org/

# This file is part of Codemancer.
# Copyright 2014-2015 Graham Shaw.
# Distribution and modification are permitted within the terms of the
# GNU General Public License (version 3 or any later version).

-----------------------------------------------------------------------
Codemancer

Version 0.0.0-rc0 (2015-01-02)
-----------------------------------------------------------------------

Introduction

  Codemancer is ultimately intended to be an Open Source interactive
  disassembler for reverse engineering of executable code. Planned
  features include:

  - support for multiple instruction set architectures by means of an
    XML-based description language;
  - a database for recording information about the behaviour and purpose
    of code which has been disassembled;
  - an inference engine for augmenting the content of the database; and
  - an interactive front end for the user to supply information which
    cannot be deduced automatically.

  Since these goals will likely take several years to realise, features
  will be released incrementally as they are developed.

Current status

  Disassembler

  - plain disassembly: working
  - semantic modelling: working, but untested and accessible only via API
  - control flow analysis: incomplete, experimental
  - data flow analysis: incomplete, experimental
  - interactive user interface: not started

  Architectures

  - 6500: complete, including 65C00, R65C00 and W65C00S
  - 6800: complete
  - 6805: complete
  - ARM: complete up to ARMv4, but not beyond
  - PIC (14-bit core): complete
  - x86: incomplete
  - Z80: complete

  Object file formats

  - ELF: working, but no relocations
  - COFF: working, but no relocations
  - AOF: working, but no relocations
  - more to follow
  - integration with disassembler: not started

Building Codemancer

  The source code can be checked out from the public repository using git:

    git clone git://codemancer.org/codemancer.git

  The result will be placed in a newly-created directory named 'codemancer'.
  Subsequent changes can be fetched using the git pull command from within
  that directory:

    git pull

  To build the code you will need the following:

  - Java (tested with OpenJDK 6)
  - Ant (tested with version 1.7.1)

  Executing the 'ant' command with no arguments will build the default
  target, which is the Codemancer jar file:

    ant

  This is written to the pathname build/jar/codemancer.jar. You can
  additionally run the automated tests by building the 'test' target:

    ant test

Running Codemancer

  To use the jar file, list it on the CLASSPATH:

    export CLASSPATH=build/jar/codemancer.jar

  The class org.codemancer.Disassemble is provided to demonstrate the
  ability to disassemble a raw binary file. It can be invoked from the
  command line as follows:

    java org.codemancer.Disassemble <architecture> <pathname>

  for example:

    java org.codemancer.Disassemble 6502 image.bin

  The resulting assembly language listing is written to stdout.
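
  If you have no raw 6502 binary to hand, the following minimal sketch
  writes a six-byte one that can be passed to the command above. The
  class name MakeTestImage and the choice of instructions are
  illustrative assumptions and are not part of Codemancer; the opcodes
  are the standard 6502 encodings for LDA immediate, STA absolute and
  RTS.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

/** Hypothetical helper: writes a tiny raw 6502 image for experimentation. */
final class MakeTestImage {
    /** Three 6502 instructions: LDA #$01; STA $0200; RTS. */
    static final byte[] PROGRAM = {
        (byte) 0xA9, 0x01,         // LDA #$01
        (byte) 0x8D, 0x00, 0x02,   // STA $0200 (operand stored low byte first)
        0x60                       // RTS
    };

    /** Writes PROGRAM to the given pathname and returns the resulting file. */
    static File write(String pathname) throws IOException {
        File file = new File(pathname);
        FileOutputStream out = new FileOutputStream(file);
        try {
            out.write(PROGRAM);
        } finally {
            out.close();
        }
        return file;
    }

    public static void main(String[] args) throws IOException {
        // Default to the pathname used in the example above.
        String pathname = args.length > 0 ? args[0] : "image.bin";
        File file = write(pathname);
        System.out.println("Wrote " + file.length() + " bytes to " + file.getPath());
    }
}
```

  Compile and run it with 'javac MakeTestImage.java' followed by
  'java MakeTestImage' (with the current directory on the CLASSPATH),
  then disassemble the resulting image.bin as shown above.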

Interface stability

  Neither the API nor the markup language should be considered stable
  at present. It is intended that both should become stable, but the
  current rate of change is such that this would seriously hamper
  progress if attempted now.

Limitations

  Only a small part of the planned functionality for Codemancer has
  been implemented thus far. Most notably it is only usable as a
  plain disassembler at present, and then only for a small number
  of architectures. Other functionality, to the extent that it has
  been implemented, should be considered experimental.

  Other limitations include the following:

  - Status flags are not properly modelled.
  - There is no easy way to handle registers which are part of larger
    registers (for example H, L and HL on the Z80). Current practice
    is to treat them as separate registers, which will not accurately
    reflect the behaviour of the program.
  - The <memory> element is limited to a single linear address space.
  - Repetition (for example, the Z80 LDIR instruction) is not
    modelled.
  - Architectures where the instruction width is not a multiple of
    8 bits (for example, the PIC 12- and 14-bit architectures) could
    be handled more elegantly.
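
  To illustrate why treating H, L and HL as separate registers is
  inaccurate, the following sketch models HL as a single 16-bit value
  with H and L as views onto its upper and lower bytes. It is a
  hypothetical illustration only: the class Z80RegisterPairSketch and
  its methods are assumptions made for this example and do not reflect
  how Codemancer models registers.

```java
/** Hypothetical sketch of Z80 register aliasing; not a Codemancer class. */
final class Z80RegisterPairSketch {
    // Single backing store for the 16-bit HL pair.
    private int hl;

    int getHL() { return hl & 0xFFFF; }
    void setHL(int value) { hl = value & 0xFFFF; }

    // H is the upper byte of HL, L is the lower byte.
    int getH() { return (hl >> 8) & 0xFF; }
    int getL() { return hl & 0xFF; }

    // Writing either half changes HL, which separate registers would miss.
    void setH(int value) { hl = ((value & 0xFF) << 8) | (hl & 0x00FF); }
    void setL(int value) { hl = (hl & 0xFF00) | (value & 0xFF); }

    public static void main(String[] args) {
        Z80RegisterPairSketch regs = new Z80RegisterPairSketch();
        regs.setHL(0x1234);  // LD HL,0x1234
        regs.setL(0xFF);     // LD L,0xFF
        // Prints "HL=12FF H=12 L=FF"
        System.out.printf("HL=%04X H=%02X L=%02X%n",
                regs.getHL(), regs.getH(), regs.getL());
    }
}
```

  If H, L and HL were tracked independently, the write to L would leave
  the recorded value of HL at 0x1234, whereas the processor would hold
  0x12FF.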

Control and data flow analysis (experimental)

  Currently the analysis code is able to:

  - identify branches and subroutine calls to fixed addresses;
  - identify basic blocks, extended basic blocks and subroutines;
  - track the content of registers within basic blocks; and
  - perform a partial transformation into static single-assignment
    form.
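
  As an illustration of basic block identification, the following is a
  minimal sketch of the textbook "leader" method: a block starts at the
  entry point, at every fixed branch target, and at the instruction
  following a branch or return. The source does not say which algorithm
  Codemancer uses, so this should not be read as its implementation;
  the names BasicBlockSketch, Insn and findLeaders are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical sketch of leader-based block splitting; not Codemancer code. */
final class BasicBlockSketch {
    /** A decoded instruction, reduced to what block splitting needs. */
    static final class Insn {
        final int address;
        final Integer target;     // fixed branch target, or null if none
        final boolean endsBlock;  // true for branches and returns

        Insn(int address, Integer target, boolean endsBlock) {
            this.address = address;
            this.target = target;
            this.endsBlock = endsBlock;
        }
    }

    /** Returns the start address of each basic block, in ascending order. */
    static List<Integer> findLeaders(List<Insn> code) {
        // Collect known instruction addresses so that branches out of
        // the listing do not create spurious leaders.
        TreeSet<Integer> addresses = new TreeSet<Integer>();
        for (Insn insn : code) {
            addresses.add(insn.address);
        }
        TreeSet<Integer> leaders = new TreeSet<Integer>();
        for (int i = 0; i < code.size(); i++) {
            Insn insn = code.get(i);
            if (i == 0) {
                leaders.add(insn.address);               // entry point
            }
            if (insn.target != null && addresses.contains(insn.target)) {
                leaders.add(insn.target);                // branch destination
            }
            if (insn.endsBlock && i + 1 < code.size()) {
                leaders.add(code.get(i + 1).address);    // follows a branch
            }
        }
        return new ArrayList<Integer>(leaders);
    }

    public static void main(String[] args) {
        // Six 4-byte instructions, as on ARM; the mnemonics are for
        // orientation only and nothing is decoded here.
        List<Insn> code = Arrays.asList(
            new Insn(0x00, null, false),  // cmp
            new Insn(0x04, 0x10, true),   // beq 0x10
            new Insn(0x08, null, false),  // mov
            new Insn(0x0C, 0x14, true),   // b 0x14
            new Insn(0x10, null, false),  // mov
            new Insn(0x14, null, true));  // return
        // Prints "[0, 8, 16, 20]"
        System.out.println(findLeaders(code));
    }
}
```

  Each block then runs from its leader up to, but not including, the
  next leader. Subroutine calls are deliberately not treated as block
  terminators in this sketch; whether they should be is a design choice.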

  However, it does not (yet):

  - analyse jump tables;
  - make deductions from knowledge of the calling convention
    (which has not been modelled yet);
  - track the content of stack-resident local variables;
  - track the content of global variables;
  - track the flow of data between basic blocks;
  - attempt to deduce the type or meaning of data values;
  - allow the database to be augmented by the user.

  In principle the algorithms implemented so far are applicable to
  any architecture; however, they have only been tested on ARM code
  in an ELF object file. The algorithms will need to be made
  idempotent, but that has not been arranged yet.

  If you would like to experiment with the analysis code then the
  following must be accessible via the CLASSPATH:

  - Apache Derby (tested with 10.3.2.1);
  - Hibernate (tested with 3.6.9).

  From Derby you need both derby.jar and derbyclient.jar. From
  Hibernate you need hibernate3.jar, plus the dependencies found
  in the lib/required and lib/jpa subdirectories of the Hibernate
  distribution.

  When a binary is analysed, the results are recorded in a database.
  You can do this using the command:

    java org.codemancer.test.Analyse <projname> <architecture> <pathname>

  where <projname> is a project name of your choosing (used as a
  subdirectory name in which to store the database), <architecture>
  is the relevant machine architecture, and <pathname> is the binary
  to be analysed.

  Be aware that this is currently quite a slow process: expect throughput
  of a few kilobytes per minute on mid-range hardware, so anything
  larger than a megabyte is likely to take many hours.

  When the analysis is complete, the content of the database can be
  listed using the command:

    java org.codemancer.test.ListDb <projname>

  The result is not expected to be useful yet; however, it shows that
  the disassembler is able to partially understand the behaviour of
  the executable, and track values as they are passed from register
  to register. Future versions will attempt to take this analysis
  further, provide an interactive user interface, and improve
  throughput if it is feasible to do so.
