Decompiling class files from Java bytecode source

Question

ambergorzynski opened this issue a year ago

ambergorzynski commented a year ago

CFR version

CFR 0.152

Compiler

OpenJDK 17.0.8.1 23-08-24 running on x86_64 Ubuntu 22.04

Description

Hello, I am running some tests in Java bytecode (using Jasmin) and attempting to decompile the resulting class files using CFR. I have run into a problem: CFR is not able to decompile class files that include some configurations of nested lookupswitch instructions. The class files can be run without issue using OpenJDK 17.0.8.1.

I understand that this is not the typical intended usage for CFR, but it would be interesting to know whether the decompilation failure is due to the bytecode being constructed in a way that is out-of-scope for CFR, or something else.

Thanks!

Example

A simple example of a bytecode program whose class file cannot be decompiled is below. The class file and CFR output are attached as a zip file.

.class public TestCase
.super java/lang/Object

; default constructor
.method public <init>()V
    aload_0
    invokespecial java/lang/Object/<init>()V
    return
.end method

.method public static main([Ljava/lang/String;)V
    .limit stack 2

block_0:

    bipush 1

    lookupswitch
        0: block_1
        1: block_2
        default : block_1

block_1: 

    bipush 1

    lookupswitch
        0: block_3
        1: block_4
        default : block_3

block_2: 
    return

block_3: 
    return
 
block_4: 
    return
       
.end method

Lee Benfield · Answer 1 · Tue Sep 05 2023 14:23:42 GMT+0800 (China Standard Time)

Hey,

It's not really out of scope, inasmuch as I never set a scope ;) In general, raw switch statements make it quite easy to create control flow that is impossible in Java, so while I attempt to do my best, there are places where I don't get it right :(

In this case there are two things going on. The first is an accounting failure caused by an unexpected rewrite:

    private static void moveJumpsToTerminalIfEmpty(Op03SimpleStatement switchStatement, List<Op03SimpleStatement> statements) {
++        if (!(switchStatement.getStatement() instanceof SwitchStatement)) return;
        SwitchStatement swatch = (SwitchStatement) switchStatement.getStatement();

(haven't run that through regression tests yet though)

The big one is that the branches of the second switch aren't falling inside the body of the first (which is the layout javac emits). This is usually fairly easily fixable with the topsort stage (and in this case, when done by eye, it's pretty obvious!), but there are some heuristics to try to avoid mashing multiple switches together, which have probably gone wrong here.
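As a point of comparison, here is a hand-written sketch of one Java-level reading of the Jasmin test case, with the second switch nested inside the body of the first. It is not CFR output. The class name `NestedSwitchSketch`, the method `run`, the parameters `a` and `b` (standing in for the two `bipush 1` constants), and the returned block names are assumptions made for illustration.

```java
// Hand-written sketch, not CFR output. The parameters a and b stand in for
// the two `bipush 1` constants in the Jasmin listing.
class NestedSwitchSketch {
    // Returns the name of the Jasmin block that would execute `return`.
    static String run(int a, int b) {
        switch (a) {
            case 1:
                return "block_2";
            case 0:
            default:
                // block_1: the second lookupswitch sits inside the first one's body
                switch (b) {
                    case 1:
                        return "block_4";
                    case 0:
                    default:
                        return "block_3";
                }
        }
    }

    public static void main(String[] args) {
        // The original test case pushes 1 before each switch.
        System.out.println(run(1, 1)); // prints block_2
    }
}
```

With the constants from the issue, the first switch goes straight to block_2, so the second switch is never reached at run time; it still has to be placed somewhere in the decompiled structure.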

Will see if I can improve this case, but in general you can always produce impossible code by jumping into the middle of a nested switch (pseudo Jasmin), e.g.

switch (a) {
  case 0:
    A
    if (x) goto label1
    B
    break
  case 1:
    switch (b) {
      case 1:
        C
        label1:
        D
      case 2:
        E
    }
}

(in that case if D + E aren't too big I might try to duplicate them, but you see my point)
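To make the duplication idea concrete, here is a minimal sketch of how the pseudocode above could be written in legal Java once D and E are copied to the jump site. It is a hypothetical illustration only and does not reflect how CFR implements anything. The class `DuplicatedTailSketch`, the method `trace`, and the string labels are assumptions; each label records that the corresponding pseudocode statement ran.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not CFR's algorithm or output.
class DuplicatedTailSketch {
    // Records which of the pseudocode statements A..E run, in order.
    static List<String> trace(int a, boolean x, int b) {
        List<String> out = new ArrayList<>();
        switch (a) {
            case 0:
                out.add("A");
                if (x) {
                    // Was `goto label1`: copies of D and E replace the jump.
                    out.add("D");
                    out.add("E");
                    break;
                }
                out.add("B");
                break;
            case 1:
                switch (b) {
                    case 1:
                        out.add("C");
                        out.add("D"); // label1 used to sit here
                        // falls through into case 2, as in the pseudocode
                    case 2:
                        out.add("E");
                }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(trace(0, true, 0));  // the former goto path
        System.out.println(trace(1, false, 1)); // the ordinary path through C
    }
}
```

The cost is that D and E now appear twice, which is why this only seems reasonable when they are small.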

WRT this case (hah): if you fix the first bug, you can see that it hasn't pulled the branches into the correct place this time. Will see how fixable that is, but it'll never be perfect from Jasmin ;) (or obfuscated control flow!)

/*
 * Decompiled with CFR 0.153-SNAPSHOT (d6f6758).
 */
public class TestCase {
    /*
     * Unable to fully structure code
     * Enabled aggressive block sorting
     */
    public static void main(String[] var0) {
        switch (1) {
            default: {
                ** GOTO lbl6
                ** case 1:
lbl5:
                // 1 sources

                return;
lbl6:
                // 1 sources

                switch (1) {
                    default: {
                        return;
                    }
                    case 1: 
                }
                return;
            }
        }
    }
}

ambergorzynski · Answer 2 · Thu Sep 07 2023 18:58:39 GMT+0800 (China Standard Time)

Yes, I suspected that the differences in control flow restrictions between bytecode and Java were somehow responsible for the issue, but I wasn't sure why this would affect the simple example there. Thanks for looking into this!