Decompiling class files from Java bytecode source
ambergorzynski opened this issue
CFR version
CFR 0.152
Compiler
OpenJDK 17.0.8.1 23-08-24 running on x86_64 Ubuntu 22.04
Description
Hello, I am running some tests in Java bytecode (using Jasmin) and attempting to decompile the resulting class files with CFR. I have run into a problem: CFR cannot decompile class files that contain certain configurations of nested lookupswitch instructions, even though the class files run without issue on OpenJDK 17.0.8.1.
I understand that this is not the typical intended usage for CFR, but it would be interesting to know whether the decompilation failure is due to the bytecode being constructed in a way that is out-of-scope for CFR, or something else.
Thanks!
Example
A simple example of a bytecode program whose class file cannot be decompiled is below, with the class file and CFR output attached as a zip file:
.class public TestCase
.super java/lang/Object

; default constructor
.method public <init>()V
    aload_0
    invokespecial java/lang/Object/<init>()V
    return
.end method

.method public static main([Ljava/lang/String;)V
    .limit stack 2
block_0:
    bipush 1                ; key for the outer switch (always 1)
    lookupswitch
        0: block_1
        1: block_2
        default : block_1
block_1:
    bipush 1                ; key for the inner switch (always 1)
    lookupswitch
        0: block_3
        1: block_4
        default : block_3
block_2:
    return
; block_3 and block_4 are targets of the inner switch, laid out after block_2
block_3:
    return
block_4:
    return
.end method
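For comparison, the Jasmin `main` above can be structured by hand into Java. This is a sketch, not CFR output: `run`, `outer` and `inner` are hypothetical names, and each `return` is tagged with the label of the Jasmin block it stands for so the control flow can be checked. javac would not necessarily lay the bytecode out the same way.

```java
class TestCaseEquivalent {
    // outer and inner stand for the values pushed by the two `bipush 1` instructions.
    static String run(int outer, int inner) {
        switch (outer) {
            case 1:
                return "block_2";
            case 0:
            default: // case 0 and default both jump to block_1, i.e. the inner switch
                switch (inner) {
                    case 1:
                        return "block_4";
                    case 0:
                    default: // case 0 and default both jump to block_3
                        return "block_3";
                }
        }
    }

    public static void main(String[] args) {
        // The Jasmin code always pushes 1, so the outer switch always takes block_2.
        System.out.println(run(1, 1));
    }
}
```

Because both keys are constant, block_1, block_3 and block_4 are dead code at run time, which is why the class file runs fine even though its layout is hard to structure.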
Hey,
It's not really out of scope, inasmuch as I never set a scope ;) In general, raw switch statements make it quite easy to create control flow that is impossible in Java, so while I do my best, there are places where I don't get it right :(
In this case there are two things going on. The first is an accounting failure caused by an unexpected rewrite, which this change should fix:
 private static void moveJumpsToTerminalIfEmpty(Op03SimpleStatement switchStatement, List<Op03SimpleStatement> statements) {
+    if (!(switchStatement.getStatement() instanceof SwitchStatement)) return;
     SwitchStatement swatch = (SwitchStatement) switchStatement.getStatement();
     ...
(I haven't run that through the regression tests yet, though.)
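The guard suggests that by the time `moveJumpsToTerminalIfEmpty` runs, an earlier pass may already have replaced the switch with a different kind of statement, so the unconditional cast to `SwitchStatement` would fail. A minimal sketch of that failure mode follows; `Statement`, `SwitchStatement`, `GotoStatement`, `Node` and `moveJumps` are hypothetical stand-ins, not CFR's real classes, and the sketch uses Java 16+ pattern matching for `instanceof` for brevity.

```java
class GuardSketch {
    // Hypothetical statement types standing in for a decompiler's IR (not CFR's API).
    interface Statement {}
    record SwitchStatement(int caseCount) implements Statement {}
    record GotoStatement(String target) implements Statement {}

    // Hypothetical holder whose statement an earlier pass may rewrite in place.
    static final class Node {
        Statement statement;
        Node(Statement statement) { this.statement = statement; }
    }

    // A later pass that only applies to switches. The guard returns early when the
    // statement is no longer a switch instead of letting a cast throw ClassCastException.
    static String moveJumps(Node node) {
        if (!(node.statement instanceof SwitchStatement swatch)) {
            return "skipped";
        }
        return "switch with " + swatch.caseCount() + " cases";
    }

    public static void main(String[] args) {
        Node node = new Node(new SwitchStatement(2));
        System.out.println(moveJumps(node));
        node.statement = new GotoStatement("block_1"); // simulate an earlier rewrite
        System.out.println(moveJumps(node));
    }
}
```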
The bigger problem is that the branches of the second switch don't fall inside the body of the first, which is where javac would put them. This is usually fairly easy to fix in the topsort stage (and in this case, done by eye, it's pretty obvious!), but there are some heuristics that try to avoid mashing multiple switches together, and they have probably gone wrong here.
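To illustrate what reordering blocks can achieve, here is a minimal sketch of a topological ordering (reverse postorder of a depth-first search) applied to the example's control-flow graph. It is not CFR's topsort stage; the graph encoding and the names `reversePostorder` and `succs` are assumptions made for the illustration. Visiting successors last-to-first puts each switch's targets directly after it in a tree-shaped graph like this one, so `block_3` and `block_4` land between `block_1` and `block_2`, inside the outer switch's case for 0/default.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class BlockOrderSketch {
    // Reverse postorder of a DFS from `entry`; for an acyclic graph this is a
    // topological order, i.e. every block precedes all of its successors.
    static List<String> reversePostorder(String entry, Map<String, List<String>> succs) {
        List<String> post = new ArrayList<>();
        dfs(entry, succs, new HashSet<>(), post);
        Collections.reverse(post);
        return post;
    }

    private static void dfs(String block, Map<String, List<String>> succs,
                            Set<String> seen, List<String> post) {
        if (!seen.add(block)) return; // already visited
        // Drop duplicate targets (e.g. case 0 and default both -> block_1),
        // then visit last-to-first so the final order follows case order.
        List<String> next = new ArrayList<>(new LinkedHashSet<>(succs.getOrDefault(block, List.of())));
        Collections.reverse(next);
        for (String s : next) dfs(s, succs, seen, post);
        post.add(block);
    }

    public static void main(String[] args) {
        // Successors in lookupswitch order: case 0, case 1, default.
        Map<String, List<String>> succs = Map.of(
            "block_0", List.of("block_1", "block_2", "block_1"),
            "block_1", List.of("block_3", "block_4", "block_3"));
        System.out.println(reversePostorder("block_0", succs));
        // prints [block_0, block_1, block_3, block_4, block_2]
    }
}
```

Real control flow has loops and shared targets, where no single order nests every switch cleanly; that is where heuristics like the ones mentioned above come in.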
I'll see if I can improve this case, but in general you can always produce impossible code by jumping into the middle of a nested switch, e.g. (pseudo-Jasmin):
switch (a) {
  case 0:
    A
    if (x) goto label1    // jumps into the middle of the inner switch
    B
    break
  case 1:
    switch (b) {
      case 1:
        C
      label1:
        D
      case 2:
        E
    }
}
(In that case, if D and E aren't too big, I might try to duplicate them, but you see my point.)
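To make the duplication idea concrete, here is a sketch of how the pseudo-Jasmin above could become legal Java by copying D and E into the branch that jumped to `label1`. The blocks A to E are hypothetical stand-ins that append their names to a trace, and `run(a, b, x)` is an illustrative signature. The trade-off is code size: every duplicated block appears twice in the output, which is why this only makes sense when D and E are small.

```java
class DuplicationSketch {
    // Returns the sequence of blocks executed, so both forms can be compared.
    static String run(int a, int b, boolean x) {
        StringBuilder trace = new StringBuilder();
        switch (a) {
            case 0:
                trace.append('A');
                if (x) {
                    // Duplicated tail: the original jumped to label1, ran D,
                    // fell through into E, and then left both switches.
                    trace.append('D');
                    trace.append('E');
                    break;
                }
                trace.append('B');
                break;
            case 1:
                switch (b) {
                    case 1:
                        trace.append('C');
                        // label1 was here
                        trace.append('D');
                        // falls through into case 2
                    case 2:
                        trace.append('E');
                }
                break;
        }
        return trace.toString();
    }

    public static void main(String[] args) {
        System.out.println(run(0, 0, true)); // ADE
        System.out.println(run(1, 1, false)); // CDE
    }
}
```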
WRT this case (hah): if you fix the first bug, you can see that it hasn't pulled the branches into the correct place this time. I'll see how fixable that is, but it'll never be perfect from Jasmin ;) (or from obfuscated control flow!)
/*
 * Decompiled with CFR 0.153-SNAPSHOT (d6f6758).
 */
public class TestCase {
    /*
     * Unable to fully structure code
     * Enabled aggressive block sorting
     */
    public static void main(String[] var0) {
        switch (1) {
            default: {
                ** GOTO lbl6
                ** case 1:
                lbl5:
                // 1 sources
                return;
                lbl6:
                // 1 sources
                switch (1) {
                    default: {
                        return;
                    }
                    case 1:
                }
                return;
            }
        }
    }
}
Yes, I suspected that the differences in control-flow restrictions between bytecode and Java were somehow responsible for the issue, but I wasn't sure why they would affect the simple example there. Thanks for looking into this!