Detecting Code Smells

The aim of this assignment is for you to gain some experience of building a program analysis tool. Your task is to build and analyser to detect Code Smells (also known as "Bad Smells") in Java software systems using JavaParser and employing either static or dynamic analysis (or a combination of both).

This is to be carried out individually and will also contribute 20% to the mark for the class.

More details about code smells can be found in the following links:

There are now dozens of code smells but to keep things simple we are going to restrict our analysis to the following subset, ordered according to difficulty:

Code Smells

Easy (up to 2 marks each)	Medium (up to 5 marks each)	Hard/Impossible (up to 10 marks each)
Long Method	Primitive Obsession	Refused Bequest
Large Class	Data Clumps	Alternative Classes with Different Interfaces
Long Parameter List	Temporary Field	Duplicate Code
Switch Statements	Lazy class	Dead Code
	Data class	Speculative Generality
	Message Chains	Feature Envy
	Large Class	Inappropriate Intimacy
	Long Method	Middle Man

I'm not expecting you to implement solutions for all of these! Far from it. And you can't score more than 20 anyway.

If you are just looking for a pass in the assignment then correctly implementing the first four easy ones will achieve that. If you are looking for more marks then add in one or two of the more challenging ones. Some of the harder ones in particular are very tricky, so don't underestimate the amount of work they can take. I would suggest starting with some of the easier ones and then build things up from there.

A few comments on the code smells above:

Easy- I've classified this group as easy since they main involve counting things which can be easily identified and don't involve much coding at all. They are not all as straightforward as they might first appear. For instance, how do you determine the length of a method (or class)? Number of lines of code is one option which is very easy but also not very accurate. Number of statements is more accurate but harder to calculate (which is why it also appears in the medium category). Also, at what value do you decide when a method is too long or a class too large? This is something which is usually defined within company standards, so to keep things simple for this exercise anything more than 10 statements is a long method, more than 5 parameters is a long parameter list, and a large class is one with more than 100 statements. For the switch statement the parameter must be a user-defined type (note this is much harder than I first imagined so would earn a few more marks).
Medium- This group start to get a little trickier to spot and involve a bit more analysis. You will notice that I've also included Large Class in this group. This is a different interpretation of the problem, where the class is doing too much or has too many responsibilities, and is not just about the length of the class. The challenge here is thinking how you might identify what a class's responsibilities are. The opposite of this is the lazy class.
Hard/Impossible- These either involve more analysis, typically between classes (e.g. Middle Man, Feature Envy, Inappropriate Intimacy or Refused Bequest), are hard to specify and identify (Alternative Classes with Different Interfaces, and Speculative Generality), or are well-know hard problems (Duplicate Code and Dead Code). At first sight these last two may not sound too tough, but duplicate code may involve one or more statements, and these may not be syntactically identical (differing by simple aspects such as changes in variable names, to more complex ones such as the same functionality coded in a total different way). Similarly, Dead Code could be a method that is never called (relatively easy to detect), or could be a section of code that is controlled by a condition that can never evaluate to true (much harder to spot).

The marks awarded will reflect the completeness and accuracy of the solution.

Submission

Your submission needs to include the code (zipped up) along with a short report (I'm only after a few pages) by which includes:

An overview of the chosen problem(s).
An outline of your solution and a high-level overview of your design (a class diagram is fine - just an indication of who you have structured your solution)
Details of any important/interesting parts of your implementation.
Results and evaluation - what it does and how well it does it. You should run your solution over the entire test system provided and summarise the outputs in terms of: a) which smells were accurately detected, b) which were missed, and c) what false positives (incorrectly flagged up by your system as smells) were identified. It is likely that will be the majority of the report.
A statement of the score (out of 20) that you think the work deserves along with a short justification for this (a couple of sentences).

You will also be required to demonstrate your system in week 6/7.

Additional Information on the marking scheme
There are three basic components to how your submission will be assessed: what it does, how well it does it, and how you went about doing it.

What it does: Some tasks are more complicated than others (see the table above)
How well it does it: This ranges from brilliantly - produces the correct results for all the time, to adequate - works reasonably well for a limited number of cases, to... well let's not aim for that.
How you went about doing it: This considers what facilities of the JavaParser framework you employed and also also the quality of your design.

Test systems

test.Bloaters

Long Method - BarnsleyFern (createFern)
Long Method - floodFill
Primitive Obsession / Data clumps - plot, drawLine and paintComponent in BresenhamPanel
Long Parameter List - ManOrBoy - A
Large Class - see Grid in Switch

test.Abusers

Temporary field - BarnsleyFernTwo
Switch - MorpionSolitairePanel switches on an enum Also Grid is Large class and contains long methods, data classes and long parameter lists!
Refused bequest - two cases in ChqAcc and SavingsAcc ACwDI - Alternative classes with different interfaces. There is clearly a lot of commonality between Underling and Manager that should be factored out into an inheritance hierarchy. These two are also good candidates for data classes.

test.Dispensibles

Lazy or Data Classes - Point and Triple in Cipolla and Message Chains in Cipolla
Lazy or Data Class - Node in Eertree (also List is a long method)
There is also duplicate code (at least) between the two BarnsleyFern implementations
Dead Code in Luhn (2 cases)
Duplicate and dead code in Test
Speculative Generality - SeasonalStockItem and ValuableStockItem are examples of this in this package

test.Couplers

Message chains - Munchausen
Message chains - NBodySim
Feature Envy - FeatureEnvy Customer and Phone, and Item and Basket And Item and Phone are also Data Classes
Inappropriate Intimacy - Huffman code accesses the files of HuffmanLeaf, HuffmanNode and HuffmanTree (quite a weak example)
Middle Man - AccountManager (plus Account is a lazy class)

test.FalsePositives

should be nothing to see there
switch in box the compass

in type MorpionSolitairePanel->>

METHOD TOO LONG at mousePressed => it has 19 which is more than 10!

CONSTRUCTOR TOO LONG at MorpionSolitairePanel => it has 26 which is more than 10!

METHOD TOO LONG at run => it has 15 which is more than 10!

METHOD TOO LONG at start => it has 16 which is more than 10!

METHOD TOO LONG at paintComponent => it has 26 which is more than 10!

SWITCH ON ENUM test.Abusers.Switch.MorpionSolitairePanel.State at mousePressed

MIDDLEMAN at method start

in type ManOrBoy->>

LONG PARAMETER LIST at A => it has 6 which is more than 5!

in type Item->>

PRIMITIVE OBSESSION at CLASS Item --> 4 primitives out of 4 fields => that is 100.00 % primitives

in type Eertree->>

METHOD TOO LONG at eertree => it has 28 which is more than 10!

PRIMITIVE OBSESSION at METHOD eertree --> 8 primitives out of 9 variables => that is 88.89 % primitives

in type BoxingTheCompass->>

METHOD TOO LONG at main => it has 13 which is more than 10!

PRIMITIVE OBSESSION at METHOD buildPoints --> 7 primitives out of 7 variables => that is 100.00 % primitives

in type FloodFill->>

METHOD TOO LONG at floodFill => it has 26 which is more than 10!

PRIMITIVE OBSESSION at METHOD floodFill --> 8 primitives out of 13 variables => that is 61.54 % primitives

in type Grid->>

LONG PARAMETER LIST at checkLine => it has 6 which is more than 5!

CLASS TOO LONG at Grid => it has 140 which is more than 100!

METHOD TOO LONG at newGame => it has 12 which is more than 10!

METHOD TOO LONG at draw => it has 36 which is more than 10!

METHOD TOO LONG at playerMove => it has 34 which is more than 10!

METHOD TOO LONG at checkLine => it has 12 which is more than 10!

METHOD TOO LONG at addLine => it has 11 which is more than 10!

PRIMITIVE OBSESSION at METHOD newGame --> 4 primitives out of 4 variables => that is 100.00 % primitives

PRIMITIVE OBSESSION at METHOD draw --> 14 primitives out of 16 variables => that is 87.50 % primitives

PRIMITIVE OBSESSION at METHOD playerMove --> 6 primitives out of 13 variables => that is 46.15 % primitives

PRIMITIVE OBSESSION at METHOD checkLine --> 11 primitives out of 12 variables => that is 91.67 % primitives

PRIMITIVE OBSESSION at CLASS Grid --> 26 primitives out of 29 fields => that is 89.66 % primitives

in type Luhn->>

METHOD TOO LONG at luhnTest => it has 12 which is more than 10!

PRIMITIVE OBSESSION at METHOD luhnTest --> 6 primitives out of 6 variables => that is 100.00 % primitives

in type AccountManager->>

MIDDLEMAN at method GetAccount

in type NBodySim->>

CONSTRUCTOR TOO LONG at NBody => it has 23 which is more than 10!

PRIMITIVE OBSESSION at METHOD decompose --> 5 primitives out of 5 variables => that is 100.00 % primitives

PRIMITIVE OBSESSION at CLASS NBody --> 4 primitives out of 7 fields => that is 57.14 % primitives

in type CipollasAlgorithm->>

METHOD TOO LONG at c => it has 29 which is more than 10!

in type BresenhamPanel->>

METHOD TOO LONG at paintComponent => it has 15 which is more than 10!

METHOD TOO LONG at drawLine => it has 28 which is more than 10!

PRIMITIVE OBSESSION at METHOD paintComponent --> 12 primitives out of 13 variables => that is 92.31 % primitives

PRIMITIVE OBSESSION at METHOD plot --> 10 primitives out of 11 variables => that is 90.91 % primitives

PRIMITIVE OBSESSION at METHOD drawLine --> 13 primitives out of 14 variables => that is 92.86 % primitives

in type HuffmanCode->>

METHOD TOO LONG at printCodes => it has 12 which is more than 10!

in type BarnsleyFern->>

METHOD TOO LONG at createFern => it has 19 which is more than 10!

PRIMITIVE OBSESSION at METHOD createFern --> 8 primitives out of 8 variables => that is 100.00 % primitives

in type Client->>

MIDDLEMAN at method something

MIDDLEMAN at method somethingElse

in type BarnsleyFernTwo->>

METHOD TOO LONG at createFernWithTemp => it has 18 which is more than 10!

PRIMITIVE OBSESSION at METHOD createFernWithTemp --> 6 primitives out of 6 variables => that is 100.00 % primitives

in type Test->>

PRIMITIVE OBSESSION at METHOD meanStdDev --> 4 primitives out of 5 variables => that is 80.00 % primitives

PRIMITIVE OBSESSION at METHOD showHistogram01 --> 5 primitives out of 6 variables => that is 83.33 % primitives

PRIMITIVE OBSESSION at CLASS Test --> 4 primitives out of 4 fields => that is 100.00 % primitives

ioanluca / codesmells