pmd / pmd

An extensible multilanguage static code analyzer.

Home Page: https://pmd.github.io

[core] CPD: `--skip-duplicate-files` has no effect (7.0.0 regression)

C-Otto opened this issue.

Affects PMD Version:

7.0.0

Description:

CPD reports duplications between identical files even though --skip-duplicate-files is enabled.
The same happens when running CPD via Gradle with CPDConfiguration.setSkipDuplicates(true).

Code Sample demonstrating the issue:

/*
 * Copyright 2019 Andreas Schmid
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package de.aaschmid.test;

import static org.junit.Assert.assertTrue;

public class Test {

    public testCpd() {
        assertTrue(true);
    }
}

Steps to reproduce:

$ cat a/Test.java
(as above)
$ cat b/Test.java
(as above)
$ diff a/Test.java b/Test.java
(no output => no difference)
$ pmd-bin-7.0.0/bin/pmd cpd --skip-duplicate-files --minimum-tokens=10 a/ b/
Found a 6 line (15 tokens) duplication in the following files: 
Starting at line 20 of /tmp/pmd/a/Test.java
Starting at line 20 of /tmp/pmd/b/Test.java

public class Test {

    public testCpd() {
        assertTrue(true);
    }
}

This does not happen with 6.55.0, which skips the second file:

$ ./pmd-bin-6.55.0/bin/run.sh cpd --minimum-tokens 10 --skip-duplicate-files --dir a/ --dir b/
Skipping /tmp/pmd/b/Test.java since it appears to be a duplicate file and --skip-duplicate-files is set

Running PMD through:

CLI, Gradle

I can confirm that this is broken now. The flag is set on CPDConfiguration but is never read.

In PMD 6, the check was implemented as follows:

if (configuration.isSkipDuplicates()) {
    // TODO refactor this thing into a separate class
    String signature = file.getName() + '_' + file.length();
    if (current.contains(signature)) {
        System.err.println("Skipping " + file.getAbsolutePath()
                + " since it appears to be a duplicate file and --skip-duplicate-files is set");
        return;
    }
    current.add(signature);
}
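To make the PMD 6 heuristic easier to experiment with, here is a minimal standalone sketch of the same check. The class DuplicateFileSkipper and its shouldSkip methods are hypothetical and not part of PMD; only the signature format (simple file name, '_', file length) is taken from the PMD 6 snippet.

```java
import java.io.File;
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical standalone version of the PMD 6 "--skip-duplicate-files" check.
 * Not a PMD class; it only mirrors the PMD 6 logic for illustration.
 */
final class DuplicateFileSkipper {
    // Signatures ("<simple name>_<length>") of files that were already accepted.
    private final Set<String> seen = new HashSet<>();

    /** Returns true if a file with the same simple name and length was seen before. */
    boolean shouldSkip(String simpleName, long length) {
        String signature = simpleName + '_' + length;
        // Set.add returns false when the signature is already present,
        // which replaces the separate contains()/add() calls of PMD 6.
        return !seen.add(signature);
    }

    /** Overload mirroring file.getName() and file.length() from the PMD 6 code. */
    boolean shouldSkip(File file) {
        return shouldSkip(file.getName(), file.length());
    }

    public static void main(String[] args) {
        DuplicateFileSkipper skipper = new DuplicateFileSkipper();
        System.out.println(skipper.shouldSkip("Test.java", 1234L));  // false: first occurrence
        System.out.println(skipper.shouldSkip("Test.java", 1234L));  // true: same name and size
        System.out.println(skipper.shouldSkip("Other.java", 1234L)); // false: different name
    }
}
```

Folding contains() and add() into a single Set.add call keeps the behavior of the original while avoiding a second hash lookup.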

Note that PMD 6 only compared the simple file name and the file size, not the contents of the files.
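A small demonstration of that limitation, assuming Java 12+ for Files.mismatch: two files named Test.java in different directories, with the same size but different contents, get the same name-and-size signature. PMD 6 would therefore have skipped the second file even though it is not a duplicate. SignatureCollisionDemo is a hypothetical name used only for this sketch.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical demo: two files with the same simple name and size but different
 * content produce the same PMD 6 style signature ("<name>_<size>").
 */
final class SignatureCollisionDemo {

    // Same signature format as PMD 6: simple file name, '_', length in bytes.
    static String signature(Path file) throws IOException {
        return file.getFileName().toString() + '_' + Files.size(file);
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("cpd-skip-demo");
        Path a = Files.createDirectories(root.resolve("a")).resolve("Test.java");
        Path b = Files.createDirectories(root.resolve("b")).resolve("Test.java");
        // Both contents are 10 bytes long but differ in one character.
        Files.write(a, "class A {}".getBytes(StandardCharsets.UTF_8));
        Files.write(b, "class B {}".getBytes(StandardCharsets.UTF_8));

        boolean sameSignature = signature(a).equals(signature(b));
        // Files.mismatch returns -1 only if both files have identical contents.
        boolean sameContent = Files.mismatch(a, b) == -1L;

        System.out.println("same signature: " + sameSignature); // true
        System.out.println("same content:   " + sameContent);   // false

        // Clean up the temporary files and directories.
        Files.delete(a);
        Files.delete(b);
        Files.delete(a.getParent());
        Files.delete(b.getParent());
        Files.delete(root);
    }
}
```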