[core] CPD: `--skip-duplicate-files` has no effect (7.0.0 regression)
C-Otto opened this issue
Affects PMD Version:
7.0.0
Description:
CPD reports duplication for identical files even though `--skip-duplicate-files` is enabled.
This also happens via Gradle (`CPDConfiguration.setSkipDuplicates(true)`).
Code Sample demonstrating the issue:
/*
* Copyright 2019 Andreas Schmid
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package de.aaschmid.test;
import static org.junit.Assert.assertTrue;
public class Test {
    public testCpd() {
        assertTrue(true);
    }
}
Steps to reproduce:
$ cat a/Test.java
(as above)
$ cat b/Test.java
(as above)
$ diff a/Test.java b/Test.java
(no output => no difference)
$ pmd-bin-7.0.0/bin/pmd cpd --skip-duplicate-files --minimum-tokens=10 a/ b/
Found a 6 line (15 tokens) duplication in the following files:
Starting at line 20 of /tmp/pmd/a/Test.java
Starting at line 20 of /tmp/pmd/b/Test.java
public class Test {
    public testCpd() {
        assertTrue(true);
    }
}
This is not an issue with 6.55.0:
$ ./pmd-bin-6.55.0/bin/run.sh cpd --minimum-tokens 10 --skip-duplicate-files --dir a/ --dir b/
Skipping /tmp/pmd/b/Test.java since it appears to be a duplicate file and --skip-duplicate-files is set
Running PMD through:
CLI, Gradle
I can confirm that this is broken now. The flag is set on `CPDConfiguration` but never used.
In PMD 6, the implementation is here:
pmd/pmd-compat6/src/main/java/net/sourceforge/pmd/cpd/CPD.java
Lines 57 to 66 in d4b99bb
Note that PMD 6 only compared the simple file name and the file size, not the content of the files.
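To make the name-plus-size heuristic concrete, here is a minimal sketch of how such a filter could look. It is not the PMD source: the class `DuplicateFileFilter`, the method `skipDuplicates`, and the signature format are hypothetical names chosen for illustration, and the only assumption taken from the comment above is that a file counts as a duplicate when its simple name and size match an earlier file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper (not part of PMD): filters files by a name + size signature. */
final class DuplicateFileFilter {

    private DuplicateFileFilter() { }

    /**
     * Returns the input files in their original order, dropping every file whose
     * simple name and size match a file seen earlier. File contents are never read.
     */
    static List<Path> skipDuplicates(List<Path> files) throws IOException {
        Set<String> seenSignatures = new HashSet<>();
        List<Path> kept = new ArrayList<>();
        for (Path file : files) {
            // Assumed signature format: "<simple name>_<size in bytes>".
            String signature = file.getFileName() + "_" + Files.size(file);
            if (seenSignatures.add(signature)) {
                kept.add(file);
            } else {
                System.err.println("Skipping " + file.toAbsolutePath()
                        + " since it appears to be a duplicate file");
            }
        }
        return kept;
    }
}
```

With the reproduction above, `skipDuplicates` applied to `a/Test.java` and `b/Test.java` would keep only `a/Test.java`. Because contents are never compared, two different files that happen to share a simple name and byte size would also be treated as duplicates under this heuristic.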