Include metadata in the parse result for renamed columns
jchen042 opened this issue · comments
Great project!
@pokoli - thanks for the update w.r.t. #982 #129 #956. Would the library consider adding configuration for duplicated headers, i.e. enabling/disabling the automatic renaming while keeping the ability to read the right column value, or including the renaming metadata in the ParseResult, so the end developer has more options for handling this scenario?
My proposal is to include the metadata with each column.
With the following CSV data:
c;c;c;c_1
1;2;3;4
The ParseResult.data would look like:
[{
  "c": {
    "originalName": "c",
    "value": "1"
  },
  "c_1": {
    "originalName": "c",
    "value": "2"
  },
  "c_2": {
    "originalName": "c",
    "value": "3"
  },
  "c_3": {
    "originalName": "c",
    "value": "4"
  }
}]
Alternatively, the column renaming metadata can be included in ParseResult.meta, like:
"columnNameMapping": {
  "c": "c",
  "c_1": "c",
  "c_2": "c",
  "c_3": "c"
}
If this is a good idea, I'm happy to create a PR to handle it.
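To make the second option more concrete, here is a minimal sketch of how a consumer might use such a mapping to collect all values that came from the same original header. The names `Row`, `ColumnNameMapping`, and `groupByOriginalName` are hypothetical and not part of the library; the mapping shape (renamed header to original header) is the one proposed above.

```typescript
// Hypothetical shapes for illustration only: a parsed row keyed by the
// (possibly renamed) header, and the proposed mapping renamed -> original.
type Row = Record<string, string>;
type ColumnNameMapping = Record<string, string>;

// Collect the values of one row under their original header names,
// preserving the column order of the row.
function groupByOriginalName(
  row: Row,
  mapping: ColumnNameMapping
): Record<string, string[]> {
  const grouped: Record<string, string[]> = {};
  for (const field of Object.keys(row)) {
    // Fields that are absent from the mapping are assumed not to be renamed.
    const original = mapping[field] ?? field;
    if (!grouped[original]) {
      grouped[original] = [];
    }
    grouped[original].push(row[field]);
  }
  return grouped;
}

// Sample data taken from the proposal above.
const sampleRow: Row = { c: "1", c_1: "2", c_2: "3", c_3: "4" };
const sampleMapping: ColumnNameMapping = { c: "c", c_1: "c", c_2: "c", c_3: "c" };
const groupedRow = groupByOriginalName(sampleRow, sampleMapping);
console.log(groupedRow); // { c: [ '1', '2', '3', '4' ] }
```

Keeping the mapping in the metadata rather than on every cell leaves the row shape unchanged, so existing consumers of ParseResult.data would not be affected.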
Hi @FallingCeilingS,
Thanks for your proposal. I think the best option is to include the renamed columns in the metadata, so the original values can be restored just by reading them.
It would be great if you could create an MR for it.
I would expect the test cases to be extended to check that the proper metadata is generated, and the documentation to be extended to explain the new metadata.
I think the property should be named renamedHeaders, so it can be accessed with ParseResult.meta.renamedHeaders.
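As an illustration of how such a property could be consumed, the sketch below restores the original header row from the parsed field list. It assumes that renamedHeaders maps each renamed header to its original name; that shape, the `HeaderMeta` interface, the `originalHeaders` helper, and the sample data are assumptions made for this example, not something confirmed in this thread.

```typescript
// Assumed shape of the relevant part of the parse metadata. Only the property
// name renamedHeaders comes from the discussion; the value shape is assumed.
interface HeaderMeta {
  fields: string[];
  renamedHeaders?: Record<string, string> | null;
}

// Map every field back to the header text as it appeared in the file.
function originalHeaders(meta: HeaderMeta): string[] {
  const renamed = meta.renamedHeaders ?? {};
  return meta.fields.map((field) => renamed[field] ?? field);
}

// Hypothetical metadata for a file whose header row was "a,b,a".
const exampleMeta: HeaderMeta = {
  fields: ["a", "b", "a_1"],
  renamedHeaders: { a_1: "a" },
};
console.log(originalHeaders(exampleMeta)); // [ 'a', 'b', 'a' ]
```

Treating a missing or null renamedHeaders as "nothing was renamed" keeps the helper usable on files without duplicate headers.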
Thanks for the reply @pokoli - I'll create a PR once I have free time.
Thank you so much for taking the effort!
@pokoli - the PR is ready for review.
CC: @mholt .
@FallingCeilingS I will release a new version once I have some time.
I'm closing the issue for now, as there is nothing more to do.
Just for reference, this was solved with #990
This solves exactly the issue I was trying to solve, thank you! (I originally landed on #129). And thanks to all who have contributed fixes/updates to functionality related to duplicate headers.
I see a new tag is pending. Meanwhile, here is the workaround I'm using to detect duplicates until this feature is available:
import Papa from 'papaparse';

// Heuristic: treat "name_1" as a renamed duplicate when "name" was already seen.
function completeFn(results: Papa.ParseResult<any>): void {
  const uniqueFieldNames = new Set<string>();
  const duplicateFieldNames: string[] = [];
  // meta.fields is only populated when header is true, so fall back to an empty list.
  for (const fieldName of results.meta.fields ?? []) {
    if (fieldName.slice(-2) === '_1') {
      const originalFieldName = fieldName.slice(0, -2);
      const isDuplicate = uniqueFieldNames.has(originalFieldName);
      if (isDuplicate) {
        duplicateFieldNames.push(originalFieldName);
      } else {
        uniqueFieldNames.add(originalFieldName);
      }
    } else {
      uniqueFieldNames.add(fieldName);
    }
  }
  // At this point, uniqueFieldNames contains every distinct field name (with any "_1" suffix stripped),
  // and duplicateFieldNames contains the names that were duplicated.
}

const papaParseConfig: Papa.ParseLocalConfig = {
  header: true,
  complete: completeFn,
};
This only detects the first duplicate of a name, and it doesn't work correctly if a field legitimately ends in _1. In my use case, those are acceptable risks.
When a tag is made I will upgrade, test, and confirm here that PR #990 works.
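For anyone who needs the workaround to cover more than one duplicate per name, a slightly more general variant is sketched below. It is still only a heuristic: it assumes renamed headers look like `name_N` with a numeric suffix, and it will still misreport a file that legitimately contains both `name` and `name_1`. The function name `findLikelyDuplicates` is hypothetical.

```typescript
// Heuristic sketch: report every base name "x" for which some field "x_N"
// (N numeric) exists alongside "x" itself. Independent of field order.
function findLikelyDuplicates(fields: string[]): string[] {
  const present = new Set(fields);
  const duplicates = new Set<string>();
  for (const field of fields) {
    const match = /^(.*)_(\d+)$/.exec(field);
    // Only count "x_N" as a renamed duplicate if a plain "x" is also a field.
    if (match && match[1] !== "" && present.has(match[1])) {
      duplicates.add(match[1]);
    }
  }
  return Array.from(duplicates);
}

console.log(findLikelyDuplicates(["c", "c_1", "c_2", "d"])); // [ 'c' ]
console.log(findLikelyDuplicates(["a_1", "b"])); // []
```

It can be called from the complete callback with `results.meta.fields ?? []` in place of the loop above.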