Add diff report build step for years' forms implementations
zakpatterson opened this issue · comments
We should know the impact of a PR in terms of the line difference between the forms' years directories.
Rather than relying on subclassing or some other abstraction to model the IRS PDFs, we allow complete copy-paste code duplication between the implementations of each form for each tax year. This means a form such as Schedule D, which has no significant changes from 2020 to 2021, should have no code diff either, whereas a form such as Schedule 2, which had significant changes, would have a significant diff.
This means the minimized diff between src/forms/{Y2020,Y2021}/irsForms should give us the actual changes put out by the IRS for each form.
So we should be interested in the size of the diff between these directories.
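As a rough sketch of what "size of the diff" could mean for a single file, the function below counts the lines that are not shared between two versions, using a longest-common-subsequence table. The name changedLineCount is hypothetical and not part of the repository, and a real build step might simply shell out to diff instead.

```typescript
// Hypothetical helper: number of lines removed plus lines added when
// going from version `a` to version `b` of a file.
function changedLineCount(a: string[], b: string[]): number {
  const m = a.length
  const n = b.length

  // lcs[i][j] holds the length of the longest common subsequence
  // of a[i..] and b[j..]; the extra row and column stay at zero.
  const lcs: number[][] = []
  for (let i = 0; i <= m; i++) {
    const row: number[] = []
    for (let j = 0; j <= n; j++) row.push(0)
    lcs.push(row)
  }

  for (let i = m - 1; i >= 0; i--) {
    for (let j = n - 1; j >= 0; j--) {
      lcs[i][j] =
        a[i] === b[j]
          ? lcs[i + 1][j + 1] + 1
          : Math.max(lcs[i + 1][j], lcs[i][j + 1])
    }
  }

  // Every line outside the common subsequence was removed or added.
  const common = lcs[0][0]
  return m - common + (n - common)
}

// Usage: two tiny stand-ins for the same file in two year directories.
const y2020Lines = ['l2d = (): number =>', '  sumFields([])', 'l4 = () => 0']
const y2021Lines = ['l2d = (): number | undefined => undefined', 'l4 = () => 0']
console.log(changedLineCount(y2020Lines, y2021Lines)) // 3
```

Identical files give 0, which matches the goal of driving the Schedule D diff down to zero.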
Examples
diff -r ./src/forms/{Y2020,Y2021}/irsForms
or
ls ./src/forms/Y2020/irsForms/*.ts | xargs -I@ basename "@" | xargs -I@@ diff --color --suppress-common-lines ./src/forms/{Y2020,Y2021}/irsForms/@@
Schedule D
Reveals that the 2020 implementation is ready for Form 8949 reporting with boxes A and B checked, but 2021 is not. This diff should be reduced to zero.
➜ ustaxes git:(feature-transaction-list-importing) diff -r ./src/forms/{Y2020,Y2021}/irsForms/ScheduleD.ts
53,89c53,64
< l1f8949s = (): F8949[] => this.f8949.filter((f) => f.part1BoxA())
<
< l1bd = (): number =>
< sumFields(this.l1f8949s().map((f) => f.shortTermTotalProceeds()))
< l1be = (): number =>
< sumFields(this.l1f8949s().map((f) => f.shortTermTotalCost()))
<
< l1bg = (): number =>
< sumFields(this.l1f8949s().map((f) => f.shortTermTotalAdjustments()))
< l1bh = (): number =>
< sumFields(this.l1f8949s().map((f) => f.shortTermTotalGain()))
<
< l2f8949s = (): F8949[] => this.f8949.filter((f) => f.part1BoxB())
< l2d = (): number =>
< sumFields(this.l2f8949s().map((f) => f.shortTermTotalProceeds()))
<
< l2e = (): number =>
< sumFields(this.l2f8949s().map((f) => f.shortTermTotalCost()))
<
< l2g = (): number =>
< sumFields(this.l2f8949s().map((f) => f.shortTermTotalAdjustments()))
<
< l2h = (): number =>
< sumFields(this.l2f8949s().map((f) => f.shortTermTotalGain()))
<
< l3f8949s = (): F8949[] => this.f8949.filter((f) => f.part1BoxC())
< l3d = (): number =>
< sumFields(this.l3f8949s().map((f) => f.shortTermTotalProceeds()))
<
< l3e = (): number =>
< sumFields(this.l3f8949s().map((f) => f.shortTermTotalCost()))
<
< l3g = (): number =>
< sumFields(this.l3f8949s().map((f) => f.shortTermTotalAdjustments()))
<
< l3h = (): number =>
< sumFields(this.l3f8949s().map((f) => f.shortTermTotalGain()))
---
> l1bd = (): number | undefined => undefined
> l1be = (): number | undefined => undefined
> l1bg = (): number | undefined => undefined
> l1bh = (): number | undefined => undefined
> l2d = (): number | undefined => undefined
> l2e = (): number | undefined => undefined
> l2g = (): number | undefined => undefined
> l2h = (): number | undefined => undefined
> l3d = (): number | undefined => undefined
> l3e = (): number | undefined => undefined
> l3g = (): number | undefined => undefined
> l3h = (): number | undefined => undefined
data/federal.ts
Reveals that capital gains tax brackets have two datapoints for 2020 but three datapoints for 2021. This is probably an error.
< brackets: [40000, 441450]
---
> brackets: [40400, 164925, 441450]
117c117
< brackets: [80000, 496600]
---
> brackets: [80800, 329850, 496600]
120c120
< brackets: [80000, 496600]
---
> brackets: [80800, 329850, 496600]
123c123
< brackets: [40000, 248300]
---
> brackets: [40400, 164925, 250800]
126c126
< brackets: [53600, 469050]
---
> brackets: [54100, 164900, 473750]
Form 1040
The "Schedule D is not required" checkbox is implemented differently between 2020 and 2021 for no apparent reason.
< this.scheduleD === undefined,
---
> this.l7Box(),
The final report could be something like:
2020 --> 2021
F1040.ts 321 (--> 321) +0
F1040v.ts 14 (--> 16) +2
F2441.ts 1000 (new file) (--> 200) -800
F2555.ts 142 (--> 142) +0
ScheduleD.ts 0 (--> 0) +0
Total: 1477 (--> 679) -798
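A minimal sketch of how such a report could be rendered, assuming each row carries the year-to-year diff size for one file before and after the PR. That reading of the two columns is an assumption, and ReportRow, signed and formatReport are hypothetical names.

```typescript
// Hypothetical row shape: diff size for one file on the base branch
// (`before`) and on the PR branch (`after`).
interface ReportRow {
  file: string
  before: number
  after: number
}

// Render a delta with an explicit sign, e.g. +2 or -800.
const signed = (n: number): string => (n >= 0 ? `+${n}` : `${n}`)

function formatReport(rows: ReportRow[]): string[] {
  const lines = rows.map(
    (r) => `${r.file} ${r.before} (--> ${r.after}) ${signed(r.after - r.before)}`
  )
  // Compute totals from the rows so they cannot drift out of sync.
  const totalBefore = rows.reduce((sum, r) => sum + r.before, 0)
  const totalAfter = rows.reduce((sum, r) => sum + r.after, 0)
  lines.push(
    `Total: ${totalBefore} (--> ${totalAfter}) ${signed(totalAfter - totalBefore)}`
  )
  return lines
}

// Usage with illustrative numbers only.
const exampleRows: ReportRow[] = [
  { file: 'F1040.ts', before: 321, after: 321 },
  { file: 'F1040v.ts', before: 14, after: 16 },
  { file: 'F2441.ts', before: 1000, after: 200 }
]
console.log(formatReport(exampleRows).join('\n'))
```

Deriving the total line from the rows keeps the summary consistent with the per-file numbers.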
Other cases.
Field Layout
There are other situations where the form does not appear to have significant changes, but the implementation is vastly different. For example, the field grid in IL-1040-WIT is implemented in 2020 as a 5 x 4 grid (columnwise), with the two-field form type column coming after, but in 2021 it is more straightforwardly a 5 x 6 grid (rowwise).
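To illustrate why the same visual grid can yield very different code, here is a generic sketch of the two orderings. It does not model the actual IL-1040-WIT fields; rowwiseIndex and columnwiseIndex are hypothetical helpers, and the 5 x 4 dimensions are only an example.

```typescript
// Position of cell (row, col) when fields are listed row by row.
const rowwiseIndex = (row: number, col: number, cols: number): number =>
  row * cols + col

// Position of the same cell when fields are listed column by column.
const columnwiseIndex = (row: number, col: number, rows: number): number =>
  col * rows + row

// In a grid with 5 rows and 4 columns, the cell in the second row of the
// first column sits at a different position under each ordering.
const rows = 5
const cols = 4
console.log(rowwiseIndex(1, 0, cols)) // 4
console.log(columnwiseIndex(1, 0, rows)) // 1
```

Because almost every cell lands at a different position, a change in ordering alone touches nearly every line of the implementation even when the printed form looks the same.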
Index shifting
The formgen output has the field index as part of the method name for all fields. It tries to pick out the fields that are referenced as PDF line numbers and marks them as l<lineNumber> = () : number | undefined, but these fields are aliased as f<index>, where index corresponds to the index of the field in the PDF. This means that if a single field is added from year to year, it creates a change for every subsequent method.
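The shifting effect can be sketched with a simplified model in which each field's generated name is just f followed by its position. The field labels below are invented for illustration, and shiftedFields is a hypothetical helper that assumes labels are unique.

```typescript
// Hypothetical helper: fields present in both years whose generated
// f<index> name changed because their position in the PDF moved.
function shiftedFields(oldFields: string[], newFields: string[]): string[] {
  return newFields.filter((field, newIndex) => {
    const oldIndex = oldFields.indexOf(field)
    return oldIndex !== -1 && oldIndex !== newIndex
  })
}

// One field is inserted near the top of the form in the later year.
const fields2020 = ['name', 'ssn', 'wages', 'interest']
const fields2021 = ['name', 'spouseName', 'ssn', 'wages', 'interest']

// Every field after the insertion point gets a new f<index> name.
console.log(shiftedFields(fields2020, fields2021)) // ['ssn', 'wages', 'interest']
```

A diff report would count all of these renames as changes, even though the IRS added only one field.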
We should still look for a simpler implementation of pdf forms that makes these changes easier.
It might make sense to also search for [FilingStatus.S] and the other statuses. Seems like that's often how thresholds are set and they may change from year to year.
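One possible way to run that search over a source file's text is a small scan for bracketed FilingStatus keys. This is only a sketch: findFilingStatusKeys is a hypothetical name, and the pattern assumes thresholds are written as object keys of the form [FilingStatus.X].

```typescript
// Hypothetical scanner: report each `[FilingStatus.X]` key with its
// 1-based line number so thresholds can be compared across years.
function findFilingStatusKeys(
  source: string
): { line: number; status: string }[] {
  const hits: { line: number; status: string }[] = []
  source.split('\n').forEach((text, i) => {
    // A fresh regex per line avoids carrying lastIndex between lines.
    const pattern = /\[FilingStatus\.(\w+)\]/g
    let match: RegExpExecArray | null
    while ((match = pattern.exec(text)) !== null) {
      hits.push({ line: i + 1, status: match[1] })
    }
  })
  return hits
}

// Usage on an invented snippet; the numbers are placeholders.
const sampleSource = [
  'const threshold = {',
  '  [FilingStatus.S]: 100,',
  '  [FilingStatus.MFJ]: 200',
  '}'
].join('\n')
console.log(findFilingStatusKeys(sampleSource))
```

Running this over the same file in two year directories and comparing the surrounding lines would highlight thresholds that changed.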