ustaxes / UsTaxes

Tax filing web application

Home Page: https://ustaxes.org

Add diff report build step for years' forms implementations

zakpatterson opened this issue

We should know the impact of a PR in terms of the line difference between the forms' years directories.

We model the IRS PDFs without subclassing or any other shared abstraction. Instead, we allow complete copy-paste duplication between the implementations of each form for each tax year. This means a form such as Schedule D, which has no significant changes from 2020 to 2021, should have no code diff either, whereas a form such as Schedule 2, which changed significantly, should have a significant diff.

As a result, a minimal diff between src/forms/{Y2020,Y2021}/irsForms should reflect the actual changes the IRS made to each form.

So we should be interested in the size of the diff between these directories.

Examples

diff -r ./src/forms/{Y2020,Y2021}/irsForms

or

ls ./src/forms/Y2020/irsForms/*.ts | xargs -I@ basename "@" | xargs -I@@ diff --color --suppress-common-lines ./src/forms/{Y2020,Y2021}/irsForms/@@
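
The same measurement could also be computed inside a build step rather than by shelling out to diff. The following is a minimal sketch, not the project's implementation: diffSize and yearDiff are hypothetical names, file contents are passed in as strings instead of being read from src/forms, and the size is counted as lines removed plus lines added, based on a longest common subsequence. The numbers may differ slightly from what a particular diff tool reports.

```typescript
// Hypothetical helper: number of lines that differ between two versions of a
// file, counted as (lines only in a) + (lines only in b).
function diffSize(a: string[], b: string[]): number {
  // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..]
  const lcs: number[][] = []
  for (let i = 0; i <= a.length; i++) {
    const row: number[] = []
    for (let j = 0; j <= b.length; j++) row.push(0)
    lcs.push(row)
  }
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      lcs[i][j] =
        a[i] === b[j]
          ? lcs[i + 1][j + 1] + 1
          : Math.max(lcs[i + 1][j], lcs[i][j + 1])
    }
  }
  const common = lcs[0][0]
  return a.length - common + (b.length - common)
}

// File name -> file contents for one year's irsForms directory (assumed shape).
type Sources = Record<string, string>

// Hypothetical helper: per-file diff size between two years.
// A file missing from one year counts as entirely added or removed.
function yearDiff(prev: Sources, next: Sources): Record<string, number> {
  const names = Object.keys(prev).concat(
    Object.keys(next).filter((n) => !(n in prev))
  )
  const result: Record<string, number> = {}
  for (const name of names) {
    const a = name in prev ? prev[name].split('\n') : []
    const b = name in next ? next[name].split('\n') : []
    result[name] = diffSize(a, b)
  }
  return result
}
```

An identical file yields 0, so a form like Schedule D that did not change between years would contribute nothing to the total.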

Schedule D

Reveals that the 2020 implementation is ready for Form 8949 reporting with boxes A and B checked, but the 2021 implementation is not. This diff should be reduced to zero.

➜  ustaxes git:(feature-transaction-list-importing) diff -r ./src/forms/{Y2020,Y2021}/irsForms/ScheduleD.ts
53,89c53,64
<   l1f8949s = (): F8949[] => this.f8949.filter((f) => f.part1BoxA())
< 
<   l1bd = (): number =>
<     sumFields(this.l1f8949s().map((f) => f.shortTermTotalProceeds()))
<   l1be = (): number =>
<     sumFields(this.l1f8949s().map((f) => f.shortTermTotalCost()))
< 
<   l1bg = (): number =>
<     sumFields(this.l1f8949s().map((f) => f.shortTermTotalAdjustments()))
<   l1bh = (): number =>
<     sumFields(this.l1f8949s().map((f) => f.shortTermTotalGain()))
< 
<   l2f8949s = (): F8949[] => this.f8949.filter((f) => f.part1BoxB())
<   l2d = (): number =>
<     sumFields(this.l2f8949s().map((f) => f.shortTermTotalProceeds()))
< 
<   l2e = (): number =>
<     sumFields(this.l2f8949s().map((f) => f.shortTermTotalCost()))
< 
<   l2g = (): number =>
<     sumFields(this.l2f8949s().map((f) => f.shortTermTotalAdjustments()))
< 
<   l2h = (): number =>
<     sumFields(this.l2f8949s().map((f) => f.shortTermTotalGain()))
< 
<   l3f8949s = (): F8949[] => this.f8949.filter((f) => f.part1BoxC())
<   l3d = (): number =>
<     sumFields(this.l3f8949s().map((f) => f.shortTermTotalProceeds()))
< 
<   l3e = (): number =>
<     sumFields(this.l3f8949s().map((f) => f.shortTermTotalCost()))
< 
<   l3g = (): number =>
<     sumFields(this.l3f8949s().map((f) => f.shortTermTotalAdjustments()))
< 
<   l3h = (): number =>
<     sumFields(this.l3f8949s().map((f) => f.shortTermTotalGain()))
---
>   l1bd = (): number | undefined => undefined
>   l1be = (): number | undefined => undefined
>   l1bg = (): number | undefined => undefined
>   l1bh = (): number | undefined => undefined
>   l2d = (): number | undefined => undefined
>   l2e = (): number | undefined => undefined
>   l2g = (): number | undefined => undefined
>   l2h = (): number | undefined => undefined
>   l3d = (): number | undefined => undefined
>   l3e = (): number | undefined => undefined
>   l3g = (): number | undefined => undefined
>   l3h = (): number | undefined => undefined

data/federal.ts

Reveals that capital gains tax brackets have two datapoints for 2020 but three datapoints for 2021. This is probably an error.

<         brackets: [40000, 441450]
---
>         brackets: [40400, 164925, 441450]
117c117
<         brackets: [80000, 496600]
---
>         brackets: [80800, 329850, 496600]
120c120
<         brackets: [80000, 496600]
---
>         brackets: [80800, 329850, 496600]
123c123
<         brackets: [40000, 248300]
---
>         brackets: [40400, 164925, 250800]
126c126
<         brackets: [53600, 469050]
---
>         brackets: [54100, 164900, 473750]

Form 1040

The "Schedule D is not required" checkbox is implemented differently between 2020 and 2021 for no apparent reason.

<       this.scheduleD === undefined,
---
>       this.l7Box(),

The final report could be something like:

                   2020 ---> 2021
F1040.ts           321 (--> 321) +0
F1040v.ts          14 (--> 16) +2
F2441.ts           1000 (new file) (--> 200) -800
F2555.ts           142 (--> 142) +0
ScheduleD.ts       0 (--> 0) +0
Total:             1477 (--> 679) -798
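
A report in this shape could be assembled from per-file numbers along the following lines. This is only a sketch: it assumes the two numbers in each row are the year-to-year diff size before and after the PR, which is one reading of the mock-up, and ReportRow, padRight, signed and formatReport are hypothetical names.

```typescript
// One row of the report (assumed meaning of the two numbers).
interface ReportRow {
  file: string
  before: number // diff size between the years on the base branch
  after: number // diff size between the years with the PR applied
}

// Pad a string with spaces on the right up to the given width.
function padRight(s: string, width: number): string {
  let out = s
  while (out.length < width) out += ' '
  return out
}

// Render a change with an explicit sign, e.g. +2, +0, -800.
function signed(n: number): string {
  return n >= 0 ? `+${n}` : `${n}`
}

// Build the report text, one line per file plus a totals line.
function formatReport(rows: ReportRow[]): string {
  let before = 0
  let after = 0
  let width = 'Total:'.length
  for (const r of rows) {
    before += r.before
    after += r.after
    width = Math.max(width, r.file.length)
  }
  const all = rows.concat([{ file: 'Total:', before, after }])
  return all
    .map(
      (r) =>
        `${padRight(r.file, width + 2)}${r.before} (--> ${r.after}) ${signed(
          r.after - r.before
        )}`
    )
    .join('\n')
}
```

Computing the totals from the rows, rather than writing them by hand, keeps the last line consistent with the per-file lines.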

Other cases

Field Layout

There are other situations where the form does not appear to have significant changes, but the implementation is vastly different. For example, this field grid in IL-1040-WIT is implemented in 2020 as a 5 x 4 grid (columnwise), with the two-field form type column coming after. In 2021 it is implemented more straightforwardly as a 5 x 6 grid (rowwise):

[image: IL-1040-WIT field grid]

Index shifting

The formgen output includes the field index in the method name of every field. It tries to pick out the fields that correspond to PDF line numbers and marks them as l<lineNumber> = (): number | undefined, but these fields are aliased as f<index>, where index is the position of the field in the PDF. This means that if a single field is added from one year to the next, every subsequent method changes.
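
The effect can be illustrated with a small sketch. It does not reproduce formgen: fieldAliases and renamedFields are hypothetical helpers, the field labels are made up, labels are assumed to be unique, and indexing from 0 is an assumption.

```typescript
// Hypothetical stand-in for index-based naming: each field gets the alias
// f<index>, where index is its position in the PDF's field list.
function fieldAliases(fields: string[]): Record<string, string> {
  const aliases: Record<string, string> = {}
  fields.forEach((label, i) => {
    aliases[label] = `f${i}`
  })
  return aliases
}

// Fields present in both years whose alias changed between them.
function renamedFields(prev: string[], next: string[]): string[] {
  const before = fieldAliases(prev)
  const after = fieldAliases(next)
  return prev.filter(
    (label) => label in after && before[label] !== after[label]
  )
}

// Made-up field lists: the second year adds one field after 'ssn'.
const fields2020 = ['name', 'ssn', 'wages', 'interest', 'total']
const fields2021 = ['name', 'ssn', 'spouseSsn', 'wages', 'interest', 'total']

// Every field after the insertion point gets a new alias.
const shifted = renamedFields(fields2020, fields2021)
```

Here shifted contains 'wages', 'interest' and 'total': three unchanged fields show up in the diff because one field was inserted ahead of them.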

We should still look for a simpler implementation of PDF forms that makes these changes easier to handle.

It might make sense to also search for [FilingStatus.S] and the other statuses. Seems like that's often how thresholds are set and they may change from year to year.
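
Such a search could be sketched as a scan over source text for keys of the form [FilingStatus.X]. The function name findFilingStatusKeys and the StatusHit shape are hypothetical, and the sample in the usage note is illustrative rather than taken from the repository.

```typescript
// One occurrence of a [FilingStatus.X] key in a source file.
interface StatusHit {
  line: number // 1-based line number
  status: string // the X in [FilingStatus.X]
  text: string // the trimmed source line
}

// Hypothetical helper: list every [FilingStatus.X] key in the given source.
function findFilingStatusKeys(source: string): StatusHit[] {
  const hits: StatusHit[] = []
  source.split('\n').forEach((text, i) => {
    // A fresh regex per line avoids carrying lastIndex between lines.
    const pattern = /\[FilingStatus\.(\w+)\]/g
    let m = pattern.exec(text)
    while (m !== null) {
      hits.push({ line: i + 1, status: m[1], text: text.trim() })
      m = pattern.exec(text)
    }
  })
  return hits
}
```

Running this over the same file in two year directories and comparing the matched lines would show which thresholds changed, for example a line such as `[FilingStatus.S]: 200000` (an illustrative value) differing between years.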