ustaxes / UsTaxes

Tax filing web application

Home Page: https://ustaxes.org


Investigate PDF generation slowness

ssigwart opened this issue

I was also looking into why it's quite slow to generate the PDF. I found the following.

This line takes about 1 second:

const r1 = await runAsync(this.f1040Pdfs())

This line takes about 6 seconds:

const r2 = await r1.mapAsync(combinePdfs)

Any thoughts on how to solve that? Do you think it's worth trying to speed up or do you think it should just be masked with a loading overlay?
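For anyone who wants to reproduce these numbers, here is a minimal sketch of a timing wrapper. `timed` is a hypothetical helper that does not exist in the codebase, and the commented usage simply reuses the two calls quoted above.

```typescript
// Hypothetical helper: awaits `work` and reports how long it took in milliseconds.
async function timed<T>(
  label: string,
  work: () => Promise<T>
): Promise<{ label: string; result: T; ms: number }> {
  const start = Date.now()
  const result = await work()
  return { label, result, ms: Date.now() - start }
}

// Usage inside the method that builds the PDF (names taken from the lines above):
//   const t1 = await timed('f1040Pdfs', () => runAsync(this.f1040Pdfs()))
//   const t2 = await timed('combinePdfs', () => t1.result.mapAsync(combinePdfs))
```

Comparing `t1.ms` and `t2.ms` over a few runs should show whether the 1 second versus 6 second split is stable.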

Reply from @zakpatterson:

Let's make a new issue to investigate the slowness here, I do think we need to change strategy with it a bit.

combinePdfs does a lot of work, and rendering PDFs is pretty slow for sure. But it looks like there could be some duplication of effort in that implementation.

Keep in mind that we have form data on numerous attachments that have to be combined into one document. I found a while ago that the only way to do this is to reserialize the PDFDocuments so that the form data is frozen into the forms. As a result, the output PDF is no longer editable.

So essentially each document is being serialized, then every page is being copied over one at a time into a new document. But I don't know why this should take so long.
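One way to check for duplicated effort is to model the pipeline and count serializations. The sketch below is only a model: `Doc`, `combine` and `fakeDoc` are hypothetical names, the interface is not the real PDF library API, and pages are plain strings. It shows the shape where every input is serialized exactly once and pages are then appended in order.

```typescript
// Hypothetical minimal shape of a PDF document; this is NOT a real library API.
interface Doc {
  pages: string[]
  // Freezes form data and returns the flattened pages.
  serialize(): Promise<string[]>
}

// Serialize every input exactly once (in parallel), then append pages in order.
async function combine(docs: Doc[]): Promise<string[]> {
  const frozen = await Promise.all(docs.map((d) => d.serialize()))
  const out: string[] = []
  for (const pages of frozen) {
    out.push(...pages)
  }
  return out
}

// Fake document that counts how often it gets serialized.
function fakeDoc(pages: string[], counter: { n: number }): Doc {
  return {
    pages,
    serialize: async () => {
      counter.n += 1
      return pages.map((p) => `${p} (flattened)`)
    }
  }
}
```

If a similar counter placed in the real implementation reports more serializations than there are input documents, that would point to the duplication suspected above. Whether that is actually the cause of the slowness is not established here.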

I think this could be solved with a kind-of simple refactoring.

As currently implemented, each line is a function that may depend on any of the other line functions. So each form has this stupid combinatorial blowup, where L2 can depend on L1, L3 can depend on L1 and L2, and so on, with nothing cached in between. Across many forms, a single line can then be recalculated hundreds of times.
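To get a feel for how fast this grows, here is a small sketch that counts function calls. It assumes the worst case, where every line reads every earlier line and nothing is cached, which the real forms do not do. `countCalls` and the line structure are made up for illustration.

```typescript
// Hypothetical form with n lines where line i sums every earlier line.
// Returns how many line functions run in total when evaluating line n.
function countCalls(n: number): number {
  let calls = 0
  const line = (i: number): number => {
    calls += 1
    let total = 1
    for (let j = 1; j < i; j++) {
      total += line(j) // no caching: every dependency is recomputed
    }
    return total
  }
  line(n)
  return calls
}

// countCalls(10) === 512
// countCalls(20) === 524288
```

Under that assumption the call count doubles with each added line, so even a sparser dependency graph can plausibly reach hundreds of recalculations per line.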

We don't need a complicated caching / memoization scheme because the forms are created as a function of the information object. All the forms should be regenerated if the information changes.

So instead of:

l1 = (): number => ...

It can be:

readonly info: Information
computedForm?: ComputedForm
computed = (): ComputedForm => {
  if (this.computedForm === undefined) {
    this.computedForm = {
      l1: ...,
      ...
    }
  }
  return this.computedForm
}

And then all accesses need to be to the computed form instead of to the actual form.
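A runnable version of that idea could look like the sketch below. The `Information` and `ComputedForm` shapes are heavily simplified stand-ins, and `ExampleForm` with its `evaluations` counter is hypothetical, added only to show that the work runs once.

```typescript
// Hypothetical, heavily simplified types; the real Information has many more fields.
interface Information {
  wages: number
  interest: number
}
interface ComputedForm {
  l1: number
  l2: number
  l3: number
}

class ExampleForm {
  readonly info: Information
  private computedForm?: ComputedForm
  evaluations = 0 // only here to show that the work runs once

  constructor(info: Information) {
    this.info = info
  }

  computed = (): ComputedForm => {
    if (this.computedForm === undefined) {
      this.evaluations += 1
      const l1 = this.info.wages
      const l2 = this.info.interest
      const l3 = l1 + l2 // later lines read earlier locals, not functions
      this.computedForm = { l1, l2, l3 }
    }
    return this.computedForm
  }
}
```

Because `info` is readonly, a change to the information means constructing a new form, which drops the cached object without any invalidation logic.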

I think we can mitigate the refactoring a little bit by providing get accessors in the 1040 for each attachment, each pointing to that attachment's computed object, so that every line access becomes a simple object lookup instead of a function call. This will still touch a ton of lines.
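As a rough sketch of what such an accessor might look like: the class names, line names and constructor arguments below are made up for illustration and do not match the real forms.

```typescript
// Hypothetical computed shape; the line name is illustrative only.
interface ComputedSchedule1 {
  l10: number
}

class Schedule1Example {
  readonly otherIncome: number
  private cache?: ComputedSchedule1

  constructor(otherIncome: number) {
    this.otherIncome = otherIncome
  }

  computed(): ComputedSchedule1 {
    if (this.cache === undefined) {
      this.cache = { l10: this.otherIncome }
    }
    return this.cache
  }
}

class F1040Example {
  readonly wages: number
  private readonly s1: Schedule1Example

  constructor(wages: number, s1: Schedule1Example) {
    this.wages = wages
    this.s1 = s1
  }

  // The accessor hides the computed() call, so callers write f1040.schedule1.l10.
  get schedule1(): ComputedSchedule1 {
    return this.s1.computed()
  }

  l9 = (): number => this.wages + this.schedule1.l10
}
```

Call sites then change from something like `schedule1.l10()` to `this.schedule1.l10`, which is the "ton of lines" part.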

Edit: NO, this won't work I don't think, because there are circular reference problems. There are fields on the 1040 that point to Schedule 2, and then fields on Schedule 2 that require values from 1040. We can't reliably complete an entire form atomically
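The failure mode can be shown with two toy forms that each need the other's whole computed object. This is a sketch under that assumption only: `WholeForm` and its `total` field are hypothetical, and the guard flag is there just to turn endless recursion into a readable error.

```typescript
// Hypothetical two-form setup: each form's whole-form computation reads the other's.
class WholeForm {
  name: string
  other?: WholeForm
  private computing = false
  private cache?: { total: number }

  constructor(name: string) {
    this.name = name
  }

  computed(): { total: number } {
    if (this.cache !== undefined) return this.cache
    if (this.computing) {
      // We re-entered a form that has not finished computing yet.
      throw new Error(`circular dependency while computing ${this.name}`)
    }
    this.computing = true
    try {
      const fromOther = this.other ? this.other.computed().total : 0
      this.cache = { total: 1 + fromOther }
      return this.cache
    } finally {
      this.computing = false
    }
  }
}

const f1040 = new WholeForm('1040')
const schedule2 = new WholeForm('Schedule 2')
f1040.other = schedule2
schedule2.other = f1040
```

Calling `f1040.computed()` here throws, because computing the 1040 asks for Schedule 2, which asks for the 1040 again. Presumably the per-line functions avoid this because the individual lines involved do not depend on each other, only the forms as a whole do.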

Did you mean to post this on #1024?

Edit: NO, this won't work I don't think, because there are circular reference problems. There are fields on the 1040 that point to Schedule 2, and then fields on Schedule 2 that require values from 1040. We can't reliably complete an entire form atomically

Yeah, that's the fun part about trying to do taxes.