Increase performance
pnsafonov opened this issue · comments
What version are you using (fq -v
)?
$ fq -v 0.0.9 (linux amd64)
How was fq installed?
fq is build from source.
My branch:
https://github.com/pnsafonov/fq/tree/postgres
Can you reproduce the problem using the latest release or master branch?
Yes
What did you do?
I am implementing PostgreSQL format parsers. I can offer perfomance fix.
time fq -d pgheap -o flavour=postgres14 ".Pages[0].PageHeaderData.pd_linp[0,-1] | tovalue" 16397_8 { "lp_flags": "LP_NORMAL", "lp_len": 121, "lp_off": 8064 } { "lp_flags": "LP_NORMAL", "lp_len": 121, "lp_off": 384 } real 1m20.079s user 1m21.643s sys 0m0.389s
Then i made fix, and run again:
time fq -d pgheap -o flavour=postgres14 ".Pages[0].PageHeaderData.pd_linp[0,-1] | tovalue" 16397_8 { "lp_flags": "LP_NORMAL", "lp_len": 121, "lp_off": 8064 } { "lp_flags": "LP_NORMAL", "lp_len": 121, "lp_off": 384 } real 0m2.076s user 0m3.452s sys 0m0.242s
This fix decrease execution time 1m20.079s -> 0m2.076s, file decode.go
:
func (d *D) AddChild(v *Value) { v.Parent = d.Value switch fv := d.Value.V.(type) { case *Compound: //if !fv.IsArray { // for _, ff := range fv.Children { // if ff.Name == v.Name { // d.Fatalf("%q already exist in struct %s", v.Name, d.Value.Name) // } // } //} fv.Children = append(fv.Children, v) } }
Also i made some perf with go tools. This was how i found fix:
283.22s 84.00% 84.00% 321.80s 95.44% github.com/wader/fq/pkg/decode.(*D).AddChild 36.65s 10.87% 94.87% 36.65s 10.87% memeqbody 2.42s 0.72% 95.58% 4.94s 1.47% runtime.scanobject 0.94s 0.28% 95.86% 3.75s 1.11% runtime.mallocgc 0.44s 0.13% 95.99% 2.98s 0.88% github.com/wader/fq/pkg/decode.(*D).TryFieldScalarFn.func1 0.25s 0.074% 96.07% 5s 1.48% runtime.gcDrain 0.13s 0.039% 96.11% 2.58s 0.77% runtime.newobject 0.10s 0.03% 96.14% 323.53s 95.95% github.com/wader/fq/pkg/decode.(*D).FillGaps 0.08s 0.024% 96.16% 3.68s 1.09% github.com/wader/fq/pkg/decode.(*D).TryFieldScalarFn 0.03s 0.0089% 96.17% 2.21s 0.66% github.com/wader/fq/pkg/decode.(*D).TryFieldScalarU16 0.03s 0.0089% 96.18% 3.60s 1.07% github.com/wader/fq/pkg/decode.(*D).TryFieldValue 0.02s 0.0059% 96.18% 4.47s 1.33% github.com/wader/fq/pkg/decode.(*D).FieldStruct 0.02s 0.0059% 96.19% 2.25s 0.67% github.com/wader/fq/pkg/decode.(*D).FieldU16 0.02s 0.0059% 96.19% 2.14s 0.63% github.com/wader/gojq.(*compiler).compileModule 0.02s 0.0059% 96.20% 330.10s 97.90% github.com/wader/gojq.(*env).Next 0.01s 0.003% 96.20% 5.86s 1.74% runtime.systemstack 0 0% 96.20% 4.47s 1.33% github.com/wader/fq/format/postgres.decodePgheap 0 0% 96.20% 4.18s 1.24% github.com/wader/fq/format/postgres/flavours/pgproee14.DecodeHeap (inline)
Hey, interesting and nice to see your working on postgres decoders!
I wonder if the pgheap decoder produces structs with lots of fields? can i get the test file "16397_8" somewhere?
Also could you try with the patch below? it will use a bit more memory but should speed up that check. I've thought adding something like this to speed up key indexing from jq anyway but haven't bothered as none of the current decoders produce structs with lots of fields.
diff --git a/pkg/decode/decode.go b/pkg/decode/decode.go
index e899eac3..5ddc834f 100644
--- a/pkg/decode/decode.go
+++ b/pkg/decode/decode.go
@@ -721,11 +721,10 @@ func (d *D) AddChild(v *Value) {
switch fv := d.Value.V.(type) {
case *Compound:
if !fv.IsArray {
- for _, ff := range fv.Children {
- if ff.Name == v.Name {
- d.Fatalf("%q already exist in struct %s", v.Name, d.Value.Name)
- }
+ if _, ok := fv.Keys[v.Name]; ok {
+ d.Fatalf("%q already exist in struct %s", v.Name, d.Value.Name)
}
+ fv.Keys[v.Name] = struct{}{}
}
fv.Children = append(fv.Children, v)
}
diff --git a/pkg/decode/value.go b/pkg/decode/value.go
index 63927f6a..bf82ba04 100644
--- a/pkg/decode/value.go
+++ b/pkg/decode/value.go
@@ -13,6 +13,7 @@ type Compound struct {
IsArray bool
RangeSorted bool
Children []*Value
+ Keys map[string]struct{}
Description string
}
Alternatively we could make that check optional somehow in compile time or runtime as it is quite nice to have it during decoder development.
Link to 16397_8:
https://github.com/pnsafonov/fq_testdata_postgres14/raw/master/16397_8
16397_8 - first 8 mb of 1 GB 16397
Hey, interesting and nice to see your working on postgres decoders!
I wonder if the pgheap decoder produces structs with lots of fields? can i get the test file "16397_8" somewhere?
Also could you try with the patch below? it will use a bit more memory but should speed up that check. I've thought adding something like this to speed up key indexing from jq anyway but haven't bothered as none of the current decoders produce structs with lots of fields.
diff --git a/pkg/decode/decode.go b/pkg/decode/decode.go index e899eac3..5ddc834f 100644 --- a/pkg/decode/decode.go +++ b/pkg/decode/decode.go @@ -721,11 +721,10 @@ func (d *D) AddChild(v *Value) { switch fv := d.Value.V.(type) { case *Compound: if !fv.IsArray { - for _, ff := range fv.Children { - if ff.Name == v.Name { - d.Fatalf("%q already exist in struct %s", v.Name, d.Value.Name) - } + if _, ok := fv.Keys[v.Name]; ok { + d.Fatalf("%q already exist in struct %s", v.Name, d.Value.Name) } + fv.Keys[v.Name] = struct{}{} } fv.Children = append(fv.Children, v) } diff --git a/pkg/decode/value.go b/pkg/decode/value.go index 63927f6a..bf82ba04 100644 --- a/pkg/decode/value.go +++ b/pkg/decode/value.go @@ -13,6 +13,7 @@ type Compound struct { IsArray bool RangeSorted bool Children []*Value + Keys map[string]struct{} Description string }Alternatively we could make that check optional somehow in compile time or runtime as it is quite nice to have it during decoder development.
Thank you for patch.
I got memory issues with fq:
#409
I will think how to implement compile time check or cmd option.
Hey, interesting and nice to see your working on postgres decoders!
I wonder if the pgheap decoder produces structs with lots of fields? can i get the test file "16397_8" somewhere?
Also could you try with the patch below? it will use a bit more memory but should speed up that check. I've thought adding something like this to speed up key indexing from jq anyway but haven't bothered as none of the current decoders produce structs with lots of fields.
diff --git a/pkg/decode/decode.go b/pkg/decode/decode.go index e899eac3..5ddc834f 100644 --- a/pkg/decode/decode.go +++ b/pkg/decode/decode.go @@ -721,11 +721,10 @@ func (d *D) AddChild(v *Value) { switch fv := d.Value.V.(type) { case *Compound: if !fv.IsArray { - for _, ff := range fv.Children { - if ff.Name == v.Name { - d.Fatalf("%q already exist in struct %s", v.Name, d.Value.Name) - } + if _, ok := fv.Keys[v.Name]; ok { + d.Fatalf("%q already exist in struct %s", v.Name, d.Value.Name) } + fv.Keys[v.Name] = struct{}{} } fv.Children = append(fv.Children, v) } diff --git a/pkg/decode/value.go b/pkg/decode/value.go index 63927f6a..bf82ba04 100644 --- a/pkg/decode/value.go +++ b/pkg/decode/value.go @@ -13,6 +13,7 @@ type Compound struct { IsArray bool RangeSorted bool Children []*Value + Keys map[string]struct{} Description string }Alternatively we could make that check optional somehow in compile time or runtime as it is quite nice to have it during decoder development.
Patch works well:
time fq -d pgheap -o flavour=postgres14 ".Pages[0].PageHeaderData.pd_linp[0,-1] | tovalue" 16397_8 { "lp_flags": "LP_NORMAL", "lp_len": 121, "lp_off": 8064 } { "lp_flags": "LP_NORMAL", "lp_len": 121, "lp_off": 384 } real 0m2.227s user 0m3.679s sys 0m0.264s
Pull request with map:
#411
Fixed by #411
BTW feel free to open a PR for the postgres decoder. Had a quick look at your branch and it looks impressive 👍 I think it is usually good to do to a PR earlier to get feedback and start iterating.
Also i usually prefer rebase over merging upstream would that be ok with you? personally think it keeps the history clearer and easier to follow