wader / fq

jq for binary formats - tool, language and decoders for working with binary and text formats

Increase performance

pnsafonov opened this issue · comments

What version are you using (fq -v)?

$ fq -v
0.0.9 (linux amd64)

How was fq installed?

fq is built from source.

My branch:
https://github.com/pnsafonov/fq/tree/postgres

Can you reproduce the problem using the latest release or master branch?

Yes

What did you do?

I am implementing PostgreSQL format parsers and can offer a performance fix.

time fq -d pgheap -o flavour=postgres14 ".Pages[0].PageHeaderData.pd_linp[0,-1] | tovalue" 16397_8
{
  "lp_flags": "LP_NORMAL",
  "lp_len": 121,
  "lp_off": 8064
}
{
  "lp_flags": "LP_NORMAL",
  "lp_len": 121,
  "lp_off": 384
}

real    1m20.079s
user    1m21.643s
sys     0m0.389s

Then I made a fix and ran it again:

time fq -d pgheap -o flavour=postgres14 ".Pages[0].PageHeaderData.pd_linp[0,-1] | tovalue" 16397_8
{
  "lp_flags": "LP_NORMAL",
  "lp_len": 121,
  "lp_off": 8064
}
{
  "lp_flags": "LP_NORMAL",
  "lp_len": 121,
  "lp_off": 384
}

real    0m2.076s
user    0m3.452s
sys     0m0.242s

This fix decreases execution time from 1m20.079s to 0m2.076s. The change is in decode.go:

func (d *D) AddChild(v *Value) {
	v.Parent = d.Value

	switch fv := d.Value.V.(type) {
	case *Compound:
		//if !fv.IsArray {
		//	for _, ff := range fv.Children {
		//		if ff.Name == v.Name {
		//			d.Fatalf("%q already exist in struct %s", v.Name, d.Value.Name)
		//		}
		//	}
		//}
		fv.Children = append(fv.Children, v)
	}
}

I also profiled it with the Go tools, which is how I found the fix:

   283.22s 84.00% 84.00%    321.80s 95.44%  github.com/wader/fq/pkg/decode.(*D).AddChild
    36.65s 10.87% 94.87%     36.65s 10.87%  memeqbody
     2.42s  0.72% 95.58%      4.94s  1.47%  runtime.scanobject
     0.94s  0.28% 95.86%      3.75s  1.11%  runtime.mallocgc
     0.44s  0.13% 95.99%      2.98s  0.88%  github.com/wader/fq/pkg/decode.(*D).TryFieldScalarFn.func1
     0.25s 0.074% 96.07%         5s  1.48%  runtime.gcDrain
     0.13s 0.039% 96.11%      2.58s  0.77%  runtime.newobject
     0.10s  0.03% 96.14%    323.53s 95.95%  github.com/wader/fq/pkg/decode.(*D).FillGaps
     0.08s 0.024% 96.16%      3.68s  1.09%  github.com/wader/fq/pkg/decode.(*D).TryFieldScalarFn
     0.03s 0.0089% 96.17%      2.21s  0.66%  github.com/wader/fq/pkg/decode.(*D).TryFieldScalarU16
     0.03s 0.0089% 96.18%      3.60s  1.07%  github.com/wader/fq/pkg/decode.(*D).TryFieldValue
     0.02s 0.0059% 96.18%      4.47s  1.33%  github.com/wader/fq/pkg/decode.(*D).FieldStruct
     0.02s 0.0059% 96.19%      2.25s  0.67%  github.com/wader/fq/pkg/decode.(*D).FieldU16
     0.02s 0.0059% 96.19%      2.14s  0.63%  github.com/wader/gojq.(*compiler).compileModule
     0.02s 0.0059% 96.20%    330.10s 97.90%  github.com/wader/gojq.(*env).Next
     0.01s 0.003% 96.20%      5.86s  1.74%  runtime.systemstack
         0     0% 96.20%      4.47s  1.33%  github.com/wader/fq/format/postgres.decodePgheap
         0     0% 96.20%      4.18s  1.24%  github.com/wader/fq/format/postgres/flavours/pgproee14.DecodeHeap (inline)
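
One way to read the profile above: AddChild dominates, and memeqbody (which looks like the runtime's memory comparison used for string equality) comes right after it. That pattern would fit a duplicate-name check that scans all existing children on every append, so the work grows quadratically with the number of fields. The sketch below is only a model of that behavior, not fq code; `countComparisons` and the `field_%d` names are made up for illustration.

```go
package main

import "fmt"

// countComparisons models a linear duplicate check: before appending the
// next name, every existing name is compared against it. It returns the
// total number of string comparisons done for n unique names.
// This is a hypothetical model, not code from fq.
func countComparisons(n int) int {
	names := make([]string, 0, n)
	comparisons := 0
	for i := 0; i < n; i++ {
		name := fmt.Sprintf("field_%d", i)
		for _, existing := range names {
			comparisons++
			if existing == name {
				panic("duplicate name") // never reached: names are unique
			}
		}
		names = append(names, name)
	}
	return comparisons
}

func main() {
	// The count grows as n*(n-1)/2, so 10x more fields costs about 100x more.
	for _, n := range []int{1000, 10000} {
		fmt.Printf("n=%d comparisons=%d\n", n, countComparisons(n))
	}
}
```

If that reading is right, disabling the check removes the quadratic term entirely, which would match the drop from about 80 seconds to about 2 seconds.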

Hey, interesting, and nice to see you're working on Postgres decoders!

I wonder if the pgheap decoder produces structs with lots of fields? Can I get the test file "16397_8" somewhere?

Also, could you try the patch below? It will use a bit more memory but should speed up that check. I've thought about adding something like this to speed up key indexing from jq anyway, but haven't bothered as none of the current decoders produce structs with lots of fields.

diff --git a/pkg/decode/decode.go b/pkg/decode/decode.go
index e899eac3..5ddc834f 100644
--- a/pkg/decode/decode.go
+++ b/pkg/decode/decode.go
@@ -721,11 +721,10 @@ func (d *D) AddChild(v *Value) {
        switch fv := d.Value.V.(type) {
        case *Compound:
                if !fv.IsArray {
-                       for _, ff := range fv.Children {
-                               if ff.Name == v.Name {
-                                       d.Fatalf("%q already exist in struct %s", v.Name, d.Value.Name)
-                               }
+                       if _, ok := fv.Keys[v.Name]; ok {
+                               d.Fatalf("%q already exist in struct %s", v.Name, d.Value.Name)
                        }
+                       fv.Keys[v.Name] = struct{}{}
                }
                fv.Children = append(fv.Children, v)
        }
diff --git a/pkg/decode/value.go b/pkg/decode/value.go
index 63927f6a..bf82ba04 100644
--- a/pkg/decode/value.go
+++ b/pkg/decode/value.go
@@ -13,6 +13,7 @@ type Compound struct {
        IsArray     bool
        RangeSorted bool
        Children    []*Value
+       Keys        map[string]struct{}
        Description string
 }
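
For anyone trying the idea outside the fq tree, here is a minimal standalone sketch of the same map-based check. `Value` and `Compound` are cut-down stand-ins for the real types in pkg/decode, and `addChild` returns an error instead of calling `d.Fatalf`; both are assumptions made to keep the example self-contained. One detail the diff does not show is where `Keys` gets allocated: in Go, assigning into a nil map panics, so the sketch allocates it lazily.

```go
package main

import "fmt"

// Value and Compound are simplified stand-ins for the types in pkg/decode.
type Value struct {
	Name string
}

type Compound struct {
	IsArray  bool
	Children []*Value
	Keys     map[string]struct{} // names already added; used for structs only
}

// addChild appends v to c. For structs it rejects a duplicate name with a
// single map lookup instead of scanning all existing children.
func addChild(c *Compound, v *Value) error {
	if !c.IsArray {
		if _, ok := c.Keys[v.Name]; ok { // reading from a nil map is safe
			return fmt.Errorf("%q already exists in struct", v.Name)
		}
		if c.Keys == nil {
			c.Keys = map[string]struct{}{} // writing to a nil map would panic
		}
		c.Keys[v.Name] = struct{}{}
	}
	c.Children = append(c.Children, v)
	return nil
}

func main() {
	c := &Compound{}
	fmt.Println(addChild(c, &Value{Name: "lp_off"}))
	fmt.Println(addChild(c, &Value{Name: "lp_len"}))
	fmt.Println(addChild(c, &Value{Name: "lp_off"})) // duplicate, returns an error
}
```

The cost is one map entry per struct field, which is the "bit more memory" mentioned above.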

Alternatively, we could make that check optional somehow, at compile time or at runtime, as it is quite nice to have during decoder development.
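
A rough sketch of what a runtime switch could look like, assuming a hypothetical `Options` struct with a `CheckDuplicates` field (neither exists in fq as far as this thread shows). The types are again simplified stand-ins, not the real pkg/decode ones.

```go
package main

import "fmt"

// Value and Compound are simplified stand-ins for the types in pkg/decode.
type Value struct {
	Name string
}

type Compound struct {
	Children []*Value
	Keys     map[string]struct{}
}

// Options is hypothetical: CheckDuplicates would be switched on while
// developing a decoder and off when decoding large files.
type Options struct {
	CheckDuplicates bool
}

// addChild appends v to c and only pays for the duplicate-name check
// when the option asks for it.
func addChild(opts Options, c *Compound, v *Value) error {
	if opts.CheckDuplicates {
		if _, ok := c.Keys[v.Name]; ok {
			return fmt.Errorf("%q already exists in struct", v.Name)
		}
		if c.Keys == nil {
			c.Keys = map[string]struct{}{}
		}
		c.Keys[v.Name] = struct{}{}
	}
	c.Children = append(c.Children, v)
	return nil
}

func main() {
	strict := Options{CheckDuplicates: true}
	c := &Compound{}
	fmt.Println(addChild(strict, c, &Value{Name: "pd_lsn"}))
	fmt.Println(addChild(strict, c, &Value{Name: "pd_lsn"})) // rejected

	relaxed := Options{}
	c2 := &Compound{}
	fmt.Println(addChild(relaxed, c2, &Value{Name: "pd_lsn"}))
	fmt.Println(addChild(relaxed, c2, &Value{Name: "pd_lsn"})) // accepted
}
```

A compile-time variant could instead put the flag in a constant chosen by a Go build constraint (a `//go:build` tag), so the check is compiled out of release builds; the tag name would be up to the project.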

Link to 16397_8:
https://github.com/pnsafonov/fq_testdata_postgres14/raw/master/16397_8

16397_8 is the first 8 MB of the 1 GB file 16397.

Thank you for the patch.
I ran into memory issues with fq:
#409
I will think about how to implement a compile-time check or a command-line option.

The patch works well:

time fq -d pgheap -o flavour=postgres14 ".Pages[0].PageHeaderData.pd_linp[0,-1] | tovalue" 16397_8
{
  "lp_flags": "LP_NORMAL",
  "lp_len": 121,
  "lp_off": 8064
}
{
  "lp_flags": "LP_NORMAL",
  "lp_len": 121,
  "lp_off": 384
}

real    0m2.227s
user    0m3.679s
sys     0m0.264s

Pull request with the map:
#411

Fixed by #411

BTW, feel free to open a PR for the Postgres decoder. I had a quick look at your branch and it looks impressive 👍 I think it is usually good to open a PR early to get feedback and start iterating.

Also, I usually prefer rebasing over merging upstream; would that be OK with you? Personally, I think it keeps the history clearer and easier to follow.