brianvoe / gofakeit

Random fake data generator written in go

Concurrency

bgadrian opened this issue · comments

Hello, nice job on the package, it looks neat and has good documentation.

I have 2 main issues:

  1. It is not concurrency-safe: it uses the global rand.* functions.
  2. I need one instance for each thread, but all the functions are public package-level functions.

The refactoring will be big, and I don't see a way to keep it backward compatible: all the functions would become methods that use a custom private random generator.

Any ideas?

Can you tell me a little more about your use case? Maybe I'm missing something, but my question would be: why care if it's using the global rand? How does that affect what you're doing?

Thanks.

Hey sure! Thanks for the quick answer.

The global rand is safe for concurrent use, but because it is guarded by a mutex, calls cannot run in parallel, so performance will be a problem.

The second issue is that I want to be able to call Seed(). If that is done concurrently while other reads run in parallel, the result of a given operation is not deterministic, and I need determinism for my use case. Also, the rand.Seed() docs state that it should not be called in parallel with any other call.

So basically I want to be able to have one instance of a random generator (and implicitly of gofakeit) for each goroutine. Other packages, like https://github.com/valyala/fastrand, solve these problems,
but just using a new instance of gofakeit with a custom new Source should be enough for me.
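
To make the request concrete, here is a minimal sketch of the one-generator-per-goroutine idea using only the standard library. The helper names `generate` and `generateParallel` are made up for illustration and are not part of gofakeit or fastrand. Each worker owns a private `*rand.Rand`, so no mutex is shared and the output depends only on the seed.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
)

// generate returns n pseudo-random ints from a private generator seeded
// with seed. The *rand.Rand is not safe for concurrent use, so it stays
// local to the calling goroutine. (Hypothetical helper.)
func generate(seed int64, n int) []int {
	r := rand.New(rand.NewSource(seed))
	out := make([]int, n)
	for i := range out {
		out[i] = r.Intn(1000)
	}
	return out
}

// generateParallel starts one goroutine per seed and stores each result
// at the index of its seed, so the output order is stable.
// (Hypothetical helper.)
func generateParallel(seeds []int64, n int) [][]int {
	results := make([][]int, len(seeds))
	var wg sync.WaitGroup
	for i, seed := range seeds {
		wg.Add(1)
		go func(i int, seed int64) {
			defer wg.Done()
			results[i] = generate(seed, n)
		}(i, seed)
	}
	wg.Wait()
	return results
}

func main() {
	a := generateParallel([]int64{1, 2, 3}, 4)
	b := generateParallel([]int64{1, 2, 3}, 4)
	// The same seeds give the same values, regardless of scheduling.
	fmt.Println(fmt.Sprint(a) == fmt.Sprint(b))
}
```

The trade-off is that every caller has to carry a generator around, which is exactly what package-level functions avoid.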

My use case is generating hundreds of GB of deterministic (pseudo) random data in a very short amount of time.

Thanks for the explanation. That seems like a pretty interesting challenge, and I think you're right: to make that work for your specific situation you would need a specific seeder, and that would break backwards compatibility for gofakeit.

I'm not against it, and I do have other things I could update that would need a full version bump. But if it breaks the simplicity of doing something like gofakeit.address() and requires a bunch of additional initialization, or requires pulling in third-party packages for the randomizer, it would be a tough sell for me.

I'm up for suggestions, so let me know your thoughts.

OK, let me know if you want to put something together that I can look at.

So far I have achieved ~240MB/s of generated data with a few cores on my machine, so I think that is enough for now. https://github.com/bgadrian/pseudoservice

Before embarking on this huge refactoring, I want to get used to the package and make some smaller improvements.

image

I'm back with a real example: most of the time in the gofakeit calls is spent in the mutex (see the pink cells at the bottom), most predominantly in shuffleInts, because it requires many random values.

That's interesting.

Today I found another downside of having public package-level functions: I cannot list them all using reflection.

I want my API to be able to call all the gofakeit functions through simple variables. Instead of listing them manually, I want to build the map dynamically, like ["en_name"] = gofakeit.name, and it seems you cannot list a package's functions: https://stackoverflow.com/questions/41629293/how-do-i-list-the-public-methods-of-a-package-in-golang
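
For comparison, the methods of a type can be enumerated with the reflect package, which is why a struct-based design would make such a map possible. A rough sketch, where `Faker`, its two methods, and `buildRegistry` are hypothetical stand-ins rather than gofakeit code:

```go
package main

import (
	"fmt"
	"reflect"
)

// Faker is a hypothetical stand-in for a struct-based generator.
type Faker struct{}

func (Faker) Name() string  { return "Jane Doe" }
func (Faker) Email() string { return "jane@example.com" }

// buildRegistry maps each exported method name to a callable.
// Only methods with the signature func() string are included;
// anything else is skipped by the type assertion.
func buildRegistry(f Faker) map[string]func() string {
	reg := make(map[string]func() string)
	v := reflect.ValueOf(f)
	t := v.Type()
	for i := 0; i < t.NumMethod(); i++ {
		if fn, ok := v.Method(i).Interface().(func() string); ok {
			reg[t.Method(i).Name] = fn
		}
	}
	return reg
}

func main() {
	reg := buildRegistry(Faker{})
	for name, fn := range reg {
		fmt.Println(name, "->", fn())
	}
}
```

Reflection runs once while the map is built; later calls go through plain function values.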

So I will start refactoring to a struct sooner rather than later.

I'm open to suggestions, but I'm not a big fan of reflection. The great thing about Go is its explicit declaration of usage; if we get into the realm of reflection we lose that, and performance along with it.

Me neither. I am using reflection just once, to populate the map before compilation, so I do not have to write out all the gofakeit functions manually and update them each time gofakeit gets new functionality.

The problem will be solved when I refactor the global functions into methods of a struct, as I mentioned before. I just wanted to state another reason for this refactoring, besides the performance concerns.

Yeah, I hear ya.

Hey,
So I finished earlier than I expected; it only took around 4-5 hours, and I managed to make a functional version:

bgadrian/fastfaker@f75c357

Please check it out and tell me what you think. I did not know what to name the global instance, so for now it is Global (naming is the hardest thing in programming).

The update is backward compatible, except on how the actual function calls are made (now as methods).

Oh wow, this is big! OK, I'll take a look at it and really give it some thought!

Besides the function-to-method move, I also fixed other small stuff, so it bloated the PR, sorry.

The good news is that the initial benchmarks look promising for parallel processing.

Hey @bgadrian

Just making sure you know I didn't forget about this.

I talked with some other users of the package and showed them the proposal you put together, and we all agreed it would have been a fine idea had the package started off this way from the beginning.

The problem we ran into is that the other 99% of people who use gofakeit would not benefit from this, and it would require them to rewrite all of their usage in order to continue using gofakeit. So as it stands right now, I don't think I can put this in. That being said, to convince people that this update is worth rewriting what they currently have, we would need other features that were a true benefit to them.

So I'm not saying no to this, but we would need a more significant benefit for people to make the change in this direction.

Thanks!

Thanks, I understand, but the rewrite would consist of a simple search and replace in a project ("gofakeit" to "gofakeit.Global").

I will make a hard fork then and if you ever reconsider we can merge later.

Understandable. Thanks anyway.

@brianvoe Would it be possible to link to @bgadrian's fork with a note in README.md of this repo?
New users can make the appropriate decision then.

Yeah, if you want to do a PR for that update, I'll accept it. Something subtle that describes its purpose.

I would also need confirmation that @bgadrian would be willing to maintain any updates in his repo as well.

I will bring in updates from gofakeit as much as possible. I kept a branch for pulling in and pushing updates; the repo is https://github.com/bgadrian/fastfaker

The difference from gofakeit (besides the path, name and version) is that instead of gofakeit.Name(), the calls are fastfaker.Global.Name() or fastfaker.NewFastFaker().Name().
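
The general shape of that API, a struct with methods plus one shared package-level instance, can be sketched as follows. This is an illustration under assumptions, not fastfaker's actual code: the `Faker` type, the `NewFaker` constructor, and the tiny `names` list are all invented here.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Faker is a hypothetical sketch of a generator that owns its random
// source. It is not fastfaker's real type.
type Faker struct {
	rnd *rand.Rand
}

// NewFaker returns a Faker with a private, seeded generator.
func NewFaker(seed int64) *Faker {
	return &Faker{rnd: rand.New(rand.NewSource(seed))}
}

// names is a tiny sample data set for the sketch.
var names = []string{"Ada", "Grace", "Linus", "Ken"}

// Name picks a name using this instance's generator only.
func (f *Faker) Name() string {
	return names[f.rnd.Intn(len(names))]
}

// Global is a shared default instance for callers that do not need
// their own. In this sketch it is NOT safe for concurrent use, because
// a *rand.Rand created with rand.New is not.
var Global = NewFaker(1)

func main() {
	fmt.Println(Global.Name()) // simple, package-style usage

	// Two instances with the same seed produce the same sequence.
	a, b := NewFaker(42), NewFaker(42)
	fmt.Println(a.Name() == b.Name())
}
```

With this layout, existing call sites change only by the extra `Global.` qualifier, while parallel workers can each call the constructor with their own seed.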

I also plan to add Unicode support ASAP.

Also, I do not guarantee that deterministic backward compatibility will be kept in the future: if you call gofakeit.Seed(11) and gofakeit.Name(), the same seed in fastfaker may give a different result.