faker-js / faker

Generate massive amounts of fake data in the browser and node.js

Home Page: https://fakerjs.dev

Proposal: Use a single seed value per faker function invocation

ST-DDT opened this issue

Clear and concise description of the problem

Glossary:

  • seed value: a value obtained through a direct or indirect invocation of randomizer.next() (or an equivalent)

Currently, invoking a faker function may consume an unknown number of seed values (0 to n).
While that isn't bad by itself, it has a side effect: any change to the implementation affects the user, because all subsequently generated values change. This is especially relevant for functions that generate multiple values, such as multiple and unique: each generated element affects the elements generated after it within the same call, as well as all values generated afterwards.
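To make the problem concrete, here is a minimal sketch. It uses mulberry32, a tiny seeded PRNG, as a stand-in for faker's randomizer (an assumption for brevity), and two hypothetical versions of a digit function that differ only in how many seed values they consume. The digit itself is not the point; every value generated afterwards moves by one position in the stream.

```typescript
// mulberry32: a tiny seeded PRNG, a stand-in for faker's randomizer in this sketch.
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

type Next = () => number;

// Hypothetical "before" implementation: consumes one seed value.
const digitV1 = (next: Next): number => Math.floor(next() * 10);

// Hypothetical "after" implementation: same output range, but an internal
// change now makes it consume two seed values.
const digitV2 = (next: Next): number => {
  next(); // e.g. an extra call introduced by a refactoring
  return Math.floor(next() * 10);
};

// Generates one digit, then three more values, as a user's follow-up calls would.
function session(digit: (next: Next) => number): number[] {
  const next = mulberry32(42);
  digit(next);
  return [next(), next(), next()];
}

const before = session(digitV1);
const after = session(digitV2);
// Every follow-up value moved by one position in the stream.
console.log(after[0] === before[1], after[1] === before[2]); // true true
```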

Suggested solution

Let each method consume only a single seed value; a method that needs more than one derives a new instance from that single value.
Each method would be responsible for ensuring that it consumes only a single seed value.

function multiple<T>(
  fakerCore: Faker | FakerCore,
  generator: (fakerCore: Faker | FakerCore, index: number) => T,
  options: { count: number | { min: number; max: number } },
): T[] {
  // consumes a single seed value from the original instance
  const derived = fakerCore.derive();
  // consumes one seed value from the derived instance
  const count = rangeToNumber(derived, options.count);
  // each generator call consumes further seed values from `derived` only;
  // a generator needing more than one value would derive by itself, and even if it doesn't,
  // multiple still upholds its contract and behaves better than a simple for loop
  return Array.from({ length: count }, (_, i) => generator(derived, i));
}
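The proposed multiple can be exercised end to end with a hypothetical FakerCore that has only next() and derive(), built on mulberry32 as a stand-in generator, plus a simple rangeToNumber helper. None of these are faker's actual API; the consumed counter exists only to show how many seed values each instance hands out. Two generators with different per-element costs leave their parents in the same state:

```typescript
// Hypothetical stand-in for the proposed FakerCore (not faker's real API):
// a mulberry32 stream with next() and derive().
class FakerCore {
  consumed = 0; // how many seed values this instance has handed out
  private state: number;

  constructor(seed: number) {
    this.state = seed >>> 0;
  }

  // Returns a float in [0, 1) and consumes exactly one seed value.
  next(): number {
    this.consumed++;
    this.state = (this.state + 0x6d2b79f5) >>> 0;
    let t = this.state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  }

  // Consumes one seed value and uses it to seed an independent child instance.
  derive(): FakerCore {
    return new FakerCore(Math.floor(this.next() * 4294967296));
  }
}

type Count = number | { min: number; max: number };

// Hypothetical rangeToNumber: a fixed count costs nothing, a range costs one seed value.
function rangeToNumber(core: FakerCore, count: Count): number {
  if (typeof count === 'number') return count;
  return count.min + Math.floor(core.next() * (count.max - count.min + 1));
}

function multiple<T>(
  core: FakerCore,
  generator: (core: FakerCore, index: number) => T,
  options: { count: Count },
): T[] {
  const derived = core.derive(); // the only seed value taken from `core`
  const count = rangeToNumber(derived, options.count);
  return Array.from({ length: count }, (_, i) => generator(derived, i));
}

const cheap = (core: FakerCore) => core.next(); // one seed value per element
const costly = (core: FakerCore) => core.next() + core.next(); // two per element

const a = new FakerCore(7);
const fewValues = multiple(a, cheap, { count: { min: 2, max: 5 } });
const b = new FakerCore(7);
const manyValues = multiple(b, costly, { count: 10 });

const takenFromA = a.consumed;
const takenFromB = b.consumed;
console.log(takenFromA, takenFromB); // 1 1
console.log(a.next() === b.next()); // true: both parents are still in sync
```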

Important usage detail: the fakerCore instance must be passed on to, and used by, any nested code.

Deriving an instance does come at a performance cost, but we could keep it cheap if we make that a priority when implementing derive and use standalone functions, as teased in the code example.

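For example, by not re-initializing the twister from scratch, but only copying and transforming its state:
derive() {
  let previous = this.next(); // consumes exactly one value from the parent twister
  // transform a copy of the state instead of re-initializing from scratch
  const stateCopy = this.state.map((old) => {
    const mixed = (old + previous + aRandomStaticValue) >>> 0;
    previous = old;
    return mixed;
  });
  return new Twister(stateCopy);
}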
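A runnable version of the state-transformation idea, with one swap: it uses xoshiro128** (four 32-bit state words) instead of the Mersenne Twister so the sketch stays short. The derive() here is hypothetical: it consumes exactly one value from the parent and mixes it through a copy of the state with an arbitrary constant. It illustrates the mechanics only; it is not a vetted stream-splitting scheme and says nothing about performance.

```typescript
// Rotates a 32-bit unsigned integer left by k bits.
const rotl = (x: number, k: number): number => ((x << k) | (x >>> (32 - k))) >>> 0;

// xoshiro128** generator, used in this sketch instead of the Mersenne Twister for brevity.
class Xoshiro128 {
  constructor(private readonly state: Uint32Array) {}

  nextUint32(): number {
    const s = this.state;
    const result = Math.imul(rotl(Math.imul(s[1], 5) >>> 0, 7), 9) >>> 0;
    const t = (s[1] << 9) >>> 0;
    s[2] ^= s[0];
    s[3] ^= s[1];
    s[1] ^= s[2];
    s[0] ^= s[3];
    s[2] ^= t;
    s[3] = rotl(s[3], 11);
    return result;
  }

  // Hypothetical derive(): instead of re-seeding, consume exactly one value from this
  // generator and mix it through a *copy* of the state. The constant is arbitrary.
  derive(): Xoshiro128 {
    let previous = this.nextUint32();
    const stateCopy = this.state.map((old) => {
      const mixed = (old + previous + 0x9e3779b9) >>> 0;
      previous = old;
      return mixed;
    });
    return new Xoshiro128(stateCopy);
  }
}

const parentGen = new Xoshiro128(Uint32Array.of(1, 2, 3, 4));
const childGen = parentGen.derive();
const childFirst = childGen.nextUint32();

// A twin that only skips one value ends up in the same state as the parent.
const twinGen = new Xoshiro128(Uint32Array.of(1, 2, 3, 4));
twinGen.nextUint32();
console.log(parentGen.nextUint32() === twinGen.nextUint32()); // true
```

Because the mixing is ad hoc, the statistical quality of derived streams would need checking before adopting anything like it; the "re-initializing" alternative would instead seed a fresh generator from the consumed value.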

(We should still measure how much of a performance difference "re-initializing" vs. "copying and transforming the state" actually makes.)

It also costs additional code, though we could hide that in our potential meta framework.

This section is largely unrelated to the proposal itself; it only demonstrates how derive could be integrated into the potential meta framework.

function multiple<T>(
  fakerCore: Faker | FakerCore,
  generator: (fakerCore: Faker | FakerCore, index: number) => T,
  options: { count: number | { min: number; max: number } },
): T[] {
  const count = rangeToNumber(fakerCore, options.count);
  return Array.from({ length: count }, (_, i) => generator(fakerCore, i));
}

--- Autogenerated same file

declare function boundMultiple<T>(generator: (fakerCore: Faker | FakerCore, index: number) => T, options: { count: number | { min: number; max: number } }): T[];
[...]
export const multiple = fakerize<typeof multiple, typeof boundMultiple>(multiple, {derive: true, isCallable: ...});
[...]
function fakerize<TRaw, TBound>(fn: TRaw, options): Fakerized<TRaw, TBound> {
  // keep the raw implementation: calling the reassigned `fn` inside the wrapper would recurse forever
  const raw = fn;
  if (options.derive) {
    fn = (fakerCore, ...args) => raw(fakerCore.derive(), ...args);
  }
  fn.bindTo = (fakerCore) => ((...args) => fn(fakerCore, ...args)) as TBound;
  fn.isCallable = ...;
  return fn;
}
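The fakerize wrapper can also be sketched as runnable TypeScript. Everything here is hypothetical: FakerCore is a mulberry32-backed stand-in with derive(), Fakerized only models the call signature plus bindTo (isCallable is left out), and digit and multiple are toy raw functions. Keeping the raw function in its own binding is what prevents the wrapper from calling itself.

```typescript
// Hypothetical stand-in for the proposed FakerCore (mulberry32 stream + derive()).
class FakerCore {
  consumed = 0; // how many seed values this instance has handed out
  private state: number;
  constructor(seed: number) {
    this.state = seed >>> 0;
  }
  next(): number {
    this.consumed++;
    this.state = (this.state + 0x6d2b79f5) >>> 0;
    let t = this.state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  }
  derive(): FakerCore {
    return new FakerCore(Math.floor(this.next() * 4294967296));
  }
}

// Call signature of a fakerized function plus the bindTo helper from the proposal.
interface Fakerized<A extends unknown[], R> {
  (core: FakerCore, ...args: A): R;
  bindTo(core: FakerCore): (...args: A) => R;
}

function fakerize<A extends unknown[], R>(
  raw: (core: FakerCore, ...args: A) => R,
  options: { derive: boolean },
): Fakerized<A, R> {
  // `raw` is never reassigned, so the wrapper cannot accidentally call itself.
  const wrapped = (core: FakerCore, ...args: A): R =>
    raw(options.derive ? core.derive() : core, ...args);
  return Object.assign(wrapped, {
    bindTo: (core: FakerCore) => (...args: A): R => wrapped(core, ...args),
  });
}

// Hypothetical raw digit: one seed value per call, so no derive needed.
const digit = fakerize((core: FakerCore) => Math.floor(core.next() * 10), { derive: false });

// Hypothetical raw multiple; fakerize adds the derive step.
const multiple = fakerize(
  (core: FakerCore, generator: (core: FakerCore) => number, count: number): number[] =>
    Array.from({ length: count }, () => generator(core)),
  { derive: true },
);

const fakerEN = new FakerCore(1);
const multipleEN = multiple.bindTo(fakerEN);
const digits = multipleEN(digit, 4);
console.log(digits.length, fakerEN.consumed); // 4 1
```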

Usage

multiple(fakerEN, (fakerCore) => firstName(fakerCore), 5);
// or multiple(fakerEN, firstName, 5);

const multipleEN = multiple.bindTo(fakerEN);
multipleEN((fakerCore) => firstName(fakerCore), 5);
// or multipleEN(firstName, 5);

// Why not like this?
const firstNameEN = firstName.bindTo(fakerEN);
multipleEN(() => firstNameEN(), 5);
// or
multiple(fakerCore, () => firstNameEN(), 5);

// Because firstNameEN would consume the seeds directly from the bound fakerEN instance
// and thus bypass most benefits from `derive`.
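The difference between passing firstName and closing over a bound firstNameEN can be measured by counting how many seed values fakerEN hands out. This sketch uses a hypothetical mulberry32-backed FakerCore, a simplified multiple that takes a plain count, and a hypothetical three-name firstName; none of these are faker's actual API.

```typescript
// Hypothetical stand-in for the proposed FakerCore (mulberry32 stream + derive()).
class FakerCore {
  consumed = 0; // how many seed values this instance has handed out
  private state: number;
  constructor(seed: number) {
    this.state = seed >>> 0;
  }
  next(): number {
    this.consumed++;
    this.state = (this.state + 0x6d2b79f5) >>> 0;
    let t = this.state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  }
  derive(): FakerCore {
    return new FakerCore(Math.floor(this.next() * 4294967296));
  }
}

// Hypothetical raw firstName: consumes one seed value from the core it is given.
const firstName = (core: FakerCore) => ['Ada', 'Grace', 'Linus'][Math.floor(core.next() * 3)];

// Simplified multiple from the proposal: derives once, then passes the derived core on.
function multiple<T>(core: FakerCore, generator: (core: FakerCore, index: number) => T, count: number): T[] {
  const derived = core.derive();
  return Array.from({ length: count }, (_, i) => generator(derived, i));
}

// Counts how many seed values fakerEN hands out while generating `count` names.
function consumedFromFakerEN(count: number, bypass: boolean): number {
  const fakerEN = new FakerCore(1);
  const firstNameEN = () => firstName(fakerEN); // "bound": ignores the core it is passed
  if (bypass) {
    multiple(fakerEN, () => firstNameEN(), count);
  } else {
    multiple(fakerEN, firstName, count);
  }
  return fakerEN.consumed;
}

console.log(consumedFromFakerEN(5, false)); // 1: only the derive call
console.log(consumedFromFakerEN(5, true)); // 6: the derive call plus one per name
```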

Alternative

Don't change the current behavior.

Additional context

Relevant issues/PRs:

  • #1499 (outdated demonstration of the feature)
  • #1250 (required for performance reasons)

Potentially impacted issues/PRs:

  • #2661 (due to the one seed per invocation feature)

We have already discussed this feature quite a bit, but I would like to have this issue to bring everybody up to speed
and to give everybody the chance to comment and react.

How would that work, though, with fake patterns, since different patterns can have different numbers of placeholders?

fake would assume that the pattern requires multiple seed values and always derive at the start of the method.
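A sketch of that answer, assuming a hypothetical mulberry32-backed FakerCore with derive() and a made-up resolver table with two placeholders ({{letter}} and {{int}}); only the double-brace placeholder style mirrors faker's fake patterns. Because fake derives before parsing, the caller always loses exactly one seed value, whether the pattern has one placeholder or three:

```typescript
// Hypothetical stand-in for the proposed FakerCore (mulberry32 stream + derive()).
class FakerCore {
  consumed = 0; // how many seed values this instance has handed out
  private state: number;
  constructor(seed: number) {
    this.state = seed >>> 0;
  }
  next(): number {
    this.consumed++;
    this.state = (this.state + 0x6d2b79f5) >>> 0;
    let t = this.state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  }
  derive(): FakerCore {
    return new FakerCore(Math.floor(this.next() * 4294967296));
  }
}

// Made-up placeholder resolvers; each consumes one seed value from the core it is given.
const resolvers: Record<string, (core: FakerCore) => string> = {
  letter: (core) => String.fromCharCode(97 + Math.floor(core.next() * 26)),
  int: (core) => String(Math.floor(core.next() * 100)),
};

// Sketch of fake(): derive first, then resolve every {{placeholder}} from the derived core.
function fake(core: FakerCore, pattern: string): string {
  const derived = core.derive(); // the caller loses exactly one seed value
  return pattern.replace(/\{\{([\w.]+)\}\}/g, (match: string, key: string) => {
    const resolve = resolvers[key];
    return resolve ? resolve(derived) : match; // unknown placeholders are left as-is
  });
}

const shortCore = new FakerCore(3);
const shortResult = fake(shortCore, '{{letter}}');
const longCore = new FakerCore(3);
const longResult = fake(longCore, '{{letter}}{{int}}-{{int}}');
console.log(shortCore.consumed, longCore.consumed); // 1 1
```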

Team Decision

  • We want to do this conceptually, but aren't sure about the exact implementation requirements.

Blocked by #2667