ropensci / charlatan

Create fake data in R

Home Page: https://docs.ropensci.org/charlatan/

Guidelines about the creation of a new faker/locale?

maelle opened this issue

Based on my experiments with Wikidata in #83, would it make sense to

  • explain in general how to add a new faker

  • explain how to add a locale to a faker (including how to find data via e.g. Wikidata and how to add the code and data to this package)

It might be helpful for potential contributors?

I was also wondering how the package would scale to many new fakers and locales. If too many fakers and locales were added, could charlatan be split into an infrastructure package and locale packages?

Just adding that if the guidelines were very thorough, they could also be published as a tech note as a call for contributions? 🤔

With a catchy title such as "Help us fake (scientific) data" 😂

Thanks @maelle !

Yes, I agree we need better guidance on creating new fakers and locales. Do you think it's best as a vignette? Agreed, it would be good to do a post on it to call for contribs. Nice title idea!

wrt scaling, i also think about that sometimes. i haven't thought yet about how to split it up. I guess one way is to keep the R6 classes, the regular functions that wrap them, and just a few locales (English, Spanish, French?) in one package, and then put all other locales in another package. Or we could split up by types of fakers. I think we're still okay for a while though.
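To make the layering mentioned above concrete (an R6 class, a regular function that wraps it, and locale data kept separate), here is a minimal sketch. It is not charlatan's actual code: `ExampleColorProvider`, `example_color_name()` and the `color_names` list are hypothetical names invented for illustration, and the word lists are placeholders rather than curated locale data.

```r
library(R6)

# Hypothetical locale data. In a split design, entries like these could
# live in a separate locale package instead of the infrastructure package.
color_names <- list(
  en_US = c("red", "green", "blue"),
  fr_FR = c("rouge", "vert", "bleu")
)

# Hypothetical provider: an R6 class that knows its locale and
# draws one value at a time from that locale's data.
ExampleColorProvider <- R6Class(
  "ExampleColorProvider",
  public = list(
    locale = NULL,
    initialize = function(locale = "en_US") {
      # Fail early when no data exists for the requested locale
      if (!locale %in% names(color_names)) {
        stop("unsupported locale: ", locale, call. = FALSE)
      }
      self$locale <- locale
    },
    color_name = function() {
      sample(color_names[[self$locale]], size = 1)
    }
  )
)

# Hypothetical regular function wrapping the R6 class, so users
# can get n values without touching R6 directly.
example_color_name <- function(n = 1, locale = "en_US") {
  provider <- ExampleColorProvider$new(locale = locale)
  vapply(seq_len(n), function(i) provider$color_name(), character(1))
}
```

Under this sketch, adding a locale only means adding one more named entry to `color_names`, which is why the data could plausibly move to another package while the class and wrapper stay put.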

Yes, a vignette. Btw a pkgdown website would be cool, with some grouping in the reference bc there are so many fakers (cf https://ropensci.github.io/dev_guide/building.html#function-grouping 😉 )

good idea

Cool!

  • A pkgdown website would be good. It'd allow you to link to a rendered version of this vignette from CONTRIBUTING.md for instance.
  • When defining a provider maybe add a few examples of types of data, and instead of "This may involve" maybe something more explicit like "A provider might be based on".
  • In Communication maybe add a more explicit statement, e.g. "Open an issue before starting work on a PR".
  • Maybe add some references for folks who'd first need to learn about R6 classes.
  • When listing the address provider files maybe add a link to their GitHub source?
  • Should licencing etc. be discussed when mentioning one can use different data sources for adding locales?
  • At the very beginning maybe underline that contributions are welcome? And maybe add a link to https://ropensci.github.io/dev_guide/contributingguide.html?

thanks for the feedback! i'll incorporate