A need arose for large, random-but-real-looking data sets and, like any proper software developer, I immediately took things too far. I also identified - and took advantage of - an opportunity to learn a lot more about C#'s impressive reflective capabilities.
As for the name, I'm naturally terrible at naming things so I started simply with just "Random Generator". That shortened into "Rg" which, thanks to high school chemistry, I'd recalled was a chemical symbol. Then I learned it is extremely radioactive, which of course is one of the best naturally-occurring sources of true randomness. So the name stuck.
Implemented as a .NET Core REST API, Rg can be built & run on any major modern OS: run `dotnet build` in the project directory (the same one in which `Roentgenium.csproj` lives).
- .NET Core 2.2 or later
- Microsoft Azure account:
  - KeyVault can be used to store secrets (such as connection strings)
  - Blob storage can be used to store the resulting artifacts
- A Redis instance:
  - To use the `stream` output format
Run `dotnet run` in the project directory.
- The `ASPNETCORE_ENVIRONMENT` environment variable directly controls (via simple substitution) which `appsettings.*.json` file is used: `appsettings.{ASPNETCORE_ENVIRONMENT}.json`.
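For example, setting the variable before launch selects the corresponding settings file (the environment name "Staging" here is purely illustrative):

```shell
# "Staging" is an illustrative environment name, not one shipped with Rg
export ASPNETCORE_ENVIRONMENT=Staging

# Rg would then read: appsettings.Staging.json
echo "appsettings.${ASPNETCORE_ENVIRONMENT}.json"

# then launch as usual:
# dotnet run
```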
In the simplest, default mode, generated data sets will be persisted only via the Filesystem persistence module with the artifacts written into the working directory.
The build artifacts (`Roentgenium.dll` and its brethren in `bin/{CONFIG}/netcoreapp2.2`) are relocatable and can be run directly via the `dotnet` tool by eliding (oddly enough) the `run` verb and specifying the `dll` path itself:

```
dotnet bin/Release/netcoreapp2.2/Roentgenium.dll
```
Rg will always look in the working directory for the appropriate `appsettings` file, so if run directly from the `bin/Release/netcoreapp2.2/` directory without any settings files, the default configuration (as noted above) will be used.
For interface documentation, Rg includes Swagger self-description support, always accessible on any running instance via the `/swagger` path.
Postman is the recommended way to interact easily with the interface.
This output format is implementation-specific to Rg, utilizing Redis pub/sub to stream generated data to any number of interested subscribers. It requires that the `Extra` field of the generator configuration structure include an entry named `streamId`, which specifies the channel name to be used when publishing each record.
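As a sketch, a generator configuration carrying the channel name might look like this (only `extra` and `streamId` come from the description above; the other field names and the casing are illustrative assumptions, not the actual request schema):

```json
{
  "count": 1000,
  "extra": {
    "streamId": "my-demo-channel"
  }
}
```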
`rg.rpjios.com` allows larger data sets but does not persist anything to Azure, nor currently in any way that is retrievable by the end user! Best for playing with the convenience method.

`azure.rg.rpjios.com` only allows small data sets but does persist to Azure (per this configuration).
There are many points of extensibility in Rg, and developers wishing to extend its functionality are encouraged to do so and submit a PR any time.
Living here, they're simple serializable classes implementing `ISpecification` which are then exposed as the supported specifications.
Fields in an `ISpecification` are generated based on either the default generator for the field's `Type` or a custom generator specified explicitly per-field via the `GeneratorTypeAttribute`.
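A minimal sketch of what such a specification might look like (the member shapes, the `PersonSpecification` class, and the `PhoneNumberGenerator` are hypothetical illustrations - check the actual definitions in the repository):

```csharp
// Illustrative only: assumes ISpecification is a marker-style interface and
// that GeneratorTypeAttribute accepts the generator's Type.
public class PersonSpecification : ISpecification
{
    // Generated by the default generator registered for string
    public string LastName { get; set; }

    // Explicit per-field override via a (hypothetical) custom generator
    [GeneratorType(typeof(PhoneNumberGenerator))]
    public string PhoneNumber { get; set; }
}
```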
The general `IPipeline` interface specifies a feed-forward data pipeline, currently concretely implemented only once, by `Pipeline.cs`.
`ISourceStage` implementations. There is currently only one, which generates random data based on the specification & any mutating attributes applied, eventually calling the appropriate field generators to build data sets. However, the source interface only requires implementation of a single method, so adding different sources would be relatively straightforward, though it would require addressing a few assumptions that there'd only ever be one. Of note is that the system assumes any concrete implementation is capable of producing infinite `IGeneratedRecord`s.
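That unbounded-source assumption can be pictured with a C# iterator (a sketch under assumed member names - the actual `ISourceStage` and `IGeneratedRecord` definitions live in the repository):

```csharp
// Sketch: an infinite source via a C# iterator. BuildRecord is hypothetical.
public IEnumerable<IGeneratedRecord> Generate(ISpecification spec)
{
    while (true)                         // never terminates on its own;
        yield return BuildRecord(spec);  // consumers pull only as many records as they need
}
```

Because the iterator is lazy, a consumer bounds it with something like `.Take(n)` rather than the source deciding when to stop.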
`IIntermediateStage` implementations, which are simply both an `ISourceStage` & an `ISinkStage` at once, having each record "passed through" during execution of the overall pipeline. They are enumerated at runtime to be exposed as the supported filters.
`ISinkStage` implementations, named according to their format and decorated with an `OutputFormatSinkType` attribute; these stages are exposed at runtime as the available output formats.
The `stream` format implementation does not use the bona fide `stream` data type, as it isn't yet widely available.
Implementations of `IPersistenceStage`, a specialized stage that exists only to persist the otherwise-ephemeral results of the pipeline somewhere else.