activewarehouse / activewarehouse-etl

Extract-Transform-Load library from ActiveWarehouse

Create beginner-level samples

thbar opened this issue · comments

@Stiivi made very good points on the Google group:

The sample is too complex for a beginner. Even for those who know what
ETL is, it has too many files all over the place. Those who do not know
anything about ETL might be really confused. It would be much better
if there were multiple "levels" of samples, from one of up to
five lines to one as complex as the current sample.

Example of whole ETL samples:

Example 1: load file from CSV to a database table (nothing more)
Example 2: Ex 1 + do some field transformation
Example 3: use two sources, for example CSV + table --> table
Example 4: ...

Something like this: http://flask.pocoo.org/

I know that in the case of ETL it can not be done with one file, but
anyway - as simple as possible. I've learned that from my tutorials
for Cubes - I thought they were simple enough and obvious. Yeah, sure,
for me and myself.

You might have some steps in the examples where you will just say "do
this, you will learn later what it is". If you do, then encapsulate
them in one single step for the user: put them in a separate script.
Data preparation is one example. You can explain that later, in
another example.

Another minimalistic example: http://redis.io/download (see below the "set foo bar" and "get foo" )
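
To make the "levels" idea concrete, here is a minimal sketch of what Examples 1 and 2 could look like in spirit. It deliberately does not use the activewarehouse-etl API: it relies only on Ruby's standard `csv` library, the method names (`extract`, `transform`, `load_rows`) and the sample columns are hypothetical, and an in-memory array stands in for the database table.

```ruby
require "csv"

# Extract: parse CSV text into an array of hashes keyed by column name.
def extract(csv_text)
  CSV.parse(csv_text, headers: true).map(&:to_h)
end

# Transform (Example 2): build a full name and normalise the email.
def transform(row)
  {
    "name"  => "#{row["first_name"]} #{row["last_name"]}",
    "email" => row["email"].downcase
  }
end

# Load: append rows to an array that stands in for a database table.
def load_rows(rows, table)
  rows.each { |row| table << row }
  table
end

# Hypothetical input; a real sample would read a CSV file from disk.
csv_text = <<~DATA
  first_name,last_name,email
  Ada,Lovelace,ADA@EXAMPLE.COM
  Alan,Turing,alan@example.com
DATA

people_table = load_rows(extract(csv_text).map { |row| transform(row) }, [])
people_table.each { |row| puts row.inspect }
```

Dropping the `transform` step gives the Example 1 shape (load only); keeping it gives Example 2.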

Has a format been chosen for the guides/tutorials documentation?
I am about to begin a new project with aw-etl and (maybe) ActiveWarehouse, and it would be great if I could help both the project and myself. The success of an open source project really depends on its documentation, especially when the project is somewhat complex.

@pgericson I absolutely agree with what you said on documentation.

I will take a stab at a first guides website this weekend.

About the format: I plan to make it work exactly like Vagrant, so this:

https://github.com/mitchellh/vagrant/blob/docs/docs/provisioners/chef_solo.md

translates into this:

http://vagrantup.com/docs/provisioners/chef_solo.html

Do you have suggestions on what those guides should cover?

Can I ping you back once the structure is in place?

Ping away :)

Vagrant looks good.

  • Install guide
    • gem install instructions only with Bundler, as this is standard and future-proofs projects; it also forces new users to use it
    • I am building it on top of a Rails app, and 3.2 has another way of installing plugins that should be shown.
  • as @Stiivi suggested
  • text
    • extract
      • single CSV file extract
      • single json file extract
      • single xml file extract
    • transform from file
    • load from file
  • single database table extract, transform, load
  • multiple sources (csv, xml, table) -> extract each, (transform), join somehow -> transform stuff -> load
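
As a rough illustration of the last item in the outline above (several sources, extract each, join, then load), the sketch below joins a CSV source with a JSON source on a shared key. It uses only Ruby's standard `csv` and `json` libraries rather than the activewarehouse-etl API, and the method names, field names and sample data are all assumptions made for the example.

```ruby
require "csv"
require "json"

# Extract customers from CSV text into hashes (all values are strings).
def extract_customers(csv_text)
  CSV.parse(csv_text, headers: true).map(&:to_h)
end

# Extract orders from a JSON array of objects.
def extract_orders(json_text)
  JSON.parse(json_text)
end

# Join: attach the customer name to each order via customer_id.
# Orders with no matching customer are kept, with a nil name.
def join_orders(orders, customers)
  by_id = customers.to_h { |customer| [customer["id"], customer] }
  orders.map do |order|
    # CSV ids are strings, JSON ids may be integers, so compare as strings.
    customer = by_id[order["customer_id"].to_s]
    order.merge("customer_name" => customer && customer["name"])
  end
end

# Hypothetical sample data standing in for a CSV file and a JSON file.
customers_csv = "id,name\n1,Ada\n2,Alan\n"
orders_json   = '[{"customer_id": 1, "total": 30}, {"customer_id": 3, "total": 5}]'

joined = join_orders(extract_orders(orders_json), extract_customers(customers_csv))
joined.each { |row| puts row.inspect }
```

Keeping unmatched orders is just one possible choice here; a guide could equally show dropping them or routing them to an error table.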

Some kind of flow diagram would speed up the learning process a lot, I think.

I will be connecting to a lot of APIs, so maybe I will make a small example of that process instead of a flat file. Even though that is just a job that runs beforehand, it could show that an API call should carry some kind of date/user/whatever restriction so as not to fetch everything every time.
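
The date restriction mentioned here could be sketched roughly as follows: each run asks only for records changed since the previous run and remembers the newest timestamp it saw. No real API is called; `fetch_updated_since` is a hypothetical stand-in that filters an in-memory list, and the `updated_at` field name is an assumption.

```ruby
require "time"

# Stand-in for a remote API: returns only records updated after `since`.
# A real job would send an HTTP request with a comparable filter parameter.
def fetch_updated_since(records, since)
  records.select { |record| Time.iso8601(record[:updated_at]) > since }
end

# Run one incremental extract. Returns the new records plus the next
# watermark: the latest updated_at seen, or the old one if nothing is new.
def incremental_extract(records, last_run)
  fresh = fetch_updated_since(records, last_run)
  next_mark = fresh.map { |record| Time.iso8601(record[:updated_at]) }.max || last_run
  [fresh, next_mark]
end

# Hypothetical remote data.
remote = [
  { id: 1, updated_at: "2012-03-01T10:00:00Z" },
  { id: 2, updated_at: "2012-03-05T10:00:00Z" }
]

fresh, watermark = incremental_extract(remote, Time.iso8601("2012-03-02T00:00:00Z"))
puts "fetched ids: #{fresh.map { |record| record[:id] }.inspect}"
puts "next run starts after: #{watermark.iso8601}"
```

Persisting the watermark between runs (in a file or a table) is left out to keep the sketch short.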

For visualization, http://www.omninerd.com/articles/Automating_Data_Visualization_with_Ruby_and_Graphviz could be used. Then the visualization code is in the source :)
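
One way the flow diagram idea might look in practice is to describe a pipeline as a list of edges and emit Graphviz DOT text from it. This is only a sketch: `pipeline_to_dot` and the step names are hypothetical, it does not follow the linked article's approach, and it builds the DOT source as a plain string instead of using a Graphviz gem.

```ruby
# Build Graphviz DOT source for a pipeline given as [from, to] pairs.
def pipeline_to_dot(edges, name: "etl")
  lines = ["digraph #{name} {", "  rankdir=LR;"]
  edges.each { |from, to| lines << %(  "#{from}" -> "#{to}";) }
  lines << "}"
  lines.join("\n")
end

# Hypothetical pipeline: two sources feeding one transform, then a load.
edges = [
  ["customers.csv", "transform"],
  ["orders table", "transform"],
  ["transform", "warehouse"]
]

puts pipeline_to_dot(edges)
```

Saving the output to a file such as `etl.dot` and running `dot -Tpng etl.dot -o etl.png` renders the diagram, assuming Graphviz is installed.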

Hey @pgericson - just to keep you updated a bit:

I don't have much time to finish the docs right now (although I'll get back to this later on), but if you fancy having a look, it's most welcome!