dobachi / ansible-bigdata

Ansible playbooks to construct distributed computing environments

Home Page: http://dobachi.github.io/ansible-bigdata

comment and FR: provide sample stack playbooks for stringing conf playbooks together

bayeslearner opened this issue

This is probably the most complete set of Ansible big data playbooks I can find on GitHub! Thank you so much for this repo first and foremost!

I looked at these things a few years ago, and as someone who is trying to get back into the world of big data, I now wonder about several things.

  • With Hortonworks gone, Cloudera closing its doors on open source, and Ambari retiring this year, big data engineering as a whole seems to be fading into the background. My understanding is that companies are going directly to cloud offerings such as Databricks, Google Colab, Azure HDInsight, etc., which require little maintenance engineering and come with reliable cloud storage. Is this correct? For learning the individual components and becoming proficient, though, many independent learners like me probably do not have the resources or money to build a 10-node cluster, whether physical or VM-based.

  • I'm thinking of creating an Ansible lab in Docker with systemd and an SSH server enabled, like https://github.com/arnabsinha4u/ansible-traininglab/tree/master/master
    and then using your repo on top of it for a big data lab; do you foresee any issues?

  • If I understand your architecture correctly, you have a conf folder with configurations for individual components and a folder for operations. It would be nice if you could provide some sample stacks in another folder, i.e. playbooks that string the conf playbooks together.
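
To make the last request concrete, here is a rough sketch of what I imagine a sample stack playbook could look like. The `stacks/` folder and every file name below are hypothetical placeholders I made up for illustration, not paths that I have confirmed exist in this repo; only `import_playbook` itself is standard Ansible.

```yaml
# stacks/hadoop_spark_stack.yml  (hypothetical file name and location)
# A "stack" playbook that only chains existing component playbooks in order.
# All imported paths are placeholders; substitute the real conf playbooks.

- name: Configure the Hadoop layer first
  ansible.builtin.import_playbook: ../conf/hadoop.yml   # hypothetical path

- name: Configure Spark on top of Hadoop
  ansible.builtin.import_playbook: ../conf/spark.yml    # hypothetical path

- name: Start the services once everything is configured
  ansible.builtin.import_playbook: ../operation/start_services.yml   # hypothetical path
```

Assuming an inventory file named `hosts`, such a stack would then run with a single `ansible-playbook -i hosts stacks/hadoop_spark_stack.yml`, which is the kind of one-command entry point that would help newcomers try a whole stack in a small lab.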

Thanks again and have a great day!