sheedy / ansible-example

Example repo for Ansible testing

# ansible_test

An experimental lb/app/web cluster built for scale and flexibility

The server count for each role can be configured in the Vagrantfile for a multi-machine setup

Installation:

  • Clone the repo
  • Enter the folder
  • Type rake

Manual installation of requirements can be done via Homebrew and the included Brewfile

Vagrant commands:

  • start machines: vagrant up --no-provision
  • stop machines: vagrant halt
  • provision a specified server role (e.g. lb-01): vagrant provision lb-01
    Machines are defined in the top section of the Vagrantfile - the base machine memory is 512 MB; bump it up if you have plenty to spare

Notable config files:

  • Vagrantfile
  • playbooks/site.yml - Vagrant Ansible config file (a hedged sketch of its likely layout follows this list)
  • playbooks/alpha-inventory.yml - example AWS Ansible inventory - replace with ec2.py for dynamic inventory
  • playbooks/cluster.yml - external Ansible inventory - replace with ec2.py for dynamic inventory
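
As a rough illustration of the lb/web/app split, a play-per-role site.yml might look like the sketch below; the group and role names here are assumptions, not the repo's actual values.

```yaml
# Hypothetical shape of playbooks/site.yml - group and role names are assumptions
- hosts: lbservers
  sudo: yes
  roles:
    - common
    - haproxy
    - keepalived

- hosts: webservers
  sudo: yes
  roles:
    - common
    - nginx

- hosts: appservers
  sudo: yes
  roles:
    - common
    - jetty
```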

Notable folders:

  • playbooks/vars/ - ansible config files
  • playbooks/roles/ - ansible roles
  • shared/ - vagrant guest mount folder
  • shared/assets/ - nginx static folder

Tested on OS X; Ubuntu may require some tweaking

URLs:

All URLs depend on DNS that is defined remotely, supplied by xip.io, or added as a hosts entry.
Haproxy exposes an HA virtual IP (VIP), managed by keepalived and defined in the haproxy vars file, so point your hosts entry, DNS, or xip.io at that IP.
Default VIP: 10.10.10.10
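
As a minimal sketch of how that VIP could be expressed, the variables below are hypothetical names for what the haproxy vars file might contain, not the repo's actual keys.

```yaml
# Hypothetical haproxy/keepalived vars - names and interface are assumptions
haproxy_virtual_ip: 10.10.10.10   # the VIP keepalived assigns to the active load balancer
keepalived_interface: eth1        # private interface carrying the VRRP heartbeat
keepalived_virtual_router_id: 51
```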

##### TODO:

  • Correct rolling_update to follow full description below
  • Restore dynamic haproxy config assembly with static improvements
  • Investigate Travis for continuous testing
  • See Future below

Tools and components:

  • Vagrant 1.6+
    • excellent local provisioning
  • Ansible 1.5+
    • highly flexible and simple config mgmt
    • simple redirection between varied environments
  • Haproxy 1.5.1
    • known to be highly scalable and often deployed in HA
  • Nginx 1.4.6
    • known for excellent binary serving
  • Jetty 6
    • known for simplicity and scalability
  • Virtualbox 4.3+ or VMWare Fusion 6.04
    • virtualization option

Approach

Of the components above, I have prior experience with all but Jetty, though I do have recent experience with Tomcat; Jetty is commonly regarded as simpler and more performant, I had been looking to test it, and this was a good opportunity. In all cases I tried to avoid hard-coding in favour of dynamic inventory and variables, to allow for future scalability and simplified maintenance.

Two server groups initially:

  • training (1-6 servers)
  • prod (6+ servers)


Challenges and proposals:

  • Shared, scalable persistence for legacy datastore
    • drbd/syncthing/btsync/s3Fuse
  • Vagrant/AWS/Jenkins context switch
    • Variable machine count in dev
    • Ansible + multi-inventory options
  • Role separation
  • Proper security practice
    • Common hardening and least-privilege
    • Standard tools like ufw, fail2ban, logwatch and clamav
    • Sensitive data located on secure host, within environment variables where possible
  • Reproducible configuration
    • Ansible with common role architecture, dynamic roles and inventory
  • Scaling and dynamic inventory
    • Consider Consul
    • ec2.py for ansible dynamic inventory
    • May need to change Ansible execution method to accelerated mode or local provisioning @ hundreds of hosts (see the sketch after this list)
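
On the accelerated-mode point, Ansible 1.x enables it with a single play-level flag; a minimal sketch follows, with a hypothetical group and role name.

```yaml
# Hypothetical play using Ansible 1.x accelerated mode for large host counts
- hosts: appservers
  accelerate: true         # use the accelerated-mode daemon instead of plain SSH per task
  accelerate_port: 5099    # the default accelerated-mode port
  roles:
    - appserver
```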

Principles:

  • Effective separation of concerns by role
    • Least-privilege - port restrictions and specialized user privileges
    • DRY
    • Single purpose roles
  • Secure interaction
    • Hardened machines
    • No password auth
    • Allow for CI-driven provisioning
  • Scalability
    • Allow for app/web/lb scaling at varied rates
    • Allow for special-purpose machine configuration for each role
  • Dev and prod parity
    • Consistent deployment, identical role separation
    • Deployment flexibility - multi or single-server provisioning
  • Zero downtime
    • Rolling updates
    • Redundant and HA architecture
    • Geographic distribution
  • Visibility and integration
    • Event notification
    • ChatOps integration
    • Metric and log aggregation
  • Agility
    • Configuration, app, assets and environment all individually deployable
    • Ansible configured to run with restricted tags, specified roles or only tests if desired (see the sketch after this list)
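
For the last point, one hedged way to express this is tagging roles and tasks so a run can be limited with --tags; the group, role and tag names below are illustrative only.

```yaml
# Hypothetical play with tags, allowing e.g.
#   ansible-playbook site.yml --tags deploy   (only the deploy role)
#   ansible-playbook site.yml --tags test     (only the smoke test)
- hosts: webservers
  roles:
    - { role: nginx, tags: ['config'] }
    - { role: deploy, tags: ['deploy'] }
  tasks:
    - name: smoke test the local web server
      command: curl -fsS http://localhost/
      tags: ['test']
```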

Future:

  • Perfect forward secrecy
  • Replace Prevayler with RDS or NoSQL
  • Elastic load balancing/Route53 - secondary DNS
  • Elastic IP remap - EC2 API Tools command ec2-associate-address
  • Resource-based deployment:
    • OpenShift Origin Cluster
    • Mesos + Consul + Docker
    • CoreOS + fleetctl + etcd + Docker

Nice to have:

  • P2P for legacy datastore persistence
  • New Relic + alerts for APM
  • Statsd for metric aggregation
  • Logstash/Logsene for log aggregation

Shared persistence option for the legacy datastore:

A 2- or 3-phase commit combined with https://github.com/s3fs-fuse/s3fs-fuse is sufficient for shared persistence and consistency, given the nature of the application (write-light, read-heavy); a hedged mount sketch follows below

  • consider 5GB max for a single file
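
A minimal sketch of mounting the bucket with s3fs-fuse via Ansible; the bucket name, mount point and package source are placeholders, not values from this repo.

```yaml
# Hypothetical tasks for mounting a shared S3 bucket with s3fs-fuse
- name: install s3fs (package name and availability vary by distro)
  apt: name=s3fs state=present

- name: mount the shared datastore bucket (bucket name and mount point are placeholders)
  mount: >
    name=/srv/datastore
    src=example-datastore-bucket
    fstype=fuse.s3fs
    opts=_netdev,allow_other,passwd_file=/etc/passwd-s3fs
    state=mounted
```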

### Scaling:

  • Use Packer to bake an Ansible-provisioned AMI, or Ansible plus a snapshot, to enable autoscaling
  • Considered booting with a fetch script for the war file (cloud-init?), though that would require a connection back to CI, which is a security risk
  • Jenkins will push to all servers

Architecture:

  • LB tier (haproxy) to abstract web/app servers from DNS
  • LB tier is configured for HA via a heartbeat and floating public IP
  • LB directs to Nginx for static.domain.com (webserver role)
  • LB directs to Jetty for app.domain.com (appserver role)
  • Jetty servers share data via btsync, syncthing, drbd or a similar tool
  • On Vagrant, asset and artifact folders should be mounted from the host
  • Single-subnet haproxy to allow keepalived
  • Route53 or ELB can be added in front to allow geo-distributed haproxy
  • Web and App servers can be geo-distributed for fault tolerance

Rolling update method:

  • Triggered by CI creation of a new artifact (which could be a rollback)
  • CI hosts the assets and artifact
  • A 'deploy' user is created to manage interaction with the server without root or vagrant

CI calls the Ansible rolling_update playbook, which runs serially within each web/app role (a hedged playbook sketch follows the steps below):

  1. Notify chat/New Relic of the deploy
  2. Choose a web server and app server from the main pool
  3. Notify the LB to move each to the maint-* frontend/backend
  4. Gracefully reload the LB
  5. Swing the 'last' symlink for assets and artifact on the chosen servers from release-2 to the previous release
  6. Push assets.zip and artifact.war to the chosen servers
  7. Extract the asset zip; Jetty extracts the warfile
  8. Swing the 'current' symlink for assets and artifact on the chosen servers from the previous to the current release
  9. Curl the servers to verify the deployment succeeded
  10. Notify the LB to move each back into the main frontend/backend
  11. Clean up releases older than release-2
  12. Progress to the next servers in the pool
  13. Notify chat/New Relic/SNS of success
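
The sketch below is one hedged way such a playbook could be laid out, assuming an haproxy stats socket on the load balancers and a releases/current/last symlink layout; the group names, backend name, paths and variables are all assumptions rather than this repo's actual configuration.

```yaml
# Hypothetical rolling_update.yml - updates one app server at a time
- hosts: appservers
  serial: 1
  vars:
    release: "{{ build_number }}"   # assumed to be supplied by CI
    app_root: /srv/app              # placeholder release layout

  tasks:
    - name: drain this server from the haproxy backend via the stats socket
      shell: echo "disable server app_backend/{{ inventory_hostname }}" | socat stdio /var/run/haproxy.sock
      delegate_to: "{{ item }}"
      with_items: "{{ groups['lbservers'] }}"

    - name: swing the 'last' symlink to the previous release
      file: src={{ app_root }}/releases/previous dest={{ app_root }}/last state=link

    - name: push the new artifact from the CI host
      copy: src=artifact.war dest={{ app_root }}/releases/{{ release }}/artifact.war

    - name: swing the 'current' symlink to the new release
      file: src={{ app_root }}/releases/{{ release }} dest={{ app_root }}/current state=link

    - name: restart jetty to pick up the new artifact
      service: name=jetty state=restarted

    - name: verify the deployment responds
      uri: url=http://{{ inventory_hostname }}:8080/ status_code=200

    - name: put this server back into the haproxy backend
      shell: echo "enable server app_backend/{{ inventory_hostname }}" | socat stdio /var/run/haproxy.sock
      delegate_to: "{{ item }}"
      with_items: "{{ groups['lbservers'] }}"
```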

On failure, trigger a rollback (a hedged sketch follows):

  1. Restore the asset and artifact symlinks
  2. Notify the LB to move each server back into the main frontend/backend
  3. Send a failure notification to chat/New Relic/PagerDuty/SNS, etc.
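
A minimal sketch of what those rollback tasks could look like, reusing the same hypothetical paths, backend name and variables as the playbook above:

```yaml
# Hypothetical rollback tasks - restore symlinks, rejoin the LB, notify
- name: point 'current' back at the previous release
  file: src={{ app_root }}/releases/previous dest={{ app_root }}/current state=link

- name: put the server back into the haproxy backend
  shell: echo "enable server app_backend/{{ inventory_hostname }}" | socat stdio /var/run/haproxy.sock
  delegate_to: "{{ item }}"
  with_items: "{{ groups['lbservers'] }}"

- name: send the failure notification (room and token are placeholders)
  local_action: hipchat room=ops msg="Deploy of {{ release }} failed and was rolled back" token={{ hipchat_token }}
```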

Haproxy recycle on changes (a hedged sketch of these tasks follows the list):

  1. Trigger an iptables SYN drop
  2. Wait for in-flight connections to complete (port no longer listening)
  3. Keepalived triggers the live haproxy to take over the shared IP
  4. Restart haproxy
  5. Open iptables
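
Expressed as Ansible tasks, the recycle could look roughly like this; the port and timeout values are assumptions.

```yaml
# Hypothetical tasks for recycling haproxy without dropping live connections
- name: drop new SYNs so no fresh connections arrive at this LB
  command: iptables -I INPUT -p tcp --dport 80 --syn -j DROP

- name: wait for existing connections to drain (keepalived moves the VIP meanwhile)
  wait_for: port=80 state=drained timeout=60

- name: restart haproxy with the new configuration
  service: name=haproxy state=restarted

- name: accept new connections again
  command: iptables -D INPUT -p tcp --dport 80 --syn -j DROP
```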
