hw-cookbooks / lxc

Linux Containers via Chef

chef-client run never ends

jtzl opened this issue

My goal is to build a workstation that will run test-kitchen and be able to launch LXCs inside it. I'm doing this with vagrant, and the vagrant image is getting provisioned properly with the chef_client provisioner.

From there, the "lxc" and "lxc::containers" recipes are added (in the Vagrantfile), along with a custom wrapper cookbook in which I'm using an lxc_container lwrp, as described in the lxc cookbook readme.

My vagrant image launches, downloads chef, and starts converging. It appears to hang after:

[2013-09-23T20:01:54+00:00] INFO: template[/var/lib/lxc/ubuntu_1204/rootfs/root/.ssh/authorized_keys] created file /var/lib/lxc/ubuntu_1204/rootfs/root/.ssh/authorized_keys
[2013-09-23T20:01:54+00:00] INFO: template[/var/lib/lxc/ubuntu_1204/rootfs/root/.ssh/authorized_keys] updated file contents /var/lib/lxc/ubuntu_1204/rootfs/root/.ssh/authorized_keys
[2013-09-23T20:01:54+00:00] INFO: template[/var/lib/lxc/ubuntu_1204/rootfs/root/.ssh/authorized_keys] mode changed to 600
[2013-09-23T20:01:54+00:00] INFO: ruby_block[lxc lock_default_users] called

And that's it. Here is the full chef run output: https://gist.github.com/jtzl/6678363

Is the apparent hanging related to my usage of the cookbook? Has anyone experienced and/or resolved this issue in their own environment?

Thanks!

Edit:

with VAGRANT_LOG=debug, it finishes lxc-create successfully and then goes to:

DEBUG ssh: Sending SSH keep-alive...
DEBUG ssh: Sending SSH keep-alive...
DEBUG ssh: Sending SSH keep-alive...
DEBUG ssh: Sending SSH keep-alive...
DEBUG ssh: Sending SSH keep-alive...

This repeats over and over. It did the same while waiting on lxc-create, but here it will keep going for over an hour unless I intervene.

@jtzl Thanks for the report. I'll get a fresh build going and see if I can replicate. Is the host system precise?

@chrisroberts Thanks for following up. I'm now having the same issue on 12.10, and I had the same problem on 12.04 before running do-release-upgrade.

@jtzl I haven't been able to get this hang to replicate. Would it be possible to post a gist of the provision with logging turned up to debug?
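For anyone following along, here is a minimal sketch of how the Chef log level might be raised from the Vagrantfile. It assumes the chef_client provisioner mentioned in the report and Vagrant's `log_level` provisioner option. The server URL and validation key path are hypothetical placeholders, not values from this thread.

```ruby
# Vagrantfile fragment (a sketch; placeholder values are hypothetical).
Vagrant.configure("2") do |config|
  config.vm.provision :chef_client do |chef|
    # Placeholders: substitute your own Chef server details.
    chef.chef_server_url     = "https://chef.example.com"
    chef.validation_key_path = "validation.pem"

    # Recipes named in the original report.
    chef.add_recipe "lxc"
    chef.add_recipe "lxc::containers"

    # Ask chef-client to log at debug level during provisioning.
    chef.log_level = :debug
  end
end
```

Combined with VAGRANT_LOG=debug on the host, this should give both the Vagrant-side and the Chef-side view of the run.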

Here's the end of the debug log from chef:

vagrant@mal-berkshelf:~$ tail -n100  /tmp/chef.log
[2013-12-07T14:55:08+00:00] INFO: Processing ruby_block[lxc lock_default_users] action run (/tmp/vagrant-chef-1/chef-solo-1/cookbooks/lxc/providers/container.rb line 176)
[2013-12-07T14:55:08+00:00] DEBUG: Platform ubuntu version 12.04 found
[2013-12-07T14:55:08+00:00] INFO: ruby_block[lxc lock_default_users] called
[2013-12-07T14:55:08+00:00] INFO: Processing ruby_block[lxc default_password_scrub] action run (/tmp/vagrant-chef-1/chef-solo-1/cookbooks/lxc/providers/container.rb line 197)
[2013-12-07T14:55:08+00:00] DEBUG: Skipping ruby_block[lxc default_password_scrub] due to not_if command `grep 'root:*' /var/lib/lxc/my_container/rootfs/etc/shadow`
[2013-12-07T14:55:08+00:00] INFO: Processing ruby_block[lxc start[my_container]] action run (/tmp/vagrant-chef-1/chef-solo-1/cookbooks/lxc/providers/container.rb line 212)
[2013-12-07T14:55:08+00:00] DEBUG: Platform ubuntu version 12.04 found
[2013-12-07T14:55:08+00:00] DEBUG: sh(lxc-start -n my_container -d)
[2013-12-07T14:55:08+00:00] DEBUG: sh(lxc-info -n my_container)
[2013-12-07T14:55:09+00:00] DEBUG: sh(lxc-info -n my_container)
[2013-12-07T14:55:10+00:00] DEBUG: sh(lxc-info -n my_container)
...
[2013-12-07T14:58:35+00:00] DEBUG: sh(lxc-info -n my_container)
[2013-12-07T14:58:37+00:00] DEBUG: sh(lxc-info -n my_container)

This appears to happen indefinitely ;(
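The log above shows lxc-info being polled with no upper bound, which is why the run never ends when the container fails to start. As an illustration only (this is not the cookbook's actual code), a bounded wait could look like the sketch below. The names `wait_until_running` and `container_running?` are hypothetical, and matching `RUNNING` in the lxc-info output is an assumption about its format.

```ruby
require 'shellwords'

# Poll the given block until it returns a truthy value or `timeout`
# seconds have elapsed. Returns true on success, false on timeout.
def wait_until_running(timeout: 30, interval: 1)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    return true if yield
    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep(interval)
  end
end

# Hypothetical status check built on the lxc-info call seen in the log.
# Assumes the command output contains the word RUNNING once the container is up.
def container_running?(name)
  output = `lxc-info -n #{Shellwords.escape(name)} 2>/dev/null`
  output.include?('RUNNING')
end

# Example use (requires lxc to be installed, so it is left commented out):
#   unless wait_until_running(timeout: 60) { container_running?('my_container') }
#     raise 'my_container did not reach RUNNING within 60 seconds'
#   end
```

Failing after a deadline would have surfaced the real problem (the container never started) rather than leaving the run looping.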

If I run lxc-info manually, it reports that the container isn't running, so I tried to start it:

vagrant@mal-berkshelf:~$ sudo lxc-start -n my_container
lxc-start: No such file or directory - failed to mount '/opt/file_store' on '/usr/lib/lxc/root//opt/file_store'
lxc-start: failed to setup the mounts for 'my_container'
lxc-start: failed to setup the container
lxc-start: invalid sequence number 1. expected 2
lxc-start: failed to spawn 'my_container'

Creating the /opt/file_store directory before running lxc-start -n my_container -d fixes this for me and allows the chef run to complete, albeit unsuccessfully (stack trace).
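Since the failure came from a bind-mount source that did not exist on the host, a pre-flight check along these lines could catch it before lxc-start is called. This is a rough sketch, not part of the cookbook. It assumes the container's mounts are listed in standard fstab-style lines (source, target, type, options, dump, pass), and `missing_mount_sources` is a hypothetical helper name.

```ruby
# Return the source paths of bind mounts in fstab-style text that do not
# exist on the host. Comments and blank lines are ignored.
def missing_mount_sources(fstab_text)
  fstab_text.each_line.map do |line|
    entry = line.sub(/#.*/, '').strip
    next if entry.empty?

    source, _target, _fstype, options = entry.split(/\s+/)
    # Only bind mounts refer to a host path that must already exist.
    next unless options.to_s.split(',').include?('bind')

    source unless File.exist?(source)
  end.compact
end

# Example use:
#   fstab = "/opt/file_store opt/file_store none bind 0 0\n"
#   missing_mount_sources(fstab).each do |path|
#     warn "bind mount source is missing on the host: #{path}"
#   end
```

In a Chef recipe, declaring a `directory` resource for the source path ahead of the container would serve the same purpose.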

Interesting. I'll have a look at the file store bit. The error you were running into in the stack trace is resolved in the latest release.

That was my mistake: I was trying to mount /opt/file_store, which I had copied from the example without appreciating that it was just a mount. As a result, I was trying to mount a non-existent directory. Ooops!

Not had any problems now I'm using the new version, and using it correctly 😀