Access mounted (ip netns ...) network namespaces.

Question

tonobo opened this issue 8 years ago · comments

Tim Foerster commented 8 years ago

Submission type

Request for enhancement (RFE)

systemd version the issue has been seen with

229

Used distribution

Ubuntu 16.04 Xenial

Is there any possibility to join a named network namespace? I'd like to create units with PrivateNetwork=my_custom_namespace. There are some cases where it would be useful to join a namespace which is not created or handled by systemd.

Lennart Poettering · Answer 1 · Sat Apr 16 2016 03:29:20 GMT+0800 (China Standard Time)

I can see why people want this, but I must say I really don't like the concept of this, and I am not sure I want to see this supported natively.

However, note that you can simply invoke "/usr/bin ip netns exec…" from the ExecStart= line of a service. Hence I think there's really no need to support this directly in systemd, as you can easily do this anyway without such support.

I hope this makes sense.

Closing.

Dustin Frisch · Answer 2 · Wed Jun 29 2016 16:21:21 GMT+0800 (China Standard Time)

One possible use case is to run distribution defined services in a namespace which requires complex setup.

Having a separate option allows to run existing service definitions without worrying about updates changing the distribution-defined ExecStart= lines.

Tim Foerster · Answer 3 · Sun Jul 03 2016 15:37:23 GMT+0800 (China Standard Time)

It's also not as easy as you wrote, @poettering, when your services shouldn't run as a privileged user. It's possible to work around that with sudo, but it's not a pretty way.

Wilhelm Schuster · Answer 4 · Sun Jul 03 2016 15:59:46 GMT+0800 (China Standard Time)

@dopykuh your particular usecase can be solved by prefixing the Exec-line with ! to run the command as privileged user. See #3493. This will be part of the next release.

Tim Foerster · Answer 5 · Mon Jul 04 2016 16:13:42 GMT+0800 (China Standard Time)

@wlhlm the magic character allows me to run the whole command as a privileged user, but I don't know how to switch to the namespace. E.g. you need to launch a web server within a named network namespace as an unprivileged user. How could I do this? "ExecStart=/usr/bin ip netns exec nginx" isn't working, and I expect "ExecStart=!/usr/bin ip netns exec nginx" will spawn nginx as a privileged user.

Kevin Landreth · Answer 6 · Wed Jul 20 2016 02:09:53 GMT+0800 (China Standard Time)

So far, the only working solution I have, which I feel circumvents systemd's enrichment, is

ExecStart=/sbin/ip netns exec examplens su -c '/usr/bin/nginx' -s /bin/sh www-data

which I don't feel is all that elegant, but it does what I need for now. It would be really nice to have JoinsNamespaceOf= accept the namespace notation for non-systemd services, i.e. net:[4026532211] user:[4026531837]. That seems like a hack though, as we need a dependent unit to create those namespaces as well.

Still experimenting, but it seems like we might be able to use a oneshot service with PrivateNetwork=true and move the networking device(s) into the created namespace, mimicking the behavior of ip netns add.

Ian Kelling · Answer 7 · Fri Dec 30 2016 16:53:04 GMT+0800 (China Standard Time)

> @dopykuh your particular usecase can be solved by prefixing the Exec-line with ! to run the command as privileged user. See #3493. This will be part of the next release.

Note, the syntax changed from ! to +.

fabiocannizzo · Answer 8 · Thu Jan 19 2017 12:54:46 GMT+0800 (China Standard Time)

As shown in my post #5101 and as mentioned by @CrackerJackMack the ! magic does not address the use case where one or more services are to be run as unprivileged user in a separate network namespace, connected to the main host via a veth interface.

Fortunately a workaround exists, although very ugly: it requires combining in ExecStart both ip netns exec and sudo.

I believe there would be value in adding this feature, as it would allow an elegant and concise syntax to solve a legitimate use case, instead of being forced to achieve the same goal via a horrible workaround.

Ian Kelling · Answer 9 · Thu Jan 19 2017 18:15:50 GMT+0800 (China Standard Time)

> As shown in my post #5101 and as mentioned by @CrackerJackMack the ! magic does not address the use case where one or more services are to be run as unprivileged user in a separate network namespace, connected to the main host via a veth interface.

I think you are wrong. If you use ExecStartPre=+, you can call a script which sets up the veth interfaces yourself, like ip does. For an example, see this script: https://iankelling.org/git/?p=newns;a=blob;f=newns;hb=HEAD

fabiocannizzo · Answer 10 · Thu Jan 19 2017 18:39:04 GMT+0800 (China Standard Time)

The problem is not setting up the namespace and its interfaces. They are already up and running. I just need to join the namespace. How do I get a process running as an unprivileged user to join that namespace?

My working but horrible solution is to cheat on the user then downgrade it:

User=root
ExecStart=/usr/bin/ip netns exec ns1 runuser -u myuser ip l

Using + does not address the issue, I still have to downgrade the user:

ExecStart=+/usr/bin/ip netns exec ns1 runuser -u myuser ip l

whereas it would be so much more elegant to write:

User=myuser
JoinNetworkNamespace=ns1
ExecStart=ip l

If there is a way to state this in a more elegant way, I will stand corrected.

For the sake of clarity, the objectives here are:

ip l must display the interface of the namespace ns1
ip l must run as user myuser
myuser is an unprivileged user, which has no access to the command ip netns exec

Thanks for consideration anyway.

Ian Kelling · Answer 11 · Thu Jan 19 2017 18:59:55 GMT+0800 (China Standard Time)

> The problem is not setting up the namespace and its interfaces. They are already up and running. I just need to join the namespace. How do I get a process running as an unprivileged user to join that namespace?

You let systemd create the namespace with PrivateNetwork=true, then you set up its interfaces from within an ExecStart=, or more likely an ExecStartPre=, and then subsequent Exec... calls will be in that namespace, and subsequent units join that namespace by using PrivateNetwork=true and JoinsNamespaceOf=. All the subsequent joins can use unprivileged users.

My script does not make the namespace be named like ip does, because the naming is not necessary, although it could have been done with some more work. For debugging and joining the namespace with a bash shell outside of systemd, I use nsenter -n -m -t $(pgrep PROCESS_IN_NAMESPACE) bash, so using ip is not needed. Note: if I knew how to easily ask systemd what PID a unit has, I would do that.

I added to the documentation of my script to say this same thing.
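
On the question above of how to ask systemd for a unit's PID: systemctl show exposes it as the MainPID property. The following is only a minimal sketch, assuming a systemd recent enough to support the --value option; the function names are hypothetical and not part of the script linked above. Note that a unit without a running main process (for example a oneshot unit with RemainAfterExit=yes) reports 0, so this only helps for units that keep a process alive.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the script discussed above.

# Print the main PID systemd tracks for a unit ("0" if none is running).
unit_main_pid() {
    systemctl show --property MainPID --value "$1"
}

# Run a command (default: bash) in the network namespace of a unit's
# main process. Needs root, like the nsenter call quoted above.
enter_unit_netns() {
    unit="$1"
    shift
    pid="$(unit_main_pid "$unit")"
    if [ -z "$pid" ] || [ "$pid" = "0" ]; then
        echo "unit $unit has no running main process" >&2
        return 1
    fi
    # Fall back to an interactive shell when no command was given.
    if [ "$#" -eq 0 ]; then
        set -- bash
    fi
    nsenter --net --target "$pid" "$@"
}
```

For example, enter_unit_netns some_unit.service ip link would list the interfaces that unit sees. Unlike the nsenter -n -m call above, this sketch enters only the network namespace, not the mount namespace.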

Ian Kelling · Answer 12 · Thu Jan 19 2017 19:12:19 GMT+0800 (China Standard Time)

> ip l must display the interface of the namespace ns1
> ip l must run as user myuser
> myuser is an unprivileged user, which has no access to the command ip netns exec

To add to my previous comment, since I didn't read closely: these 3 things can be done using my suggestion.

fabiocannizzo · Answer 13 · Thu Jan 19 2017 19:43:16 GMT+0800 (China Standard Time)

@ian-kelling I appreciate your suggestion. This is the same solution proposed by @CrackerJackMack. I am not considering it because it requires moving the creation of my namespace into a systemd unit, splitting my network configuration across various places.

Also, the services which are to join this namespace are not persistent (they come and go), so I am not sure if that would require some dummy sleeping service to be launched by ExecStart to keep the namespace alive.

Regardless, this seems to me even more of a workaround and less elegant than what I am doing at present.

Dmitrii Sutiagin · Answer 14 · Fri May 26 2017 07:34:13 GMT+0800 (China Standard Time)

Please reopen. Imho, "you can simply invoke "/usr/bin ip netns exec…" from the ExecStart=" is a bad solution, at least because the journald log in that case shows logs as coming from 'ip' executable, not the actual process in the namespace, which is a mess.

Having an ability to start a unit in a given pre-existing namespace is quite important in some cases.

Also, important here is ability to manage "named" namespaces, while what "PrivateNetwork" creates is not visible via "ip netns" tool, so it's not convenient to interact with such namespaces.

alexkingnz · Answer 15 · Sun Jun 04 2017 08:45:31 GMT+0800 (China Standard Time)

I have a use case similar to fabiocannizzo's. I'd like to run squid and a VPN in a network namespace, but at any time either of them may not be running, so I need a persistent network namespace.

Perhaps this can be done with a unit to set up the namespace itself, then separate units for squid and openvpn?

Ideally the namespace would be named to allow convenient interaction on the command line via ip netns.

Dmitrii Sutiagin · Answer 16 · Sun Jun 04 2017 12:23:21 GMT+0800 (China Standard Time)

@alexkingnz if you need a namespace which persists as long as at least one service which needs it is running, you can just create a dummy unit with PrivateNetwork=true, and make other units use After=, Requires=, and JoinsNamespaceOf=. You can also make such a unit shut down automatically with "StopWhenUnneeded=true". Alternatively, if you want a "named" namespace, visible via "ip netns", check out https://github.com/f3flight/openconnect-ns, which might be helpful as a direction for your use case.

Still I'd prefer to have built-in named network namespace management in systemd, without the need to use ugly hacks like 'ip netns exec somenetns sudo -u someuser somecommand'

Kevin Landreth · Answer 17 · Sun Jun 04 2017 13:41:26 GMT+0800 (China Standard Time)

I know it's been a while since I've circled back to this, but here is an ip netns-compatible systemd namespace: https://gist.github.com/CrackerJackMack/a620e9557bf6e015df540aa4e26510ff

I feel this might work perfectly for @alexkingnz's request without the need to run things via ip netns exec ... . It mostly consists of a oneshot service workaround that captures the network namespace systemd creates and makes an iproute2-compatible mount for it.

fabiocannizzo · Answer 18 · Sun Jun 04 2017 15:44:04 GMT+0800 (China Standard Time)

systemd is great for many things, but on named network namespaces the systemd developers have taken a firm view that they do not want to support them, neither in service units nor in systemd-networkd.

I tried the workarounds mentioned in this thread, but eventually dropped them, in part because I do not like workarounds in general and in part because they are inconvenient. For instance, if I want to run a command in the namespace from the CLI, it is difficult to do so.

So I simply dropped systemd-networkd for network interface creation and configuration, went back to classic shell scripts, and I am happier than ever. I still use systemd to coordinate these scripts though, which is what systemd is good at, i.e. managing dependencies.

fabiocannizzo · Answer 19 · Sun Jun 04 2017 15:51:55 GMT+0800 (China Standard Time)

Forgot to mention: for those units which have to start in the namespace, I add 'ip netns' to the command line. If they need to start as a different user, I also use 'runuser' in the command line.

Etienne Dechamps · Answer 20 · Mon Oct 16 2017 03:43:39 GMT+0800 (China Standard Time)

FWIW, I came up with a simpler, minimalistic version of @CrackerJackMack's solution above:

# netns@.service

[Unit]
Description=Named network namespace %I
Documentation=https://github.com/systemd/systemd/issues/2741#issuecomment-336736214
StopWhenUnneeded=true

[Service]
Type=oneshot
RemainAfterExit=yes

# Ask systemd to create a network namespace
PrivateNetwork=yes

# Ask ip netns to create a named network namespace
# (This ensures that things like /var/run/netns are properly setup)
# (Why flock? See https://bugs.debian.org/949235)
ExecStart=/usr/bin/flock --no-fork -- /var/run/netns.lock /bin/ip netns add %I

# Drop the network namespace that ip netns just created
ExecStart=/bin/umount /var/run/netns/%I

# Re-use the same name for the network namespace that systemd put us in
ExecStart=/bin/mount --bind /proc/self/ns/net /var/run/netns/%I

# Clean up the name when we are done with the network namespace
ExecStop=/bin/ip netns delete %I

Then, when the need arises to run a service inside a named network namespace, simply add the following to the target service:

# some_unit.service

[Unit]
BindsTo=netns@foobar.service
After=netns@foobar.service
# Join the "foobar" named network namespace that netns@ created
JoinsNamespaceOf=netns@foobar.service

[Service]
PrivateNetwork=yes
# Your service is now running inside the "foobar" named network namespace!

This seems to work quite well, and it feels very "clean" because it doesn't require ugly hacks like ExecStart=/usr/bin/ip netns exec ... which, as many people pointed out in this thread, tends to cause more problems than it solves.

Zang MingJie · Answer 21 · Sun Oct 22 2017 02:55:42 GMT+0800 (China Standard Time)

@dechamps's solution works pretty well.

But if you want to override resolv.conf, i.e. the DNS server, add the following line to your service file:

BindPaths=/etc/netns/<foobar>/resolv.conf:/etc/resolv.conf

Jamesits · Answer 22 · Thu Nov 02 2017 12:12:10 GMT+0800 (China Standard Time)

I rewrote @dechamps's service file into an easier-to-use project. It also takes care of automatic bridging or NAT to the outside network to achieve per-service routing.

https://github.com/Jamesits/systemd-named-netns

Erik Jensen · Answer 23 · Sun Aug 05 2018 10:06:20 GMT+0800 (China Standard Time)

What's the best way to have /sys reflect the new namespace, for services that need that?

Aidan Walton · Answer 24 · Tue Oct 30 2018 00:30:31 GMT+0800 (China Standard Time)

Hi,
Thanks for this, really helpful.
I added a few extras to dechamps' original script. These help to build a more complete network. Although it is rather scrappy, I placed the network configuration data in systemd/network/netns/ as a holding space, since that seems like a logical place to store such data. I hope it will help when debugging network issues later. I'm using the standard Linux bridge tools, and hence brctl, to configure bridges. The bridge service below assumes that you already have a bridge within your root net namespace, and this should be referenced in the network config file.

I was forced to get this working to properly control a squeezebox server that refused to start up without attaching itself to my public IP interface. By constraining it to a netns, I resolved the problem, and of course gained better control of the daemon's traffic using iptables.

I took the approach of using:
netns@.service
to build the netns as above, but then added a second service:
netns-bridge@.service
that builds the network. This imports its required variables using: EnvironmentFile=/etc/systemd/network/netns/netns-%i-bridge.conf
Clearly you need to ensure this config file carries the correct netns name as defined within the service file below.

And to initiate the service stack I start:
squeezeboxserver_in_netns.service
This is responsible for naming the netns instance. Care needs to be taken here, as interface names are limited in length (15 usable characters on Linux, since IFNAMSIZ is 16 including the terminating NUL), and they are generated by appending this name to the interface name prefix defined in the .conf file in ./network/netns/netns-[service-name]-bridge.conf. It works nicely, and if the daemon dies it removes all the config.
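
The length limit mentioned above can be checked before starting anything. This is only a sketch under stated assumptions: the helper name check_ifname is hypothetical, names are assumed to be plain ASCII, and the limit of 15 usable characters follows from the Linux IFNAMSIZ of 16 bytes including the terminator. It mirrors the "<prefix>-<netns name>" pattern used by the bridge service below.

```shell
#!/bin/sh
# Hypothetical helper: build "<prefix>-<netns name>" and refuse it if it
# would not fit into a Linux interface name (15 usable characters).
check_ifname() {
    name="$1-$2"
    if [ "${#name}" -gt 15 ]; then
        echo "interface name '$name' is ${#name} characters, limit is 15" >&2
        return 1
    fi
    echo "$name"
}
```

With the values from this post, check_ifname veth0 squeeze succeeds and prints the generated name, whereas a long instance name such as squeezeboxserver would be rejected.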

BTW, a note on DNS resolution: it is not strictly necessary to use any BindPaths=.
The problem really comes from the fact that net namespaces do not have visibility of the stub resolver address that systemd-resolved sets up in the root net namespace. By default the host system will have a symlink:
resolv.conf -> /run/systemd/resolve/stub-resolv.conf

This only lists the stub resolver address 127.0.0.53, which any other net namespace cannot see.
However, you can change this symlink to:
resolv.conf -> /run/systemd/resolve/resolv.conf
Then the net namespace can use the 'real' nameserver, which has typically been defined by DHCP or some other mechanism such as a bind9 daemon, and its address can be switched or routed to from within the net namespace, so long as you set up appropriate connectivity in the net namespace. Of course this sidesteps the DNS masquerading done by systemd-resolved, but it makes it very simple to get DNS working in the namespace, by either sending DNS requests to your host's DNS server or routing them out to your provider/public DNS server, according to whatever appears in /run/systemd/resolve/resolv.conf.
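
To see which of the two files a resolv.conf currently points at, something like the following can be used. It is a rough sketch: the helper name resolv_target is hypothetical, and it assumes a readlink that supports -f (as in GNU coreutils).

```shell
#!/bin/sh
# Hypothetical helper: report whether a resolv.conf path (default:
# /etc/resolv.conf) resolves to the systemd-resolved stub file or to
# some other file.
resolv_target() {
    target="$(readlink -f "${1:-/etc/resolv.conf}")"
    case "$target" in
        */stub-resolv.conf) echo "stub: $target" ;;
        *) echo "direct: $target" ;;
    esac
}
```

If it reports the stub, the symlink change described above can be made as root with ln -sf /run/systemd/resolve/resolv.conf /etc/resolv.conf.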

Here are the unit files and the configuration file:
/etc/systemd/system/netns@.service

[Unit]
Description=Named network namespace %I
Documentation=https://github.com/systemd/systemd/issues/2741#issuecomment-336736214
StopWhenUnneeded=true

[Service]
Type=oneshot
RemainAfterExit=yes

# Ask systemd to create a network namespace
PrivateNetwork=yes


# Ask ip netns to create a named network namespace
# (This ensures that things like /var/run/netns are properly setup)
ExecStart=/sbin/ip netns add %I

# Drop the network namespace that ip netns just created
ExecStart=/bin/umount /var/run/netns/%I

# Re-use the same name for the network namespace that systemd put us in
ExecStart=/bin/mount --bind /proc/self/ns/net /var/run/netns/%I

# Clean up the name when we are done with the network namespace
ExecStop=/sbin/ip netns delete %I

/etc/systemd/system/netns-bridge@.service

[Unit]
Description=Named network namespace "%i" network config
BindsTo=netns@%i.service
After=netns@%i.service
StopWhenUnneeded=true

[Service]
Type=oneshot
RemainAfterExit=yes
EnvironmentFile=/etc/systemd/network/netns/netns-%i-bridge.conf

# Configure network
ExecStart=/sbin/ip link add ${ROOT_NS_INTF_NAME}-%i type veth peer name ${IN_NS_INTF_NAME}-%i
ExecStart=/sbin/ip link set ${IN_NS_INTF_NAME}-%i netns %i
ExecStart=/sbin/ip link set ${ROOT_NS_INTF_NAME}-%i up
ExecStart=/sbin/brctl addif ${ROOT_NS_BRIDGE} ${ROOT_NS_INTF_NAME}-%i
ExecStart=/sbin/ip netns exec %i ip addr add ${IN_NS_INTF_IP} dev ${IN_NS_INTF_NAME}-%i
ExecStart=/sbin/ip netns exec %i ip link set ${IN_NS_INTF_NAME}-%i up
ExecStart=/sbin/ip netns exec %i ip link set lo up
ExecStart=/sbin/ip netns exec %i ip route add default via ${IN_NS_DEFAULT_GW}

# Clean up when we are done with the network namespace
ExecStop=/sbin/ip link del ${ROOT_NS_INTF_NAME}-%i

/etc/systemd/system/squeezeboxserver_in_netns.service

[Unit]
# Ensure network is configured
BindsTo=netns-bridge@squeeze.service
After=netns-bridge@squeeze.service

# Join the "squeeze" named network namespace that netns@ created
JoinsNamespaceOf=netns@squeeze.service

[Service]
PrivateNetwork=yes
# Your service is now running inside the "squeeze" named network namespace!

ExecStart=/bin/bash /usr/sbin/squeezeboxserver_safe /usr/sbin/squeezeboxserver --prefsdir /var/lib/squeezeboxserver/prefs

/etc/systemd/network/netns/netns-squeeze-bridge.conf

#Bridge configuration for test netns

IN_NS_INTF_NAME=veth1
IN_NS_INTF_IP=10.0.0.11/24
IN_NS_DEFAULT_GW=10.0.0.10
ROOT_NS_INTF_NAME=veth0
ROOT_NS_BRIDGE=int_br0

Etienne Dechamps · Answer 25 · Tue Oct 30 2018 02:08:44 GMT+0800 (China Standard Time)

@aidyw: I fully approve of your approach of layering a "configuration" service on top of the netns service. In fact, I've been using an almost identical approach with my own services (the only real difference is that I used ip link type veth instead of brctl, but I guess that depends on the use case). It makes things very clear in terms of systemd dependencies and it scales well in my experience. Thank you for documenting it :)

Etienne Dechamps · Answer 26 · Sun Jan 19 2020 01:29:02 GMT+0800 (China Standard Time)

FYI, I just found a race condition in ip netns add that can result in mount point havoc (such as /proc/self/mountinfo explosion that can even slow the entire system down) if ip netns add is run for the first time from multiple processes simultaneously. If you're using systemd units to run ip netns add commands during system boot, you are especially likely to hit that race condition. See Debian Bug 949235 for details.

For now I recommend the following flock(1) workaround for those using my original unit file from #2741 (comment):

ExecStart=/usr/bin/flock --no-fork -- /var/run/netns.lock /bin/ip -details -statistics -statistics netns add %i

Daniel Farina · Answer 27 · Tue Feb 28 2023 07:54:00 GMT+0800 (China Standard Time)

> # Ask systemd to create a network namespace
> PrivateNetwork=yes
> 
> 
> # Ask ip netns to create a named network namespace
> # (This ensures that things like /var/run/netns are properly setup)
> ExecStart=/sbin/ip netns add %I

Are both of these things necessary? If the unit is creating a network namespace, does it matter so much what its network namespace is? (the one executing ip netns add)?

Wondering if I'm misunderstanding something.

Jamesits · Answer 28 · Tue Feb 28 2023 09:02:51 GMT+0800 (China Standard Time)

@fdr It matters when you want to access the exact network namespace outside the unit. Some examples:

You want to make multiple services share a network namespace (but not necessarily other types of namespaces)
You want to manually execute a program inside the service's network namespace (so you will need to know the exact name of the namespace)

However, it is 2023 now and the original feature request can be fulfilled with the NetworkNamespacePath= directive, so most of the workarounds we wrote at that time are now obsolete.
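
For readers coming from the workarounds above, this is a sketch of what the NetworkNamespacePath= route can look like. The assumptions: the namespace already exists (for example created with ip netns add ns1, which makes it available under /run/netns/ns1), the systemd version is new enough to know the directive, and the unit name myservice.service and the helper name write_netns_dropin are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: write a drop-in that makes <unit> join the named
# network namespace <ns>. The third argument (drop-in root) defaults to
# /etc/systemd/system and exists mainly so the function can be tried out
# in a scratch directory first.
write_netns_dropin() {
    unit="$1"
    ns="$2"
    root="${3:-/etc/systemd/system}"
    mkdir -p "$root/$unit.d"
    cat > "$root/$unit.d/netns.conf" <<EOF
[Service]
# Join the namespace that "ip netns add $ns" bind-mounted under /run/netns
NetworkNamespacePath=/run/netns/$ns
EOF
}
```

After write_netns_dropin myservice.service ns1, a systemctl daemon-reload and a restart of the unit would be needed for the setting to take effect. A User= setting in the unit should continue to apply, since the namespace is selected by systemd rather than by a wrapper command in ExecStart=.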

Daniel Farina · Answer 29 · Tue Feb 28 2023 09:04:31 GMT+0800 (China Standard Time)

Nice, I was wondering if something happened. Thanks for linking me.

Marek Küthe · Answer 30 · Tue May 23 2023 15:51:12 GMT+0800 (China Standard Time)

Maybe this should be closed as "Not planned"? Since the problem seems not to have been solved.

Etienne Dechamps · Answer 31 · Wed Aug 02 2023 04:31:21 GMT+0800 (China Standard Time)

Heads up: I noticed that the workaround described in #2741 (comment) stopped working when upgrading systemd from 253 to 254. The unit will fail to start with the following error:

umount: /run/netns/XXX: not mounted.

This is because of #26458 and more specifically c2da3bf.

This can be fixed by adding PrivateMounts=no to the unit file.

As @Jamesits pointed out in #2741 (comment), this is likely better fixed by migrating to NetworkNamespacePath=. I just felt like providing a quick fix for people who have not refactored their configs to use it yet.