systemd / systemd

The systemd System and Service Manager

Home Page:https://systemd.io

Access mounted (ip netns ...) network namespaces.

tonobo opened this issue · comments

Submission type

Request for enhancement (RFE)

systemd version the issue has been seen with

229

Used distribution

Ubuntu 16.04 Xenial

Is there any possibility to join a named network namespace? I'd like to create units with PrivateNetwork=my_custom_namespace. There are some cases where it would be useful to be able to join a namespace which is not created or handled by systemd.

I can see why people want this, but I must say I really don't like the concept of this, and I am not sure I want to see this supported natively.

However, note that you can simply invoke "/usr/bin ip netns exec…" from the ExecStart= line of a service. Hence I think there's really no need to support this directly in systemd, as you can easily do this anyway without such support.

I hope this makes sense.

Closing.

One possible use case is to run distribution defined services in a namespace which requires complex setup.

Having a separate option allows running existing service definitions without worrying about updates changing the distribution-defined ExecStart= lines.

It's also not as easy as you wrote, @poettering, when your services shouldn't run as a privileged user. It's possible to work around this with sudo, but that's not a pretty way to do it.

@dopykuh your particular usecase can be solved by prefixing the Exec-line with ! to run the command as privileged user. See #3493. This will be part of the next release.

@wlhlm the magic character allows me to run the whole command as a privileged user, but I don't know how to switch to the namespace. E.g. you need to launch a web server within a named network namespace as an unprivileged user. How could I do this? "ExecStart=/usr/bin ip netns exec nginx" isn't working and I expect "ExecStart=!/usr/bin ip netns exec nginx" will spawn nginx as a privileged user.

So far, the only working solution I have, which I feel circumvents systemd's enrichment, is

ExecStart=/sbin/ip netns exec examplens su -c '/usr/bin/nginx' -s /bin/sh www-data

I don't find this all that elegant, but it does what I need for now. It would be really nice to have JoinsNamespaceOf= accept the namespace notation for non-systemd services, i.e. net:[4026532211] user:[4026531837]. That seems like a hack though, as we would still need a dependent unit to create those namespaces.

Still experimenting, but it seems we might be able to use a oneshot service with PrivateNetwork=true and move the networking device(s) into the namespace it creates, mimicking the behavior of ip netns add.

> @dopykuh your particular usecase can be solved by prefixing the Exec-line with ! to run the command as privileged user. See #3493. This will be part of the next release.

Note, the syntax changed from ! to +.

As shown in my post #5101 and as mentioned by @CrackerJackMack the ! magic does not address the use case where one or more services are to be run as unprivileged user in a separate network namespace, connected to the main host via a veth interface.

Fortunately a workaround exists, although very ugly: it requires combining in ExecStart both ip netns exec and sudo.

I believe there would be value in adding this feature, as it would allow an elegant and concise syntax to solve a legitimate use case, instead of forcing people to achieve the same goal via a horrible workaround.

> As shown in my post #5101 and as mentioned by @CrackerJackMack the ! magic does not address the use case where one or more services are to be run as unprivileged user in a separate network namespace, connected to the main host via a veth interface.

I think you are wrong. If you use ExecStartPre=+, you can call a script which sets up the veth interfaces yourself, like ip does. For an example, see this script: https://iankelling.org/git/?p=newns;a=blob;f=newns;hb=HEAD

The problem is not setting up the namespace and its interfaces. They are already up and running. I just need to join the namespace. How do I get a process running as an unprivileged user to join that namespace?

My working but horrible solution is to cheat on the user then downgrade it:

User=root
ExecStart=/usr/bin/ip netns exec ns1 runuser -u myuser ip l

Using + does not address the issue, I still have to downgrade the user:

ExecStart=+/usr/bin/ip netns exec ns1 runuser -u myuser ip l

whereas it would be so much more elegant to write:

User=myuser
JoinNetworkNamespace=ns1
ExecStart=ip l

If there is a way to state this in a more elegant way, I will stand corrected.

For sake of clarity, the objectives here are:

  1. ip l must display the interface of the namespace ns1
  2. ip l must run as user myuser
  3. myuser is an unprivileged user, which has no access to the command ip netns exec
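
One way to check objectives 1 and 2 for a running process is to compare its network namespace file with the named namespace and then look up its user. This is only a sketch: the helper name same_ns is hypothetical, it assumes GNU stat (for -c), and reading another user's /proc/PID/ns/net normally requires root.

```shell
#!/bin/sh
# Succeed if two namespace files refer to the same namespace.
# Two paths name the same namespace when their (device, inode) pairs match.
same_ns() {
    a=$(stat -L -c '%d:%i' "$1") || return 1
    b=$(stat -L -c '%d:%i' "$2") || return 1
    [ "$a" = "$b" ]
}

# Usage (as root), where $pid is the PID of the process to inspect:
#   same_ns "/proc/$pid/ns/net" /run/netns/ns1 && ps -o user= -p "$pid"
```

If the comparison succeeds, the process sees the interfaces of ns1; the ps call then prints the user it runs as.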

Thanks for consideration anyway.

> The problem is not setting up the namespace and its interfaces. They are already up and running. I just need to join the namespace. How do I get a process running as an unprivileged user to join that namespace?

You let systemd create the namespace with PrivateNetwork=true, then you set up its interfaces from within an ExecStart=, or more likely an ExecStartPre=. Subsequent Exec... calls will run in that namespace, and subsequent units join it by using PrivateNetwork=true and JoinsNamespaceOf=. All the subsequent joins can use unprivileged users.

My script does not name the namespace like ip does, because the naming is not necessary, although it could have been done with some more work. For debugging and joining the namespace with a bash shell outside of systemd, I use nsenter -n -m -t $(pgrep PROCESS_IN_NAMESPACE) bash, so using ip is not needed. Note: if I knew how to easily ask systemd what PID a unit has, I would do that.
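
On the open question of asking systemd for a unit's PID: systemctl show can print a unit's MainPID property. The following is a minimal sketch, assuming a systemctl that supports --value. The function name unit_main_pid and the SYSTEMCTL override variable are hypothetical; the variable exists only so the function can be exercised without a running systemd.

```shell
#!/bin/sh
# Print the main PID of a systemd unit, or fail if it has none.
# SYSTEMCTL is a hypothetical override hook; it defaults to systemctl.
unit_main_pid() {
    pid=$("${SYSTEMCTL:-systemctl}" show --property=MainPID --value "$1") || return 1
    # systemd reports MainPID=0 when the unit has no running main process
    [ -n "$pid" ] && [ "$pid" != "0" ] || return 1
    printf '%s\n' "$pid"
}

# Usage (as root): open a shell in the unit's network and mount namespaces.
#   nsenter -n -m -t "$(unit_main_pid myservice.service)" bash
```

Note that a oneshot unit that has already exited has no main process, so this only helps for units that keep a process running in the namespace.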

I added to the documentation of my script to say this same thing.

> 1. ip l must display the interface of the namespace ns1
> 2. ip l must run as user myuser
> 3. myuser is an unprivileged user, which has no access to the command ip netns exec

To add to my previous comment, since I didn't read closely: these 3 things can be done using my suggestion.

@ian-kelling I appreciate your suggestion. This is the same solution proposed by @CrackerJackMack. I am not considering it because it requires moving the creation of my namespace into a systemd unit, splitting my network configuration across various places.

Also, the services which are to join this namespace are not persistent (they come and go), so I am not sure if that would require some dummy sleeping service to be launched by ExecStart to keep the namespace alive.

Regardless, this seems to me even more of a workaround and less elegant than what I am doing at present.

Please reopen. Imho, "you can simply invoke "/usr/bin ip netns exec…" from the ExecStart=" is a bad solution, at least because the journald log in that case shows logs as coming from 'ip' executable, not the actual process in the namespace, which is a mess.

Having an ability to start a unit in a given pre-existing namespace is quite important in some cases.

Also important here is the ability to manage "named" namespaces: what PrivateNetwork= creates is not visible via the ip netns tool, so it's not convenient to interact with such namespaces.

I have a use case similar to fabiocannizzo's. I'd like to run squid and a VPN in a network namespace, but at any time either of them may not be running, so I need a persistent network namespace.

Perhaps this can be done with a unit to set up the namespace itself, then separate units for squid and openvpn?

Ideally the namespace would be named to allow convenient interaction on the command line via ip netns.

@alexkingnz if you need a namespace that persists as long as at least one service that needs it is running, you can create a dummy unit with PrivateNetwork=true and make the other units use After=, Requires= and JoinsNamespaceOf=. You can also make that unit shut down automatically with StopWhenUnneeded=true. Alternatively, if you want a "named" namespace, visible via ip netns, check out https://github.com/f3flight/openconnect-ns, which might be helpful as a direction for your use case.

Still I'd prefer to have built-in named network namespace management in systemd, without the need to use ugly hacks like 'ip netns exec somenetns sudo -u someuser somecommand'

I know it's been a while since I've circled back to this, but here is an ip netns compatible systemd namespace: https://gist.github.com/CrackerJackMack/a620e9557bf6e015df540aa4e26510ff

I feel this might work perfectly for @alexkingnz's request without the need to run things via ip netns exec ... . It mostly consists of a oneshot service workaround that captures the network namespace systemd creates and makes an iproute2 compatible mount for it.

systemd is great for many things, but on named network namespaces the systemd developers have taken a firm view: they do not want to support them, neither in service units nor in systemd-networkd.

I tried the workarounds mentioned in this thread, but eventually dropped them, partly because I do not like workarounds in general and partly because they are inconvenient. For instance, running a command in the namespace from the CLI is difficult.

So I simply dropped systemd-networkd for creating and configuring network interfaces, went back to classic shell scripts, and I am happier than ever. I still use systemd to coordinate these scripts though, which is what systemd is good at, i.e. managing dependencies.

Forgot to mention, for those units which have to start in the namespace, I add 'ip netns' in the command line. If they need to start as a different user, I also use 'runas' in the command line.

FWIW, I came up with a simpler, minimalistic version of @CrackerJackMack's solution above:

# netns@.service

[Unit]
Description=Named network namespace %I
Documentation=https://github.com/systemd/systemd/issues/2741#issuecomment-336736214
StopWhenUnneeded=true

[Service]
Type=oneshot
RemainAfterExit=yes

# Ask systemd to create a network namespace
PrivateNetwork=yes

# Ask ip netns to create a named network namespace
# (This ensures that things like /var/run/netns are properly setup)
# (Why flock? See https://bugs.debian.org/949235)
ExecStart=/usr/bin/flock --no-fork -- /var/run/netns.lock /bin/ip netns add %I

# Drop the network namespace that ip netns just created
ExecStart=/bin/umount /var/run/netns/%I

# Re-use the same name for the network namespace that systemd put us in
ExecStart=/bin/mount --bind /proc/self/ns/net /var/run/netns/%I

# Clean up the name when we are done with the network namespace
ExecStop=/bin/ip netns delete %I

Then, when comes the need to run a service inside a named network namespace, simply add the following to the target service:

# some_unit.service

[Unit]
BindsTo=netns@foobar.service
After=netns@foobar.service
# Join the "foobar" named network namespace that netns@ created
JoinsNamespaceOf=netns@foobar.service

[Service]
PrivateNetwork=yes
# Your service is now running inside the "foobar" named network namespace!

This seems to work quite well, and it feels very "clean" because it doesn't require ugly hacks like ExecStart=/usr/bin/ip netns exec ... which, as many people pointed out in this thread, tends to cause more problems than it solves.

@dechamps's solution works pretty well.

But if you want to override resolv.conf (i.e. the DNS server), add the following line to your service file:

BindPaths=/etc/netns/<foobar>/resolv.conf:/etc/resolv.conf

I rewrote @dechamps's service file into an easier-to-use project. It also takes care of automatic bridging or NAT to the outside network to achieve per-service routing.

https://github.com/Jamesits/systemd-named-netns

What's the best way to have /sys reflect the new namespace, for services that need that?

Hi,
Thanks for this, really helpful.
I added a few extras to dechamps' original unit file. These help to build a more complete network. Although it is rather scrappy, I placed the network configuration data in systemd/network/netns/, which seems like a logical holding place for such data; I hope it will help when debugging network issues later. I'm using the standard Linux bridge tools, hence brctl, to configure bridges. The bridge service below assumes that you already have a bridge in your root network namespace; it should be referenced in the network config file.

I was forced to get this working to properly control a squeezebox server that refused to start up without attaching itself to my public IP interface. By constraining it to a netns, I resolved the problem and, of course, gained better control of the daemon's traffic using iptables.

I took the approach of using netns@.service to build the netns as above, and then added a second service, netns-bridge@.service, that builds the network. It imports its required variables using:
EnvironmentFile=/etc/systemd/network/netns/netns-%i-bridge.conf
Clearly you need to ensure that this config file's name carries the same netns name as the instance used in the service file below.

To initiate the service stack I start squeezeboxserver_in_netns.service. This is responsible for naming the netns instance. Care needs to be taken here, as interface names are limited in length (15 characters, since the kernel's IFNAMSIZ of 16 includes the terminating NUL) and they are generated by appending this name to the interface name prefix defined in ./network/netns/netns-[service-name]-bridge.conf. It works nicely, and if the daemon dies all the config is removed.
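
Because the interface names are built as "prefix-instance", it can help to check the combined length before choosing an instance name. A small sketch of such a check; the function name ifname_fits is hypothetical, and the limit of 15 characters follows from IFNAMSIZ being 16 including the terminating NUL.

```shell
#!/bin/sh
# Linux interface names may be at most IFNAMSIZ - 1 = 15 characters long.
IFNAME_MAX=15

# Succeed if "<prefix>-<instance>" fits into an interface name.
ifname_fits() {
    name="$1-$2"
    [ "${#name}" -le "$IFNAME_MAX" ]
}

# Usage:
#   ifname_fits veth0 squeeze || echo "interface name too long" >&2
```

With the configuration shown in this comment, veth0-squeeze is 13 characters and fits, whereas an instance name such as squeezeboxserver would not.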

BTW, a note on DNS resolution: it is not strictly necessary to use any BindPaths=.
The problem really comes from the fact that other network namespaces cannot reach the stub resolver address that systemd-resolved listens on in the root network namespace. By default the host system will have a symlink:
resolv.conf -> /run/systemd/resolve/stub-resolv.conf

This only lists the stub resolver address 127.0.0.53, which no other network namespace can reach.
However, you can change this symlink to:
resolv.conf -> /run/systemd/resolve/resolv.conf
Then the network namespace can use the 'real' nameserver, typically defined by DHCP or some other mechanism such as a bind9 daemon, and its address can be switched or routed to from within the network namespace, so long as you set up appropriate connectivity there. Of course this sidesteps the DNS handling done by systemd-resolved, but it makes it very simple to get DNS working in the namespace, either by sending DNS requests to your host's DNS server or by routing them out to your provider's or a public DNS server, according to whatever appears in /run/systemd/resolve/resolv.conf.
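
To see whether a given resolv.conf still points at the stub resolver, a check along these lines may help. It is only a sketch: the function name uses_stub_resolver is hypothetical, and it simply looks for a nameserver line with 127.0.0.53.

```shell
#!/bin/sh
# Succeed if the given resolv.conf (default: /etc/resolv.conf) lists the
# systemd-resolved stub address 127.0.0.53 as a nameserver.
uses_stub_resolver() {
    grep -Eq '^[[:space:]]*nameserver[[:space:]]+127\.0\.0\.53([[:space:]]|$)' \
        "${1:-/etc/resolv.conf}"
}

# Usage (as root): switch the symlink only if the stub is currently in use.
#   uses_stub_resolver && ln -sf /run/systemd/resolve/resolv.conf /etc/resolv.conf
```

Keep in mind that changing the symlink affects name resolution on the whole host, not just inside the namespace.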

Here is the code:
/etc/systemd/system/netns@.service

[Unit]
Description=Named network namespace %I
Documentation=https://github.com/systemd/systemd/issues/2741#issuecomment-336736214
StopWhenUnneeded=true

[Service]
Type=oneshot
RemainAfterExit=yes

# Ask systemd to create a network namespace
PrivateNetwork=yes


# Ask ip netns to create a named network namespace
# (This ensures that things like /var/run/netns are properly setup)
ExecStart=/sbin/ip netns add %I

# Drop the network namespace that ip netns just created
ExecStart=/bin/umount /var/run/netns/%I

# Re-use the same name for the network namespace that systemd put us in
ExecStart=/bin/mount --bind /proc/self/ns/net /var/run/netns/%I

# Clean up the name when we are done with the network namespace
ExecStop=/sbin/ip netns delete %I

/etc/systemd/system/netns-bridge@.service

[Unit]
Description=Named network namespace "%i" network config
BindsTo=netns@%i.service
After=netns@%i.service
StopWhenUnneeded=true

[Service]
Type=oneshot
RemainAfterExit=yes
EnvironmentFile=/etc/systemd/network/netns/netns-%i-bridge.conf

# Configure network
ExecStart=/sbin/ip link add ${ROOT_NS_INTF_NAME}-%i type veth peer name ${IN_NS_INTF_NAME}-%i
ExecStart=/sbin/ip link set ${IN_NS_INTF_NAME}-%i netns %i
ExecStart=/sbin/ip link set ${ROOT_NS_INTF_NAME}-%i up
ExecStart=/sbin/brctl addif ${ROOT_NS_BRIDGE} ${ROOT_NS_INTF_NAME}-%i
ExecStart=/sbin/ip netns exec %i ip addr add ${IN_NS_INTF_IP} dev ${IN_NS_INTF_NAME}-%i
ExecStart=/sbin/ip netns exec %i ip link set ${IN_NS_INTF_NAME}-%i up
ExecStart=/sbin/ip netns exec %i ip link set lo up
ExecStart=/sbin/ip netns exec %i ip route add default via ${IN_NS_DEFAULT_GW}

# Clean up when we are done with the network namespace
ExecStop=/sbin/ip link del ${ROOT_NS_INTF_NAME}-%i

/etc/systemd/system/squeezeboxserver_in_netns.service

[Unit]
# Ensure network is configured
BindsTo=netns-bridge@squeeze.service
After=netns-bridge@squeeze.service

# Join the "squeeze" named network namespace that netns@ created
JoinsNamespaceOf=netns@squeeze.service

[Service]
PrivateNetwork=yes
# Your service is now running inside the "squeeze" named network namespace!

ExecStart=/bin/bash /usr/sbin/squeezeboxserver_safe /usr/sbin/squeezeboxserver --prefsdir /var/lib/squeezeboxserver/prefs

/etc/systemd/network/netns/netns-squeeze-bridge.conf

#Bridge configuration for test netns

IN_NS_INTF_NAME=veth1
IN_NS_INTF_IP=10.0.0.11/24
IN_NS_DEFAULT_GW=10.0.0.10
ROOT_NS_INTF_NAME=veth0
ROOT_NS_BRIDGE=int_br0

@aidyw: I fully approve of your approach of layering a "configuration" service on top of the netns service. In fact, I've been using an almost identical approach with my own services (the only real difference is that I used ip link type veth instead of brctl, but I guess that depends on the use case). It makes things very clear in terms of systemd dependencies and it scales well in my experience. Thank you for documenting it :)

FYI, I just found a race condition in ip netns add that can result in mount point havoc (such as /proc/self/mountinfo explosion that can even slow the entire system down) if ip netns add is run for the first time from multiple processes simultaneously. If you're using systemd units to run ip netns add commands during system boot, you are especially likely to hit that race condition. See Debian Bug 949235 for details.

For now I recommend the following flock(1) workaround for those using my original unit file from #2741 (comment):

ExecStart=/usr/bin/flock --no-fork -- /var/run/netns.lock /bin/ip netns add %I

> # Ask systemd to create a network namespace
> PrivateNetwork=yes
> 
> 
> # Ask ip netns to create a named network namespace
> # (This ensures that things like /var/run/netns are properly setup)
> ExecStart=/sbin/ip netns add %I

Are both of these things necessary? If the unit is creating a network namespace, does it matter so much what its network namespace is? (the one executing ip netns add)?

Wondering if I'm misunderstanding something.

@fdr It matters when you want to access the exact network namespace outside the unit. Some examples:

  • You want to make multiple services share a network namespace (but not necessarily other types of namespaces)
  • You want to manually execute a program inside the service's network namespace (so you will need to know the exact name of the namespace)

However, it is 2023 now and the originally requested feature can be fulfilled with the NetworkNamespacePath= directive, so most of the workarounds we wrote back then are now obsolete.

Nice, I was wondering if something had happened. Thanks for linking me.

Maybe this should be closed as "Not planned"? Since the problem seems not to have been solved.

Heads up: I noticed that the workaround described in #2741 (comment) stopped working when upgrading systemd from 253 to 254. The unit will fail to start with the following error:

umount: /run/netns/XXX: not mounted.

This is because of #26458 and more specifically c2da3bf.

This can be fixed by adding PrivateMounts=no to the unit file.

As @Jamesits pointed out in #2741 (comment), this is likely better fixed by migrating to NetworkNamespacePath=. I just felt like providing a quick fix for people who have not refactored their configs to use it yet.
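
For those migrating, a drop-in that sets NetworkNamespacePath= could be generated roughly as follows. This is a sketch under assumptions: the function name write_netns_dropin, the drop-in file name netns.conf and the optional third argument are hypothetical, and the namespace is assumed to have been created beforehand with ip netns add, which places it at /run/netns/NAME.

```shell
#!/bin/sh
# Write a drop-in that makes <unit> join the named network namespace <ns>.
# The optional third argument (hypothetical, handy for testing) overrides
# the directory that holds the unit's drop-ins.
write_netns_dropin() {
    unit=$1 ns=$2 base=${3:-/etc/systemd/system}
    mkdir -p "$base/$unit.d" || return 1
    # ip netns bind-mounts each named namespace at /run/netns/<name>
    printf '[Service]\nNetworkNamespacePath=/run/netns/%s\n' "$ns" \
        > "$base/$unit.d/netns.conf"
}

# Usage (as root), assuming "ip netns add ns1" has already been run:
#   write_netns_dropin some_unit.service ns1
#   systemctl daemon-reload && systemctl restart some_unit.service
```

Combined with User=myuser in the same unit, this should come close to the JoinNetworkNamespace=ns1 syntax wished for earlier in the thread. The path has to refer to an existing namespace when the service is started.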