I run a lot of LXC containers. After rebooting for the first time in a while (so I'm not sure when this behaviour was introduced), onto sys-apps/openrc-0.21.7, I noticed that the app-backup/bacula client didn't start on the host, although it did start in each container.

This seems to be because start-stop-daemon considers the service already running if it can see any process with a matching command line, regardless of whether it is running in the current system or in a container.

Here's what the init script prints on the host:

~ # /etc/init.d/bacula-fd status
 * status: stopped
~ # /etc/init.d/bacula-fd start
 * Starting bacula file daemon ...
 * start-stop-daemon: /usr/sbin/bacula-fd is already running
 * ERROR: bacula-fd failed to start
~ # /etc/init.d/bacula-fd stop
 * WARNING: bacula-fd is already stopped

So the RC system knows the correct status of the service, it's just start-stop-daemon that's being difficult. Here's what start-stop-daemon prints when run manually on the host:

~ # start-stop-daemon --test --verbose --start --exec /usr/sbin/bacula-fd -- -u root -g bacula -c /etc/bacula/bacula-fd.conf
 * Would send signal 0 to PID 20512
 * Would send signal 0 to PID 18722
[...]
 * start-stop-daemon: /usr/sbin/bacula-fd is already running

With those enumerated processes being the backup clients running in each container:

~ # ls -l /proc/20512/exe
lrwxrwxrwx 1 root root 0 24 sep 18.15 /proc/20512/exe -> /usr/sbin/bacula-fd
~ # ls -l /proc/20512/ns/pid
lrwxrwxrwx 1 root root 0 24 sep 18.16 /proc/20512/ns/pid -> 'pid:[4026533774]'
~ # ls -l /proc/18722/exe
lrwxrwxrwx 1 root root 0 24 sep 18.24 /proc/18722/exe -> /usr/sbin/bacula-fd
~ # ls -l /proc/18722/ns/pid
lrwxrwxrwx 1 root root 0 24 sep 18.16 /proc/18722/ns/pid -> 'pid:[4026532611]'

Changing the command line slightly makes start-stop-daemon no longer consider the service to be running:

~ # start-stop-daemon --test --verbose --start --exec /usr/sbin/bacula-fd -- -u root -g bacula -c /etc/bacula/bacula-fd.conf2
 * Would start /usr/sbin/bacula-fd -u root -g bacula -c /etc/bacula/bacula-fd.conf2

Reproducible: Always

Steps to Reproduce:
1. In one root shell, run: unshare --pid sleep 17
2. In another root shell, run: start-stop-daemon --verbose sleep -- 17

Actual Results:
start-stop-daemon prints:
 * Sending signal 0 to PID 20698 ... [ ok ]
 * start-stop-daemon: sleep is already running

Expected Results:
start-stop-daemon should ignore the process running in the other PID namespace and start its own copy.
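Editorial illustration: the /proc/<pid>/ns/pid links shown in the report can be compared from a script to tell whether a process lives in the caller's PID namespace. The following is only a minimal sketch under that assumption; the function name same_pidns is hypothetical, and nothing here is taken from start-stop-daemon itself.

```sh
# Return 0 if the given PID is in the same PID namespace as the caller,
# 1 if it is in a different namespace or does not exist,
# 2 if our own namespace link cannot be read.
same_pidns() {
    mine=$(readlink /proc/self/ns/pid 2>/dev/null) || return 2
    theirs=$(readlink "/proc/$1/ns/pid" 2>/dev/null) || return 1
    [ -n "$mine" ] && [ "$mine" = "$theirs" ]
}

# Usage: report on the current shell's own PID.
if same_pidns "$$"; then
    echo "PID $$ is in our PID namespace"
fi
```

Run on the host from the report, such a check would be expected to reject PIDs 20512 and 18722, since their ns/pid links point at different namespace IDs from each other and they belong to containers.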
Hi Tom, this was reported as a bug in start-stop-daemon. However, I took a look at the bacula init scripts, and it looks like they should be updated to use the default start/stop functions in OpenRC. Can you please take a look at man openrc-run and fix the scripts? I will also assist where I can, so feel free to ask questions. Thanks. William
The first things I see are:

- you probably can remove your start and stop functions and just use the defaults depending on how you set up the variables.

- You should not use wildcards in the name of the pid file.
I am adding the updated service scripts to this bug. I know the pidfile setting is not correct. That will need to be set to the correct path, and I am not sure what that should be since I do not use bacula.
Created attachment 450830 bacula-dir.initd
Created attachment 450832 bacula-fd.initd
Created attachment 450834 bacula-sd.initd
(In reply to William Hubbs from comment #2)
> The first things I see are:
>
> - you probably can remove your start and stop functions and just use the
> defaults depending on how you set up the variables.
>
> - You should not use wildcards in the name of the pid file.

I double-checked the wildcard problem and found the following:

- bacula allows the user to start its daemons multiple times, if needed with different configurations.
- One of the configuration options is the port number on which the daemon can be reached.
- That port number becomes part of the pid file name.
- To catch all possible port numbers, the former ebuild author used the xx.*.pid syntax.

I see the following way to fix that: ship a standard configuration with a fixed, hard-coded port number in the config and initd files, and add a description of how to set up more than one bacula instance (by copying, renaming and adapting the relevant configuration files).

This solution loses some flexibility but covers the standard use case for most users. Anyone who wants more than one bacula instance running has to do some extra work; the readme file should give them an idea of how to proceed.

So I will go that route for now.
(In reply to Karl-Johan Karlsson from comment #0)

Can you please provide some more information about the problem?

- What version of bacula do you run?
- Do you use the standard installation in each container and on the host, or did you adapt the config files and start-up scripts to allow multiple bacula daemons to run in parallel?

Thanks for the help.
(In reply to Thomas Beierlein from comment #8)
> (In reply to Karl-Johan Karlsson from comment #0)
> Can you please provide some more information about the problem?

I still think start-stop-daemon does things horribly wrong in the presence of containers, but anyway:

> - What version of bacula do you run?

app-backup/bacula-7.4.4, built like this on the container running the Bacula director and storage daemon:

USE="-X acl -bacula-clientonly -bacula-nodir -bacula-nosd -examples ipv6 -libressl -logwatch -mysql postgres -qt4 readline -sqlite ssl -static -tcpd -vim-syntax" ABI_X86="64"

and like this on the host and all the other containers:

USE="-X acl bacula-clientonly -bacula-nodir -bacula-nosd -examples ipv6 -libressl -logwatch -mysql -postgres -qt4 readline -sqlite ssl -static -tcpd -vim-syntax" ABI_X86="64"

> - Do you use the standard installation in each container and the host or do
> you adapt the config files and start-up scripts to allow multiple bacula
> daemons running in parallel?

All standard. I have not changed either /etc/conf.d/bacula-* or /etc/init.d/bacula-* anywhere.
(In reply to Karl-Johan Karlsson from comment #9)
> (In reply to Thomas Beierlein from comment #8)
> > (In reply to Karl-Johan Karlsson from comment #0)
> > Can you please provide some more information about the problem?
>
> I still think start-stop-daemon does things horribly wrong in the presence
> of containers, but anyway:
>

Yes, that may be so. But as we also need to sort things out for bacula as a whole, let us try that first.

> > - What version of bacula do you run?
>
> app-backup/bacula-7.4.4, built like this on the container running the Bacula
> director and storage daemon:
>
> USE="-X acl -bacula-clientonly -bacula-nodir -bacula-nosd -examples ipv6
> -libressl -logwatch -mysql postgres -qt4 readline -sqlite ssl -static -tcpd
> -vim-syntax" ABI_X86="64"
>
> and like this on the host and all the other containers:
>
> USE="-X acl bacula-clientonly -bacula-nodir -bacula-nosd -examples ipv6
> -libressl -logwatch -mysql -postgres -qt4 readline -sqlite ssl -static -tcpd
> -vim-syntax" ABI_X86="64"
>

Ok, I see. You have one specialised container with the whole bacula machinery (director and storage daemon) and from there you back up all containers and the host. So there is a bacula-fd running in each container.

As I am not quite familiar with the use of containers myself: how do you address the different file daemons from the central director? Do they get different IP addresses?

> > - Do you use the standard installation in each container and the host or do
> > you adapt the config files and start-up scripts to allow multiple bacula
> > daemons running in parallel?
>
> All standard. I have not changed either /etc/conf.d/bacula-* or
> /etc/init.d/bacula-* anywhere.

Ok. From my point of view it looks as if the containers are isolated from each other but are shining through onto the host system. So you can have the same file daemon running in each container, each seeing only its own files. But the host sees its own files and the files from the container(s).

What is not quite clear to me atm is how start-stop-daemon checks if a daemon is already running. Maybe WilliamH can comment here.
Well Karl-Johan, could you please do a test for me?

Download the bacula-fd.initd from attachment 450832 and replace the pidfile line with

pidfile=/var/run/bacula-fd.9102.pid

Try what happens if you just replace the init file in the host with that file.

If it works please replace at least one of the init files in a container with that file too.

Please be aware that I am away for the rest of the week and will not be back before the weekend.

Thanks for the help.
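Editorial illustration: the attachments are not reproduced in this thread, so the following is only a sketch of what a declarative OpenRC service script with a fixed pidfile could look like. The command, arguments and pidfile path come from this report; the description text, the depend() contents and the assumption that port 9102 matches bacula-fd.conf are illustrative guesses.

```sh
# Sketch of /etc/init.d/bacula-fd. A real service script must start with
# the interpreter line "#!/sbin/openrc-run", which then supplies the
# default start/stop/status functions.

name="bacula-fd"
description="Bacula file daemon"
command="/usr/sbin/bacula-fd"
# Arguments as shown in the process listings in this report.
command_args="-u root -g bacula -c /etc/bacula/bacula-fd.conf"
# Fixed path, no wildcard; 9102 is assumed to be the port set in bacula-fd.conf.
pidfile="/var/run/bacula-fd.9102.pid"

depend() {
    # Assumed dependency: the daemon listens on the network.
    need net
}
```

With no start() or stop() defined, openrc-run falls back to its defaults and uses the pidfile to track the daemon.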
(In reply to Thomas Beierlein from comment #10)
> Ok, I see. You have one specialised container with the whole bacula
> machinery (director and storage daemon) and from there you back up all
> containers and the host. So there is a bacula-fd running in each container.

Correct. Each container is in fact its own minimal Gentoo system, running one "real" service (e.g. an Apache, or the Bacula SD and DIR) plus a few maintenance services (Bacula FD, Syslog, Salt...).

> As I am not quite familiar with the use of containers myself: how do you
> address the different file daemons from the central director? Do they get
> different IP addresses?

Yes. Each container has a veth interface connected to a software bridge, which is also connected to one of the host's physical interfaces. Each container then gets its own IPv4 and IPv6 address. Traffic between containers, and between host and container, goes through the bridge.

> Ok. From my point of view it looks as if the containers are isolated from
> each other but are shining through onto the host system. So you can have the
> same file daemon running in each container, each seeing only its own files.
> But the host sees its own files and the files from the container(s).

Correct. You may look at it like an augmented chroot(). Processes running inside a chroot() see only their own files, but processes on the outside can see everything. Containers give you similar facilities for processes, users, network connections, etc.

Here's how the process tree looks from the outside, in the host:

# ps -A --forest -o pid,user,command | grep bacula
4708 root /usr/sbin/bacula-fd -u root -g bacula -c /etc/bacula/bacula-fd.conf
4460 root \_ grep --colour=auto bacula
5809 root \_ /usr/sbin/bacula-fd -u root -g bacula -c /etc/bacula/bacula-fd.conf
20396 root \_ /usr/sbin/bacula-fd -u root -g bacula -c /etc/bacula/bacula-fd.conf
21175 root \_ /usr/sbin/bacula-fd -u root -g bacula -c /etc/bacula/bacula-fd.conf
[...]
6864 root /usr/bin/lxc-start -l WARN -n bacula -f /export/lxc/bacula/config -d -o /var/log/lxc/bacula.log
18646 root \_ /usr/sbin/bacula-fd -u root -g bacula -c /etc/bacula/bacula-fd.conf
18671 root \_ /usr/sbin/bacula-sd -u root -g bacula -c /etc/bacula/bacula-sd.conf
29092 root \_ /usr/sbin/bacula-dir -u root -g bacula -c /etc/bacula/bacula-dir.conf

The lines are:
1: The host's bacula-fd.
2: The grep command.
3-5: Individual containers' bacula-fd processes.
6: Many more identical bacula-fd processes, omitted.
7: The LXC container management system master process for the container called "bacula", which runs my Bacula servers.
8: The Bacula container's bacula-fd.
9: bacula-sd, running in the container "bacula".
10: bacula-dir, running in the container "bacula".

From inside the Bacula server container, the process tree looks like this:

# ps -A --forest -o pid,user,command | grep bacula
6987 root \_ grep --colour=auto bacula
610 root /usr/sbin/bacula-fd -u root -g bacula -c /etc/bacula/bacula-fd.conf
631 root /usr/sbin/bacula-sd -u root -g bacula -c /etc/bacula/bacula-sd.conf
10378 root /usr/sbin/bacula-dir -u root -g bacula -c /etc/bacula/bacula-dir.conf

These are the same processes as lines 8-10 from the outside, but with different PIDs on the inside. The bacula-fd processes running in other containers are not visible.

> What is not quite clear to me atm is how start-stop-daemon checks if a
> daemon is already running. Maybe WilliamH can comment here.

It looks like simple command line comparison. Not even the running user is taken into account. If I run, as a regular user:

sleep 17

and in another terminal, as root:

start-stop-daemon --verbose sleep -- 17

start-stop-daemon claims that the process is already running:

 * Sending signal 0 to PID 13658 ... [ ok ]
 * start-stop-daemon: sleep is already running

(In reply to Thomas Beierlein from comment #10)
> Well Karl-Johan, could you please do a test for me?

Sure, but probably not today.
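Editorial illustration: to see why a pure command line comparison behaves as described, here is a minimal sketch that scans /proc for processes whose full command line equals a given one. It is not start-stop-daemon's actual code, and the function name pids_by_cmdline is hypothetical; it only shows that such a scan takes neither the owning user nor the PID namespace into account.

```sh
# Print the PID of every visible process whose command line equals the
# arguments given, regardless of owner or PID namespace.
pids_by_cmdline() {
    want="$*"
    for dir in /proc/[0-9]*; do
        [ -r "$dir/cmdline" ] || continue
        # cmdline is NUL-separated: turn NULs into spaces, drop the last one.
        # Errors from processes that exit during the scan are discarded.
        have=$( { tr '\0' ' ' < "$dir/cmdline"; } 2>/dev/null | sed 's/ $//')
        [ "$have" = "$want" ] && echo "${dir#/proc/}"
    done
    return 0
}

# Usage: list every process that was started as "sleep 17".
pids_by_cmdline sleep 17
```

On a container host, a scan like this would also list matching processes inside containers, because their /proc entries are visible from the host.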
(In reply to Thomas Beierlein from comment #11)
> Well Karl-Johan, could you please do a test for me?
>
> Download the bacula-fd.initd from attachment 450832 and replace
> the pidfile line with
>
> pidfile=/var/run/bacula-fd.9102.pid
>
> Try what happens if you just replace the init file in the host with that
> file.

It seems to work just fine. Here's the standard script, still failing:

# ./bacula-fd stop
 * Stopping bacula file daemon ... [ ok ]
# ./bacula-fd start
 * Caching service dependencies ... [ ok ]
 * Starting bacula file daemon ...
 * start-stop-daemon: /usr/sbin/bacula-fd is already running [ !! ]

And here's with the script from the attachment above:

# cp bacula-fd.bugzilla bacula-fd
# ./bacula-fd status
 * status: stopped
# ./bacula-fd start
 * Caching service dependencies ... [ ok ]
 * Starting bacula-fd ... [ ok ]

The process is running, with the correct arguments:

# ps -A -o pid,user,command | grep $(cat /var/run/bacula-fd.9102.pid)
27226 root /usr/sbin/bacula-fd -u root -g bacula -c /etc/bacula/bacula-fd.conf

The director is happy:

*status client=xyz-fd
Connecting to Client xyz-fd at xyz:9102
xyz-fd Version: 7.4.4 (20 September 2016) x86_64-pc-linux-gnu gentoo
Daemon started 30-okt-16 10:46. Jobs: run=0 running=0.

And the process stops when asked to:

# ./bacula-fd stop
 * Stopping bacula-fd ... [ ok ]
# cat /var/run/bacula-fd.9102.pid
cat: /var/run/bacula-fd.9102.pid: No such file or directory
# ps -p 27226
PID TTY TIME CMD
#

> If it works please replace at least one of the init files in a container
> with that file too.

That works too; starting, checking status, and stopping.
Very well Karl-Johan, thanks for the test report (and also for the earlier information about the containers). I will prepare a 7.4.4-r1 this evening with fixes for the problem.
Fixed in 7.4.4-r1.

> app-backup/bacula: Update init.d service scripts bug #595044
> and fix slot operator for dev-db/postgresql bug #597666