td-agent chef recipe fails during bootstrap due to systemd conflict #136

Open
niclan opened this issue Jan 2, 2020 · 1 comment

niclan commented Jan 2, 2020

Environment: td-agent cookbook 3.1.1, chef-client 14.13.11, CentOS 7.7.1908 (fresh install), systemd 219-67.el7_7.2. The cookbook installs td-agent 3.5.1.

When bootstrapping a node into chef with a command such as this:

knife bootstrap -u root -t rhel7-omnitruck u89-niclangf-01.int.vgnett.no -r "role[vgnett_base]"

starting the td-agent service resource fails with a systemd error:

[2020-01-02T08:56:14+01:00] FATAL: Stacktrace dumped to /var/cache/chef/chef-stacktrace.out
[2020-01-02T08:56:14+01:00] FATAL: Please provide the contents of the stacktrace.out file if you file a bug report
[2020-01-02T08:56:14+01:00] FATAL: Mixlib::ShellOut::ShellCommandFailed: service[td-agent] (td-agent::configure line 77) had an error: Mixlib::ShellOut::ShellCommandFailed: Expected process to exit with [0], but received '1'
---- Begin output of /usr/bin/systemctl --system restart td-agent ----
STDOUT: 
STDERR: Job for td-agent.service failed because the control process exited with error code. See "systemctl status td-agent.service" and "journalctl -xe" for details.
---- End output of /usr/bin/systemctl --system restart td-agent ----

System log:

Jan 02 08:56:12 u89-niclangf-01 systemd[1]: Starting td-agent: Fluentd based data collector for Treasure Data...
Jan 02 08:56:12 u89-niclangf-01 systemd[75018]: Failed at step RUNTIME_DIRECTORY spawning /opt/td-agent/embedded/bin/fluentd: File exists
Jan 02 08:56:12 u89-niclangf-01 systemd[1]: td-agent.service: control process exited, code=exited status=233
Jan 02 08:56:12 u89-niclangf-01 systemd[1]: Failed to start td-agent: Fluentd based data collector for Treasure Data.
Jan 02 08:56:12 u89-niclangf-01 systemd[1]: Unit td-agent.service entered failed state.
Jan 02 08:56:12 u89-niclangf-01 systemd[1]: td-agent.service failed.
Jan 02 08:56:12 u89-niclangf-01 systemd[1]: td-agent.service holdoff time over, scheduling restart.
Jan 02 08:56:12 u89-niclangf-01 systemd[1]: Stopped td-agent: Fluentd based data collector for Treasure Data.
Jan 02 08:56:12 u89-niclangf-01 systemd[1]: Starting td-agent: Fluentd based data collector for Treasure Data...

As you can see, the service starts fine on the second try, but the first failure disrupts the bootstrap process.

The RUNTIME_DIRECTORY issue crops up in web searches, but most of the hits are old. This one seems recent and directly relevant: puppetlabs/puppet_metrics_dashboard#37

@sharpie quoth:

This appears to be a conflict between Puppet_metrics_dashboard::Service/Exec[Create Systemd temp Files] and systemd over who gets to create the /run/grafana directory.
...
So, we probably need to drop the logic around creating the /run/grafana directory since systemd is now handling it.

The td-agent RPM package postinstall script goes like this:

if [ ! -e "/var/run/td-agent/" ]; then
  mkdir -p /var/run/td-agent/
fi

In other tickets @poettering says that systemd should only complain if the directory exists and has the wrong owners, but creating the directory with the correct ownership by patching the chef recipe did not help.
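
To check the ownership theory on an affected node, one could look at the directory right after the package is installed and before systemd starts the unit. A minimal Ruby sketch follows; `runtime_dir_info` is a hypothetical helper name (not part of the cookbook or td-agent), and it reports numeric IDs only, so they still need to be compared by hand with the user and group the unit runs as:

```ruby
# Hypothetical helper: report whether a directory exists and, if so,
# its numeric owner, numeric group and permission bits.
def runtime_dir_info(path)
  return { exists: false } unless File.directory?(path)

  stat = File.stat(path)
  {
    exists: true,
    uid: stat.uid,                            # numeric owner
    gid: stat.gid,                            # numeric group
    mode: format('%o', stat.mode & 0o7777)    # permission bits as octal string
  }
end
```

Calling `runtime_dir_info('/var/run/td-agent')` on a freshly installed node would show what the postinstall script left behind.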

Since the SPEC file for the RPM package has been hard to find, I've instead tried another fix in the chef recipe (td-agent::install):

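# Declared with :nothing so it only acts when notified by the package below.
directory "/var/run/td-agent" do
  action :nothing
end

package "td-agent" do
  retries 3
  retry_delay 10
  if node["td_agent"]["pinning_version"]
    action :install
    version node["td_agent"]["version"]
  else
    action :upgrade
  end
  # Remove the directory created by the RPM postinstall script so that
  # systemd can create it itself when the service starts.
  notifies :delete, 'directory[/var/run/td-agent]', :immediately
end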

which seems to have fixed it.
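
For nodes that are not managed by chef, roughly the same cleanup can be sketched in plain Ruby. `remove_stale_runtime_dir` is a hypothetical helper, not something shipped with td-agent or the cookbook. Unlike the recipe above, it refuses to remove a non-empty directory, on the assumption that a running daemon may be keeping files there:

```ruby
# Hypothetical helper: remove an empty, pre-created runtime directory so
# that systemd can create it itself on the next service start.
# Returns :absent, :in_use or :removed.
def remove_stale_runtime_dir(path)
  return :absent unless File.directory?(path)
  return :in_use unless Dir.empty?(path)   # leave non-empty directories alone

  Dir.rmdir(path)
  :removed
end
```

It would have to run as root after the package install and before the first `systemctl restart td-agent`, for example as `remove_stale_runtime_dir('/var/run/td-agent')`.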

So it turns out that this is an issue with the interaction between the RPM package and systemd, not with the chef cookbook. I'm still posting here since we only experience the problem when using chef.

niclan commented Jan 2, 2020

I'll email the person whose email address is found in the rpm package.
