This repository has been archived by the owner on Dec 20, 2024. It is now read-only.

[bugfix] Dragonfly bind an already in used port make dfdaemon.log file size huge #364

Closed
godliness opened this issue Jan 31, 2019 · 12 comments · Fixed by #369
Labels
areas/log kind/bug This is bug report for project

Comments

@godliness
Contributor

godliness commented Jan 31, 2019

Ⅰ. Issue Description

Dragonfly binding an already-in-use port makes the dfdaemon.log file grow huge.

Ⅱ. Describe what happened

Dragonfly binding an already-in-use port makes the dfdaemon.log file grow huge, and Dragonfly writes to dfdaemon.log very frequently. Below is the log content:

time="2019-01-28 12:27:00" level=info msg="start dfdaemon param:&{DfPath:/software/servers/df-client/dfget DFRepo:/data/dragonfly/.small-dragonfly/dfdaemon/data/ RateLimit:1000M CallSystem:com_ops_dragonfly URLFilter:Signature&Expires&OSSAccessKeyId Notbs:true MaxProcs:4 Version:false Verbose:false Help:false HostIP:127.0.0.1 Port:65001 Registry:http://repo.image.local DownRule: CertFile: KeyFile:}"
time="2019-01-28 12:27:00" level=fatal msg="listen tcp :65001: bind: address already in use"
time="2019-01-28 12:27:00" level=info msg=init...
time="2019-01-28 12:27:00" level=info msg="dfget version:0.2.0\n"
time="2019-01-28 12:27:00" level=info msg="init finish"
time="2019-01-28 12:27:00" level=info msg="start dfdaemon param:&{DfPath:/software/servers/df-client/dfget DFRepo:/data/dragonfly/.small-dragonfly/dfdaemon/data/ RateLimit:1000M CallSystem:com_ops_dragonfly URLFilter:Signature&Expires&OSS


Another confusing thing is that Dragonfly is actually already bound to port 65001, yet it still outputs the message time="2019-01-28 12:27:00" level=fatal msg="listen tcp :65001: bind: address already in use"


Ⅲ. Describe what you expected to happen

Ⅳ. How to reproduce it (as minimally and precisely as possible)

Ⅴ. Anything else we need to know?

Ⅵ. Environment:

  • dragonfly version: 0.2.1
  • OS (e.g. from /etc/os-release):
    NAME="CentOS Linux"
    VERSION="7 (Core)"
    ID="centos"
    ID_LIKE="rhel fedora"
    VERSION_ID="7"
    PRETTY_NAME="CentOS Linux 7 (Core)"
    ANSI_COLOR="0;31"
    CPE_NAME="cpe:/o:centos:centos:7"
    HOME_URL="https://www.centos.org/"
    BUG_REPORT_URL="https://bugs.centos.org/"

    CENTOS_MANTISBT_PROJECT="CentOS-7"
    CENTOS_MANTISBT_PROJECT_VERSION="7"
    REDHAT_SUPPORT_PRODUCT="centos"
    REDHAT_SUPPORT_PRODUCT_VERSION="7"

/cc @allencloud

@mdianjun

I have run into this issue too.

@lowzj
Member

lowzj commented Jan 31, 2019

Dfdaemon shuts down when the address is already in use. Is there another process monitoring dfdaemon that automatically restarts it?

@godliness
Contributor Author

@lowzj

The PID of dfdaemon has not changed, so I don't think dfdaemon shut down.

@lowzj
Member

lowzj commented Jan 31, 2019

@godliness

I mean that there may be a monitor process that keeps trying to start a new dfdaemon; the new dfdaemon process outputs the error listen tcp :65001: bind: address already in use and shuts down immediately because an old, running dfdaemon is using port 65001.

@godliness
Contributor Author

godliness commented Jan 31, 2019

@lowzj
[Unit]
Description=dragonfly df-daemon

[Service]
User=dragonfly
Restart=always
RestartSec=10
ExecStartPre=/bin/bash /etc/profile
ExecStart=/software/servers/df-client/dfdaemon -registry http://repo.image.local

[Install]
WantedBy=multi-user.target

systemd controls dfdaemon, and RestartSec=10, but in our case the log output is very, very frequent, as if in a while true loop. By "a monitor process", do you mean systemd?
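The unit above has Restart=always with RestartSec=10, so a fatal bind error produces a restart roughly every 10 seconds indefinitely. If the loop itself needs capping while the root cause is investigated, systemd's start rate limiting can do it; this is a sketch of a hypothetical drop-in (these directive names live in [Unit] since systemd 230, and the values are placeholders):

```ini
# /etc/systemd/system/dfdaemon.service.d/limit.conf (hypothetical drop-in)
[Unit]
# Stop restarting if the unit fails 5 times within 300 seconds.
StartLimitIntervalSec=300
StartLimitBurst=5
```

Once the burst is exhausted within the interval, systemd stops restarting the unit and marks it failed instead of looping forever.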

@allencloud
Contributor

allencloud commented Jan 31, 2019

Please make sure that you have read the issue dragonflyoss/dragonfly#291.
And it would be great to have you in the discussion group there. @godliness
Thanks a lot.

@godliness
Contributor Author

godliness commented Jan 31, 2019

@lowzj

Thanks for your comment. Maybe there is another monitor process starting a new dfdaemon process; we will investigate this issue further. By the way, our Storm cluster also uses port 65001. Could the Storm process be doing something odd to dfdaemon? This is just speculation.

After we find out the reason, I will post it here and close this issue.

If anyone else encounters the same issue, please leave a comment here.

@godliness
Contributor Author

@lowzj

I think we can rule out systemd, because our service file includes RestartSec=10, the systemd log also shows the restart info every 10 seconds, and --notbs is true.

@lowzj
Member

lowzj commented Feb 1, 2019

@godliness I have tested this scenario with your systemd config file:

  1. start a dfdaemon process manually binding port 65001
  2. start dfdaemon with systemctl: systemctl start dfdaemon
  3. check the content in dfdaemon.log

Then I found that these messages were logged to dfdaemon.log every 10 seconds, and nothing else:

time="2019-02-01 11:58:00" level=info msg=init...
time="2019-02-01 11:58:00" level=info msg="dfget version:0.2.0\n"
time="2019-02-01 11:58:00" level=info msg="init finish"
time="2019-02-01 11:58:00" level=info msg="start dfdaemon param:&{DfPath:/home/taiyun.zj/df-client/dfget DFRepo:/home/taiyun.zj/.small-dragonfly/dfdaemon/data/ RateLimit:100M CallSystem:com_ops_dragonfly URLFilter:Signature&Expires&OSSAccessKeyId Notbs:true MaxProcs:4 Version:false Verbose:true Help:false HostIP:127.0.0.1 Port:65001 Registry:https://index.docker.io DownRule: CertFile: KeyFile:}"
time="2019-02-01 11:58:00" level=fatal msg="listen tcp :65001: bind: address already in use"

Due to the time format in dfdaemon.log (minute granularity), you can use this command to count how many times systemd started dfdaemon each minute:

$ grep '65001: bind: address already in use' dfdaemon.log | awk -F'"' '{print $2}' |uniq -c
      5 2019-02-01 11:51:00
      6 2019-02-01 11:52:00
      5 2019-02-01 11:53:00
      6 2019-02-01 11:54:00
      6 2019-02-01 11:55:00
      6 2019-02-01 11:56:00
      6 2019-02-01 11:57:00
      6 2019-02-01 11:58:00
      6 2019-02-01 11:59:00

You can also get some information from /var/log/messages:

grep 'listen tcp :65001: bind: address already in use' /var/log/messages | less
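The same per-minute counting can be sketched in Python without depending on awk's quote-splitting; the sample lines below are hypothetical stand-ins for the real log:

```python
import re
from collections import Counter

# Hypothetical sample lines in dfdaemon's log format.
lines = [
    'time="2019-02-01 11:51:00" level=fatal msg="listen tcp :65001: bind: address already in use"',
    'time="2019-02-01 11:51:00" level=fatal msg="listen tcp :65001: bind: address already in use"',
    'time="2019-02-01 11:52:00" level=fatal msg="listen tcp :65001: bind: address already in use"',
]

# The log's time field only has minute resolution (seconds are always 00),
# so grouping by the full timestamp effectively counts starts per minute.
pattern = re.compile(r'time="([^"]+)".*bind: address already in use')
counts = Counter(m.group(1) for m in map(pattern.search, lines) if m)

for ts, n in sorted(counts.items()):
    print(n, ts)
```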

@godliness
Contributor Author

godliness commented Feb 1, 2019

@lowzj

Thanks for your testing. I have reproduced this issue on our machine; below are the detailed steps:

  1. Create the log file /data/dragonfly/.small-dragonfly/logs/dfdaemon.log and make its size larger than 20 MB.
  2. Start a dfdaemon process manually, binding port 65001.
  3. Start dfdaemon with systemctl: systemctl start dfdaemon (make sure the two processes write to the same log file).
  4. Wait a few seconds; the log file becomes huge, growing from 20 MB to 37 GB.

I think the key point is the rotateLog function; let's dig into it:

https://github.com/dragonflyoss/Dragonfly/blob/v0.2.0/dfdaemon/initializer/initializer.go#L104

func rotateLog(logFile *os.File, logFilePath string) {
	logSizeLimit := int64(20 * 1024 * 1024)
	for {
		time.Sleep(time.Second * 60)
		stat, err := os.Stat(logFilePath)
		if err != nil {
			log.Errorf("failed to stat %s: %s", logFilePath, err)
			continue
		}
		// if it exceeds the 20MB limitation
		if stat.Size() > logSizeLimit {
			log.SetOutput(ioutil.Discard)
			logFile.Sync()
			if transFile, err := os.Open(logFilePath); err == nil {
				// move the pointer to be (end - 10MB)
				transFile.Seek(-10*1024*1024, 2)
				// move the pointer to head
				logFile.Seek(0, 0)
				count, _ := io.Copy(logFile, transFile)
				logFile.Truncate(count)
				log.SetOutput(logFile)
				transFile.Close()
			}
		}
	}
}
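One plausible reading of why this loop explodes when two processes share the log: io.Copy here copies the file into itself, and if the destination descriptor behaves as append-only (or another writer keeps extending the file), the reader never reaches EOF and the file grows at disk speed. Below is a minimal Python sketch of that failure mode, not the actual Dragonfly code; the append-mode assumption is mine, and the eventual fix in #369 avoided the self-copy by using mmap.

```python
import os
import tempfile

# A 1 KiB stand-in for dfdaemon.log.
fd, path = tempfile.mkstemp()
os.write(fd, b"x" * 1024)
os.close(fd)

reader = open(path, "rb")
reader.seek(512)           # keep the tail, like transFile.Seek(-10MB, 2)
writer = open(path, "ab")  # append mode: every write goes to end-of-file
writer.seek(0)             # ineffective under append mode, like logFile.Seek(0, 0)

# io.Copy analogue, capped so the demo terminates; the real loop would not,
# because the end of the file stays ahead of the read offset forever.
copied = 0
while copied < 8 * 1024:
    chunk = reader.read(256)
    if not chunk:          # EOF is never reached in this configuration
        break
    writer.write(chunk)
    writer.flush()
    copied += len(chunk)

size = os.path.getsize(path)
print(size)                # far larger than the original 1024 bytes
writer.close()
reader.close()
os.remove(path)
```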

@lowzj lowzj added the kind/bug This is bug report for project label Feb 1, 2019
@lowzj lowzj changed the title Dragonfly bind an already in used port make dfdaemon.log file size huge [bugfix] Dragonfly bind an already in used port make dfdaemon.log file size huge Feb 1, 2019
@lowzj
Copy link
Member

lowzj commented Feb 1, 2019

@godliness Thanks for your detailed steps. I have reproduced this problem and will take a look at it. Feel free to submit a pull request if you have an idea for a fix.

@godliness
Contributor Author

@lowzj

My pleasure. I will try to fix this issue soon and send a PR for it.

godliness referenced this issue in godliness/Dragonfly Feb 1, 2019
When the size of the dfdaemon.log file is larger than 20M, the log file keeps growing.

Use mmap to copy the file that needs to be truncated.

Related to issue:
dragonflyoss/dragonfly#364
@lowzj lowzj closed this as completed in #369 Feb 2, 2019
lowzj referenced this issue Feb 13, 2019
starnop referenced this issue in starnop/Dragonfly Nov 27, 2019
inoc603 referenced this issue in inoc603/Dragonfly Dec 23, 2019
5 participants