hltd 1.6.0 merge to master #60

Merged
merged 82 commits into from
Jan 30, 2015

Conversation

smorovic
Contributor

1.5.3 is now branched into hltd-1.5.3, so this merges the current version into master. A release will be created afterwards.

/etc/appliance) and switchover to "cloud" mode for FUs
instances (might handle with single merger instance if transfer options are written per run)
in name finding for closure checks (in case central cluster is used)
*not using same port for instances on BU, instead offset based on BU instance list order
*modified init script
*changed config file init
problem: hltd still writes to /etc/hltd.conf if started using all option
of the init script (will execve or popen to start each variant)
processes of multiple instances (if only using double fork, logging was
being taken to the log file of a first process)
in case of checksum error, fill in only error events and delete the file
*new CMSSW test script
*resources stay quarantined until the next run appears (accounting for the case where the next ongoing run is already present)
*number of non-zero exit codes is logged in boxinfo files
*resolve a bug where an already renamed and deleted file is closed by CMSSW
very late
*logcollector will correctly push mapping
*lock to protect from taking nss lock while forking with demote
(can deadlock child process due to initiated DNS lookups in parent while
fork is called)
*include and exclude cgi added
index anelastic queue status in ES
…script
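
The fork/NSS lock item above can be illustrated with a minimal sketch. It assumes the deadlock arises because a resolver lock held by another thread at fork time is inherited, still locked, by the child. The names `nss_fork_lock`, `resolve` and `fork_demoted` are hypothetical and are not taken from hltd.py:

```python
import os
import socket
import threading

# Hypothetical lock: serialises name lookups against fork().
nss_fork_lock = threading.Lock()

def resolve(host):
    """Look up a host name; no fork can start while the lookup is in progress."""
    with nss_fork_lock:
        return socket.gethostbyname(host)

def fork_demoted(uid, gid, func):
    """Fork a child that drops privileges and runs func(); return the child's pid."""
    # Holding the lock here guarantees no other thread is inside resolve()
    # (and therefore inside the resolver's internal locks) when fork() runs.
    with nss_fork_lock:
        pid = os.fork()
    if pid == 0:
        status = 1
        try:
            os.setgid(gid)  # drop the group first, while still permitted
            os.setuid(uid)
            func()
            status = 0
        finally:
            # Leave the child without running the parent's cleanup handlers.
            os._exit(status)
    return pid
```

Every thread that performs lookups would have to go through `resolve` (or take the same lock) for this to help; the real change may guard a different call path.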

*monitoring and fixes to lumi queue status
*fix blacklist file support
*quicker appearance of box file (1s) after restart on FU
*close index by default
*better define types in aUtils
*fixed active run cleanup in BU
*instances share same output and make symlink in main dir for mergers to
work transparently
-fillresources.py always executed by configurefff.sh
-subtract sub-mount sizes when calculating ramdisk total and used size
-ramdisk occupancy in summary json
-fixed anelastic accounting of queued lumis
-worst-case number of queued lumis taken only for latest active run in
appliance (if any)
-silence messages about file deletion in elastic and anelastic (already
obvious from other info)
*10s threshold for resource_summary file info collection
*retry writing and moving queue status file as it is sometimes not correctly written
*log collector process can be restarted by creating 'logrestart' file in watch directory
*update 1.6.0 release candidate RPMs
Conflicts:
	python/elastic.py
	python/elasticbu.py
	python/hltd.py
	python/setupmachine.py
	scripts/metarpm.sh
*box files indexing skipped when unable to have connection
*using os._exit to quit the process when there is processing loop exception in anelastic or
elastic.py
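
The "retry writing and moving queue status file" item in the list above could look roughly like the following sketch. The function name, the `.tmp` suffix, the JSON payload and the retry parameters are assumptions for illustration, not the code in this PR:

```python
import json
import os
import time

def write_status_with_retry(path, status, attempts=3, delay=0.1):
    """Write status as JSON to a temporary file, then move it into place.

    Returns True on success, False if every attempt failed.
    """
    tmp_path = path + ".tmp"  # hypothetical naming convention
    for _ in range(attempts):
        try:
            with open(tmp_path, "w") as fp:
                json.dump(status, fp)
                fp.flush()
                os.fsync(fp.fileno())  # make sure the data reached the disk
            # rename is atomic within one filesystem, so readers never
            # observe a half-written status file
            os.rename(tmp_path, path)
            return True
        except (OSError, IOError):
            time.sleep(delay)
    return False
```

Writing to a temporary name and renaming keeps a partially written file from ever appearing under the final name; the retry loop covers transient failures of either step.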
@smorovic
Contributor Author

#57 is partially addressed by protecting against BU hltd failure on unsuccessful notification of FU
#58 is resolved by implementing support for multiple instances
#59 is resolved by providing a resource summary file as described in the issue

@smorovic smorovic closed this Jan 30, 2015
@smorovic smorovic reopened this Jan 30, 2015
smorovic added a commit that referenced this pull request Jan 30, 2015
@smorovic smorovic merged commit 1a021f6 into master Jan 30, 2015
@smorovic smorovic deleted the hltd-160-candidate branch January 30, 2015 10:43