This repository has been archived by the owner on Nov 8, 2022. It is now read-only.

handle connection errors better: try reconnecting and buffering data #47

Open
Dieterbe opened this issue Sep 3, 2017 · 1 comment

Dieterbe commented Sep 3, 2017

Hello,
it seems this publisher does not handle connection errors well.
We've seen this happen in production: sometimes no more data comes in at all (for hours), and the "last error" shown
is Error: dial tcp xxx.xxx.xxx.xxx:2003: i/o timeout. It seems that after this point no attempts are made to reconnect.
When I then run service snap restart, all the stats start coming through again (but obviously some stats are lost in the meantime).

My suggestion would be:

  1. On connection failure, keep retrying periodically.
  2. While there is no connection, buffer stats up to a configurable limit.
  3. When the connection is restored, drain the buffer and flush all stats (probably best to send the current stats first and then backfill from the historical buffer, but sending everything in order would be fine for me too).
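
To make the three steps concrete, here is a minimal Python sketch of the idea. It is not the publisher's actual code (the plugin itself is not written in Python), and the class name BufferedGraphiteSender, the max_buffered parameter, and the injectable connect hook are all hypothetical names chosen for illustration. It implements the simpler "everything in order" variant: new lines are queued behind any backlog and sent oldest first.

```python
import socket
from collections import deque


class BufferedGraphiteSender:
    """Hypothetical sketch: reconnect on failure, buffer lines while offline."""

    def __init__(self, host, port, max_buffered=10000, timeout=5.0, connect=None):
        self.addr = (host, port)
        self.timeout = timeout
        # deque(maxlen=N) discards the oldest entry once N entries are exceeded,
        # which gives the "buffer up to a configurable limit" behavior.
        self.buffer = deque(maxlen=max_buffered)
        self.sock = None
        # The connect hook is injectable so the sketch can run without a server.
        self._connect = connect or (
            lambda: socket.create_connection(self.addr, self.timeout)
        )

    def publish(self, lines):
        """Queue new lines, then try to flush the whole buffer, oldest first."""
        self.buffer.extend(lines)
        try:
            if self.sock is None:
                # A reconnect is attempted on every publish call while offline.
                self.sock = self._connect()
            while self.buffer:
                line = self.buffer[0]
                self.sock.sendall((line + "\n").encode("ascii"))
                # Remove the line only after sendall returned without error.
                self.buffer.popleft()
            return True
        except OSError:
            # Connect or send failed: keep the remaining lines for the next call.
            if self.sock is not None:
                self.sock.close()
            self.sock = None
            return False
```

Calling publish once per collection interval gives the periodic retry for free, since every call while disconnected attempts a new connection. Sending the current stats first and backfilling afterwards would need a second queue; that is left out here to keep the sketch short.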

I use snapd version master-ca32c9a.

Hopefully this is a good bug report; I'm not that familiar with the inner workings of snap and this publisher.
Thanks!


mkleina commented Sep 6, 2017

@Dieterbe Hello!

> on connection failure, keep retrying periodically

To get this behavior, set the max-failures parameter in your task manifest. For example, to tolerate 100 consecutive failures before the task is disabled:

---
version: 1
schedule:
  type: simple
  interval: 1s
max-failures: 100
workflow:
  collect:
    metrics:
      "/intel/psutil/load/load1": {}
      "/intel/psutil/load/load15": {}
      "/intel/psutil/load/load5": {}
      "/intel/psutil/vm/available": {}
      "/intel/psutil/vm/free": {}
      "/intel/psutil/vm/used": {}
    publish:
    - plugin_name: graphite
      config:
        server: localhost
        prefix: issue32
        prefix_tags: "space tags"
        port: 2003

As for buffering data, I think it is a very good idea. We will consider implementing it in a future version. Thanks for your report!
