-
Notifications
You must be signed in to change notification settings - Fork 656
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clarify counter64
reset zero value.
#1154
base: master
Are you sure you want to change the base?
Conversation
* (M) types/openconfig-yang-types.go - `counter64` defined a reset behaviour that is not common to all usages of the type. Remove the definition so that each leaf describes its behaviour for resets if required.
No major YANG version changes in commit 8422184 |
Context: Some users were asking if OC counters are expected to be reset to zero when, for example, a CLI command is issued to reset the counter. This is to clarify that there are no expectations in OpenConfig that a counter value will be reset like this. The only expectation is that the counter will "wrap" back to zero when exceeding the max value. We welcome any comments on this. |
I set last call to 4 weeks from now to give a bit of time for feedback from the community. |
Discontinuities in a counter are generally triggered only when | ||
the counter is reset to zero, through operator or system | ||
intervention."; | ||
reaches its maximum value, 2^64-1, it loops to zero."; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious why we don't just use consistent wording in both counter variants?
When the counter reaches its maximum value (2^32-1), it wraps to zero
When the counter reaches its maximum value (2^64-1), it wraps to zero
The IETF type variants are:
The counter32 type represents a non-negative integer
that monotonically increases until it reaches a
maximum value of 2^32-1 (4294967295 decimal), when it
wraps around and starts increasing again from zero.
The counter64 type represents a non-negative integer
that monotonically increases until it reaches a
maximum value of 2^64-1 (18446744073709551615 decimal),
when it wraps around and starts increasing again from zero.
LGTM - the typedef should not carry any behavior description of counter resets - the intended behavior is best served in whatever is triggering the reset (e.g. RPC) |
Probably want to reset the commit message on merge as well
s/go/yang/ |
|
||
Discontinuities in the counter are generally triggered only when | ||
the counter is reset to zero."; | ||
reaches its maximum value, in this case 2^32-1, it wraps to 0."; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the context from @dplore, is it to say that if a command on the device such as "clear ip intf statistics counters" is issued it will clear only the counters used in native schema but not in OC?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hellt I'll give my viewpoint here since I know where you are going with this considering the wording if OC counters are expected to be reset ...
Maybe this should be conveyed somewhere more formally to set expectations correctly but I have yet to see this across any industry effort to date (I may just not have come across as well)
In most cases we should clarify that there is a single state (I'll concentrate only on r/o for now) data-source for a given unique object in a given implementation and thus no per-schema or per-API datasets of sorts (I hope we all agree on this otherwise this warrants an entirely separate discussion). This means if an implementation supports native, OpenConfig and say IETF modeling and there is a node that is meant to access the same underlying data, that this data is always represented consistently across all views (assuming type/representation matches and transformations aside)
Now, your example above is a bit unusual in intent so this opens up another case of concern in general.
First, there is no "OC" method (gNOI) to execute such an action - CLI is not a defined NBI for this world so this falls back to ymmv across implementation behaviors.
Second, the intent of this type of command across implementations is generally tied to a specific NBI which is the human interactive CLI (or even user session possibly which is what it really should be tied to at most imo) - a clear
in this case does not mean to flush the counters on all NBI as you are not resetting IF-MIB OIDs, YANG state nodes, etc... but rather its use has been for real-time human debugging on the element directly that doesn't necessarily correspond to the paradigm of OC (at least not as it stands today) - a clear across all NBI as you can imagine could be more catastrophic affecting downstream systems consuming such data.
I feel like this topic probably does need to be recorded otherwise it will pop up from time to time on expected behaviors but as I see it any "clear counters" is local to the native implementation over CLI only
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@earies maybe an example would do good
if we have a leaf ro some-counter
that in whatever schema contains a value 100
, then if we have a CLI show command show some-counter
executed, it would return value 100.
Now if we execute clear some-counter
in CLI, then the next show some-counter
would return 0, but the underlying value for the leaf would still be 100
. Thus, clear
in the CLI affects the representation of a value in the CLI, but does not nullify the underlying data.
Is that example matching what you described above, @earies?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to: A "clear counters" CLI command should affect only the CLI view of the counters. The operational use case for this CLI is debugging. It's common to utilize the underlying hardware counters for systems (such as billing) to read the system counters and be negatively impacted by unexpected, repeated rollovers on counters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to: A "clear counters" CLI command should affect only the CLI view of the counters. The operational use case for this CLI is debugging. It's common to utilize the underlying hardware counters for systems (such as billing) to read the system counters and be negatively impacted by unexpected, repeated rollovers on counters.
This is uncharted waters, and, practically speaking, will be unimplementable in many cases. The system will have to maintain N copies of every counter. But there are other situations where counter64 leafs are reset to zero. For example, a system reboot, usually, clears all counters to the default value. A component reboot (e.g. a linecard reboot, or a software component restart) may do the same, depending on the implementation. There could be other cases.
Also, w.r.t interface counters specifically, there's already some text about it in openconfig-interfaces
, which allows clearing the counters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is uncharted waters
Agreed - we are taking an existing feature that contradicts the management and debuggability goals under this umbrella imo
practically speaking, will be unimplementable in many cases
Well, various implementations have supported what I describe for years already. A clear counters
is only segmented to the CLI NBI - Arista EOS and JUNOS/EVO work this way today (I cannot recall other implementations off the top of my head)
eos1#show interfaces Ethernet 1 | include clearing|packets.*put
Last clearing of "show interface" counters never
544 packets input, 58805 bytes
465 packets output, 59009 bytes
interfaces/interface[name=Ethernet1]/state/counters/in-octets: 67993
IF-MIB::ifInOctets.1 = Counter32: 67659
eos1#clear counters ethernet 1
eos1#show interfaces Ethernet 1 | include clearing|packets.*put
Last clearing of "show interface" counters 0:00:23 ago
10 packets input, 814 bytes
13 packets output, 1641 bytes
interfaces/interface[name=Ethernet1]/state/counters/in-octets: 68400
IF-MIB::ifInOctets.1 = Counter32: 69068
there's already some text about it in openconfig-interfaces, which allows clearing the counters.
For context, here is the text for in-octets
today
description
"The total number of octets received on the interface,
including framing characters.
Discontinuities in the value of this counter can occur
at re-initialization of the management system, and at
other times as indicated by the value of
'last-clear'.";
reference
I think this text is incorrect and implies that the leafs values are cleared in correspondence to the last-clear
leaf which is not true in various implementations. This leaf in those implementations is not cleared by any forceful "clear" command but would be if there is a process/hw reinit
@LimeHat @hellt - in SRLinux/SROS does this correspond to the behavior of your native "clear counters"?
if we have a leaf ro some-counter that in whatever schema contains a value 100, then if we have a CLI show command show some-counter executed, it would return value 100.
Now if we execute clear some-counter in CLI, then the next show some-counter would return 0, but the underlying value for the leaf would still be 100. Thus, clear in the CLI affects the representation of a value in the CLI, but does not nullify the underlying data.
Is that example matching what you described above, @earies?
Yep you got it... The leaf last-clear
imo is only of value to the CLI user to understand why show commands are displaying different than the actual underlying data-source (my observation across a few implementations is this is across the entirety of the CLI NBI and not per session)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in SRLinux/SROS does this correspond to the behavior of your native "clear counters"?
We follow a single source of truth approach and the cli view will match gnmi/netconf/any other interface's view.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in SRLinux/SROS does this correspond to the behavior of your native "clear counters"?
We follow a single source of truth approach and the cli view will match gnmi/netconf/any other interface's view.
With this logic, this could really be applied generically to "reset" any state counter in the system at the source-of-truth.... nothing specific to interface counters then for what was a historical human-debugging use-case.
In either case a consumer needs to handle rollover and this approach just means a client triggered action can force.
Another implementation observation (IOS-XR): A clear will reset the state leaf for YANG based access (gNMI/NETCONF) + CLI but not for SNMP IF-MIB.
I know this conversation is going down a bit of a divergent path of pointing out some behavioral differences but it seems that the usefulness of a last-clear
and user-defined triggers may very well warrant common behaviors which I tend to agree about maintaining and operating on a single copy/source-of-truth.
This could warrant a last-clear
node at every ./counters
container + generic methods to clear any counter within a system. A generic gNOI RPC could also cover this by way of targeting any suffixed path of ../../counters
.
This would introduce backwards incompatible (or knob enabled change) yet consistent behaviors across various implementations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a couple of thoughts on this thread:
Firstly, I would argue that specifying the behaviour of native CLI commands is somewhat out of scope for OpenConfig specifications, even though I see benefits for operators if systems behave similarly.
But generally, we think that it makes sense for the CLI to clear the counters everywhere, otherwise, this means that the device needs to burn extra memory keeping a separate copy of the counters for CLI vs OC vs IETF vs MIBs, etc.
I can see the argument about wanting to debug network issues via the CLI, but I think that could be better achieved by a separate transient command that takes a snapshot of the counters that could then be compared with the values at a subsequent point entirely independently from the main stats collection. Then, when the snapshot is no longer required, the associated resources can be freed.
There are a few scenarios where the counter64 can be reset to zero:
In my view, the spirit of the definition is that the values are not expected to be decremented/reset in normal conditions and without a user action to do so. And one can argue that the current text places points 2-4 into the letter of the definition as well. The text allows the existence of methods to implement point#2, but it does not require and does not mandate it. |
Also, to respond to the Darren's comment:
I don't think this should be covered by OpenConfig models in general. If an operator uses OC models and OC-defined protocols (e.g., gNMI/gNOI/...), then there is no such cli for them. But as I already indicated in the comment above, my interpretation is that the current text allows this behavior, but does not require it. |
The new, proposed text is ambiguous if there are any other reasons the counter might reset. It only specifies the rollover condition. My interpretation is that if the description doesn't permit a behavior, then it is not allowed. However, leafs which use this type (as pointed out in other comments in this PR) may impose additional conditions such as public/release/models/interfaces/openconfig-interfaces.yang Lines 747 to 761 in 72d3b67
last-clear as Ebben pointed out. (I am dating myself by saying this, but in SNMP the equivalent is ifCounterDiscontinuityTime ).
|
reaches its maximum value, in this case 2^32-1, it wraps to 0. | ||
|
||
Discontinuities in the counter are generally triggered only when | ||
the counter is reset to zero."; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@robshakir, putting aside the somewhat orthogonal discussion about whether OC interface counters should be cleared when a CLI "clear interface counters" action is taken, I wanted to better understand the intention of your change.
My reading of the existing text is that it is saying two things:
(1) just because a counter has wrapped around to 0, you wouldn't expect an associated discontinuity event to be triggered (e.g., for interface counters (which I know describe their own specific behaviour anyway) I would not expect the last-clear timestamp to change just because a counters wrapped around to 0).
(2) You would not expect to see breaks/jumps in the counters unless they were reset to 0.
Are you saying that there are some counters where one or both of these conditions don't hold?
If you do remove this text, perhaps there should instead be a comment suggesting that for any leaves using this type, the description should indicate if and when discontinuities may occur?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, that's a good addition. will make such a change.
the primary reason for this change is that not all counters that use this type are the same, and the fact that the type defines the behaviour not the leaves is fundamentally incorrect.
future changes may wish to clarify the behaviour on a per-leaf basis, or define a new type, but this text is currently erroneous.
Just going back over this one I think that there are two changes that should probably be made before the final call merge: (1) As per Ebben's comment, aligning the language between the 32 and 64 bit versions of the counter makes sense (i.e., loop vs reset). |
I'm not sure this should be in the
If that's the purpose of the change, I wouldn't support this due to reasons I mentioned in the comment #1154 (comment) Additionally, Rob wanted to make a change based on the comment from Robert #1154 (comment) (with a different interpretation of the proposal), which didn't happen yet. |
agreed, removed the last-call label, it was an oversight |
Apologies, will get to this within the next week or so. |
YANG change only -- for counters in the interfaces model, behaviours are defined in the leaf defintions. Other counters do not have this behaviour.