Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarify counter64 reset zero value. #1154

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

robshakir
Copy link
Contributor

 * (M) types/openconfig-yang-types.go
   - `counter64` defined a reset behaviour that is not common to
      all usages of the type. Remove the definition so that each
      leaf describes its behaviour for resets if required.

YANG change only -- for counters in the interfaces model, behaviours are defined in the leaf defintions. Other counters do not have this behaviour.

 * (M) types/openconfig-yang-types.go
   - `counter64` defined a reset behaviour that is not common to
      all usages of the type. Remove the definition so that each
      leaf describes its behaviour for resets if required.
@robshakir robshakir requested a review from a team as a code owner July 24, 2024 03:03
@OpenConfigBot
Copy link

No major YANG version changes in commit 8422184

@dplore
Copy link
Member

dplore commented Jul 24, 2024

Context: Some users were asking if OC counters are expected to be reset to zero when, for example, a CLI command is issued to reset the counter. This is to clarify that there are no expectations in OpenConfig that a counter value will be reset like this. The only expectation is that the counter will "wrap" back to zero when exceeding the max value.

We welcome any comments on this.

@dplore dplore self-assigned this Jul 24, 2024
@dplore
Copy link
Member

dplore commented Jul 24, 2024

I set last call to 4 weeks from now to give a bit of time for feedback from the community.

Discontinuities in a counter are generally triggered only when
the counter is reset to zero, through operator or system
intervention.";
reaches its maximum value, 2^64-1, it loops to zero.";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious why we don't just use consistent wording in both counter variants?

When the counter reaches its maximum value (2^32-1), it wraps to zero
When the counter reaches its maximum value (2^64-1), it wraps to zero

The IETF type variants are:

The counter32 type represents a non-negative integer
that monotonically increases until it reaches a
maximum value of 2^32-1 (4294967295 decimal), when it
wraps around and starts increasing again from zero.
The counter64 type represents a non-negative integer
that monotonically increases until it reaches a
maximum value of 2^64-1 (18446744073709551615 decimal),
when it wraps around and starts increasing again from zero.

@earies
Copy link
Contributor

earies commented Jul 25, 2024

Context: Some users were asking if OC counters are expected to be reset to zero when, for example, a CLI command is issued to reset the counter. This is to clarify that there are no expectations in OpenConfig that a counter value will be reset like this. The only expectation is that the counter will "wrap" back to zero when exceeding the max value.

We welcome any comments on this.

LGTM - the typedef should not carry any behavior description of counter resets - the intended behavior is best served in whatever is triggering the reset (e.g. RPC)

@earies
Copy link
Contributor

earies commented Jul 25, 2024

Probably want to reset the commit message on merge as well

 * (M) types/openconfig-yang-types.go

s/go/yang/


Discontinuities in the counter are generally triggered only when
the counter is reset to zero.";
reaches its maximum value, in this case 2^32-1, it wraps to 0.";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the context from @dplore, is it to say that if a command on the device such as "clear ip intf statistics counters" is issued it will clear only the counters used in native schema but not in OC?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hellt I'll give my viewpoint here since I know where you are going with this considering the wording if OC counters are expected to be reset ...

Maybe this should be conveyed somewhere more formally to set expectations correctly but I have yet to see this across any industry effort to date (I may just not have come across as well)

In most cases we should clarify that there is a single state (I'll concentrate only on r/o for now) data-source for a given unique object in a given implementation and thus no per-schema or per-API datasets of sorts (I hope we all agree on this otherwise this warrants an entirely separate discussion). This means if an implementation supports native, OpenConfig and say IETF modeling and there is a node that is meant to access the same underlying data, that this data is always represented consistently across all views (assuming type/representation matches and transformations aside)

Now, your example above is a bit unusual in intent so this opens up another case of concern in general.

First, there is no "OC" method (gNOI) to execute such an action - CLI is not a defined NBI for this world so this falls back to ymmv across implementation behaviors.

Second, the intent of this type of command across implementations is generally tied to a specific NBI which is the human interactive CLI (or even user session possibly which is what it really should be tied to at most imo) - a clear in this case does not mean to flush the counters on all NBI as you are not resetting IF-MIB OIDs, YANG state nodes, etc... but rather its use has been for real-time human debugging on the element directly that doesn't necessarily correspond to the paradigm of OC (at least not as it stands today) - a clear across all NBI as you can imagine could be more catastrophic affecting downstream systems consuming such data.

I feel like this topic probably does need to be recorded otherwise it will pop up from time to time on expected behaviors but as I see it any "clear counters" is local to the native implementation over CLI only

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@earies maybe an example would do good

if we have a leaf ro some-counter that in whatever schema contains a value 100, then if we have a CLI show command show some-counter executed, it would return value 100.

Now if we execute clear some-counter in CLI, then the next show some-counter would return 0, but the underlying value for the leaf would still be 100. Thus, clear in the CLI affects the representation of a value in the CLI, but does not nullify the underlying data.

Is that example matching what you described above, @earies?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to: A "clear counters" CLI command should affect only the CLI view of the counters. The operational use case for this CLI is debugging. It's common to utilize the underlying hardware counters for systems (such as billing) to read the system counters and be negatively impacted by unexpected, repeated rollovers on counters.

Copy link

@LimeHat LimeHat Jul 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to: A "clear counters" CLI command should affect only the CLI view of the counters. The operational use case for this CLI is debugging. It's common to utilize the underlying hardware counters for systems (such as billing) to read the system counters and be negatively impacted by unexpected, repeated rollovers on counters.

This is uncharted waters, and, practically speaking, will be unimplementable in many cases. The system will have to maintain N copies of every counter. But there are other situations where counter64 leafs are reset to zero. For example, a system reboot, usually, clears all counters to the default value. A component reboot (e.g. a linecard reboot, or a software component restart) may do the same, depending on the implementation. There could be other cases.

Also, w.r.t interface counters specifically, there's already some text about it in openconfig-interfaces, which allows clearing the counters.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is uncharted waters

Agreed - we are taking an existing feature that contradicts the management and debuggability goals under this umbrella imo

practically speaking, will be unimplementable in many cases

Well, various implementations have supported what I describe for years already. A clear counters is only segmented to the CLI NBI - Arista EOS and JUNOS/EVO work this way today (I cannot recall other implementations off the top of my head)

eos1#show interfaces Ethernet 1 | include clearing|packets.*put
  Last clearing of "show interface" counters never
     544 packets input, 58805 bytes
     465 packets output, 59009 bytes

interfaces/interface[name=Ethernet1]/state/counters/in-octets: 67993

IF-MIB::ifInOctets.1 = Counter32: 67659

eos1#clear counters ethernet 1
eos1#show interfaces Ethernet 1 | include clearing|packets.*put
  Last clearing of "show interface" counters 0:00:23 ago
     10 packets input, 814 bytes
     13 packets output, 1641 bytes

interfaces/interface[name=Ethernet1]/state/counters/in-octets: 68400

IF-MIB::ifInOctets.1 = Counter32: 69068

there's already some text about it in openconfig-interfaces, which allows clearing the counters.

For context, here is the text for in-octets today

description
  "The total number of octets received on the interface,
  including framing characters.

  Discontinuities in the value of this counter can occur
  at re-initialization of the management system, and at
  other times as indicated by the value of
  'last-clear'.";
reference

I think this text is incorrect and implies that the leafs values are cleared in correspondence to the last-clear leaf which is not true in various implementations. This leaf in those implementations is not cleared by any forceful "clear" command but would be if there is a process/hw reinit

@LimeHat @hellt - in SRLinux/SROS does this correspond to the behavior of your native "clear counters"?

if we have a leaf ro some-counter that in whatever schema contains a value 100, then if we have a CLI show command show some-counter executed, it would return value 100.

Now if we execute clear some-counter in CLI, then the next show some-counter would return 0, but the underlying value for the leaf would still be 100. Thus, clear in the CLI affects the representation of a value in the CLI, but does not nullify the underlying data.

Is that example matching what you described above, @earies?

Yep you got it... The leaf last-clear imo is only of value to the CLI user to understand why show commands are displaying different than the actual underlying data-source (my observation across a few implementations is this is across the entirety of the CLI NBI and not per session)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in SRLinux/SROS does this correspond to the behavior of your native "clear counters"?

We follow a single source of truth approach and the cli view will match gnmi/netconf/any other interface's view.

Copy link
Contributor

@earies earies Jul 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in SRLinux/SROS does this correspond to the behavior of your native "clear counters"?

We follow a single source of truth approach and the cli view will match gnmi/netconf/any other interface's view.

With this logic, this could really be applied generically to "reset" any state counter in the system at the source-of-truth.... nothing specific to interface counters then for what was a historical human-debugging use-case.

In either case a consumer needs to handle rollover and this approach just means a client triggered action can force.

Another implementation observation (IOS-XR): A clear will reset the state leaf for YANG based access (gNMI/NETCONF) + CLI but not for SNMP IF-MIB.

I know this conversation is going down a bit of a divergent path of pointing out some behavioral differences but it seems that the usefulness of a last-clear and user-defined triggers may very well warrant common behaviors which I tend to agree about maintaining and operating on a single copy/source-of-truth.

This could warrant a last-clear node at every ./counters container + generic methods to clear any counter within a system. A generic gNOI RPC could also cover this by way of targeting any suffixed path of ../../counters.

This would introduce backwards incompatible (or knob enabled change) yet consistent behaviors across various implementations

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a couple of thoughts on this thread:

Firstly, I would argue that specifying the behaviour of native CLI commands is somewhat out of scope for OpenConfig specifications, even though I see benefits for operators if systems behave similarly.

But generally, we think that it makes sense for the CLI to clear the counters everywhere, otherwise, this means that the device needs to burn extra memory keeping a separate copy of the counters for CLI vs OC vs IETF vs MIBs, etc.

I can see the argument about wanting to debug network issues via the CLI, but I think that could be better achieved by a separate transient command that takes a snapshot of the counters that could then be compared with the values at a subsequent point entirely independently from the main stats collection. Then, when the snapshot is no longer required, the associated resources can be freed.

@LimeHat
Copy link

LimeHat commented Jul 26, 2024

There are a few scenarios where the counter64 can be reset to zero:

  1. counter wrap.
  2. direct user action (e.g. clear cli cmd).
  3. indirect user action (e.g. reboot or restart of a hw/sw component).
  4. an event that leads to a reboot or restart (hw failures, power failures, sw crash/restart, etc) without user involvement.

In my view, the spirit of the definition is that the values are not expected to be decremented/reset in normal conditions and without a user action to do so.

And one can argue that the current text places points 2-4 into the letter of the definition as well. The text allows the existence of methods to implement point#2, but it does not require and does not mandate it.
I don't have a strong opinion on whether this belongs to the typedef definition, but even if it's not mentioned, points 3 and 4 will be applicable.

@LimeHat
Copy link

LimeHat commented Jul 26, 2024

Also, to respond to the Darren's comment:

Context: Some users were asking if OC counters are expected to be reset to zero when, for example, a CLI command is issued to reset the counter. This is to clarify that there are no expectations in OpenConfig that a counter value will be reset like this. The only expectation is that the counter will "wrap" back to zero when exceeding the max value.

I don't think this should be covered by OpenConfig models in general. If an operator uses OC models and OC-defined protocols (e.g., gNMI/gNOI/...), then there is no such cli for them.
And if they're using a vendor-proprietary cmd to do something, then it is up to the user and the vendor to define the behavior.

But as I already indicated in the comment above, my interpretation is that the current text allows this behavior, but does not require it.

@dplore
Copy link
Member

dplore commented Jul 27, 2024

Also, to respond to the Darren's comment:

Context: Some users were asking if OC counters are expected to be reset to zero when, for example, a CLI command is issued to reset the counter. This is to clarify that there are no expectations in OpenConfig that a counter value will be reset like this. The only expectation is that the counter will "wrap" back to zero when exceeding the max value.

I don't think this should be covered by OpenConfig models in general. If an operator uses OC models and OC-defined protocols (e.g., gNMI/gNOI/...), then there is no such cli for them. And if they're using a vendor-proprietary cmd to do something, then it is up to the user and the vendor to define the behavior.

But as I already indicated in the comment above, my interpretation is that the current text allows this behavior, but does not require it.

The new, proposed text is ambiguous if there are any other reasons the counter might reset. It only specifies the rollover condition. My interpretation is that if the description doesn't permit a behavior, then it is not allowed.

However, leafs which use this type (as pointed out in other comments in this PR) may impose additional conditions such as

leaf in-octets {
type oc-yang:counter64;
description
"The total number of octets received on the interface,
including framing characters.
Discontinuities in the value of this counter can occur
at re-initialization of the management system, and at
other times as indicated by the value of
'last-clear'.";
reference
"RFC 2863: The Interfaces Group MIB - ifHCInOctets.
RFC 4293: Management Information Base for the
Internet Protocol (IP).";
}
which explicitly says the counter may be reset for 'various' reasons and has an accommodation for last-clear as Ebben pointed out. (I am dating myself by saying this, but in SNMP the equivalent is ifCounterDiscontinuityTime).

reaches its maximum value, in this case 2^32-1, it wraps to 0.

Discontinuities in the counter are generally triggered only when
the counter is reset to zero.";
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@robshakir, putting aside the somewhat orthogonal discussion about whether OC interface counters should be cleared when a CLI "clear interface counters" action is taken, I wanted to better understand the intention of your change.

My reading of the existing text is that it is saying two things:

(1) just because a counter has wrapped around to 0, you wouldn't expect an associated discontinuity event to be triggered (e.g., for interface counters (which I know describe their own specific behaviour anyway) I would not expect the last-clear timestamp to change just because a counters wrapped around to 0).

(2) You would not expect to see breaks/jumps in the counters unless they were reset to 0.

Are you saying that there are some counters where one or both of these conditions don't hold?

If you do remove this text, perhaps there should instead be a comment suggesting that for any leaves using this type, the description should indicate if and when discontinuities may occur?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, that's a good addition. will make such a change.

the primary reason for this change is that not all counters that use this type are the same, and the fact that the type defines the behaviour not the leaves is fundamentally incorrect.

future changes may wish to clarify the behaviour on a per-leaf basis, or define a new type, but this text is currently erroneous.

@dplore dplore added the last-call PR that is in final review before merging. label Sep 13, 2024
@rgwilton
Copy link

@dplore @robshakir ,

Just going back over this one I think that there are two changes that should probably be made before the final call merge:

(1) As per Ebben's comment, aligning the language between the 32 and 64 bit versions of the counter makes sense (i.e., loop vs reset).
(2) As per my comment on Aug 7, add a suggestion to the description (or just as a comment in the YANG file) indicating that data nodes using these counter types should specify what discontinuities may occur, and how they are reported (if at all).

@LimeHat
Copy link

LimeHat commented Sep 20, 2024

I'm not sure this should be in the last-call, it appears there are a few different takes on the new text.
Darren states that

My interpretation is that if the description doesn't permit a behavior, then it is not allowed.

If that's the purpose of the change, I wouldn't support this due to reasons I mentioned in the comment #1154 (comment)

Additionally, Rob wanted to make a change based on the comment from Robert #1154 (comment) (with a different interpretation of the proposal), which didn't happen yet.

@dplore dplore removed the last-call PR that is in final review before merging. label Sep 20, 2024
@dplore
Copy link
Member

dplore commented Sep 20, 2024

agreed, removed the last-call label, it was an oversight

@robshakir
Copy link
Contributor Author

Apologies, will get to this within the next week or so.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Waiting for author
Development

Successfully merging this pull request may close these issues.

7 participants