This repository has been archived by the owner on Jun 17, 2023. It is now read-only.

Where do I start Confidence

Wes edited this page Mar 8, 2017 · 2 revisions

Introduction

Confidence details the degree of certainty of a given observation. For instance:

  • I am 80% confident that on 2015-03-20T00:00:01Z example.com is dropping malware
  • I am 90% confident in partner-1's observation that http://example.com/1.html was being used as a phishing URL on 2015-03-20T00:01:01Z
  • I am 100% confident that tinyurl.com was observed in a piece of unsolicited commercial email (eg: spam)

One of the primary use cases for confidence is the generation of threat intelligence feeds. For example, you may want to generate a de-duplicated feed of indicators seen within the last seven days with a confidence of 8.5 or higher, to be used in a network sensor. While judging confidence may be subjective, there's one simple pattern that can narrow down the answer rather quickly:

  1. Would you trust the data author with root access on your firewall to block something? If no, it's not an 8 or higher.
  2. Is there a better than 50/50 chance (a coin flip) that there's something suspect about the data? If yes, it's a 6 or higher. If no, it's less than a 5 and almost does not matter.
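
The two questions above can be read as bounds on the score. The following is a minimal sketch of that reading, not part of any tool. The function name `confidence_range` and its parameters are hypothetical, and treating the result as a (minimum, exclusive upper bound) pair is an assumption made for illustration.

```python
from typing import Optional, Tuple


def confidence_range(trusts_author: bool, likely_suspect: bool) -> Tuple[float, Optional[float]]:
    """Return (minimum, exclusive upper bound) for a 0-10 confidence score.

    An upper bound of None means the two questions impose no cap.
    Hypothetical helper: only encodes the two questions, nothing more.
    """
    minimum = 0.0
    below = None

    # Question 1: no firewall-level trust in the author means "not an 8 or higher".
    if not trusts_author:
        below = 8.0

    # Question 2: better than a coin flip means "6 or higher",
    # otherwise the score is "less than a 5".
    if likely_suspect:
        minimum = 6.0
    else:
        below = 5.0

    return minimum, below


print(confidence_range(trusts_author=False, likely_suspect=True))
```

Under these assumptions, data from an author you would not trust on your firewall, but which is more likely suspect than not, lands somewhere from 6 up to (but not including) 8.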

From there, you can very easily get to a 6, 7 or 8 depending on your risk tolerance. With the WDIS Feeds concept, whitelists are used to help further reduce the risk of blocking something like google.com. With that, generally a 7 or 8 is OK as long as the feeds are extremely specific about the risk (eg: ipv4|ipv6 addresses have a portlist, protocol and timestamp associated with them).
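
As an illustration of the feed use case, here is a minimal sketch of building a de-duplicated, whitelist-filtered feed from a list of observations. It is not the implementation used by any particular tool: the `build_feed` function, the dictionary keys (`indicator`, `confidence`, `seen_at`) and the exact-match whitelist are all assumptions chosen to keep the example small.

```python
from datetime import datetime, timedelta, timezone


def build_feed(observations, now, min_confidence=8.5,
               max_age=timedelta(days=7), whitelist=frozenset()):
    """Keep the newest observation per indicator that passes every filter.

    Hypothetical record layout: each observation is a dict with
    'indicator' (str), 'confidence' (0-10) and 'seen_at' (aware datetime).
    """
    cutoff = now - max_age
    newest = {}
    for obs in observations:
        if obs["indicator"] in whitelist:       # exact-match whitelist (assumption)
            continue
        if obs["confidence"] < min_confidence:  # below the feed's threshold
            continue
        if obs["seen_at"] < cutoff:             # older than the time window
            continue
        current = newest.get(obs["indicator"])
        if current is None or obs["seen_at"] > current["seen_at"]:
            newest[obs["indicator"]] = obs      # de-duplicate, keep most recent
    return sorted(newest.values(), key=lambda o: o["indicator"])


utc = timezone.utc
now = datetime(2015, 3, 25, tzinfo=utc)
observations = [
    {"indicator": "example.com", "confidence": 9.0, "seen_at": datetime(2015, 3, 20, tzinfo=utc)},
    {"indicator": "example.com", "confidence": 9.0, "seen_at": datetime(2015, 3, 22, tzinfo=utc)},
    {"indicator": "google.com", "confidence": 9.0, "seen_at": datetime(2015, 3, 24, tzinfo=utc)},
    {"indicator": "old.example.net", "confidence": 9.0, "seen_at": datetime(2015, 3, 1, tzinfo=utc)},
    {"indicator": "low.example.org", "confidence": 6.0, "seen_at": datetime(2015, 3, 24, tzinfo=utc)},
]
feed = build_feed(observations, now, whitelist=frozenset({"google.com"}))
print([o["indicator"] for o in feed])
```

Only `example.com` survives here: the duplicate collapses to the most recent sighting, `google.com` is whitelisted, one record is outside the seven-day window, and one is below the 8.5 threshold.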

General Scale and Other Rules of Thumb

(9-10) Certain

  • highly vetted data by known, trusted security professionals
  • vetting relationship has been consistent for more than 2 years
  • very specific data (eg: ip+port+protocol, or a specific url, or malware hash)
  • can typically be used via traffic mitigation processes (null-routing, firewall DROP, etc) with very little risk of collateral damage

(7-8) Very Confident

  • vetted data by known, trusted security professionals
  • data that has been vetted by a human or set of known and proven processes
  • vetting relationship has been consistent and in place for at least 1 year
  • data feed has been observed for at least a year
  • data should be highly specific (eg: port/protocols, prefixes should be as narrow as possible)
  • can typically be used via traffic mitigation processes (null-routing, firewall DROP, etc) with very little risk of collateral damage

(6-7) Somewhat Confident

  • semi-vetted data by a security professional or trusted analytics process
  • data that has undergone some machine or human vetting (eg: checked against a whitelist automatically)
  • could be leveraged in traffic mitigation processes (eg: dns sink-holing); carries a slight risk of collateral damage, but that risk is largely mitigated by the native whitelisting process

(5-6) Not Confident

  • machine generated data or enumerated data
  • some feeds might fall into this category if the author is lazy, or is trying to cram too much into the feed
  • examples might include a domains list where the author is simply taking a botnet urls list and posting just the domains as a feed (6)
  • carries risk when used in automatic mitigation processes

(5) "50/50 shot" (eg: same as a coin flip)

(0-4) Informational Data

  • machine generated / enumerated data
  • examples include:
      • auto-enumerated name-servers from domains
      • infrastructure resolved from domain data
  • carries significant risk when used in automatic mitigation processes
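
For tooling that needs to display the scale, a lookup along the following lines may help. This is only a sketch: the ranges above overlap at their edges (7 appears in two bands, 6 in two), so the boundary handling here is an assumption. Each shared boundary is assigned to the lower band, which matches the domains-list example scored (6) under Not Confident, and scores between 8 and 9 are treated as Very Confident. The function name `confidence_label` is hypothetical.

```python
def confidence_label(score: float) -> str:
    """Map a 0-10 confidence score to a label from the general scale.

    Boundary handling is an assumption: shared edges go to the lower band.
    """
    if not 0 <= score <= 10:
        raise ValueError("confidence must be between 0 and 10")
    if score >= 9:
        return "Certain"
    if score > 7:
        return "Very Confident"
    if score > 6:
        return "Somewhat Confident"
    if score > 5:
        return "Not Confident"
    if score == 5:
        return "50/50 shot"
    return "Informational Data"


for value in (9.5, 8.5, 7, 6, 5, 3):
    print(value, confidence_label(value))
```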