- Motivation
- Assumptions
- Overall Vision
- Definitions
- Overall design vision
- Overall design constraints
- Syntax
- Semantic
- Examples
CRS is described in the ModSecurity language.
- This makes it harder for other (non-ModSecurity) WAFs to adapt CRS.
- It makes it hard to maintain these rules in good quality.
Having a more abstracted version which get's compiled down should improve readability.
- A CRS rule set should be as declarative as possible. Imperative programming style is harder to read, understand and test.
- The CRS ruleset should be platform independent and should have tools for compilation from the platform independent format to platform specific formats (for example ModSec)
- There is a difference between a WAF config and the CRS. Today, this is mixed, as CRS is bound to a single WAF. For example, IP reputation is part of WAF but should not be a part of CRS.
- A good WAF should allow 2 kinds of configuration. There should be a declarative part which will be sufficient for 95% of all users. And a scripting language part which should be good enough to solve all the special needs of the remaining users. I see CRS clearly in the declarative config part. A "typical user" should configure their WAF and not program it.
- The CRS rule description should allow positive and negative security problem
- Having a simple to parse, simple to implement declarative language which is just powerful enough to express the CRS.
- Having a compiler from this language into ModSec language. The final ModSec representation of the CRS should match the current CRS.
- Having a simple execution model of a WAF in Python which works together with https://github.com/fastly/ftw to check the ruleset.
- Having compilers from this new language to other WAFs and/or webstacks. For example, having an embedded WAF/CRS interpreter for Java stacks or Django middleware would improve the WAF space and makes CRS a more important player.
- This language should support an positive security model.
- Integrate with vulnerability tools like Threadfix which can generate WAF rules from vulnerability findings
The classical model where the rule tries to detect known attack patterns. If a known attack is detected, the request is flagged as bad.
A model where the rule describes how a variable (a specific part of the request) should look like.
This comes in 2 flavours. A rule can be "required" and/or "sufficient".
- "required" means, when the input does not conform to the rule, the request will be flagged as bad.
- "sufficient" means, when the input does conform to the rule, that then no other rule (from the positive or the negative security model) will be checked on this part of the request (for example this argument).
Exclusions (or Exceptions) are used to adapt a generic rule to a local installation. It declares, the this (generic) rule should not be applied to this specific variable.
- fully declarative (single assignment)
- Vendor independent - should be a community project and part of CRS.
We still need rule-id's for manageability of the CRS.
We need to be able to translate all the rules to ModSecurity and other WAF's rule language. This means that some features can not be expressed. An example for a feature which can not be expressed is a proper check of the Request-Range header which CRS rule 920190 is trying to achieve.
For features like the positive security model, we need a way to mark the value of a variable as not to be checked by upcoming rules. ModSecurity does have the "removeTargetById" feature which does basically this. But doing it right in ModSecurity is hard, because there is no clear distinction between variables which are used for control flow and variables which are used to check for attacks. A good example here is the REQUEST_HEADERS:Content-Type. It is used in control flow (for deciding which body processor to use) and can also be the place to detect attacks. A rule language should distinguish these 2 kinds of usage.
Different people have different opinions about good syntax. But the more important point is the semantic, so we will drop the discussion of the syntax for now.
As a "lingua franca" the language should use a universal data exchange format for it's syntax.
I'm using YAML as syntax, JSON and XML would be fine too, but I think YAML is easier to read (while harder to write than JSON) for a human.
We describe the language here in it's full canonical form. On some places, things can be omitted and abbreviated to make it more readable.
The following scalar data types should be supported
- string
- int
- regex
- bool
Compound types:
- list of (strings, int, regex)
- collection string -> scalar (multi value like ModSec). I'm not sure if we need this now, but wee keep
Operations on these types
-
convert:
- int("42") -> 42
-
length(string) -> int # see transformations below
-
length(list) -> int
-
names(collection) -> [string]
-
group(string list) -> collection of element from the list and the number of their occurrence
-
transformation -> every useful transformation from string -> string or int which is used in CRS
Note that we have a separate type for regex and we also allow a list of regex here. Both can be used as a parameter of the @rx
or similar operators. A list of regex is here equivalent to an |
concatenation of all the regexes in the list. This would allow to write these large and ugly regexes actually in a more readable form and let the compiler do the optimisation if necessary.
All variables which exist in ModSecurity which describe a part of the request do exist with the same or similar name here for pragmatic reasons. We may rename them later.
What is open here, is a clear definition what the variable means, e.g. it is already urldecoded or not.
Define a constant "max_body_size" as an integer with the value of 32k (32768)
- define:
name: max_body_size
type: int
value: 32k
Define "restricted_extensions" as a list of strings. The final transformation is optional and I'm not sure if it is needed. But it can be used to map
the list.
- define:
name: restricted_extensions
type: [string]
value:
- "asa"
- "asax"
- ...
- "xsd"
- "xsx"
transformations:
- ".%{$0}"
Define unix_shell_data
as a list of strings, and load the actual value from an external resource.
- define:
- name: unix_shell_data
- type: [string]
- load: "unix-shell.data"
Note that the types are redundant here, because they can be derived from the syntax. But I think it is good to require explicit typing.
If you do not add a value here, this is only a declaration. This introduced the variable but set the value to unset
with is represented by null
. This value is special and we will explain later how variables with the value are handled in conditions in rules and control flow.
We need a way to extract data from the request if the underlying WAF does not already have this variable. Here we are using the define
together with an extract
statement
- define:
comment: extract the request extension, first chain from 912150
name: request_basename_extension
type: string
extract:
variable: REQUEST_BASENAME
pattern: /(\.[a-z0-9]{1,10})?$/
value: $1
If the variable is not defined, the new variable is not defined. A
rule will not execute on this variables (same as for pre-defined
variables which does not exist, for example REQUEST_HEADER:foo
).
Not sure if we should allow operations on lists or collections here or if variable should always be a scalar.
In an ideal world, we should never modify a variable. So we should treat them as constants.
But for some special cases in the application specific exclusion handling, we are adding 2 operators which are working on lists: add-to-list
and remove-from-list
.
To keep the declarative behaviour, it is not allowed to have the same string in an add-to-list
and remove-from-list
for the same list. Which means that the order of the add/remove ops are not relevant and it is still declarative in some sense.
FIXME: I still looking for a better name for these 2 modifies which make the declarative behaviour more clear. Something along the lines of "ensure-in-list" and "ensure-not-in-list"
- add-to-list:
variable: allowed_request_content_type
elements:
- "application/special-content-type"
- remove-from-list:
variable: allowed_request_content_type
elements:
- "text/xml"
- "application/xml"
- "application/soap+xml"
There are 3 different kind of conditions in this language. They all describe the same concept (a boolean expression) but these are used in 3 different contexts.
The first context is control flow. The result of the evaluation of the conditions decides, if a block of code is active for this request. This condition is part of an if/then/else control flow.
The second context is similar to the first, this are preconditions for rules. These conditions decide, if a rule should be executed. This condition is part of the if block inside a rule.
The third is fundamental different. This is the conditions where the rule is checking for an attack in some variable. Like checking if the string </
is part of the user input. This is part of the "detect" part of the rule.
While these 3 conditions are formally interchangeable (you can always move the "detect" part to the precondition of the rule and use an "always match" operator for the "detect" part), we are making them different here to explicitly distinguish between control flow an attack detection. This is important when we using "remove-target-from-rule" or positive security model rule interact in a sane way with rules.
Also, we restrict the "detect" context to having only one condition (or multiple conditions on the same variable(s))
All 3 conditions are more or less the same as ModSecurity variables + operators
- detect:
comment: check if the extension of the request is in the list of restricted extensions
variables:
- request_basename_extension
transformations:
- lowercase
operator: in
parameter: $(restricted_extensions)
- detect:
variables:
- ARGS
- REQUEST_HEADERS
exclude:
- ARGS:editor_input_field
operator: rx
parameter: /script>/
All predefined variables (like ARGS
and REQUEST_HEADER:Content-Length
) and all user defined variables
can be used here. It is an error when the compiler can not determine
that a variable is declared here (e.g. if you declare a variable
in an if block but not in an else block.
FIXME: there may be a problem here with bool and set-one semantic. Think about it later.
There are additional operators for new data types:
- var in list
- var == element
- var ~ regex
- var exists - is defined and not nil
To allow optional rules (think of skip rules in ModSecurity or flags which are checked in every chain rule)
In the then
and else
block can contain anything which is allowed on the toplevel, e.g. "define", "rule", "if" and "include"
The else
part is optional
- if:
conditions:
- condition 1
- condition 2
then:
- define
- rule
- rule
else:
- define
To allow modularisation, we should allow an include directive
- include: name-of-file
- include:
- file-name-1
- file-name-2
The following list of actions can be executed, either in global
, in an if
block or in the action part of of a rule.
- default
- deny
- disable-rule (by id, by id-range, by tag)
- remove-variable-from-rule
- allow
- actions:
- disable-rule: 12345
- remove-variable-from-rule:
variable: ARGS:password
rules: 1-9999999
- actions:
- block
- actions:
- block:
comment: do we really need to be this specific here?
reason: Content-Length header is required.
code: 411
Note that there is no setvar here, because I think it is not needed. All the anomaly scoring stuff can be done by the compiler, names and valued can be derived from the severity, phase and paranoia level.
A rule contains of meta data, optional preconditions and detect rule. I removed the actions here, because they are probably not needed and is always block - depended on the severity is may also only be log - but this is not part of the rule knowledge but part of the environment and the compiler
- rule:
id: 999999
meta:
phase: request # not sure if we need this
message: "Possible Foo attacks"
paranoia-level: 1
severity: CRITICAL # also used to determine anomaly value
version: 1
# ...
tags:
- "application-multi"
preconditions:
- variable: REQUEST_METHOD
operator: streq
parameter: "POST"
- variable: basename_extension
operator: streq
parameter: "foo"
detect:
variables:
- ARGS
transformations:
- removeSpaces
checks:
- operator: rx
parameter: /some crazy regex/
Note that "checks" is a list of multiple conditions on these variables. The conditions must all be true to match a variable. In most rules, we will only have one condition. In this case, as a shortcut, "checks" can be omitted and operator and parameter can be moved up one level in the structure.
The rules are looking the same as above. Instead of "detect" we are using "ensure" to define how a variable should look like. As mentioned in the beginning, positive security rules can be "required", "sufficient" or both. If they are "required" and a variable does not match, this counts as "detecting an attack". If the rule is sufficient and a variable match, this means that this variable will not be checked by following rule, especially by the classical negative security model rules. To make the rule order still declarative, the compiler will warn you if the result is oder depended. From a performance point of view, we prefer to execute all positive security model rules before the negative security model rules and execute the sufficient rules before the required rules.
- rule:
id: 42
meta:
comment:
- We do not care about __utm request cookie in the following rules, as long
- as it is not bigger than 4k. We will reject a request with an __utm cookie
- larger than 4k
ensure:
variables:
- REQUEST_COOKIES:__utm
transformations:
- length
checks:
- operator: less
parameter: 4k
mode:
- sufficient
- required
- rule:
id: 23
meta:
comment: a product id is used a direct database reference. It must be a number.
ensure:
variables:
- ARGS:product_id
checks:
- operator: rx
parameter: /^\d{1,20}$/
mode:
- required
- sufficient
- rule:
id: 34
meta:
comment: input field foo is base64 encoded stuff
ensure:
variables:
- ARGS:foo
checks:
- operator: rx
parameter: /^[0-9a-zA-Z+/]+=?=?$/
mode:
- requried
This is probably not needed, just an idea. I'm not sure if this will simplify rule writing and understanding.
But there may be a use case for a simple form of single inheritance for rules, to avoid repetitive typing:
Defining a template for a rule:
- template:
name: name-of-template
rule:
Using a template for a rule:
- rule:
id: 123456
template: name-of-template
detect:
...
In this case, rule is created by cloning the rule from the template and updating all fields which are set in the rule itself. lists are overwritten, objects will be updated.
A rule in a template can inherit from another template.
This sections contain examples on how to express some more complicated ModSecurity rules in the new language:
Interesting rule from owasp-modsecurity/ModSecurity#1632
#example from https://github.com/SpiderLabs/ModSecurity/wiki/Reference-Manual#base64decode
SecRule REQUEST_HEADERS:Authorization "^Basic ([a-zA-Z0-9]+=*)$" "phase:1,id:93,capture,chain,logdata:%{TX.1}"
SecRule TX:1 ^(\w+): t:base64Decode,capture,chain
SecRule TX:1 ^(admin|root|backup)$
- define:
name: basic_auth_header
type: string
extract:
variable: REQUEST_HEADERS:Authorization
pattern: /^Basic\s+([a-zA-Z0-9]+=*)$/
value: $1
- define:
name: basic_auth_header_username
type: string
extract:
variable: basic_auth_header
transformations: base64decode
pattern: /^([^:]+):/
value: $1
- define:
name: invalid_basic_auth_usernames
type: [string]
value:
- admin
- root
- backup
- rule:
...
condition:
variable: basic_auth_header_username
operator: in
value: $invalid_basic_auth_usernames
See rule 942150
SecRule REQUEST_COOKIES|!REQUEST_COOKIES:/__utm/|REQUEST_COOKIES_NAMES|ARGS_NAMES|ARGS|XML:/* "@pmf sql-function-names.data" \
"id:942150,\
phase:2,\
block,\
capture,\
t:none,t:urlDecodeUni,t:lowercase,\
msg:'SQL Injection Attack',\
logdata:'Matched Data: %{TX.0} found within %{MATCHED_VAR_NAME}: %{MATCHED_VAR}',\
tag:'application-multi',\
tag:'language-multi',\
tag:'platform-multi',\
tag:'attack-sqli',\
tag:'OWASP_CRS/WEB_ATTACK/SQL_INJECTION',\
tag:'WASCTC/WASC-19',\
tag:'OWASP_TOP_10/A1',\
tag:'OWASP_AppSensor/CIE1',\
tag:'PCI/6.5.2',\
tag:'paranoia-level/2',\
ctl:auditLogParts=+E,\
rev:2,\
ver:'OWASP_CRS/3.0.0',\
severity:'CRITICAL',\
chain"
SecRule MATCHED_VARS "@rx (?i)\b(?:c(?:o(?:n(?:v(?:ert(?:_tz)?)?|c....
"setvar:'tx.msg=%{rule.msg}',\
setvar:'tx.sql_injection_score=+%{tx.critical_anomaly_score}',\
setvar:'tx.anomaly_score=+%{tx.critical_anomaly_score}',\
setvar:'tx.%{rule.id}-OWASP_CRS/WEB_ATTACK/SQL_INJECTION-%{matched_var_name}=%{tx.0}'"
Can be translated to
- define:
name: sql_function_names
type: [string]
load: "sql-function-name.data"
- define:
name: sql_function_names_regex
type: regex
value: /(?i)\b(?:c(?:o(?:n(?:v(?:ert(?:_tz)?)?|c.......
- rule:
id: 942150
meta:
....
detect:
variables:
- REQUEST_COOKIES
- REQUEST_COOKIES_NAMES
- ARGS_NAMES
- ARGS
- XML:/*
exclude:
- REQUEST_COOKIES:/__utm/
checks:
- operator: pm
parameter: ${sql_function_name}
- operator: rx
parameter: ${sql_function_names_regex}
- rule:
id: 920100
comment: Check HTTP/1.1 request line for correctness
meta:
name: HTTP/1.1 Request line
message: "Invalid HTTP Request Line"
strategy: whitelist
paranoia-level: 1
version: OWASP_CRS/3.1.0
tags:
- CAPEC:
- 272
ensure:
variables:
- REQUEST_LINE
transformations:
- lowercase
checks:
- operator: rx
parameter:
- /^(?i:(?:[a-z]{3,10}\s+(?:\w{3,7}?:\/\/[\w\-\.\/]*(?::\d+)?)?\/[^?#]*(?:\?[^#\s]*)?(?:#[\S]*)?)?)$/
- /^(?i:(?:connect (?:\d{1,3}\.){3}\d{1,3}\.?(?::\d+)?)?)$/
- /^(?i:get \/[^?#]*(?:\?[^#\s]*)?(?:#[\S]*)?)$/
mode:
- required
- rule:
id: 920120
comment: Block Filenames with out of place metachars
meta:
name: HTTP/1.1 Request line
strategy: blacklist
message: "Attempted multipart/form-data bypass"
paranoia-level: 1
version: OWASP_CRS/3.1.0
tags:
- CAPEC:
- 272
detect:
variables:
- REQUEST_LINE
transformations:
- htmlEntityDecode
- lowercase
checks:
- operator: rx
parameter:
- /;/
- /['"=]/
- rule:
id: 920160
comment: Check to ensure content-length header is numeric
meta:
name: Content-length numeric header
strategy: whitelist
message: "Content-Length HTTP header is not numeric"
paranoia-level: 1
version: OWASP_CRS/3.1.0
tags:
- CAPEC:
- 272
ensure:
variables:
- REQUEST_HEADERS:Content-Length
checks:
- operator: rx
parameter:
- /^\d+$/
mode:
- required