Welcome to the httpd_pyparser documentation.
The parser runs under Python 3.7+ on Linux, Windows and Mac.
httpd_pyparser is dual licensed under the following licenses. You can use the software according to the terms of your chosen license.
- GNU Affero General Public License (AGPL) v3 with additional terms
- Our Own Proprietary License - please contact us
This means we can apply pull requests from any contributor once they have agreed to our CLA. For more information, please check our contributing reference.
The parser relies on Ply as its underlying parsing library.
Therefore, to run it you will need:
- a Python 3 interpreter
- Ply - the Python Lex-Yacc library
- YAML and/or JSON if you want your output to be either of those
You can install these packages on Debian with this command:
sudo apt install python3-ply python3-yaml python3-simplejson
Try to keep the module updated, because it is under heavy development now.
httpd_pyparser contains two main submodules:
- apache
- nginx
Both main submodules have three classes:
- Lexer
- Parser
- Writer
Before you start to work with any class, please check the version to make sure you have the current one (0.3):
$ python3
...
>>> import httpd_pyparser
>>> import httpd_pyparser.apache
>>> import httpd_pyparser.nginx
>>> httpd_pyparser.__version__
'0.3'
>>> httpd_pyparser.apache.__version__
'0.3'
>>> httpd_pyparser.nginx.__version__
'0.3'
The Lexer classes are wrappers for Ply's lexer object. You can use them independently to check and see what tokens are in your Apache or Nginx configuration.
Here is a simple example:
$ python3
Python 3.9.2 (default, Feb 28 2021, 17:03:44)
[GCC 10.2.1 20210110] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import httpd_pyparser
>>> import httpd_pyparser.nginx
>>> import httpd_pyparser.apache
>>>
>>> config = """<VirtualHost *:80>
...     ServerName www.yourdomain.com
...     Redirect / https://www.yourdomain.com
... </VirtualHost>
... """
>>>
>>> mlexer = httpd_pyparser.apache.Lexer()
>>> mlexer.lexer.input(config)
>>> while True:
...     tok = mlexer.lexer.token()
...     if not tok:
...         break
...     print(tok)
...
LexToken(T_CONFIG_DIRECTIVE_TAG,'<VirtualHost *:80>',1,0)
LexToken(T_CONFIG_DIRECTIVE,'ServerName',2,23)
LexToken(T_CONFIG_DIRECTIVE_ARGUMENT,'www.yourdomain.com',2,34)
LexToken(T_CONFIG_DIRECTIVE,'Redirect',3,57)
LexToken(T_CONFIG_DIRECTIVE_ARGUMENT,'/',3,66)
LexToken(T_CONFIG_DIRECTIVE_ARGUMENT,'https://www.yourdomain.com',3,68)
LexToken(T_CONFIG_DIRECTIVE_TAG_CLOSE,'</VirtualHost>',4,95)
>>>
>>> config = """server {
...     listen 80;
...     server_name www.yourhost.com;
...
...     location / {
...         proxy_set_header X-Real-IP $remote_addr;
...         proxy_set_header X-Forwarded-For $remote_addr;
...         proxy_set_header Host $host;
...         proxy_pass http://vm-lxc1;
...     }
... }
... """
>>> mlexer = httpd_pyparser.nginx.Lexer()
>>> mlexer.lexer.input(config)
>>> while True:
...     tok = mlexer.lexer.token()
...     if not tok:
...         break
...     print(tok)
...
LexToken(T_CONFIG_DIRECTIVE,'server',1,0)
LexToken(T_BRACE_OPEN,'{',1,7)
LexToken(T_CONFIG_DIRECTIVE,'listen',2,13)
LexToken(T_CONFIG_DIRECTIVE_ARGUMENT,'80',2,20)
LexToken(T_SEMICOLON,';',2,22)
LexToken(T_CONFIG_DIRECTIVE,'server_name',3,28)
LexToken(T_CONFIG_DIRECTIVE_ARGUMENT,'www.yourhost.com',3,40)
LexToken(T_SEMICOLON,';',3,56)
LexToken(T_CONFIG_DIRECTIVE,'location',5,63)
LexToken(T_CONFIG_DIRECTIVE_ARGUMENT,'/',5,72)
LexToken(T_BRACE_OPEN,'{',5,74)
LexToken(T_CONFIG_DIRECTIVE,'proxy_set_header',6,84)
LexToken(T_CONFIG_DIRECTIVE_ARGUMENT,'X-Real-IP',6,101)
LexToken(T_CONFIG_DIRECTIVE_ARGUMENT,'$remote_addr',6,111)
LexToken(T_SEMICOLON,';',6,123)
LexToken(T_CONFIG_DIRECTIVE,'proxy_set_header',7,133)
LexToken(T_CONFIG_DIRECTIVE_ARGUMENT,'X-Forwarded-For',7,150)
LexToken(T_CONFIG_DIRECTIVE_ARGUMENT,'$remote_addr',7,166)
LexToken(T_SEMICOLON,';',7,178)
LexToken(T_CONFIG_DIRECTIVE,'proxy_set_header',8,188)
LexToken(T_CONFIG_DIRECTIVE_ARGUMENT,'Host',8,205)
LexToken(T_CONFIG_DIRECTIVE_ARGUMENT,'$host',8,210)
LexToken(T_SEMICOLON,';',8,215)
LexToken(T_CONFIG_DIRECTIVE,'proxy_pass',9,225)
LexToken(T_CONFIG_DIRECTIVE_ARGUMENT,'http://vm-lxc1',9,236)
LexToken(T_SEMICOLON,';',9,250)
LexToken(T_BRACE_CLOSE,'}',10,256)
LexToken(T_BRACE_CLOSE,'}',11,258)
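To summarize a token stream instead of printing it, the loop above can be wrapped in a small generator. This is only a sketch: `iter_tokens` and `FakeLexer` are hypothetical names and not part of httpd_pyparser. It assumes nothing beyond what the loop above already relies on, namely that `token()` returns a false value when the input is exhausted and that each token has a `type` attribute, as Ply's `LexToken` does.

```python
from collections import Counter
from types import SimpleNamespace


def iter_tokens(lexer):
    """Yield tokens from a Ply-style lexer until token() returns a false value."""
    while True:
        tok = lexer.token()
        if not tok:
            break
        yield tok


class FakeLexer:
    """Hypothetical stand-in so the sketch runs without httpd_pyparser installed."""

    def __init__(self, tokens):
        self._tokens = iter(tokens)

    def token(self):
        # Mimic Ply: return None once there are no more tokens.
        return next(self._tokens, None)


# Stand-in for the first three tokens of the Nginx example above.
demo = FakeLexer([
    SimpleNamespace(type="T_CONFIG_DIRECTIVE", value="server"),
    SimpleNamespace(type="T_BRACE_OPEN", value="{"),
    SimpleNamespace(type="T_CONFIG_DIRECTIVE", value="listen"),
])

# Count how often each token type occurs.
token_counts = Counter(tok.type for tok in iter_tokens(demo))
print(token_counts)
```

With the real library, passing `mlexer.lexer` to `iter_tokens` after calling `mlexer.lexer.input(config)` should behave the same way.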
The Parser classes are wrappers for Ply's parser object. The parser object needs a lexer, but both Parser classes instantiate the required Lexer and set it up themselves.
Here is a simple example:
>>> mparser = httpd_pyparser.nginx.Parser()
>>> mparser.parser.parse(config)
>>> print(mparser.configlines)
[{'type': 'directive', 'value': 'server', 'lineno': 1, 'arguments': [], 'blocks': [[{'type': 'directive', 'value': 'listen', 'lineno': 2, 'arguments': [{'value': '80', 'lineno': 2, 'quote_type': 'no_quote'}, {'value': None, 'quote_type': 'no_quote'}], 'blocks': []}, {'type': 'directive', 'value': 'server_name', 'lineno': 3, 'arguments': [{'value': 'www.yourhost.com', 'lineno': 3, 'quote_type': 'no_quote'}, {'value': None, 'quote_type': 'no_quote'}], 'blocks': []}, {'type': 'directive', 'value': 'location', 'lineno': 5, 'arguments': [{'value': '/', 'lineno': 5, 'quote_type': 'no_quote'}], 'blocks': [[{'type': 'directive', 'value': 'proxy_set_header', 'lineno': 6, 'arguments': [{'value': 'X-Real-IP', 'lineno': 6, 'quote_type': 'no_quote'}, {'value': '$remote_addr', 'lineno': 6, 'quote_type': 'no_quote'}, {'value': None, 'quote_type': 'no_quote'}], 'blocks': []}, {'type': 'directive', 'value': 'proxy_set_header', 'lineno': 7, 'arguments': [{'value': 'X-Forwarded-For', 'lineno': 7, 'quote_type': 'no_quote'}, {'value': '$remote_addr', 'lineno': 7, 'quote_type': 'no_quote'}, {'value': None, 'quote_type': 'no_quote'}], 'blocks': []}, {'type': 'directive', 'value': 'proxy_set_header', 'lineno': 8, 'arguments': [{'value': 'Host', 'lineno': 8, 'quote_type': 'no_quote'}, {'value': '$host', 'lineno': 8, 'quote_type': 'no_quote'}, {'value': None, 'quote_type': 'no_quote'}], 'blocks': []}, {'type': 'directive', 'value': 'proxy_pass', 'lineno': 9, 'arguments': [{'value': 'http://vm-lxc1', 'lineno': 9, 'quote_type': 'no_quote'}, {'value': None, 'quote_type': 'no_quote'}], 'blocks': []}]]}]]}]
>>>
>>>
>>> config = """<VirtualHost *:80>
...     ServerName www.yourdomain.com
...     Redirect / https://www.yourdomain.com
... </VirtualHost>
... """
>>> mparser = httpd_pyparser.apache.Parser()
>>> mparser.parser.parse(config)
>>> print(mparser.configlines)
[{'type': 'directive_tag', 'value': 'VirtualHost', 'lineno': 1, 'arguments': [{'value': '*:80', 'quote_type': 'no_quote', 'lineno': 1}], 'blocks': [[{'type': 'directive', 'value': 'ServerName', 'lineno': 2, 'arguments': [{'value': 'www.yourdomain.com', 'lineno': 2, 'quote_type': 'no_quote'}], 'blocks': []}, {'type': 'directive', 'value': 'Redirect', 'lineno': 3, 'arguments': [{'value': '/', 'lineno': 3, 'quote_type': 'no_quote'}, {'value': 'https://www.yourdomain.com', 'lineno': 3, 'quote_type': 'no_quote'}], 'blocks': []}]]}, {'type': 'directive_tag_close', 'value': 'VirtualHost', 'lineno': 4, 'arguments': [], 'blocks': []}]
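The printed lists are hard to read on one line. Because `blocks` is a list of blocks and each block is again a list of items, a short recursive generator can turn the structure into an outline. This is a sketch only; `walk` is a hypothetical helper, not a function of httpd_pyparser, and the data below is a shortened copy of the Apache output above.

```python
def walk(items, depth=0):
    """Yield (depth, item) for every dictionary in a parsed config list."""
    for item in items:
        yield depth, item
        # 'blocks' is a list of blocks; each block is itself a list of items.
        for block in item.get("blocks") or []:
            yield from walk(block, depth + 1)


# Shortened copy of the Apache structure printed above.
configlines = [
    {"type": "directive_tag", "value": "VirtualHost", "lineno": 1,
     "arguments": [{"value": "*:80", "quote_type": "no_quote", "lineno": 1}],
     "blocks": [[
         {"type": "directive", "value": "ServerName", "lineno": 2,
          "arguments": [{"value": "www.yourdomain.com", "lineno": 2,
                         "quote_type": "no_quote"}],
          "blocks": []},
     ]]},
    {"type": "directive_tag_close", "value": "VirtualHost", "lineno": 4,
     "arguments": [], "blocks": []},
]

# Collect (depth, line number, name) and print it as an indented outline.
outline = [(depth, item["lineno"], item["value"]) for depth, item in walk(configlines)]
for depth, lineno, value in outline:
    print(f"{lineno:>3} {'  ' * depth}{value}")
```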
The Writer classes transform the internal structure back into a string, which you can save to a file. This lets you convert a structure loaded from YAML, JSON, etc. into a config file. See the example file test_writer.py for how it works.
Here is a simple example:
>>> struct = [{'type': 'directive_tag', 'value': 'VirtualHost', 'lineno': 1, 'arguments': [{'value': '*:80', 'quote_type': 'no_quote', 'lineno': 1}], 'blocks': [[{'type': 'directive', 'value': 'ServerName', 'lineno': 2, 'arguments': [{'value': 'www.yourdomain.com', 'lineno': 2, 'quote_type': 'no_quote'}], 'blocks': []}, {'type': 'directive', 'value': 'Redirect', 'lineno': 3, 'arguments': [{'value': '/', 'lineno': 3, 'quote_type': 'no_quote'}, {'value': 'https://www.yourdomain.com', 'lineno': 3, 'quote_type': 'no_quote'}], 'blocks': []}]]}, {'type': 'directive_tag_close', 'value': 'VirtualHost', 'lineno': 4, 'arguments': [], 'blocks': []}]
>>> mwriter = httpd_pyparser.apache.Writer(struct, " ")
>>> mwriter.generate()
>>> print("\n".join(mwriter.output))
<VirtualHost *:80>
ServerName www.yourdomain.com
Redirect / https://www.yourdomain.com
</VirtualHost>
>>>
>>> struct = [{'type': 'directive', 'value': 'server', 'lineno': 1, 'arguments': [], 'blocks': [[{'type': 'directive', 'value': 'listen', 'lineno': 2, 'arguments': [{'value': '80', 'lineno': 2, 'quote_type': 'no_quote'}, {'value': None, 'quote_type': 'no_quote'}], 'blocks': []}, {'type': 'directive', 'value': 'server_name', 'lineno': 3, 'arguments': [{'value': 'www.yourhost.com', 'lineno': 3, 'quote_type': 'no_quote'}, {'value': None, 'quote_type': 'no_quote'}], 'blocks': []}, {'type': 'directive', 'value': 'location', 'lineno': 5, 'arguments': [{'value': '/', 'lineno': 5, 'quote_type': 'no_quote'}], 'blocks': [[{'type': 'directive', 'value': 'proxy_set_header', 'lineno': 6, 'arguments': [{'value': 'X-Real-IP', 'lineno': 6, 'quote_type': 'no_quote'}, {'value': '$remote_addr', 'lineno': 6, 'quote_type': 'no_quote'}, {'value': None, 'quote_type': 'no_quote'}], 'blocks': []}, {'type': 'directive', 'value': 'proxy_set_header', 'lineno': 7, 'arguments': [{'value': 'X-Forwarded-For', 'lineno': 7, 'quote_type': 'no_quote'}, {'value': '$remote_addr', 'lineno': 7, 'quote_type': 'no_quote'}, {'value': None, 'quote_type': 'no_quote'}], 'blocks': []}, {'type': 'directive', 'value': 'proxy_set_header', 'lineno': 8, 'arguments': [{'value': 'Host', 'lineno': 8, 'quote_type': 'no_quote'}, {'value': '$host', 'lineno': 8, 'quote_type': 'no_quote'}, {'value': None, 'quote_type': 'no_quote'}], 'blocks': []}, {'type': 'directive', 'value': 'proxy_pass', 'lineno': 9, 'arguments': [{'value': 'http://vm-lxc1', 'lineno': 9, 'quote_type': 'no_quote'}, {'value': None, 'quote_type': 'no_quote'}], 'blocks': []}]]}]]}]
>>> mwriter = httpd_pyparser.nginx.Writer(struct, " ")
>>> mwriter.generate()
>>> print("\n".join(mwriter.output))
server {
listen 80;
server_name www.yourhost.com;
location / {
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $remote_addr;
proxy_set_header Host $host;
proxy_pass http://vm-lxc1;
}
}
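Since the structure consists only of lists, dictionaries, strings, integers and None, it can be stored as JSON and loaded back before it is handed to a Writer. The sketch below uses only the standard `json` module and a shortened Nginx structure in the shape shown above; it does not call httpd_pyparser itself.

```python
import json

# Shortened Nginx structure in the shape shown above.
struct = [
    {"type": "directive", "value": "server", "lineno": 1, "arguments": [],
     "blocks": [[
         {"type": "directive", "value": "listen", "lineno": 2,
          "arguments": [{"value": "80", "lineno": 2, "quote_type": "no_quote"},
                        {"value": None, "quote_type": "no_quote"}],
          "blocks": []},
     ]]},
]

# Serialize to JSON text; Python's None becomes JSON null.
text = json.dumps(struct, indent=2)

# Loading the text back gives a structure equal to the original.
restored = json.loads(text)
print(restored == struct)
```

With the library installed, `restored` could then be passed to `httpd_pyparser.nginx.Writer(restored, " ")` in the same way as `struct` in the example above.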
The Parser classes read the configuration files and transform them into a Python list. Every item in this list is a dictionary. Every dictionary has the keys type and lineno. Depending on the type, there might be additional keys.
These are the supported types:
- Comment
- Directive
There are four kinds of dictionary objects for the types above:
{
'type': 'comment',
'value': <class 'str'>,
'lineno': <class 'int'>
}
{
'type': 'directive',
'value': <class 'str'>,
'lineno': <class 'int'>,
'arguments': <class 'list' of 'arg'>,
'blocks': <class 'list' of 'directive' or 'directive_tag'>
}
{
'type': 'directive_tag',
'value': <class 'str'>,
'lineno': <class 'int'>,
'arguments': <class 'list' of 'arg'>,
'blocks': <class 'list' of 'directive' or 'directive_tag'>
}
{
'type': 'directive_tag_close',
'value': <class 'str'>,
'lineno': <class 'int'>,
'arguments': [],
'blocks': []
}
# arg type:
{
'value': <class 'str'>,
'quote_type': QUOTE_TYPE,
'lineno': <class 'int'>
}
{
'type': 'comment',
'value': <class 'str'>,
'lineno': <class 'int'>
}
{
'type': 'directive',
'value': <class 'str'>,
'lineno': <class 'int'>,
'arguments': <class 'list' of 'arg'>,
'blocks': <class 'list' of 'directive'>
}
# arg type:
{
'value': <class 'str'>,
'lineno': <class 'int'>,
'quote_type': QUOTE_TYPE
}
Quote type:
'QUOTE_TYPE' can be one item from set('no_quote', 'quotes', 'quoted')
where
- no_quote - there isn't any quote mark
- quotes - means single quote (')
- quoted - means double quote (")
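The mapping from quote type to quote mark can be written down as a small lookup table. The sketch below is an illustration only: `QUOTE_CHARS` and `render_argument` are hypothetical names, it is not how the library's Writer is implemented, and it does not handle escaping of quote marks inside the value.

```python
# Quote mark for each QUOTE_TYPE, following the list above.
QUOTE_CHARS = {"no_quote": "", "quotes": "'", "quoted": '"'}


def render_argument(arg):
    """Return the argument text wrapped in the quote mark its quote_type names."""
    quote = QUOTE_CHARS[arg["quote_type"]]
    return f"{quote}{arg['value']}{quote}"


plain = render_argument({"value": "www.yourdomain.com", "quote_type": "no_quote", "lineno": 2})
double = render_argument({"value": "hello world", "quote_type": "quoted", "lineno": 3})
single = render_argument({"value": "hello world", "quote_type": "quotes", "lineno": 4})
print(plain, double, single, sep="\n")
```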
Description: type of the configuration directive
Used at: Comment, Directive, DirectiveTag
Syntax: 'type': <class 'str'>
Example Usage: 'type': "DirectiveTag"
Default Value: no default value
Possible value: Comment, or any possible directive in ModSecurity (except DirectiveTag and Directive)
Scope: Comment, Directive or DirectiveTag dictionary
Added Version: 0.1
Description: line number in the original file
Syntax: 'lineno': <class 'int'>
Example Usage: 'lineno': 10
Default Value: no default value
Possible value: a positive integer
Scope: every item in the list
Added Version: 0.1
Description: the dictionary next to the directive
Syntax: {'argument': <class 'str'>, 'quote_type': QUOTE_TYPE}
Example Usage: {'argument': '# this is a comment', 'quote_type': 'no_quote'}
Default Value: no default value
Possible value: no restrictions
Scope: Comment or Directive dictionary
Added Version: 0.1
Changed in: 1.0
Description: indicates if the argument was quoted or not
Syntax: 'quote_type': <class 'str'>
Example Usage: 'quote_type': 'quotes'
Default Value: no_quote
Possible value: no_quote, quoted (quoted with DOUBLE quotes "), quotes (quoted with SINGLE quotes ')
Scope: Dictionary key in Comment, Directive types, and used list: variables, actions and arguments.
Added Version: 0.1
The examples/ subdirectory contains some examples, data, and descriptions in the code. There are three scripts:
examples/test_lexer.py
examples/test_parser.py
examples/test_writer.py
All of them demonstrate how the classes work. There are two more scripts in the root directory of the source tree, which convert Apache2 or Nginx configuration files (without recursion) into YAML or JSON files.
Let's look at more details with the help of examples.
Consider a very simple configuration file for the Apache2 web server:
<VirtualHost *:80>
    ServerName www.yourdomain.com
</VirtualHost>
The parser will generate this structure:
[
    {
        "type": "directive_tag",
        "value": "VirtualHost",
        "lineno": 1,
        "arguments": [
            {
                "lineno": 1,
                "quote_type": "no_quote",
                "value": "*:80"
            }
        ],
        "blocks": [
            [
                {
                    "type": "directive",
                    "value": "ServerName",
                    "lineno": 2,
                    "arguments": [
                        {
                            "lineno": 2,
                            "quote_type": "no_quote",
                            "value": "www.yourdomain.com"
                        }
                    ],
                    "blocks": []
                }
            ]
        ]
    },
    {
        "type": "directive_tag_close",
        "value": "VirtualHost",
        "lineno": 3,
        "arguments": [],
        "blocks": []
    }
]
This is a list with two items. The first item has the type directive_tag and the value VirtualHost; it comes from the first line. The tag has one argument: *:80. The second item is a directive_tag_close, which marks the end of the block. The first item has a blocks key, which contains the blocks. Every block item can be a comment, directive or directive_tag. Here the blocks list holds one block with a single item, generated from ServerName www.yourdomain.com. This is a directive (not a directive_tag) with the value ServerName on line 2, and it has one argument: www.yourdomain.com.
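The nesting described above lends itself to a recursive lookup. The following sketch searches a parsed list for directives by name, descending into every block. `find_directives` is a hypothetical helper and not part of httpd_pyparser; the data is the structure generated from the three-line Apache example.

```python
def find_directives(items, name):
    """Return directive and directive_tag items named `name`, including nested ones."""
    found = []
    for item in items:
        # Skip comments and closing tags; only real directives are of interest.
        if item["type"] in ("directive", "directive_tag") and item.get("value") == name:
            found.append(item)
        # Descend into every block attached to this item.
        for block in item.get("blocks") or []:
            found.extend(find_directives(block, name))
    return found


# The structure generated from the three-line Apache example above.
parsed = [
    {"type": "directive_tag", "value": "VirtualHost", "lineno": 1,
     "arguments": [{"lineno": 1, "quote_type": "no_quote", "value": "*:80"}],
     "blocks": [[
         {"type": "directive", "value": "ServerName", "lineno": 2,
          "arguments": [{"lineno": 2, "quote_type": "no_quote",
                         "value": "www.yourdomain.com"}],
          "blocks": []},
     ]]},
    {"type": "directive_tag_close", "value": "VirtualHost", "lineno": 3,
     "arguments": [], "blocks": []},
]

# Take the first argument of every ServerName directive.
server_names = [d["arguments"][0]["value"] for d in find_directives(parsed, "ServerName")]
print(server_names)
```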
Now take a look at an Nginx config:
server {
    server_name www.yourhost.com;

    location / {
        proxy_pass http://vm-lxc1;
    }
}
This generates a structure like this:
[
    {
        "type": "directive",
        "value": "server",
        "lineno": 1,
        "arguments": [],
        "blocks": [
            [
                {
                    "type": "directive",
                    "value": "server_name",
                    "lineno": 2,
                    "arguments": [
                        {
                            "lineno": 2,
                            "quote_type": "no_quote",
                            "value": "www.yourhost.com"
                        },
                        {
                            "quote_type": "no_quote",
                            "value": null
                        }
                    ],
                    "blocks": []
                },
                {
                    "type": "directive",
                    "value": "location",
                    "lineno": 4,
                    "arguments": [
                        {
                            "lineno": 4,
                            "quote_type": "no_quote",
                            "value": "/"
                        }
                    ],
                    "blocks": [
                        [
                            {
                                "type": "directive",
                                "value": "proxy_pass",
                                "lineno": 5,
                                "arguments": [
                                    {
                                        "lineno": 5,
                                        "quote_type": "no_quote",
                                        "value": "http://vm-lxc1"
                                    },
                                    {
                                        "quote_type": "no_quote",
                                        "value": null
                                    }
                                ],
                                "blocks": []
                            }
                        ]
                    ]
                }
            ]
        ]
    }
]
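In the Nginx structure above, the arguments of directives that end with a semicolon include a last entry whose value is null. This document does not say what that entry stands for. Assuming it is a placeholder that carries no text, a small helper can filter it out when only the argument strings are needed. `argument_values` is a hypothetical name, not a library function.

```python
def argument_values(directive):
    """Return the directive's argument strings, skipping entries whose value is None."""
    return [arg["value"] for arg in directive["arguments"] if arg["value"] is not None]


# The proxy_pass item from the structure above (JSON null is None in Python).
proxy_pass = {
    "type": "directive", "value": "proxy_pass", "lineno": 5,
    "arguments": [
        {"lineno": 5, "quote_type": "no_quote", "value": "http://vm-lxc1"},
        {"quote_type": "no_quote", "value": None},
    ],
    "blocks": [],
}

values = argument_values(proxy_pass)
print(values)
```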
If you run into unexpected behavior, find a bug, or have a feature request, just create a new issue, or drop us an e-mail: modsecurity at digitalwave dot hu.
Currently, there are no known bugs.