Updating the feature extraction analyzer documentation (#2973)
* Update the feature extraction analyzer documentation
jkppr authored Nov 3, 2023
1 parent 11f8256 commit db58229
Showing 2 changed files with 94 additions and 10 deletions.
2 changes: 1 addition & 1 deletion data/winevt_features.yaml
@@ -47,7 +47,7 @@
# For more details and examples of such an extraction check the Timesketch
# documentation:
#
# TODO(Add documentation link)
# https://timesketch.org/guides/analyzers/feature_extraction/
#
# ------------------------------------------------------------------------
# 4624: An account was successfully logged on.
102 changes: 93 additions & 9 deletions docs/guides/analyzers/feature_extraction.md
@@ -2,22 +2,41 @@
hide:
- footer
---
The feature extraction analyzer creates attributes out of event data based on regular expressions. Different
features can be specified in the `data/regex_features.yaml` file.
The feature extraction analyzer creates attributes out of event data based on
different extraction plugins.

Please be aware that this analyzer does *not* extract ipv4, email-addresses and similar from *all* events, but only those that match the query_string.
Currently supported:
* Regular expression based extractions
* [Regex Extraction Plugin](#regex-extraction-plugin)
* Plaso-parsed Windows event logs
* [Winevt Extraction Plugin](#winevt-extraction-plugin)

> **Note**
Please be aware that this analyzer does *not* extract ipv4, email addresses and
similar from *all* events, but only those that match the definitions configured
for the plugins explained below!

### Use case

This analyzer is helpful to build a list of `email_addresses` in a sketch that are used in `WEBHIST`. To do that, run the analyzer to have the feature extracted. Check the results by running a query: `email_address:*`.
This analyzer is helpful to extract additional data from events as separate
attributes. Those extracted attributes can then be used in search, lookups,
correlations, aggregations or with analyzers.

For example: In the default configuration, the analyzer will extract
`email_addresses` from the message field of events with the source `WEBHIST`
matching the regular expression.

Those results now can be used in an aggregation to plot a table limited to that column.
## Regex Extraction Plugin

Another way of extracting that information is via API, querying events that contain `email_address:*` as a pandas dataframe, and work from there.
This feature extraction plugin uses regular expressions to extract matching
strings from an existing event attribute (e.g. message) and adds them as a new
attribute to the event.

### Configuration

A feature extraction definition looks like this:
Features are defined in [data/regex_features.yaml](../../../data/regex_features.yaml)

A regex based feature extraction definition looks like this:

```
name:
@@ -40,9 +59,9 @@ name:
keep_multimatch: False
```

Each definition needs to define either a query_string or a query_dsl.
Each definition needs to define either a `query_string` or a `query_dsl`.

`re_flags` is a list of flags as strings from the re module. These include:
`re_flags` is a list of flags as strings from the `re` module. These include:
- DEBUG
- DOTALL
- IGNORECASE
@@ -72,3 +91,68 @@ The feature extraction works in the way that the query is run, and the regular e
The first value extracted is then stored inside the "store_as" attribute.
If there are emojis or tags defined they are also applied to that event.
Finally, if a view is supposed to be created, a view that searches for the added tag is created (only if there are results).
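
The extraction step described above can be illustrated with a minimal Python sketch. It is not the analyzer's actual code: the helper names `compile_feature_regex` and `extract_feature`, the simplified email pattern, and the sample event are all assumptions made for illustration. It only shows how flag names given as strings can be resolved against the `re` module and how the first match can be stored under the `store_as` name.

```python
import re

# Hypothetical, simplified email pattern (not the one shipped in the config).
EMAIL_PATTERN = r"[a-z0-9._+-]+@[a-z0-9-]+(?:\.[a-z0-9-]+)+"


def compile_feature_regex(pattern, flag_names=()):
    """Compile a pattern, resolving flag names such as "IGNORECASE" via the re module."""
    flags = 0
    for name in flag_names:
        flags |= getattr(re, name)
    return re.compile(pattern, flags)


def extract_feature(event, regex, attribute, store_as):
    """Return a copy of the event with the first match stored under store_as."""
    updated = dict(event)
    match = regex.search(event.get(attribute, ""))
    if match:
        updated[store_as] = match.group(0)
    return updated


# Illustrative event; real events carry many more attributes.
sample_event = {
    "source_short": "WEBHIST",
    "message": "Visited https://mail.example.com/?user=Alice@Example.com today",
}
email_regex = compile_feature_regex(EMAIL_PATTERN, ["IGNORECASE"])
enriched = extract_feature(sample_event, email_regex, "message", "email_address")
print(enriched.get("email_address"))
```

In this sketch an event without a match is returned unchanged, so a later search for `email_address:*` would only find events where the pattern actually matched.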

## Winevt Extraction Plugin

This feature extraction plugin uses configured mappings to create new attributes
for Windows Event Log events that were parsed using [Plaso](https://github.com/log2timeline/plaso).

The mapping is based on the `strings` array that Plaso generates for
the event data entries.

> **Note**
The winevt extraction plugin does *not* map all Windows Event Log fields. It
only maps the ones configured in [data/winevt_features.yaml](../../../data/winevt_features.yaml)!

### Configuration

Features are defined in [data/winevt_features.yaml](../../../data/winevt_features.yaml)

A mapping for a Windows Event uses the YAML format and looks like this:

```
name:
  source_name:          Type: list[str] | REQUIRED | case-insensitive
                        A list of source names to match against. Multiple
                        entries will be checked with OR.
  provider_identifier:  Type: list[str] | OPTIONAL | case-insensitive
                        A list of provider identifiers to match against.
                        Multiple entries will be checked with OR.
  event_version:        Type: int | REQUIRED
                        The event version to match against.
  event_identifier:     Type: int | REQUIRED
                        The event identifier to match against.
  references:           Type: list[str] | OPTIONAL
                        A list of references to provide as context and
                        source for the event mapping. E.g. a URL to the
                        official Microsoft documentation on the event.
  mapping:              Type: list[dict] | REQUIRED
                        A list of dicts that define the new attribute name
                        and the string index of the event to extract the
                        value from. Additionally it can also contain an
                        alias list to add multiple attributes with
                        the same value but different names.
    name:               Type: str | REQUIRED
                        The name of the new attribute to create.
    string_index:       Type: int | REQUIRED | Starting at index 0
                        The string index of the event to extract the
                        value from. Based on the plaso extracted "strings"
                        attribute with Windows eventlog entries.
    aliases:            Type: list[str] | OPTIONAL
                        A list of aliases to add additionally to the
                        official name of the attribute. This can be used
                        to add different field names matching individual
                        field name ontologies. E.g. srcIP, domain, etc.
```

Check out the preconfigured mappings for some examples:
[data/winevt_features.yaml](../../../data/winevt_features.yaml)
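
To make the mapping semantics more concrete, here is a rough Python sketch of how such a definition could be applied to a single event. It is not the plugin's implementation. The functions `event_matches` and `apply_mapping` are hypothetical, the event is assumed to expose attributes named like the config keys, and the string indexes in `EXAMPLE_MAPPING` are made up for illustration rather than taken from the real event layout.

```python
# Hypothetical mapping; the indexes do NOT reflect the real 4624 string layout.
EXAMPLE_MAPPING = {
    "source_name": ["Microsoft-Windows-Security-Auditing"],
    "event_identifier": 4624,
    "event_version": 2,
    "mapping": [
        {"name": "username", "string_index": 1, "aliases": ["user"]},
        {"name": "hostname", "string_index": 2},
    ],
}


def event_matches(event, mapping):
    """Check source name (case-insensitive, OR), optional provider, ID and version."""
    sources = [name.lower() for name in mapping["source_name"]]
    if str(event.get("source_name", "")).lower() not in sources:
        return False
    providers = [p.lower() for p in mapping.get("provider_identifier", [])]
    if providers and str(event.get("provider_identifier", "")).lower() not in providers:
        return False
    return (
        event.get("event_identifier") == mapping["event_identifier"]
        and event.get("event_version") == mapping["event_version"]
    )


def apply_mapping(event, mapping):
    """Return the new attributes derived from the event's "strings" list."""
    if not event_matches(event, mapping):
        return {}
    strings = event.get("strings", [])
    new_attributes = {}
    for entry in mapping["mapping"]:
        index = entry["string_index"]  # zero-based index into "strings"
        if index >= len(strings):
            continue  # skip entries the event does not provide
        for name in [entry["name"], *entry.get("aliases", [])]:
            new_attributes[name] = strings[index]
    return new_attributes


# Illustrative event with a shortened "strings" list.
sample_event = {
    "source_name": "microsoft-windows-security-auditing",
    "event_identifier": 4624,
    "event_version": 2,
    "strings": ["S-1-0-0", "alice", "WORKSTATION1"],
}
print(apply_mapping(sample_event, EXAMPLE_MAPPING))
```

Aliases simply duplicate the extracted value under additional names, which is how one mapping entry can serve several field name ontologies at once.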
