dedup

Use the dedup command to remove duplicate documents, as identified by the specified fields, from the search result.

dedup [int] <field-list> [keepempty=<bool>] [consecutive=<bool>]

  • int: optional. The number of events to retain for each combination of field values. The number must be greater than 0. If you do not specify a number, only the first occurring event is kept and all other duplicates are removed from the results. Default: 1.
  • keepempty: optional. If true, keeps the document if any field in the field-list has a NULL value or is MISSING. Default: false.
  • consecutive: optional. If true, removes only events with duplicate combinations of values that are consecutive. Default: false.
  • field-list: mandatory. A comma-delimited list of fields. At least one field is required.
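
The way these parameters interact can be sketched in plain Python. This is only an illustrative model of the behavior described above, not the OpenSearch implementation: the function name `dedup`, its argument names, the `ACCOUNTS` sample rows, and the placeholder email addresses are all hypothetical. It also assumes that, with consecutive set to true, rows with an empty field do not interrupt a run of duplicates, which the description above does not specify.

```python
from typing import Any, Dict, List, Optional, Tuple

Row = Dict[str, Any]


def dedup(rows: List[Row], fields: List[str], allowed: int = 1,
          keepempty: bool = False, consecutive: bool = False) -> List[Row]:
    """Illustrative model of dedup [int] <field-list> [keepempty] [consecutive]."""
    if allowed < 1:
        raise ValueError("the number of retained events must be greater than 0")
    if not fields:
        raise ValueError("at least one field is required")

    seen: Dict[Tuple[Any, ...], int] = {}    # occurrences per key (default mode)
    last_key: Optional[Tuple[Any, ...]] = None
    run_length = 0                           # length of current run (consecutive mode)
    result: List[Row] = []

    for row in rows:
        # A missing field and a null field are treated the same way here.
        key = tuple(row.get(f) for f in fields)
        if any(value is None for value in key):
            if keepempty:
                result.append(row)
            continue  # assumption: empty rows do not reset a consecutive run

        if consecutive:
            run_length = run_length + 1 if key == last_key else 1
            last_key = key
            count = run_length
        else:
            count = seen.get(key, 0) + 1
            seen[key] = count

        if count <= allowed:
            result.append(row)
    return result


# Hypothetical sample rows; the email values are placeholders.
ACCOUNTS = [
    {"account_number": 1, "gender": "M", "email": "a@example.com"},
    {"account_number": 6, "gender": "M", "email": "b@example.com"},
    {"account_number": 13, "gender": "F", "email": None},
    {"account_number": 18, "gender": "M", "email": "c@example.com"},
]

for kept in dedup(ACCOUNTS, ["gender"], consecutive=True):
    print(kept["account_number"], kept["gender"])
```

With these sample rows, the default mode keeps one row per gender, while consecutive mode keeps account 18 as well, because its gender differs from that of the row immediately before it.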

This example shows how to dedup documents by the gender field.

PPL query:

os> source=accounts | dedup gender | fields account_number, gender;
fetched rows / total rows = 2/2
+----------------+--------+
| account_number | gender |
|----------------+--------|
| 1              | M      |
| 13             | F      |
+----------------+--------+

This example shows how to dedup documents by the gender field while keeping up to 2 documents for each gender value.

PPL query:

os> source=accounts | dedup 2 gender | fields account_number, gender;
fetched rows / total rows = 3/3
+----------------+--------+
| account_number | gender |
|----------------+--------|
| 1              | M      |
| 6              | M      |
| 13             | F      |
+----------------+--------+

This example shows how to dedup documents by the email field while keeping documents whose email value is null.

PPL query:

os> source=accounts | dedup email keepempty=true | fields account_number, email;
fetched rows / total rows = 4/4
+----------------+-----------------------+
| account_number | email                 |
|----------------+-----------------------|
| 1              | [email protected]  |
| 6              | [email protected] |
| 13             | null                  |
| 18             | [email protected]   |
+----------------+-----------------------+

This example shows how to dedup documents by the email field while dropping documents whose email value is empty.

PPL query:

os> source=accounts | dedup email | fields account_number, email;
fetched rows / total rows = 3/3
+----------------+-----------------------+
| account_number | email                 |
|----------------+-----------------------|
| 1              | [email protected]  |
| 6              | [email protected] |
| 18             | [email protected]   |
+----------------+-----------------------+

This example shows how to dedup only consecutive documents that have the same gender value.

PPL query:

os> source=accounts | dedup gender consecutive=true | fields account_number, gender;
fetched rows / total rows = 3/3
+----------------+--------+
| account_number | gender |
|----------------+--------|
| 1              | M      |
| 13             | F      |
| 18             | M      |
+----------------+--------+

The dedup command is not rewritten to OpenSearch DSL; it is executed only on the coordination node.