[sambamba view] Filter expression syntax

Artem Tarasov edited this page Jun 8, 2016 · 7 revisions

sambamba view supports custom filtering of alignment records. This wiki page describes the syntax of filter expressions, which the user provides with the --filter (-F) command-line option. Fields and flags are described in the SAM specification.

Syntax overview

A filter expression consists of one or more basic conditions combined with the logical operators and, or, and not, with parentheses for grouping where needed.

A basic condition tests a single record field, tag, or flag.

The comparison operators ==, !=, >, <, >=, and <= work on both integers and strings. Strings can also be matched against a regular expression with =~, as in read_name =~ /^ERR/.

Strings are delimited by single quotes. If you need a single quote inside a string, escape it with \.
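
As a minimal sketch of the quoting rule above, the Python helper below wraps a value in single quotes and escapes embedded single quotes with a backslash. The name filter_string is hypothetical and not part of sambamba. This page does not say how a literal backslash inside a string is treated, so the helper leaves backslashes untouched.

```python
def filter_string(value: str) -> str:
    """Return `value` as a single-quoted filter string literal.

    Hypothetical helper: it applies only the single-quote rule
    described on this page and leaves every other character as-is.
    """
    return "'" + value.replace("'", "\\'") + "'"


# Rebuilds the read_name condition listed under
# "More examples of filter expressions".
expression = "read_name == " + filter_string("abc'def")
print(expression)
```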

Usage

Reduce the BAM file to a BAM file containing only the reads on the second reference sequence listed in the SAM header (chr2 in this file). ref_id is zero-based, so the second sequence has ref_id 1:

sambamba view -F "ref_id==1" -f bam HG01375.mapped.ILLUMINA.bwa.CLM.low_coverage.20120522.bam > HG01375.mapped.ILLUMINA.bwa.CLM.low_coverage.20120522_chr2.bam

Show all reads whose names start with ERR:

sambamba view -F "read_name =~ /^ERR/" HG01375.mapped.ILLUMINA.bwa.CLM.low_coverage.20120522_chr1.bam
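
When sambamba is called from a script, passing the filter as a single list element avoids a second layer of shell quoting. The sketch below builds such an argument list using only the options that appear in the two commands above (-F and -f bam). The function name view_command and the file name input.bam are hypothetical.

```python
import shlex
from typing import List


def view_command(bam_path: str, filter_expr: str, as_bam: bool = False) -> List[str]:
    """Build an argument list for `sambamba view` with a filter expression."""
    args = ["sambamba", "view", "-F", filter_expr]
    if as_bam:
        # Same output-format option as in the first usage example.
        args += ["-f", "bam"]
    args.append(bam_path)
    return args


# input.bam is a placeholder path, not a file referenced on this page.
cmd = view_command("input.bam", "read_name =~ /^ERR/")

# Print a shell-quoted form of the command for inspection.
print(shlex.join(cmd))
```

To run the command, the list can be passed to subprocess.run(cmd, check=True). With -f bam the output is binary and goes to standard output, so it should be redirected to a file, as in the first usage example.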

More examples of filter expressions

    mapping_quality >= 30 and ([RG] =~ /^abcd/ or [NM] == 7)
    read_name == 'abc\'def'

Basic conditions for flags

The following flag names are recognized:

  • paired
  • proper_pair
  • unmapped
  • mate_is_unmapped
  • reverse_strand
  • mate_is_reverse_strand
  • first_of_pair
  • second_of_pair
  • secondary_alignment
  • failed_quality_control
  • duplicate
  • supplementary
  • chimeric

Flag example

    not (unmapped or mate_is_unmapped) and first_of_pair
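
To make the meaning of the flag example concrete, here is a small Python model of it, evaluated against the numeric FLAG value of a record. The bit values come from the SAM specification. The function names are hypothetical, and the sketch assumes that not binds more tightly than and, as it does in Python. chimeric is left out because it does not correspond to a single FLAG bit.

```python
# Bit values of the FLAG field as defined in the SAM specification.
SAM_FLAG_BITS = {
    "paired": 0x1,
    "proper_pair": 0x2,
    "unmapped": 0x4,
    "mate_is_unmapped": 0x8,
    "reverse_strand": 0x10,
    "mate_is_reverse_strand": 0x20,
    "first_of_pair": 0x40,
    "second_of_pair": 0x80,
    "secondary_alignment": 0x100,
    "failed_quality_control": 0x200,
    "duplicate": 0x400,
    "supplementary": 0x800,
}


def has_flag(flag: int, name: str) -> bool:
    """Return True if the named bit is set in the numeric FLAG value."""
    return bool(flag & SAM_FLAG_BITS[name])


def passes_flag_example(flag: int) -> bool:
    """Model of: not (unmapped or mate_is_unmapped) and first_of_pair"""
    both_mapped = not (has_flag(flag, "unmapped") or has_flag(flag, "mate_is_unmapped"))
    return both_mapped and has_flag(flag, "first_of_pair")


print(passes_flag_example(99))   # paired, proper pair, mate reverse, first of pair
print(passes_flag_example(73))   # paired, mate unmapped, first of pair
print(passes_flag_example(147))  # paired, proper pair, reverse, second of pair
```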

Basic conditions for fields

Conditions for integer and string fields are supported.

List of integer fields:

  • ref_id
  • position
  • mapping_quality
  • sequence_length
  • mate_ref_id
  • mate_position
  • template_length

List of string fields:

  • read_name
  • sequence
  • cigar
  • strand ('+'/'-')
  • ref_name
  • mate_ref_name

Example

    ref_id == 3 and mapping_quality >= 50 and sequence_length >= 80

Basic conditions for tags

Tags are denoted by their names in square brackets, for instance [RG] or [Q2]. Conditions on both integers and strings are supported; the tag must hold a value of the type used in the condition.

To filter on the presence of a particular tag, compare it with the special null value.

Example

    [RG] != null and [AM] == 37
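
The tag example can be read as "the RG tag is present and the AM tag is the integer 37". The Python model below illustrates that reading on a plain dictionary of tag values. It is only a sketch: the function name is hypothetical, and treating a string-valued AM tag as a non-match is an assumption based on the type requirement stated above, not verified sambamba behavior.

```python
def passes_tag_example(tags: dict) -> bool:
    """Model of: [RG] != null and [AM] == 37"""
    # [RG] != null is read as "the RG tag is present".
    has_rg = "RG" in tags
    # Assumption: an integer condition matches only an integer-valued tag.
    am = tags.get("AM")
    am_is_37 = isinstance(am, int) and am == 37
    return has_rg and am_is_37


print(passes_tag_example({"RG": "grp1", "AM": 37}))    # both conditions hold
print(passes_tag_example({"AM": 37}))                  # RG tag is absent
print(passes_tag_example({"RG": "grp1", "AM": "37"}))  # AM holds a string
```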