grep and sed

Meg Staton edited this page Nov 2, 2022 · 3 revisions

Grep

Goal: search for text in a given string or file

Create a file, data.txt, with the following contents:

## Cherokee Brave (red bract, red leaf) by App Spring (white bract, green leaf)
## phenotyping data
## ID BRACT_COLOR LEAF_COLOR HEIGHT
121 white green 210
342 red green 111
566 white green 220
578 red red 311
789 white red 250
999 red red #
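
One way to create the file straight from the shell is a here-document. This is a minimal sketch, assuming a POSIX-style shell such as bash; the quoted 'EOF' delimiter keeps every character, including the #, literal. The line_count variable is only there as a sanity check.

```shell
# Write the sample data to data.txt in the current directory.
# Quoting the delimiter ('EOF') stops the shell from expanding anything inside.
cat > data.txt <<'EOF'
## Cherokee Brave (red bract, red leaf) by App Spring (white bract, green leaf)
## phenotyping data
## ID BRACT_COLOR LEAF_COLOR HEIGHT
121 white green 210
342 red green 111
566 white green 220
578 red red 311
789 white red 250
999 red red #
EOF

# Sanity check: the empty pattern matches every line, so -c gives the line count.
line_count=$(grep -c '' data.txt)
echo "data.txt has $line_count lines"
```

The file should have 9 lines: 3 header lines starting with ## and 6 data rows.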

Grep searches for a pattern and returns the LINES that contain it.

grep red data.txt
grep white data.txt
grep 9 data.txt

With -c, grep counts the NUMBER OF LINES that contain the pattern.

grep -c red data.txt
grep -c white data.txt
grep -c 9 data.txt

What if you want your search to be case insensitive? Add -i.

grep 'bract' data.txt
grep -i 'bract' data.txt
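
To see the difference without scrolling through output, the two searches can be compared by count. This is a small sketch that feeds two of the header lines from data.txt to grep on standard input; the variable names are made up for illustration.

```shell
# Two header lines from data.txt: one has lowercase "bract", one has "BRACT_COLOR".
headers='## Cherokee Brave (red bract, red leaf) by App Spring (white bract, green leaf)
## ID BRACT_COLOR LEAF_COLOR HEIGHT'

# Case-sensitive: only the line with lowercase "bract" matches.
sensitive=$(printf '%s\n' "$headers" | grep -c 'bract')

# Case-insensitive: the BRACT_COLOR line matches as well.
insensitive=$(printf '%s\n' "$headers" | grep -ci 'bract')

echo "case-sensitive: $sensitive, case-insensitive: $insensitive"
```

The first count is 1 and the second is 2, because -i lets bract match BRACT.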

With -v, grep returns the LINES that DO NOT contain the pattern.

grep -v red data.txt
grep -v 9 data.txt

Combine -v and -c to count the NUMBER OF LINES that DO NOT contain the pattern.

grep -vc red data.txt
grep -vc 9 data.txt
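
A quick way to convince yourself how -c and -v fit together: every line either matches or does not, so the two counts should add up to the total number of lines. The sketch below recreates the sample data in a temporary file so it runs on its own; mktemp is assumed to be available (it is on Linux and macOS), and the variable names are illustrative.

```shell
# Recreate the sample data in a temporary file so this example is self-contained.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
## Cherokee Brave (red bract, red leaf) by App Spring (white bract, green leaf)
## phenotyping data
## ID BRACT_COLOR LEAF_COLOR HEIGHT
121 white green 210
342 red green 111
566 white green 220
578 red red 311
789 white red 250
999 red red #
EOF

matching=$(grep -c red "$tmp")    # lines that contain "red"
excluded=$(grep -vc red "$tmp")   # lines that do not contain "red"
total=$(grep -c '' "$tmp")        # all lines (the empty pattern matches every line)

echo "$matching + $excluded = $total"
rm -f "$tmp"
```

This prints 5 + 4 = 9. Note that -c counts lines, not occurrences: the row "578 red red 311" is counted once, and the first header line is counted because it contains "red bract".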

This is going to be useful for discarding headers, but first we need...

Regular expressions

Regex for short.

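Basic building blocks (with plain grep, the ?, +, {n}, {n,m}, () and | operators are treated as literal characters; use grep -E to give them the meanings below):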

. (dot) - any single character.
? - the preceding character matches 0 or 1 times only.
* - the preceding character matches 0 or more times.
+ - the preceding character matches 1 or more times.
{n} - the preceding character matches exactly n times.
{n,m} - the preceding character matches at least n times and not more than m times.
[agd] - the character is one of those included within the square brackets.
[^agd] - the character is not one of those included within the square brackets.
[c-f] - the dash within the square brackets operates as a range. In this case it means any one of the letters c, d, e or f.
() - allows us to group several characters to behave as one.
| (pipe symbol) - the logical OR operation.
^ - matches the beginning of the line.
$ - matches the end of the line.
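
A few of these building blocks can be tried against the sample data. This is a sketch rather than a full tour: it uses grep -E because ?, +, {n}, () and | are only special in extended mode, and it recreates the data in a temporary file (mktemp assumed available) so it runs on its own. The variable names are illustrative.

```shell
# Recreate the sample data in a temporary file so this example is self-contained.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
## Cherokee Brave (red bract, red leaf) by App Spring (white bract, green leaf)
## phenotyping data
## ID BRACT_COLOR LEAF_COLOR HEIGHT
121 white green 210
342 red green 111
566 white green 220
578 red red 311
789 white red 250
999 red red #
EOF

# {3} and (red|white): a three-digit ID, then either bract colour, then a green leaf.
green_leaf=$(grep -cE '^[0-9]{3} (red|white) green' "$tmp")

# + and $: lines that end in one or more digits, i.e. rows with a measured height.
measured=$(grep -cE '[0-9]+$' "$tmp")

# ^ and [^#]: lines whose first character is anything other than "#".
data_rows=$(grep -c '^[^#]' "$tmp")

echo "green leaf: $green_leaf, measured: $measured, data rows: $data_rows"
rm -f "$tmp"
```

This prints green leaf: 3, measured: 5, data rows: 6. The row for ID 999 is a data row but is not counted as measured, because its height is recorded as #.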

Dot matches any single character, so the pattern below matches the heights 210, 220 and 250

grep '2.0' data.txt

^ anchors a pattern to the beginning of the line

grep '#' data.txt
grep '^#' data.txt

Now we can see how to remove header lines!

grep -v '^#' data.txt

Don't forget pipes: the output of one grep can be fed into another

grep -v '^#' data.txt | grep 'red'
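
The effect of the anchor and of the pipe can be checked by counting. The sketch below recreates the sample data in a temporary file (mktemp assumed available) so it runs on its own; the variable names are illustrative.

```shell
# Recreate the sample data in a temporary file so this example is self-contained.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
## Cherokee Brave (red bract, red leaf) by App Spring (white bract, green leaf)
## phenotyping data
## ID BRACT_COLOR LEAF_COLOR HEIGHT
121 white green 210
342 red green 111
566 white green 220
578 red red 311
789 white red 250
999 red red #
EOF

hash_anywhere=$(grep -c '#' "$tmp")    # headers plus the row with a missing height
hash_at_start=$(grep -c '^#' "$tmp")   # headers only

# Drop the header lines first, then count the remaining rows that mention red.
red_rows=$(grep -v '^#' "$tmp" | grep -c 'red')

echo "# anywhere: $hash_anywhere, # at start: $hash_at_start, red data rows: $red_rows"
rm -f "$tmp"
```

This prints 4, 3 and 4. Without the anchor, the row "999 red red #" is mistaken for a header; without the first grep in the pipe, the header line that mentions "red bract" would be counted as a red row.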

Sed

Find and replace practice