grep and sed
Meg Staton edited this page Nov 2, 2022 · 3 revisions
Goal: search for text in a given string or file
Create a file, data.txt
## Cherokee Brave (red bract, red leaf) by App Spring (white bract, green leaf)
## phenotyping data
## ID BRACT_COLOR LEAF_COLOR HEIGHT
121 white green 210
342 red green 110
566 white green 220
578 red red 311
789 white red 250
999 red red #
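One way to create the file from the command line is a here-document. This is a minimal sketch: it assumes the columns are separated by single spaces, which this page does not specify (the original file may well use tabs).

```shell
# Write the example data to data.txt with a quoted here-document,
# so the shell leaves characters such as # and ( ) untouched.
# Assumption: columns are separated by single spaces.
cat > data.txt <<'EOF'
## Cherokee Brave (red bract, red leaf) by App Spring (white bract, green leaf)
## phenotyping data
## ID BRACT_COLOR LEAF_COLOR HEIGHT
121 white green 210
342 red green 110
566 white green 220
578 red red 311
789 white red 250
999 red red #
EOF

# Confirm the file has the expected number of lines.
line_count=$(wc -l < data.txt | tr -d ' ')
echo "data.txt has $line_count lines"
```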
Grep searches for a thing and returns LINES that contain that thing
grep red data.txt
grep white data.txt
grep 9 data.txt
Grep can count the NUMBER OF LINES with that thing.
grep -c red data.txt
grep -c white data.txt
grep -c 9 data.txt
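Note that -c counts matching lines, not individual matches. The sketch below contrasts the two, assuming data.txt holds exactly the nine lines shown above and that your grep supports -o (GNU and BSD grep do; it prints each match on its own line).

```shell
# Recreate data.txt so this block runs on its own.
cat > data.txt <<'EOF'
## Cherokee Brave (red bract, red leaf) by App Spring (white bract, green leaf)
## phenotyping data
## ID BRACT_COLOR LEAF_COLOR HEIGHT
121 white green 210
342 red green 110
566 white green 220
578 red red 311
789 white red 250
999 red red #
EOF

# Number of LINES that contain "red" at least once.
lines_with_red=$(grep -c red data.txt)

# Number of OCCURRENCES of "red": -o prints every match on its own line,
# and wc -l counts those lines (tr strips padding some wc versions add).
occurrences_of_red=$(grep -o red data.txt | wc -l | tr -d ' ')

echo "lines: $lines_with_red, occurrences: $occurrences_of_red"
```

With the data above, a line such as "578 red red 311" adds one to the line count but two to the occurrence count.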
What if you want your search to be case insensitive?
grep 'bract' data.txt
grep -i 'bract' data.txt
Grep searches for a thing and with -v returns LINES that DO NOT contain that thing
grep -v red data.txt
grep -v 9 data.txt
Grep can count the NUMBER OF LINES that DO NOT contain that thing.
grep -vc red data.txt
grep -vc 9 data.txt
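Because every line either matches or does not, the -c and -vc counts for the same pattern should add up to the total number of lines. A small sketch to check this, assuming data.txt holds exactly the nine lines shown above:

```shell
# Recreate data.txt so this block runs on its own.
cat > data.txt <<'EOF'
## Cherokee Brave (red bract, red leaf) by App Spring (white bract, green leaf)
## phenotyping data
## ID BRACT_COLOR LEAF_COLOR HEIGHT
121 white green 210
342 red green 110
566 white green 220
578 red red 311
789 white red 250
999 red red #
EOF

total=$(wc -l < data.txt | tr -d ' ')   # all lines in the file
with_red=$(grep -c red data.txt)        # lines containing "red"
without_red=$(grep -vc red data.txt)    # lines NOT containing "red"

echo "$with_red + $without_red = $total"
```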
This is going to be useful for discarding headers, but first we need regular expressions, or regex for short.
Basic building blocks:
. (dot) - any single character.
? - the preceding character matches 0 or 1 times only.
* - the preceding character matches 0 or more times.
+ - the preceding character matches 1 or more times.
{n} - the preceding character matches exactly n times.
{n,m} - the preceding character matches at least n times and not more than m times.
[agd] - the character is one of those included within the square brackets.
[^agd] - the character is not one of those included within the square brackets.
[c-f] - the dash within the square brackets operates as a range. In this case it means any one of the letters c, d, e, or f.
() - allows us to group several characters to behave as one.
| (pipe symbol) - the logical OR operation.
^ - matches the beginning of the line.
$ - matches the end of the line.
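One practical detail when trying these in grep: by default grep uses basic regular expressions, where ?, +, {n}, () and | are treated as literal characters unless escaped. The -E option (extended regular expressions) makes them behave as listed above. The patterns below are illustrative choices, not taken from this page, and assume data.txt holds the nine lines shown earlier.

```shell
# Recreate data.txt so this block runs on its own.
cat > data.txt <<'EOF'
## Cherokee Brave (red bract, red leaf) by App Spring (white bract, green leaf)
## phenotyping data
## ID BRACT_COLOR LEAF_COLOR HEIGHT
121 white green 210
342 red green 110
566 white green 220
578 red red 311
789 white red 250
999 red red #
EOF

# | as OR: lines that contain "white" or "green".
white_or_green=$(grep -cE 'white|green' data.txt)

# Without -E the pipe is a literal character, so nothing matches.
# grep exits non-zero on no match, hence the "|| true".
literal_pipe=$(grep -c 'white|green' data.txt || true)

# ^, a bracket range and {n}: lines starting with exactly three digits
# followed by whitespace (the data records).
records=$(grep -cE '^[0-9]{3}[[:space:]]' data.txt)

# () with {n}: "red" followed by whitespace, twice in a row.
red_twice=$(grep -cE '(red[[:space:]]+){2}' data.txt)

echo "$white_or_green $literal_pipe $records $red_twice"
```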
Dot matches any single character
grep '2.0' data.txt
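Since the dot stands for any character, '2.0' matches 210, 220 and 250 in this file. To search for a literal dot, escape it with a backslash. A short sketch of the difference, assuming data.txt holds the nine lines shown earlier:

```shell
# Recreate data.txt so this block runs on its own.
cat > data.txt <<'EOF'
## Cherokee Brave (red bract, red leaf) by App Spring (white bract, green leaf)
## phenotyping data
## ID BRACT_COLOR LEAF_COLOR HEIGHT
121 white green 210
342 red green 110
566 white green 220
578 red red 311
789 white red 250
999 red red #
EOF

# Unescaped dot: "2", then any one character, then "0".
any_char=$(grep -c '2.0' data.txt)

# Escaped dot: the literal text "2.0", which does not occur in the file.
# grep exits non-zero when nothing matches, hence the "|| true".
literal_dot=$(grep -c '2\.0' data.txt || true)

echo "any character: $any_char lines, literal dot: $literal_dot lines"
```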
^ anchors a pattern to the beginning of the line
grep '#' data.txt
grep '^#' data.txt
Now we can see how to remove header lines!
grep -v '^#' data.txt
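The anchor matters here. Without it, -v also throws away any data line that happens to contain a # somewhere, such as the last record in this file. A sketch comparing the two, assuming data.txt holds the nine lines shown earlier:

```shell
# Recreate data.txt so this block runs on its own.
cat > data.txt <<'EOF'
## Cherokee Brave (red bract, red leaf) by App Spring (white bract, green leaf)
## phenotyping data
## ID BRACT_COLOR LEAF_COLOR HEIGHT
121 white green 210
342 red green 110
566 white green 220
578 red red 311
789 white red 250
999 red red #
EOF

# Anchored: drops only lines that START with #, keeping all six records.
kept_anchored=$(grep -vc '^#' data.txt)

# Unanchored: also drops "999 red red #", leaving five records.
kept_unanchored=$(grep -vc '#' data.txt)

echo "anchored keeps $kept_anchored lines, unanchored keeps $kept_unanchored"
```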
Don't forget pipes
grep -v '^#' data.txt | grep 'red'
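Options such as -c work on the last command of a pipeline too. The sketch below counts the records that mention red after the headers are removed, and compares that with counting directly on the file, where the first header line is included because it also contains "red". It assumes data.txt holds the nine lines shown earlier.

```shell
# Recreate data.txt so this block runs on its own.
cat > data.txt <<'EOF'
## Cherokee Brave (red bract, red leaf) by App Spring (white bract, green leaf)
## phenotyping data
## ID BRACT_COLOR LEAF_COLOR HEIGHT
121 white green 210
342 red green 110
566 white green 220
578 red red 311
789 white red 250
999 red red #
EOF

# Step 1 removes header lines; step 2 counts the remaining lines with "red".
red_records=$(grep -v '^#' data.txt | grep -c 'red')

# Counting on the whole file also picks up the header that mentions "red".
red_lines=$(grep -c 'red' data.txt)

echo "records with red: $red_records, all lines with red: $red_lines"
```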
Find and replace practice