Unix Fu - Sed – dev.in.the.shell

Let the substitution begin!

Sed is much more than a ‘find and replace’ utility.
It allows you to do all kinds of things with your files!

This post leans heavily on this amazing work.
Still not satisfied? Dive deeper (at your own risk).

Keep in mind

Not all Sed implementations are created equal.
This post references the GNU version as it has a lot of cool features that OSX, the various BSDs and Busybox variants are missing.

Hopefully most of the following information is generic enough to be applicable to most Sed implementations.

Also, don’t obsess over Sed.
It’s really cool but if you are having trouble achieving your goal with it (especially if dealing with line breaks), you can try other cool programs like tr and Awk.
They might be better suited depending on the use case.

Basics

Sed stands for Stream EDitor. You can edit a stream like this:
echo "searching, seek and destroy" | sed 's/seek/destroy/g'

Or run the program directly on a file like this:

sed 's/seek/destroy/g' lightning.md
 |           |              |
sed      'do_this'       on_this

Let’s break down the ‘do_this’ part:
Sed will Substitute ‘seek’ with ‘destroy’ Globally* within ‘lightning.md’.

*Sed operates on a per-line basis, so when we determine the scope (Global in the example), we are referring to the scope within each line.

As is the case with most terminal utilities, Sed writes its output to stdout by default, so no changes are made to our lightning.md file.
We can pass it the -i flag to make the changes ‘in place’, which means overwriting the original file.

Of course, we can also redirect its output to a different file with >.
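To see the three output options side by side, here is a minimal sketch. It assumes GNU Sed and works on a throwaway copy created with mktemp; thunder.md is a made-up name for the redirect target. BSD and macOS Sed expect a backup suffix argument after -i, so the in-place step may need adjusting there.

```shell
# Work in a scratch directory so nothing real gets overwritten.
tmpdir=$(mktemp -d)
printf 'searching, seek and destroy\n' > "$tmpdir/lightning.md"

# 1. Default: the result goes to stdout, the file is untouched.
to_stdout=$(sed 's/seek/destroy/g' "$tmpdir/lightning.md")

# 2. Redirect stdout to a different file (hypothetical name).
sed 's/seek/destroy/g' "$tmpdir/lightning.md" > "$tmpdir/thunder.md"
redirected=$(cat "$tmpdir/thunder.md")
original_still=$(cat "$tmpdir/lightning.md")

# 3. GNU-style -i: overwrite the original file in place.
sed -i 's/seek/destroy/g' "$tmpdir/lightning.md"
in_place=$(cat "$tmpdir/lightning.md")

echo "$to_stdout"       # searching, destroy and destroy
echo "$original_still"  # searching, seek and destroy
echo "$in_place"        # searching, destroy and destroy

rm -r "$tmpdir"
```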

Recap

So given a file like:

This line contains the word line twice
This line also contains the word line twice

If we run a Sed command like sed 's/line/potato/' test-two-lines.md, it would print the following to stdout:

This potato contains the word line twice
This potato also contains the word line twice

Notice how we didn’t use the Global scope, so Sed replaced only the first instance of line on each line.

With the -i flag, it would overwrite test-two-lines.md instead of printing to stdout.
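A quick sketch of the difference the g flag makes, feeding the same two lines through a pipe instead of a file (the variable names are just for illustration):

```shell
input='This line contains the word line twice
This line also contains the word line twice'

# Without g: only the first match on each line is replaced.
first_only=$(printf '%s\n' "$input" | sed 's/line/potato/')

# With g: every match on each line is replaced.
every_match=$(printf '%s\n' "$input" | sed 's/line/potato/g')

echo "$first_only"
echo "$every_match"
```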

Quality of life

Always quote

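Notice the single quotes (') in sed 's/seek/destroy/g'.
They are there to keep the shell from interpreting any special characters in the expression (such as *, $ or spaces) before Sed gets to see them.
You can omit them if the expression contains none of those, but it’s best to always quote.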

Extended Regex

We can choose to use Extended regex by passing the -E flag to the command.
If your super nice regex doesn’t work as expected, this will most likely fix it.
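As a rough illustration of what -E changes, the sketch below runs the same expressions with and without it. It assumes GNU Sed, and the sample strings are invented.

```shell
# Basic regex (the default): + and | are ordinary characters.
basic_plus=$(echo 'a+b aaab' | sed 's/a+/X/')
basic_alt=$(echo 'cat dog' | sed 's/cat|dog/pet/g')

# Extended regex: + means "one or more", | means "or".
extended_plus=$(echo 'a+b aaab' | sed -E 's/a+/X/')
extended_alt=$(echo 'cat dog' | sed -E 's/cat|dog/pet/g')

echo "$basic_plus"     # Xb aaab   (matched the literal text "a+")
echo "$extended_plus"  # X+b aaab  (matched the first run of a's)
echo "$basic_alt"      # cat dog   (no literal "cat|dog" to match)
echo "$extended_alt"   # pet pet
```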

Learn more about regex!

Pick a convenient delimiter

Speaking of regex, those get messy sometimes, and you will have to escape a few special characters (like the delimiter /).

You might find it useful to change the delimiter, especially when using Sed on paths:
sed 's/\/bin\/bash\//\/bin\/sh\//g' -> sed 's:/bin/bash/:/bin/sh/:g'

Sed doesn’t really care which character you pick, as long as you use the same one throughout the command.
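A small sketch to confirm that both spellings do the same thing (the sample path is made up):

```shell
path='/bin/bash/script'

# Default delimiter: every / in the path has to be escaped.
escaped=$(echo "$path" | sed 's/\/bin\/bash\//\/bin\/sh\//g')

# Colon as delimiter: the path can be written as-is.
readable=$(echo "$path" | sed 's:/bin/bash/:/bin/sh/:g')

echo "$escaped"   # /bin/sh/script
echo "$readable"  # /bin/sh/script
```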

Common use cases

Remove all EOL spaces

sed 's/\s*$//'
Remove all whitespace at the end of every line in the given file.

The \s is simply a way of representing whitespace characters; the * after it covers the whole trailing run instead of just one character.
You can learn more about it here.
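A small sketch to check the behaviour, assuming GNU Sed (for \s). The second Sed call appends a | to each line purely to make any leftover trailing whitespace visible:

```shell
# Lines ending in two spaces, in a tab plus a space, and in nothing.
sample=$(printf 'one  \ntwo\t \nthree')

# \s$ matches a single whitespace character at the end of the line.
one_char=$(printf '%s\n' "$sample" | sed 's/\s$//' | sed 's/$/|/')

# \s*$ matches the whole run of trailing whitespace.
whole_run=$(printf '%s\n' "$sample" | sed 's/\s*$//' | sed 's/$/|/')

echo "$one_char"
echo "$whole_run"
```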

Delete all instances of word

sed 's/foo//g'
Delete all instances of foo.

You might be tempted to use something like s/.*foo.*// to delete any line containing foo.
Don’t: it empties the line but leaves the line break behind.
There is a delete command for this use case.
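To make the difference concrete, here is a sketch comparing the substitution with the delete command on three invented lines:

```shell
input='keep me
foo here
keep me too'

# Substituting the whole line away leaves an empty line behind.
blanked=$(printf '%s\n' "$input" | sed 's/.*foo.*//')

# The delete command removes the line, line break included.
deleted=$(printf '%s\n' "$input" | sed '/foo/d')

echo "$blanked"
echo "$deleted"
```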

Apply only to nth instance

sed 's/lorem/dolor/2'
Replace lorem with dolor only on the 2nd instance of lorem on every line.

Apply only from nth instance

sed 's/lorem/dolor/2g'
Replace lorem with dolor from the 2nd instance of lorem on every line until the end of the line.
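The two numeric flags are easy to mix up, so here is a sketch on a single made-up line. Combining a number with g is assumed to be GNU Sed behaviour:

```shell
line='lorem lorem lorem lorem'

# 2: replace only the second match.
second_only=$(echo "$line" | sed 's/lorem/dolor/2')

# 2g: replace the second match and every match after it.
from_second=$(echo "$line" | sed 's/lorem/dolor/2g')

echo "$second_only"  # lorem dolor lorem lorem
echo "$from_second"  # lorem dolor dolor dolor
```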

Apply only on matching lines

sed '/^foo/ s/hi/mom/' file
Replace hi with mom only on lines that start with foo.

Example use case: migrate CSS classes from snake_case to camelCase without touching their properties.
sed -E '/\{$/ s:_(\w):\u\1:g' file.css
Converts snake_case to camelCase only on lines that end with {, by uppercasing the character after each underscore (\u is a GNU extension).
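Here is a sketch of both address examples on invented input, assuming GNU Sed (for \w and \u). The CSS version captures only the single character after each underscore as _(\w), so names with several underscores convert cleanly:

```shell
# Only the line starting with foo is touched.
greeting=$(printf 'foo says hi\nbar says hi\n' | sed '/^foo/ s/hi/mom/')

# Hypothetical stylesheet: the selector line ends with {, the property does not.
css='.main_nav_bar {
  background: url(my_image.png);
}'
camel=$(printf '%s\n' "$css" | sed -E '/\{$/ s:_(\w):\u\1:g')

echo "$greeting"
echo "$camel"
```

The underscore inside url(my_image.png) survives because that line does not match the /\{$/ address.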

If that seems like a bunch of random symbols, you’ll love this post!.

Fancy things you can do

Re-use the match

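You can use & to represent the match:
echo "what a nice example, this is a cool program!" | sed -E 's/(nice|cool)/VERY&/g'
Would output:
what a VERYnice example, this is a VERYcool program!
Note the (nice|cool) group with -E: square brackets would match single characters rather than whole words.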
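A runnable sketch of the & reference, using a grouped alternation with -E and the g flag so that both words on the line are caught:

```shell
sentence="what a nice example, this is a cool program!"

# & in the replacement stands for whatever the pattern matched.
emphasized=$(echo "$sentence" | sed -E 's/(nice|cool)/VERY&/g')

echo "$emphasized"  # what a VERYnice example, this is a VERYcool program!
```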

Case-insensitive

You can add the i flag at the end to make the match case-insensitive (a GNU extension):
sed 's/foo/bar/gi'
Which means:
foo Foo -> bar bar

Negate matches

You can tell Sed to do its magic only on lines not matching a given pattern:
sed '/ /! s/^/#/' afile.txt
This would comment out (by adding a #) any line not containing a space.
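A minimal sketch of the negated address on two invented lines:

```shell
# The ! after the address inverts it: act on lines that do NOT contain a space.
commented=$(printf 'word\ntwo words\n' | sed '/ /! s/^/#/')

echo "$commented"
```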

Output replacements to separate file

You can write the lines on which a substitution was made to a separate file with w:
sed 's_foo_bar_w replacementsFile' fileToModify
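To try this without touching real files, the sketch below runs the same command inside a scratch directory made with mktemp. The file contents are invented:

```shell
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
printf 'foo one\nnothing here\nfoo two\n' > fileToModify

# stdout still gets every line; replacementsFile gets only the changed ones.
full_output=$(sed 's_foo_bar_w replacementsFile' fileToModify)
changed_only=$(cat replacementsFile)

echo "$full_output"
echo "$changed_only"

cd - > /dev/null || exit 1
rm -r "$tmpdir"
```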

Groupings and References

You can leverage the magic of Groupings and References to, for example, switch words around:
sed -E 's:([a-zA-Z]*) ([a-zA-Z]*):\2 \1:' file
Which means:
World Hello -> Hello World

The -E flag lets you write the groups as plain ( and ); with basic regex you would have to escape them as \( and \).
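As a sketch, here is the word swap written both ways: once with -E and plain parentheses, and once in basic regex with escaped ones.

```shell
# Extended regex: groups are written as ( ).
swapped_ere=$(echo 'World Hello' | sed -E 's:([a-zA-Z]*) ([a-zA-Z]*):\2 \1:')

# Basic regex: the same groups need backslashes.
swapped_bre=$(echo 'World Hello' | sed 's:\([a-zA-Z]*\) \([a-zA-Z]*\):\2 \1:')

echo "$swapped_ere"  # Hello World
echo "$swapped_bre"  # Hello World
```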

Want a neat use case?
sed -E 's_(.+?)\[(.+?)\]\(([^)]+)\)(.+?)_\1\2[^\3]\4\n\n\n[^\3]: \3\n_g' book.md

Let’s take it apart:

Search

So the ‘search’ part looks like this: (.+?)\[(.+?)\]\(([^)]+)\)(.+?).
The first and last groupings are pretty simple: ‘whatever goes before/after the mess in between’.

That leaves us with \[(.+?)\]\(([^)]+)\), which looks like a mess because we have to escape a lot of parentheses and square brackets.

There are two distinct zones to this regex: \[(.+?)\] and \(([^)]+)\).
The first simply means ‘everything inside [square brackets]’, while the second could also be written like \((.+?)\) (which is pretty much the same as the other one, except for the different brackets).
Want to know why to use one instead of the other? Check out this post.

So we have four groups:

Everything before
Everything within []
Everything within ()
Everything after

Replace

On the other hand, the ‘replace’ part reads \1\2[^\3]\4\n\n\n[^\3]: \3\n.

We can see that there are two parts to this mess: \1\2[^\3]\4 and [^\3]: \3, with a bunch of line breaks (\n) here and there.
Notice also how the ‘[square brackets]’ are not escaped here.

The first part simply removes the parentheses and square brackets from the match, while enclosing the third grouping in square brackets and prepending it with a ^.
So text [looks like](a-link) more text becomes text looks like[^a-link] more text.

The second half repeats the previous behavior regarding the third grouping while adding it again after a : and a whitespace.
Taking into account the line breaks, text [looks like](a-link) more text becomes:

text looks like[^a-link] more text


[^a-link]: a-link
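To check the result yourself, here is a sketch that pipes the sample line through the command instead of reading book.md. It assumes GNU Sed, which turns \n in the replacement into a line break:

```shell
# Sample line taken from the walkthrough above.
line='text [looks like](a-link) more text'

converted=$(echo "$line" |
  sed -E 's_(.+?)\[(.+?)\]\(([^)]+)\)(.+?)_\1\2[^\3]\4\n\n\n[^\3]: \3\n_g')

echo "$converted"
```

Storing the output in a variable drops the trailing blank line; when Sed prints straight to stdout, it is still there.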

If you’ve ever worked with Pandoc Markdown, you probably saw where that was heading.
We successfully turned Markdown links into Markdown references, without breaking the rest of the line.

Keep in mind that this command will hammer through images (![image-text](image-link)) as well.
You might want to negate those matches with something like /!.*/!.
Also, this command won’t behave nicely on lines with two or more links.

Was it a headache? Sure!
Was it more of a headache than doing it by hand on a 400+ page, heavily referenced book? Hell no!

Change cases

Remember those GNU specific goodies mentioned earlier? Here are some:

\l Turn the next character to lowercase.
\L Apply \l until a \U or \E is found.
\u Turn the next character to uppercase.
\U Apply \u until a \L or \E is found.
\E End case conversion started by \L or \U.

So to give a simple example, you can ensure all headings in a .md file start with upper case letters by running this:
sed -E 's/^(#+) (\w+)/\1 \u\2/' cases.md
Which means:
## all caps -> ## All caps
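A short sketch of the case escapes in action, assuming GNU Sed. The heading lines and the second sentence are made up:

```shell
# \u uppercases only the first character of the second group.
headings=$(printf '## all caps\n# another heading\nplain text\n' |
  sed -E 's/^(#+) (\w+)/\1 \u\2/')

# \U uppercases everything up to the closing \E.
shouted=$(echo 'make this loud please' | sed -E 's/(this) (loud)/\U\1 \2\E/')

echo "$headings"
echo "$shouted"  # make THIS LOUD please
```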

Concatenate multiple commands

Sometimes doing everything in one go is a bit of a headache or actually impossible.
You can chain Sed commands one after the other by adding the -e flag before each of them:
sed -Ee 's/(^#+) (\w+)/\1 \u\2/' -e 's/foo/bar/g' cases.md
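As a sketch, the same two expressions applied to a single invented heading. They run in order, each one working on the result of the previous (\w and \u assume GNU Sed):

```shell
chained=$(echo '## all caps foo' |
  sed -E -e 's/(^#+) (\w+)/\1 \u\2/' -e 's/foo/bar/g')

echo "$chained"  # ## All caps bar
```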

Other use cases

Sed is a stream editor, so you can do much more than substitutions with it.

Delete

To delete any line containing the word vim you could do:
sed '/vim/d' file

For a more useful example, you could delete empty lines with:
sed '/^$/d' file

Or delete commented lines starting with # like so:
sed -E '/^#/d' file

Or negate the whole thing and delete everything but commented lines:
sed -E '/^#/!d' file
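The four delete variants side by side, as a sketch on a small invented input:

```shell
input='vim rocks
# a comment

plain line'

no_vim=$(printf '%s\n' "$input" | sed '/vim/d')           # drop lines containing vim
no_blank=$(printf '%s\n' "$input" | sed '/^$/d')          # drop empty lines
no_comments=$(printf '%s\n' "$input" | sed '/^#/d')       # drop lines starting with #
only_comments=$(printf '%s\n' "$input" | sed '/^#/!d')    # drop everything else

echo "$only_comments"  # # a comment
```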

Print

You can tell Sed to print the lines where replacements are made with p:
sed 's/foo/bar/p' file

You can also simulate grep-like behavior with something like sed '/re/p' file (familiar?), which would simply print every line matching re.

Of course, without the -i flag Sed prints everything else as well, so you end up with the lines you are interested in printed twice.
Pass it the -n flag to make it behave as expected (which is to only print matching lines).

For a more practical example, you can print the lines between two regex matches:
sed -nE '/^between-this/,/^and-this/p' file
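A sketch of what -n changes, plus the range address, on invented input:

```shell
# Without -n every line is printed once, and matching lines a second time.
doubled=$(printf 'regex\nplain\nmore\n' | sed '/re/p')

# With -n only the lines explicitly printed by p come out.
matching=$(printf 'regex\nplain\nmore\n' | sed -n '/re/p')

# Range address: from the first matching line through the second, inclusive.
between=$(printf 'intro\nbetween-this start\nmiddle\nand-this end\noutro\n' |
  sed -nE '/^between-this/,/^and-this/p')

echo "$doubled"
echo "$matching"
echo "$between"
```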

Append, Insert and Change

Append text on a new line after each line containing the given regex:
sed '/foo/a\AFTER FOO' file

Insert text on a new line before each line containing the given regex:
sed '/foo/i\BEFORE FOO' file

Change line containing the given regex:
sed '/bar/c\BAR IS CHANGED' file
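A sketch of all three commands on a two-line input. The one-line a\text form used here is assumed to be the GNU Sed variant; other implementations may want the text on its own line after the backslash.

```shell
appended=$(printf 'foo\nbar\n' | sed '/foo/a\AFTER FOO')
inserted=$(printf 'foo\nbar\n' | sed '/foo/i\BEFORE FOO')
changed=$(printf 'foo\nbar\n' | sed '/bar/c\BAR IS CHANGED')

echo "$appended"
echo "$inserted"
echo "$changed"
```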
