
sed&gawk

sed

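sed is a stream editor. It edits a data stream automatically by applying its commands, in order, to every line of the stream. Unlike an interactive editor such as vim, it needs only one pass over the data to finish the edit, which makes it much faster than an interactive editor and well suited to automated changes. By default sed does not modify the input file; it writes the edited text to STDOUT.

1. Command format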

sed options script file
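# options: modify the behavior of the sed command
    -e commands  add the given commands to the ones run while processing the input
    -f file      add the commands stored in file to the ones run while processing the input
    -n           do not print each line automatically; use the p (print) command to produce output
# script: the editing command to apply to the stream data
    a : append. The text after a is output as a new line after the current line
    c : change. The text after c replaces the addressed line(s), for example the lines in the range n1,n2
    d : delete. Usually nothing follows d
    i : insert. The text after i is output as a new line before the current line
    p : print. Prints the selected lines; usually combined with sed -n
    s : substitute. Replaces text and is usually combined with regular expressions, for example 1,20s/old/new/g
    = : print the line number. The number is printed on its own line, immediately before the line it refers to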
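
The interaction between `-n`, `p` and `=` can be sketched as follows. This is a minimal illustration; the file name `sample.txt` and its three lines are assumptions made up for the example.

```shell
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)
printf 'alpha\nbeta\ngamma\n' > "$workdir/sample.txt"

# -n suppresses the automatic print; p prints only the addressed line.
only_second=$(sed -n '2p' "$workdir/sample.txt")
echo "$only_second"

# = writes each line number on its own line, before the line itself.
numbered=$(sed '=' "$workdir/sample.txt")
echo "$numbered"

rm -rf "$workdir"
```

Without `-n`, the `2p` command would print line 2 twice: once from `p` and once from the automatic print.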

Running multiple commands on the sed command line

# Replace brown with red and dog with cat in data1.txt
sed -e 's/brown/red/; s/dog/cat/' data1.txt
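
As a rough sketch, the same two substitutions can also be given as separate `-e` options or stored in a script file and loaded with `-f`. The sample sentence and the file name `script.sed` are assumptions for illustration.

```shell
workdir=$(mktemp -d)
echo 'The quick brown fox jumps over the lazy dog.' > "$workdir/data1.txt"

# Two -e options are applied in order to every line.
with_e=$(sed -e 's/brown/red/' -e 's/dog/cat/' "$workdir/data1.txt")

# The same commands, one per line, stored in a script file and loaded with -f.
printf 's/brown/red/\ns/dog/cat/\n' > "$workdir/script.sed"
with_f=$(sed -f "$workdir/script.sed" "$workdir/data1.txt")

echo "$with_e"
echo "$with_f"
rm -rf "$workdir"
```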

The s command accepts a character other than / as its delimiter, for example !

sed 's!/bin/bash!/bin/csh!' /etc/passwd  # replace /bin/bash with /bin/csh
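
To see why an alternative delimiter helps, the sketch below runs the same substitution both ways on a single passwd-style record. The record is a made-up sample, not read from the real `/etc/passwd`.

```shell
# A passwd-style record used as sample input (hypothetical user).
line='rich:x:1000:1000:Rich:/home/rich:/bin/bash'

# With / as the delimiter, every slash in the paths must be escaped.
escaped=$(echo "$line" | sed 's/\/bin\/bash/\/bin\/csh/')

# With ! as the delimiter, the paths can be written as they are.
plain=$(echo "$line" | sed 's!/bin/bash!/bin/csh!')

echo "$plain"
```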

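2. Substitution

The substitute command is s. It works on every line of the input, but by default it replaces only the first match in each line. To replace all matches in a line, use a substitution flag. Flags are placed after the replacement string.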
```
sed 's/pattern/replacement/flags' file
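# flags:
    # a number: replace only that occurrence of the match in each line.
    # g: replace every match in each line.
    # p: print the line after the substitution is made.
    # w file: write the result of the substitution to file.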
```

Example

$ cat data4.txt 
This is a test of the test script. 
This is the second test of the test script. 
# Replace only the first match in each line
$ sed 's/test/trial/' data4.txt 
This is a trial of the test script. 
This is the second trial of the test script. 

# Use a number to replace the second match in each line
$ sed 's/test/trial/2' data4.txt 
This is a test of the trial script. 
This is the second test of the trial script.

# g replaces every match
$ sed 's/test/trial/g' data4.txt 
This is a trial of the trial script. 
This is the second trial of the trial script. 
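
The `p` and `w` flags from the list above are not covered by the examples shown. A minimal sketch, assuming a two-line sample file named `data5.txt` and an output file named `changed.txt` (both invented for the example):

```shell
workdir=$(mktemp -d)
printf 'This is a test line.\nThis is a different line.\n' > "$workdir/data5.txt"

# -n plus the p flag prints only the lines where a substitution happened.
changed=$(sed -n 's/test/trial/p' "$workdir/data5.txt")
echo "$changed"

# The w flag saves the substituted lines to a file; the normal output
# still goes to STDOUT. The file name runs to the end of the script.
full=$(sed "s/test/trial/w $workdir/changed.txt" "$workdir/data5.txt")
saved=$(cat "$workdir/changed.txt")
echo "$saved"

rm -rf "$workdir"
```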

...

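3. Line addressing

By default sed applies a command to every line. Line addressing restricts a command to a specific line or a range of lines.

There are two forms of line addressing: (1) a line number or numeric range placed directly before the command; (2) a text pattern that filters the lines.

# Substitute on every line from line 2 to the end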
sed '2,$s/dog/cat/' data1.txt 
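# Change the default shell from bash to csh only on lines matching rich (the entry for user rich)
sed '/rich/s/bash/csh/' /etc/passwd
# Delete line 3
sed '3d' data6.txt
# Delete the lines containing "number 1"; the result goes to STDOUT and the file itself is unchanged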
sed '/number 1/d' data6.txt
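
The two addressing forms can be tried side by side. The sketch below recreates the four-line `data6.txt` that appears later in this article in a scratch directory and also checks that the file on disk is left alone.

```shell
workdir=$(mktemp -d)
printf 'This is line number 1.\nThis is line number 2.\nThis is the 3rd line.\nThis is the 4th line.\n' > "$workdir/data6.txt"

# Numeric range: substitute only on lines 2 through 3.
ranged=$(sed '2,3s/This/That/' "$workdir/data6.txt")

# Text pattern: delete every line that contains "number".
filtered=$(sed '/number/d' "$workdir/data6.txt")

# The file on disk is untouched; sed only wrote to STDOUT.
remaining=$(wc -l < "$workdir/data6.txt")

echo "$ranged"
echo "$filtered"
rm -rf "$workdir"
```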

Inserting lines: (1) the insert (i) command adds a new line before the addressed line; (2) the append (a) command adds a new line after the addressed line.
```
echo "Test Line 2" | sed 'i\Test Line 1' 
Test Line 1 
Test Line 2 

echo "Test Line 2" | sed 'a\Test Line 1' 
Test Line 2 
Test Line 1 
```
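
Both commands can be combined with an address. The following sketch uses the two-line form (text on the line after the backslash), which should work in GNU sed as well as in stricter POSIX implementations; the file name `lines.txt` and the inserted text are assumptions for illustration.

```shell
workdir=$(mktemp -d)
printf 'Line 1\nLine 2\nLine 3\n' > "$workdir/lines.txt"

# Insert a new line before line 1.
inserted=$(sed '1i\
Header' "$workdir/lines.txt")

# Append a new line after the last line ($ addresses the last line).
appended=$(sed '$a\
Footer' "$workdir/lines.txt")

echo "$inserted"
echo "$appended"
rm -rf "$workdir"
```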

Changing lines: the c command

# Change line 2
$ sed '2c\this is a changed line of text' data6.txt

# Change the lines that contain a given string
$ cat data8.txt
I have 2 Infinity Stones
I need 4 more Infinity Stones
I have 6 Infinity Stones!
I need 4 Infinity Stones
I have 6 Infinity Stones...
I want 1 more Infinity Stone
$
$ sed '/have 6 Infinity Stones/c\
> Snap! This is changed line of text.
> ' data8.txt
I have 2 Infinity Stones
I need 4 more Infinity Stones
Snap! This is changed line of text.
I need 4 Infinity Stones
Snap! This is changed line of text.
I want 1 more Infinity Stone

# With a line range, the whole range is replaced once by the new text; the lines in the range are not replaced one by one
$ cat data6.txt
This is line number 1.
This is line number 2.
This is the 3rd line.
This is the 4th line.
$
$ sed '2,3c\
> This is a changed line of text.
> ' data6.txt
This is line number 1. 
This is a changed line of text. 
This is the 4th line.
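
The difference between a pattern address and a numeric range is easiest to see when both select the same lines. A small sketch, assuming a made-up file `nums.txt` in which lines 2 and 3 are the only ones containing the letter t:

```shell
workdir=$(mktemp -d)
printf 'one\ntwo\nthree\nfour\n' > "$workdir/nums.txt"

# A pattern address changes each matching line separately.
per_line=$(sed '/t/c\
changed' "$workdir/nums.txt")

# A numeric range covering the same two lines is replaced by a single line.
as_block=$(sed '2,3c\
changed' "$workdir/nums.txt")

echo "$per_line"
echo "$as_block"
rm -rf "$workdir"
```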

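4. The transform command

The transform (y) command is the only sed editing command that operates on single characters.

# Convert 1, 2 and 3 to 7, 8 and 9 respectively; a character that does not occur in the stream is simply not converted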
sed 'y/123/789/' data9.txt 

# Example
$ cat data9.txt 
This is line 1. 
This is line 2. 
This is line 3. 
This is line 4. 
This is line 5. 
This is line 1 again. 
This is line 3 again. 
This is the last file line. 
$ 
$ sed 'y/123/789/' data9.txt 
This is line 7. 
This is line 8. 
This is line 9. 
This is line 4. 
This is line 5. 
This is line 7 again. 
This is line 9 again. 
This is the last file line.
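
Because y maps characters one by one, it behaves differently from s with the same arguments. A short sketch on an invented input string:

```shell
input='Room 123, then 321, then 45'

# y maps by position (1->7, 2->8, 3->9) on every occurrence in the line.
converted=$(echo "$input" | sed 'y/123/789/')
echo "$converted"

# s treats 123 as one string, so only the literal text "123" is replaced.
substituted=$(echo "$input" | sed 's/123/789/g')
echo "$substituted"
```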

5. Working with files

Writing to a file: w

# Write the first two lines of data6.txt to test.txt
$ sed '1,2w test.txt' data6.txt 
This is line number 1. 
This is line number 2. 
This is the 3rd line. 
This is the 4th line. 
$ 
$ cat test.txt
This is line number 1. 
This is line number 2.
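
The w command also accepts a text pattern as its address. The sketch below reuses the four records shown for `data12.txt` later in this article; the output file name `browncoats.txt` is an assumption.

```shell
workdir=$(mktemp -d)
printf 'Blum, R Browncoat\nMcGuiness, A Alliance\nBresnahan, C Browncoat\nHarken, C Alliance\n' > "$workdir/data12.txt"

# -n keeps STDOUT quiet; only lines matching the pattern are written to the file.
sed -n "/Browncoat/w $workdir/browncoats.txt" "$workdir/data12.txt"
browncoats=$(cat "$workdir/browncoats.txt")

echo "$browncoats"
rm -rf "$workdir"
```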

Reading from a file: r

# Insert the contents of data13.txt after line 3 of data6.txt
$ cat data13.txt 
This is an added line. 
This is a second added line. 
$ 
$ sed '3r data13.txt' data6.txt 
This is line number 1. 
This is line number 2. 
This is the 3rd line. 
This is an added line. 
This is a second added line. 
This is the 4th line.

Combining the read command with the delete command replaces a placeholder in a template with real content.

$ cat notice.std 
Would the following people: 
LIST 
please report to the ship's captain. 
# Put the contents of data12.txt where the LIST placeholder is in the template
$ sed '/LIST/{
> r data12.txt
> d
> }' notice.std
Would the following people: 
Blum, R Browncoat 
McGuiness, A Alliance 
Bresnahan, C Browncoat 
Harken, C Alliance 
please report to the ship's captain.
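
The same technique can be wrapped in a small shell function. This is only a sketch: `fill_template` is a hypothetical helper name, and it assumes the placeholder contains no `/` or other characters special to sed.

```shell
# fill_template PLACEHOLDER TEMPLATE CONTENT (hypothetical helper):
# replaces the line containing PLACEHOLDER in TEMPLATE with the contents of CONTENT.
fill_template() {
    sed "/$1/{
r $3
d
}" "$2"
}

workdir=$(mktemp -d)
cat > "$workdir/notice.std" <<'EOF'
Would the following people:
LIST
please report to the ship's captain.
EOF
printf 'Blum, R Browncoat\nHarken, C Alliance\n' > "$workdir/data12.txt"

notice=$(fill_template LIST "$workdir/notice.std" "$workdir/data12.txt")
echo "$notice"
rm -rf "$workdir"
```

The r command queues the file for output at the end of the cycle, so the following d can remove the placeholder line without discarding the inserted text.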

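6. Multiline commands

The single-line next command: the single-line next (n) command tells sed to move to the next line of the data stream without returning to the beginning of the command list. Normally sed runs all of the defined commands on the current line before it moves to the next line of the stream; the single-line next command changes that flow.

# Goal: delete the blank line after the header line but keep the blank line before the last line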
$ cat data1.txt
Header Line

Data Line #1

End of Data Lines
$ sed '/Header/{n ; d}' data1.txt
Header Line
Data Line #1

End of Data Lines
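# The line to delete is blank, so it contains no text that could be searched for. The solution is the single-line next command: the script first looks for the line containing the word Header; once it is found, n moves sed to the next line of text, which is the blank line we want to delete.
# sed then continues through the command list, and the delete command removes the blank line. When the script finishes, sed reads the next line of the data stream and runs the script again from the top. No later line contains the word Header, so no other line is deleted.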
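
A self-contained version of this example, compared with a plain blank-line delete, might look like the sketch below. The file is recreated in a scratch directory, and `/^$/d` is added only for contrast.

```shell
workdir=$(mktemp -d)
printf 'Header Line\n\nData Line #1\n\nEnd of Data Lines\n' > "$workdir/data1.txt"

# After matching Header, n moves on to the next line (the blank one) and d deletes it.
trimmed=$(sed '/Header/{n;d;}' "$workdir/data1.txt")

# For comparison, /^$/d removes every blank line, including the one to keep.
all_removed=$(sed '/^$/d' "$workdir/data1.txt")

echo "$trimmed"
echo "$all_removed"
rm -rf "$workdir"
```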

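The multiline next command: the single-line next command moves the next line of the data stream into sed's working space (called the pattern space), replacing the current line, whereas the multiline version (N) appends the next line to the text already in the pattern space.

# Goal: join the line containing First with the line that follows it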
$ cat data2.txt 
Header Line 
First Data Line 
Second Data Line 
End of Data Lines 
$ 
$ sed '/First/{ N ; s/\n/ / }' data2.txt 
Header Line 
First Data Line Second Data Line 
End of Data Lines 
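# The script first looks for the line containing the word First. Once it is found, N appends the next line to it, and the substitute command replaces the newline (\n) with a space. As a result, the two lines appear as one line in the sed output.

# Goal: replace the phrase System Admin with DevOps Engineer whether it sits on one line or is split across two lines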
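
Without an address, the same N-then-substitute idea joins lines in pairs. A minimal sketch on an invented four-line file (an even number of lines, so N always has a next line to read):

```shell
workdir=$(mktemp -d)
printf 'name\nRich\nshell\nbash\n' > "$workdir/pairs.txt"

# N appends the next line to the pattern space; s then replaces the embedded newline.
joined=$(sed 'N;s/\n/=/' "$workdir/pairs.txt")

echo "$joined"
rm -rf "$workdir"
```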
$ cat data4.txt 
On Tuesday, the Linux System 
Admin group meeting will be held. 
All System Admins should attend. 
$ sed '
> s/System Admin/DevOps Engineer/
> N
> s/System\nAdmin/DevOps\nEngineer/
> ' data4.txt
On Tuesday, the Linux DevOps 
Engineer group meeting will be held. 
All DevOps Engineers should attend. 
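# N joins the line containing the first word with the next line, so the phrase is found even when a line break splits it. Placing s/System Admin/DevOps Engineer/ before N also covers the case where System Admin appears on the last line of the data stream: there N has no next line to read and the commands after it do not run, so a substitution placed after N would miss the phrase on that line.

The multiline delete command:

# When the single-line delete (d) command is used together with N, it deletes both lines at once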
$ sed 'N ; /System\nAdmin/d' data4.txt 
All System Admins should attend. 
# The multiline delete (D) command deletes only the first line in the pattern space, that is, everything up to and including the first newline
$ sed 'N ; /System\nAdmin/D' data4.txt 
Admin group meeting will be held. 
All System Admins should attend.
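
The two delete commands can be compared in one script. In this sketch, `$!N` (run N on every line except the last) is an addition that is not in the examples above; it is used so the result does not depend on how a particular sed handles N at the end of the input.

```shell
workdir=$(mktemp -d)
printf 'On Tuesday, the Linux System\nAdmin group meeting will be held.\nAll System Admins should attend.\n' > "$workdir/data4.txt"

# d discards the whole pattern space, so both joined lines disappear.
with_d=$(sed '$!N;/System\nAdmin/d' "$workdir/data4.txt")

# D discards only the text up to the first newline, so the second line survives.
with_D=$(sed '$!N;/System\nAdmin/D' "$workdir/data4.txt")

echo "$with_d"
echo "$with_D"
rm -rf "$workdir"
```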

gawk

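gawk offers far more than sed: it provides a programming language rather than just editor commands. With gawk you can define editing scripts and store data in variables, and variables in a gawk script are referenced without a leading $. By default, gawk assigns the following variables to the data fields of each line of text:
- $0 represents the entire line of text.
- $1 represents the first data field in the line.
- $2 represents the second data field in the line.
- $n represents the nth data field in the line.

By convention, sed and gawk script files end in .sed and .gawk so that they can be told apart from shell scripts.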
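
The field variables can be tried on a single line of text. The sketch below prefers gawk but falls back to any `awk` found on the PATH, which is an assumption made for portability; the sample sentence and the variable `n` are invented for the example.

```shell
# Prefer gawk, but fall back to any awk on PATH (an assumption for portability).
AWK=$(command -v gawk || command -v awk)

line='One line of test text.'

whole=$(echo "$line" | "$AWK" '{print $0}')   # the entire line
first=$(echo "$line" | "$AWK" '{print $1}')   # first field
second=$(echo "$line" | "$AWK" '{print $2}')  # second field

# n is an ordinary variable (no $ needed); $n then means "field number n".
third=$(echo "$line" | "$AWK" '{n = 3; print $n}')

echo "$whole"
echo "$first $second $third"
```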

1. Command format

gawk options program file
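# options:
    # -F fs         specify the field separator used to split a line into data fields
    # -f file       read the gawk script code from the given file
    # -v var=value  define a variable for the gawk script and its default value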
    # -L [keyword]  enable lint warnings about dubious or non-portable constructs
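# program: the gawk script; its commands must be placed inside braces, and the script is then wrapped in single quotes

Specifying the field separator

# Use a colon as the field separator when processing /etc/passwd
gawk -F: '{print $1}' /etc/passwd

Using multiple commands in a script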
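
The `-F`, `-v` and `-f` options can be combined. The following sketch works on one made-up passwd-style record instead of the real `/etc/passwd`; the variable `label` and the file name `script.gawk` are assumptions, and the script falls back to plain `awk` if gawk is not installed.

```shell
# Prefer gawk, but fall back to any awk on PATH (an assumption for portability).
AWK=$(command -v gawk || command -v awk)
record='rich:x:1000:1000:Rich:/home/rich:/bin/bash'

# -F: splits on colons, so $1 is the login name and $7 the shell.
login=$(echo "$record" | "$AWK" -F: '{print $1}')

# -v sets a variable before the script runs; label is a hypothetical name.
labeled=$(echo "$record" | "$AWK" -F: -v label='shell' '{print label "=" $7}')

# -f reads the script from a file instead of the command line.
workdir=$(mktemp -d)
echo '{print $1 " uses " $7}' > "$workdir/script.gawk"
from_file=$(echo "$record" | "$AWK" -F: -f "$workdir/script.gawk")

echo "$login"
echo "$labeled"
echo "$from_file"
rm -rf "$workdir"
```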

echo "My name is Rich" | gawk '{$4="Christine"; print $0}' 
My name is Christine

2. Pre-processing and post-processing

gawk can run commands before and after the data is processed.

$ cat data3.txt 
Line 1 
Line 2 
Line 3 
# Use BEGIN to run a script before the data is processed
$ gawk 'BEGIN {print "The data3 File Contents:"} 
> {print $0}' data3.txt 
The data3 File Contents: 
Line 1 
Line 2 
Line 3 
# Use END to run a script after the data is processed
$ gawk 'BEGIN {print "The data3 File Contents:"} 
> {print $0}
> END {print "End of File"}' data3.txt
The data3 File Contents: 
Line 1 
Line 2 
Line 3 
End of File
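
BEGIN and END become more useful once the main block updates a variable. A minimal sketch, assuming a counter named `count` (not part of the original example) and falling back to plain `awk` if gawk is missing:

```shell
# Prefer gawk, but fall back to any awk on PATH (an assumption for portability).
AWK=$(command -v gawk || command -v awk)

report=$(printf 'Line 1\nLine 2\nLine 3\n' | "$AWK" '
BEGIN { print "Start"; count = 0 }       # runs once, before any input is read
{ count++; print count ": " $0 }         # runs for every input line
END { print "Total lines: " count }      # runs once, after the last line
')

echo "$report"
```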
