Shell Script

16 Advanced Sed

트리스탄1234 2022. 8. 31. 19:52

 

1. Multiline Commands

By default, the sed editor processes the data stream one line at a time. It reads a line, applies the script to it, and then moves on to the next line. When the data you need to match spans multiple lines, this basic structure makes processing difficult. To handle this case, sed provides the following three multiline commands.

 

■ N: Appends the next line of input to the pattern space, creating a multiline group.

■ D: Deletes only the first line (up to the first newline) of a multiline group.

■ P: Prints only the first line (up to the first newline) of a multiline group.

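The next command

The lowercase n command tells the sed editor to move on to the next line in the data stream. Unless the -n option is used, the current line is printed first, and then the next line replaces it in the pattern space. Let's look at the example below to understand how the next command works.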

 
$ cat data1
This is the header line.

This is a data line.

This is the last line.

==> The data1 file contains three lines of text separated by blank lines.

$ sed '/header/{
> n ==> print the header line and move to the next line (the blank line)
> d ==> delete that blank line
> }' data1
This is the header line.
This is a data line.

This is the last line.

==> Only the blank line between the first and second text lines was deleted; the one before the last line remains.
$
If you want to delete all blank lines instead, you can use the following command.

$ sed '/^$/d' data1 ==> ^$ matches a line with nothing between its start and its end, that is, an empty line
This is the header line.
This is a data line.
This is the last line.
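To see the n command on its own, here is a small sketch (assuming GNU sed, as found on most Linux systems) that prints only the even-numbered lines of its input. The function name even_lines is just an illustrative name. With -n nothing is printed automatically, so n silently replaces each odd line with the following even line, and p prints it.

```bash
# even_lines: print every second line (2, 4, 6, ...) of the input.
# With -n nothing is auto-printed, so for each cycle:
#   n  -> discard the current (odd) line and read the next (even) one
#   p  -> print the even line
even_lines() {
  sed -n 'n;p' "$@"
}

printf '1\n2\n3\n4\n5\n' | even_lines   # prints 2 and 4
```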

 

combine multiple lines

Let's briefly explain the difference between n and N. The n command replaces the current line with the next line (printing the current line first unless -n is used). The N command instead appends the next line to the current pattern space, separated by a newline character, so both lines are processed together as one unit. In other words, it has the effect of merging two lines. Let's look at an example below.

$ cat data2 ==> Let's take a look at the contents of the data2 file.
This is the header line.
This is the first data line.
This is the second data line.
This is the last line.

$ sed '/first/{ ==> when a line containing first is found
> N ==> append the next line to the pattern space
> s/\n/ / ==> replace the newline between the two lines with a space, merging them
> }' data2
This is the header line.
This is the first data line. This is the second data line.
This is the last line.
$
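After N runs, the pattern space holds both lines joined by an embedded newline, and in GNU sed ^ and $ anchor to the start and end of that whole buffer rather than to each line. A quick sketch to make that visible (the brackets are only markers):

```bash
# N appends the next line to the pattern space, separated by "\n".
# ^ and $ then match the start and end of the whole two-line buffer,
# so the brackets wrap both lines together rather than each line.
printf 'one\ntwo\n' | sed 'N;s/^/[/;s/$/]/'
# [one
# two]
```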



Let's take a look at the example below. A pattern that appears within a single line is easy to process, but a pattern whose words are split across two lines cannot be matched this way.

$ cat data3
The first meeting of the Linux System
Administrator’s group will be held on Tuesday.
All System Administrators should attend this meeting.
Thank you for your attendance.

$ sed 's/System Administrator/Desktop User/' data3 ==> change System Administrator to Desktop User
The first meeting of the Linux System
Administrator’s group will be held on Tuesday.
==> The phrase split across the first and second lines is not replaced.
All Desktop Users should attend this meeting. ==> the phrase within a single line is replaced
Thank you for your attendance.
$

Next, let's use the N command to handle a phrase that spans two lines.
$ sed '
> N
> s/System\nAdministrator/Desktop\nUser/ ==> matches the phrase when it is split across two lines
> s/System Administrator/Desktop User/ ==> matches the phrase when it is within one line
> ' data3
The first meeting of the Linux Desktop
User’s group will be held on Tuesday.
All Desktop Users should attend this meeting.
Thank you for your attendance.
$

The N command always reads the next line of data into the pattern space. However, when the text to be matched is on the last line, there is no next line to read, so GNU sed prints the pattern space as it is and exits without running the remaining commands. Let's look at the example below.

 
$ cat data4
The first meeting of the Linux System
Administrator’s group will be held on Tuesday.
All System Administrators should attend this meeting.
$ sed '
> N
> s/System\nAdministrator/Desktop\nUser/
> s/System Administrator/Desktop User/
> ' data4
The first meeting of the Linux Desktop
User’s group will be held on Tuesday.
All System Administrators should attend this meeting. ==> The last line is unchanged.
$

To solve this, move the single-line substitution command so that it comes before the N command.
$ sed '
> s/System Administrator/Desktop User/
> N
> s/System\nAdministrator/Desktop\nUser/
> ' data4
The first meeting of the Linux Desktop
==> Changed by the substitution command after the N command.
User’s group will be held on Tuesday.
All Desktop Users should attend this meeting.
==> Changed by the substitution command before the N command.
$
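Reading lines in fixed pairs has a second weakness: a phrase split across lines 2 and 3 would never be in the pattern space at the same time. A common alternative, sketched below for GNU sed, keeps a two-line sliding window. $!N appends the next line unless sed is already on the last line, P prints only the first line of the window, and D removes that line and reruns the script on the line that remains. The function name replace_phrase is hypothetical, and the here-document stands in for data4 with a plain ASCII apostrophe.

```bash
# Sliding two-line window:
#   $!N  append the next line (skipped on the last line, so nothing is lost)
#   s    replace the phrase whether it is split across the window or not
#   P    print only the first line of the window
#   D    delete that first line and rerun the script on what is left
replace_phrase() {
  sed '$!N
s/System\nAdministrator/Desktop\nUser/
s/System Administrator/Desktop User/
P
D' "$@"
}

replace_phrase <<'EOF'
The first meeting of the Linux System
Administrator's group will be held on Tuesday.
All System Administrators should attend this meeting.
EOF
```

Because every line gets a turn as the first line of the window, a phrase split across any pair of adjacent lines is caught, and the last line is processed too.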

Deleting multiple lines

When the pattern spans two lines, you need to be careful when combining the N command with a delete command. Let's look at the example below.

 
$ cat data4
The first meeting of the Linux System ==> suppose you want to delete only the first line of the split phrase
Administrator’s group will be held on Tuesday.
All System Administrators should attend this meeting.

$ sed '
> N
> /System\nAdministrator/d
> ' data4
All System Administrators should attend this meeting.
==> Both the first and second lines of data4 were deleted.
$

Contrary to what we wanted, both lines were deleted, because the d command deletes the entire pattern space. The D command, in contrast, deletes only the first line in the pattern space (up to the first newline character). Let's look at the example below.
$ sed '
> N
> /System\nAdministrator/D
> ' data4
Administrator’s group will be held on Tuesday.
All System Administrators should attend this meeting.
$

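Let's look at one more example. The example below deletes the blank line that appears before the first line of text (the header line).
$ cat data5

This is the header line.
This is a data line.

This is the last line.
==> The first line of data5 is blank. The goal is to delete only that blank line, not the one before the last line.

$ sed '/^$/{
> N ==> on a blank line, append the next line
> /header/D ==> if that next line is the header, delete only the blank first line
> }' data5
This is the header line.
This is a data line.

This is the last line.
$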

 

Printing multiple lines

The P command prints only the first line of a multiline pattern space, that is, everything up to the first newline character. Let's take a look at the example below.

$ sed -n '
> N
> /System\nAdministrator/P
> ' data3
The first meeting of the Linux System
$
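P and D are usually used together with $!N. A well-known combination, shown here as a sketch with a hypothetical function name, mimics the uniq command by dropping a line when it is identical to the line after it. The regex ^\(.*\)\n\1$ matches only when both lines in the two-line window are the same.

```bash
# Emulate uniq: remove consecutive duplicate lines.
#   $!N                 put the next line in the window (except at the end)
#   /^\(.*\)\n\1$/!P    print the first line only if it differs from the second
#   D                   drop the first line and restart with the second
squeeze_dups() {
  sed '$!N; /^\(.*\)\n\1$/!P; D' "$@"
}

printf 'a\na\nb\nc\nc\n' | squeeze_dups   # prints a, b, c
```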

2. Hold Space

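The sed editor keeps the text it is working on in a buffer called the pattern space. A second buffer, the hold space, lets you set lines aside temporarily while you work on other lines in the pattern space. There are five commands related to the hold space, which let you move text between the pattern space and the hold space:

■ h: Copies the pattern space to the hold space.

■ H: Appends the pattern space to the hold space.

■ g: Copies the hold space to the pattern space.

■ G: Appends the hold space to the pattern space.

■ x: Exchanges the contents of the pattern space and the hold space.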

Let's look at an example of using the above command.

$ cat data2
This is the header line.
This is the first data line.
This is the second data line.
This is the last line.

$ sed -n '/first/{
> h ==> on the line containing first, copy the line to the hold space
> p ==> print the line in the pattern space
> n ==> read the next line into the pattern space, replacing the current one
> p ==> print that line
> g ==> copy the line in the hold space back to the pattern space
> p ==> print it
> }' data2
This is the first data line.
This is the second data line.
This is the first data line.
$

Now let's drop the first p command so that the first and second data lines are printed in reverse order.
$ sed -n '/first/{
> h
> n
> p
> g
> p
> }' data2
This is the second data line.
This is the first data line.
$
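The x command from the list above is handy when you want to look back one line. The sketch below (with_previous_line is a hypothetical name) prints each line matching a pattern together with the line before it. Because h saves every line into the hold space, the hold space still contains the previous line when a match arrives. The pattern is inserted into the sed script as-is, so it should not contain a slash, and a match on the very first line is preceded by an empty line.

```bash
# Print the line before each match, followed by the matching line.
#   /pattern/{x;p;x;p;}  on a match: swap in the previous line, print it,
#                        swap back, print the match itself
#   h                    remember the current line for the next cycle
with_previous_line() {
  local pattern=$1
  shift
  sed -n "/$pattern/{x;p;x;p;};h" "$@"
}

printf '%s\n' header first second last | with_previous_line second
# first
# second
```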

3. Negating Command

The commands so far were applied either to every data line or to specific lines. Here, we will see how to do the opposite: prevent a command from being applied to specific lines.

An exclamation mark (!) placed after an address negates it, so the command is applied to every line except the ones that match. Let's take a look at the example below.

$ sed -n '/header/!p' data2 ==> prints the lines that do not contain header
This is the first data line.
This is the second data line.
This is the last line.
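The ! can follow any kind of address, not only a pattern. As a small sketch, $!p prints every line except the last one, and 2,3!d deletes everything outside lines 2 and 3:

```bash
# $ addresses the last line; $! means "every line except the last".
printf '%s\n' header first second last | sed -n '$!p'
# header
# first
# second

# The same idea with a line range: delete everything outside lines 2-3.
printf '%s\n' header first second last | sed '2,3!d'
# first
# second
```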

Let's see how to reverse the order of the lines in a text stream, so that the last line comes first and the first line comes last. Briefly, the following steps are required.

A. Place the first line in the hold space.

B. Read the next line into the pattern space.

C. Append the hold space to the pattern space.

D. Copy the pattern space to the hold space.

E. Repeat steps B to D until the last line.

F. Print the result.

Reverse text order

 

Let's take a look at the example below.

$ cat data2
This is the header line.
This is the first data line.
This is the second data line.
This is the last line.
$ sed -n '{ ==> the -n option suppresses automatic printing of each line
1!G ==> on every line except the first, append the hold space to the pattern space
h ==> copy the pattern space to the hold space
$p }' data2 ==> on the last line, print the pattern space, which now holds all lines in reverse order
This is the last line.
This is the second data line.
This is the first data line.
This is the header line.
$

 

The tac command prints the contents of a file like cat does, but in reverse order, from the last line to the first.
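Since tac reverses lines directly, it makes a convenient cross-check for the sed one-liner. A minimal sketch, assuming GNU sed and GNU coreutils tac are both installed:

```bash
# Reverse with sed: append the hold space to each line after the first (1!G),
# save the result (h), and print the accumulated text at the last line ($p).
reversed_by_sed=$(printf '%s\n' header first second last | sed -n '1!G;h;$p')
reversed_by_tac=$(printf '%s\n' header first second last | tac)

if [ "$reversed_by_sed" = "$reversed_by_tac" ]; then
  echo "sed and tac agree"
fi
```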

4. Changing The Flow

In general, the sed editor runs a script from its first command to its last. The D command is an exception, since it makes sed return to the start of the script without reading a new data line. sed also provides commands that let you change the order in which script commands are executed.

Branching

In the previous section, we saw that the exclamation mark (!) prevents a command from running on the addressed lines. The branch command (b) gives similar control over a whole group of commands: it accepts an address, so for the lines in that address range, sed jumps over the script commands that follow.

branch command syntax

[address]b [label]

 

The label parameter defines where the branch command jumps to. If no label is given, the branch command jumps to the end of the script, so the remaining commands are skipped for that line. Let's look at the example below.

 
$ cat data2 ==> This is the file content of data2.
This is the header line.
This is the first data line.
This is the second data line.
This is the last line.

$ sed '{
> 2,3b ==> the commands below are skipped for the 2nd and 3rd lines
> s/This is/Is this/
> s/line./test?/
> }' data2
Is this the header test?
This is the first data line.
This is the second data line.
Is this the last test?
==> All lines except the 2nd and 3rd were changed.
$

Now let's look at an example using labels.
$ sed '{
> /first/b jump1 ==> if the line contains the word first, jump to jump1
> s/ is/ might be/ ==> change is to might be
> s/line/test/ ==> change line to test
> :jump1 ==> lines containing first resume the script here
> s/data/text/ ==> change data to text
> }' data2
This might be the header test. ==> all commands run on the header line
This is the first text line. ==> on the line containing first, only data is changed to text
This might be the second text test.
This might be the last test.
$
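An address for b does not have to be a line range; a regular expression works too. In this sketch, b with no label jumps to the end of the script, so lines containing header skip the substitution and are printed unchanged:

```bash
# Lines matching /header/ jump straight to the end of the script (b with no
# label), so only the other lines have "line" replaced with "LINE".
printf '%s\n' 'This is the header line.' 'This is the first data line.' |
  sed '/header/b
s/line/LINE/'
# This is the header line.
# This is the first data LINE.
```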

The example below deletes the commas (,) one at a time by creating a loop with a label.
$ echo "This, is, a, test, to, remove, commas." | sed -n '{
> :start ==> put a label at the beginning of the script to create a loop
> s/,//1p ==> delete the first comma and print the result
> b start ==> go back to start
> }'

Below is the result of running the command. Each pass through the loop removes one comma.
This is, a, test, to, remove, commas.
This is a, test, to, remove, commas.
This is a test, to, remove, commas.
This is a test to, remove, commas.
This is a test to remove, commas.
This is a test to remove commas.

However, the b command in the script above has no address, so it branches back unconditionally and the loop never ends, even after all commas are gone (press Ctrl+C to stop it). To fix this, give the branch command an address so that it loops only while the line still contains a comma; once no comma is left, the script ends. Let's look at the example below.

$ echo "This, is, a, test, to, remove, commas." | sed -n '{
:start
s/,//1p
/,/b start ==> with a comma (,) as the address, branch back to start only while the line still contains a comma
}'
This is, a, test, to, remove, commas.
This is a, test, to, remove, commas.
This is a test, to, remove, commas.
This is a test to, remove, commas.
This is a test to remove, commas.
This is a test to remove commas.
$
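The same address-guarded loop is the basis of a well-known idiom for joining a whole file into one line. $!ba keeps branching back while there is more input, so N gathers every line into the pattern space before the final substitution runs. This is a sketch for GNU sed (join_lines is an illustrative name); with a one-line input, GNU sed's N simply prints the line and exits.

```bash
# Join all lines of the input into a single line separated by spaces.
#   :a     loop label
#   N      append the next line to the pattern space
#   $!ba   if this is not the last line yet, go back to :a
#   s      finally turn every embedded newline into a space
join_lines() {
  sed ':a
N
$!ba
s/\n/ /g' "$@"
}

printf '%s\n' one two three | join_lines   # one two three
```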

 

Test command

Like the branch command, the test command (t) changes the order in which script commands are executed. The difference is the condition for jumping. The branch command jumps for every line selected by its address, while the test command jumps to its label only if a substitution (s) command has succeeded since the last input line was read or since the last t jump. If no substitution has matched, the test command does not jump. As with b, if no label is given, t jumps to the end of the script.

 

The syntax for using the command is as follows.

[Address]t [label]

$ cat data2
This is the header line.
This is the first data line.
This is the second data line.
This is the last line.

The example below shows what happens to the second s command once the first one matches. Because t has no label, it jumps to the end of the script, so the second substitution runs only on lines where the first one did not match.
$ sed '{
> s/first/starting/
> t
> s/line/test/
> }' data2
This is the header test.
This is the starting data line.
This is the second data test.
This is the last test.
$

Next, let's rewrite the comma-deleting loop using the test command instead of the branch command.

$ echo "This, is, a, test, to, remove, commas." | sed -n '{
:start
s/,//1p
t start
}'

Below are the results. Once no comma is left, the substitution fails, t does not jump, and the script ends.
This is, a, test, to, remove, commas.
This is a, test, to, remove, commas.
This is a test, to, remove, commas.
This is a test to, remove, commas.
This is a test to remove, commas.
This is a test to remove commas.
$
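The test command is also a common way to build a loop that keeps going only while a substitution succeeds. The following sketch (join_continued is a hypothetical name) joins each line ending in a backslash with the line after it, similar to how the shell treats continuation lines:

```bash
# Join lines ending in "\" with the following line.
#   /\\$/N      if the line ends with a backslash, append the next line
#   s/\\\n//    remove the backslash-newline pair
#   t a         if that removal happened, loop to check for another "\"
join_continued() {
  sed ':a
/\\$/N
s/\\\n//
t a' "$@"
}

printf '%s\n' 'echo one \' 'two \' 'three' 'done' | join_continued
# echo one two three
# done
```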

 

4. Pattern Replacement

The ampersand (&)

Wildcard characters are convenient in a pattern, but they mean you don't know in advance exactly what text was matched. For example, suppose you want to put quotation marks around any word made of one character followed by "at", using the wildcard (.). In other words, you want to change cat to "cat" and hat to "hat".

$ echo "The cat sleeps in his hat." | sed 's/.at/".at"/g'
The ".at" sleeps in his ".at". ==> This is not the result you want.
$

Now let's do it with the & symbol, which stands for the entire text matched by the pattern.
$ echo "The cat sleeps in his hat." | sed 's/.at/"&"/g'
The "cat" sleeps in his "hat".
$
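Because & always stands for whatever the whole pattern matched, it also works with patterns that match text of varying length. A small sketch that wraps every run of digits in angle brackets:

```bash
# & is replaced by the entire matched text, so each number keeps its own digits.
echo "a1 b22 c333" | sed 's/[0-9][0-9]*/<&>/g'
# a<1> b<22> c<333>
```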

 

Replace only part of a match

The & symbol always refers to the entire text matched by the pattern. Sometimes, however, you want to reuse only part of the match. In this case, group that part of the pattern with escaped parentheses \( \) and refer to it in the replacement with a number: \1 for the first group, \2 for the second, and so on.

$ echo "The System Administrator manual" | sed '
> s/\(System\) Administrator/\1 User/'
==> \(System\) captures the word System as group 1. In the replacement, \1 puts that captured text back, followed by User.

This is the processing result.
The System User manual
$

$ echo "That furry cat is pretty" | sed 's/furry \(.at\)/\1/'
That cat is pretty
$ echo "That furry hat is pretty" | sed 's/furry \(.at\)/\1/'
That hat is pretty
$
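Numbered groups can also be reordered in the replacement. As a sketch, this swaps the two words around the space:

```bash
# \1 is the text before the space and \2 the text after it; the replacement
# writes them back in the opposite order.
echo "Hello World" | sed 's/\(.*\) \(.*\)/\2 \1/'
# World Hello
```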

 

Insert specific characters between matches

Using groups as shown above, you can also insert specific characters between parts of a matching pattern. Let's look at the example below.

$ echo "1234567" | sed '{
> :start
> s/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/
> t start
> }'
1,234,567
$



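In the example above, the pattern is split into two groups:
■ \(.*[0-9]\): any characters ending in a digit (.* is greedy, so this group takes as much as it can)
■ \([0-9]\{3\}\): exactly three digits
■ \1,\2: the replacement puts a comma between the two groups, and t start repeats the substitution until no run of four consecutive digits is left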
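To watch the loop work, you can add the p flag to the substitution and run sed with -n, so only the intermediate results are printed. Because .* is greedy, the first comma goes in front of the last three digits, and each pass moves one group further left:

```bash
# Each successful substitution is printed (p flag); t start repeats until no
# run of four consecutive digits remains.
echo "1234567" | sed -n ':start
s/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/p
t start'
# 1234,567
# 1,234,567
```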

 

5. Using sed in a script

Using wrappers

If a sed script is long, retyping the whole script every time you need it is tedious. Instead, you can put the sed command in a shell script, called a wrapper, that sits between the command line and the sed editor. Let's see how to use it in the example below.

$ cat data2 ==> The file contents of data2.
This is the header line.
This is the first data line.
This is the second data line.
This is the last line.

$ cat reverse
#!/bin/bash
# shell wrapper for sed editor script to reverse lines
sed -n '{
1!G
h
$p }' "$1" ==> the wrapper passes its first command-line argument ($1) to sed as the input file
$ chmod u+x reverse
$ ./reverse data2 ==> now you can run the script on any file
This is the last line.
This is the second data line.
This is the first data line.
This is the header line.
$
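If you would like the wrapper to work in a pipeline as well, one option (sketched here as a shell function rather than a separate file) is to pass "$@" to sed instead of "$1". With no arguments, sed reads standard input, and with several file names it reverses their combined contents.

```bash
# Reverse the lines of the given files, or of standard input if none are given.
reverse() {
  sed -n '1!G;h;$p' "$@"
}

printf '%s\n' header first last | reverse
# last
# first
# header
```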

Redirecting sed output

Like other shell commands, the sed editor writes its output to STDOUT by default. You can redirect that output to a file, or capture it in a shell variable with command substitution. Let's take a look at the example below.

$ cat fact ==> computes the factorial of its argument and uses sed to add commas to the result
#!/bin/bash
# add commas to numbers in factorial answer
factorial=1
counter=1
number=$1
while [ "$counter" -le "$number" ]
do
factorial=$(( factorial * counter ))
counter=$(( counter + 1 ))
done
result=$(echo "$factorial" | sed '{
:start
s/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/
t start
}')
echo "The result is $result"
$ ./fact 20
The result is 2,432,902,008,176,640,000
$

6. Creating sed utilities

This time, let's build some useful utilities with the sed editor and see how sed can be used in scripts.

Double-spacing lines

Let's look at the example below.

$ sed 'G' data2 ==> G appends the contents of the hold space to the pattern space
This is the header line.

This is the first data line.

This is the second data line.

This is the last line.

==> The hold space starts out empty, so G only appends a newline, which adds a blank line after every line, including the last one.
$

If you don't want a blank line after the last line, skip G on the last line with the $! address.
$ sed '$!G' data2
This is the header line.

This is the first data line.

This is the second data line.

This is the last line.
$

If a text file already contains blank lines, possibly several in a row, and you want exactly one blank line after every line, delete the existing blank lines first and then apply $!G, as in the example below.
$ sed '/^$/d;$!G' data6 ==> delete all empty lines (^$), then add a blank line after every line except the last
This is line one.

This is line two.

This is line three.

This is line four.
$

 

Numbering lines in a file

The = command prints the current line number. However, it prints the number on its own line above each line, which is hard to read. To put the number on the same line as the text, you can pipe the output into a second sed that uses the N command to join each pair of lines and a substitution command to replace the newline character with a space. Let's look at the example below.

First, let's display the line numbers of the data2 file using only the = command.
$ sed '=' data2
1
This is the header line.
2
This is the first data line.
3
This is the second data line.
4
This is the last line.
$

Now, let's join each pair of lines with N and replace the newline character with a space using s/\n/ /.
$ sed '=' data2 | sed 'N; s/\n/ /'
1 This is the header line.
2 This is the first data line.
3 This is the second data line.
4 This is the last line.
$

Now you have the results you want.
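A variation, sketched below with a hypothetical function name, numbers only the lines that contain text. Restricting = to the /./ address prints a number only before non-blank lines, and the second sed joins a line with the next one only when it is such a number, so blank lines pass through untouched. Note that the numbers still count blank lines, since = prints the input line number.

```bash
# First sed: print the line number only before non-blank lines.
# Second sed: join each number line with the text line that follows it.
number_nonblank() {
  sed '/./=' "$@" | sed '/./N; s/\n/ /'
}

printf '%s\n' 'first' '' 'second' | number_nonblank
# 1 first
#
# 3 second
```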

 

Printing the last lines

Printing only the last line of a very long file, such as a log file, is easy: combine the $ address, which refers to the last line, with the p (print) command. Let's take a look at an example.

$ sed -n '$p' data2
This is the last line.
$

This prints the last line of the file. The next example goes further and prints the last 10 lines of the /etc/passwd file, similar to the tail command, by keeping a rolling window of up to 10 lines in the pattern space.
$ sed '{
> :start ==> set the label for the loop
> $q ==> if the current line is the last line, print the pattern space and quit
> N ==> append the next line to the pattern space
> 11,$D ==> from line 11 on, delete the oldest line so the pattern space keeps 10 lines; D then restarts the script without reading a new line
> b start ==> loop back to start
> }' /etc/passwd
mysql:x:415:416:MySQL server:/var/lib/mysql:/bin/bash
rich:x:501:501:Rich:/home/rich:/bin/bash
katie:x:502:506:Katie:/home/katie:/bin/bash
jessica:x:503:507:Jessica:/home/jessica:/bin/bash
testy:x:504:504:Test account:/home/testy:/bin/csh
barbara:x:416:417:Barbara:/home/barbara/:/bin/bash
ian:x:505:508:Ian:/home/ian:/bin/bash
emma:x:506:509:Emma:/home/emma:/bin/bash
bryce:x:507:510:Bryce:/home/bryce:/bin/bash
test:x:508:511::/home/test:/bin/bash
$
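The hard-coded 11 in 11,$D is what fixes the window at 10 lines. As a sketch, the count can be made a parameter (sed_tail is a made-up name, and it assumes a count of at least 1). The -e ':a' part keeps the label on its own line of the script.

```bash
# Print the last N lines of the input, like tail -n N (N >= 1).
# The window grows until line N+1 is read; from then on D drops the oldest
# line each time a new one is appended. $q prints the window and quits.
sed_tail() {
  local count=$1
  shift
  sed -e ':a' -e "\$q;N;$((count + 1)),\$D;ba" "$@"
}

seq 1 20 | sed_tail 3
# 18
# 19
# 20
```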

Deleting lines

Another useful job for the sed editor is deleting unwanted blank lines. Let's go through the examples one at a time.

Let's look at an example that deletes consecutive blank lines.
$ cat data6
This is the first line.
This is the second line.
This is the third line.
This is the fourth line.
==> data6 has one or more consecutive blank lines between each pair of text lines.

$ sed '/./,/^$/!d' data6 ==> keep each range that starts at a line with at least one character (/./) and ends at the next blank line (/^$/), and delete (!d) everything else, so each run of blank lines is reduced to one
This is the first line.

This is the second line.

This is the third line.

This is the fourth line.
$

If the file starts with blank lines before the first line of text, the following script deletes them.

$ cat data7

This is the first line.

This is the second line.
==> data7 starts with a blank line, and another blank line separates the two text lines.
$ sed '/./,$!d' data7 ==> delete every line outside the range from the first line with at least one character to the end of the file ($)
This is the first line.

This is the second line.
$

The following script deletes the blank lines at the end of a file.
$ cat data8
This is the first line.
This is the second line.



==> three blank lines at the end of the file
$ sed '{
:start ==> set the starting point of the loop
/^\n*$/{ ==> if the pattern space holds only newlines (blank lines):
$d ==> delete it if this is the last line
N ==> otherwise append the next line
b start ==> and go back to start
}
}' data8
This is the first line.
This is the second line.
$
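The blank-line scripts above can be chained. As a sketch (tidy_blank_lines is a hypothetical name, assuming GNU sed), the first sed removes leading blank lines and squeezes each run of blank lines to one, and the second sed removes whatever blank lines are left at the end:

```bash
# Remove leading blank lines, squeeze runs of blank lines to one,
# and drop trailing blank lines.
tidy_blank_lines() {
  sed '/./,/^$/!d' "$@" | sed '
:start
/^\n*$/{
$d
N
b start
}'
}

printf '\n\na\n\n\nb\n\n\n' | tidy_blank_lines
# a
#
# b
```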

Removing HTML tags

When you download data from a web page, the HTML tags often come along with the text. In that case, you can easily clean up the data by removing the tags.

$ cat data9
<html>
<head>
<title>This is the page title</title>
</head>
<body>
<p>
This is the <b>first</b> line in the Web page.
This should provide some <i>useful</i> information for us to use in our shell script.
</body>
</html>
$

$ sed 's/<[^>]*>//g;/^$/d' data9 ==> remove every tag (< followed by any characters other than >, then >), then delete the lines left empty
This is the page title
This is the first line in the Web page.
This should provide some useful information for us to use in our shell script.
$
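The [^>]* part of the pattern matters. A quick sketch comparing it with the more obvious <.*>: because .* is greedy, it runs from the first < to the last > on the line and removes the text between the tags as well.

```bash
line='This is the <b>first</b> line'

# Greedy: <.*> matches "<b>first</b>" as a single match, so "first" is lost.
echo "$line" | sed 's/<.*>//g'
# This is the  line

# [^>]* cannot cross a ">", so each tag is removed separately.
echo "$line" | sed 's/<[^>]*>//g'
# This is the first line
```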

So far, we have looked at the advanced features of sed. In the next chapter, we will take a closer look at the advanced features of gawk.

 

 
