Shell Script

17 Advanced gawk

트리스탄1234 2022. 9. 3. 10:05

In this chapter, we go beyond the basic gawk usage covered in the previous chapter and look at its features in more detail.

1. Using Variables

One of the key features of any programming language is the ability to store data in variables and recall it later. gawk supports the following two types of variables.

■ Built-in variables

■ User-defined variables

Built-in variables

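As we saw in the previous chapter, you refer to a field of the current record with a dollar sign '$' followed by the field's position, so $1 is the first field. Fields are separated by FS (the field separator), which defaults to whitespace. The built-in variables that control how fields and records are split and joined are:

■ FIELDWIDTHS: a space-separated list of numbers giving the exact width of each data field

■ FS: input field separator

■ RS: input record separator

■ OFS: output field separator

■ ORS: output record separator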

Let's take a look at how to use OFS through the example below.

$ cat data1
data11,data12,data13,data14,data15
data21,data22,data23,data24,data25
data31,data32,data33,data34,data35
$ gawk 'BEGIN{FS=","} {print $1,$2,$3}' data1
==> FS splits the input on commas. OFS still has its default value, so the output fields are separated by a space.
data11 data12 data13
data21 data22 data23
data31 data32 data33
$
Next, let's change OFS and look at the output.
$ gawk 'BEGIN{FS=","; OFS="-"} {print $1,$2,$3}' data1
data11-data12-data13
data21-data22-data23
data31-data32-data33
$ gawk 'BEGIN{FS=","; OFS="--"} {print $1,$2,$3}' data1
data11--data12--data13
data21--data22--data23
data31--data32--data33
$ gawk 'BEGIN{FS=","; OFS="<-->"} {print $1,$2,$3}' data1
data11<-->data12<-->data13
data21<-->data22<-->data23
data31<-->data32<-->data33
$
This time, we will split the input into fixed-width fields by using the FIELDWIDTHS variable.
$ cat data1b
1005.3247596.37
115-2.349194.00
05810.1298100.1
$ gawk 'BEGIN{FIELDWIDTHS="3 5 2 5"}{print $1,$2,$3,$4}' data1b
==> Each record is cut into fields that are 3, 5, 2, and 5 characters wide.
100 5.324 75 96.37
115 -2.34 91 94.00
058 10.12 98 100.1
$
Let's see how to use RS together with FS. In the data below, each record spans four lines (name, street address, city, and phone number), and the records are separated by blank lines. What if you want to print only the name and phone number? With the default FS and RS, gawk treats every line as a separate record and every run of whitespace as a field separator. To get the desired result, we change FS and RS as in the example below.
$ cat data2
Riley Mullen
123 Main Street
Chicago, IL 60601
(312)555-1234

Frank Williams
456 Oak Street
Indianapolis, IN 46201
(317)555-9876

Haley Snell
4231 Elm Street
Detroit, MI 48201
(313)555-4938
$ gawk 'BEGIN{FS="\n"; RS=""} {print $1,$4}' data2
==> FS is a newline character, so each line is one field. RS is an empty string, which tells gawk that the records are separated by blank lines.
Riley Mullen (312)555-1234
Frank Williams (317)555-9876
Haley Snell (313)555-4938
$
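The transcript above changes FS and RS but never ORS. As a minimal sketch of the output side, the snippet below feeds two of the data2 records inline and sets OFS and ORS so that every output record is followed by a separator line. The `out` variable, the ` | ` field separator, and the `---` line are illustrative choices, not part of the original example.

```shell
# Prefer gawk; fall back to the system awk (the script uses no gawk-only features).
AWK=$(command -v gawk || command -v awk)

out=$(printf '%s\n' \
  'Riley Mullen' '123 Main Street' 'Chicago, IL 60601' '(312)555-1234' \
  '' \
  'Frank Williams' '456 Oak Street' 'Indianapolis, IN 46201' '(317)555-9876' |
  "$AWK" 'BEGIN { FS = "\n"; RS = ""; OFS = " | "; ORS = "\n---\n" } { print $1, $4 }')
echo "$out"
```

Because ORS replaces the newline that print normally appends, the separator line appears after every record, including the last one.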

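In addition to the variables discussed above, gawk provides further built-in variables, including the following.

■ ARGC: number of command-line parameters

■ ARGV: array of the command-line parameters

■ ENVIRON: associative array of the shell environment variables and their values

■ FILENAME: name of the data file currently being read

■ FNR: record number within the current data file

■ NF: number of data fields in the current record

■ NR: number of records processed so far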


Let's take a look at the example below

$ gawk 'BEGIN{print ARGC,ARGV[1]}' data1
2 data1
==> ARGC is the number of parameters on the command line, and ARGV is the array of those parameters, indexed from 0. ARGV[0] is the gawk command itself and the script is not counted, so there are 2 parameters and ARGV[1] is data1.
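To see every parameter rather than just ARGV[1], a loop from 1 to ARGC - 1 can walk the array. This is a small sketch: the file names data1 and data2 are only used as command-line text here, because a script that consists of nothing but a BEGIN block never opens its input files. The loop starts at 1 because ARGV[0] holds the name of the awk command, which differs between systems.

```shell
# Prefer gawk; fall back to the system awk (the script uses no gawk-only features).
AWK=$(command -v gawk || command -v awk)

out=$("$AWK" 'BEGIN { for (i = 1; i < ARGC; i++) print i, ARGV[i] }' data1 data2)
echo "$out"
```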

The ENVIRON variable is an associative array that holds the shell environment variables. Its index values are text (the variable names), not numbers. Let's look at an example below.

$ gawk '
> BEGIN{
> print ENVIRON["HOME"]   # outputs the value of the HOME environment variable
> print ENVIRON["PATH"]   # outputs the value of the PATH environment variable
> }'
/home/rich
/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin
$
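ENVIRON is also a convenient way to hand a value from the shell to a script. The sketch below sets a variable for a single awk invocation and reads it back; GREETING and NO_SUCH_VARIABLE_12345 are made-up names chosen for illustration. A name that is not in the environment simply yields an empty string.

```shell
# Prefer gawk; fall back to the system awk (the script uses no gawk-only features).
AWK=$(command -v gawk || command -v awk)

out=$(GREETING="hello from the shell" "$AWK" 'BEGIN {
  print ENVIRON["GREETING"]
  print "[" ENVIRON["NO_SUCH_VARIABLE_12345"] "]"   # unset variable: empty string
}')
echo "$out"
```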
The NF variable is useful when you don't know how many data fields a record has, because $NF always refers to the last field.
The example below prints the first and last fields of each record in /etc/passwd.
$ gawk 'BEGIN{FS=":"; OFS=":"} {print $1,$NF}' /etc/passwd
==> For the first record, this prints the first field (rich) and the last field (/bin/bash).
rich:/bin/bash
testy:/bin/csh
mark:/bin/bash
dan:/bin/bash
Let's look at an example of FNR. FNR holds the number of records processed in the current data file, so it starts again at 1 for each file.
$ gawk 'BEGIN{FS=","}{print $1,"FNR="FNR}' data1 data1
data11 FNR=1
data21 FNR=2
data31 FNR=3
data11 FNR=1
data21 FNR=2
data31 FNR=3
$
NR holds the total number of records processed across all data files. Let's look at an example below.
$ gawk '
> BEGIN {FS=","}
> {print $1,"FNR="FNR,"NR="NR}
> END{print "There were",NR,"records processed"}' data1 data1
data11 FNR=1 NR=1
data21 FNR=2 NR=2
data31 FNR=3 NR=3
data11 FNR=1 NR=4
data21 FNR=2 NR=5
data31 FNR=3 NR=6
There were 6 records processed
$
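Because FNR restarts at 1 for every file, the test FNR == 1 is a common way to notice that a new file has begun. The sketch below writes two small temporary files (first and second are hypothetical names) and prints a header line from the built-in FILENAME variable whenever a new file starts.

```shell
# Prefer gawk; fall back to the system awk (the script uses no gawk-only features).
AWK=$(command -v gawk || command -v awk)

tmp=$(mktemp -d)
printf 'a\nb\n' > "$tmp/first"
printf 'c\n'    > "$tmp/second"

# Run inside the temporary directory so FILENAME is just the short file name.
out=$(cd "$tmp" && "$AWK" 'FNR == 1 { print "-- " FILENAME " --" } { print NR, FNR, $0 }' first second)
echo "$out"
rm -rf "$tmp"
```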

User-defined variables

Variable names in gawk can contain letters, digits, and underscores, but they cannot start with a digit. Variable names are also case sensitive.

Assigning Variables in Scripts

Let's take a look at an example.

$ gawk '
> BEGIN{
> testing="This is a test"
> print testing
> }'
This is a test
$
$ gawk '
> BEGIN{
> testing="This is a test"
> print testing
> testing=45   # a variable can also hold a number
> print testing
> }'
This is a test
45
$
Arithmetic expressions are also available in gawk.
$ gawk 'BEGIN{x=4; x= x * 2 + 3; print x}'
11
$
Variables can also be assigned on the command line. Let's look at another example.
$ cat script2
BEGIN{print "The starting value is",n; FS=","}
{print $n}
$ gawk -f script2 n=3 data1
The starting value is
==> The value of n is not printed in the BEGIN section, because a command-line assignment takes effect only after the BEGIN section has run.
data13
data23
data33
$
To make n available in the BEGIN section, you have to assign it before the script starts, and the -v option makes this possible. Let's look at the example below.
$ gawk -v n=3 -f script2 data1
The starting value is 3
==> This time the BEGIN section sees the value of n.
data13
data23
data33
$
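The difference between the two assignment styles can be checked without any data file. In this sketch the same script is run twice on a single line of standard input (the `-` operand stands for standard input); `prog`, `late`, and `early` are illustrative shell variable names.

```shell
# Prefer gawk; fall back to the system awk (the script uses no gawk-only features).
AWK=$(command -v gawk || command -v awk)

prog='BEGIN { print "BEGIN sees n=[" n "]" } { print "main sees n=[" n "]" }'

late=$(echo x | "$AWK" "$prog" n=3 -)      # assignment given as an operand
early=$(echo x | "$AWK" -v n=3 "$prog")    # assignment given with -v

echo "$late"
echo "$early"
```

With the operand form the BEGIN block prints empty brackets, while with -v both blocks see 3.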

2. Using Arrays

Many programming languages provide arrays for storing multiple values in a single variable. gawk uses associative arrays, in which the index can be a number or any text string. If var is the name of the array and index is the key under which a value is stored, the assignment takes the following form.

var[index] = element

Let's take a look through an example.

capital["Illinois"] = "Springfield"
capital["Indiana"] = "Indianapolis"
capital["Ohio"] = "Columbus"
The above is an example of entering data into an array. Let's try to output this data using a script.
$ gawk 'BEGIN{
> capital["Illinois"] = "Springfield"
> print capital["Illinois"]
> }'
Springfield
$
The following is an example of using numeric index values.
$ gawk 'BEGIN{
> var[1] = 34
> var[2] = 3
> total = var[1] + var[2]
> print total
> }'
37
$

Iterative calls to associative array variables

Printing all the data stored in an associative array is a bit tricky, because when the index values are text it is hard to know every index in advance. So what do you do in this case? You can loop over the array using the format below.

[Syntax]

for (var in array)

{

statements

}

So let's look at an example in practice.

$ gawk 'BEGIN{
> var["a"] = 1
> var["g"] = 2
> var["m"] = 3
> var["u"] = 4
> for (test in var)   # each pass stores the next index of the var array in test
> {
> print "Index:",test," - Value:",var[test]   # prints the index and the data stored under it
> }
> }'
==> The index values are not returned in any particular order, but each one refers to the matching data.
Index: u  - Value: 4
Index: m  - Value: 3
Index: a  - Value: 1
Index: g  - Value: 2
$
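A typical use of this loop is counting how often each value occurs. The sketch below counts made-up state codes from inline input; since the for (var in array) loop visits the index values in no guaranteed order, the output is piped through sort to make it stable. The names `count` and `state` are illustrative.

```shell
# Prefer gawk; fall back to the system awk (the script uses no gawk-only features).
AWK=$(command -v gawk || command -v awk)

out=$(printf 'IL\nIN\nIL\nOH\nIL\n' |
  "$AWK" '{ count[$1]++ } END { for (state in count) print state, count[state] }' |
  sort)
echo "$out"
```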

Deleting an index value from an array

To remove an index from an associative array, use the delete command as follows.

delete array[index]

Once an index has been deleted, the data stored under it is gone as well and can no longer be retrieved. Let's look at the example below.

$ gawk 'BEGIN{
> var["a"] = 1
> var["g"] = 2
> for (test in var)
> {
> print "Index:",test," - Value:",var[test]   # prints all the data in the var array
> }
> delete var["g"]   # delete index g from the var array
> print "---"   # separator so the output before and after the deletion can be told apart
> for (test in var)
> print "Index:",test," - Value:",var[test]   # prints the var array again, now without index g
> }'
Index: a  - Value: 1
Index: g  - Value: 2
---
Index: a  - Value: 1
$
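To check whether an index survived a deletion without looping over the whole array, awk offers the `in` operator. The following is a minimal sketch that reuses the var array from the example above; the messages are made up for illustration.

```shell
# Prefer gawk; fall back to the system awk (the script uses no gawk-only features).
AWK=$(command -v gawk || command -v awk)

out=$("$AWK" 'BEGIN {
  var["a"] = 1; var["g"] = 2
  delete var["g"]
  if ("g" in var) print "g is still present"; else print "g was deleted"
  if ("a" in var) print "a is still present"
}')
echo "$out"
```

Testing with `in` does not create the element, whereas merely referring to var["g"] would bring the index back with an empty value.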

3. Using patterns

Like sed, gawk provides several types of patterns for filtering data records. Let's see what patterns are provided.

Regular expressions: As we saw in the previous chapter, gawk supports regular expressions. In a script, the regular expression must appear before the left curly brace of the block it controls. Let's look at an example.

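$ gawk 'BEGIN{FS=","} /11/{print $1}' data1
==> Prints the first field of every record in data1 that contains 11.
data11
$
$ gawk 'BEGIN{FS=","} /,d/{print $1}' data1
==> The pattern is matched against the whole record, including the field separators, so every record matches.
data11
data21
data31
$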

Match using operators

You can refine the filter further by applying a regular expression to a single data field with the matching operator, the tilde '~'. In the example below, $1 is the first data field in the record, and the expression selects the records whose first field starts with the text data.

$1 ~ /^data/

Let's go through some examples.

$ gawk 'BEGIN{FS=","} $2 ~ /^data2/{print $0}' data1
==> Prints the records whose second data field starts with data2.
data21,data22,data23,data24,data25
$
$ gawk -F: '$1 ~ /rich/{print $1,$NF}' /etc/passwd
==> Prints the first and last fields of the records in /etc/passwd whose first field contains rich.
rich /bin/bash
$
$ gawk 'BEGIN{FS=","} $2 !~ /^data2/{print $1}' data1
==> The !~ operator negates the match: this prints the first field of the records whose second field does not start with data2.
data11
data31
$
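The two operators can be tried side by side without creating the data1 file. This sketch keeps three shortened data1-style records in a shell variable and applies the same field test once with ~ and once with !~; `data`, `match_out`, and `nomatch_out` are illustrative names.

```shell
# Prefer gawk; fall back to the system awk (the script uses no gawk-only features).
AWK=$(command -v gawk || command -v awk)

data='data11,data12,data13
data21,data22,data23
data31,data32,data33'

match_out=$(printf '%s\n' "$data" | "$AWK" -F, '$2 ~ /^data2/ { print $1 }')
nomatch_out=$(printf '%s\n' "$data" | "$AWK" -F, '$2 !~ /^data2/ { print $1 }')

echo "matching:     $match_out"
echo "not matching: $nomatch_out"
```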

Mathematical expressions

You can also filter data by comparing the numeric value of a data field. Suppose you want to print every user that belongs to the root group. The root group has group ID 0, and the group ID is the fourth field of each record in /etc/passwd, so we can test that field.

$ gawk -F: '$4 == 0{print $1}' /etc/passwd
==> Prints the first field of every record whose fourth data field is 0.
root
sync
shutdown
halt
operator
$

The comparison expressions available in gawk are:

■ x == y: x is equal to y

■ x != y: x is not equal to y

■ x <= y: x is less than or equal to y

■ x < y: x is less than y

■ x >= y: x is greater than or equal to y

■ x > y: x is greater than y

Let's look at an example of matching text with a comparison expression. The == operator compares the whole field, so the text must match exactly.

$ gawk -F, '$1 == "data"{print $1}' data1
$
==> Nothing is printed, because no first field in data1 is exactly data.
$ gawk -F, '$1 == "data11"{print $1}' data1
data11
==> Prints the record whose first field is exactly data11.
$

4. Structured Instructions

Like the shell, gawk supports structured commands. Let's look at each one individually.

if statement

gawk supports the if-then-else syntax. The usage syntax is as follows.

if (condition)
statement1

==> If the condition is true, statement1 is executed; otherwise it is skipped. To run a different statement when the condition is false, add an else clause:

if (condition) statement1; else statement2

Let's look at individual examples.

$ cat data4
10
5
13
50
34
$ gawk '{if ($1 > 20) print $1}' data4
==> Prints only the numbers greater than 20.
50
34
$
$ gawk '{
> if ($1 > 20)   # if the first data field is greater than 20
> {
> x = $1 * 2   # store the field value multiplied by 2 in x
> print x
> }
> }' data4
100
68
$
Below is an example using the if-else statement.
$ gawk '{
> if ($1 > 20)   # if the first data field is greater than 20
> {
> x = $1 * 2   # store the field value multiplied by 2 in x
> print x
> } else   # otherwise (the first data field is 20 or less)
> {
> x = $1 / 2   # store half of the field value in x
> print x
> }}' data4
5
2.5
6.5
100
68
$
$ gawk '{if ($1 > 20) print $1 * 2; else print $1 / 2}' data4
==> The same if and else written on one line.
5
2.5
6.5
100
68
$

while statement

To handle repetitive tasks, gawk supports the while statement. Its usage syntax is as follows.

while (condition)
{
statements
}
$ cat data5
130 120 135
160 113 140
145 170 215
$ gawk '{
> total = 0
> i = 1
> while (i < 4)   # repeat while i is less than 4
> {
> total += $i   # add the i-th data field to total
> i++   # increase the value of i
> }
> avg = total / 3   # store the average value in avg
> print "Average:",avg
> }' data5
Average: 128.333
Average: 137.667
Average: 176.667
$
The while statement also supports the break and continue commands. Let's look at the example below.
$ gawk '{
> total = 0
> i = 1
> while (i < 4)   # repeat while i is less than 4
> {
> total += $i
> if (i == 2)   # leave the while loop once the second field has been added
> break
> i++
> }
> avg = total / 2
> print "The average of the first two data elements is:",avg
> }' data5
The average of the first two data elements is: 125
The average of the first two data elements is: 136.5
The average of the first two data elements is: 157.5
$
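The transcript shows break but not continue. As a small sketch of the other command, the loop below skips the second data field and adds up the rest, using two of the data5 records as inline input. Note that i is incremented before the test, because a continue placed ahead of the increment would loop forever.

```shell
# Prefer gawk; fall back to the system awk (the script uses no gawk-only features).
AWK=$(command -v gawk || command -v awk)

out=$(printf '130 120 135\n160 113 140\n' | "$AWK" '{
  total = 0
  i = 0
  while (i < NF) {
    i++
    if (i == 2) continue   # skip field 2 and go straight to the next pass
    total += $i
  }
  print "Sum without field 2:", total
}')
echo "$out"
```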

do-while syntax

This statement is similar to the while statement, but the statements in the do block are executed once before the condition is checked, so the body always runs at least once.

do

{

statements

}

while (condition)

Let's go through an example.

The script below adds up the data fields of each record until the total reaches at least 150, and then prints the total.
$ gawk '{
> total = 0
> i = 1
> do
> {
> total += $i   # add the i-th data field to total
> i++
> } while (total < 150)   # repeat the do block while total is less than 150
> print total }' data5
250
160
315
$
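The "at least once" behavior is easiest to see with a condition that is false from the start. In this sketch n begins at 10, so the while loop never runs its body, but the do-while loop runs it exactly once; the variable n and the messages are made up for illustration.

```shell
# Prefer gawk; fall back to the system awk (the script uses no gawk-only features).
AWK=$(command -v gawk || command -v awk)

out=$("$AWK" 'BEGIN {
  n = 10
  while (n < 5) { print "while body ran"; n++ }
  do { print "do-while body ran with n=" n; n++ } while (n < 5)
}')
echo "$out"
```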

for syntax

Like many programming languages, gawk supports the for statement, whose syntax is shown below.

for (variable assignment; condition; iteration process)

Let's rewrite the script that calculated the average value above using the for statement.
$ gawk '{
> total = 0
> for (i = 1; i < 4; i++)
> {
> total += $i
> }
> avg = total / 3
> print "Average:",avg
> }' data5
Average: 128.333
Average: 137.667
Average: 176.667
$
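The scripts above hard-code three fields per record. One possible variation, sketched here with made-up input whose records have different lengths, is to let the loop run up to NF and divide by NF, so the same script handles any number of fields. It assumes every record has at least one field; an empty line would cause a division by zero.

```shell
# Prefer gawk; fall back to the system awk (the script uses no gawk-only features).
AWK=$(command -v gawk || command -v awk)

out=$(printf '130 120 135\n160 113\n' | "$AWK" '{
  total = 0
  for (i = 1; i <= NF; i++) total += $i
  print "Average of", NF, "fields:", total / NF
}')
echo "$out"
```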

5. Formatted Printing

It is difficult to get exactly the output format you want with the plain print command. When you need formatted output, use the printf command.

The usage syntax is as follows.

printf "format string", var1, var2...

The format string is the key to the formatted output. It specifies exactly how the output should look, using literal text and format specifiers. A format specifier is a special code that indicates what type of variable is displayed and how to display it. A format specifier has the following form.

%[modifier]control-letter

The control-letter is a single character that indicates what type of data value will be displayed. The modifier is optional and selects additional formatting features. The available control-letters are:

■ c: displays a number as an ASCII character

■ d, i: display an integer value

■ e: displays a number in scientific notation

■ f: displays a floating-point value

■ g: displays either scientific notation or floating point, whichever is shorter

■ o: displays an octal value

■ s: displays a text string

■ x, X: display a hexadecimal value, using a-f or A-F

To display a text string, use %s. To display an integer value, use %d.

The %e control-letter displays large numbers in scientific notation.
$ gawk 'BEGIN{
> x = 10 * 100
> printf "The answer is: %e\n", x
> }'
The answer is: 1.000000e+03
$

Additional available modifiers

■ width: A numeric value that specifies the minimum width of the output field. If the output is shorter, printf pads it with spaces, right-justifying the text by default. If the output is longer than the specified width, the width is ignored and the whole output is printed.

■ prec: A numeric value that specifies the number of digits to the right of the decimal point in a floating-point number, or the maximum number of characters displayed in a text string.

■ − (minus sign): The minus sign indicates that left alignment should be used instead of right alignment when placing data in a formatted space.

Let's look at some examples using printf. First, let's compare print and printf using the earlier script that displays the name and phone number.
A script using the print command
$ gawk 'BEGIN{FS="\n"; RS=""} {print $1,$4}' data2
Riley Mullen (312)555-1234
Frank Williams (317)555-9876
Haley Snell (313)555-4938
$
A script using the printf command
$ gawk 'BEGIN{FS="\n"; RS=""} {printf "%s %s\n", $1, $4}' data2
Riley Mullen (312)555-1234
Frank Williams (317)555-9876
Haley Snell (313)555-4938
$
The \n at the end of the format string is needed because printf, unlike print, does not add a newline character automatically. Without it, the output of every record would run together on the same line.
That behavior is useful when you want several records on one line: leave the newline out of the format string and print it once in the END section.
$ gawk 'BEGIN{FS=","} {printf "%s ", $1} END{printf "\n"}' data1
data11 data21 data31
$
The next example sets the width of the first field, where the person's name is printed, to 16 characters.
$ gawk 'BEGIN{FS="\n"; RS=""} {printf "%16s %s\n", $1, $4}' data2
    Riley Mullen (312)555-1234
  Frank Williams (317)555-9876
     Haley Snell (313)555-4938
$
The printf command right-aligns by default. To left-align instead, add a minus sign '-' to the modifier as shown below.
$ gawk 'BEGIN{FS="\n"; RS=""} {printf "%-16s %s\n", $1, $4}' data2
Riley Mullen     (312)555-1234
Frank Williams   (317)555-9876
Haley Snell      (313)555-4938
$
The following example displays a floating-point value in a field 5 characters wide with 1 digit after the decimal point.
$ gawk '{
> total = 0
> for (i = 1; i < 4; i++)
> {
> total += $i
> }
> avg = total / 3
> printf "Average: %5.1f\n",avg
> }' data5
==> Each value is printed in 5 characters, counting the decimal point, with one digit after it.
Average: 128.3
Average: 137.7
Average: 176.7
$
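The effect of each modifier is easier to see when the output is wrapped in visible delimiters. The sketch below prints the same values with different width, precision, and alignment settings; the square brackets are only there to mark where each output field starts and ends.

```shell
# Prefer gawk; fall back to the system awk (the script uses no gawk-only features).
AWK=$(command -v gawk || command -v awk)

out=$("$AWK" 'BEGIN {
  printf "[%8s]\n", "gawk"       # width 8, right-aligned by default
  printf "[%-8s]\n", "gawk"      # width 8, left-aligned with the minus sign
  printf "[%.2s]\n", "gawk"      # precision 2 keeps at most two characters
  printf "[%8.2f]\n", 3.14159    # width 8, two digits after the decimal point
  printf "[%d]\n", 42.9          # %d drops the fractional part
}')
echo "$out"
```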

6. Using Built-In Functions

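gawk provides built-in functions for arithmetic, string, and time operations. The arithmetic functions are:

■ atan2(x, y): arctangent of x / y, in radians

■ cos(x): cosine of x, with x in radians

■ exp(x): exponential of x

■ int(x): integer part of x, truncated toward 0

■ log(x): natural logarithm of x

■ rand(): random floating-point value greater than or equal to 0 and less than 1

■ sin(x): sine of x, with x in radians

■ sqrt(x): square root of x

■ srand(x): sets the seed value for generating random numbers

In addition to the arithmetic functions, gawk supports the following bitwise operations.

■ and(v1, v2): bitwise AND of v1 and v2

■ compl(val): bitwise complement of val

■ lshift(val, count): shifts the value of val to the left by count bits

■ or(v1, v2): bitwise OR of v1 and v2

■ rshift(val, count): shifts the value of val to the right by count bits

■ xor(v1, v2): bitwise XOR of v1 and v2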

Using string functions

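The string functions available in gawk include the following.

■ asort(s [,d]): sorts the array s by its data values

■ asorti(s [,d]): sorts the array s by its index values

■ gsub(r, s [,t]): replaces every match of the regular expression r in $0 or in t with the string s

■ index(s, t): returns the position of the string t in the string s, or 0 if it is not found

■ length([s]): returns the length of the string s, or of $0 if s is not given

■ split(s, a [,r]): splits s into the array a using the FS character, or the regular expression r if given

■ sub(r, s [,t]): replaces the first match of the regular expression r in $0 or in t with the string s

■ substr(s, i [,n]): returns the n-character substring of s that starts at position i

■ tolower(s): converts all characters in s to lowercase

■ toupper(s): converts all characters in s to uppercase

Let's look at a few examples.

This example converts a lowercase string to uppercase and displays the length of the string.
$ gawk 'BEGIN{x = "testing"; print toupper(x); print length(x) }'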
TESTING
7
$
asort sorts an array by its data values, and asorti sorts it by its index values.
$ gawk 'BEGIN{
> var["a"] = 1
> var["g"] = 2
> var["m"] = 3
> var["u"] = 4
> asort(var, test)   # sort the data values of var and store them in test
> for (i in test)
> print "Index:",i," - value:",test[i]   # the index values of test are now numbers
> }'
Index: 4  - value: 4
Index: 1  - value: 1
Index: 2  - value: 2
Index: 3  - value: 3
$
Next, let's look at an example using the split function.
$ gawk 'BEGIN{ FS=","}{
> split($0, var)   # split the record $0 on FS and store the data fields in the var array
> print var[1], var[5]
> }' data1
data11 data15
data21 data25
data31 data35
$
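A few more string functions can be tried on a single literal string. This sketch passes an explicit separator to split (instead of relying on FS) and uses its return value, the number of elements; `line`, `part`, and `n` are illustrative names.

```shell
# Prefer gawk; fall back to the system awk (the script uses no gawk-only features).
AWK=$(command -v gawk || command -v awk)

out=$("$AWK" 'BEGIN {
  line = "data11,data12,data13"
  n = split(line, part, ",")           # explicit separator; returns the element count
  print n, part[n]
  print index(line, "data12")          # position of the substring, counted from 1
  print substr(line, 8, 6)             # 6 characters starting at position 8
  print toupper(substr(line, 1, 4))
}')
echo "$out"
```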

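Using time functions

gawk also supports time functions. The available functions are:

■ mktime(datespec): converts a date given in the form YYYY MM DD HH MM SS [DST] into a timestamp value

■ strftime(format [,timestamp]): formats the current time of day, or the given timestamp, using date-style format codes

■ systime(): returns the timestamp for the current time of day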

7. Using User-Defined Functions

gawk allows users to define functions and use them in scripts.

function name([variables])

{

statements

}

Let's create a user-defined function.

$ gawk '
> function myprint()   # define a function called myprint
> {
> printf "%-16s - %s\n", $1, $4   # print field 1 left-aligned in 16 characters, then field 4 and a newline
> }
> BEGIN{FS="\n"; RS=""}   # data fields are separated by newlines, records by blank lines
> {
> myprint()
> }' data2
Riley Mullen     - (312)555-1234
Frank Williams   - (317)555-9876
Haley Snell      - (313)555-4938
$
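myprint takes no arguments and returns nothing. As a sketch of the other parts of the syntax, the hypothetical function average below takes a parameter and hands back a result with return. awk has no local-variable declaration, so the usual convention is to list extra parameters (here i and total) that callers never pass; they then act as local variables.

```shell
# Prefer gawk; fall back to the system awk (the script uses no gawk-only features).
AWK=$(command -v gawk || command -v awk)

out=$(printf '130 120 135\n160 113 140\n' | "$AWK" '
function average(n,    i, total) {   # i and total are extra parameters used as locals
  total = 0
  for (i = 1; i <= n; i++) total += $i
  return total / n
}
{ printf "Average: %.1f\n", average(NF) }')
echo "$out"
```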

Creating a function library

Rewriting the same functions in every script is cumbersome. Instead, you can save your functions in a library file and load that file whenever you need them.

First, create a file containing all of your gawk functions. Let's look at the example below.

$ cat funclib
==> Check the functions stored in the library file.
function myprint()
{
printf "%-16s - %s\n", $1, $4
}
function myrand(limit)
{
return int(limit * rand())
}
function printthird() {
print $3
}
Let's take a look at the contents of the script file used.
$ cat script4
BEGIN{ FS="\n"; RS=""}
{
myprint()
}
Next, let's run the script together with the library file. Name both files on the command line, each with its own -f option.
$ gawk -f funclib -f script4 data2
Riley Mullen     - (312)555-1234
Frank Williams   - (317)555-9876
Haley Snell      - (313)555-4938
$
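The library mechanism can be tried end to end in a throwaway directory. In this sketch the file names funclib and script and the function twice are made up for illustration; the library file defines the function, the script file calls it, and both are loaded with separate -f options.

```shell
# Prefer gawk; fall back to the system awk (the script uses no gawk-only features).
AWK=$(command -v gawk || command -v awk)

tmp=$(mktemp -d)
printf '%s\n' 'function twice(x) { return x * 2 }' > "$tmp/funclib"   # library file
printf '%s\n' '{ print twice($1) }'                > "$tmp/script"    # script that uses it

out=$(printf '5\n13\n' | "$AWK" -f "$tmp/funclib" -f "$tmp/script")
echo "$out"
rm -rf "$tmp"
```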

 
