AWK IN SHELL SCRIPTS: A VERY POWERFUL TOOL
AWK is a powerful text-processing tool in Linux shell scripting. It is an interpreted language designed for pattern scanning, data extraction and reporting. Its main use is to search a text file for lines that match specific patterns and then perform some action on each matching line.
AWK is named after its authors, Alfred Aho, Peter Weinberger, and Brian Kernighan. It was first introduced in the 1970s and has since become a widely used tool in Unix and Linux environments.
AWK provides a way to define and execute operations on groups of lines in a text file based on patterns in the text. It is often used for filtering and transforming text, extracting information from log files, and generating reports.
Syntax and Template: AWK has a simple syntax. Each AWK program consists of one or more rules, and each rule is made of a pattern and an action. The pattern specifies the lines the action should be performed on, and the action specifies what to do with those lines.
The basic template for an AWK program is as follows:
/pattern/ {
action
}
Here, pattern is a regular expression that specifies the pattern to search for in the input text. The action is enclosed in curly braces {} and is executed for each line that matches the pattern.
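A short sketch of the forms a rule can take, assuming a POSIX-compatible awk and a few made-up log lines piped in with printf: a full pattern-plus-action rule, a pattern with no action (awk then prints each matching line), and an action with no pattern (it runs on every line).

```shell
# Hypothetical sample input: three log lines.
log='INFO start
ERROR disk full
INFO done'

# Pattern + action: print only the second field of matching lines.
with_action=$(printf '%s\n' "$log" | awk '/ERROR/ { print $2 }')

# Pattern only: the default action prints the whole matching line.
pattern_only=$(printf '%s\n' "$log" | awk '/ERROR/')

# Action only: the action runs for every input line.
action_only=$(printf '%s\n' "$log" | awk '{ print $1 }')

echo "$with_action"            # disk
echo "$pattern_only"           # ERROR disk full
printf '%s\n' "$action_only"   # INFO, ERROR, INFO (one per line)
```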
Analogy: Think of AWK as a filter that processes a text file line by line and performs an action on the lines that match a specified pattern. It is like a sieve that separates the desired information from a large pile of text data.
AWK has a number of command-line options that let you fine-tune its behavior. Some of the most important are:
-F fs: This option specifies the field separator (fs) used to split each input line into fields. By default, fields are separated by runs of spaces and tabs, and leading and trailing blanks are ignored.
-v var=value: This option defines a variable that your AWK script can use. The assignment happens before the program starts running (even before BEGIN), which makes -v the usual way to pass values from the shell into a script.
-f file: This option allows you to specify the name of an AWK script file that you want to use for processing.
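-e script: This option (supported by GNU awk and some other implementations, but not part of POSIX) lets you give the AWK script on the command line without using a separate file. In any awk you can also pass the script text directly as the first non-option argument, usually in single quotes, as the examples below do.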
With these options, you can modify the behavior of AWK to meet your specific needs. For example, you can use the -F option to specify a different field separator character, such as a comma, if you are processing a CSV file. You can also use the -v option to define a variable that you can use in your script.
Here’s an example that uses the -F option to specify a different field separator and the -v option to define a variable:
$ cat data.txt
John,25,male
Jane,30,female
$ awk -F "," -v name="John" '$1 == name {print $1,$2,$3}' data.txt
John 25 male
In this example, the -F option is used to specify that the field separator is a comma, and the -v option is used to define a variable named name with the value "John". The AWK script then matches the first field against the value of the name variable, and if it matches, the script prints the first, second, and third fields. The output shows that the script correctly matched the data for the person named "John".
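To make the field-separator behavior concrete, here is a small sketch (POSIX awk assumed; the input strings and the shell variable who are made up for illustration). It contrasts the default splitting, where runs of blanks separate fields, with -F ",", and passes a shell variable in through -v.

```shell
# Hypothetical line with leading blanks, repeated spaces and a tab.
line=$(printf '   Jane   30\tfemale')

# Default FS: runs of spaces/tabs separate fields; leading blanks are ignored.
default_nf=$(printf '%s\n' "$line" | awk '{ print NF }')   # 3
default_f1=$(printf '%s\n' "$line" | awk '{ print $1 }')   # Jane

# With -F ",", only commas separate fields, so spaces stay inside fields.
csv='Jane, 30,female'
csv_f2=$(printf '%s\n' "$csv" | awk -F "," '{ print "[" $2 "]" }')   # [ 30]

# A shell variable (illustrative value) passed into the script with -v.
who=Jane
age=$(printf 'John,25,male\nJane,30,female\n' | awk -F "," -v name="$who" '$1 == name { print $2 }')   # 30
```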
-v: This option lets you define a variable before the AWK script runs, so its value is already available inside BEGIN. For example, if you want to define a variable “name” and print its value, you can use the following command:
awk -v name="John" 'BEGIN{print "My name is", name}'
Output: My name is John
-f: This option lets you specify a file that contains the AWK script. For example, if you have a script file named “myscript.awk”, you can use the following command:
awk -f myscript.awk data.txt
The output will be the result of the script in myscript.awk executed on the data.txt file.
-p: This option (available in GNU awk, not in POSIX awk) enables profiling of the AWK script. For example, if you have a script file named “myscript.awk” and you want to profile it, you can use the following command:
gawk -p -f myscript.awk data.txt
The script's output is printed as usual, and the profile (a pretty-printed copy of the program annotated with execution counts) is written to a file named awkprof.out by default.
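A minimal sketch of the -f workflow, assuming a POSIX shell with mktemp available; the temporary file stands in for a hypothetical myscript.awk, and the program in it is invented for illustration.

```shell
# Write a small, hypothetical AWK program to a temporary file.
script=$(mktemp)
cat > "$script" <<'EOF'
# Print the name (field 1) of every record whose age (field 2) is over 26.
$2 > 26 { print $1 }
EOF

# Run it with -f; other options such as -F can be combined with -f.
older=$(printf 'John,25,male\nJane,30,female\n' | awk -F "," -f "$script")
echo "$older"   # Jane

rm -f "$script"
```

Keeping the program in its own file avoids shell-quoting problems once a script grows beyond a one-liner.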
Special Patterns and Variables
In AWK, there are special patterns and built-in variables that give you more control over how data is processed. They are used together with ordinary patterns and actions.
1. BEGIN: The action of this pattern is executed before any input is read. It is used for initialization tasks, such as setting variables or printing a header.
awk 'BEGIN { print "header line"} {print $0}' inputfile.txt
2. END: The action of this pattern is executed after all input has been processed. It is used for clean-up tasks, such as printing a summary or closing files.
awk '{ sum += $1 } END { print "Total: " sum }' inputfile.txt
3. NR: This built-in variable holds the current record (line) number, starting from 1. It can be used to print a line number before each line of the input file.
awk '{ print NR, $0 }' inputfile.txt
4. NF: This built-in variable holds the number of fields in the current record. It can be used to print the number of fields on each line of the input file.
awk '{ print NF, $0 }' inputfile.txt
5. $0: This field reference holds the entire current line. It can be used to print the whole line, or to modify the line as a whole.
awk '{ print $0 }' inputfile.txt
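The items above combine naturally in one program. This sketch (POSIX awk assumed; the name/score data is made up) prints a header in BEGIN, uses NR and NF inside a pattern, and computes an average in END, where NR still holds the number of records read.

```shell
# Hypothetical input: "name score" per line; the last line has an extra field.
data='alice 80
bob 70
carol 90 extra'

summary=$(printf '%s\n' "$data" | awk '
BEGIN  { print "name score" }                       # runs before any input is read
       { total += $2 }                              # no pattern: runs for every record
NF > 2 { print "line " NR " has " NF " fields" }    # NR and NF used in a pattern
END    { print "average: " total / NR }             # NR is the record count here
')
printf '%s\n' "$summary"
# name score
# line 3 has 3 fields
# average: 80
```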
Actions
Actions in awk are statements that are executed when a pattern matches. They are written inside curly braces directly after the pattern (no separator is needed) and are executed for each line that matches the pattern. In other words, an action defines what to do when a certain pattern is found in the input.
For example, if we want to print all the lines in a file that contain the word “error”, we can use the following awk script:
awk '/error/ {print $0}' filename
Here, the pattern /error/ specifies that we want to match lines that contain the word “error”. The action {print $0} specifies that we want to print the entire line (represented by $0).
In addition to the print statement, awk has several control-flow statements, such as next, break, continue and exit, that allow us to control how the script executes.
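For example, the next statement stops processing the current line: awk skips the remaining statements and rules for it, reads the next line of input and starts again from the first rule. The following awk script prints lines that contain the word "error" with an ERROR: prefix; because of next, those lines never reach the second rule, so every other line gets an OK: prefix:
awk '/error/ {print "ERROR:", $0; next} {print "OK:", $0}' filename
To print only the first line that contains the word "error", use exit instead:
awk '/error/ {print $0; exit}' filename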
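The break statement exits the innermost enclosing for, while or do-while loop, while the continue statement skips the remaining statements in the current iteration of that loop and continues with the next iteration.
Finally, the exit statement stops reading input and ends the script, returning the specified exit status (for example, exit 1). If the program has an END rule, that rule still runs, unless exit is called from inside END itself.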
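A small sketch of these statements, assuming a POSIX-compatible awk and made-up input: exit stops at the first match while END still runs, break and continue act only on the loop around them, and the exit status is visible to the calling shell.

```shell
# Hypothetical log lines.
input='ok one
error two
ok three
error four'

# exit: stop reading at the first matching line; the END rule still runs.
first=$(printf '%s\n' "$input" | awk '/error/ { print $0; exit } END { print "END still runs" }')
printf '%s\n' "$first"   # error two / END still runs

# break and continue act on the innermost loop, not on the whole script.
odds=$(awk 'BEGIN {
  for (i = 1; i <= 10; i++) {
    if (i % 2 == 0) continue    # skip even numbers, go to the next iteration
    if (i > 5) break            # leave the loop entirely
    out = out sep i             # append i, space-separated
    sep = " "
  }
  print out
}')
echo "$odds"     # 1 3 5

# exit with a status code that the shell can inspect.
status=0
printf 'x\n' | awk '{ exit 3 }' || status=$?
echo "$status"   # 3
```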
Besides these statements, actions can contain the usual control structures, such as if-else, while, for and do-while. These let us write more complex scripts that manipulate the input in more sophisticated ways.
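As a sketch of these control structures inside an action (POSIX awk assumed; the names, scores and grade thresholds are invented), the first command grades each record with if-else, and the second shows that a do-while body runs at least once even when its condition is false from the start.

```shell
# Hypothetical "name score" records, graded with if / else if / else.
grades=$(printf 'alice 92\nbob 58\ncarol 75\n' | awk '{
  if ($2 >= 90)      grade = "A"
  else if ($2 >= 60) grade = "pass"
  else               grade = "fail"
  print $1, grade
}')
printf '%s\n' "$grades"   # alice A / bob fail / carol pass

# do-while checks its condition after the body, so the body runs once here.
count=$(awk 'BEGIN { n = 0; do { n++ } while (n < 0); print n }')
echo "$count"             # 1
```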
Another example:
#!/usr/bin/awk -f
# pattern that matches lines that contain the word 'error'
/error/ {
# action to increment the error_count variable
error_count++
}
# action to print the total number of errors at the end of the file
END {
# adding 0 prints 0 instead of an empty string when no line matched
print "Total number of errors:", error_count + 0
}
In this example, the pattern /error/ matches any line that contains the string 'error' (this includes words such as "errors", and the match is case-sensitive). The action associated with this pattern increments the error_count variable by 1 each time the pattern matches. The END block is a special pattern that is executed after all the lines of the input file have been processed. The action in the END block prints the total number of errors.
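The same counting idea can also be run inline from the shell. In this sketch (POSIX awk assumed, log lines made up), adding 0 to error_count in END makes awk print 0 rather than an empty string when nothing matches, because an unset variable is empty when printed as a string. The sample also shows that /error/ matches a substring ("terror") and ignores a different case ("Error").

```shell
logs='error: disk
ok
terror alert
Error: mixed case'

errors=$(printf '%s\n' "$logs" | awk '/error/ { error_count++ } END { print error_count + 0 }')
echo "$errors"   # 2 ("terror" matches; "Error" does not)

none=$(printf 'ok\nfine\n' | awk '/error/ { error_count++ } END { print error_count + 0 }')
echo "$none"     # 0
```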
It’s important to note that, for each input line, awk tests the rules in the order in which they are written and runs the action of every rule whose pattern matches. So if a line matches several patterns, the action of the first matching rule runs before the actions of later matching rules, unless an action calls next.
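A quick sketch of rule ordering, assuming a POSIX awk and a single made-up input line that matches all three rules: every matching action runs, in source order, and next in the first rule prevents the later rules from seeing the line.

```shell
order=$(printf 'error 42\n' | awk '
/error/ { print "rule 1: line contains error" }
$2 > 40 { print "rule 2: field 2 is greater than 40" }
        { print "rule 3: no pattern, runs for every line" }
')
printf '%s\n' "$order"   # rule 1, rule 2, rule 3 (in that order)

# With next in the first rule, the remaining rules are skipped for this line.
stopped=$(printf 'error 42\n' | awk '/error/ { print "rule 1"; next } { print "rule 2" }')
echo "$stopped"          # rule 1
```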
Patterns
In AWK, patterns are used to determine what actions should be taken on a particular input. A pattern can be as simple as a regular expression that matches a certain line of text, or it can be more complex and include multiple conditions and tests.
The two most common kinds of patterns in AWK are:
- Regular expressions: These patterns match a line of text based on the regular expression written between the slashes. They can include metacharacters such as . and *, anchors such as ^ and $, and character classes such as [0-9].
- Conditional expressions: These patterns are used to specify multiple conditions that must be met before a certain action is taken. They allow you to perform more complex operations, such as checking the value of variables, performing arithmetic operations, and testing the contents of arrays.
Here are some examples of using patterns in AWK:
Example 1: Matching a line of text
awk '/error/ {print $0}' file.txt
In this example, the AWK command is searching for lines in file.txt that contain the word "error". The pattern /error/ is a regular expression that matches lines containing the word "error". The action {print $0} prints the entire line that matches the pattern.
Example 2: Matching a range of lines
awk 'NR >= 2 && NR <= 4 {print $0}' file.txt
In this example, the AWK command is searching for lines 2 to 4 in file.txt. The pattern NR >= 2 && NR <= 4 is a conditional expression that matches lines 2 to 4. The action {print $0} prints the entire line that matches the pattern.
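AWK also has range patterns, written as two patterns separated by a comma, which select every line from one that matches the first pattern through the next one that matches the second. This sketch (POSIX awk assumed, input made up) shows that NR == 2, NR == 4 selects the same lines as the conditional expression above, and that regular expressions can bound a range too.

```shell
lines=$(printf 'a\nb\nc\nd\ne\n')

# Conditional expression from Example 2 (no action: matching lines are printed).
cond=$(printf '%s\n' "$lines" | awk 'NR >= 2 && NR <= 4')

# Equivalent range pattern: from line 2 through line 4, inclusive.
range=$(printf '%s\n' "$lines" | awk 'NR == 2, NR == 4')

# A range bounded by regular expressions (hypothetical START/END markers).
block=$(printf 'x\nSTART\ny\nEND\nz\n' | awk '/START/, /END/')

printf '%s\n' "$range"   # b / c / d
printf '%s\n' "$block"   # START / y / END
```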
Example 3: Matching multiple conditions
awk '$2 >= 50 && $3 >= 60 {print $0}' file.txt
In this example, the AWK command is searching for lines in file.txt where the second field is greater than or equal to 50 and the third field is greater than or equal to 60. The pattern $2 >= 50 && $3 >= 60 is a conditional expression that matches lines that meet both conditions. The action {print $0} prints the entire line that matches the pattern.
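Regular expressions and conditions can be mixed in one pattern. In this sketch (POSIX awk assumed; the three-column data is invented), the ~ operator applies a regular expression to a single field, and ! negates a whole condition.

```shell
# Hypothetical "name score1 score2" records.
data='ann 55 70
ben 45 90
cal 80 65
dee 50 59'

# Example 3's conditions: both scores at or above the thresholds.
both=$(printf '%s\n' "$data" | awk '$2 >= 50 && $3 >= 60 { print $1 }')   # ann, cal

# ~ matches a regex against one field: names starting with b or d, third field >= 60.
picked=$(printf '%s\n' "$data" | awk '$1 ~ /^[bd]/ && $3 >= 60 { print $1 }')   # ben

# ! inverts the condition: lines that fail Example 3's test.
rest=$(printf '%s\n' "$data" | awk '!($2 >= 50 && $3 >= 60) { print $1 }')   # ben, dee
```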
These are just a few examples of using patterns in AWK. Because regular expressions and conditional expressions can be combined with &&, || and !, patterns can express very precise selection rules.
Loops
In AWK, loops are used to repeat the same statements multiple times, either a certain number of times or until a condition is met. The two most common loops in AWK are the for loop and the while loop; AWK also has a do-while loop and a for (key in array) form for walking through arrays.
The for loop is used to execute a block of code a specified number of times. The basic syntax of the for loop is:
for (initialization; condition; increment) {
statements;
}
Here, initialization sets the initial value of the loop counter, condition is checked before each iteration, and increment updates the loop counter after each iteration. While condition is true, the statements within the loop are executed; as soon as it becomes false, the loop ends.
Here is an example of a for loop in AWK:
BEGIN {
for (i = 1; i <= 5; i++) {
print "Iteration", i;
}
}
This will output:
Iteration 1
Iteration 2
Iteration 3
Iteration 4
Iteration 5
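A for loop is also the usual way to visit every field of a line, using NF as the upper bound. This sketch (POSIX awk assumed, numbers made up) sums the fields of each input line separately.

```shell
sums=$(printf '1 2 3\n10 20\n' | awk '{
  total = 0                    # reset the sum for each record
  for (i = 1; i <= NF; i++)    # visit every field of the current line
    total += $i
  print "line " NR ": " total
}')
printf '%s\n' "$sums"   # line 1: 6 / line 2: 30
```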
The while loop is used to repeat a block of code as long as a certain condition is true. The basic syntax of the while loop is:
while (condition) {
statements;
}
Here, condition is checked before each iteration. While condition is true, the statements within the loop are executed; once it becomes false, the loop ends (and if it is false from the start, the body never runs).
Here is an example of a while loop in AWK:
BEGIN {
i = 1;
while (i <= 5) {
print "Iteration", i;
i++;
}
}
This will output:
Iteration 1
Iteration 2
Iteration 3
Iteration 4
Iteration 5
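A while loop can also work on the fields of each input line. This sketch (POSIX awk assumed, input made up) counts down from NF to print each line's fields in reverse order.

```shell
reversed=$(printf 'a b c\none two\n' | awk '{
  line = ""
  i = NF                     # start at the last field
  while (i >= 1) {
    line = line $i           # append the current field
    if (i > 1)
      line = line " "        # separator between fields
    i--
  }
  print line
}')
printf '%s\n' "$reversed"   # c b a / two one
```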
In conclusion, the for and while loops in AWK allow you to repeat actions multiple times, either a specified number of times or until a certain condition is met. These loops are powerful tools for controlling the flow of execution in your AWK scripts.