In the third part of the following awk command series, we shall take a look at filtering text or strings based on specific patterns that a user can define.

Sometimes, when filtering text, you want to indicate certain lines from an input file or lines of strings based on a given condition or using a specific pattern that can be matched. Doing this with Awk is very easy, it is one of the great features of Awk that you will find helpful.

Let us take a look at an example below, say you have a shopping list for food items that you want to buy, called food_prices.list.

It has the following list of food items and their prices.

cat food_prices.list 
No	Item_Name Quantity	Price
1	Mangoes 10 $2.45
2	Apples 20 $1.50
3	Bananas 5 $0.90
4	Pineapples 10 $3.46
5	Oranges 10 $0.78
6	Tomatoes 5 $0.55
7	Onions 5 $0.45

And then, you want to indicate a (*) sign on food items whose price is greater than $2, this can be done by running the following command:

awk '/ *$[2-9].[0-9][0-9] */ { print $1, $2, $3, $4, "*" ; } / *$[0-1].[0-9][0-9] */ { print ; }' food_prices.list
Print Items Whose Price is Greater Than $2
Print Items Whose Price is Greater Than $2

From the output above, you can see that there is a (*) sign at the end of the lines having food items, mangoes and pineapples. If you check their prices, they are above $2.

In this example, we have used two patterns:

  • the first: / *$[2-9].[0-9][0-9] */ gets the lines that have food item prices greater than $2 and
  • the second: /*$[0-1].[0-9][0-9] */ looks for lines with food item prices less than $2.

This is what happens, there are four fields in the file, when pattern one encounters a line with a food item price greater than $2, it prints all four fields and a (*) sign at the end of the line as a flag.

The second pattern simply prints the other lines with food prices less than $2 as they appear in the input file, food_prices.list.

This way you can use pattern-specific actions to filter out food items that are priced above $2, though there is a problem with the output, the lines that have the (*) sign are not formatted out like the rest of the lines making the output not clear enough.

We saw the same problem in Part 2 of the awk series, but we can solve it in two ways:

1. Using the printf command which is a long and boring way using the command below:

$ awk '/ *$[2-9].[0-9][0-9] */ { printf "%-10s %-10s %-10s %-10sn", $1, $2, $3, $4 "*" ; } / *$[0-1].[0-9][0-9] */ { printf "%-10s %-10s %-10s %-10sn", $1, $2, $3, $4; }' food_prices.list 
Filter and Print Items Using Awk and Printf
Filter and Print Items Using Awk and Printf

2. Using $0 field. Awk uses the variable 0 to store the whole input line. This is handy for solving the problem above and it is simple and fast as follows:

$ awk '/ *$[2-9].[0-9][0-9] */ { print $0 "*" ; } / *$[0-1].[0-9][0-9] */ { print ; }' food_prices.list 
Filter and Print Items Using Awk and Variable
Filter and Print Items Using Awk and Variable
Conclusion

That’s it, for now, these are simple ways of filtering text using pattern-specific action that can help in flagging lines of text or strings in a file using the Awk command.

Hope you find this article helpful and remember to read the next part of the series which will focus on using comparison operators using the awk tool.

Similar Posts