Tuesday, September 14, 2010

Using cut – Shellscript string manipulation

This post is designed to be a refresher, reference or quick intro to manipulating strings with the cut command in bash. Sometimes it's useful to take the output of a command and reformat it. I sometimes do this for aesthetic purposes, or to format the output for use as input to another command.


Cut has options to cut by bytes (-b), characters (-c) or fields (-f). I normally cut by character or field, but cutting by byte can come in handy sometimes.

Each of these options takes a list of ranges in the following forms:

N: the N’th byte, character or field, counted from 1

N-: from the N’th byte, character or field to the end of the line

N-M: from the N’th to the M’th (inclusive) byte, character or field

-M: from the first to the M’th (inclusive) byte, character or field
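Several ranges can also be combined in one comma-separated list (the uptime examples later in this post use -f 1,2). The sketch below wraps that in a small function; the name pick_chars is hypothetical and only used for illustration. Note that cut always writes the selected characters in the order they appear in the input, not in the order they are listed.

```shell
#!/bin/sh
# Hypothetical helper: apply a cut character LIST to a fixed test string
pick_chars() {
    # $1 is the list, e.g. "1,3,5-7"
    echo "123456789" | cut -c "$1"
}

pick_chars 1,3,5-7   # characters 1, 3 and 5 through 7 -> 13567
pick_chars -3,8-     # first three characters plus 8 to the end -> 12389
pick_chars 7,1       # output follows input order, not list order -> 17
```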

The options pretty much explain themselves, but I have included some simple examples below:

Cutting by characters (command on top, output below)

echo "123456789" | cut -c -5

12345

echo "123456789" | cut -c 5-

56789

echo "123456789" | cut -c 3-7

34567

echo "123456789" | cut -c 5

5
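The examples above each feed cut a single line, but cut works line by line: the same range is applied to every line of input. A minimal sketch with two lines of made-up input:

```shell
#!/bin/sh
# Characters 2-4 are taken from each input line separately
printf 'abcdefgh\n12345678\n' | cut -c 2-4
# bcd
# 234
```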

Sometimes the output of a command is delimited, and cutting by character position will not work reliably. Take the example below:

echo -e "1\t2\t3\t4\t5" |cut -c 5-7

3 4

To echo a tab you have to use the -e switch, which makes echo interpret backslash escapes. If the desired output is 3\t4, this works as long as every field is exactly one character long, but if a character is added anywhere before field 3 the output changes completely, as follows:

echo -e "1a\t2b\t3c\t4d\t5e" | cut -c 5-7

b 3

This is resolved by cutting by fields.
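Applied to the string that broke the character cut, selecting field 3 returns the whole third field whatever its width. This sketch uses printf instead of echo -e, because how echo treats -e and backslash escapes differs between shells (bash and dash's sh behave differently), while printf expands \t the same way in any POSIX shell.

```shell
#!/bin/sh
# printf expands \t portably, unlike echo -e
printf '1\t2\t3\t4\t5\n' | cut -f 3        # prints: 3
printf '1a\t2b\t3c\t4d\t5e\n' | cut -f 3   # prints: 3c
```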

Cutting by fields

The syntax to cut by fields is the same as for characters or bytes. The two examples below display different output, but both select the same fields (field 3 through to the end of the line).

echo -e "1\t2\t3\t4\t5" | cut -f 3-

3 4 5

echo -e "1a\t2a\t3a\t4a\t5a" | cut -f 3-

3a 4a 5a
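One behaviour worth knowing when cutting by fields: a line that contains no delimiter at all is printed unchanged rather than dropped. The -s option tells cut to suppress such lines. A short sketch with made-up input:

```shell
#!/bin/sh
# The second line has no tab, so by default cut passes it through untouched
printf 'a\tb\nno-tabs-here\n' | cut -f 2
# b
# no-tabs-here

# -s drops lines that do not contain the delimiter
printf 'a\tb\nno-tabs-here\n' | cut -s -f 2
# b
```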

The default delimiter is a tab. If the output is delimited some other way, a custom delimiter can be specified with the -d option. The delimiter must be a single character; if that character means something special to the shell, escape it with a backslash or quote it. In the example below I cut the string up using the pipe as the delimiter.

echo "1|2|3|4|5" | cut -f 3- -d \|

3|4|5
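A common real-world case is a colon-separated record like the lines in /etc/passwd. The sketch below uses a made-up sample line rather than reading the real file, and also shows that quoting the delimiter (-d '|') is equivalent to backslash-escaping it (-d \|).

```shell
#!/bin/sh
# Hypothetical passwd-style record: name:password:UID:GID:comment:home:shell
line='alice:x:1000:1000:Alice:/home/alice:/bin/bash'

echo "$line" | cut -d : -f 1,7        # alice:/bin/bash
echo "1|2|3|4|5" | cut -d '|' -f 3-   # 3|4|5, same as -d \|
```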

One great feature of GNU cut is that the delimiter in its output can differ from the one used for input. In the example below I take a dash-delimited string and output it comma-delimited.

echo "1a-2a-3a-4a-5a" | cut -f 3- -d - --output-delimiter=,

3a,4a,5a
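--output-delimiter is a GNU extension and is not part of POSIX cut. Where it is unavailable, piping the result through tr gives the same result for a single-character delimiter. This is safe because any dash left after cut has to be a field separator: a field can never contain the input delimiter.

```shell
#!/bin/sh
# Portable alternative to --output-delimiter=, : translate the remaining dashes
echo "1a-2a-3a-4a-5a" | cut -d - -f 3- | tr - ,    # 3a,4a,5a
```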

Formatting with Cut Example

Some Linux commands, such as uptime, offer little or no control over the format of their output. Cut can be used to pull out just the information you want.

Normal uptime output:

owen@the-linux-blog:~$ uptime

19:18:40 up 1 day, 22:15, 4 users, load average: 0.45, 0.10, 0.03

Current time and uptime only:

owen@the-linux-blog:~$ uptime |cut -d , -f 1,2 | cut -c 2-

19:19:36 up 1 day, 22:22

For the above example I pipe the output of uptime to cut and tell it to split the line using a comma as the delimiter. I then choose fields 1 and 2. The output from that cut is piped into another cut that starts at the 2nd character, removing the leading space.
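To try this pipeline without depending on the live output of uptime, the sketch below runs it on a fixed sample line. The sample copies the values shown above, and its spacing is an assumption based on how procps uptime usually lays out its output; uptime_time is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical sample line in the layout procps uptime typically prints
sample=' 19:18:40 up 1 day, 22:15,  4 users,  load average: 0.45, 0.10, 0.03'

# Keep comma-delimited fields 1 and 2, then drop the single leading space
uptime_time() {
    echo "$1" | cut -d , -f 1,2 | cut -c 2-
}

uptime_time "$sample"   # 19:18:40 up 1 day, 22:15
```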

Load averages extracted from uptime:

owen@the-linux-blog:~$ uptime |cut -d , -f 4- | cut -c 3-

load average: 0.42, 0.10, 0.03

This is about the same as the previous example except that the fields changed. Instead of fields 1 and 2, I told it to display field 4 through to the end of the line. The output from that is piped to another cut, which starts at the 3rd character and so removes the first two characters: the spaces that follow the comma in "4 users, ".
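The same idea as a testable sketch, again on a fixed sample line rather than live uptime output. The two spaces before "load average" are an assumption about procps formatting; uptime_load is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical sample line in the layout procps uptime typically prints
sample=' 19:18:40 up 1 day, 22:15,  4 users,  load average: 0.45, 0.10, 0.03'

# Fields 4 onward, then skip the two spaces that precede "load average"
uptime_load() {
    echo "$1" | cut -d , -f 4- | cut -c 3-
}

uptime_load "$sample"   # load average: 0.45, 0.10, 0.03
```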

The great thing about cutting by fields is that even when the length of a field changes, the same data is selected. Take the example below: I now have 17 users logged in, which would have broken the output if I had used -c, since the double-digit user count adds an extra character.

owen@the-linux-blog:~$ uptime

19:25:11 up 1 day, 22:28, 17 users, load average: 0.00, 0.06, 0.04

owen@the-linux-blog:~$ uptime |cut -d , -f 4- | cut -c 3-

load average: 0.00, 0.06, 0.04
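Cutting by fields tolerates fields changing width, but not fields appearing or disappearing. The sketch below runs the same pipeline on two made-up sample lines: the first matches the 17-user case above, the second assumes a machine that has been up for less than a day, so uptime prints no "N day," part and every later field shifts down by one.

```shell
#!/bin/sh
# Same pipeline as above: fields 4 onward, then skip two characters
uptime_load() {
    echo "$1" | cut -d , -f 4- | cut -c 3-
}

# Hypothetical samples: double-digit user count, then uptime under one day
uptime_load ' 19:25:11 up 1 day, 22:28, 17 users,  load average: 0.00, 0.06, 0.04'
# load average: 0.00, 0.06, 0.04
uptime_load ' 10:00:00 up  3:25,  2 users,  load average: 0.45, 0.10, 0.03'
# .10, 0.03   (field 4 is now the second load value)
```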

That just about covers the cut command. Now that you know about it, you can use cut to chop up all kinds of strings; it is one of many great tools available for string manipulation in bash. If you can remember what cut does, your shell scripting will be easier. You don’t need to memorize the syntax, because everything you need to know about using cut is available here, in the man pages and all over the web.
