Wednesday, December 7, 2011

The Perl Translation Operator



*_Translation and Replacement of Characters_*

Let's consider a programming scenario in which we have to replace every
instance of a character or set of characters in a string. One way we
could go about accomplishing such a task would be to use the
*substitution operator *along with a suitable pattern. For instance,
let's say we have the string:

*$phrase = "Hello, it is a nice day today";*

Let's say we need to change all the instances of the letter *i *in this
string to the letter *a*. Our Perl programming expertise would
immediately render a solution. One such solution could be:

*#!/usr/bin/perl*

*$phrase = "Hello, it is a nice day today";*
*$phrase =~ s/i/a/g;*
*print "$phrase \n";*

This will yield:

*Hello, at as a nace day today*

It doesn't make much sense but, nonetheless, it is the replacement we were
looking for. The *g* option tagged on at the end of the expression
signifies a /global search and replace/. Had we not specified the *g*
option, only the first instance of *i* occurring in the string
would have been replaced.

Now, let's add a small variation to the problem above. Instead of just
replacing every occurrence of *i* with *a*, let's also replace every
occurrence of *a* in the original string with *i*, so that the two
letters trade places.
We can attempt a solution to this problem by first using the expression:

*$phrase =~ s/i/a/g;*

to change all occurrences of *i *to *a*. This gives:

*$phrase = "Hello, at as a nace day today"*

and then using the expression:

*$phrase =~ s/a/i/g;*

to change all occurrences of *a* to *i*. Unfortunately, the second
expression cannot tell an original *a* from one produced by the first
expression, so it undoes the changes we just made and we end up
with the string:

*"Hello, it is i nice diy todiy"*

which is not quite what we are looking for.

Unix has a built-in translation utility, *tr*, which performs exactly the
type of exchange we are looking for. Its syntax goes something like:

*$ tr* /old/ /new/

where /old/ is the old string (the characters to look for) and /new/ is
the new string (the characters to replace them with). By default the
command reads from standard input and writes to standard output, but both
can easily be redirected. So, for example, to convert each *a* to an *i*
and each *i* to an *a* in the string /"Hello, it is a nice day today"/,
we could do the following:

*$ echo "Hello, it is a nice day today" | tr ai ia*

which will yield the output:

*Hello, at as i nace diy todiy*

One way to accomplish this from a Perl program is to run the Unix *tr*
program using *system* or back quotes. However, Perl provides its own
*tr* operator, so there is no need to call an external program. Like the
substitution operator, it works on *$_* unless it is bound to another
variable with *=~*. The Perl version of *tr* has the following syntax:

*tr/*/old_string/*/*/new_string/*/*

We can use this operator in a Perl program to perform the desired
translation as follows:

*#!/usr/bin/perl*

*$_ = "Hello, it is a nice day today";*

*tr/ia/ai/;*
*print "$_ \n";*

This gives the output:

*Hello, at as i nace diy todiy*

which, while it might not make much sense, is the desired result. The
character *i* in the old string corresponds to *a* in the new string,
and the character *a* in the old string corresponds to *i* in the new
string. Thus every *i* in *$_* is replaced with *a*, and every *a* in
*$_* is replaced with *i*. Because each character of *$_* is examined
only once, the second replacement does not undo the first. The
correspondence is by position: the first character of the old string maps
to the first character of the new string, the second to the second, and
so on.

[Figure: correspondence between the characters of the old string and the new string]
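
The difference between the two approaches can be seen side by side. The following is only a minimal sketch; the variable names *$original*, *$two_pass* and *$swapped* are made up for illustration and do not appear in the examples above.

```perl
use strict;
use warnings;

my $original = "Hello, it is a nice day today";

# Two substitutions in sequence: the second pass undoes the first.
(my $two_pass = $original) =~ s/i/a/g;
$two_pass =~ s/a/i/g;

# One translation: each character is examined once, so i and a swap.
(my $swapped = $original) =~ tr/ia/ai/;

print "$two_pass\n";   # Hello, it is i nice diy todiy
print "$swapped\n";    # Hello, at as i nace diy todiy
```

Copying the string first with *(my $copy = $original)* leaves the original untouched, which makes it easier to compare results.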

Another Example:

Let *$_ = "Every boy and girl likes to dance"*. The translation:

*tr/a-z/A-Z/;*

will yield the string:

*"EVERY BOY AND GIRL LIKES TO DANCE"*

Here *a-z* represents a range of characters (/the lowercase letters of
the alphabet/). Every character of *$_* which matches a character
of the old string is replaced by the corresponding character of the new
string, which in this case is the matching uppercase letter, since
the new string is given by the range *A-Z*.
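
Ranges can also be combined in a single translation. The sketch below is only illustrative (the variable names are assumptions, not part of the example above); it converts to uppercase and then swaps the case of every ASCII letter.

```perl
use strict;
use warnings;

my $sentence = "Every boy and girl likes to dance";

# Lowercase range mapped onto the uppercase range.
(my $upper = $sentence) =~ tr/a-z/A-Z/;

# Two ranges on each side: a-z maps to A-Z, and A-Z maps to a-z.
(my $swapped_case = $sentence) =~ tr/a-zA-Z/A-Za-z/;

print "$upper\n";          # EVERY BOY AND GIRL LIKES TO DANCE
print "$swapped_case\n";   # eVERY BOY AND GIRL LIKES TO DANCE
```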



*_Relationship Between Matching Characters in the old string and
the Last Character of the New String_*

Let's say that we have the arbitrary string:

*$_ = "all cows eat corn and blue grass"*

and we perform the following character translation on that string:

*tr/a-z/x/;*

We will get:

*$_ = "xxx xxxx xxx xxxx xxx xxxx xxxxx"*

Every letter in *$_* matches the old string (the spaces do not, so they
are left alone). The *a* in the old string corresponds to *x* in the new
string, so *x* replaces every *a* of *$_*. But *x* is also the last
character of the new string, so it is repeated for every other character
of *$_* which matches a character of the old string.

Let's look at another example. Let the string *$_* be as above. Applying
the translation:

*tr/a-z/x5/;*

to the string gives us:


Original:   all cows eat corn and blue grass
Translated: x55 5555 5x5 5555 x55 5555 55x55

The chart above shows the translation which occurs. Every letter in
*$_* matches a character in the old string. The character *a* in the old
string corresponds to *x* in the new string, and the character *b* in the
old string corresponds to *5* (/the last character/) in the new string.
Therefore all instances of *a* in the string *$_* are replaced with *x*,
and all instances of *b* in *$_* are replaced with the corresponding *5*.
However, also note that the last character of the new string, *5*, is
repeated for every other character of *$_* that matches the old string.
This occurs whenever the new string is shorter than the old string: the
last character of the new string is repeated for every remaining
character of the old string. This is often useful, but sometimes we
don't want this behavior. One way of preventing the repetition is to use
the *"d"* option, which is explained below.
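
A short sketch of this padding rule, using a smaller old string so the equivalence is easy to check by eye. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $text = "all cows eat corn and blue grass";

# New string shorter than old string: the final "5" is reused for "c".
(my $short = $text) =~ tr/abc/x5/;

# Writing the padding out by hand gives the same result.
(my $explicit = $text) =~ tr/abc/x55/;

# The full-alphabet version from the text above.
(my $all = $text) =~ tr/a-z/x5/;

print "$short\n";      # xll 5ows ext 5orn xnd 5lue grxss
print "$explicit\n";   # xll 5ows ext 5orn xnd 5lue grxss
print "$all\n";        # x55 5555 5x5 5555 x55 5555 55x55
```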



*_The "d" Option:_*

The *d* (delete) option is applied to a translation by tacking a *d*
onto the end of the expression. For example, letting *$_* be the same
string used in the prior example, we can apply the expression:

*tr/a-z/x5/d*

This translation gives the following result:


Original:   "all cows eat corn and blue grass"
Translated: "x  x  x 5 x"

We immediately see that the resulting string is quite a bit different
when we use the *d* option. That's because all characters which match
the old string but do not have a corresponding value in the new string
are deleted. Characters which match the old string and have a
corresponding value in the new string are replaced as usual, and
characters which do not match the old string at all (here, the spaces)
are left unchanged.

Let's look at another quick example using the *d* option. Let *$_* be as
above. Let's apply the following expression to *$_*:

*tr/abcd/QZ/d*

The following table shows the resulting string:

Original:   "all cows eat corn and blue grass"
Translated: "Qll ows eQt orn Qn Zlue grQss"

Here *a* in the old string corresponds to *Q* in the new string, and
*b* in the old string corresponds to *Z* in the new string. The
characters *c* and *d* have no counterpart in the new string.
Therefore, since the *d* option was used:

Characters of *$_* which match characters of the old string and have a
corresponding value in the new string are replaced. These are the
characters *a* and *b*.

Characters of *$_* which match characters of the old string but do not
have a corresponding value in the new string are deleted. These are the
characters *c* and *d*.

Characters of *$_* which do not match characters of the old string are
copied unchanged.
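
The three rules above can be checked with a few lines of Perl. This is a minimal sketch with illustrative variable names; the vowel-stripping line is an extra example (an empty new string combined with *d* deletes every matched character).

```perl
use strict;
use warnings;

my $text = "all cows eat corn and blue grass";

# a -> Q, b -> Z; c and d have no counterpart, so they are deleted.
(my $mapped = $text) =~ tr/abcd/QZ/d;

# a -> x, b -> 5; every other lowercase letter is deleted.
(my $sparse = $text) =~ tr/a-z/x5/d;

# Empty new string plus d: every matched character is deleted.
(my $no_vowels = $text) =~ tr/aeiou//d;

print "$mapped\n";      # Qll ows eQt orn Qn Zlue grQss
print "[$sparse]\n";    # [x  x  x 5 x]
print "$no_vowels\n";   # ll cws t crn nd bl grss
```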




*_Return Values of the /tr/ Operator:_*

In addition to matching and replacing characters, *tr* also gives a
return value. The value returned by *tr *is the number of characters of
the string matched by the old string. For example:

*$_ = "Boat floats over the deep ocean";*

*$count = tr/o/\-/;*
*print "$_ \n";*
*print "$count characters match \n";*

which gives the result:

*B-at fl-ats -ver the deep -cean*
*4 characters match*

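The *o* in the old string corresponds to the *-* in the new string. Note
that we preceded the *-* with a backslash so that it is taken literally
rather than as a range operator. A *-* at the very beginning or end of a
string is also taken literally, but escaping it is a safe habit, and
between two other characters it is essential. For example:
The expression */a-z/* will match the lowercase letters of the alphabet
from *a* through *z*. The expression */a\-z/* will match *a*, *-*, or *z*.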

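Ex:
Let *$_ = "boat floats over the deep ocean"*. Consider the expression:

*$count = tr/a-z//;*

==> *$count = 26*. Because the new string is empty and no option is
given, the old string is used as the new string, so the expression simply
maps *$_* onto itself. The value of *$count* is the number of lowercase
letters in the string, that is, the number of characters not counting
spaces.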

Ex:
Let *$_* be as above. The expression *$count = tr/A-Za-z\ //;* ==>
*$count = 31* because now the 5 spaces are counted as well, since we
added a space (here preceded by a backslash) to the old string.
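
Since the return value is just a count, *tr* is a convenient way to count characters without changing the string. The following is a small sketch; the variable names are illustrative.

```perl
use strict;
use warnings;

my $text = "boat floats over the deep ocean";

# With an empty new string and no options, the string is left unchanged
# and tr simply reports how many characters matched.
my $letters = ($text =~ tr/a-z//);   # lowercase letters
my $spaces  = ($text =~ tr/ //);     # spaces
my $os      = ($text =~ tr/o//);     # occurrences of the letter o

print "letters: $letters, spaces: $spaces, o: $os\n";
# prints: letters: 26, spaces: 5, o: 4
```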




*_The "c" Option:_*

The *c * option is the *complement *option.
Ex:

*$_ = "The boat floats over the deep ocean";*

*$count = tr/A-Za-z//;*

==> *$count = 29*

Now, if we append a *c *to the expression:

*$count = tr/A-Za-z//c;*

==> *$count = 6*

The *"c"* option complements the old string */A-Za-z/*: the set of
characters to match becomes every possible character except those
specified in the old string. In this string the only characters that are
not letters are the six spaces, so only the spaces are matched.

Ex:

*$_ = "The boat floats over the deep ocean";*

*$count = tr/oa/ao/;*

==>

*$count = 7*
*$_ = "The baot flaots aver the deep aceon";*

Now let's use the *c *option:

*$count = tr/oa/ao/c;*

==>

*$count = 28*


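Every character other than those specified in the old string is matched
and replaced due to the action of the *c* option, including the spaces.
The characters that are specified in the old string are not replaced.
Because the complemented old string is much longer than the new string,
the last character of the new string, *o*, is used for the replacements.
The following chart shows the result of applying the expression to the
original string: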


Original:   The boat floats over the deep ocean
Translated: ooooooaoooooaooooooooooooooooooooao
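
The *c* option is often combined with a one-character new string or with the *d* option. The sketch below illustrates both uses and is not taken from the examples above; the variable names are assumptions.

```perl
use strict;
use warnings;

my $text = "The boat floats over the deep ocean";

# Count everything that is NOT a letter (here: the spaces).
my $non_letters = ($text =~ tr/A-Za-z//c);

# Replace every non-letter with an underscore.
(my $underscored = $text) =~ tr/A-Za-z/_/c;

# c and d together: delete every non-letter.
(my $letters_only = $text) =~ tr/A-Za-z//cd;

print "$non_letters\n";    # 6
print "$underscored\n";    # The_boat_floats_over_the_deep_ocean
print "$letters_only\n";   # Theboatfloatsoverthedeepocean
```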



*_The "s" Option:_*

The last option that we are going to examine here with regard to the
translation operator is the *s *option, often referred to as the
*squeeze-repeats *option. It essentially squeezes multiple copies of the
same successive translation into one single copy. The following example
demonstrates how this option works.

Ex:

*$_ = "The in car bus not will box";*

*tr/box/666/;*

results in the following translation:

Original:   The in car bus not will box
Translated: The in car 6us n6t will 666

Note the three 6's at the end of the translation.

Now, let's apply the same expression using the *s *option:

*tr/box/666/s*

Results in the following translation:

Original:   The in car bus not will box
Translated: The in car 6us n6t will 6

Note that the consecutive copies (repeats) of *6* have been squeezed
into one copy.
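
A common use of the *s* option is collapsing runs of a repeated character, such as extra spaces. The following minimal sketch repeats the example above and adds a whitespace case; the string *"too    many   spaces"* and the variable names are made up for illustration.

```perl
use strict;
use warnings;

my $text = "The in car bus not will box";

# Without s: each of b, o, x becomes its own 6.
(my $plain = $text) =~ tr/box/666/;

# With s: a run of identical translated characters collapses to one.
(my $squeezed = $text) =~ tr/box/666/s;

# Empty new string plus s: runs of spaces are squeezed to one space.
my $spaced = "too    many   spaces";
(my $single = $spaced) =~ tr/ //s;

print "$plain\n";      # The in car 6us n6t will 666
print "$squeezed\n";   # The in car 6us n6t will 6
print "$single\n";     # too many spaces
```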