Unix Cut or Awk from 'Right TO Left'


 
# 1  
Old 12-13-2010

Hello,
I want to get the User Name details of a user from a file list.

This list can be in the format:
  • FirstName_MiddleName1_LastName_ID
  • FirstName_LastName_ID
  • FirstName_MiddleName1_MiddleName2_LastName_ID

What I want it to return is the name without the ID, e.g. FirstName_MiddleName1_LastName.

I know that in awk I can get the ID this way, for instance:
Code:
 echo FirstName_MiddleName1_LastName_ID | awk -F_ '{print $(NF -0)}'

Is there a way to get the user-name details without the ID? Maybe a reverse cut?
# 2  
Old 12-13-2010
One way:

Code:
echo FirstName_MiddleName1_LastName_ID | sed -e "s/_ID\$//g"

FirstName_MiddleName1_LastName

# 3  
Old 12-13-2010
Code:
echo FirstName_MiddleName1_LastName_ID | sed 's/_[^_][^_]*$//'
OR
echo FirstName_MiddleName1_LastName_ID | nawk -F_ 'NF--&&$1=$1' OFS=_
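If the value is already in a shell variable, a pure-shell alternative is POSIX parameter expansion. This is a minimal sketch, and the function names are made up for illustration: `${s%_*}` drops the shortest suffix matching `_*` (the last field), while `${s##*_}` drops the longest prefix matching `*_` (everything except the last field).

```shell
# Hypothetical helper names; plain POSIX sh, no external commands.

# Print everything before the last "_" (the name without the ID).
strip_last_field() {
  printf '%s\n' "${1%_*}"
}

# Print only what follows the last "_" (the ID).
last_field() {
  printf '%s\n' "${1##*_}"
}

strip_last_field FirstName_MiddleName1_LastName_ID   # FirstName_MiddleName1_LastName
last_field FirstName_MiddleName1_LastName_ID         # ID
```

Neither expansion changes a string that contains no underscore at all.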

# 4  
Old 12-13-2010
Hello, thanks for your help, but unfortunately I'm adding this to a Perl script and I'm getting syntax errors.

Is there any way to convert this to Perl?
The $ARGV[0] would be FirstName_MiddleName1...etc_LastName_ID

I tried:
Code:
my $tgt_user = print $ARGV[0] | sed 's/_[^_][^_]*$//;

Thanks
# 5  
Old 12-13-2010
Assuming that your data looks like this -

Code:
$
$ cat f35
FirstName_MiddleName1_LastName_ID
FirstName_LastName_ID
FirstName_LastName_XY
FirstName_LastName_XYZ
FirstName_MiddleName1_MiddleName2_LastName_ID
$

If you want to remove the "_ID" at the end of each line, then you could do this -

Code:
$
$ perl -plne 's/_ID$//' f35
FirstName_MiddleName1_LastName
FirstName_LastName
FirstName_LastName_XY
FirstName_LastName_XYZ
FirstName_MiddleName1_MiddleName2_LastName
$

But that doesn't work if you have something other than "ID" at the end, for example "_XY".

To remove the "_" followed by (any) two characters at the end, do this -

Code:
$
$ perl -plne 's/_..$//' f35
FirstName_MiddleName1_LastName
FirstName_LastName
FirstName_LastName
FirstName_LastName_XYZ
FirstName_MiddleName1_MiddleName2_LastName
$

That doesn't work if you have more than two characters after the "_" at the end, for example "_XYZ".

To remove "_" followed by any number of characters other than "_" at the end, do this -

Code:
$
$ perl -plne 's/_[^_]*$//' f35
FirstName_MiddleName1_LastName
FirstName_LastName
FirstName_LastName
FirstName_LastName
FirstName_MiddleName1_MiddleName2_LastName
$
$

This is a general substitution of which "_ID" is a special case.
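The same three substitutions can be tried with sed on a single string, which makes the difference easy to see. This is a rough sketch: the function names are invented here, and the patterns are the ones from the one-liners above.

```shell
# Hypothetical helpers wrapping the three patterns shown above.
strip_literal_id() { printf '%s\n' "$1" | sed 's/_ID$//'; }      # only a literal "_ID"
strip_two_chars()  { printf '%s\n' "$1" | sed 's/_..$//'; }      # "_" plus exactly two characters
strip_any_suffix() { printf '%s\n' "$1" | sed 's/_[^_]*$//'; }   # "_" plus any run of non-"_" characters

# Run all three against a record that ends in a three-character suffix.
for f in strip_literal_id strip_two_chars strip_any_suffix; do
  printf '%-17s %s\n' "$f" "$("$f" FirstName_LastName_XYZ)"
done
```

Only the third helper removes "_XYZ"; the first two leave that record untouched.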

Assuming that you want to go for the third one-liner, your Perl program would change like so -

Code:
...
my $tgt_user = $ARGV[0];
$tgt_user =~ s/_[^_]*$//;
...

HTH,
tyler_durden
# 6  
Old 12-13-2010
Thanks again,
One last thing: in Perl, how do I get the ID alone?

i.e.
Code:
echo FirstName_LastName_ID | awk -F_ '{print $(NF +0)}'

tx

---------- Post updated at 07:04 PM ---------- Previous update was at 06:48 PM ----------

I solved it!
Code:
my $ID = $ARGV[0];
$ID =~ s/[$_]*_//;

---------- Post updated at 07:30 PM ---------- Previous update was at 07:04 PM ----------

It's not solved after all.

This is the error I get:
Unmatched [ in regex; marked by <-- HERE in m/[ <-- HERE ]*_/ at user_spl.pl line 39.

It works on the command line when I type:
Code:
perl -plne 's/[$_]*_//' users.txt

but not when I run the program:

Code:
my $ID = $ARGV[0];
$ID =~ s/[$_]*_//;

# 7  
Old 12-13-2010
Quote:
Originally Posted by limamichelle
...
This is the Error i get:
Unmatched [ in regex; marked by <-- HERE in m/[ <-- HERE ]*_/ at user_spl.pl line 39.

it works in the command line when i type:
Code:
perl -plne 's/[$_]*_//' users.txt

but not when i run the program:

Code:
my $ID = $ARGV[0];
$ID =~ s/[$_]*_//;

You really should have a look at the "Special Variables" section of the (FREE!!) online Perl documentation -

http://perldoc.perl.org/perlvar.html

Or have a look at how file processing is done in the book "Learning Perl".

First, the one-liner:

Code:
perl -plne 's/[$_]*_//' users.txt

$_ is Perl's default input and pattern-searching space. When you run perl from the command line with the -n or -p switch and a file name (here combined with -l and -e as "plne"), it opens the file for you, loops through each record and assigns each record to "$_".

So, with a "users.txt" file like the following -

Code:
$
$ cat users.txt
FirstName_MiddleName1_LastName_ID
FirstName_LastName_ID
FirstName_LastName_XY
FirstName_LastName_XYZ
FirstName_MiddleName1_MiddleName2_LastName_ID
$

a "print $_" will print each record as expected:

Code:
$
$ perl -lne 'print $_' users.txt
FirstName_MiddleName1_LastName_ID
FirstName_LastName_ID
FirstName_LastName_XY
FirstName_LastName_XYZ
FirstName_MiddleName1_MiddleName2_LastName_ID
$
$

The option "p" will always print the line, so you can avoid typing "print". Like so -

Code:
$
$ perl -plne '$_' users.txt
FirstName_MiddleName1_LastName_ID
FirstName_LastName_ID
FirstName_LastName_XY
FirstName_LastName_XYZ
FirstName_MiddleName1_MiddleName2_LastName_ID
$

But using it with the "s///" operator may give you unexpected results -

Code:
$
$ perl -plne 's/$_//' users.txt
 
 
 
 
 
$

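Which is why your regular expression works, but in an unintuitive way. A (square) bracket expression matches exactly one character out of the set listed inside it.

So, [9] matches the digit "9", and [w-z] matches a single ASCII character in the range "w" through "z". Perl interpolates $_ into the regex before compiling it, so [$_] becomes a set made up of every character in the current record, and its first match is simply the first character of that record. (This only holds while the record contains nothing that is special inside brackets, such as "]", "\", "^" or "-".) So, this -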

Code:
$
$ perl -plne 's/[$_]//' users.txt
irstName_MiddleName1_LastName_ID
irstName_LastName_ID
irstName_LastName_XY
irstName_LastName_XYZ
irstName_MiddleName1_MiddleName2_LastName_ID
$

replaces the first character with a zero-length string. You do not need [$_] to match a single character. A dot "." is the regular expression for any single character (other than a newline). So, this -

Code:
$
$ perl -plne 's/.//' users.txt
irstName_MiddleName1_LastName_ID
irstName_LastName_ID
irstName_LastName_XY
irstName_LastName_XYZ
irstName_MiddleName1_MiddleName2_LastName_ID
$

works exactly the same way.
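To see that a bracket expression is a set of characters rather than a position, the same experiment can be repeated with sed, with the shell doing the interpolation that Perl does for $_. This is only an analogy under that assumption, and the helper name is made up.

```shell
# Hypothetical helper: delete the first character of $2 that belongs to the set $1.
# Assumes $1 contains nothing special inside brackets ("]", "\", "^", "-") and no "/".
delete_first_in_set() {
  printf '%s\n' "$2" | sed "s/[$1]//"
}

rec=FirstName_LastName_ID
delete_first_in_set "$rec" "$rec"   # set = every character of the record
delete_first_in_set '_' "$rec"      # set = just the underscore
delete_first_in_set 'LN' "$rec"     # set = "L" and "N"
```

The first call removes the leading "F" only because "F" is one of the characters in the set; the second removes the first "_", and the third removes the first "N".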

And so, for these records, the regex [$_]*_ behaves the same as .*_ : because "*" is greedy, both match zero or more characters all the way up to and including the last underscore character ("_").

Thus, your one-liner should've been like so -

Code:
$
$ perl -plne 's/.*_//' users.txt
ID
ID
XY
XYZ
ID
$
$
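The greedy behaviour of `.*_` is the same in sed, so sed is a quick way to compare it with a pattern that cannot run past an underscore. A small sketch with invented function names:

```shell
# Hypothetical helpers contrasting a greedy ".*_" with "[^_]*_".
after_last_underscore()  { printf '%s\n' "$1" | sed 's/.*_//'; }     # ".*" runs on to the LAST "_"
after_first_underscore() { printf '%s\n' "$1" | sed 's/[^_]*_//'; }  # "[^_]*" stops at the FIRST "_"

after_last_underscore  FirstName_MiddleName1_LastName_ID   # ID
after_first_underscore FirstName_MiddleName1_LastName_ID   # MiddleName1_LastName_ID
```

Which one you want depends on whether you are after the last field or everything after the first one.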

In a Perl program, @ARGV is an array of all input parameters, and if you pass a file name to it, then $ARGV[0] is the name of that file. This is not the same as $_.

Unlike the command-line perl interpreter, file handling (opening, looping, assigning to $_) is not done for you over here. You will have to do all that explicitly.

So, if your Perl program looks like this -

Code:
$
$ cat users.pl
#!perl -w
print "\$_ = |",$_,"|\n";
$

You won't see any record from the file when you pass the file name as a parameter; $_ is simply empty -

Code:
$
$ perl users.pl users.txt
Use of uninitialized value $_ in print at users.pl line 2.
$_ = ||
$

That's because the file name "users.txt" is assigned to $ARGV[0] and that has nothing to do with $_.

You could check the value of the argument array (@ARGV) to get a better idea of what's happening:

Code:
$
$
$ cat users.pl
#!perl -w
print "My argument array \@ARGV is ==>|@ARGV|<==\n";
$
$
$ perl users.pl users.txt
My argument array @ARGV is ==>|users.txt|<==
$
$ perl users.pl users.txt users1.txt users2.txt users3.txt
My argument array @ARGV is ==>|users.txt users1.txt users2.txt users3.txt|<==
$
$

When an array is interpolated into a double-quoted string, Perl joins its elements with single spaces (the default value of the list separator variable $"). But you should know that in the second case, $ARGV[0] = "users.txt", $ARGV[1] = "users1.txt", $ARGV[2] = "users2.txt" and $ARGV[3] = "users3.txt".

Maybe printing it in a loop is clearer -

Code:
$
$ cat users.pl
#!perl -w
print "My argument array \@ARGV is as follows:\n";
for (my $i=0; $i<=$#ARGV; $i++) {
  print "Element $i = $ARGV[$i]\n";
}
$
$ perl users.pl users.txt users1.txt users2.txt users3.txt
My argument array @ARGV is as follows:
Element 0 = users.txt
Element 1 = users1.txt
Element 2 = users2.txt
Element 3 = users3.txt
$
$

So this piece of code -

Code:
my $ID = $ARGV[0];
$ID =~ s/[$_]*_//;

assigns the first element of the array @ARGV, i.e. the file name parameter, to $ID, but the s/// operator on the next line interpolates the empty $_ and ends up trying to compile the following:

Code:
s/[]*_//

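$_ is uninitialized, remember? With nothing between the brackets, the "]" is taken as a literal member of the class, the class is never closed, and so Perl throws that "Unmatched [" error message.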

(You could initialize $_ and use it here, but that's beside the point.)

So what you want to do after you obtain the file name is, open the file, loop through the records, let Perl assign the record value to $_ implicitly and then call the s/// operator. Like so -

Code:
$
$
$ cat users.pl
#!perl -w
my $ID = $ARGV[0];                                 # the file name is assigned to $ID now
open (DATA, "<", $ID) or die "Can't open $ID: $!"; # open the file for reading and associate it with the file handle DATA
while (defined ($_ = <DATA>)) {                    # assign the next record to $_ and while it is defined, then
  $_ =~ s/.*_//;                                   # remove all characters up to the last "_" in current record
  print $_;                                        # and print the resultant current record
}                                                  # until there are no more records
close (DATA) or die "Can't close $ID: $!";         # good idea to clean up after ourselves
$
$

And then the Perl program will work as expected -

Code:
$
$
$ perl users.pl users.txt
ID
ID
XY
XYZ
ID
$
$

But Perl does a lot of things for you, without you asking for them. And that is true especially for Perl's default variable "$_". So when you say this -

Code:
while (<DATA>)

Perl knows you meant this -

Code:
while (defined ($_ = <DATA>))

And when you say this -

Code:
s/.*_//;

Perl knows you meant this -

Code:
$_ =~ s/.*_//;

and so on...

So your program can be shortened to this -

Code:
$
$
$ cat users.pl
#!perl -w
my $ID = $ARGV[0];                                 # the file name is assigned to $ID now
open (DATA, "<", $ID) or die "Can't open $ID: $!"; # open the file for reading and associate it with the file handle DATA
while (<DATA>) {                                   # assign the next record to $_ and while it is defined, then
  s/.*_//;                                         # remove all characters up to the last "_" in current record
  print;                                           # and print the resultant current record
}                                                  # until there are no more records
close (DATA) or die "Can't close $ID: $!";         # good idea to clean up after ourselves
$
$
$
$ perl users.pl users.txt
ID
ID
XY
XYZ
ID
$
$
$
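For comparison, the explicit open/loop/substitute steps look like this in plain shell. A sketch only: the function name is made up, and it assumes one record per line in the file given as the first argument.

```shell
# Hypothetical helper: print the part after the last "_" of every line in file $1.
print_ids() {
  # IFS= and -r keep each line intact; the "|| [ -n ... ]" part
  # still handles a final line that has no trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    printf '%s\n' "${line##*_}"
  done < "$1"
}

# Usage, assuming a users.txt like the one above:
#   print_ids users.txt
```

As in the Perl version, the loop reads a record, strips everything up to the last "_" and prints what is left.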

HTH,
tyler_durden

Last edited by durden_tyler; 12-14-2010 at 10:10 AM..