field separator in Perl


 
# 15  
Old 04-11-2009
Use || instead of "or" in the sort. "or" has much lower precedence than ||, and in some circumstances, depending on how the code is written, "or" will not work properly because of that. For example, in "return X or Y" the return binds tighter than "or", so Y is never evaluated.
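A small sketch of the precedence problem, not taken from the thread: the two comparison subs below differ only in the operator. The names cmp_with_or and cmp_with_pipes and the sample rows are made up for illustration. Because return binds tighter than "or", the first sub never reaches its tie-breaker.

```perl
use strict;
use warnings;

# Invented sample records: [name, number]
my @rows = ( ['b', 2], ['a', 2], ['a', 1], ['b', 1] );

# Parsed as (return $x->[0] cmp $y->[0]) or (...): the tie-breaker is dead code.
# Recent perls may warn about this line under "use warnings".
sub cmp_with_or {
    my ($x, $y) = @_;
    return $x->[0] cmp $y->[0] or $x->[1] <=> $y->[1];
}

# Parsed as return( (cmp) || (<=>) ): the tie-breaker is used when names are equal.
sub cmp_with_pipes {
    my ($x, $y) = @_;
    return $x->[0] cmp $y->[0] || $x->[1] <=> $y->[1];
}

my ($p, $q) = ( ['a', 2], ['a', 1] );
printf "or: %d, ||: %d\n", cmp_with_or($p, $q), cmp_with_pipes($p, $q);   # or: 0, ||: 1

my @sorted = sort { cmp_with_pipes($a, $b) } @rows;
print join(' ', map { "$_->[0]$_->[1]" } @sorted), "\n";                  # a1 a2 b1 b2
```

Inside a plain sort block without return, both operators usually behave the same; the difference shows up once return or an assignment is involved.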
# 16  
Old 04-11-2009
You did well to use the Schwartzian Transform to sort the data, but your code doesn't take advantage of key caching, which makes the sort more efficient by calculating each sort key only once. Here it is modified to cache the sort keys:

Code:
use strict;
use warnings;
open (_file_, "< path-to-file")  or  die "Failed to read file : $! ";
my @not_sorted = <_file_>;
sub normalize {
   my $in = $_[0];
   $in = lc($in);
   $in =~ tr<aeiouu>
   <aeiouu>;
   $in =~ tr<abcdefghijklmnopqrsštuvwxyz>
   <\x01-\x1B>;
   return $in;
}
my @sorted = map {$_->[0]}
        sort{ $a->[1] cmp $b->[1]}
        map {chomp;[$_,normalize((split(/&/))[1]) ]} @not_sorted;
print "$_\n" for @sorted;
close (_file_);
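To see what key caching buys, here is a rough sketch that counts how often the key function runs. The word list is invented and key_for() is a hypothetical stand-in for normalize(); the exact count for the naive sort depends on Perl's sort implementation, so only the cached count is predictable.

```perl
use strict;
use warnings;

my @words = qw(pear Apple fig Banana cherry Date);   # made-up sample data
my $calls = 0;

# Hypothetical stand-in for normalize(); counts how often it is called.
sub key_for { $calls++; return lc $_[0] }

# Naive sort: key_for() runs twice for every comparison.
my @naive = sort { key_for($a) cmp key_for($b) } @words;
my $naive_calls = $calls;

# Schwartzian Transform: key_for() runs once per element, keys are cached in slot 1.
$calls = 0;
my @cached = map  { $_->[0] }
             sort { $a->[1] cmp $b->[1] }
             map  { [ $_, key_for($_) ] } @words;
my $cached_calls = $calls;

print "naive: $naive_calls calls, cached: $cached_calls calls\n";
print "@cached\n";   # Apple Banana cherry Date fig pear
```

Both sorts give the same order; the cached version simply calls the key function exactly once per line, which matters more as the file grows or the key gets more expensive to compute.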

# 17  
Old 04-11-2009
I take everything back, it still does not work. I tried to change [1] to [2] to see if it sees the "&", but I got this:
Code:
Use of uninitialized value in string comparison (cmp) at zs line 14, <_file_> line 16.
Use of uninitialized value in string comparison (cmp) at zs line 14, <_file_> line 16.
Use of uninitialized value in string comparison (cmp) at zs line 14, <_file_> line 16.
Use of uninitialized value in string comparison (cmp) at zs line 14, <_file_> line 16.
Use of uninitialized value in string comparison (cmp) at zs line 14, <_file_> line 16.
Use of uninitialized value in string comparison (cmp) at zs line 14, <_file_> line 16.
Use of uninitialized value in string comparison (cmp) at zs line 14, <_file_> line 16.
Use of uninitialized value in string comparison (cmp) at zs line 14, <_file_> line 16.
Use of uninitialized value in string comparison (cmp) at zs line 14, <_file_> line 16.
Use of uninitialized value in string comparison (cmp) at zs line 14, <_file_> line 16.
Use of uninitialized value in string comparison (cmp) at zs line 14, <_file_> line 16.
Use of uninitialized value in string comparison (cmp) at zs line 14, <_file_> line 16.
Use of uninitialized value in string comparison (cmp) at zs line 14, <_file_> line 16.
Use of uninitialized value in string comparison (cmp) at zs line 14, <_file_> line 16.
Use of uninitialized value in string comparison (cmp) at zs line 14, <_file_> line 16.
Use of uninitialized value in string comparison (cmp) at zs line 14, <_file_> line 16.
Use of uninitialized value in string comparison (cmp) at zs line 14, <_file_> line 16.
Use of uninitialized value in string comparison (cmp) at zs line 14, <_file_> line 16.
Use of uninitialized value in string comparison (cmp) at zs line 14, <_file_> line 16.
Use of uninitialized value in string comparison (cmp) at zs line 14, <_file_> line 16.
Use of uninitialized value in string comparison (cmp) at zs line 14, <_file_> line 16.
Use of uninitialized value in string comparison (cmp) at zs line 14, <_file_> line 16.
Use of uninitialized value in string comparison (cmp) at zs line 14, <_file_> line 16.
Use of uninitialized value in string comparison (cmp) at zs line 14, <_file_> line 16.
Use of uninitialized value in string comparison (cmp) at zs line 14, <_file_> line 16.
Use of uninitialized value in string comparison (cmp) at zs line 14, <_file_> line 16.
Use of uninitialized value in string comparison (cmp) at zs line 14, <_file_> line 16.
ššš&&
sss&&aaa
zzz&&
uuu&&
šas&&
saš&&
cab&&
uuū&&
ūuu&&
ūūū&&
bbc&aaa&aaa
mmn&aaa&ccc
aaa&aaa&bbb
lmn&bbb&aaa
aaa&bbb&ccc
aaa&ccc&ddd

I've also tried it on the real file, and it does not work properly.

With your modifications I get this:
Code:
ššš&&
sss&&aaa
zzz&&
uuu&&
šas&&
saš&&
cab&&
uuū&&
ūuu&&
ūūū&&
bbc&aaa&aaa
mmn&aaa&ccc
aaa&aaa&bbb
lmn&bbb&aaa
aaa&bbb&ccc
aaa&ccc&ddd

I also tried downloading Sort::Fields from CPAN, but I cannot make it work the way I expect. Sometimes you really feel ignorant.
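A possible explanation for the warnings above, offered as a sketch rather than a diagnosis of the real file: split drops trailing empty fields by default, so a line such as zzz&& yields only one field and index [2] is undefined. Passing a negative limit keeps the empty fields, and // '' (Perl 5.10 or later) supplies a default for lines that are still too short. The variable names are made up.

```perl
use strict;
use warnings;

my $line = 'zzz&&';   # one of the sample lines from the output above

my @default = split /&/, $line;       # trailing empty fields are dropped
my @kept    = split /&/, $line, -1;   # a negative limit keeps them

print scalar(@default), "\n";   # 1
print scalar(@kept), "\n";      # 3

# Index 2 does not exist in the default split, so the slice gives undef.
my $third = (split /&/, $line)[2];
print defined $third ? "defined" : "undef", "\n";   # undef

# Keeping empty fields and defaulting to '' avoids the cmp warning.
my $safe = (split /&/, $line, -1)[2] // '';
print "safe key is '", $safe, "'\n";                # safe key is ''
```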
# 18  
Old 04-11-2009
Seems to work for me:

Code:
use strict;
use warnings;
#open (_file_, "< path-to-file")  or  die "Failed to read file : $! ";
my @not_sorted = <DATA>;
sub normalize {
   my $in = $_[0];
   $in = lc($in);
   $in =~ tr<aeiouu>
   <aeiouu>;
   $in =~ tr<abcdefghijklmnopqrsštuvwxyz>
   <\x01-\x1B>;
   return $in;
}
my @sorted = map {$_->[0]}
        sort{ $a->[1] cmp $b->[1]}
        map {chomp; [$_,normalize((split(/\&/))[0])]} @not_sorted;
print "$_\n" for @sorted;
#close (_file_);
__DATA__
bbc&aaa&aaa
mmn&aaa&ccc
lmn&bbb&aaa
aaa&ccc&ddd
ššš&&
sss&&aaa
zzz&&
aaa&bbb&ccc
aaa&aaa&bbb
uuu&&
šas&&
saš&&
cab&&
uuu&&
uuu&&
uuu&&

output:

aaa&ccc&ddd
aaa&bbb&ccc
aaa&aaa&bbb
bbc&aaa&aaa
cab&&
lmn&bbb&aaa
mmn&aaa&ccc
saš&&
sss&&aaa
šas&&
ššš&&
uuu&&
uuu&&
uuu&&
uuu&&
zzz&&
# 19  
Old 04-11-2009
I'm starting to understand.
But please explain the use of [1] and [0] to me:
Code:
        sort{ $a->[1] cmp $b->[1]}
        map {chomp; [$_,normalize((split(/\&/))[0])]} @not_sorted;

I find it a bit confusing. [0] is the first field from the left, [1] is the second, and so on, right?
Why did you write [0] in the last map line and [1] in the sort line?

But it works! Also on the real file.
# 20  
Old 04-12-2009
This is really the line that makes it all work:

Code:
map {chomp; [$_,normalize((split(/\&/))[0])]} @not_sorted;

What happens is that each line from @not_sorted is stored in its own anonymous array; that's the stuff inside the square brackets []. First each line is chomp()'d. Then a copy of the line, "$_", is stored in the first position [0] of the anonymous array (that's what the last map block to run, the one at the top, returns into @sorted). Then the line is split(/&/) and just the first field of the split, [0], is sent to the normalize() function. What normalize() returns is stored in the second position [1] of the anonymous array. Now all the sort keys are stored in the second position of the anonymous arrays (the cached keys), and that is what the sort block compares. I hope that is clear; if not, ask again and I will try to explain. I do have an article posted on another forum that tries to explain the technique in more detail:

Sorting Data with the Schwartzian Transform - bytes
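To make the two kinds of index concrete, here is a minimal sketch. sort_by_field() is a made-up helper, lc() stands in for normalize(), and the three lines are taken from the sample data. The index after split picks which field of the line becomes the key; the [1] in the sort block only names the slot of the [line, key] pair, so it stays [1] whichever field is chosen.

```perl
use strict;
use warnings;

my @lines = qw(mmn&aaa&ccc aaa&ccc&ddd lmn&bbb&aaa);   # sample lines from the thread

# Hypothetical helper: sort lines by the field at index $field.
sub sort_by_field {
    my ($field, @in) = @_;
    return map  { $_->[0] }                 # slot 0: the original line
           sort { $a->[1] cmp $b->[1] }     # slot 1: the cached key, always [1]
           map  { [ $_, lc( (split /&/, $_, -1)[$field] // '' ) ] } @in;
}

print join(', ', sort_by_field(0, @lines)), "\n";   # aaa&ccc&ddd, lmn&bbb&aaa, mmn&aaa&ccc
print join(', ', sort_by_field(1, @lines)), "\n";   # mmn&aaa&ccc, lmn&bbb&aaa, aaa&ccc&ddd
print join(', ', sort_by_field(2, @lines)), "\n";   # lmn&bbb&aaa, mmn&aaa&ccc, aaa&ccc&ddd
```

Only the split index changes between the three calls; the sort block is identical each time.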
# 21  
Old 04-13-2009
Does this mean that the sort line will always have [1], or that it should be one number higher than the map line? E.g. map [0] sort [1]; map [1] sort [2].