field separator in Perl


 
# 22  
Old 04-13-2009
Quote:
Originally Posted by ahsog
Does this mean that the sort line will always have [1], or that it should be one number higher than the map line? E.g. map [0] sort [1]; map [1] sort [2]
No. Use whichever index in the sort block holds the key you want to sort on; it can even be more than one. The usual approach is to store a copy of the original line in index [0] in the first map block and add further fields as needed for sorting. You then sort on whatever your program requires, whether that is field [1], [2], or something else. Finally, the last map block returns the original lines in sorted order. The last map block can also be used to manipulate the data further; it doesn't have to just return a copy of the original data as your code does.
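To make the three steps concrete, here is a minimal sketch of the map/sort/map pattern. The sample lines and the choice of a numeric key in the second field are made up for illustration; they are not from the poster's file.

```perl
use strict;
use warnings;

# Hypothetical sample data: three '&'-separated fields per line.
my @lines = ("pear&3&x\n", "apple&1&z\n", "fig&2&y\n");

my @sorted =
    map  { $_->[0] }                      # 3. return the original line
    sort { $a->[1] <=> $b->[1] }          # 2. compare the key stored in index [1]
    map  { chomp(my $line = $_);          # 1. build [original line, sort key]
           [ $line, (split /&/, $line)[1] ] }
    @lines;

print "$_\n" for @sorted;
```

The index used in the sort block only has to match the position where the first map block stored the key; it has no fixed relation to the index used in the last map block.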
# 23  
Old 04-13-2009
weird. that doesn't look sorted.
# 24  
Old 04-13-2009
Thank you, KevinADC, but I still don't fully understand, and quirkasaurus seems to be at least partly right. If I use [1] in the sort line everything works fine, but if I use this:
Code:
use strict;
use warnings;
open (_file_, "< path-to-file")  or  die "Failed to read file : $! ";
my @not_sorted = <_file_>;
sub normalize {
   my $in = $_[0];
   $in = lc($in);
   $in =~ tr<aeiouu>
   <aeiouu>;
   $in =~ tr<abcdefghijklmnopqrsštuvwxyz>
   <\x01-\x1B>;
   return $in;
}
my @sorted = map {$_->[0]}
        sort{ $a->[2] cmp $b->[2]}
        map {chomp; [$_,normalize((split(/\&/))[0])]} @not_sorted;
print "$_\n" for @sorted;
close (_file_);

I get this:
Code:
Use of uninitialized value in string comparison (cmp) at zd line 14, <_file_> line 16.
[... the same warning is printed 64 times in total ...]
bbc&aaa&aaa
mmn&aaa&ccc
lmn&bbb&aaa
aaa&ccc&ddd
ššš&&
sss&&aaa
zzz&&
aaa&bbb&ccc
aaa&aaa&bbb
uuu&&
šas&&
saš&&
cab&&
uuū&&
ūuu&&
ūūū&&

It should be ordered by the second column, after the first "&", but that is clearly not the case. Why? And why all these errors?
# 25  
Old 04-13-2009
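That's because there is no index [2] in the anonymous arrays that get sorted in your code, only index [0] and [1]. Comparing a missing element gives undef on both sides, which is what the "uninitialized value" warnings are reporting.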

This is the anonymous array:

Code:
[$_,normalize((split(/\&/))[0])]

for this line of your data:

Code:
bbc&aaa&aaa

the anonymous array will store these two values:

Code:
['bbc&aaa&aaa','bbc']

[0] = 'bbc&aaa&aaa'
[1] = 'bbc' (<-- this is the sort key)
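
A quick way to check this for yourself is to build the same array outside the map block and look at it. This is only a sketch: it leaves out normalize() to stay short, and the variable names $pair and $triple are made up.

```perl
use strict;
use warnings;

my $line = 'bbc&aaa&aaa';

# Same shape as the anonymous array above, minus normalize().
my $pair = [ $line, (split /&/, $line)[0] ];

print scalar(@$pair), " elements\n";                        # only [0] and [1] exist
print defined $pair->[2] ? "defined" : "undef", "\n";       # so [2] is undef

# Taking two fields from the split adds an index [2] to sort on.
my $triple = [ $line, (split /&/, $line)[0, 1] ];
print "$triple->[2]\n";                                     # second column
```

With the extra field stored, $a->[2] cmp $b->[2] in the sort block would compare the second column instead of undef.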
# 26  
Old 04-13-2009
Quote:
Originally Posted by quirkasaurus
weird. that doesn't look sorted.
It is sorted by the first field only, the stuff before the first & symbol in each line:

aaa&ccc&ddd
aaa&bbb&ccc
aaa&aaa&bbb
bbc&aaa&aaa
cab&&
lmn&bbb&aaa
mmn&aaa&ccc
saš&&
sss&&aaa
šas&&
ššš&&
uuu&&
uuu&&
uuu&&
uuu&&
zzz&&

Does that look sorted now?
# 27  
Old 04-13-2009
Now I understand.
Although I don't need it right now, what if I wanted to sort by the second column, or the third, etc.? And why does Perl by default store only the first column as [1] and strip away the other two? Is this what chomp does?
I also wanted to ask why you put "normalize" in the map part. What difference does it make (if any)? I took the normalize part from here: http://interglacial.com/~sburke/tpj/as_html/tpj14.html

Last edited by ahsog; 04-13-2009 at 06:48 PM..
# 28  
Old 04-13-2009
Quote:
Originally Posted by ahsog
Now I understand.
Although I don't need it right now, what if I wanted to sort by the second column, or the third, etc.? And why does Perl by default store only the first column as [1] and strip away the other two? Is this what chomp does?
I also wanted to ask why you put "normalize" in the map part. What difference does it make (if any)? I took the normalize part from here: International Sorting with Perl's sort
chomp only removes the input record separator, which is generally a newline. Perl is not storing the first column (which is the return value of normalize()) in index [1] by default; it's stored there because the Perl code tells Perl to store it there.
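
As a small illustration (a sketch with one line of the sample data, not the poster's script), chomp leaves every column in place; it is the [0] slice after split that picks out only the first one. The last two lines show a related split detail that matters for lines such as cab&&.

```perl
use strict;
use warnings;

my $line = "bbc&aaa&aaa\n";
chomp $line;                           # removes only the trailing newline (the value of $/)

my @fields = split /&/, $line;         # all three columns are still there
my $first  = (split /&/, $line)[0];    # the [0] slice is what keeps just the first column

print scalar(@fields), " fields, first is $first\n";

my @short = split /&/, 'cab&&';        # trailing empty fields are dropped: ('cab')
my @full  = split /&/, 'cab&&', -1;    # a negative limit keeps them: ('cab', '', '')
```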

normalize() is inside the first map block so that each sort key is created only once and then cached (stored) in the anonymous array. That has the potential to be much more efficient than calling normalize() inside the sort block, where the same sort key may have to be created many times because each line takes part in several comparisons. Depending on the data and how much of it there is, the difference between caching the sort keys and creating them in the sort block can be dramatic. Only testing can tell whether caching really is a benefit, but generally it is.
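
One way to see the difference is to count how often the key function runs in each version. The sketch below uses a made-up key_for() as a cheap stand-in for normalize() and a made-up five-element list; the exact count for the uncached version depends on how many comparisons the sort needs, so it is printed rather than assumed.

```perl
use strict;
use warnings;

my $calls = 0;
sub key_for { $calls++; return lc $_[0] }    # hypothetical stand-in for normalize()

my @data = qw(d B a C e);

# Key computed inside the sort block: two calls for every comparison.
$calls = 0;
my @slow = sort { key_for($a) cmp key_for($b) } @data;
my $uncached = $calls;

# Key computed once per element in the first map block, then reused.
$calls = 0;
my @fast = map  { $_->[0] }
           sort { $a->[1] cmp $b->[1] }
           map  { [ $_, key_for($_) ] } @data;
my $cached = $calls;

print "uncached: $uncached calls, cached: $cached calls\n";
print "same result\n" if "@slow" eq "@fast";
```

The cached version calls key_for() exactly once per element, while the uncached version's count grows with the number of comparisons, which is why the gap widens on larger inputs.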

If you wanted to sort by more columns, you add them to the anonymous array and use the "||" operator in the sort block to chain more comparisons. A later comparison is only evaluated when the ones before it return 0 (a tie):

sort { $a->[1] cmp $b->[1] || $b->[2] <=> $a->[2] || etc || etc }

There is no limit that I am aware of. You can sort ascending or descending (swap $a and $b to reverse the order) and mix cmp (string comparison) and <=> (numeric comparison) in the same sort block.
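
Here is a minimal sketch of a two-key sort, using a few lines from the sample data posted earlier in the thread. It sorts by the second column ascending and breaks ties by the first column descending; the choice of keys and directions is just for illustration, and normalize() is left out to keep it short.

```perl
use strict;
use warnings;

# A few lines from the sample data posted above.
my @lines = ('bbc&aaa&aaa', 'mmn&aaa&ccc', 'lmn&bbb&aaa', 'aaa&ccc&ddd', 'aaa&aaa&bbb');

my @sorted =
    map  { $_->[0] }
    sort { $a->[2] cmp $b->[2]            # second column, ascending
        || $b->[1] cmp $a->[1] }          # ties broken by first column, descending
    map  { my @f = split /&/, $_, -1;     # -1 keeps trailing empty fields
           [ $_, $f[0], $f[1] ] }         # [original line, column 1, column 2]
    @lines;

print "$_\n" for @sorted;
```

For this input the lines whose second column is aaa come first (mmn, bbc, aaa in descending order of the first column), followed by lmn&bbb&aaa and aaa&ccc&ddd.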