Sorting by length


 
# 8  
Old 11-01-2012
Quote:
Originally Posted by Scrutinizer
Or, instead of tac, expanding on ripat's suggestion:
Code:
awk '{l=length; if(l>m)m=l; w[l]=w[l] $0 RS} END{for(l=m;l>=1;l--) if(w[l])printf "%s",w[l]}' infile

Nice trick for the reverse order!

Edit:
My solution above does not return the expected result on large dictionaries. The w array gets out of sequence. Updated version inspired by Scrutinizer's backwards loop:

Code:
awk '{le=length; if(le>m) m=le; w[le] = (le in w) ? w[le]"\n"$0 : $0} END {for(i=m;i>=1;i--) if(i in w) print w[i]}' infile


Last edited by ripat; 11-01-2012 at 10:34 AM..
# 9  
Old 11-01-2012
Quote:
Originally Posted by ripat
awk oneliner alternative
Code:
awk '{w[length] = w[length] ? w[length]"\n"$0 : $0} END {for(l in w) print w[l]}' file

Quote:
Originally Posted by ripat
For the reverse order, just pipe the output in tac:
Code:
awk '{w[length] = w[length] ? w[length]"\n"$0 : $0} END {for(l in w) print w[l]}' file | tac

Quote:
Originally Posted by ripat
My solution above does not return the expected result on large dictionaries. The w array gets out of sequence.
For your specific AWK implementation, the number of members in the array may affect the order in which the members are retrieved, but, more generally, the problem is that you were depending on unspecified behavior. A solution that gives the desired result with gawk could fail on nawk. A solution that works with version N of some awk implementation could fail on version N+1 of that same implementation. In none of those cases would the implementation be violating the standard.

From Opengroup :: AWK:
Quote:
for (variable in array)

which shall iterate, assigning each index of array to variable in an unspecified order.
The OP never stated their operating system. If it's not GNU/Linux, tac may not be available.
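
If tac is missing, awk itself can do the reversal. This is only a minimal sketch: reverse_lines is a name I made up for illustration, and it holds the whole input in memory, which should be fine for a dictionary of this size.

```shell
# Hypothetical helper (not a standard utility): print lines in reverse
# order without tac, by buffering them and walking the line numbers down.
reverse_lines() {
    awk '{ line[NR] = $0 } END { for (i = NR; i >= 1; i--) print line[i] }' "$@"
}

printf '%s\n' one two three | reverse_lines
```

Because the loop counts down over numeric indices instead of using for (i in line), the output order does not depend on how the awk implementation walks its arrays.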

In my opinion, bipinajith linked to the nicest solution in post #2. The only thing it needs is a cut to strip the length decoration again:
Code:
awk '{print length "\t" $0}' | sort -n | cut -f2-
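
To make the three stages of that pipeline easier to follow, here is the same idea wrapped in a function and run on a few sample words. The function name sort_by_length and the sample words are my own, purely for illustration.

```shell
# Hypothetical helper: sort lines by length, shortest first.
sort_by_length() {
    awk '{ print length($0) "\t" $0 }' "$@" |   # decorate: prepend the line length
        sort -n |                                # sort numerically on that length
        cut -f2-                                 # undecorate: drop the length column
}

printf '%s\n' pear fig banana kiwi | sort_by_length
# prints: fig, kiwi, pear, banana (one per line)
```

Lines of equal length come out in plain sorted order, because sort falls back to comparing the whole line when the numeric keys tie. For longest first, sort -rn in place of sort -n should do, although ties then come out in reverse order as well.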

These days, 40,000 lines isn't very many. Any machine that can run the perl interpreter can make short work of such a file. Unless the sort pipeline will be executed many times in a tight loop, there's no point in sacrificing simplicity, readability, and maintainability for efficiency.

Quote:
Originally Posted by khoremand
I am a newbie to Perl and more accustomed to C programming. The C program I wrote takes ages and I believe Perl or Awk are blazing fast.
Sounds like your C program is buggy. Perhaps you should post your C program to the programming forum for help.

Perl and AWK (gawk, mawk, nawk, busybox awk) are C programs themselves. And since they're general-purpose interpreters, a specialized C program written for this one task should not be outperformed by them.

Regards,
Alister