Sorting a file with frequency on length


 
# 8  
Old 03-21-2013
Sorry, I should have mentioned that I am working in a Windows environment. My apologies.
I tested Yoda's and Alister's code and stepped through both.
They work beautifully until they reach the sort step. With the sort commented out, the output in both cases is a file sorted on length (with slight variations).
Below is Yoda's routine, which produces the message shown after it:
Code:
cmd = "sort -nr -k2 " F
while ((cmd | getline) > 0)
    print
close(cmd)

I understand what the code does but as soon as I execute this part, I get a message
Code:
Input file specified two times.

and the output file remains unchanged, i.e. DOS seems to simply ignore this part of the script.
Is there a workaround, please?
Many thanks
# 9  
Old 03-21-2013
If you can get access to a UNIX environment:

Code:
$ cat temp.x
about 1903238
and 14291859
are 1487971
but 2994482
can 1915289
come 1541623
for 3296048
from 2207336
get 2081392
have 5930242
here 1558771
him 1571291
just 1756270
know 2221467
like 1845600
not 3091071
now 1453264
one 1988291
out 1812292

Code:
$ cat temp.sh
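while read -r word freq; do
  len=${#word}                      # length of the word
  echo "$len $word $freq"           # prefix each line with that length
done < temp.x > temp2.x
sort -n -k 1 -k 3 temp2.x > temp3.x # numeric sort: length, then frequency
cut -d " " -f 2-3 temp3.x           # drop the length column again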

Code:
$ ./temp.sh
now 1453264
are 1487971
him 1571291
out 1812292
can 1915289
one 1988291
get 2081392
but 2994482
not 3091071
for 3296048
and 14291859
come 1541623
here 1558771
just 1756270
like 1845600
from 2207336
know 2221467
have 5930242
about 1903238

I think this is so much simpler to write and maintain. If all you have is awk (or sed, or perl, or whatever), you are crippled. If you have them all in combination, within the context of a shell script, it's incredibly more powerful and easy for handling text data than DOS / Windows.

A minor problem with the above script is that it sorts in ascending order, where I think perhaps you wanted descending order. I don't think it's a big deal. If nothing else, you could reverse the file with tac and start looking at the long words first.
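
The same three steps (prefix each line with the word length, sort, strip the prefix) can also be run as one pipeline with no temp files. This is only a sketch: the function name sort_by_length is made up for illustration, the sample lines are a few rows from temp.x, and the `r` modifiers on both keys are an assumption about the descending order that may have been wanted.

```shell
#!/bin/sh
# Hypothetical helper (the name is illustrative): reads "word freq" lines on stdin.
sort_by_length() {
  while read -r word freq; do
    # Decorate: put the word length in front of each line.
    echo "${#word} $word $freq"
  done |
    # Longest words first; within one length, highest frequency first.
    sort -k 1,1nr -k 3,3nr |
    # Undecorate: drop the length column again.
    cut -d " " -f 2-
}

# A few rows from temp.x, fed in on stdin.
printf '%s\n' 'and 14291859' 'about 1903238' 'come 1541623' 'are 1487971' 'have 5930242' |
  sort_by_length
```

Replacing the printf with `sort_by_length < temp.x` would process the whole file. Each key carries its own `n` and `r` here, so nothing depends on global sort options.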
# 10  
Old 03-21-2013
Quote:
Originally Posted by hanson44
I think this is so much simpler to write and maintain. If all you have is awk (or sed, or perl, or whatever), you are crippled. If you have them all in combination, within the context of a shell script, it's incredibly more powerful and easy for handling text data than DOS / Windows.
If all you have is perl, you are definitely not crippled. Far from it. If all you have is perl, you'll seldom have need for any of the other standard UNIX tools. With perl you can accomplish anything you can accomplish with awk, sed, grep, sort, cut, paste, ls, find, etc. Its distribution even includes a2p and s2p, which, respectively, automatically convert awk and sed scripts to perl scripts. Further, Perl provides access to system interfaces for which there is no standard utility, e.g. stat, and date/time facilities which embarrass the typical date implementation.

Most people who are advised to install cygwin to run sh scripts could easily do without, if they just knew perl.

To quote Rob Pike from a nearly 10 year old Slashdot interview:
Quote:
8) One tool for one job? - by sczimme
Given the nature of current operating systems and applications, do you think the idea of "one tool doing one job well" has been abandoned? If so, do you think a return to this model would help bring some innovation back to software development?

(It's easier to toss a small, single-purpose app and start over than it is to toss a large, feature-laden app and start over.)

Pike:
Those days are dead and gone and the eulogy was delivered by Perl.
For the record, I'm not a fanboy defending his favorite language. Perl is a powerful tool, but I don't use it very often (although, after writing this post, I think I should). :)

Quote:
Originally Posted by hanson44
A minor problem with the above script is that it sorts in ascending order, where I think perhaps you wanted descending order. I don't think it's a big deal. If nothing else, you could reverse the file with tac and start looking at the long words first.
Your code is identical to mine, except it uses a shell while-read loop instead of awk, and temp files instead of pipes. See my sort command (post #3) for how to fix yours.

Regards,
Alister
# 11  
Old 03-21-2013
Going back to post #3, assuming I copied and pasted correctly:

Code:
$ cat xxx.x
the 29962169
and 14291859
you 12345509
for 3296048
not 3091071
but 2994482
say 2345958
she 2123744
get 2081392
one 1988291
can 1915289
out 1812292
him 1571291
who 1543711
are 1487971
now 1453264
was 1399013
that 7834407
have 5930242
with 3983564

$ awk '{print length($1), $0}' xxx.x | sort -n -k1 -k3r | cut -d' ' -f2-
for 3296048
not 3091071
the 29962169
but 2994482
say 2345958
she 2123744
get 2081392
one 1988291
can 1915289
out 1812292
him 1571291
who 1543711
are 1487971
now 1453264
and 14291859
was 1399013
you 12345509
that 7834407
have 5930242
with 3983564

I see a problem. I'm sure there is some way to remedy this. But it seems to not sort correctly to me.

Who am I to disagree with Rob Pike? If he says "those days are dead" then I guess I should do everything with perl. :)

Seriously, I have nothing against others using perl, awk, sed, bash, python, whatever they want. And of course we must avoid "religious argument". I was mainly trying to encourage someone to try using Unix. I use Windows constantly, but do all my software development and text processing on Unix, because it is so much more productive for the kind of high-end work I do. I would grant that if you're stuck on DOS / Windows, perl is probably a good way to process text.

I just use the temp files to make the code clearer.
# 12  
Old 03-21-2013
Quote:
Originally Posted by hanson44
Going back to post #3...
I see a problem. I'm sure there is some way to remedy this. But it seems to not sort correctly to me.
Indeed, that sort command is incorrect. Nice catch (and an amusingly ironic one, since I directed you to it to help you fix your sort).

The problem is that attaching a modifier to a key ('r' on field 3) stops that key from inheriting any "global" modifiers (-n) that may be in effect, so field 3 was being compared as text, in reverse, rather than numerically. I did not test the following, but it should do the trick:
Code:
sort -k1n -k3nr

That should be equivalent to
Code:
sort -n -k1 -k3nr
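
To make the difference concrete, here is a small sketch that runs both sort commands over a few already-decorated lines (length, word, frequency) taken from the listing in post #11. The function names are made up for illustration, and LC_ALL=C is an added assumption that only pins the collation order.

```shell
#!/bin/sh
# Decorated sample lines: length, word, frequency.
sample() {
  printf '%s\n' '3 the 29962169' '3 for 3296048' '3 and 14291859' \
                '4 have 5930242' '4 that 7834407'
}

# Broken: the 'r' on -k3 stops that key from inheriting the global -n,
# so field 3 is compared as text, in reverse order.
broken_sort() { LC_ALL=C sort -n -k1 -k3r; }

# Fixed: each key carries its own 'n', so both compare numerically.
fixed_sort() { LC_ALL=C sort -k1n -k3nr; }

echo "broken:"; sample | broken_sort
echo "fixed:";  sample | fixed_sort
```

With the broken command, "for 3296048" lands ahead of "the 29962169" because "3..." compares higher than "2..." as text, which is the misordering seen in post #11. The fixed command puts "the" first among the three-letter words.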

Regards,
Alister
# 13  
Old 03-21-2013
I didn't know about those sort modifiers being attached to the end of the positions. That's really quite useful and clever.

I tried it with "sort -k 1n -k 3rn" as you suggest and those options produce the desired sort.