Sorting a file with frequency on length

Posted by gimley (Registered User) on Wednesday 20th of March 2013 09:32:19 AM

Hello,
I have a file with the following structure:

Code:

word space Frequency

The file contains around 30,000 headwords, each followed by its frequency. The words have different lengths. What I need is a Perl or AWK script that sorts the file by the length of the headword, shortest to longest, and then sorts each set of words of the same length by frequency, highest first.
At present I do this in Excel using the

Code:

=Len(text)

formula, but this is getting tedious.
A sample input file is given below:

Code:

the 29962169
and 14291859
you 12345509
for 3296048
not 3091071
but 2994482
say 2345958
she 2123744
get 2081392
one 1988291
can 1915289
out 1812292
him 1571291
who 1543711
are 1487971
now 1453264
was 1399013
that 7834407
have 5930242
with 3983564
this 3814998
what 3327049
they 2684414
your 2329896
know 2221467
from 2207336
like 1845600
just 1756270
here 1558771
come 1541623
when 1465219
there 1957160
about 1903238
right 1410555
think 1398723
would 1346905

The expected output would be:

Code:

the 29962169
and 14291859
you 12345509
for 3296048
not 3091071
but 2994482
say 2345958
she 2123744
get 2081392
one 1988291
can 1915289
out 1812292
him 1571291
who 1543711
are 1487971
now 1453264
was 1399013
that 7834407
have 5930242
with 3983564
this 3814998
what 3327049
they 2684414
your 2329896
know 2221467
from 2207336
like 1845600
just 1756270
here 1558771
come 1541623
when 1465219
there 1957160
about 1903238
right 1410555
think 1398723
would 1346905

As you can see, the file has been sorted on length and then on frequency.
Any help would spare me the tedium of loading the file into Excel each time. Many thanks in advance.

gimley
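
One possible approach, offered as a minimal sketch and not a tested solution for the full 30,000-line file: let awk prepend the length of the headword to each line, let sort order on that length and then on the frequency, and strip the helper column afterwards. The function name sort_by_length_then_freq is made up for illustration, and the sketch assumes each line holds exactly one headword and one frequency separated by a single space. For non-ASCII headwords, length() may count bytes or characters depending on the awk implementation and locale.

```shell
#!/bin/sh
# Hypothetical helper (name chosen for illustration): sorts "word frequency"
# lines by word length, shortest first, then by frequency, highest first.
# Reads the files given as arguments, or standard input if there are none.
sort_by_length_then_freq() {
    # 1. awk prepends the length of the headword (field 1) to every line.
    # 2. sort orders numerically on that length (field 1), then in reverse
    #    numeric order on the frequency, which is now field 3.
    # 3. cut removes the helper length column again.
    awk '{ print length($1), $0 }' "$@" |
        sort -k1,1n -k3,3nr |
        cut -d' ' -f2-
}

# Example run on a few lines from the sample input, deliberately shuffled.
printf '%s\n' 'there 1957160' 'have 5930242' 'and 14291859' \
              'that 7834407' 'the 29962169' 'about 1903238' |
    sort_by_length_then_freq
# Prints:
#   the 29962169
#   and 14291859
#   that 7834407
#   have 5930242
#   there 1957160
#   about 1903238
```

For a real file, the call would look like sort_by_length_then_freq input.txt > sorted.txt, where both file names are placeholders. Keeping the length in a temporary first column means the standard sort utility does all the ordering, so nothing has to be held in awk arrays.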
