Reverse sort on delimited chunks within a file


 
# 1  
Old 09-17-2012

Hello,
I have a large file in which name data is organised into sets of homographs. The database has the following structure: each set of homographs, with the corresponding equivalents in Devanagari, is separated from the next set by a blank line. An example will make this clear:
Quote:
#akhal
!akhal=अखाल
! akkal=अक्कल
! akal=अकाल

#akhande
!akhande=अखंडे
! aakhande=आखंडे
! akhnde=अखंडे

#aklash
!aklash=
! akhlas=अख्लास

#akshan
!akshan=अक्षन

#alag
!alag=अलग
! alagh=अलघ
! allagh=अलघ

#alakama
!alakama=अलकमा
! alkama=अलकमा
The revsort routines I have in Gawk/Perl sort in reverse order, but in doing so they do not respect the structure of the file, which gets jumbled up.
I have tried to write a sort in which each set is sorted in reverse order separately, maintaining the integrity of the data structure, but I am quite frustrated with the results. I know the logic, but I just cannot handle delimiting the sets and then sorting in reverse within each set.
As an example of the desired output, the first two sets would look something like this (sorted manually, and correctly, I hope):
Quote:
#akhal
! akkal=अक्कल
! akal=अकाल
!akhal=अखाल

#akhande
! akhnde=अखंडे
!akhande=अखंडे
! aakhande=आखंडे
Many thanks in advance for your help. I work under Windows, so an awk or perl script would be of great use.
# 2  
Old 09-17-2012
Try this gawk solution:

Code:
gawk 'BEGIN { PROCINFO["sorted_in"] = "@ind_str_desc" }
  { delete L
    printf "%s\n",$1
    for(i=2;i<=NF;i++)
       L[gensub(/ /,"","g",$i)]=$i
    for(l in L) printf "%s\n", L[l]
    printf "\n" }' FS='\n' RS='' infile
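
PROCINFO["sorted_in"] and gensub are gawk extensions, and "sorted_in" needs gawk 4.0 or later. Where only a plain awk is available, the same per-set ordering can be sketched with a hand-written insertion sort. This is a sketch under assumptions rather than a drop-in replacement: it is written for a POSIX shell, the function name sort_sets is made up for illustration, LC_ALL=C is set so that keys compare byte by byte, and unlike the array-based version it keeps both lines when two keys are identical once spaces are removed.

```shell
# Portable sketch: reverse-sort each blank-line-delimited set without gawk extensions.
# sort_sets is a hypothetical helper name; it reads the named files, or stdin if none.
sort_sets() {
  LC_ALL=C awk '
    BEGIN { RS = ""; FS = "\n" }   # one record per set, one field per line
    {
      print $1                     # the "#headword" line stays on top
      n = 0
      for (i = 2; i <= NF; i++) {
        key = $i
        gsub(/ /, "", key)         # compare with spaces removed, as the gawk version does
        # insertion sort into K/V, largest key first
        j = n
        while (j >= 1 && (K[j] "") < (key "")) {
          K[j + 1] = K[j]; V[j + 1] = V[j]; j--
        }
        K[j + 1] = key; V[j + 1] = $i
        n++
      }
      for (i = 1; i <= n; i++) print V[i]
      print ""                     # blank line between sets
    }' "$@"
}
```

It would be called as `sort_sets infile`, where infile stands for the data file as in the command above.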

This User Gave Thanks to Chubler_XL For This Post:
# 3  
Old 09-18-2012
Many thanks. Am not in at present. But will definitely get back to you with feedback. Your solutions always work.
Thanks once again

---------- Post updated at 10:59 PM ---------- Previous update was at 09:55 PM ----------

Hello,
Sorry to hassle you, but I am getting a consistent error on line 7 of the code. I tried to correct it in every way I could think of, but the error persists.
Could you please help? I am reproducing the gawk message below:
Code:
gawk: sortonsets.gk:7:     printf "\n" }' FS='\n' RS='
gawk: sortonsets.gk:7:                  ^ Invalid char ''' in expression

Many thanx

---------- Post updated at 11:19 PM ---------- Previous update was at 10:59 PM ----------

Sorry for the goof-up. Guess I was too tired. Here's the working code; I put in comments for clarity.
Code:
BEGIN { PROCINFO["sorted_in"] = "@ind_str_desc" }
  { delete L
    printf "%s\n",$1
    for(i=2;i<=NF;i++)
       L[gensub(/ /,"","g",$i)]=$i
    for(l in L) printf "%s\n", L[l]
    printf "\n" }
# change the record separator from newline to nothing
RS=""
# change the field separator from whitespace to newline
FS="\n"

# 4  
Old 09-18-2012
No problem. It's probably better to put the RS and FS assignments in the BEGIN block.

Code:
BEGIN {
    # change the record separator from newline to empty line
    RS="";

    # change the field separator from whitespace to newline
    FS="\n"; 

    # change array sort order from unsorted to by index descending
    PROCINFO["sorted_in"] = "@ind_str_desc";
}

This User Gave Thanks to Chubler_XL For This Post:
# 5  
Old 09-18-2012
Many thanks. Will try it out and see the output.
For the nonce and out of curiosity: I did want to put the FS and RS at the top, but you had placed them at the end. Does it make any difference to execution?
Many thanks once more for taking pains to make this useful suggestion.
# 6  
Old 09-18-2012
Yes, it does make a difference. Originally I had the assignments on the command line (outside of the single quotes). Assignments given there, in front of the file name, are done after the BEGIN block has run but before the first record of that file is read; only assignments made with -v happen before BEGIN.

Assignments in the program outside of the BEGIN block are executed once for every record, after that record has been read. This is bad, as the first record will be read with the default RS and split with the default FS before the assignments take effect.

I don't tend to change RS in the middle of the code, so I'm not sure what happens to $0 afterwards, but I assume nothing happens to the current record, as it has already been read from the file at this stage.
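
The difference in timing can be checked with a small experiment. The following is a minimal sketch for a POSIX shell and any awk; the sample data and the function names are invented for illustration. Each function reports NF for the first record only, since that is the record affected by where the separators are set.

```shell
# Hypothetical two-set sample: a set of three lines, a blank line, a set of two lines.
sample() { printf '#a\n!a=1\n! b=2\n\n#c\n!c=3\n'; }

# Separators set in BEGIN: record 1 is the whole first set, one field per line.
first_nf_begin() {
  sample | awk 'BEGIN { RS = ""; FS = "\n" } NR == 1 { print NF }'
}

# Separators given as operands after the program text: they are applied
# before the input ("-" is stdin) is read, so record 1 is again the whole set.
first_nf_operand() {
  sample | awk 'NR == 1 { print NF }' RS= FS='\n' -
}

# Separators set in a main rule: record 1 has already been read with the
# default RS, so it is just the first line of the file.
first_nf_rule() {
  sample | awk '{ RS = ""; FS = "\n" } NR == 1 { print NF ": " $0 }'
}

first_nf_begin     # prints 3
first_nf_operand   # prints 3
first_nf_rule      # prints 1: #a
```

In the last case the header line of the first set is treated as a record of its own, which is why the first set comes out wrong when RS and FS are assigned outside BEGIN.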
This User Gave Thanks to Chubler_XL For This Post: