Sorting based on multiple delimiters


 
# 1  
Old 05-02-2011

Hello,
I have data where words are separated by a delimiter, in this case "=".
The number of delimiters in a line can vary from 4 to 8; the norm is 4.
Is it possible to write a script that groups the lines by delimiter count, starting with the highest count and ending with the lowest?
An example is given below:

INPUT
Code:
a=b=c=d=e
a=b=c=d=e=d=g=a=b
a=b=c=d=e=f
a=b=c=d=e=f=g
a=b=c=d=e=f=g=a
a=b=c=d=e=f=g=a=c
a=b=c=d=e=f=h
a=b=n=d=e=f
a=b=p=d=e
h=b=c=d=e=f=g=a

EXPECTED OUTPUT
What I would like is the following:
Code:
8 delimiters
a=b=c=d=e=d=g=a=b
a=b=c=d=e=f=g=a=c
7 delimiters
h=b=c=d=e=f=g=a
a=b=c=d=e=f=g=a
6 delimiters
a=b=c=d=e=f=g
a=b=c=d=e=f=h
5 delimiters
a=b=n=d=e=f
a=b=c=d=e=f
4 delimiters
a=b=c=d=e
a=b=p=d=e

The file is very large, around 300,000 lines.
I know that a regex can do the job, but I don't know how to use regexes in awk or Perl.

Many thanks in advance


Sorry for the multiple postings. My network collapsed just as I was submitting the file.

Moderator's Comments:
Please use [code] and [/code] tags when posting code, data or logs etc. to preserve formatting and enhance readability. Also please refrain from writing subjects in all upper-case letters to get more attention. Thank you.

Last edited by zaxxon; 05-03-2011 at 03:31 AM.. Reason: code tags and subject
# 2  
Old 05-03-2011
Code:
awk -F = 'NR==1{max=NF;min=NF}
         {max=(max>NF)?max:NF;min=(min<NF)?min:NF;a[NF]=(a[NF]=="")?$0:a[NF] ORS $0}
    END{for (i=max;i>=min;i--) {if (a[i]!="") print i-1 " delimiters" ORS a[i]}}' infile

8 delimiters
a=b=c=d=e=d=g=a=b
a=b=c=d=e=f=g=a=c
7 delimiters
a=b=c=d=e=f=g=a
h=b=c=d=e=f=g=a
6 delimiters
a=b=c=d=e=f=g
a=b=c=d=e=f=h
5 delimiters
a=b=c=d=e=f
a=b=n=d=e=f
4 delimiters
a=b=c=d=e
a=b=p=d=e

# 3  
Old 05-03-2011
Hello,
I tested the script on my file, and what I get is the message
0 delimiters
followed by the full set of sample test data.
I checked the script, and the syntax suggests that the file should be sorted by number of delimiters.
What has gone wrong?
I am enclosing the test data as a zip file.
Many thanks
# 4  
Old 05-03-2011
Try this:
Code:
awk -F= '{print NF, $0}' infile | sort -k1 -nr | awk '!d||$1!=d{d=$1; print d-1 " delimiters"}{print $2}'


Last edited by kevintse; 05-03-2011 at 01:29 AM..
# 5  
Old 05-03-2011
Quote:
Originally Posted by gimley
Hello,
I tested the script on my file, and what I get is the message
0 delimiters
followed by the full set of sample test data.
I checked the script, and the syntax suggests that the file should be sorted by number of delimiters.
What has gone wrong?
I am enclosing the test data as a zip file.
Many thanks
I found no problem.

If you are running awk on Solaris, replace the command with nawk or /usr/xpg4/bin/awk.
Code:
awk -F = 'NR==1{max=NF;min=NF}
         {max=(max>NF)?max:NF;min=(min<NF)?min:NF;a[NF]=(a[NF]=="")?$0:a[NF] ORS $0}
    END{for (i=max;i>=min;i--) {if (a[i]!="") print i-1 " delimiters" ORS a[i]}}' test |head -10

6 delimiters
pathan=inayat=khan=rashid=khan=sahebzadi=m
shiv=ram=tandale=ganesh=laxman=hirabai=m
5 delimiters
gore=bibi=sakina=irfanali=tayeba=f
jamadar=aves=ahmed=ashfaque=sherbano=m
ram=tandale=ganesh=laxman=hirabai=m
4 delimiters
kale=amita=bhanudas=shobha=f
lande=amit=chandrabhan=asha=m

---------- Post updated at 04:32 PM ---------- Previous update was at 04:25 PM ----------

Quote:
Originally Posted by kevintse
Try this:
Code:
awk -F= '{print NF, $0}' infile | sort -k1 -nr | awk '!d||$1!=d{d=$1; print d-1 " delimiters"}{print $2}'

Clever approach.

A small adjustment: !a[$1]++ looks cleaner, and -k1 is unnecessary.
Code:
awk -F= '{print NF, $0}' infile | sort -nr |awk '!a[$1]++ {print $1-1 " delimiters" }{print $2}'

# 6  
Old 05-03-2011
Quote:
Originally Posted by rdcwayx
A small adjustment: !a[$1]++ looks cleaner, and -k1 is unnecessary.
Code:
awk -F= '{print NF, $0}' infile | sort -nr |awk '!a[$1]++ {print $1-1 " delimiters" }{print $2}'

!a[$1]++ does look better, but it adds a little more overhead than !d||$1!=d, because it has to increment a[$1] for every line.
And again, a sort key is not useless. It is there for performance: without one, sort has to compare entire lines, while with a key on the first field it only needs to compare the delimiter count. (Strictly, that key is -k1,1; plain -k1 extends to the end of the line.)
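
The pipeline with the key limited to the count could be sketched like this. It assumes POSIX sort key syntax, where -k1,1 stops the key at the end of field 1. The function name count_sort_group is made up for illustration, and the last awk strips the count with sub() rather than printing $2, so lines that happen to contain spaces are kept whole.

```shell
# Decorate each line with its delimiter count, sort on that count only,
# then print a header whenever the count changes.
# count_sort_group is a hypothetical helper name.
count_sort_group() {
    awk -F= '{ print NF - 1, $0 }' "$@" |
    sort -k1,1nr |
    awk '
        NR == 1 || $1 != prev { prev = $1; print prev " delimiters" }
        { sub(/^-?[0-9]+ /, ""); print }    # remove the count prefix
    '
}
```

Call it as count_sort_group infile. The order of lines inside a group is left to sort; where the -s (stable) option is available, for example in GNU sort, adding it should keep the input order.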
# 7  
Old 05-03-2011
Hello,
Unfortunately I am working in Windows and have to fall back on GAWK/NAWK for Windows.
Maybe this is the reason why I get the message
0 delimiters.
I should have mentioned this at the outset. Sorry for the hassle. Is there any workaround?
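
A possible first step, offered as a guess rather than a diagnosis: "0 delimiters" for every line means awk saw a single field per line, so the = separator never took effect. Keeping the program in a file and running it with -f avoids command-line quoting, which the Windows command prompt handles differently from a Unix shell (single quotes are not quote characters there). The file name group.awk is hypothetical, and the heredoc is only the Unix-shell way of creating it; on Windows the same text can be saved with an editor.

```shell
# Write a small check program to a file (group.awk is a made-up name).
cat > group.awk <<'EOF'
BEGIN { FS = "=" }          # separator set inside the program
{ count[NF - 1]++ }         # tally lines per delimiter count
END {
    for (n in count)
        print n " delimiters: " count[n] " lines"
}
EOF
```

Run it as awk -f group.awk infile (or gawk -f group.awk infile). If the counts come out as 4 to 8, the grouping scripts above can be put in a file the same way. If everything is still reported under 0, the problem lies in the data or in how awk is being invoked rather than in the grouping logic.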