How to Sort Records Uniquely?


 
# 1  
Old 02-18-2008
How to Sort Records Uniquely?

I have a file containing many records separated by a % that I would like to sort uniquely (and if possible with a count of dupes) while maintaining the integrity of each record.
File looks like this:
Code:
%
srcip: 5.6.7.8
srcburb: internal
dstip: 1.2.3.4
dstport: 2000
dstburb: external
protocol: 6
%
srcip: 11.12..13.14
srcburb: external
dstip: 3.4.5.6
dstport: 137
dstburb: external
protocol: 6
%
srcip: 5.6.7.8
srcburb: internal
dstip: 1.2.3.4
dstport: 2000
dstburb: external
protocol: 6
%

Output would be:
Code:
%
srcip: 5.6.7.8
srcburb: internal
dstip: 1.2.3.4
dstport: 2000
dstburb: external
protocol: 6
%
srcip: 11.12..13.14
srcburb: external
dstip: 3.4.5.6
dstport: 137
dstburb: external
protocol: 6
%

I tried to use sort -t '%' -u filename but that doesn't keep the records intact.

Any ideas?
# 2  
Old 02-18-2008
Code:
perl -0045 -ne'$/="%";print unless $X{$_}++' filename

Awk:

Code:
awk '!x[$0]++' RS="%" ORS="%" filename

P.S. This will output the unique records, but won't sort the input.

Last edited by radoulov; 02-19-2008 at 04:51 AM.. Reason: RS is a single character, so not only GNU Awk
# 3  
Old 02-18-2008
working towards a solution

[Note: \n is the newline]

Use a sed or tr command to convert each \n to, say, a ~
then another sed or tr to convert each % to a % followed by \n

Now each set of data is on one line.

A sort command can then be used to sort and/or show only the unique records.
For the final output, convert the ~ back to \n.
# 4  
Old 02-18-2008
You mean something like this:

Code:
tr '%' '~' < filename | tr '~\n' '\n@' | sort -u | tr '\n@' '%\n'
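
The same flattening idea can also produce the count of dupes asked for in the first post, by replacing sort -u with sort | uniq -c. What follows is only a sketch under some assumptions: the data never contains an @, the records are delimited by % lines as in the sample, and the function name count_records and the shortened sample records are made up for illustration.

```shell
# Hypothetical sample in the thread's format (records shortened).
sample=$(mktemp)
cat > "$sample" <<'EOF'
%
srcip: 5.6.7.8
dstport: 2000
%
srcip: 11.12.13.14
dstport: 137
%
srcip: 5.6.7.8
dstport: 2000
%
EOF

count_records() {
    # 1. Swap % and newline: each record becomes one line, @ marks the old line breaks.
    # 2. Drop lines that hold no record (the empty first line, the lone trailing @).
    # 3. sort | uniq -c puts a count in front of every distinct record.
    # 4. awk moves the count to the end of the record and restores the line breaks.
    tr '%\n' '\n@' < "$1" |
    grep -v '^@*$' |
    LC_ALL=C sort | uniq -c |
    awk '{
        n = $1                      # the count that uniq -c put in front
        sub(/^ *[0-9]+ /, "")       # strip that count from the record
        gsub(/@/, "\n")             # restore the original line breaks
        printf "%%%scount: %s\n", $0, n
    }
    END { print "%" }'
}

count_records "$sample"
```

Each record comes out once, sorted as a whole string, with an extra "count:" line before the closing %.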

# 5  
Old 02-18-2008
Quote:
Originally Posted by radoulov
P.S. This will output the unique records, but won't sort the input.
It's not clear what key should be used for the sort. But if it were the string as a whole, we could try:

Code:
perl -0045 -ne'$/="%";$X{$_}++ } END { foreach (sort keys %X) { chop; print $_,"Count: ", $X{$_},"\n%"; }' filename

You can change "sort keys" to "sort { EXPR } keys" where EXPR is an expression that compares $a and $b and returns -1, 0, or 1. The implied default is "$a cmp $b" (a string comparison), but you can customize it to sort numerically on a subfield, like:

Code:
foreach (sort {
   # a match in list context returns the captured digits; [0] picks them out
   ($a =~ /dstport: (\d+)/)[0] <=> ($b =~ /dstport: (\d+)/)[0];
} keys %X) ....

# 6  
Old 02-18-2008
Quote:
Originally Posted by radoulov
Code:
perl -0045 -ne'$/="%";print unless $X{$_}++' filename

GNU Awk:

Code:
awk '!x[$0]++' RS="%" ORS="%" filename

P.S. This will output the unique records, but won't sort the input.
Thanks... the awk code seems to be doing the trick. Is there any way to count the # of instances of each record within awk?

Edit:
Found typo and got perl code working... Any way to count instances?

Last edited by earnstaf; 02-18-2008 at 02:58 PM..
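
One possible way to count inside awk is to keep the counter array that the one-liner already builds and print it in an END block. This is a sketch, not radoulov's code: the names count_awk, count, order and n are made up, the sample records are shortened, and the order array is only there because "for (k in array)" visits keys in an unspecified order.

```shell
# Hypothetical sample in the thread's format (records shortened).
sample=$(mktemp)
cat > "$sample" <<'EOF'
%
srcip: 5.6.7.8
dstport: 2000
%
srcip: 11.12.13.14
dstport: 137
%
srcip: 5.6.7.8
dstport: 2000
%
EOF

count_awk() {
    awk '
        BEGIN { RS = "%" }                    # one record per %-delimited chunk
        NF == 0 { next }                      # skip the empty chunks around the first and last %
        !($0 in count) { order[++n] = $0 }    # remember records in first-seen order
        { count[$0]++ }
        END {
            for (i = 1; i <= n; i++)
                printf "%%%scount: %d\n", order[i], count[order[i]]
            print "%"
        }
    ' "$1"
}

count_awk "$sample"
```

The records come out in the order they first appear in the file, each followed by a "count:" line.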
# 7  
Old 02-18-2008
Quote:
Originally Posted by otheus
It's not clear what key should be used for the sort. But if it were the string as a whole, we could try:

Code:
perl -0045 -ne'$/="%";$X{$_}++ } END { foreach (sort keys %X) { chop; print $_,"Count: ", $X{$_},"\n%"; }' filename

You can change "sort keys" to "sort { EXPR } keys" where EXPR is an expression that compares $a and $b and returns -1, 0, or 1. The implied default is "$a cmp $b" (a string comparison), but you can customize it to sort numerically on a subfield, like:

Code:
foreach (sort {
   # a match in list context returns the captured digits; [0] picks them out
   ($a =~ /dstport: (\d+)/)[0] <=> ($b =~ /dstport: (\d+)/)[0];
} keys %X) ....

otheus,

This looks good and I see you have an output for "count", but I'm a little confused about how to use it in practice. The keys, in particular, are something I'm not at all familiar with. If we have it sort on the subfield dstport, and two records match there but have a different srcip, what is the output?

Also, do I need to write a perl script or can this be done from the command line?

I ran the command you have in your original block of code, and it gives the output that radoulov's code gives with a Count at the bottom of each record (which is blank, meaning it just reads "Count:" with nothing filling it.) I would guess that has something to do with the foreach in your next block of code that I'm not really sure how to implement.

Last edited by earnstaf; 02-18-2008 at 03:07 PM..
 