sort and dedup problem


 
# 1  
Old 10-21-2008

I have a file with contents:

1|4|oho hosfadu|
1|3|sdfsd fds|
2|2|sdfg|
2|1|sdf a|
3|5|ouhuh hu|

I would like to do three things to it:
1- sort it on the first two fields
2- get a count of the unique values in the first field
3- write the first two unique rows (uniqueness based on the 1st field) to a file

Step 1:
1|3|sdfsd fds|
1|4|oho hosfadu|
2|1|sdf a|
2|2|sdfg|
3|5|ouhuh hu|

Step 2:
unique count = 3

Step 3:
1|3|sdfsd fds|
1|4|oho hosfadu|
2|1|sdf a|
2|2|sdfg|



Thanks,

- CB
# 2  
Old 10-21-2008
#3 is not clear - the output is not unique, or did you mean "output all rows that match the first two unique numbers in field 1"?
# 3  
Old 10-21-2008
Sorry, let me clarify #3:

Write all rows belonging to the first two unique IDs (first column) to a file.

So that would be:

1|3|sdfsd fds|
1|4|oho hosfadu|
2|1|sdf a|
2|2|sdfg|
# 4  
Old 10-21-2008
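This code is not fine-tuned:
Code:
#!/bin/ksh
echo "Step 1"
# Sort numerically on field 1, then field 2, using '|' as the delimiter
sort -t'|' -k1,1n -k2,2n inputfile > outputfile
echo "Step 2: unique count \c"
# Keep only the first row seen for each distinct value of field 1
awk -F'|' '!arr[$1]++' outputfile > tmp.tmp
cat tmp.tmp | wc -l             # cat is deliberate: wc then prints only the number, no file name
# Store the first two unique field-1 values (not the whole rows) in an array
set -A arr $(head -n 2 tmp.tmp | cut -d'|' -f1)
echo "Step 3:"
# Print every row whose first field matches one of those two values
grep -e "^${arr[0]}|" -e "^${arr[1]}|" outputfile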
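
Step 2 of the script above depends on the awk expression `!arr[$1]++`. The following is a minimal sketch of how that idiom behaves on the sample data from this thread. The array is called `seen` here, and the shell variable names are illustrative assumptions, not part of the original post.

```shell
#!/bin/sh
# Sample rows from the thread, already sorted on fields 1 and 2.
sorted_rows='1|3|sdfsd fds|
1|4|oho hosfadu|
2|1|sdf a|
2|2|sdfg|
3|5|ouhuh hu|'

# seen[$1]++ evaluates to 0 the first time a key appears and to a positive
# number afterwards, so !seen[$1]++ is true only for the first row of each key.
# With no action given, awk prints the rows for which the pattern is true.
first_per_id=$(printf '%s\n' "$sorted_rows" | awk -F'|' '!seen[$1]++')
printf '%s\n' "$first_per_id"

# Counting those rows gives the number of distinct values in field 1.
unique_count=$(printf '%s\n' "$first_per_id" | wc -l | tr -d ' ')
echo "unique count = $unique_count"
```

On the sample data this prints one row each for IDs 1, 2 and 3, followed by `unique count = 3`.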

# 5  
Old 10-21-2008
Thanks. Can you explain your solution to step 3? Also, I chose 2 unique IDs only as an example. That number could change (it will be passed in as a parameter).

So if this is hard-coded, could you provide an option where I can use a parameter?

Thanks,

- CB
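
One possible way to take the number of IDs as a parameter is sketched below. It is not from the thread: the function name `first_n_ids` and its argument order are hypothetical, and it assumes the same '|'-delimited layout with a numeric first field. Because the input is sorted first, all rows for an ID are adjacent, so awk can stop as soon as it reaches an ID beyond the requested count.

```shell
#!/bin/sh
# first_n_ids N FILE  (hypothetical helper, not part of the original script)
# Prints, in sorted order, every row of FILE whose first field is one of the
# first N distinct field-1 values.
first_n_ids() {
    n=$1
    file=$2
    sort -t'|' -k1,1n -k2,2n "$file" |
    awk -F'|' -v n="$n" '
        !($1 in seen) { seen[$1] = 1; count++ }   # new ID: remember and count it
        count > n     { exit }                    # past the Nth ID: stop reading
                      { print }                   # otherwise keep the row
    '
}

# Usage: the count comes from the first script argument, defaulting to 2.
if [ -f inputfile ]; then
    first_n_ids "${1:-2}" inputfile > step3.out
fi
```

With the sample file from the first post and a count of 2, this should write the four rows for IDs 1 and 2. The output name `step3.out` is likewise only a placeholder.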