Find all duplicates based on multiple keys


Shell Programming and Scripting, post 302871601 by unme on Wednesday, 6 November 2013, 12:28:53 PM


Hi All,

Input.txt

Code:

123,ABC,XYZ1,A01,IND,I68,IND,NN
123,ABC,XYZ1,A01,IND,I67,IND,NN
998,SGR,St,R834,scot,R834,scot,NN
985,SGR0399,St,R180,T15,R180,T1,YY
985,SGR0399,St,R180,T15,R180,T1,NN
985,SGR0399,St,R180,T15,R180,T1,NN
2943,SGR?99,St,R68,Scot,R77,Scot,YY
2943,SGR?99,St,R68,Scot,R77,scot,NN

dup.txt

Code:

985,SGR0399,St,R180,T15,R180,T1,YY
985,SGR0399,St,R180,T15,R180,T1,NN
985,SGR0399,St,R180,T15,R180,T1,NN
2943,SGR?99,St,R68,Scot,R77,Scot,YY
2943,SGR?99,St,R68,Scot,R77,scot,NN

uniq.txt

Code:

123,ABC,XYZ1,A01,IND,I68,IND,NN
123,ABC,XYZ1,A01,IND,I67,IND,NN
998,SGR,St,R834,scot,R834,scot,NN

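The goal is to find duplicates based on four columns (1, 2, 4 and 6): every record that shares the same values in all four columns with another record should go to dup.txt, and the rest to uniq.txt. I tried the code below, but all records ended up in uniq.txt. Could you please assist me?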

Code:

sort -t, -k1 -k2 -k4 -k6 Input.txt|awk -F, '{
a[$1]++
b[$2]++
c[$4]++
d[$6]++
y[NR] = $0
} END {
for(i=1; i<=NR; i++)
{
tmp = y[i]
split(tmp,z)
print tmp> (((a[z[1]] && b[z[1]] && c[z[1]] && d[z[1]] )>1) ? "dup.txt" : "uniq.txt")
}
}'
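Why everything lands in uniq.txt: each of the arrays a, b, c and d counts one column on its own, so none of them records whether the combination of columns 1, 2, 4 and 6 repeats. The END block also looks up every array with z[1], and a[...] && b[...] && c[...] && d[...] evaluates to 0 or 1, so comparing it with > 1 is never true. A minimal sketch of a fix, assuming POSIX sh and awk, builds one composite key per line and routes each line by how often that key occurs. The function name split_dups and its optional output-file arguments are hypothetical, not something from the thread.

```shell
#!/bin/sh
# split_dups: hypothetical helper name (not from the thread).
# Usage: split_dups INPUT [DUP_FILE] [UNIQ_FILE]
# Records are duplicates when columns 1, 2, 4 and 6 are all equal.
split_dups() {
    awk -F, -v dup="${2:-dup.txt}" -v uniq="${3:-uniq.txt}" '
    {
        key = $1 FS $2 FS $4 FS $6   # composite key from columns 1, 2, 4, 6
        count[key]++                 # how many records share this key
        keys[NR] = key               # remember each record and its key
        lines[NR] = $0
    }
    END {
        # Second pass over the buffered records, in input order.
        for (i = 1; i <= NR; i++) {
            out = (count[keys[i]] > 1) ? dup : uniq
            print lines[i] > out
        }
    }' "$1"
}
```

For example, split_dups Input.txt writes dup.txt and uniq.txt in the current directory. Because the decision is made in END, every copy of a repeated key (including the first one) goes to dup.txt, and the original line order is kept; the trade-off is that the whole file is held in memory.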

Thanks in advance,
U

Last edited by vbe; 11-06-2013 at 01:34 PM. Reason: missing / hehe
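If the file is too large to buffer in memory, another option is to read it twice: the first pass only counts the composite keys, and the second pass routes each line. Sorting is not required for this. As a side note on the original pipeline, -k1 in sort defines a key that starts at field 1 and runs to the end of the line, so limiting a key to a single field needs the -k1,1 form. This is a sketch assuming POSIX sh and awk and a regular file as input (it cannot read a pipe twice); split_dups_2pass is a hypothetical name.

```shell
#!/bin/sh
# split_dups_2pass: hypothetical helper name (not from the thread).
# Usage: split_dups_2pass INPUT [DUP_FILE] [UNIQ_FILE]
# The same file is passed twice: pass 1 counts keys, pass 2 routes lines.
split_dups_2pass() {
    awk -F, -v dup="${2:-dup.txt}" -v uniq="${3:-uniq.txt}" '
    { key = $1 FS $2 FS $4 FS $6 }     # composite key from columns 1, 2, 4, 6
    NR == FNR { count[key]++; next }   # first pass: count each key only
    { print > ((count[key] > 1) ? dup : uniq) }   # second pass: route the line
    ' "$1" "$1"
}
```

Only the keys are kept in memory, and the output is in the same order as the input, like the buffered version.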
