Shell Programming and Scripting: Post 302871601 by unme on Wednesday 6th of November 2013 12:28:53 PM
Find All duplicates based on multiple keys

Hi All,

Input.txt
Code:
123,ABC,XYZ1,A01,IND,I68,IND,NN
123,ABC,XYZ1,A01,IND,I67,IND,NN
998,SGR,St,R834,scot,R834,scot,NN
985,SGR0399,St,R180,T15,R180,T1,YY
985,SGR0399,St,R180,T15,R180,T1,NN
985,SGR0399,St,R180,T15,R180,T1,NN
2943,SGR?99,St,R68,Scot,R77,Scot,YY
2943,SGR?99,St,R68,Scot,R77,scot,NN

dup.txt
Code:
985,SGR0399,St,R180,T15,R180,T1,YY
985,SGR0399,St,R180,T15,R180,T1,NN
985,SGR0399,St,R180,T15,R180,T1,NN
2943,SGR?99,St,R68,Scot,R77,Scot,YY
2943,SGR?99,St,R68,Scot,R77,scot,NN

uniq.txt
Code:
123,ABC,XYZ1,A01,IND,I68,IND,NN
123,ABC,XYZ1,A01,IND,I67,IND,NN
998,SGR,St,R834,scot,R834,scot,NN

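The goal is to find duplicates based on four columns (1, 2, 4 and 6): records whose key occurs more than once should go to dup.txt and all others to uniq.txt, as shown in the expected files above. I tried the code below, but all records went to uniq.txt. Could you please assist me?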
Code:
sort -t, -k1 -k2 -k4 -k6 Input.txt | awk -F, '{
    a[$1]++
    b[$2]++
    c[$4]++
    d[$6]++
    y[NR] = $0
} END {
    for (i = 1; i <= NR; i++) {
        tmp = y[i]
        split(tmp, z)
        print tmp > (((a[z[1]] && b[z[1]] && c[z[1]] && d[z[1]]) > 1) ? "dup.txt" : "uniq.txt")
    }
}'

Thanks in advance,
U

Last edited by vbe; 11-06-2013 at 01:34 PM.. Reason: mssing / hehe
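
One possible approach, offered as a sketch rather than a definitive answer. The attempt in the post sends every record to uniq.txt for two reasons: `b`, `c` and `d` are all looked up with `z[1]` (the first field) instead of fields 2, 4 and 6, so those lookups come back empty for this data, and `&&` only ever yields 0 or 1, so the result can never be greater than 1. Counting one composite key built from fields 1, 2, 4 and 6 avoids both problems. The version below reads Input.txt twice, once to count keys and once to route records, which keeps the original line order and makes the sort unnecessary. The array name `count` is my own choice, and the key construction assumes no field contains an embedded comma.

```shell
# Recreate the sample input from the post.
cat > Input.txt <<'EOF'
123,ABC,XYZ1,A01,IND,I68,IND,NN
123,ABC,XYZ1,A01,IND,I67,IND,NN
998,SGR,St,R834,scot,R834,scot,NN
985,SGR0399,St,R180,T15,R180,T1,YY
985,SGR0399,St,R180,T15,R180,T1,NN
985,SGR0399,St,R180,T15,R180,T1,NN
2943,SGR?99,St,R68,Scot,R77,Scot,YY
2943,SGR?99,St,R68,Scot,R77,scot,NN
EOF

# Start clean: awk only truncates an output file it actually writes to,
# so a stale dup.txt or uniq.txt from an earlier run could otherwise survive.
rm -f dup.txt uniq.txt

# Pass 1 (NR == FNR): count how often each key (fields 1, 2, 4, 6) occurs.
# Pass 2: send each record to dup.txt if its key occurred more than once,
#         otherwise to uniq.txt.
awk -F, '
NR == FNR { count[$1 FS $2 FS $4 FS $6]++; next }
{ print > (count[$1 FS $2 FS $4 FS $6] > 1 ? "dup.txt" : "uniq.txt") }
' Input.txt Input.txt
```

With the sample data this should put the three 985 records and the two 2943 records in dup.txt and the remaining three records in uniq.txt. Reading the file twice costs a second pass but avoids holding every line in memory, as the `y[NR]` array in the original attempt does.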
 
