Discarding records with duplicate fields Post: 303043654

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Records Duplicate

Hi Everyone, I have a flat file of 1000 unique records like following : For eg Andy,Flower,201-987-0000,12/23/01 Andrew,Smith,101-387-3400,11/12/01 Ani,Ross,401-757-8640,10/4/01 Rich,Finny,245-308-0000,2/27/06 Craig,Ford,842-094-8740,1/3/04 . . . . . . Now I want to duplicate...

2. Shell Programming and Scripting

compare fields in a file with duplicate records

Hi: I've been searching the net but didn't find a clue. I have a file in which, for some records, some fields coincide. I want to compare one (or more) of the dissimilar fields and retain the one record that fulfills a certain condition. For example, on this file: 99 TR 1991 5 06 ...

3. Shell Programming and Scripting

combine duplicate records

I have a .DAT file like below 23666483030000653-B94030001OLFXXX000000120081227 23797049900000654-E71060001OLFXXX000000220081227 23699281320000655 E71060002OLFXXX000000320081227 22885068900000652 B86860003OLFXXX592123320081227 22885068900000652 B86860003ODL-SP592123420081227...

4. UNIX for Dummies Questions & Answers

Getting non-duplicate records

Hi, I have a file with these records abc xyz xyz pqr uvw cde cde In my o/p file , I want all the non duplicate rows to be shown. o/p abc pqr uvw Any suggestions how to do this? Thanks for the help. rs

5. UNIX for Dummies Questions & Answers

Need to keep duplicate records

Consider my input is 10 10 20. Then uniq -u will give 20 and uniq -d will return 10. But I need the output as 10 10. How can we achieve this? Thanks

6. Shell Programming and Scripting

Find duplicate based on 'n' fields and mark the duplicate as 'D'

Hi, In a file, I have to mark duplicate records as 'D' and the latest record alone as 'C'. In the below file, I have to identify if duplicate records are there or not based on Man_ID, Man_DT, Ship_ID and I have to mark the record with latest Ship_DT as "C" and other as "D" (I have to create...

7. Shell Programming and Scripting

Deleting duplicate records from file 1 if records from file 2 match

I have 2 files "File 1" is delimited by ";" and "File 2" is delimited by "|". File 1 below (3 record shown): Doc1;03/01/2012;New York;6 Main Street;Mr. Smith 1;Mr. Jones Doc2;03/01/2012;Syracuse;876 Broadway;John Davis;Barbara Lull Doc3;03/01/2012;Buffalo;779 Old Windy Road;Charles...

8. Shell Programming and Scripting

Remove duplicate records

Hi, I am working on a script that would remove records or lines in a flat file. The only difference in the file is the "NOT NULL" word. Please see below example of the input file. INPUT FILE:> CREATE a ( TRIAL_CLIENT NOT NULL VARCHAR2(60), TRIAL_FUND NOT NULL...

9. Shell Programming and Scripting

Duplicate records

Gents, I have a file which contains duplicate records in column 1, but the values in column 2 are different. 3099753489 3 3099753489 5 3101954341 12 3101954341 14 3102153285 3 3102153285 5 3102153297 3 3102153297 5 I would like to get something like this: output desired...

10. Shell Programming and Scripting

Duplicate records

Gents, Please give a help file --BAD STATUS NOT RESHOOTED-- *** VP 41255/51341 in sw 2973 *** VP 41679/51521 in sw 2973 *** VP 41687/51653 in sw 2973 *** VP 41719/51629 in sw 2976 --BAD COG NOT RESHOOTED-- *** VP 41689/51497 in sw 2974 *** VP 41699/51677 in sw 2974 *** VP...

LEARN ABOUT DEBIAN

tabmerge

TABMERGE(1p)						User Contributed Perl Documentation					      TABMERGE(1p)

NAME

       tabmerge - unify delimited files on common fields

SYNOPSIS

	 tabmerge [action] [options] file1 file2 [...]

       Actions:

	 --min		      Take only fields present in all files [DEFAULT]
	 --max		      Take all fields present
	 -f|--fields=f1[,f2]  Take only the fields mentioned in the
			      comma-separated list

       Options:

	 -l|--list	      List available fields
	 --fs=x 	      Use "x" as the field separator
			      (default is tab "\t")
	 --rs=x 	      Use "x" as the record separator
			      (default is newline "\n")
	 -s|--sort=f1[,f2]    Sort data ASCII-betically on field(s)
	 --stdout	      Print data in original delimited format
			      (i.e., not in a table format)

	 --help 	      Show brief help and quit
	 --man		      Show full documentation

DESCRIPTION

       This program merges the fields -- not the rows -- of delimited text files.  That is, if several files are almost but not quite entirely
       unlike each other in their structure (in their field names, numbers or orders), this script allows you to easily unify the files into one
       file with all the same fields.  The output can be based on fields as determined by the three "action" flags.

       For the following examples, consider three files that contain the following fields:

	 +------------+---------------------------------+
	 | File       | Fields				|
	 +------------+---------------------------------+
	 | merge1.tab | name, type, position		|
	 | merge2.tab | name, type, position, lod_score |
	 | merge3.tab | name, position			|
	 +------------+---------------------------------+

       To list all available fields in the files and the number of times they are present:

	 $ tabmerge --list merge*
	 +-----------+-------------------+
	 | Field     | No. Times Present |
	 +-----------+-------------------+
	 | lod_score | 1		 |
	 | name      | 3		 |
	 | position  | 3		 |
	 | type      | 2		 |
	 +-----------+-------------------+
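
       The counting behind a listing like this can be sketched in a few lines of Python. This is only an illustration of the idea, not tabmerge's own (Perl) implementation; the function name count_fields is hypothetical, and the header lists are copied from the three-file table above.

```python
from collections import Counter

# Header rows of the three example files (field order taken from the table above).
headers = {
    "merge1.tab": ["name", "type", "position"],
    "merge2.tab": ["name", "type", "position", "lod_score"],
    "merge3.tab": ["name", "position"],
}

def count_fields(headers):
    """Return how many files each field name appears in, sorted by field name."""
    counts = Counter()
    for fields in headers.values():
        counts.update(set(fields))  # set() so a repeated column counts once per file
    return dict(sorted(counts.items()))

print(count_fields(headers))
```

       The counts agree with the "No. Times Present" column in the listing.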

       To merge the files on the minimum overlapping fields:

	 $ tabmerge merge*
	 +----------+----------+
	 | name     | position |
	 +----------+----------+
	 | RM104    | 2.30     |
	 | RM105    | 4.5      |
	 | TX5509   | 10.4     |
	 | UU189    | 19.0     |
	 | Xpsm122  | 3.3      |
	 | Xpsr9556 | 4.5      |
	 | DRTL     | 2.30     |
	 | ALTX     | 4.5      |
	 | DWRF     | 10.4     |
	 +----------+----------+
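
       As a rough Python sketch of what a minimum-overlap merge amounts to: intersect the header sets, then stack the rows. The file contents below are hypothetical, reconstructed from the sample output on this page, and load and merge_min are illustrative names rather than anything tabmerge exposes. The alphabetical column order is an assumption that happens to match the examples.

```python
import csv
import io

# Hypothetical contents of the three files (shortened), reconstructed from the
# sample output shown on this page; the real files are not part of the source.
FILES = {
    "merge1.tab": "name\ttype\tposition\nRM104\tRFLP\t2.30\nRM105\tRFLP\t4.5\n",
    "merge2.tab": "name\ttype\tposition\tlod_score\nUU189\tSSR\t19.0\t2.4\n",
    "merge3.tab": "name\tposition\nDRTL\t2.30\n",
}

def load(text, fs="\t"):
    """Parse delimited text into (field names, list of row dicts)."""
    reader = csv.DictReader(io.StringIO(text), delimiter=fs)
    rows = list(reader)
    return reader.fieldnames, rows

def merge_min(files):
    """Keep only the fields that every file has, then stack all rows."""
    tables = [load(text) for text in files.values()]
    common = set(tables[0][0])
    for fields, _ in tables[1:]:
        common &= set(fields)
    fields = sorted(common)  # alphabetical column order is an assumption
    merged = [{f: row[f] for f in fields} for _, rows in tables for row in rows]
    return fields, merged

fields, rows = merge_min(FILES)
print(fields)
for row in rows:
    print(row)
```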

       To merge the files and include all the fields:

	 $ tabmerge --max merge*
	 +-----------+----------+----------+--------+
	 | lod_score | name	| position | type   |
	 +-----------+----------+----------+--------+
	 |	     | RM104	| 2.30	   | RFLP   |
	 |	     | RM105	| 4.5	   | RFLP   |
	 |	     | TX5509	| 10.4	   | AFLP   |
	 | 2.4	     | UU189	| 19.0	   | SSR    |
	 | 1.2	     | Xpsm122	| 3.3	   | Marker |
	 | 1.2	     | Xpsr9556 | 4.5	   | Marker |
	 |	     | DRTL	| 2.30	   |	    |
	 |	     | ALTX	| 4.5	   |	    |
	 |	     | DWRF	| 10.4	   |	    |
	 +-----------+----------+----------+--------+
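
       The all-fields merge is the mirror image: take the union of the field names and leave a blank wherever a file lacks a column. A minimal Python sketch, assuming rows are already parsed into dictionaries; the sample rows are hypothetical (one per file, modelled on the output above) and merge_max is an illustrative name, not tabmerge's API.

```python
# Hypothetical rows, one list per input file, modelled on the example output.
TABLES = [
    [{"name": "RM104", "type": "RFLP", "position": "2.30"}],
    [{"name": "UU189", "type": "SSR", "position": "19.0", "lod_score": "2.4"}],
    [{"name": "DRTL", "position": "2.30"}],
]

def merge_max(tables):
    """Use every field seen in any file; missing values become empty strings."""
    all_fields = set()
    for rows in tables:
        for row in rows:
            all_fields.update(row)  # adds the row's keys (its field names)
    fields = sorted(all_fields)  # alphabetical column order is an assumption
    merged = [
        {f: row.get(f, "") for f in fields}
        for rows in tables
        for row in rows
    ]
    return fields, merged

fields, merged = merge_max(TABLES)
print(fields)
for row in merged:
    print(row)
```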

       To merge and extract just the "name" and "type" fields:

	 $ tabmerge -f name,type merge*
	 +----------+--------+
	 | name     | type   |
	 +----------+--------+
	 | RM104    | RFLP   |
	 | RM105    | RFLP   |
	 | TX5509   | AFLP   |
	 | UU189    | SSR    |
	 | Xpsm122  | Marker |
	 | Xpsr9556 | Marker |
	 | DRTL     |	     |
	 | ALTX     |	     |
	 | DWRF     |	     |
	 +----------+--------+
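
       Picking named fields can be sketched as a projection that keeps the columns in the order requested and fills in blanks where a row has no such field. This is an illustrative Python sketch with hypothetical sample rows; select_fields is not a tabmerge function.

```python
# Hypothetical merged rows; the last one comes from a file with no "type" column.
ROWS = [
    {"name": "RM104", "type": "RFLP", "position": "2.30"},
    {"name": "UU189", "type": "SSR", "position": "19.0", "lod_score": "2.4"},
    {"name": "DRTL", "position": "2.30"},
]

def select_fields(rows, wanted):
    """Project each row onto the wanted fields, in the order they were given."""
    return [{f: row.get(f, "") for f in wanted} for row in rows]

for row in select_fields(ROWS, ["name", "type"]):
    print(row)
```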

       To merge the files on just the "name" and "lod_score" fields and sort on the name:

	 $ tabmerge -f name,lod_score -s name merge*
	 +----------+-----------+
	 | name     | lod_score |
	 +----------+-----------+
	 | ALTX     |		|
	 | DRTL     |		|
	 | DWRF     |		|
	 | RM104    |		|
	 | RM105    |		|
	 | TX5509   |		|
	 | UU189    | 2.4	|
	 | Xpsm122  | 1.2	|
	 | Xpsr9556 | 1.2	|
	 +----------+-----------+
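
       "ASCII-betical" sorting compares raw character codes, so uppercase letters sort before lowercase ones (which is why TX5509 and UU189 precede Xpsm122 above). Python's default string comparison orders by code point, which behaves the same way for ASCII text. A small sketch using the names from the example; sort_rows is a hypothetical helper.

```python
# Names taken from the example tables on this page, in their unsorted order.
ROWS = [{"name": n} for n in
        ["RM104", "RM105", "TX5509", "UU189", "Xpsm122", "Xpsr9556",
         "DRTL", "ALTX", "DWRF"]]

def sort_rows(rows, keys):
    """Sort rows by one or more fields using plain code-point string order."""
    return sorted(rows, key=lambda row: tuple(row.get(k, "") for k in keys))

print([row["name"] for row in sort_rows(ROWS, ["name"])])
```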

       To do the same but mimic the original tab-delimited input:

	 $ tabmerge -f name,lod_score -s name --stdout merge*
	 name	 lod_score
	 ALTX
	 DRTL
	 DWRF
	 RM104
	 RM105
	 TX5509
	 UU189	 2.4
	 Xpsm122 1.2
	 Xpsr9556	 1.2
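
       Writing rows back out in delimited form, with configurable field and record separators, might look like the following Python sketch. The function name to_delimited and the sample rows are hypothetical. Note that this sketch emits an empty trailing field for blank values; whether tabmerge itself does so is not stated in this page.

```python
import csv
import io

def to_delimited(fields, rows, fs="\t", rs="\n"):
    """Serialize rows as delimited text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=fs, lineterminator=rs)
    writer.writerow(fields)
    for row in rows:
        writer.writerow([row.get(f, "") for f in fields])
    return buf.getvalue()

# Hypothetical rows modelled on the example above.
rows = [
    {"name": "ALTX", "lod_score": ""},
    {"name": "UU189", "lod_score": "2.4"},
]
text = to_delimited(["name", "lod_score"], rows)
print(text, end="")
```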

       Why would you want to do this?  Suppose you have several delimited text files with nearly the same structure and want to create just one
       file from them, but the fields may be in a different order in each file and/or some files may contain more or fewer fields than others.
       (As far-fetched as it may seem, it happens to the author more than he'd like.)

SEE ALSO

       o   Text::RecordParser

       o   Text::TabularDisplay

AUTHOR

       Ken Youens-Clark <kclark@cpan.org>.

LICENSE AND COPYRIGHT

       Copyright (C) 2006-10 Ken Youens-Clark.	All rights reserved.

       This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by
       the Free Software Foundation; version 2.

       This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of
       MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for more details.

perl v5.10.1							    2010-07-26							      TABMERGE(1p)
