finding duplicates in columns and removing lines Post: 302189023

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

Removing lines that are (same in content) based on columns

I have a file which looks like AA BB CC DD EE FF GG HH KK AA BB GG HH KK FF CC DD EE AA BB CC DD EE UU VV XX ZZ AA BB VV XX ZZ UU CC DD EE .... I want the script to give me only one line based on duplicate contents: AA BB CC DD EE FF GG HH KK AA BB CC DD EE UU VV XX ZZ

2. Shell Programming and Scripting

Help removing lines with duplicated columns

Hi guys, please could you help me with the following? aaaa bbbb cccc sdsd aaaa bbbb cccc qwer As you can see, the 2 lines match in three fields. How can I delete this duplicate? I mean, delete the second one if 3 fields are duplicated? Thanks

3. Shell Programming and Scripting

Finding duplicates from positioned substring across lines

I have millions of records, each containing exactly 50 characters, and have to check the uniqueness of a 4-character substring of the 50 characters (position known beforehand) and report if any duplicates are found. E.g. data... AAAA00000000000000XXXX0000 0000000000... up to 50 chars...

4. Shell Programming and Scripting

Removing duplicates from string (not duplicate lines)

Please help me get the following: Input Desired output x="foo" foo x="foo foo" foo x="foo foo" foo x="foo abc foo" foo abc x="foo foo1 foo2" foo foo1 foo2 I need to remove duplicates from a string..

5. Shell Programming and Scripting

finding duplicates in csv based on key columns

Hi team, I have CSV files with 20 columns. I want to find the duplicates in a file based on column1, column10, column4, column6, column8 and column2. If those columns have the same values, the record should be treated as a duplicate. Can anyone help me find the duplicates? Thanks in advance. ...

6. Shell Programming and Scripting

Help in removing duplicates

I have an input file abc.txt with info like: abcd rateuse inklite robet rateuse abcd I need to remove duplicates from the file (e.g. abcd, rateuse) and place the contents in the same file abc.txt; if needed, they can be placed in another file. Can anyone help me with this? :(

7. Shell Programming and Scripting

Removing duplicates in fixed width file which has multiple key columns

Hi all, I have a requirement where I need to remove duplicates from a fixed-width file which has multiple key columns. I also need to capture the duplicate records into another file. The file has 8 columns. The key columns are col1 and col2. Col1 has a length of 8; col2 has a length of 3. ...

8. Shell Programming and Scripting

UNIX scripting for finding duplicates and null records in pk columns

Hi, I have a requirement. For example, I have a text file with the pipe symbol (|) as delimiter and 4 columns a, b, c, d. Here a and b are primary key columns. I want to process that file to find duplicates and null values in the primary key columns (a, b). I want to write the unique records in which...

9. Shell Programming and Scripting

Removing duplicates from delimited file based on 2 columns

Hi guys, got a bit of a bind I'm in. I'm looking to remove duplicates from a pipe-delimited file, but do so based on 2 columns. Sounds easy enough, but here's the kicker... Column #1 is a simple ID, which is used to identify the duplicate. Once dups are identified, I need to only keep the one...

10. Shell Programming and Scripting

Removing carriage returns from multiple lines in multiple files of different number of columns

Hello gurus, I have multiple pipe-separated files which have records going over multiple lines. The end-of-line separator is \n, and records going over multiple lines have <CR> as separator. Below is an example from one file. 1|ABC DEF|100|10 2|PQ RS T|200|20 3| UVWXYZ|300|30 4| GHIJKL|400|40...

LEARN ABOUT OSF1

join

join(1) 						      General Commands Manual							   join(1)

NAME

       join - Joins the lines of two files

SYNOPSIS

   Current syntax
       join [-a file_number | -v file_number] [-e string] [-o number.field,...] [-t character] [-1 field] [-2 field] file1 file2

   Obsolescent syntax
       [join] [-a number] [-e string] [-j number | field | number  field] [-o number.field,...] [-t character] file1 file2

       The  join command reads file1 and file2 and joins lines in the files that contain common fields, or otherwise according to the options, and
       writes the results to standard output.

STANDARDS

       Interfaces documented on this reference page conform to industry standards as follows:

       join:  XCU5.0

       Refer to the standards(5) reference page for more information about industry standards and associated tags.

OPTIONS

       -1 field
              Joins on the fieldth field of file1. Fields are decimal integers starting with 1.

       -2 field
              Joins on the fieldth field of file2. Fields are decimal integers starting with 1.

       -a number
              Produces an output line for each unpairable line found in file1 if number is 1, or file2 if number is 2.
              Without -a, join produces output only for lines containing a common field. If both -a 1 and -a 2 are used,
              all unpairable lines will be output.

       -e string
              Replaces empty output fields with string.

       -j [number] field
              Joins the two files on field of file number, where number is 1 for file1 or 2 for file2. If you do not
              specify number, join uses field in each file. Without -j, join uses the first field in each file. The
              default value for both number and field is 1. (Obsolescent)

              If you enter only a 1 or a 2 as an argument to -j, join interprets this argument as the file number
              (number); integers greater than 2 are interpreted as the field number (field). Therefore, if you want to
              specify a field number of 2, you must precede this specification with a number argument; otherwise, the
              join program interprets the 2 as the file number (number).

       -o number.field,...
              Produces output lines consisting of the fields specified in one or more number.field arguments, where
              number is 1 for file1 or 2 for file2, and field is a field number. Multiple -o arguments should be
              separated with commas.

       -t character
              Uses character (a single character) as the field separator character in the input and the output. Every
              appearance of character in a line is significant. The default separator is a space. If you do not specify
              -t, join also recognizes the tab and newline characters as separators.

              With default field separation, the collating sequence is that of sort -b. If you specify -t, the sequence
              is that of a plain sort. To specify a tab character, enclose it in '' (single quotes).

       -v file_number
              Produces an output line for each unpairable line in file_number (where file_number is 1 or 2), instead of
              the default output. If both -v 1 and -v 2 are specified, produces output lines for all unpairable lines.
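
       The -t, -a, -e and -o options are often combined. The following is a minimal sketch, not part of the original
       reference page: the file names left and right and their contents are hypothetical, and both files are assumed
       to be sorted on the first field already.

```shell
# Scratch directory so that no existing files are touched.
tmp=$(mktemp -d)

# Hypothetical pipe-delimited input, already sorted on field 1.
printf '%s\n' '1|ABC' '2|PQR' '3|XYZ' > "$tmp/left"
printf '%s\n' '1|100' '3|300' > "$tmp/right"

# -t '|'   use | as the field separator for input and output
# -a 1     also print lines of left that have no partner in right
# -o ...   select the output fields explicitly
# -e NONE  put NONE where a selected output field would be empty
out=$(join -t '|' -a 1 -e NONE -o 1.1,1.2,2.2 "$tmp/left" "$tmp/right")
printf '%s\n' "$out"

rm -rf "$tmp"
```

       Line 2 has no partner in right, so its 2.2 field is filled with the -e string instead of being left empty.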

OPERANDS

       The pathnames of files to be used as input.  If - (hyphen) is specified for either file, standard input is read.

DESCRIPTION

       The join field is the field in the input files that join looks at to determine what will be included in the output.  One  line  appears	in
       the  output  for  each identical join field appearing in both file1 and file2.  The output line consists of the join field, the rest of the
       line from file1, then the rest of the line from file2.

       Both input files must be sorted according to the collating sequence specified by the LC_COLLATE	environment  variable,	if  set,  for  the
       fields where they are to be joined (usually the first field in each line).

       Fields  are  normally  separated  by a space, a tab character, or a newline character.  In this case, join treats consecutive separators as
       one, and discards leading separators.  Use the -t option to specify another field separator.
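
       Because join expects sorted input, unsorted files are usually passed through sort first. The following is a
       minimal sketch under stated assumptions: the sample data is hypothetical, and LC_ALL=C is chosen here only so
       that sort and join agree on one collating sequence.

```shell
# Scratch directory so that no existing files are touched.
tmp=$(mktemp -d)

# Hypothetical sample data, deliberately unsorted.
printf '%s\n' 'Wozni 555-1234' 'Eisner 555-1234' 'Green 555-2240' > "$tmp/phonedir"
printf '%s\n' 'Green Dept. 311' 'Wozni Dept. 520' 'Eisner Dept. 389' > "$tmp/names"

# Sort both files on the join field (field 1) under one fixed locale,
# then join them under that same locale.
LC_ALL=C sort -k 1,1 "$tmp/phonedir" > "$tmp/phonedir.sorted"
LC_ALL=C sort -k 1,1 "$tmp/names" > "$tmp/names.sorted"
joined=$(LC_ALL=C join "$tmp/phonedir.sorted" "$tmp/names.sorted")
printf '%s\n' "$joined"

rm -rf "$tmp"
```

       Running sort and join under different locales can make join treat correctly sorted input as out of order, which
       is why the sketch sets the same locale on every command.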

EXIT STATUS

       The following exit values are returned:

       0      Successful completion.

       >0     An error occurred.
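
       A script can branch on the exit value, which is 0 on success and greater than 0 on error. The following is a
       minimal sketch; the file names f1, f2 and no_such_file are hypothetical.

```shell
# Scratch directory so that no existing files are touched.
tmp=$(mktemp -d)
printf '%s\n' 'a 1' 'b 2' > "$tmp/f1"
printf '%s\n' 'a x' 'b y' > "$tmp/f2"

# Joining two readable, sorted files should leave the status at 0.
ok_status=0
join "$tmp/f1" "$tmp/f2" > /dev/null || ok_status=$?

# A missing input file should make join report an error.
err_status=0
join "$tmp/f1" "$tmp/no_such_file" > /dev/null 2>&1 || err_status=$?

printf 'success: %s, error: %s\n' "$ok_status" "$err_status"
rm -rf "$tmp"
```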

EXAMPLES

       Note that the vertical alignment shown in these examples may not be consistent with your output.

       To perform a simple join operation on two files, phonedir and names, whose first fields are the same, enter:

              join phonedir names

              If phonedir contains the following telephone directory:

              Binst           555-6235
              Dickerson       555-1842
              Eisner          555-1234
              Green           555-2240
              Hrarii          555-0256
              Janatha         555-7358
              Lewis           555-3237
              Takata          555-5341
              Wozni           555-1234

              and names is this listing of names and department numbers:

              Eisner          Dept. 389
              Frost           Dept. 217
              Green           Dept. 311
              Takata          Dept. 454
              Wozni           Dept. 520

              then join phonedir names displays:

              Eisner          555-1234        Dept. 389
              Green           555-2240        Dept. 311
              Takata          555-5341        Dept. 454
              Wozni           555-1234        Dept. 520

              Each line consists of the join field (the last name), followed by the rest of the line found in phonedir
              and the rest of the line in names.

       To display unmatched lines as well as matched lines, enter:

              join -a 2 phonedir names

              If phonedir contains:

              Binst           555-6235
              Dickerson       555-1842
              Eisner          555-1234
              Green           555-2240
              Hrarii          555-0256
              Janatha         555-7358
              Lewis           555-3237
              Takata          555-5341
              Wozni           555-1234

              and names contains:

              Eisner          Dept. 389
              Frost           Dept. 217
              Green           Dept. 311
              Takata          Dept. 454
              Wozni           Dept. 520

              then join -a 2 phonedir names displays:

              Eisner          555-1234        Dept. 389
              Frost                           Dept. 217
              Green           555-2240        Dept. 311
              Takata          555-5341        Dept. 454
              Wozni           555-1234        Dept. 520

              This performs the same join operation as in the first example, and also lists the lines of names that
              have no match in phonedir. It includes Frost's name and department number in the listing, although there
              is no entry for Frost in phonedir.

       To display selected fields, enter:

              join -o 2.3,2.1,1.2 phonedir names

              This displays the following fields:

              Field 3 of names (Department Number)

              Field 1 of names (Last Name)

              Field 2 of phonedir (Telephone Number)

              If phonedir contains:

              Binst           555-6235
              Dickerson       555-1842
              Eisner          555-1234
              Green           555-2240
              Hrarii          555-0256
              Janatha         555-7358
              Lewis           555-3237
              Takata          555-5341
              Wozni           555-1234

              and names contains:

              Eisner          Dept. 389
              Frost           Dept. 217
              Green           Dept. 311
              Takata          Dept. 454
              Wozni           Dept. 520

              then join -o 2.3,2.1,1.2 phonedir names displays:

              389     Eisner  555-1234
              311     Green   555-2240
              454     Takata  555-5341
              520     Wozni   555-1234

       To perform the join operation on a field other than the first, enter:

              sort -b -k 2,3 phonedir | join -1 2 - numbers

              This combines the lines in phonedir and numbers, comparing the second field of phonedir to the first
              field of numbers.

              First, this sorts phonedir by the second field because both files must be sorted by their join fields.
              The output of sort is then piped to join. The - (dash) by itself causes the join command to use this
              output as its first file. The -1 2 defines the second field of the sorted phonedir as the join field.
              This is compared to the first field of numbers because its join field is not specified with a -2 option.

              If phonedir contains:

              Binst           555-6235
              Dickerson       555-1842
              Eisner          555-1234
              Green           555-2240
              Hrarii          555-0256
              Janatha         555-7358
              Lewis           555-3237
              Takata          555-5341
              Wozni           555-1234

              and numbers contains:

              555-0256
              555-1234
              555-5555
              555-7358

              then sort ... | join ... displays:

              555-0256        Hrarii
              555-1234        Eisner
              555-1234        Wozni
              555-7358        Janatha

              Each number in numbers is listed with the name listed in phonedir for that number. Note that join lists
              all the matches for a given field. In this case, join lists both Eisner and Wozni as having the telephone
              number 555-1234. The number 555-5555 is not listed because it does not appear in phonedir.
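
       The -v option from the SYNOPSIS is not exercised by the examples above. The following is a minimal sketch, not
       part of the original reference page, that reuses the phonedir and names sample data to list only the lines that
       have no partner. LC_ALL=C is an assumption made here so that the sample data counts as sorted.

```shell
# Scratch directory so that no existing files are touched.
tmp=$(mktemp -d)

# The phonedir and names sample data from the examples above.
printf '%s\n' 'Binst 555-6235' 'Dickerson 555-1842' 'Eisner 555-1234' \
    'Green 555-2240' 'Hrarii 555-0256' 'Janatha 555-7358' \
    'Lewis 555-3237' 'Takata 555-5341' 'Wozni 555-1234' > "$tmp/phonedir"
printf '%s\n' 'Eisner Dept. 389' 'Frost Dept. 217' 'Green Dept. 311' \
    'Takata Dept. 454' 'Wozni Dept. 520' > "$tmp/names"

# -v 2: only the lines of names with no matching line in phonedir.
unmatched_names=$(LC_ALL=C join -v 2 "$tmp/phonedir" "$tmp/names")

# -v 1: only the lines of phonedir with no matching line in names.
unmatched_phones=$(LC_ALL=C join -v 1 "$tmp/phonedir" "$tmp/names")

printf '%s\n' "$unmatched_names" "$unmatched_phones"
rm -rf "$tmp"
```

       Unlike -a, which adds the unpairable lines to the normal output, -v prints them in place of it.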

ENVIRONMENT VARIABLES

       The following environment variables affect the execution of join:

       LANG   Provides a default value for the internationalization variables that are unset or null. If LANG is unset
              or null, the corresponding value from the default locale is used. If any of the internationalization
              variables contain an invalid setting, the utility behaves as if none of the variables had been defined.

       LC_ALL If set to a non-empty string value, overrides the values of all the other internationalization variables.

       LC_CTYPE
              Determines the locale for the interpretation of sequences of bytes of text data as characters (for
              example, single-byte as opposed to multi-byte characters in arguments and input files).

       LC_MESSAGES
              Determines the locale for the format and contents of diagnostic messages written to standard error.

       NLSPATH
              Determines the location of message catalogues for the processing of LC_MESSAGES.

SEE ALSO

       Commands:  awk(1), cmp(1), comm(1), cut(1), diff(1), grep(1), paste(1), sdiff(1), sed(1), sort(1), uniq(1)

       Standards:  standards(5)

																	   join(1)
