Shell Programming and Scripting: Extract and exclude rows based on duplicate values
Post 303000174 by RudiC, Wednesday 5th of July 2017, 05:51:15 PM
Try also a sort / uniq pipeline. Sorting on the second "|"-separated field groups identical IDs together; uniq -s4 then skips the first four characters of each line (the three-character first field plus its delimiter) and -w8 compares only the eight-character ID that follows. With -D it prints every line whose ID occurs more than once, and with -u only the lines whose ID occurs exactly once (note that -w and -D are GNU coreutils extensions, so this needs GNU uniq):
Code:
sort -t"|" -k2,2 file | uniq -s4 -w8 -D
abc|NN190486|qqq
def|NN190486|www
jkl|NN607265|rrr
mno|NN607265|ttt
stu|NN615002|uuu
vwx|NN615002|iii
BCD|NN628717|ppp
EFG|NN628717|aaa
HIJ|NN628717|sss
sort -t"|" -k2,2 file | uniq -s4 -w8 -u
ghi|NN603762|eee
pqr|NN613879|yyy
yzA|NN618555|ooo
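
An alternative that does not rely on the key being a fixed-width string at a fixed offset is a two-pass awk over the same file. This is a minimal sketch, assuming the ID is always the second "|"-separated field and using the same sample file name as above; the first pass counts each ID, the second pass prints the matching lines:
Code:
# pass 1 (NR==FNR) counts each ID; pass 2 prints lines whose ID occurs more than once
awk -F"|" 'NR==FNR {cnt[$2]++; next} cnt[$2] > 1' file file

# change the test to == 1 to print only the lines whose ID occurs exactly once
awk -F"|" 'NR==FNR {cnt[$2]++; next} cnt[$2] == 1' file file

Unlike the uniq approach this keeps the original line order and needs no sort, at the cost of reading the file twice.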

 
