Sponsored Content
Top Forums Shell Programming and Scripting Removing duplicates [sort , uniq] Post 302080024 by sharatz83 on Friday 14th of July 2006 04:06:09 PM
Old 07-14-2006
Removing duplicates [sort , uniq]

Hey Guys,
I have file which looks like this,

Contig201#numbPA
Contig1452#nmdynD6PA
dm022p15.r#CG6461PA
dm005e16.f#SpatPA
IGU001_0015_A06.f#CG17593PA

I need to remove duplicates based on the chracter matching upto '#'.

for example if we consider this..

Contig201#numbPA
Contig1452#nmdynD6PA
Contig201#numbPA
Contig1452#nmdynD6PA
dm022px15.r#CG6461PA
Contig201#Xg123
Contig1452#Xg345
dm022px15.r#PA12sy
Contig430#Pp345

I need the output to be ...

Contig201#numbPA
Contig1452#nmdynD6PA
dm022px15.r#CG6461PA
Contig430#Pp345

I know I could have used [sort infile | uniq -w9 > outfile] if the characters upto '#' were of the same length.

peace
sharat
 

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

sort/uniq

I have a file: Fred Fred Fred Jim Fred Jim Jim If sort is executed on the listed file, shouldn't the output be?: Fred Fred Fred Fred Jim Jim Jim (3 Replies)
Discussion started by: jimmyflip
3 Replies

2. Shell Programming and Scripting

Sort, Uniq, Duplicates

Input File is : ------------- 25060008,0040,03, 25136437,0030,03, 25069457,0040,02, 80303438,0014,03,1st 80321837,0009,03,1st 80321977,0009,03,1st 80341345,0007,03,1st 84176527,0047,03,1st 84176527,0047,03, 20000735,0018,03,1st 25060008,0040,03, I am using the following in the script... (5 Replies)
Discussion started by: Amruta Pitkar
5 Replies

3. UNIX for Dummies Questions & Answers

removing duplicates and sort -k

Hello experts, I am trying to remove all lines in a csv file where the 2nd columns is a duplicate. I am try to use sort with the key parameter sort -u -k 2,2 File.csv > Output.csv File.csv File Name|Document Name|Document Title|Organization Word Doc 1.doc|Word Document|Sample... (3 Replies)
Discussion started by: orahi001
3 Replies

4. Shell Programming and Scripting

Help with Uniq and sort

The key is first field i want only uniq record for the first field in file. I want the output as or output as Appreciate help on this (4 Replies)
Discussion started by: pinnacle
4 Replies

5. Shell Programming and Scripting

sort | uniq question

Hello, I have a large data file: 1234 8888 bbb 2745 8888 bbb 9489 8888 bbb 1234 8888 aaa 4838 8888 aaa 3977 8888 aaa I need to remove duplicate lines (where the first column is the duplicate). I have been using: sort file.txt | uniq -w4 > newfile.txt However, it seems to keep the... (11 Replies)
Discussion started by: palex
11 Replies

6. Shell Programming and Scripting

Sort and uniq after comparision

Hi All, I have a text file with the format shown below. Some of the records are duplicated with the only exception being date (Field 15). I want to compare all duplicate records using subscriber number (field 7) and keep only those records with greater date. ... (1 Reply)
Discussion started by: nua7
1 Replies

7. Shell Programming and Scripting

Sort field and uniq

I have a flatfile A.txt 2012/12/04 14:06:07 |trees|Boards 2, 3|denver|mekong|mekong12 2012/12/04 17:07:22 |trees|Boards 2, 3|denver|mekong|mekong12 2012/12/04 17:13:27 |trees|Boards 2, 3|denver|mekong|mekong12 2012/12/04 14:07:39 |rain|Boards 1|tampa|merced|merced11 How do i sort and get... (3 Replies)
Discussion started by: sabercats
3 Replies

8. Shell Programming and Scripting

Uniq or sort -u or similar only between { }

Hi ! I am trying to remove doubbled entrys in a textfile only between delimiters. Like that example but i dont know how to do that with sort or similar. input: { aaa aaa } { aaa aaa } output: { aaa } { (8 Replies)
Discussion started by: fugitivus
8 Replies

9. UNIX for Dummies Questions & Answers

Uniq and sort -u

Hello all, Need to pick your brains, I have a 10Gb file where each row is a name, I am expecting about 50 names in total. So there are a lot of repetitions in clusters. So I want to do a sort -u file Will it be considerably faster or slower to use a uniq before piping it to sort... (3 Replies)
Discussion started by: senhia83
3 Replies

10. Shell Programming and Scripting

Sort & Uniq -u

Hi All, Below the actual file which i like to sort and Uniq -u /opt/oracle/work/Antony/Shell_Script> cat emp.1st 2233|a.k. shukula |g.m. |sales |12/12/52 |6000 1006|chanchal singhvi |director |sales |03/09/38 |6700... (8 Replies)
Discussion started by: Antony Ankrose
8 Replies
UNIQ(1) 						    BSD General Commands Manual 						   UNIQ(1)

NAME
uniq -- report or filter out repeated lines in a file SYNOPSIS
uniq [-c | -d | -u] [-i] [-f num] [-s chars] [input_file [output_file]] DESCRIPTION
The uniq utility reads the specified input_file comparing adjacent lines, and writes a copy of each unique input line to the output_file. If input_file is a single dash ('-') or absent, the standard input is read. If output_file is absent, standard output is used for output. The second and succeeding copies of identical adjacent input lines are not written. Repeated lines in the input will not be detected if they are not adjacent, so it may be necessary to sort the files first. The following options are available: -c Precede each output line with the count of the number of times the line occurred in the input, followed by a single space. -d Only output lines that are repeated in the input. -f num Ignore the first num fields in each input line when doing comparisons. A field is a string of non-blank characters separated from adjacent fields by blanks. Field numbers are one based, i.e., the first field is field one. -s chars Ignore the first chars characters in each input line when doing comparisons. If specified in conjunction with the -f option, the first chars characters after the first num fields will be ignored. Character numbers are one based, i.e., the first character is character one. -u Only output lines that are not repeated in the input. -i Case insensitive comparison of lines. ENVIRONMENT
The LANG, LC_ALL, LC_COLLATE and LC_CTYPE environment variables affect the execution of uniq as described in environ(7). EXIT STATUS
The uniq utility exits 0 on success, and >0 if an error occurs. COMPATIBILITY
The historic +number and -number options have been deprecated but are still supported in this implementation. SEE ALSO
sort(1) STANDARDS
The uniq utility conforms to IEEE Std 1003.1-2001 (``POSIX.1'') as amended by Cor. 1-2002. HISTORY
A uniq command appeared in Version 3 AT&T UNIX. BSD
December 17, 2009 BSD
All times are GMT -4. The time now is 11:32 AM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy