07-14-2006
Removing duplicates [sort , uniq]
Hey Guys,
I have file which looks like this,
Contig201#numbPA
Contig1452#nmdynD6PA
dm022p15.r#CG6461PA
dm005e16.f#SpatPA
IGU001_0015_A06.f#CG17593PA
I need to remove duplicates based on the chracter matching upto '#'.
for example if we consider this..
Contig201#numbPA
Contig1452#nmdynD6PA
Contig201#numbPA
Contig1452#nmdynD6PA
dm022px15.r#CG6461PA
Contig201#Xg123
Contig1452#Xg345
dm022px15.r#PA12sy
Contig430#Pp345
I need the output to be ...
Contig201#numbPA
Contig1452#nmdynD6PA
dm022px15.r#CG6461PA
Contig430#Pp345
I know I could have used [sort infile | uniq -w9 > outfile] if the characters upto '#' were of the same length.
peace
sharat
10 More Discussions You Might Find Interesting
1. UNIX for Dummies Questions & Answers
I have a file:
Fred
Fred
Fred
Jim
Fred
Jim
Jim
If sort is executed on the listed file, shouldn't the output be?:
Fred
Fred
Fred
Fred
Jim
Jim
Jim (3 Replies)
Discussion started by: jimmyflip
3 Replies
2. Shell Programming and Scripting
Input File is :
-------------
25060008,0040,03,
25136437,0030,03,
25069457,0040,02,
80303438,0014,03,1st
80321837,0009,03,1st
80321977,0009,03,1st
80341345,0007,03,1st
84176527,0047,03,1st
84176527,0047,03,
20000735,0018,03,1st
25060008,0040,03,
I am using the following in the script... (5 Replies)
Discussion started by: Amruta Pitkar
5 Replies
3. UNIX for Dummies Questions & Answers
Hello experts,
I am trying to remove all lines in a csv file where the 2nd columns is a duplicate. I am try to use sort with the key parameter
sort -u -k 2,2 File.csv > Output.csv
File.csv
File Name|Document Name|Document Title|Organization
Word Doc 1.doc|Word Document|Sample... (3 Replies)
Discussion started by: orahi001
3 Replies
4. Shell Programming and Scripting
The key is first field i want only uniq record for the first field in file.
I want the output as
or output as
Appreciate help on this (4 Replies)
Discussion started by: pinnacle
4 Replies
5. Shell Programming and Scripting
Hello,
I have a large data file:
1234 8888 bbb
2745 8888 bbb
9489 8888 bbb
1234 8888 aaa
4838 8888 aaa
3977 8888 aaa
I need to remove duplicate lines (where the first column is the duplicate). I have been using:
sort file.txt | uniq -w4 > newfile.txt
However, it seems to keep the... (11 Replies)
Discussion started by: palex
11 Replies
6. Shell Programming and Scripting
Hi All,
I have a text file with the format shown below. Some of the records are duplicated with the only exception being date (Field 15). I want to compare all duplicate records using subscriber number (field 7) and keep only those records with greater date.
... (1 Reply)
Discussion started by: nua7
1 Replies
7. Shell Programming and Scripting
I have a flatfile A.txt
2012/12/04 14:06:07 |trees|Boards 2, 3|denver|mekong|mekong12
2012/12/04 17:07:22 |trees|Boards 2, 3|denver|mekong|mekong12
2012/12/04 17:13:27 |trees|Boards 2, 3|denver|mekong|mekong12
2012/12/04 14:07:39 |rain|Boards 1|tampa|merced|merced11
How do i sort and get... (3 Replies)
Discussion started by: sabercats
3 Replies
8. Shell Programming and Scripting
Hi !
I am trying to remove doubbled entrys in a textfile only between delimiters.
Like that example but i dont know how to do that with sort or similar.
input:
{
aaa
aaa
}
{
aaa
aaa
}
output:
{
aaa
}
{ (8 Replies)
Discussion started by: fugitivus
8 Replies
9. UNIX for Dummies Questions & Answers
Hello all,
Need to pick your brains,
I have a 10Gb file where each row is a name, I am expecting about 50 names in total. So there are a lot of repetitions in clusters.
So I want to do a
sort -u file
Will it be considerably faster or slower to use a uniq before piping it to sort... (3 Replies)
Discussion started by: senhia83
3 Replies
10. Shell Programming and Scripting
Hi All,
Below the actual file which i like to sort and Uniq -u
/opt/oracle/work/Antony/Shell_Script> cat emp.1st
2233|a.k. shukula |g.m. |sales |12/12/52 |6000
1006|chanchal singhvi |director |sales |03/09/38 |6700... (8 Replies)
Discussion started by: Antony Ankrose
8 Replies
UNIQ(1) User Commands UNIQ(1)
NAME
uniq - report or omit repeated lines
SYNOPSIS
uniq [OPTION]... [INPUT [OUTPUT]]
DESCRIPTION
Filter adjacent matching lines from INPUT (or standard input), writing to OUTPUT (or standard output).
With no options, matching lines are merged to the first occurrence.
Mandatory arguments to long options are mandatory for short options too.
-c, --count
prefix lines by the number of occurrences
-d, --repeated
only print duplicate lines, one for each group
-D, --all-repeated[=METHOD]
print all duplicate lines groups can be delimited with an empty line METHOD={none(default),prepend,separate}
-f, --skip-fields=N
avoid comparing the first N fields
--group[=METHOD]
show all items, separating groups with an empty line METHOD={separate(default),prepend,append,both}
-i, --ignore-case
ignore differences in case when comparing
-s, --skip-chars=N
avoid comparing the first N characters
-u, --unique
only print unique lines
-z, --zero-terminated
end lines with 0 byte, not newline
-w, --check-chars=N
compare no more than N characters in lines
--help display this help and exit
--version
output version information and exit
A field is a run of blanks (usually spaces and/or TABs), then non-blank characters. Fields are skipped before chars.
Note: 'uniq' does not detect repeated lines unless they are adjacent. You may want to sort the input first, or use 'sort -u' without
'uniq'. Also, comparisons honor the rules specified by 'LC_COLLATE'.
GNU coreutils online help: <http://www.gnu.org/software/coreutils/> Report uniq translation bugs to <http://translationproject.org/team/>
AUTHOR
Written by Richard M. Stallman and David MacKenzie.
COPYRIGHT
Copyright (C) 2013 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law.
SEE ALSO
comm(1), join(1), sort(1)
The full documentation for uniq is maintained as a Texinfo manual. If the info and uniq programs are properly installed at your site, the
command
info coreutils 'uniq invocation'
should give you access to the complete manual.
GNU coreutils 8.22 June 2014 UNIQ(1)