12-23-2008
Finding duplicates in a fixed-position substring across lines
I have millions of records, each containing exactly 50 characters. I have to check the uniqueness of a 4-character substring within each record (its position is known in advance) and report any duplicates that are found.
E.g. data:
AAAA00000000000000XXXX0000 0000000000... up to 50 chars
AAAA00000000000000XXXY0000 0000000000... up to 50 chars
AAAA00000000000000XXXY0000 0000000000... up to 50 chars
output:
Duplicates are found for XXXY.
I'm new to Unix scripting. Can anyone point me in the right direction?
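One possible direction, as a minimal sketch rather than a definitive answer: let awk count how often each 4-character key occurs and print the keys seen more than once. The function name find_dup_keys and its arguments (file, start column, length) are hypothetical names introduced for illustration. The real start column is not stated in the post, so it has to be passed in.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): report every
# fixed-position key that occurs on more than one line.
#   $1 = input file
#   $2 = 1-based start column of the key (assumption: supplied by the caller)
#   $3 = key length (4 in the question)
find_dup_keys() {
    awk -v s="$2" -v n="$3" '
        # Count each key; substr() uses 1-based positions.
        { count[substr($0, s, n)]++ }
        END {
            # Print one message per key that was seen more than once.
            for (k in count)
                if (count[k] > 1)
                    printf "Duplicates are found for %s.\n", k
        }' "$1"
}
```

Called as, for example, find_dup_keys records.txt 19 4, where records.txt and column 19 are placeholders for the real file name and key position. The order of the reported keys is not defined, because awk does not guarantee an iteration order for arrays. With 4-character keys the counting array stays small even for millions of records.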
~GAP
UNIQ(1) BSD General Commands Manual UNIQ(1)
NAME
uniq -- report or filter out repeated lines in a file
SYNOPSIS
uniq [-cdu] [-f fields] [-s chars] [input_file [output_file]]
DESCRIPTION
The uniq utility reads the standard input comparing adjacent lines, and writes a copy of each unique input line to the standard output. The
second and succeeding copies of identical adjacent input lines are not written. Repeated lines in the input will not be detected if they are
not adjacent, so it may be necessary to sort the files first.
The following options are available:
-c Precede each output line with the count of the number of times the line occurred in the input, followed by a single space.
-d Don't output lines that are not repeated in the input.
-f fields
Ignore the first fields in each input line when doing comparisons. A field is a string of non-blank characters separated from adjacent
fields by blanks. Field numbers are one based, i.e. the first field is field one.
-s chars
Ignore the first chars characters in each input line when doing comparisons. If specified in conjunction with the -f option, the
first chars characters after the first fields fields will be ignored. Character numbers are one based, i.e. the first character is
character one.
-u Don't output lines that are repeated in the input.
If additional arguments are specified on the command line, the first such argument is used as the name of an input file, the second is used
as the name of an output file.
The uniq utility exits 0 on success, and >0 if an error occurs.
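As a rough illustration of the -d and -u options applied to the fixed-position problem above: because uniq only compares adjacent lines, the key is first extracted with cut(1) and sorted. The function names repeated_keys and single_keys are hypothetical, and the column range is an assumption supplied by the caller in cut's list syntax (for example 19-22).

```shell
#!/bin/sh
# Hypothetical helpers built on cut | sort | uniq.
#   $1 = input file
#   $2 = column range in cut(1) list syntax, e.g. 19-22 (assumption)

# Keys that occur on more than one line (uniq -d).
repeated_keys() {
    cut -c "$2" "$1" | sort | uniq -d
}

# Keys that occur on exactly one line (uniq -u).
single_keys() {
    cut -c "$2" "$1" | sort | uniq -u
}
```

The sort step matters: without it, repeated keys that are not on neighbouring lines would go undetected, as the DESCRIPTION above notes.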
COMPATIBILITY
The historic +number and -number options have been deprecated but are still supported in this implementation.
SEE ALSO
sort(1)
STANDARDS
The uniq utility is expected to be IEEE Std 1003.2 (``POSIX.2'') compatible.
BSD January 6, 2007 BSD