
BSD, Linux, and UNIX shell scripting

Finding duplicates from positioned substring across lines

Tags
duplicates, finding duplicates, shell scripts, substring

# 1
12-23-2008
Finding duplicates from positioned substring across lines

I have millions of records, each exactly 50 characters long. I need to check whether a 4-character substring at a position known in advance is unique across all records, and report any duplicates that are found.

Example data:

AAAA00000000000000XXXX0000 0000000000... (continues to 50 chars)
AAAA00000000000000XXXY0000 0000000000... (continues to 50 chars)
AAAA00000000000000XXXY0000 0000000000... (continues to 50 chars)

Expected output:
Duplicates are found for XXXY.

I'm new to Unix scripting. Can anyone point me in the right direction?

~GAP
# 2
12-23-2008
Code:
awk '{ arr[substr($0,19,4)]++ }    # count each 4-char key; 19 is the key column in the sample data
     END { for (i in arr) if (arr[i] > 1) print arr[i], i }' inputfile
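A different technique from the awk answers, sketched here under the assumption of a POSIX shell with cut, sort and uniq: extract the key column with cut, sort it, and let uniq -d print each value that occurs more than once. Because sort can spill to temporary files, this may cope better than an in-memory awk array when the number of distinct keys is very large. The function name find_dup_keys and its defaults (column 19, length 4, taken from the sample data) are hypothetical.

```shell
#!/bin/sh
# find_dup_keys FILE [START] [LEN]   (hypothetical helper name)
# Prints each duplicated key once, in sorted order.
# START defaults to 19 and LEN to 4, matching the sample data (assumption).
find_dup_keys() {
    # cut -cA-B selects characters A..B of every line (B = START + LEN - 1).
    # LC_ALL=C makes sort and uniq compare raw bytes.
    cut -c"${2:-19}-$(( ${2:-19} + ${3:-4} - 1 ))" "$1" |
        LC_ALL=C sort |
        LC_ALL=C uniq -d
}
```

For example, find_dup_keys inputfile 19 4 | sed 's/.*/Duplicates are found for &./' prints the message format asked for in the question.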

# 3
12-24-2008
Code:
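nawk '{
  key = substr($0, 19, 4)    # 4-char key starting at column 19
  count[key]++               # tally each occurrence of the key
}
END {
  for (k in count)
    if (count[k] > 1)
      print "Duplicates are found for " k "."
}' a.txt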
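If it also helps to know where the repeats are, the same awk idea can report duplicates while reading, printing the record number (NR) of each repeat along with the record where that key first appeared. Like the replies above, it keeps one array entry per distinct key, and its output follows input order rather than the unspecified order of a for (k in arr) loop. The helper name dup_report and passing the position and length with -v are assumptions for illustration; the defaults 19 and 4 come from the sample data.

```shell
#!/bin/sh
# dup_report FILE [START] [LEN]   (hypothetical helper name)
# START defaults to 19 and LEN to 4, matching the sample data (assumption).
dup_report() {
    awk -v start="${2:-19}" -v len="${3:-4}" '
    {
        key = substr($0, start, len)   # fixed-position key for this record
        if (key in first)              # seen before: report this repeat now
            printf "Duplicates are found for %s (record %d, first seen at record %d).\n", key, NR, first[key]
        else
            first[key] = NR            # remember where the key first appeared
    }' "$1"
}
```

For the three sample records, dup_report a.txt reports XXXY once, for record 3, with its first occurrence at record 2.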

More UNIX and Linux Forum Topics You Might Find Helpful
Thread Thread Starter Forum Replies Last Post
UNIX scripting for finding duplicates and null records in pk columns praveenraj.1991 Shell Programming and Scripting 5 05-11-2014 04:20 AM
Finding duplicates in a file excluding specific pattern shiva2985 Shell Programming and Scripting 7 06-11-2013 04:09 AM
Finding duplicates then copying, almost there, maybe? Rhinoskin UNIX for Dummies Questions & Answers 2 12-16-2011 12:45 AM
finding duplicates in csv based on key columns baskivs Shell Programming and Scripting 2 11-24-2011 02:28 AM
Help finding non duplicates chipblah84 Shell Programming and Scripting 6 06-03-2011 03:10 AM
How to delete lines in a file that have duplicates or derive the lines that aper once necroman08 Shell Programming and Scripting 3 07-17-2009 05:07 AM
Finding longest common substring among filenames cmcnorgan Shell Programming and Scripting 1 12-12-2008 07:41 PM
finding duplicates in columns and removing lines totus Shell Programming and Scripting 17 11-29-2008 10:27 AM
finding the last substring... cutelucks Shell Programming and Scripting 7 11-04-2006 05:48 AM
finding duplicates with perl dangral Shell Programming and Scripting 3 01-28-2003 11:50 AM


All times are GMT -4.

Unix & Linux Forums Content Copyright©1993-2018. All Rights Reserved.