Shell Programming and Scripting


Finding duplicates from positioned substring across lines

Tags: finding duplicates

#1  12-23-2008
gapprasath, Registered User

I have millions of records, each exactly 50 characters long. I need to check that a 4-character substring at a known, fixed position in each record is unique, and report any duplicates found.

Example data:

AAAA00000000000000XXXX00000000000000... (up to 50 chars)
AAAA00000000000000XXXY00000000000000... (up to 50 chars)
AAAA00000000000000XXXY00000000000000... (up to 50 chars)

Expected output:
Duplicates are found for XXXY.
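Character positions in awk's substr() and in cut -c are 1-based. In the sample records, four A's and fourteen 0's fill columns 1-18, so the 4-character key occupies columns 19-22. A quick sketch for confirming the offset on one sample record before running anything over the full file (show_key and show_key_cut are hypothetical helper names, and columns 19-22 are assumed from the sample data):

```shell
#!/bin/sh
# Print the 4-character key at columns 19-22 of each record read on stdin.
# Columns are 1-based: 4 'A's plus 14 '0's occupy columns 1-18 in the sample.
show_key() {
  awk '{ print substr($0, 19, 4) }'
}

# Same extraction with cut, which also counts character positions from 1.
show_key_cut() {
  cut -c19-22
}

printf '%s\n' 'AAAA00000000000000XXXX0000' | show_key       # prints XXXX
printf '%s\n' 'AAAA00000000000000XXXY0000' | show_key_cut   # prints XXXY
```

If either command prints something other than the expected key, adjust the starting column before using it in a duplicate check.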

I'm new to UNIX scripting. Can anyone point me in the right direction?

~GAP
#2  12-23-2008
jim mcnamara, Forum Staff

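Code:
# count each 4-char key at columns 19-22 (the XXXX/XXXY position in the sample),
# then print the count and the key for every key seen more than once
awk '{ arr[substr($0,19,4)]++ }
      END { for (i in arr) if (arr[i] > 1) print arr[i], i }' inputfile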
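The awk one-liner keeps one array entry per distinct key, so its memory grows with the number of distinct keys rather than the number of records. A sort-based pipeline is another common approach. Here is a sketch, assuming the key sits at columns 19-22 as in the sample data. Sort implementations such as GNU sort fall back to temporary files on large input, so this also copes with very large key sets. dup_keys is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: list each 4-character key (columns 19-22, as in the sample data)
# that occurs on more than one input line. Reads records from stdin.
# dup_keys is a hypothetical helper name; adjust -c19-22 for your layout.
dup_keys() {
  cut -c19-22 |   # keep only the key columns of each record
    sort |        # bring identical keys next to each other
    uniq -d       # print one copy of every key that is repeated
}

# Example, using the three sample records from the question:
printf '%s\n' \
  'AAAA00000000000000XXXX0000' \
  'AAAA00000000000000XXXY0000' \
  'AAAA00000000000000XXXY0000' | dup_keys
```

For the three sample records this prints XXXY. For real data, give the file name to cut directly, e.g. cut -c19-22 inputfile | sort | uniq -d.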

#3  12-24-2008
summer_cherry, Forum Advisor (Registered User)

Code:
nawk '{
str=substr($0,19,4)    # 4-char key at columns 19-22
_[str]++               # count occurrences of each key
}
END{
  for(i in _)
    if(_[i]>1)
       print "Duplicates found for "i
}' a.txt
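Both answers wait until END to report. The following is a single-pass variant, sketched as a hypothetical report_dups helper. It prints each duplicate as soon as the key's second occurrence is read, uses the message format the question asked for, and includes the line numbers of the first two occurrences. The key position and length are parameters, defaulting to 19 and 4 to match the sample data.

```shell
#!/bin/sh
# Sketch: report duplicate keys in the message format from the question,
# with the line numbers where each key was first seen and first repeated.
# report_dups is a hypothetical helper; POS and LEN default to 19 and 4.
report_dups() {
  awk -v pos="${1:-19}" -v len="${2:-4}" '
  {
    key = substr($0, pos, len)          # fixed-position key of this record
    if (key in first) {
      if (!(key in reported)) {         # report each key only once
        printf "Duplicates are found for %s (lines %d and %d).\n", key, first[key], NR
        reported[key] = 1
      }
    } else {
      first[key] = NR                   # remember where the key first appeared
    }
  }'
}

printf '%s\n' \
  'AAAA00000000000000XXXX0000' \
  'AAAA00000000000000XXXY0000' \
  'AAAA00000000000000XXXY0000' | report_dups
# prints: Duplicates are found for XXXY (lines 2 and 3).
```

On a real file, run it as report_dups 19 4 < a.txt. Memory use still grows with the number of distinct keys, as in the END-block versions.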
