Finding duplicates from positioned substring across lines
Post 302271063 by gapprasath on Tuesday 23rd of December 2008 06:20:06 PM

I have millions of records, each exactly 50 characters long. I need to check that a 4-character substring of each record (its position is known in advance) is unique across all records, and report any duplicates that are found.

E.g., data:

AAAA00000000000000XXXX0000 0000000000... (up to 50 chars)
AAAA00000000000000XXXY0000 0000000000... (up to 50 chars)
AAAA00000000000000XXXY0000 0000000000... (up to 50 chars)

output:
Duplicates are found for XXXY.

I'm new to Unix scripting. Can anyone point me in the right direction?

~GAP
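
One possible approach, offered as a minimal sketch rather than a definitive answer: let awk cut the key out of each record with substr() and count how often each key is seen. The function name find_dups, its arguments, and the default start column of 19 are assumptions made for illustration; the post only says the position is known in advance, so substitute the real column and length.

```shell
#!/bin/sh
# find_dups FILE [START] [LEN]
# Prints one line per key that occurs more than once in columns
# START..START+LEN-1 of FILE. The defaults (19 and 4) are assumed values
# for illustration, not values confirmed by the original post.
find_dups() {
    awk -v start="${2:-19}" -v len="${3:-4}" '
        {
            key = substr($0, start, len)   # awk columns are 1-based
            if (++count[key] == 2)         # report each key once, on its second sighting
                print "Duplicates are found for " key "."
        }
    ' "$1"
}

# Demo on three shortened sample records (not the full 50 characters).
# printf pads with exactly 14 zeros, so the key starts at column 19.
sample=$(mktemp)
printf 'AAAA%014d%s0000\n' 0 XXXX 0 XXXY 0 XXXY > "$sample"
find_dups "$sample" 19 4    # prints: Duplicates are found for XXXY.
rm -f "$sample"
```

awk keeps only one counter per distinct key, so memory use depends on the number of different 4-character keys, not on the number of records. If only the duplicated keys themselves are needed, a pipeline such as cut -c19-22 file | sort | uniq -d (again assuming columns 19 to 22) should give the same set of keys.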
 

DOCHECKGROUPS(8)                    InterNetNews Documentation                    DOCHECKGROUPS(8)

NAME
       docheckgroups - Process checkgroups and output a list of changes

SYNOPSIS
       docheckgroups [-u] [include-pattern [exclude-pattern]]

DESCRIPTION
       docheckgroups is usually run by controlchan in order to process checkgroups control messages. It reads a list of newsgroups along with their descriptions on its standard input. That list should be formatted like the newsgroups(5) file: each line contains the name of a newsgroup followed by one or more tabulations and its description.

       docheckgroups will only check the presence of newsgroups which match include-pattern (an egrep expression like "^comp\..*$" for newsgroups starting with "comp.") and which do not match exclude-pattern (also an egrep expression) except for newsgroups mentioned in the pathetc/localgroups file. This file is also formatted like the newsgroups(5) file and should contain local newsgroups which would otherwise be mentioned for removal. There is no need to put local newsgroups of hierarchies for which no checkgroups control messages are sent, unless you manually process checkgroups texts for them. Lines beginning with a hash sign ("#") are not taken into account in this file. All the newsgroups and descriptions mentioned in pathetc/localgroups are appended to the processed checkgroups.

       If exclude-pattern is given, include-pattern should also be given before (you can use an empty string ("") if you want to include all the newsgroups). Be that as it may, docheckgroups will only check newsgroups in the top-level hierarchies which are present in the checkgroups.

       Then, docheckgroups checks the active and newsgroups files and displays on its standard output a list of changes, if any. It does not change anything by default; it only points out what should be changed:

       o   Newsgroups which should be removed (they are in the active file but not in the checkgroups) and the relevant ctlinnd commands to achieve that;

       o   Newsgroups which should be added (they are not in the active file but in the checkgroups) and the relevant ctlinnd commands to achieve that;

       o   Newsgroups which are incorrectly marked as moderated or unmoderated (they are both in the active file and the checkgroups but their status differs) and the relevant ctlinnd commands to fix that;

       o   Descriptions which should be removed (they are in the newsgroups file but not in the checkgroups);

       o   Descriptions which should be added (they are not in the newsgroups file but in the checkgroups).

       The output of docheckgroups can be fed into mod-active (it will pause the news server, update the active file accordingly, reload it and resume the work of the news server) or into the shell (commands for ctlinnd will be processed one by one). In order to update the newsgroups file, the -u flag must be given to docheckgroups.

       When processing a checkgroups manually, it is always advisable to first check the raw output of docheckgroups. Then, if everything looks fine, use mod-active and the -u flag.

OPTIONS
       -u  If this flag is given, docheckgroups will update the newsgroups file: it removes obsolete descriptions and adds new ones. It also sorts this file alphabetically and improves its general format (see newsgroups(5) for an explanation of the preferred number of tabulations).

EXAMPLES
       So as to better understand how docheckgroups works, here are examples with the following active file:

           a.first 0000000000 0000000001 y
           a.second.announce 0000000000 0000000001 y
           a.second.group 0000000000 0000000001 y
           b.additional 0000000000 0000000001 y
           b.third 0000000000 0000000001 y
           c.fourth 0000000000 0000000001 y

       the following newsgroups file (using tabulations):

           a.first             First group.
           a.second.announce   Announce group.
           a.second.group      Second group.
           b.third             Third group.
           c.fourth            Fourth group.

       and the following localgroups file (using tabulations):

           b.additional        A local newsgroup I want to keep.

       The checkgroups we process is in the file test which contains:

           a.first             First group.
           a.second.announce   Announce group. (Moderated)
           a.second.group      Second group.
           b.third             Third group.
           c.fourth            Fourth group.

       If we run:

           cat test | docheckgroups

       docheckgroups will output that a.second.announce is incorrectly marked as unmoderated and that its description is obsolete. Besides, two new descriptions will be mentioned for addition (the new one for a.second.announce and the missing description for b.additional -- it should indeed be in the newsgroups file and not only in localgroups).

       Now that we have checked the output of docheckgroups and that we agree with the changes, we run it with the -u flag to update the newsgroups file and we redirect the standard output to mod-active to update the active file:

           cat test | docheckgroups -u | mod-active

       That's all!

       Now, suppose we run:

           cat test | docheckgroups "^c\..*$"

       Nothing is output (indeed, everything is fine for the c.* hierarchy). It would have been similar if the test file had only contained the checkgroups for the c.* hierarchy (docheckgroups would not have checked a.* and b.*, even if they had been in include-pattern).

       In order to check both a.* and c.*, you can run:

           cat test | docheckgroups "^a\..*$|^c\..*$"

       And if you want to check a.* but not a.second.*, you can run:

           cat test | docheckgroups "^a\..*$" "^a\.second\..*$"

       In our example, docheckgroups will then mention a.second.announce and a.second.group for removal since they are in the active file (the same goes for their descriptions). Notwithstanding, if you do want to keep a.second.announce, just add this group to localgroups and docheckgroups will no longer mention it for removal.

FILES
       pathbin/docheckgroups
           The Shell script itself used to process checkgroups.

       pathetc/localgroups
           The list of local newsgroups along with their descriptions.

HISTORY
       Documentation written by Julien Elie for InterNetNews.

       $Id: docheckgroups.pod 8357 2009-02-27 17:56:00Z iulius $

SEE ALSO
       active(5), controlchan(8), ctlinnd(8), mod-active(8), newsgroups(5).

INN 2.5.2                                   2009-05-21                            DOCHECKGROUPS(8)
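
To get a feel for how an include-pattern and an exclude-pattern narrow down a list of newsgroup names, the selection can be imitated with plain grep -E. This is only a rough sketch: filter_groups is a hypothetical helper, it does not reproduce what docheckgroups does internally (no localgroups handling, no comparison against the active file), and the patterns below escape the literal dots, which is an assumption about the intended form of the expressions.

```shell
#!/bin/sh
# filter_groups INCLUDE [EXCLUDE]
# Hypothetical helper: reads newsgroup names on standard input and prints
# those that match INCLUDE and do not match EXCLUDE. It only mimics the
# pattern selection described in the man page.
filter_groups() {
    include=${1:-.}    # an empty include pattern keeps every non-empty name
    if [ -n "${2:-}" ]; then
        grep -E -- "$include" | grep -Ev -- "$2"
    else
        grep -E -- "$include"
    fi
}

# Newsgroup names taken from the active file used in the EXAMPLES section.
names='a.first
a.second.announce
a.second.group
b.additional
b.third
c.fourth'

# Both a.* and c.*: prints a.first, a.second.announce, a.second.group, c.fourth
printf '%s\n' "$names" | filter_groups '^a\..*$|^c\..*$'

# a.* but not a.second.*: prints a.first
printf '%s\n' "$names" | filter_groups '^a\..*$' '^a\.second\..*$'
```

Trying patterns this way before passing them to docheckgroups can help catch a mistyped expression, but the real output still depends on the active, newsgroups and localgroups files.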