Script to find duplicate pattern in a file irrespective of case


 
# 1  
Old 10-12-2012
We have a configuration file on Unix with entries like the one below. A line that ends with ":" (with no trailing backslash) marks the end of a record. We need to find out whether there are any duplicate entries, such as ABCD, irrespective of case.

Code:
ABCD:\
  :conn.retry.stwait=00.00.30:\
  :sess.pnode.max=255:\
  :sess.snode.max=255:\
  :sess.default=1:\
  :comm.info=abcd.nam.nsroot.net;1364:\
  :pacing.send.count=0:

# 2  
Old 10-12-2012
Many tools support case insensitivity and back-references in their regexes, although regular expressions come in many flavors and extensions. For instance, this sed script finds such duplicates:
Code:
sed -in '
  :loop
  $d
  /\\$/{
    N
    b loop
    }
  /\(:[a-z][^:]*:\).*\1/p
 ' your_in_file

Narrative flow: the intent is for sed to run case-insensitively and with no automatic output. I create a branch target (:loop) so I can loop. If I hit EOF while still collecting a \-continued line, I bail out with a delete. If there is a \ at the end of the line, I read another line into the buffer (N) and recheck for EOF and a trailing \. Once the whole record is in the buffer, I capture with \(\) each colon + letter + any number of non-colons + colon, check whether the same text occurs later (\1), and print the record if it does. Requiring a letter after the colon keeps me from treating inter-field areas such as colon + \ + end-of-line + spaces + colon (':\\\n *:') as a field.

Last edited by DGPickett; 10-12-2012 at 03:46 PM..
# 3  
Old 10-12-2012
This problem seems easier to me in awk than in sed:
Code:
awk -F" *:" '$1=="" {next}
{       if(list[toupper($1)]++)
                printf("%s on line %d has been seen %d times\n",
                        $1, NR, list[toupper($1)])
}' in

In case it isn't obvious what is going on here: this assumes that any line starting with zero or more spaces followed by a colon is a continuation line, and that any other line is the first line of a configuration record. It converts the first field to uppercase and counts how many records have been seen with that name. Each time it finds a duplicate entry, it reports the name, the input line number, and the number of times the name has been seen so far.
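
As a quick check of that description, here is a small worked run. The sample file and the shell variables cfg and dups are made up for illustration; the awk program is the one above, unchanged.

```shell
# Hypothetical sample: ABCD appears twice, differing only in case.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
ABCD:\
  :sess.default=1:\
  :pacing.send.count=0:
EFGH:\
  :sess.default=1:
abcd:\
  :sess.default=2:
EOF

# Same awk program as in the post, reading the sample file.
dups=$(awk -F" *:" '$1=="" {next}
{       if(list[toupper($1)]++)
                printf("%s on line %d has been seen %d times\n",
                        $1, NR, list[toupper($1)])
}' "$cfg")

printf '%s\n' "$dups"   # prints: abcd on line 6 has been seen 2 times
rm -f "$cfg"
```

Only the second and later occurrences are reported, so the first ABCD on line 1 produces no output.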

If your configuration file has comments on lines starting with a particular string, this script can easily be modified to skip them.
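
A minimal sketch of one such modification, assuming comments start with # and that the first line of a record always starts with a letter or digit and ends with a colon followed by a backslash. Those rules, the sample file, and the variables cfg and dups are assumptions for illustration; trailing spaces or carriage returns after the backslash would defeat the end-of-line test.

```shell
# Hypothetical sample: a commented-out record plus a real duplicate.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
#abcd:\
ABCD:\
  :sess.default=1:
Abcd:\
  :sess.default=2:
EFGH:\
  :sess.default=1:
EOF

dups=$(awk -F" *:" '
# Skip anything that is not the first line of a record: a record must
# start with a letter or digit and end with a colon plus backslash.
!/^[A-Za-z0-9]/ || !/:\\$/ {next}
{       if(list[toupper($1)]++)
                printf("%s on line %d has been seen %d times\n",
                        $1, NR, list[toupper($1)])
}' "$cfg")

printf '%s\n' "$dups"   # prints: Abcd on line 4 has been seen 2 times
rm -f "$cfg"
```

The commented-out #abcd line is ignored, so only the ABCD/Abcd pair is counted.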

Last edited by Don Cragun; 10-12-2012 at 06:41 PM.. Reason: Add explanation of how it works
# 4  
Old 10-15-2012
Well, a more structured file would help pick the right tool. awk is more oriented toward delimited fields and can span lines by using alternate separators. sed is fast and simple, and its skills are easy to reuse on a wide variety of problems and work interactively in vi at the : prompt. Here, though, the separators are negative: line feeds that are not escaped.
# 5  
Old 10-16-2012
Thanks Don, it works. I would like to ignore lines that start with spaces or # when looking for the start of a configuration record. A line should count as the start of a record only if it starts with a letter or digit and ends with :\

Code:
ABCD:\

---------- Post updated at 12:07 PM ---------- Previous update was at 12:03 PM ----------

Thanks. I tried the sed command, but I am getting the error below.

Code:
sed: illegal option -- i


Last edited by Scott; 10-17-2012 at 12:52 PM.. Reason: Code tags
# 6  
Old 10-16-2012
My bad: -i is in-place edit (also very handy), not case-insensitive matching. For a case-insensitive regex you need the I modifier, as documented in the GNU manual "sed, a stream editor":

Code:
 /regexp/I
 \%regexp%I 
                        The I modifier to regular-expression matching is a GNU extension
                         that causes the regexp to be matched in a case-insensitive manner.
 
 
$ sed -n '
  :loop
  /\\$/{
    $d
    N
    b loop
    }
  /\(:[a-z][^:]*:\).*\1/Ip
 ' your_in_file


Last edited by DGPickett; 10-16-2012 at 05:50 PM..
# 7  
Old 10-17-2012
Thanks. Still no luck; I am getting the error below.

Code:
sed:   /\(:[a-z][^:]*:\).*\1/Ip is not a recognized function.


Last edited by Scott; 10-17-2012 at 12:52 PM.. Reason: Code tags
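
The "not a recognized function" message suggests a sed without the GNU-only I modifier; that is an assumption, since the thread never says which system is in use. A minimal sketch that avoids the modifier is to fold the input to lower case with tr first and then use a plain p. The sample file and the variables cfg and hits are made up for illustration. In this sketch the $d bail-out sits inside the continuation test, so a complete final record is still checked, and it relies on . matching the newlines embedded in the joined record, which holds for GNU sed but should be verified elsewhere. Note that the pattern reports a :field: entry repeated within one record, and that the output is lower-cased.

```shell
# Hypothetical sample: the first record repeats a field, differing only in case.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
ABCD:\
  :sess.default=1:\
  :Sess.Default=1:\
  :pacing.send.count=0:
EFGH:\
  :sess.default=1:
EOF

# Fold case up front so the regex needs no case-insensitive flag.
hits=$(tr '[:upper:]' '[:lower:]' < "$cfg" | sed -n '
  :loop
  /\\$/{
    $d
    N
    b loop
    }
  /\(:[a-z][^:]*:\).*\1/p
')

printf '%s\n' "$hits"   # prints the whole abcd record, lower-cased
rm -f "$cfg"
```

If the tr on the target system does not accept character classes, 'A-Z' 'a-z' or '[A-Z]' '[a-z]' may be needed instead, depending on the implementation.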