Script to find duplicate pattern in a file irrespective of case
Post 302714751 by DGPickett on Friday 12th of October 2012 02:37:41 PM
Many tools support case-insensitive matching and back-references in their regular expressions, though regex flavors and extensions vary from tool to tool. For instance, this sed script finds lines where a colon-delimited field occurs more than once:
Code:
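sed -n '
  :loop
  /\\$/{
    # still inside a \ continued line: bail out at EOF, else append next line
    $d
    N
    b loop
    }
  # I = case-insensitive match (GNU sed extension)
  /\(:[a-z][^:]*:\).*\1/Ip
 ' your_in_file

Narrative flow: sed runs with no automatic output (-n). Case-insensitive matching comes from the I modifier on the final address, a GNU sed extension; -i is not an ignore-case option in sed (in GNU sed, -in means edit in place with backup suffix "n"). I create a branch target so I can loop. If there is a \ at the end of the line, I read another line into the buffer and recheck for \ at end of line; if I hit EOF while still collecting a \ continued line, I bail out with a delete. That EOF check sits inside the continuation block so that an ordinary last line still gets tested. Then I capture with \(\) each colon + letter + any number of non-colons + colon, and see if it occurs again later with \1; if so, print. The letter keeps me from grabbing inter-field areas like colon + \ + end-of-line + spaces + colon (':\\\n *:') as a field.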
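
To see the matching in action, here is a small sketch that wraps the same approach in a shell function and feeds it sample records. The function name find_dups and the sample data are made up for illustration, and it assumes GNU sed (the I address modifier is not in POSIX sed):

```shell
#!/bin/sh
# find_dups: hypothetical helper; prints records in which a colon-delimited
# field occurs twice, ignoring case. Reads the named files, or stdin if none.
find_dups() {
  sed -n '
    :loop
    /\\$/{
      # unfinished continuation at EOF: drop it; otherwise append next line
      $d
      N
      b loop
    }
    # I = case-insensitive match (GNU sed extension)
    /\(:[a-z][^:]*:\).*\1/Ip
  ' "$@"
}

# Made-up sample data; rec3 is continued onto a second line with a backslash.
find_dups <<'EOF'
rec1:alpha:beta:Alpha:
rec2:one:two:three:
rec3:red:\
  :green:RED:
rec4:x:y:X:
EOF
```

With this input, rec1, both lines of rec3, and rec4 should be printed, while rec2 is skipped because none of its fields repeat. Note that the repeated field needs its own pair of colons, so two identical fields sharing a single colon between them (as in ':abc:abc:') are not reported by this pattern.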

Last edited by DGPickett; 10-12-2012 at 03:46 PM.