Remove duplicate occurrences of text pattern


 
# 1  
Old 01-14-2016

Hi folks!

I have a file that contains 1000 lines. On each line I have multiple occurrences (26, to be exact) of the pattern folder#/folder#.

Here # stands for the line number in the file.

Code:
some text here folder1/folder1 some text here folder1/folder1 some text here folder1/folder1 some text here 
some text here folder2/folder2 some text here folder2/folder2 some text here folder2/folder2 some text here 
some text here folder3/folder3 some text here folder3/folder3 some text here folder3/folder3 some text here 
.... up to line 1000

I'm trying to remove the duplicate occurrences so that I end up with the following:
Code:
some text here folder1/ some text here folder1/ some text here folder1/ some text here 
some text here folder2/ some text here folder2/ some text here folder2/ some text here 
some text here folder3/ some text here folder3/ some text here folder3/ some text here 
.... up to line 1000

Thanks so much for any help!

Last edited by martinsmith; 01-15-2016 at 12:10 AM.. Reason: Describing the issue in more detail
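
To try out the replies below, a test file in the shape described above can be generated first. This is only a sketch: the function name make_sample, the filler text, and the use of 3 occurrences per line (instead of the 26 mentioned) are assumptions made for illustration.

```shell
# Hypothetical helper: print N sample lines shaped like the example above.
# Usage: make_sample 1000 > Input_file
make_sample() {
    awk -v n="${1:-3}" 'BEGIN {
        for (i = 1; i <= n; i++) {
            # Each line carries its own line number in the folder name.
            f = "folder" i "/folder" i
            print "some text here " f " some text here " f " some text here " f " some text here"
        }
    }'
}
```

Running make_sample 1000 > Input_file gives a 1000-line file that the commands in the following posts can be run against.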
# 2  
Old 01-14-2016
Hello martinsmith,

Could you please try the following and let me know if this helps.
1st: If your complete data looks exactly like the sample you have shown, meaning each line carries the label Line with its own line number and Input_file has no more than 4 fields per line, then the following may help.
Code:
awk '{Line="Line" NR":";Folder="folder" NR"/";print Line OFS Folder OFS Folder OFS Folder}' Input_file

2nd: If your data may differ, for example different line numbers or a varying number of columns (assuming that each field to be collapsed consists of only 2 parts separated by /), then the following may help.
Code:
awk '{for(i=2;i<=NF;i++){split($i, A,"/");if(A[1]==A[2]){Q=Q?Q OFS A[1] "/":A[1] "/"} else {Q=Q?Q OFS $i:$i};}print $1 OFS Q;Q=""}'  Input_file

The output will be as follows in both cases.
Code:
Line1: folder1/ folder1/ folder1/
Line2: folder2/ folder2/ folder2/
Line3: folder3/ folder3/ folder3/
Line4: folder4/ folder4/ folder4/
Line5: folder5/ folder5/ folder5/

Thanks,
R. Singh

Last edited by RavinderSingh13; 01-14-2016 at 11:41 PM.. Reason: Added a comment now to solution.
# 3  
Old 01-15-2016
Hi R. Singh,

Thanks very much. Your solution does work. Unfortunately I did not describe my issue clearly enough, so it did not work for my problem. I have updated the question with more clarification.

So basically, each line contains a whole bunch of different text, and within that text there are 26 occurrences of folder#/folder# at various places. I just need the duplicate part removed.

Thanks so much!
# 4  
Old 01-15-2016
Hello martinsmith,

I can see that my 2nd command from post #2 works; here it is again.
Let's say we have the following Input_file:
Code:
some text here folder1/folder1 some text here folder1/folder1 some text here folder1/folder1 some text here 
some text here folder2/folder2 some text here folder2/folder2 some text here folder2/folder2 some text here 
some text here folder3/folder3 some text here folder3/folder3 some text here folder3/folder3 some text here

When I run the code as follows:
Code:
awk '{for(i=2;i<=NF;i++){split($i, A,"/");if(A[1]==A[2]){Q=Q?Q OFS A[1] "/":A[1] "/"} else {Q=Q?Q OFS $i:$i};}print $1 OFS Q;Q=""}'  Input_file

The output will be as follows:
Code:
some text here folder1/ some text here folder1/ some text here folder1/ some text here
some text here folder2/ some text here folder2/ some text here folder2/ some text here
some text here folder3/ some text here folder3/ some text here folder3/ some text here

Thanks,
R. Singh
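
For readers who find the awk one-liner in post #4 hard to follow, the same field-by-field idea can be written out in full. This is a sketch rather than the poster's code: the function name collapse_fields and the variable names are assumptions, and it differs slightly in that it only collapses fields made of exactly two identical, non-empty parts.

```shell
# Hypothetical wrapper around the field-by-field approach, written out in full.
# Usage: collapse_fields Input_file
collapse_fields() {
    awk '{
        out = ""
        for (i = 1; i <= NF; i++) {
            field = $i
            # Collapse only an exact two-part "name/name" field to "name/".
            if (split(field, part, "/") == 2 && part[1] != "" && part[1] == part[2])
                field = part[1] "/"
            # Rebuild the line, joining fields with the output field separator.
            out = (i == 1 ? field : out OFS field)
        }
        print out
    }' "$@"
}
```

Like the one-liner, this rebuilds each line from its fields, so runs of whitespace are reduced to a single space and trailing whitespace is dropped.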
# 5  
Old 01-15-2016
Please try:
Code:
perl -pe 's:(folder\d+)/\1:$1/:g' martinsmith.file

Output:
Code:
some text here folder1/ some text here folder1/ some text here folder1/ some text here
some text here folder2/ some text here folder2/ some text here folder2/ some text here
some text here folder3/ some text here folder3/ some text here folder3/ some text here

# 6  
Old 01-15-2016
Hi R. Singh,

Yes, the 2nd one worked perfectly. I overlooked it!
Code:
awk '{for(i=2;i<=NF;i++){split($i, A,"/");if(A[1]==A[2]){Q=Q?Q OFS A[1] "/":A[1] "/"} else {Q=Q?Q OFS $i:$i};}print $1 OFS Q;Q=""}'  Input_file

Thanks so much for your help. It's much appreciated and will save me a lot of work.

Cheers
# 7  
Old 01-15-2016
You could also try:
Code:
sed 'sX \([^ ]*\)/\1X \1/Xg' file
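
A note on how the sed command above works: X is used as the delimiter, \([^ ]*\) captures a run of non-space characters that follows a space, and /\1 requires the same text again after the slash. Two things follow from that. A pair at the very start of a line is not touched because the pattern needs a leading space, and since nothing anchors the end of the match, a field such as folder1/folder10 would become folder1/0. Neither case occurs in the sample data. If it matters, a stricter variant might look like the sketch below; the function name collapse_folders and the restriction to folder followed by digits are assumptions based on the sample input.

```shell
# Hypothetical helper; the name and the folder<digits> restriction are assumptions.
# Usage: collapse_folders file
collapse_folders() {
    # 1st expression: the repeated name must be followed by a non-digit,
    #                 which is captured and put back unchanged.
    # 2nd expression: handles a repeated name at the very end of the line.
    sed -e 's#\(folder[0-9][0-9]*\)/\1\([^0-9]\)#\1/\2#g' \
        -e 's#\(folder[0-9][0-9]*\)/\1$#\1/#' "$@"
}
```

Unlike the awk approach, this leaves all other whitespace on the line exactly as it was.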
