Difficulty cleaning references to duplicated images in HTML code | Unix Linux Forums | Shell Programming and Scripting




Tags
csv, html, images, replace, search

    #1  
Old 01-30-2013
mdart mdart is offline
Registered User
 
Join Date: Jan 2013
Last Activity: 28 June 2013, 5:04 PM EDT
Location: Brazil
Posts: 5
Thanks: 3
Thanked 0 Times in 0 Posts

Hi,

I need to search and replace references to duplicated images in HTML code. There are several groups of duplicated images, which are visually the same, but with different filenames. I managed to find the duplicated files themselves, but now I need to clean the code too. I have a CSV file with each group of duplicated images organized:


Code:
Group ID,Duplicated image filename, Number of duplicates
0,13429.png,3 
0,18064.png,3
0,25025.png,3
1,14136.png,4
1,17382.png,4
1,19243.png,4
1,25389.png,4
2,21560.png,2
2,5529.png,2
3,3523.png,2
3,4811.png,2

and so on...

The references to duplicated images are scattered throughout hundreds of HTML files. The task is to make every <img> tag that references a duplicate point to a single image per group. I'm wondering if some script magic could get it done easily.

HTML (before): different files, same visual appearance

Code:
<!-- group 0 -->
<img src="13429.png" />...text...<img src="18064.png" />...text...<img src="18064.png" />

<!-- group 1 -->
<img src="14136.png" />...text...<img src="17382.png" />...text...<img src="19243.png" />...text...<img src="25389.png" />

<!-- group 2 -->
<img src="21560.png" />...text...<img src="5529.png" />


HTML (after): unique file in each group

Code:
<!-- group 0 -->
<img src="13429.png" />...text...<img src="13429.png" />...text...<img src="13429.png" />

<!-- group 1 -->
<img src="14136.png" />...text...<img src="14136.png" />...text...<img src="14136.png" />...text...<img src="14136.png" />

<!-- group 2 -->
<img src="21560.png" />...text...<img src="21560.png" />

I searched for some solutions here in the forum, with no success.

Any help you can give would be greatly appreciated.

Last edited by mdart; 01-30-2013 at 01:32 PM..
    #2  
Old 01-30-2013
RudiC RudiC is online now Forum Advisor  
Registered User
 
Join Date: Jul 2012
Last Activity: 31 July 2014, 7:27 AM EDT
Location: Aachen, Germany
Posts: 3,939
Thanks: 63
Thanked 936 Times in 888 Posts
Not sure I understand what you want to accomplish. Can I paraphrase it like this: in all selected files, replace every occurrence of the second and following members of each group with the first, i.e. 18064.png and 25025.png with 13429.png; 17382.png, 19243.png and 25389.png with 14136.png; and so on?
    #3  
Old 01-30-2013
mdart
@RudiC: Yes, that's correct. Sorry if I wasn't very clear.
    #4  
Old 01-30-2013
RudiC
OK, try this very crude approach, which may need serious polishing:
Code:
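awk -F, 'NR==FNR {f=$2; gsub(/[.]/, "[.]", f); Ar[$1]=Ar[$1](Ar[$1]?"|":"")f  # dots made literal; per group: name1|name2|...
                  if (!Rr[$1])Rr[$1]=$2; next}                                   # first name in a group is the replacement
         {for (i in Ar) gsub (Ar[i], Rr[i])}                                     # rewrite every listed name to that first name
         1                                                                       # print the (possibly modified) line
        ' file file1   # file = the CSV list, file1 = the HTML input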
<!-- group 0 -->
<img src="13429.png" />...text...<img src="13429.png" />...text...<img src="13429.png" />

<!-- group 1 -->
<img src="14136.png" />...text...<img src="14136.png" />...text...<img src="14136.png" />...text...<img src="14136.png" />

<!-- group 2 -->
<img src="21560.png" />...text...<img src="21560.png" />
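
A possible refinement, offered as a sketch rather than a drop-in fix: because the pattern is a plain alternation, a short name such as 5529.png also matches inside a longer one such as 15529.png. The variant below builds an exact lookup table instead and rewrites a name only when the whole file name inside a src="..." attribute is listed in the CSV. The sample files dups.csv and page.html, the img/ directory prefix, and the assumption that every reference is written with double quotes are mine, not the original poster's.

```shell
#!/bin/sh
# Sketch: exact-name replacement instead of a regex alternation.
# dups.csv and page.html are hypothetical sample files created for this demo.
dir=$(mktemp -d)
cat > "$dir/dups.csv" <<'EOF'
Group ID,Duplicated image filename, Number of duplicates
2,21560.png,2
2,5529.png,2
EOF
cat > "$dir/page.html" <<'EOF'
<img src="5529.png" />...text...<img src="img/15529.png" />...text...<img src="img/5529.png" />
EOF

awk -F, '
    NR==FNR { if (FNR > 1) {                            # skip the CSV header
                  if (!($1 in first)) first[$1] = $2    # first name seen in a group
                  map[$2] = first[$1]                   # every name -> first name of its group
              }
              next }
    {
        out = ""; rest = $0
        # Walk through each double-quoted src attribute on the line.
        while (match(rest, /src="[^"]*"/)) {
            attr = substr(rest, RSTART + 5, RLENGTH - 6)   # text between the quotes
            name = attr; sub(/.*\//, "", name)             # strip any directory part
            path = substr(attr, 1, length(attr) - length(name))
            if (name in map) name = map[name]              # exact match only
            out  = out substr(rest, 1, RSTART - 1) "src=\"" path name "\""
            rest = substr(rest, RSTART + RLENGTH)
        }
        print out rest
    }' "$dir/dups.csv" "$dir/page.html" > "$dir/out.html"

result=$(cat "$dir/out.html")
printf '%s\n' "$result"
rm -rf "$dir"
```

Here 15529.png is left alone while both references to 5529.png become 21560.png. References written with single quotes or without quotes would need a different pattern.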

    #5  
Old 01-30-2013
mdart
Thanks, that worked! Sorry for the newbie question, but how can I run it on more than one file at once?
    #6  
Old 01-30-2013
RudiC
You can, but how depends on a few other factors, such as how you collect the input files and whether you want the output concatenated or written to separate files.
If all the files are in your current working directory, this will do:
Code:
awk '...' file.csv *.html

If the file names are listed in file.txt, try
Code:
awk '...' file.csv $(cat file.txt)

(not sure whether this counts as a UUOC, or whether there's a better way)
If you need a separate output file for each input file, try replacing the lone 1 in line 4 with
Code:
{print > (FILENAME "new")}
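
Two details may matter with hundreds of input files, so here is a hedged sketch of the separate-output variant. Parenthesizing the output name avoids the ambiguity some awk implementations have with an unparenthesized concatenation after >, and closing each output file once its input is finished keeps the number of open files small. The sample files dups.csv, a.html and b.html are made up for the demo.

```shell
#!/bin/sh
# Sketch: one awk run, one output file per input file, each closed when done.
# dups.csv, a.html and b.html are hypothetical sample files created for this demo.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf '%s\n' 'Group ID,Duplicated image filename, Number of duplicates' \
              '0,13429.png,3' '0,18064.png,3' '0,25025.png,3' > dups.csv
printf '%s\n' '<img src="18064.png" />' > a.html
printf '%s\n' '<img src="25025.png" />' > b.html

awk -F, 'NR==FNR {f=$2; gsub(/[.]/, "[.]", f); Ar[$1]=Ar[$1](Ar[$1]?"|":"")f
                  if (!Rr[$1])Rr[$1]=$2; next}
         FNR==1  {if (out) close(out); out = FILENAME "new"}   # new input file: switch output
         {for (i in Ar) gsub (Ar[i], Rr[i]); print > out}
        ' dups.csv a.html b.html

a_new=$(cat a.htmlnew)
b_new=$(cat b.htmlnew)
printf '%s\n%s\n' "$a_new" "$b_new"
cd / && rm -rf "$dir"
```

Each input file X.html ends up next to a rewritten copy named X.htmlnew; an empty input file produces no output file in this sketch.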

    #7  
Old 01-30-2013
mdart
Brilliant, RudiC, this is going to be extremely useful!

---------- Post updated 01-31-13 at 12:15 AM ---------- Previous update was 01-30-13 at 06:46 PM ----------

I managed to output the results in a new file with


Code:
{print >> "new"}

Is there a way to simply overwrite the original files? They need to be replaced with the results anyway.
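
The thread was closed without an answer to this, so what follows is only a sketch of one common approach, not something the posters confirmed: run awk once per HTML file, write the result to a temporary file, and move it over the original only if awk succeeded. The awk program is RudiC's from post #4 with the dots in file names matched literally; dups.csv, page1.html and page2.html are placeholder names created for the demo. On real data it would be wise to try this on a copy of the directory first.

```shell
#!/bin/sh
# Sketch: rewrite each HTML file in place via a temporary file.
# dups.csv and the page*.html names are hypothetical; the demo creates its own copies.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf '%s\n' 'Group ID,Duplicated image filename, Number of duplicates' \
              '1,14136.png,4' '1,17382.png,4' > dups.csv
printf '%s\n' '<img src="17382.png" />...text...<img src="14136.png" />' > page1.html
printf '%s\n' '<p>no images here</p>' > page2.html

for f in *.html; do
    tmp="$f.tmp.$$"
    # The CSV is read first on every run, then the single HTML file.
    if awk -F, 'NR==FNR {f=$2; gsub(/[.]/, "[.]", f); Ar[$1]=Ar[$1](Ar[$1]?"|":"")f
                         if (!Rr[$1])Rr[$1]=$2; next}
                {for (i in Ar) gsub (Ar[i], Rr[i])}
                1' dups.csv "$f" > "$tmp"
    then
        mv "$tmp" "$f"        # replace the original only if awk succeeded
    else
        rm -f "$tmp"          # leave the original untouched on failure
    fi
done

page1=$(cat page1.html)
page2=$(cat page2.html)
printf '%s\n%s\n' "$page1" "$page2"
cd / && rm -rf "$dir"
```

Because mv replaces the file rather than editing it, ownership and permissions of the rewritten files may differ from the originals. GNU awk 4.1 and later also ships an inplace extension, which is left out here to keep the sketch portable.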
All times are GMT -4.