Difficulty cleaning references to duplicated images in HTML code

Tags
csv, html, images, replace, search

    #1  
mdart, 01-30-2013

Hi,

I need to search and replace references to duplicated images in HTML code. There are several groups of duplicated images that look the same but have different filenames. I managed to find the duplicated files themselves, but now I need to clean up the code too. I have a CSV file that lists the images in each group:


Code:
Group ID,Duplicated image filename, Number of duplicates
0,13429.png,3 
0,18064.png,3
0,25025.png,3
1,14136.png,4
1,17382.png,4
1,19243.png,4
1,25389.png,4
2,21560.png,2
2,5529.png,2
3,3523.png,2
3,4811.png,2

and so on...

The references to duplicated images are scattered throughout hundreds of HTML files. The task is to make every <img> tag that references a duplicate point to a single image in each group. I'm wondering if some script magic could get it done easily.

HTML (before): different files, same visual appearance

Code:
<!-- group 0 -->
<img src="13429.png" />...text...<img src="18064.png" />...text...<img src="18064.png" />

<!-- group 1 -->
<img src="14136.png" />...text...<img src="17382.png" />...text...<img src="19243.png" />...text...<img src="25389.png" />

<!-- group 2 -->
<img src="21560.png" />...text...<img src="5529.png" />


HTML (after): unique file in each group

Code:
<!-- group 0 -->
<img src="13429.png" />...text...<img src="13429.png" />...text...<img src="13429.png" />

<!-- group 1 -->
<img src="14136.png" />...text...<img src="14136.png" />...text...<img src="14136.png" />...text...<img src="14136.png" />

<!-- group 2 -->
<img src="21560.png" />...text...<img src="21560.png" />

I searched for some solutions here in the forum, with no success.

Any help you can give would be greatly appreciated.

Last edited by mdart; 01-30-2013 at 02:32 PM..
    #2  
RudiC (Forum Advisor), 01-30-2013
Not sure I understand what you want to accomplish. Can I paraphrase it like so: in all selected files, replace every occurrence of the second and following members of each group with the first, i.e. 18064.png and 25025.png with 13429.png; 17382.png, 19243.png and 25389.png with 14136.png; and so on?
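
To make the paraphrase concrete, here is a minimal sketch that only lists which filename would be replaced by which, without touching any HTML. It assumes the CSV from the first post is saved as groups.csv (a made-up name), that group IDs are plain integers as in the sample, and that the first file listed in each group is the one to keep; list_replacements is a hypothetical helper name.

```shell
# Sample data copied from the first post; groups.csv is an assumed filename.
cat > groups.csv <<'EOF'
Group ID,Duplicated image filename, Number of duplicates
0,13429.png,3
0,18064.png,3
0,25025.png,3
1,14136.png,4
1,17382.png,4
1,19243.png,4
1,25389.png,4
2,21560.png,2
2,5529.png,2
3,3523.png,2
3,4811.png,2
EOF

# Hypothetical helper: print "duplicate -> kept file" for every
# non-first member of each group.
list_replacements() {
    awk -F, '
        $1 !~ /^[0-9]+$/ { next }                 # skip the header line
        !($1 in keep)    { keep[$1] = $2; next }  # first file of a group is kept
        { print $2 " -> " keep[$1] }              # every later file maps to it
    ' "$1"
}

list_replacements groups.csv
```

For the sample data this prints seven lines, the first being 18064.png -> 13429.png.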
    #3  
mdart, 01-30-2013
@RudiC: Yes, that's correct. Sorry if I wasn't very clear.
    #4  
RudiC (Forum Advisor), 01-30-2013
OK, try this very crude approach, which may need serious polishing:
Code:
awk -F, 'NR==FNR {Ar[$1]=Ar[$1](Ar[$1]?"|":"")$2;   # CSV: per group, join the names as "a|b|c"
                  if (!Rr[$1])Rr[$1]=$2; next}      # CSV: remember the first name of the group
         {for (i in Ar) gsub (Ar[i], Rr[i])}        # HTML: replace any group member by the first
         1                                          # print every line
        ' file file1

Here file is the CSV and file1 the HTML sample. The output is:
Code:
<!-- group 0 -->
<img src="13429.png" />...text...<img src="13429.png" />...text...<img src="13429.png" />

<!-- group 1 -->
<img src="14136.png" />...text...<img src="14136.png" />...text...<img src="14136.png" />...text...<img src="14136.png" />

<!-- group 2 -->
<img src="21560.png" />...text...<img src="21560.png" />
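
One thing that may need polishing: the regular expressions built above treat each dot as "any character" and can match inside longer names (5529.png is also a substring of 15529.png). A slightly more defensive sketch follows. It assumes that every src value is exactly the bare filename in double quotes, as in the sample, that filenames contain nothing special to a regex other than dots, and that group IDs are plain integers; dedupe_refs, groups.csv and page.html are made-up names.

```shell
# Hypothetical helper: first argument is the CSV, the rest are HTML files.
dedupe_refs() {
    awk -F, '
        NR == FNR {                                    # first file: the CSV
            if ($1 !~ /^[0-9]+$/) next                 # skip the header line
            if (!($1 in keep)) { keep[$1] = $2; next } # first name is kept
            name = $2
            gsub(/[.]/, "[.]", name)                   # make each dot literal
            sep = ($1 in re) ? "|" : ""
            re[$1] = re[$1] sep name                   # alternation of duplicates
            next
        }
        {                                              # remaining files: the HTML
            for (g in re)                              # match whole quoted names only
                gsub("\"(" re[g] ")\"", "\"" keep[g] "\"")
            print
        }
    ' "$@"
}

# Small demonstration with a subset of the sample data.
cat > groups.csv <<'EOF'
Group ID,Duplicated image filename, Number of duplicates
0,13429.png,3
0,18064.png,3
0,25025.png,3
2,21560.png,2
2,5529.png,2
EOF
printf '%s\n' '<img src="18064.png" /> <img src="5529.png" /> <img src="15529.png" />' > page.html

dedupe_refs groups.csv page.html
```

With this input, 18064.png and 5529.png are rewritten to 13429.png and 21560.png, while the unrelated 15529.png is left alone. If the src values carry a directory prefix, the quoted-name match would have to be loosened.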

    #5  
mdart, 01-30-2013
Thanks, that worked! Sorry for the newbie question, but how can I run it on more than one file at once?
    #6  
RudiC (Forum Advisor), 01-30-2013
You can, but how you do it depends on a few other factors, such as how the input files are collected and whether the output should be concatenated or written to separate files.
If all files are in your working directory, this will do:
Code:
awk '...' file.csv *.html

If their names are listed in a file.txt, try
Code:
awk '...' file.csv $(cat file.txt)

(I'm not sure whether this is a UUOC, a useless use of cat, or whether there's a better way.)
If you need the output in separate files, try replacing the lone 1 in line 4 by
Code:
{print > (FILENAME "new")}
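
With hundreds of input files, keeping one output file open per input may run into the open-files limit in some awk implementations. A sketch that closes the previous output whenever a new input file starts is shown below. It reuses the mapping logic from the script earlier in the thread; the helper name dedupe_to_new, the ".new" suffix and the sample filenames are assumptions made for illustration.

```shell
# Hypothetical helper: first argument is the CSV, the rest are HTML files.
# Each input X is written to X.new; the originals are not modified.
dedupe_to_new() {
    csv=$1
    shift
    awk -F, '
        # CSV: same mapping logic as the script earlier in the thread
        NR == FNR { Ar[$1] = Ar[$1] (Ar[$1] ? "|" : "") $2
                    if (!Rr[$1]) Rr[$1] = $2
                    next }
        # First line of each HTML file: close the previous output, name the next
        FNR == 1  { if (out != "") close(out)
                    out = FILENAME ".new" }
        { for (i in Ar) gsub(Ar[i], Rr[i])
          print > out }
    ' "$csv" "$@"
}

# Small demonstration in a scratch directory.
workdir=$(mktemp -d "${TMPDIR:-/tmp}/demo.XXXXXX") && cd "$workdir"
printf '%s\n' '0,13429.png,2' '0,18064.png,2' > groups.csv
printf '%s\n' '<img src="18064.png" />' > a.html
printf '%s\n' '<img src="13429.png" />' '<img src="18064.png" />' > b.html

dedupe_to_new groups.csv a.html b.html
cat a.html.new b.html.new
```

Storing the target name in a variable also sidesteps the question of how an unparenthesized concatenation after > is parsed, which differs between awk implementations.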

    #7  
mdart, 01-30-2013
Brilliant, RudiC, this is going to be extremely useful!

---------- Post updated 01-31-13 at 12:15 AM ---------- Previous update was 01-30-13 at 06:46 PM ----------

I managed to output the results in a new file with


Code:
{print >> "new"}

Is there a way to just overwrite the original files? It's necessary to replace them with the results anyway.
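
The thread was closed without a reply to this last question. One common workaround, sketched here rather than taken from the thread, is to write each result to a temporary file and copy it back over the original only if awk succeeded, since redirecting awk's output straight onto the file it is reading would truncate it. The helper name dedupe_in_place and the sample filenames are assumptions, mktemp is assumed to be available, and it is worth trying this on a copy of the files first.

```shell
# Hypothetical helper: first argument is the CSV, the rest are HTML files
# that will be overwritten with the cleaned-up result.
dedupe_in_place() {
    csv=$1
    shift
    # An empty CSV would make awk treat the HTML file as the mapping file.
    [ -s "$csv" ] || { echo "empty or missing CSV: $csv" >&2; return 1; }
    for f in "$@"; do
        tmp=$(mktemp "${TMPDIR:-/tmp}/dedupe.XXXXXX") || return 1
        if awk -F, '
               NR == FNR { Ar[$1] = Ar[$1] (Ar[$1] ? "|" : "") $2
                           if (!Rr[$1]) Rr[$1] = $2
                           next }
               { for (i in Ar) gsub(Ar[i], Rr[i]) }
               1
           ' "$csv" "$f" > "$tmp"
        then
            cat "$tmp" > "$f"    # copy back; keeps the original permissions
        fi
        rm -f "$tmp"
    done
}

# Small demonstration in a scratch directory.
workdir=$(mktemp -d "${TMPDIR:-/tmp}/demo.XXXXXX") && cd "$workdir"
printf '%s\n' '0,13429.png,2' '0,18064.png,2' > groups.csv
printf '%s\n' '<img src="18064.png" />' > a.html

dedupe_in_place groups.csv a.html
cat a.html
```

Copying the content back with cat, instead of moving the temporary file into place, keeps the owner and permissions of the original HTML file at the cost of not being atomic.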
All times are GMT -4.