Difficulty cleaning references to duplicated images in HTML code


 
# 1  
Old 01-30-2013

Hi,

I need to search and replace references to duplicated images in HTML code. There are several groups of duplicated images, which are visually the same but have different filenames. I managed to find the duplicated files themselves, but now I need to clean up the code too. I have a CSV file that lists the images in each group of duplicates:

Code:
Group ID,Duplicated image filename, Number of duplicates
0,13429.png,3 
0,18064.png,3
0,25025.png,3
1,14136.png,4
1,17382.png,4
1,19243.png,4
1,25389.png,4
2,21560.png,2
2,5529.png,2
3,3523.png,2
3,4811.png,2

and so on...

The references to duplicated images are scattered throughout hundreds of HTML files. The task is to make every <img> tag that references a duplicate point to just one unique image in each group. I'm wondering if some script magic could get it done easily.

HTML (before): different files, same visual appearance
Code:
<!-- group 0 -->
<img src="13429.png" />...text...<img src="18064.png" />...text...<img src="18064.png" />

<!-- group 1 -->
<img src="14136.png" />...text...<img src="17382.png" />...text...<img src="19243.png" />...text...<img src="25389.png" />

<!-- group 2 -->
<img src="21560.png" />...text...<img src="5529.png" />


HTML (after): unique file in each group
Code:
<!-- group 0 -->
<img src="13429.png" />...text...<img src="13429.png" />...text...<img src="13429.png" />

<!-- group 1 -->
<img src="14136.png" />...text...<img src="14136.png" />...text...<img src="14136.png" />...text...<img src="14136.png" />

<!-- group 2 -->
<img src="21560.png" />...text...<img src="21560.png" />

I searched for some solutions here in the forum, with no success.

Any help you can give would be greatly appreciated.

Last edited by mdart; 01-30-2013 at 02:32 PM..
# 2  
Old 01-30-2013
Not sure I understand what you want to accomplish. Can I paraphrase it like so: in all selected files, replace every occurrence of the second and following members of each group with the first member, i.e. 18064.png and 25025.png with 13429.png; 17382.png, 19243.png and 25389.png with 14136.png; and so on?
# 3  
Old 01-30-2013
@RudiC: Yes, that's correct. Sorry if I wasn't very clear.
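
To double-check that paraphrase against the CSV before touching any HTML, the mapping can be listed first. This is only a minimal sketch: groups.csv is a made-up name, the rows are a subset of the sample from post #1, and it assumes the CSV always starts with the header line shown there.

```shell
#!/bin/sh
# Sketch: list "duplicate -> kept file" pairs from the groups CSV.
# groups.csv is a hypothetical name; the rows come from the sample in post #1.
tmp=$(mktemp -d)
cat > "$tmp/groups.csv" <<'EOF'
Group ID,Duplicated image filename, Number of duplicates
0,13429.png,3
0,18064.png,3
0,25025.png,3
2,21560.png,2
2,5529.png,2
EOF

pairs=$(awk -F, '
    FNR == 1      { next }                   # skip the header line
    !($1 in keep) { keep[$1] = $2; next }    # first file of a group is kept
                  { print $2 " -> " keep[$1] }
' "$tmp/groups.csv")
printf '%s\n' "$pairs"
```

Each printed line names one duplicate and the file that should replace it, so the list is easy to eyeball before running the real replacement.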
# 4  
Old 01-30-2013
OK, try this very crude approach, which may need serious polishing:
Code:
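awk -F, 'NR==FNR {if (FNR==1) next                # CSV first: skip its header line
                  f=$2; gsub(/[.]/, "[.]", f)     # make the dots in the filename literal
                  Ar[$1]=Ar[$1](Ar[$1]?"|":"")f   # Ar[group]: regex alternation of all names
                  if (!($1 in Rr)) Rr[$1]=$2      # Rr[group]: first name seen is the keeper
                  next}
         {for (i in Ar) gsub (Ar[i], Rr[i])}      # HTML: rewrite every group member
         1                                        # print the (possibly changed) line
        ' file file1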
<!-- group 0 -->
<img src="13429.png" />...text...<img src="13429.png" />...text...<img src="13429.png" />

<!-- group 1 -->
<img src="14136.png" />...text...<img src="14136.png" />...text...<img src="14136.png" />...text...<img src="14136.png" />

<!-- group 2 -->
<img src="21560.png" />...text...<img src="21560.png" />
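
One caveat with the regex approach above: the names are matched as plain substrings, so 5529.png would also match inside a name like 15529.png. A hedged variant, not from the thread, that only replaces whole filename tokens might look like the sketch below. The names groups.csv and page.html and the test line are made up for illustration, and it assumes image names consist only of letters, digits, dots, underscores and hyphens.

```shell
#!/bin/sh
# Sketch: replace duplicates only when the whole filename token matches.
# groups.csv, page.html and their contents are hypothetical test data.
tmp=$(mktemp -d)
cat > "$tmp/groups.csv" <<'EOF'
Group ID,Duplicated image filename, Number of duplicates
0,13429.png,3
0,18064.png,3
2,21560.png,2
2,5529.png,2
EOF
cat > "$tmp/page.html" <<'EOF'
<img src="5529.png" /> <img src="15529.png" /> <img src="img/18064.png" />
EOF

result=$(awk -F, '
NR == FNR {                                  # first file: the CSV
    if (FNR == 1) next                       # skip the header line
    if (!($1 in keep)) keep[$1] = $2         # first name of a group is kept
    else map[$2] = keep[$1]                  # later names map to the kept one
    next
}
{                                            # remaining files: HTML
    out = ""; rest = $0
    while (match(rest, /[A-Za-z0-9_.-]+/)) { # next run of filename characters
        tok = substr(rest, RSTART, RLENGTH)
        out = out substr(rest, 1, RSTART - 1) ((tok in map) ? map[tok] : tok)
        rest = substr(rest, RSTART + RLENGTH)
    }
    print out rest
}' "$tmp/groups.csv" "$tmp/page.html")
printf '%s\n' "$result"
```

Here 5529.png and img/18064.png are rewritten, while 15529.png is left alone because the whole token is not in the map. The trade-off is a per-line scan instead of one gsub per group.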

# 5  
Old 01-30-2013
Thanks, that worked! Sorry for the newbie question, but how can I run it on more than one file at once?
# 6  
Old 01-30-2013
You can, but how depends on a few things: how you collect the input files, and whether you want the output concatenated or in separate files.
If all the files are in your current working directory, this will do:
Code:
awk '...' file.csv *.html

If their names are listed in a file.txt, try
Code:
awk '...' file.csv $(cat file.txt)

(not sure whether this counts as a UUOC, a useless use of cat, or whether there's a better way; note that it splits filenames containing whitespace)
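If you need the output in separate files, try replacing the lone 1 at the end of the awk script (the pattern that prints every line) with the line below. The parentheses around the file name are needed by some awk implementations.
Code:
{print > (FILENAME "new")}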
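
Putting the two suggestions together (a list of names in file.txt and one output file per input), a loop is one way to do it. This is only a sketch under assumptions: the sample files, the .new suffix and the variable name prog are made up, and the awk program is the approach from post #4 with the CSV header skipped. It reads file.txt line by line so that names containing spaces survive.

```shell
#!/bin/sh
# Sketch: one awk run per listed file, each writing <name>.new.
# All file names and contents below are hypothetical test data.
tmp=$(mktemp -d)
printf '%s\n' 'Group ID,Duplicated image filename, Number of duplicates' \
    '0,13429.png,3' '0,18064.png,3' > "$tmp/file.csv"
printf '%s\n' '<img src="18064.png" />' > "$tmp/page one.html"
printf '%s\n' '<img src="13429.png" /><img src="18064.png" />' > "$tmp/two.html"
printf '%s\n' "$tmp/page one.html" "$tmp/two.html" > "$tmp/file.txt"

# The awk program, stored once so the loop can reuse it.
prog='NR == FNR {
          if (FNR > 1) {                        # skip the CSV header
              Ar[$1] = Ar[$1] (Ar[$1] ? "|" : "") $2
              if (!($1 in Rr)) Rr[$1] = $2      # first name is the keeper
          }
          next
      }
      { for (i in Ar) gsub(Ar[i], Rr[i]) }
      1'

# One name per line; IFS= and -r keep spaces and backslashes intact.
while IFS= read -r f; do
    awk -F, "$prog" "$tmp/file.csv" "$f" > "$f.new"
done < "$tmp/file.txt"
```

Running awk once per file is slower than a single call over *.html, but the shell then handles the output names and nothing depends on how awk parses the redirection.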

# 7  
Old 01-30-2013
Brilliant, RudiC, this is going to be extremely useful!

---------- Post updated 01-31-13 at 12:15 AM ---------- Previous update was 01-30-13 at 06:46 PM ----------

I managed to output the results in a new file with

Code:
{print >> "new"}

Is there a way to just overwrite the original files? It's necessary to replace them with the results anyway.
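
The thread ends without an answer to this. One common pattern, offered here only as a hedged sketch, is to write each result to a temporary file and move it over the original once awk has succeeded, since awk should not write to a file it is still reading. The sample files, the .tmp suffix and the variable name prog are assumptions for illustration. Keep a backup of the HTML files before trying anything like this.

```shell
#!/bin/sh
# Sketch: overwrite each HTML file via a temporary file plus mv.
# All file names and contents below are hypothetical test data.
tmp=$(mktemp -d)
printf '%s\n' 'Group ID,Duplicated image filename, Number of duplicates' \
    '1,14136.png,2' '1,17382.png,2' > "$tmp/file.csv"
printf '%s\n' '<img src="17382.png" />...text...<img src="14136.png" />' \
    > "$tmp/a.html"
printf '%s\n' '<p>no images here</p>' > "$tmp/b.html"

prog='NR == FNR {
          if (FNR > 1) {                        # skip the CSV header
              Ar[$1] = Ar[$1] (Ar[$1] ? "|" : "") $2
              if (!($1 in Rr)) Rr[$1] = $2      # first name is the keeper
          }
          next
      }
      { for (i in Ar) gsub(Ar[i], Rr[i]) }
      1'

for f in "$tmp"/*.html; do
    # mv runs only if awk exited successfully, so a failed run
    # leaves the original file untouched.
    awk -F, "$prog" "$tmp/file.csv" "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done
```

Files without any duplicate references are rewritten with identical content, which is harmless but does update their modification time.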