    #1  
Old 02-21-2012
Registered User
 
Copy files into another directory

I have a folder with a lot of documents (pdf, xls, doc etc.) which users have uploaded, but only 20% of them are currently linked from my html files. So my goal is to copy only the files which are linked in my html files from my Documents directory into another directory.

Eg: My documents exist in /web/Documents (with sub-folders) and my html files exist in /web/html

My users were kind to me and made sure that they used both absolute and relative links, meaning they used <a href="Documents/***.doc"> and <a href="http://xxx.com/Documents/***.doc">

And of course not everyone was case-sensitive when linking.

Can someone help me figure out how to accomplish this?

Thanks in advance.

---------- Post updated at 02:11 PM ---------- Previous update was at 12:22 PM ----------

I am able to get a huge list of all the links (regardless of whether they are within the domain or not) by using

perl -nle 'print " $&" if /(?<=href=")[^">]+/' *.html

This gives me a list of all my links within the folder.
eg:
Documents/....
mailto:
external website links
xxxx.html (other html documents within the domain)


How do I go further from here?
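One way to take the link list further is to keep only the entries that point into the Documents folder. The following is a minimal sketch, not a tested solution for this site: the function name `list_doc_links` is made up for illustration, and it assumes every href value is wrapped in double quotes, as in the perl one-liner above.

```shell
# list_doc_links (hypothetical name): read HTML on stdin and print,
# one per line, every href value that contains "documents/" in any case.
list_doc_links() {
    # Break the input after every tag so a line holds at most one href.
    tr '>' '\n' |
    # Print only the text between href=" and the next double quote.
    sed -n 's/.*[Hh][Rr][Ee][Ff]="\([^"]*\)".*/\1/p' |
    # Drop mailto:, external sites and plain html pages.
    grep -i 'documents/'
}
```

It could be run over the html folder with something like `cat /web/html/*.html | list_doc_links | sort -u`, which also removes duplicate links.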
    #2  
Old 02-21-2012
balajesuri
#! /bin/bash
 
Welcome to the forum.
1. Please post a few lines from the HTML file: lines containing both absolute and relative links (preferably covering all the possibilities that need to be parsed).
2. And please use code tags for code and data samples.
    #3  
Old 02-22-2012
Registered User
 
Here are examples of the links in an html file. I have changed the webpage names and wording, but this should give you the gist of what they look like.

Possible document types are pdf, doc, docx, ppt, pptx, xls, xlsx, jpg. I want to be able to copy any of the above files into a separate directory (retaining the folder structure).

Absolute linking

Code:
<a href="http://mywebsite.com/Documents/comm_ed/regform.pdf">

Relative linking

Code:
<a href="Documents/PDF/handbook.pdf" target="_blank">
<a href="Documents/htb/1112.doc" title="test" target="_blank">
<a href="Documents/life/2011-12 HANDBOOK.pdf">
<a href="documents/science/oral06R2.pdf">
<a href="Documents/arts&amp;letters/F 11 FINAL.doc">
<a href="documents/b_office/Office%20Change%20Request.doc">
<a href="../../html/Documents/htb/211_diverse.xls">


Note: There are spaces in the names of the PDFs, and the links use both upper and lower case 'd' in "Documents".

External site ( I really don't care for this but it shows up in my query)

Code:
<a href="http://yahoo.com">

Email link ( I really don't care for this but it shows up in my query)

Code:
<a href="mailto:webmaster@mywebsite.com">

Linking to page within the site ( I really don't care for this but it shows up in my query)

Code:
<a href="anotherpage.html">



Ideally, I would like to be able to create the directories and copy the files as well. E.g. if my list has Documents/PDF/document1.pdf, I want to copy it to copy/Documents/PDF/document1.pdf, where 'copy' is my destination folder.

I am hoping to keep the directory hierarchy so I don't break any existing links in the html files.

Thank you so much for your help.
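The "create the directory, then copy" step described above can be sketched as a small shell function. This is only an illustration under stated assumptions: the name `copy_with_tree` and its three arguments are invented here, and the relative path is assumed to be already cleaned up (no domain, no ../ prefix, no %20).

```shell
# copy_with_tree (hypothetical helper): copy SRC_ROOT/REL to DEST_ROOT/REL,
# creating the intermediate directories so the hierarchy is preserved.
copy_with_tree() {
    src_root=$1     # e.g. the folder that contains Documents/
    dest_root=$2    # e.g. the 'copy' folder
    rel=$3          # e.g. Documents/PDF/document1.pdf
    # Quoting keeps file names with spaces in one piece.
    mkdir -p "$dest_root/$(dirname "$rel")" &&
    cp "$src_root/$rel" "$dest_root/$rel"
}
```

A call such as `copy_with_tree /web copy "Documents/PDF/document1.pdf"` would then produce copy/Documents/PDF/document1.pdf.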
    #4  
Old 02-22-2012
balajesuri
#! /bin/bash
 

Code:
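#! /bin/bash
# Split each input line on double quotes; the second field is the href value.
while IFS='"' read -r a file c
do
    # Skip links that do not mention "documents" (in any case).
    echo "$file" | grep -qi 'documents' || continue
    # Strip everything before "documents/" (domain, ../ prefixes). The i flag needs GNU sed.
    file=$(echo "$file" | sed 's/.*\(documents\/.*\)/\1/i')
    # Recreate the directory hierarchy under copy/, then copy the file.
    # The quotes keep file names with spaces in one piece.
    mkdir -p "copy/$(dirname "$file")"
    cp "$file" "copy/$file"
done < inputfile.xml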


Last edited by balajesuri; 02-22-2012 at 10:49 PM..
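The script above copies each file under the name exactly as it is written in the link, so a link to documents/... will not find a folder that is called Documents/... on disk. A possible way around that, sketched here under assumptions, is to look the path up while ignoring case. The function name `find_ignoring_case` is invented for this example, and file names are assumed not to contain newlines.

```shell
# find_ignoring_case (hypothetical helper): print the real on-disk path
# under ROOT that matches REL when upper/lower case is ignored.
find_ignoring_case() {
    root=$1
    rel=$2
    # List every regular file, then keep the line that equals ROOT/REL:
    # -i ignores case, -x matches the whole line, -F treats it as plain text.
    find "$root" -type f | grep -ixF -- "$root/$rel" | head -n 1
}
```

The printed path could then be handed to cp instead of the path taken from the link. If nothing matches, the function prints nothing.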
    #5  
Old 02-24-2012
Registered User
 
Thank you. It's creating the directories but not copying the files.

---------- Post updated at 11:55 AM ---------- Previous update was at 11:16 AM ----------

I guess it helps if I give the error message:

cp: cannot stat `Documents/htb/able%20Pop%20Blocker.pdf': No such file or directory

My deduction is that the %20 should be a space. If I manually copy it I would be doing something like
Documents/htb/able\ Pop\ Blocker.pdf

So how do I replace the %20 with an escaped space, i.e. \(space)?
    #6  
Old 02-24-2012
Mead Rotor
 

Code:
sed -i 's/%20/ /g' file.html
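Note that the sed command above edits file.html itself. If the html files should stay untouched, the same substitution could instead be applied to each link just before copying. The sketch below assumes that; the name `decode_link` is made up, and it only handles the two escapes that appear in this thread (%20 and &amp;), not every possible one.

```shell
# decode_link (hypothetical helper): turn a link as written in the HTML
# into the file name as it exists on disk.
decode_link() {
    # %20 becomes a space; the HTML entity &amp; becomes a plain &.
    printf '%s\n' "$1" | sed -e 's/%20/ /g' -e 's/&amp;/\&/g'
}
```

For example, `decode_link 'Documents/htb/able%20Pop%20Blocker.pdf'` prints `Documents/htb/able Pop Blocker.pdf`. As long as the result is kept in a quoted variable, no backslash before the spaces is needed.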



