UNIX for Dummies Questions & Answers
#1
Copy files into another directory
I have a folder with a lot of documents (pdf, xls, doc, etc.) that users have uploaded, but only about 20% of them are currently linked from my HTML files. My goal is to copy only the files that are linked from my HTML files from my Documents directory into another directory.
For example, my documents live in /web/Documents (with sub-folders) and my HTML files live in /web/html. My users were kind to me and used both relative and absolute links, i.e. <a href="Documents/***.doc"> and <a href="http://xxx.com/Documents/***.doc">. And of course not everyone was consistent about upper and lower case when linking.

Can someone help me figure out how to accomplish this? Thanks in advance.

---------- Post updated at 02:11 PM ---------- Previous update was at 12:22 PM ----------

I am able to get a huge list of all the links (regardless of whether they are within the domain or not) by using:
Code:
perl -nle 'print " $&" if /(?<=href=")[^">]+/' *.html
This gives me a list of all the links in the folder, e.g.:
Documents/....
mailto: links
external website links
xxxx.html (other HTML documents within the domain)
How do I go further from here?
#2
Welcome to the forum.
1. Please post a few lines from the HTML file: lines containing both absolute and relative links (preferably covering all the cases that need to be parsed).
2. And please use code tags for code and data samples.
#3
Here are examples of the links in an HTML file. I have changed the page names and wording, but this gives you the gist of what it looks like. The possible document types are pdf, doc, docx, ppt, pptx, xls, xlsx and jpg. I want to be able to copy any of these files into a separate directory, retaining the folder structure.

Absolute linking
Code:
<a href="http://mywebsite.com/Documents/comm_ed/regform.pdf">
Relative linking
Code:
<a href="Documents/PDF/handbook.pdf" target="_blank">
<a href-"Documents/htb/1112.doc" title="test" target="_blank">
<a href="Documents/life/2011-12 HANDBOOK.pdf">
<a href="documents/science/oral06R2.pdf">
<a href="Documents/arts&letters/F 11 FINAL.doc">
<a href="documents/b_office/Office%20Change%20Request.doc">
<a href="../../html/Documents/htb/211_diverse.xls">
Note: there are spaces in the file names, and the links use both upper- and lower-case 'd'.

External site (I don't really care about these, but they show up in my query)
Code:
<a href="http://yahoo.com">
Email link (I don't really care about these, but they show up in my query)
Code:
<a href="mailto:webmaster@mywebsite.com">
Link to a page within the site (I don't really care about these, but they show up in my query)
Code:
<a href="anotherpage.html">
Ideally, I would like to create the directories and copy the files as well. For example, if my list has Documents/PDF/document1.pdf, I want to copy it to copy/Documents/PDF/document1.pdf in my destination 'copy' folder. I am hoping to keep the directory hierarchy so I don't break any existing links in the HTML files.

Thank you so much for your help.
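The copy step for a single link could look like the sketch below. It assumes it runs from /web (the directory that contains Documents/), that each link has already been reduced to a path such as documents/PDF/handbook.pdf, and that find supports -ipath and -quit (GNU and BSD find both do). `copy_doc` and the default destination `copy` are illustrative names. Because some links say documents/ while the directory is Documents/, the path is looked up case-insensitively, and the copy keeps the name as it exists on disk so the copied tree mirrors Documents/.

```shell
#!/bin/bash
# Sketch: copy one linked document into a destination tree, keeping its
# directory path. copy_doc is a hypothetical helper; run it from /web.
copy_doc() {
    local link=$1 dest=${2:-copy} real
    # Case-insensitive lookup of the whole path. -ipath treats * ? [ as
    # wildcards, so names containing those characters would need escaping.
    real=$(find . -ipath "./$link" -print -quit)
    if [ -z "$real" ]; then
        echo "not found: $link" >&2
        return 1
    fi
    real=${real#./}                      # strip the leading ./ that find prints
    mkdir -p "$dest/$(dirname "$real")"  # recreate the directory hierarchy
    cp -p "$real" "$dest/$real"          # -p keeps timestamps and modes
}
```

For example, `copy_doc "documents/PDF/2011-12 HANDBOOK.pdf"` would create copy/Documents/PDF/2011-12 HANDBOOK.pdf. Quoting every expansion is what lets names with spaces through without backslashes.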
#4
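Code:
#! /bin/bash
# Reads link lines from inputfile.xml and copies every linked file under
# documents/ (any case) into copy/, keeping the directory path.
while IFS='"' read -r a file c      # split on double quotes: file = first quoted value
do
    # skip links that do not point into the Documents tree
    echo "$file" | grep -qi 'documents' || continue
    # drop http://host/ or ../../html/ prefixes, keep from documents/ on
    file=$(echo "$file" | sed 's/.*\(documents\/.*\)/\1/i')
    mkdir -p "copy/$(dirname "$file")"
    cp "$file" "copy/$file"         # quoted so names with spaces work
done < inputfile.xml
Last edited by balajesuri; 02-22-2012 at 10:49 PM.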
#5
Thank you. It's creating the directories but not copying the files.
---------- Post updated at 11:55 AM ---------- Previous update was at 11:16 AM ----------
I guess it helps if I give the error message:
Code:
cp: cannot stat `Documents/htb/able%20Pop%20Blocker.pdf': No such file or directory
My deduction is that the %20 should be a space. If I copied it manually I would type something like Documents/htb/able\ Pop\ Blocker.pdf. So how do I replace the %20 with \(space)?
#6
Code:
# -i rewrites file.html in place (no backup is kept)
sed -i 's/%20/ /g' file.html