I have a folder with a lot of documents (pdf, xls, doc etc.) which users have uploaded, but only about 20% of them are currently linked from my html files. My goal is to copy only the files that are linked in my html files from my Documents directory into another directory.
Eg: My documents exist in /web/Documents (with sub-folders) and my html files exist in /web/html
My users were kind to me and used both absolute and relative linking, meaning they used <a href="Documents/***.doc"> and <a href="http://xxx.com/Documents/***.doc">
And of course not everyone was consistent about upper and lower case when linking.
Can someone help me figure out how to accomplish this?
Thanks in advance.
---------- Post updated at 02:11 PM ---------- Previous update was at 12:22 PM ----------
I am able to get a huge list of all the links (regardless of whether they are within the domain or not) by using
perl -nle 'print " $&" if /(?<=href=")[^">]+/' *.html
This gives me a list of all my links within the folder.
eg:
Documents/....
mailto:
external website links
xxxx.html (other html documents within the domain)
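One way to narrow that list down to document links only is sketched below. It is only a rough sketch under some assumptions: filter_doc_links is a made-up helper name, the extension list is the one given in this thread, and any http:// or https:// host is stripped without checking that it is really xxx.com, so an external link that happens to contain /Documents/ would slip through. It also does not decode %20 or other URL escapes.

```shell
#!/bin/sh
# Hypothetical helper: read hrefs on stdin (one per line) and print only
# those that point into Documents/, with any scheme and host removed.
filter_doc_links() {
    # 1. drop leading whitespace (the perl one-liner prints " $&")
    # 2. drop a leading http://host/ or https://host/ (any host: assumption)
    # 3. drop a leading slash so every path starts with Documents/
    sed -e 's/^[[:space:]]*//' \
        -e 's|^[Hh][Tt][Tt][Pp][Ss]\{0,1\}://[^/]*/||' \
        -e 's|^/||' |
    grep -i '^documents/' |                          # any case of "Documents"
    grep -i -E '\.(pdf|docx?|pptx?|xlsx?|jpg)$' |    # wanted extensions only
    sort -u                                          # each link once
}
```

It could be fed from the one-liner above, for example: perl -nle 'print " $&" if /(?<=href=")[^">]+/' *.html | filter_doc_links > links.txt (links.txt is just an example name). Note that the one-liner prints only the first href on each line of html, so lines holding several links would need a loop with the /g modifier.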
Welcome to the forum.
1. Please post a few lines from the HTML file: lines containing both absolute and relative linking (preferably covering all the possibilities that need to be parsed).
2. And please use code tags for code and data samples.
Here are examples of the links in an html file. I have changed the webpage names and wording, but this should give you the gist of what they look like.
Possible document types are pdf, doc, docx, ppt, pptx, xls, xlsx, jpg. I want to be able to copy any of the above files into a separate directory (retaining the folder structure).
Absolute linking
Relative linking
Note: There are spaces in the names of the PDFs, and the links use both upper- and lower-case 'd' in "Documents".
External site (I really don't care about this, but it shows up in my query)
Email link (I really don't care about this, but it shows up in my query)
Link to a page within the site (I really don't care about this, but it shows up in my query)
Ideally, I would like to create the directories and copy the files in one go. Eg: if my list has Documents/PDF/document1.pdf, I want to copy it into my destination 'copy' folder as copy/Documents/PDF/document1.pdf
I am hoping to keep the directory hierarchy so I don't break any existing links in the html files.
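The copy step described above could look roughly like the following. This is a sketch, not a tested solution for the real site: copy_linked_docs is a made-up name, it assumes the source root contains a directory literally named Documents, and it expects one relative link per line on stdin (for example Documents/PDF/document1.pdf). Each link is matched against the real files without regard to case, and the copy keeps the path as it is spelled on disk.

```shell
#!/bin/sh
# Hypothetical helper: copy_linked_docs SRC_ROOT DEST_ROOT < list_of_links
# SRC_ROOT is the directory that holds Documents/ (e.g. /web).
copy_linked_docs() {
    src_root=$1
    dest_root=$2

    # Build the list of real files once, as paths relative to SRC_ROOT.
    list=$(mktemp) || return 1
    (cd "$src_root" && find Documents -type f) > "$list"

    while IFS= read -r link; do
        # Compare in lower case so documents/x.PDF finds Documents/X.pdf.
        match=$(want="$link" awk \
            'tolower($0) == tolower(ENVIRON["want"]) { print; exit }' "$list")

        if [ -n "$match" ]; then
            # Recreate the sub-folders, then copy with timestamps/permissions.
            mkdir -p "$dest_root/$(dirname "$match")"
            cp -p "$src_root/$match" "$dest_root/$match"
        else
            echo "not found: $link" >&2
        fi
    done

    rm -f "$list"
}
```

With the examples from this thread it might be called as copy_linked_docs /web /web/copy < links.txt, where links.txt is an assumed file holding the cleaned-up link list. Links that match nothing are reported on stderr, which makes it easy to spot typos or URL-escaped names such as %20.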