The above downloads and creates unique entries for the 97 links in the text file. However, each new file is saved as CM080 with a FILE extension. Is there a way to convert each file in that directory to a .txt? The 97 files are in C:\Users\cmccabe\Desktop\list\geneticslab.emory.edu\tests.
Afterwards, you could run a rename in a loop. Assuming that the directory contains only the files you want, you can do:
Code:
cd target_directory || exit 1
for file in *
do
    # quote the names so files with spaces or glob characters survive
    mv -- "$file" "$file.txt"
done
If the file names get longer and/or the number of files increases, bear the command-line length limit in mind: it does not affect this for loop, because * is expanded inside the shell itself, but it can bite as soon as you pass a large expansion to an external command (e.g. ls * or mv * somewhere).
All files will be renamed, so if you have a1.file and a2.file and already have a1.file.txt and a2.file.txt, the results might be a little unpredictable. The list from * is built before the loop starts, so the loop may rename a1.file to a1.file.txt (overwriting the existing one) and then rename that same file again to a1.file.txt.txt, which can be very confusing. Make sure you start with an empty directory before you download the files and rename them.
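A slightly more defensive version of the loop, as a minimal sketch: it skips names that already end in .txt and refuses to overwrite an existing NAME.txt, so running it twice, or in a directory that already holds .txt files, does no harm. The function name add_txt_ext is hypothetical; it takes the target directory as its only argument.

```shell
#!/bin/sh
# add_txt_ext DIR: append .txt to every regular file in DIR (hypothetical helper).
add_txt_ext() {
    (
        cd "$1" || exit 1                # subshell, so the caller's cwd is unchanged
        for file in *; do
            [ -f "$file" ] || continue   # skip directories and an unmatched '*'
            case $file in
                *.txt) continue ;;       # already has the extension
            esac
            if [ -e "$file.txt" ]; then  # never clobber an existing NAME.txt
                printf 'skipping %s: %s.txt exists\n' "$file" "$file" >&2
                continue
            fi
            mv -- "$file" "$file.txt"
        done
    )
}
```

Usage: add_txt_ext target_directory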
You do know you have just renamed an HTML file to a .txt file, don't you?
One can hardly convert an HTML file to a text file in the sense of making the HTML tags disappear (yes, you can strip them with sed, but it's not recommended).
What do you think about this? (Yes, it looks complicated, but it might be a much better solution.)
In the download folder (make sure it contains only the files downloaded from link.txt, just in case):
Extracts the "path" to the download link for the corresponding PDF file.
Creates a file tcode-pdf.txt with test code / PDF name pairs (used later in the renaming process).
Generates a download list.
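A rough sketch of these three steps, assuming each downloaded page contains a relative link of the form test-pdf.php?testid=NNNN and that the PDFs live under http://geneticslab.emory.edu/tests/. Both are assumptions, as is the helper name make_pdf_lists; check one page by hand before relying on it.

```shell
#!/bin/sh
# ASSUMPTION: the PDF links are relative to this base URL; adjust as needed.
BASE_URL=${BASE_URL:-http://geneticslab.emory.edu/tests/}

# make_pdf_lists [OUTDIR]: run in the download folder (hypothetical helper).
make_pdf_lists() {
    out=${1:-/tmp}
    : > "$out/tcode-pdf.txt"             # "TESTCODE PDFNAME" pairs
    : > "$out/list2.txt"                 # one PDF URL per line, for wget -i
    for page in *; do
        [ -f "$page" ] || continue
        # first test-pdf.php?testid=NNNN link in the page, if any
        link=$(grep -o 'test-pdf\.php?testid=[0-9]*' "$page" | head -n 1)
        [ -n "$link" ] || continue
        printf '%s %s\n' "$page" "$link" >> "$out/tcode-pdf.txt"
        printf '%s%s\n' "$BASE_URL" "$link" >> "$out/list2.txt"
    done
}
```

Usage, from the download folder: make_pdf_lists /tmp. Pages without a matching link are simply skipped.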
Code:
wget -x -i /tmp/list2.txt
This time, wget downloads the PDFs.
Code:
awk '{ A[$1]=$2; next} END { for (i in A) print "mv \x27"A[i]"\x27",i".pdf" }' /tmp/tcode-pdf.txt | sh
This awk command generates mv commands (and executes them by piping them to sh) that rename each PDF's cryptic filename to TESTCODE.pdf,
e.g. test-pdf.php?testid=4125 becomes MM123.pdf.
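If you would rather not generate shell code and pipe it to sh, a plain while read loop can do the same renaming and report missing files by name. This is a swapped-in alternative, not the awk command above: a minimal sketch that assumes tcode-pdf.txt holds one TESTCODE PDFNAME pair per line with no spaces in either field; rename_pdfs is a hypothetical name.

```shell
#!/bin/sh
# rename_pdfs PAIRFILE: rename each PDFNAME to TESTCODE.pdf (hypothetical helper).
# ASSUMPTION: each line is "TESTCODE PDFNAME", e.g. "MM123 test-pdf.php?testid=4125".
rename_pdfs() {
    while read -r code pdf; do
        [ -n "$code" ] && [ -n "$pdf" ] || continue   # skip blank or short lines
        if [ -f "$pdf" ]; then
            mv -- "$pdf" "$code.pdf"
        else
            printf 'not found: %s (for %s)\n' "$pdf" "$code" >&2
        fi
    done < "$1"
}
```

Usage, from the folder that holds the PDFs: rename_pdfs /tmp/tcode-pdf.txt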
Code:
for i in *.pdf; do
pdftotext "$i"
done
This converts the PDFs to text files (pdftotext writes NAME.txt next to each NAME.pdf).
I've tried it with one test code and the output looks quite usable.
Last edited by junior-helper; 12-12-2014 at 07:33 PM.
Reason: substituted "pdftotext *.pdf" with a for loop
I am trying out your code, junior-helper, and have gotten as far as:
Code:
awk '{ A[$1]=$2; next} END { for (i in A) print "mv \x27"A[i]"\x27",i".pdf" }' tcode-pdf.txt | sh
I am getting this error:
Code:
mv: cannot stat `test-pdf.php?testid=4405': No such file or directory
mv: cannot stat `test-pdf.php?testid=4143': No such file or directory
mv: cannot stat `test-pdf.php?testid=4432': No such file or directory
mv: cannot stat `test-pdf.php?testid=4421': No such file or directory
mv: cannot stat `test-pdf.php?testid=4415': No such file or directory
mv: cannot stat `test-pdf.php?testid=4434': No such file or directory
mv: cannot stat `test-pdf.php?testid=4391': No such file or directory
All the newly created files are in a new file path:
OK, I think I know what the issue might be. In your post #1 you said
Quote:
each new file is saved as CM080 with a FILE extention
So I suspect the new files (the PDFs in this case) might have such an extension too.
(Note: when I downloaded the files, neither the HTML files (e.g. CM080) nor the PDFs (e.g. test-pdf.php?testid=4405) had any extension.)
If you provide the output of head -3 tcode-pdf.txt and ls 'C:\Users\cmccabe\Desktop\list\geneticslab.emory.edu.txt\tests2\geneticslab.emory.edu\tests' | head -3 (quote the Windows path, or the shell will eat the backslashes), I'm sure I can tweak that awk command to behave as intended.
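Another thing worth checking is where wget actually put the PDFs: with -x it recreates the host/path directories (e.g. geneticslab.emory.edu/tests/), so the files may not be in the folder where the awk command runs. A minimal sketch that covers both suspicions, an added extension and a subdirectory, by searching below the current directory for NAME or NAME.anything. The helper name locate_and_rename_pdfs is hypothetical, and it assumes the pair file holds one TESTCODE PDFNAME pair per line.

```shell
#!/bin/sh
# locate_and_rename_pdfs PAIRFILE (hypothetical helper).
# ASSUMPTION: each line is "TESTCODE PDFNAME"; the PDF may sit in a subdirectory
# (wget -x) and/or carry an extra extension such as PDFNAME.FILE.
locate_and_rename_pdfs() {
    while read -r code pdf; do
        [ -n "$code" ] && [ -n "$pdf" ] || continue
        # -name treats '?' as a wildcard, which still matches a literal '?'
        found=$(find . -type f \( -name "$pdf" -o -name "$pdf.*" \) | head -n 1)
        if [ -n "$found" ]; then
            mv -- "$found" "./$code.pdf"
        else
            printf 'not found below .: %s (for %s)\n' "$pdf" "$code" >&2
        fi
    done < "$1"
}
```

Usage, from the top of the download folder: locate_and_rename_pdfs tcode-pdf.txt. The renamed PDFs all end up in the current directory.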