Save files in directory as txt

12-12-2014


Code:

 wget -x -i link.txt

The above downloads and creates a unique entry for each of the 97 links in the text file. However, each new file is saved under a name like CM080 and shows up with a FILE extension. Is there a way to convert each file in that directory to a .txt? The 97 files are in C:\Users\cmccabe\Desktop\list\geneticslab.emory.edu\tests.

Thank you

link.txt (3.9 KB)

cmccabe

12-12-2014


Afterwards, you could run a rename in a loop. Assuming that the directory contains only the files you want, you can do this:

Code:

cd target_directory
for file in *
do
   mv -- "$file" "$file.txt"   # quotes keep names with spaces intact
done

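If the file names get longer and/or the number of files increases, bear in mind that handing * straight to an external command (for example ls *) can hit the limit on the length of the command line. A for loop like the one above expands * inside the shell, so it is not subject to that limit.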

Every file in the directory is renamed, so if you have a1.file and a2.file and already have a1.file.txt and a2.file.txt, the results might be a little unpredictable. It may well rename a1.file to a1.file.txt (replacing the existing file) and then rename that same file to a1.file.txt.txt, which would be very confusing. Make sure you start with an empty directory before you download the files and rename them.
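If starting from an empty directory is not practical, a more defensive loop is one option. The following is only a sketch: the function name add_txt_suffix is made up for illustration, and it assumes a POSIX shell. It skips names that already end in .txt and refuses to overwrite an existing target.

```shell
# add_txt_suffix DIR: hypothetical helper, not part of the original answer.
# Appends .txt to every regular file in DIR (default: current directory).
add_txt_suffix() {
    dir=${1:-.}
    for file in "$dir"/*; do
        [ -f "$file" ] || continue        # skip directories and an unmatched glob
        case $file in
            *.txt) continue ;;            # already has the suffix
        esac
        if [ -e "$file.txt" ]; then       # never clobber an existing file
            printf 'skipping %s: %s.txt already exists\n' "$file" "$file" >&2
            continue
        fi
        mv -- "$file" "$file.txt"
    done
}
```

Calling add_txt_suffix with no argument works on the current directory, so after the cd shown above it can be run as is.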

I hope that this helps.

Robin

rbatte1

12-12-2014


That worked great... Thank you

cmccabe


12-12-2014


Hi cmccabe,

The following command may also help with this.

Code:

find . -maxdepth 1 -type f -exec bash -c 'echo mv "$0" "$0.txt"' {} \;

The echo only prints the mv commands; remove it once you are happy with the results.
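A variation on the same idea, offered only as a sketch: the function name preview_renames is hypothetical, and -maxdepth is assumed to be available (it is in GNU and BSD find, but not in POSIX). It passes the file names to a single sh as arguments via `{} +`, leaves out files that already end in .txt, and prints the commands without running them.

```shell
# preview_renames DIR: hypothetical helper for a dry run.
# Prints one mv command per regular file directly inside DIR.
preview_renames() {
    find "${1:-.}" -maxdepth 1 -type f ! -name '*.txt' \
        -exec sh -c 'for f do printf "mv -- %s %s\n" "$f" "$f.txt"; done' sh {} +
}
```

The printed lines are for inspection only. To rename for real, replace the printf inside the quotes with `mv -- "$f" "$f.txt"`.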

Thanks,
R. Singh

RavinderSingh13


12-12-2014


You do know that you have just renamed an HTML file to a .txt file, don't you?
It is hard to truly convert an HTML file to plain text, that is, in a way that makes the HTML tags disappear (yes, you can parse it with sed, but that is not recommended).

What do you think about this? It looks complicated, but it might be a much better solution.

In the download folder (make sure it contains only the files downloaded from link.txt, just in case):

Code:

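awk '/pdf/ {
    # strip everything up to the href value and everything after its closing quote
    gsub(/^.*href = "|".*/,"",$0)
    # record the pair: testcode (the file name) and PDF name
    print FILENAME,$0 >> "/tmp/tcode-pdf.txt"
    # build the download URL for wget
    print "http://geneticslab.emory.edu/tests/"$0 >> "/tmp/list2.txt"
}' *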

The above awk command:

- extracts the "path" to the download link for the appropriate PDF file
- creates a file tcode-pdf.txt with testcode-pdfname pairs (this is used later in the renaming process)
- generates a download list
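To see what those three steps produce, here is a small self-contained run on made-up input. The HTML line and the file name MM123 are assumptions for illustration only (the real pages may be marked up differently), and the output files are written to a scratch directory rather than /tmp.

```shell
# Work in a scratch directory so the sketch does not touch real downloads.
work=$(mktemp -d)
cd "$work" || exit 1

# Hypothetical page content saved under a test code; the real markup may differ.
printf '%s\n' '<p>Test details</p>' \
    '<a href = "test-pdf.php?testid=4125">Printable version</a>' > MM123

# Same awk program as above, pointed at the sample file and at local output files.
awk '/pdf/ {
    gsub(/^.*href = "|".*/, "", $0)
    print FILENAME, $0 >> "tcode-pdf.txt"
    print "http://geneticslab.emory.edu/tests/" $0 >> "list2.txt"
}' MM123

cat tcode-pdf.txt   # MM123 test-pdf.php?testid=4125
cat list2.txt       # http://geneticslab.emory.edu/tests/test-pdf.php?testid=4125
```

Only the second input line contains "pdf", so each output file ends up with exactly one line.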

Code:

wget -x -i /tmp/list2.txt

This time, wget will download the PDFs.

Code:

awk '{ A[$1]=$2; next} END { for (i in A) print "mv \x27"A[i]"\x27",i".pdf" }' /tmp/tcode-pdf.txt | sh

This awk command generates mv commands (and, through the pipe to sh, executes them) that rename each PDF from its cryptic file name to testcode.pdf,
e.g. test-pdf.php?testid=4125 to MM123.pdf.
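An alternative to piping generated commands into sh is to read the pairs directly in the shell, which also makes it easy to report a PDF that is not where the script expects it. This is a sketch under assumptions: rename_pdfs is a hypothetical name, and the mapping file is assumed to hold one "testcode pdfname" pair per line, as written by the earlier awk step.

```shell
# rename_pdfs MAPFILE: hypothetical helper, run from the directory holding the PDFs.
# Each line of MAPFILE is "testcode pdfname".
rename_pdfs() {
    while read -r code pdf; do
        [ -n "$pdf" ] || continue                 # ignore blank or malformed lines
        if [ -f "$pdf" ]; then
            mv -- "$pdf" "$code.pdf"
        else
            printf 'not found in %s: %s\n' "$PWD" "$pdf" >&2
        fi
    done < "$1"
}
```

For example, `rename_pdfs /tmp/tcode-pdf.txt` run inside the directory where wget saved the PDFs.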

Code:

for i in *.pdf; do
 pdftotext "$i"
done

This converts the PDFs to text files.

I've experimented with one test code and the output looks very usable.

junior-helper

12-12-2014


I am trying out your code, junior-helper, and have got as far as this step:

Code:

 awk '{ A[$1]=$2; next} END { for (i in A) print "mv \x27"A[i]"\x27",i".pdf" }' tcode-pdf.txt | sh

I am getting this error:

Code:

mv: cannot stat `test-pdf.php?testid=4405': No such file or directory
mv: cannot stat `test-pdf.php?testid=4143': No such file or directory
mv: cannot stat `test-pdf.php?testid=4432': No such file or directory
mv: cannot stat `test-pdf.php?testid=4421': No such file or directory
mv: cannot stat `test-pdf.php?testid=4415': No such file or directory
mv: cannot stat `test-pdf.php?testid=4434': No such file or directory
mv: cannot stat `test-pdf.php?testid=4391': No such file or directory

All the newly created files are in a new path:

Code:

 C:\Users\cmccabe\Desktop\list\geneticslab.emory.edu.txt\tests2\geneticslab.emory.edu\tests

but even if I cd to that directory, I get the same error. The code seems very useful and helpful. Thank you.

cmccabe


12-12-2014


OK, I think I know what might be the issue. In your posting #1 you said

Quote:

each new file is saved as CM080 with a FILE extension

So I suspect that the new files (PDFs in this case) might have such an extension too.
(Note: when I downloaded the files, neither the html files (e.g. CM080) nor the pdfs (e.g. test-pdf.php?testid=4405) had any extensions.)

If you provide the output of head -3 tcode-pdf.txt and
ls C:\Users\cmccabe\Desktop\list\geneticslab.emory.edu.txt\tests2\geneticslab.emory.edu\tests | head -3, I'm sure I can tweak that awk command to behave as intended.
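In the meantime, it may help to check where the PDFs actually landed and what they are called. Because wget -x recreates the host and path as directories under the current directory, the files can end up several levels down. On Windows, wget may also substitute characters such as ? in file names, so the names on disk might not match tcode-pdf.txt exactly. A sketch (locate_pdfs is a hypothetical name; the pattern only assumes the names still begin with test-pdf.php):

```shell
# locate_pdfs START: hypothetical helper that lists downloaded files whose
# names begin with "test-pdf.php", wherever they sit below START.
locate_pdfs() {
    find "${1:-.}" -type f -name 'test-pdf.php*'
}
```

Running `locate_pdfs . | head -3` from the directory where wget was started shows both the directory to cd into and the exact file names to compare against tcode-pdf.txt.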

junior-helper
