Hundreds of files need manual preparation. Can a shell script do it automatically?


 
# 1  
Old 06-01-2015

Hello friends,

I have hundreds of log files from which I need to extract some data and write it into input files.
I will explain in detail using the two attached files: some data has to be read from the .log file and written into the .in file.
**Explanations are given between double asterisks.**

Here is the final .in file:
Code:
Title of the job
feco4_s **it needs to be replaced by the name of log file**
Important Comment: The number of lines in the input file, and the
   *exact* typography for text input needs to be *rigorously* followed !!!!!!!!!
Temperatures (Next two lines: "List", or "Interval", then no of temps.)
   List
   1
Values (If "Interval", give start and finish temp.)
  298.15
Pressure (in atm or bar - if given in atm, enter as a negative number. -1 is the G09 default)
 -1.
Calculated properties of the species (e.g. from Gaussian output)
Electronic Energy
 -1715.8231094  **grep "SCF Done:" finds this number; there may be many matches, and the value from the last match must always be used**
Nature of species (Atom = 0, Linear = 1, or General = 3)
  3
Rotational factor (symmetry factor)
  2. **grep "Rotational symmetry number"; all the matches will give the same number, use the number here and put a dot at the end**
Electronic degeneracy (multiplicity)
  1. **grep "Multiplicity"; all the matches will give the same number, use the number here and put a dot at the end**
Molecular Weight (amu)
167.91460 ** grep "Molecular mass" **
Moments of Inertia (in amu x au2; none for Atom, 1 for Linear, 3 for General)
1284.73849
1536.05333
2170.19761 **grep "     Eigenvalues -- "; this one is very tricky: the three numbers are printed without any space between them,
so three numbers must be extracted from one string of digits and dots. Each number always has five digits after its dot,
no matter how many digits the integer part has**
Vibrational Frequencies (first line: number; then freqs. in cm-1)
  21 ** grep " Frequencies -- " shows a list of numbers;
remove the first column of text and put all the numbers from the three columns into a single column.
The order doesn't matter. The number in the first line is the total count of the numbers below**
  45.7886
  96.4952
 353.2741
 427.8395
 499.7992
 640.4272
2062.0748
  66.7894
  96.9991
 363.5129
 430.9072
 537.9941
 640.7995
2062.6585
  79.7836
 335.4020
 422.7705
 448.4787
 595.3209
2042.2981
2151.4066

I understand this is non-trivial to do in bash, and it is probably better done in Python or some other more advanced programming language. However, I don't know any of them. So far I have searched the logs for these data and put them into the .in file by hand, but there are still hundreds of files waiting to be prepared.
If you could give me some help with this, it would save me from desperation. Thank you very much in advance for your kind help! All help will be much appreciated.
Zhen
# 2  
Old 06-01-2015
This might be the long way of getting it done. But it seems to work.
You can put it into a for loop if you need to process each value.

Code:
grep -v '[A-Za-z]' feco4_s.log.txt | grep '[0-9]\.[0-9]'
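
The for loop mentioned above might look like the following sketch. It assumes the logs sit in the current directory and are named `*.log` (an assumption; the sample in this post is called feco4_s.log.txt), and `numeric_lines` is a hypothetical helper name, not something from the thread.

```shell
# Hypothetical helper: print only the lines of a file that contain no letters
# but do contain a decimal number (the same filter as the pipeline above).
numeric_lines() {
    grep -v '[A-Za-z]' "$1" | grep '[0-9]\.[0-9]'
}

# Assumed layout: all logs in the current directory, named *.log
for f in ./*.log; do
    [ -e "$f" ] || continue          # skip if the glob matched nothing
    echo "== $f"
    numeric_lines "$f"
done
```

This only isolates the purely numeric lines; it does not tell which number belongs to which label, so the values would still have to be sorted into the .in file afterwards.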

# 3  
Old 06-01-2015
This is as far as I got on a first attempt; use it as a template to be enhanced. To get the lines of information in the correct order, it might be worthwhile to create a template form to be read and then filled in from the log files.
Code:
awk '
FNR==1                  {gsub (/^.*\/|\..*$/, "", FILENAME)
                         print FILENAME 
                        }
/SCF Done:/             {print "Electronic Energy"
                         print $5
                        }
/Molecular mass/        {print "Molecular Weight (amu)"
                         print $(NF-1)
                        }
/Temperature/           {print "Values (If \"Interval\", give start and finish temp.)"
                         print $3
                        }
/Pressure/              {print "Pressure (in atm or bar - if given in atm, enter as a negative number. -1 is the G09 default)"
                         print ($NF=="Atm."?"-":"") $(NF-1)
                        }
/Multiplicity/          {print "Electronic degeneracy (multiplicity)"
                         print $NF "."
                        }
/Rotational.*ber/       {print "Rotational factor (symmetry factor)"
                         print $NF
                        }
/Frequencies/           {FC++;for (i=3; i<=5; i++) FR[FC,i]=$i}   
/Eigenvalues -- /       {gsub (/\.[0-9][0-9][0-9][0-9][0-9]/,"& ")
                         printf "Moments of Inertia (in amu x au2; none for Atom, 1 for Linear, 3 for General)\n%s\n%s\n%s\n",
                                $3, $4, $5
                        }
END                     {print "Vibrational Frequencies (first line: number; then freqs. in cm-1)"
                         print 3 * FC
                         for (i=3; i<=5; i++) for (j=1; j<=FC; j++) print FR[j,i]
                        }    
' /tmp/feco4_s.log.txt

resulting in
Code:
feco4_s
Electronic degeneracy (multiplicity)
1.
Electronic Energy
-1715.82310939
Electronic degeneracy (multiplicity)
1.
Electronic Energy
-1715.82310939
Values (If "Interval", give start and finish temp.)
Kelvin.
Pressure (in atm or bar - if given in atm, enter as a negative number. -1 is the G09 default)
-1.00000
Molecular Weight (amu)
167.91460
Moments of Inertia (in amu x au2; none for Atom, 1 for Linear, 3 for General)
1284.73849
1536.05333
2170.19761
Rotational factor (symmetry factor)
2.
Vibrational Frequencies (first line: number; then freqs. in cm-1)
21
45.7886
96.4952
353.2741
427.8395
499.7992
640.4272
2062.0748
66.7894
96.9991
363.5129
430.9072
537.9941
640.7995
2062.6585
79.7836
335.4020
422.7705
448.4787
595.3209
2042.2981
2151.4066
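
The `Kelvin.` under "Values" in this output comes from printing `$3` on the temperature line; the number itself sits one field earlier. A quick way to check field positions is to number the fields of a single line. The sample line below is an assumption reconstructed from the values shown in this thread, not copied from the real log, and `show_fields` is a hypothetical helper name.

```shell
# Hypothetical helper: print every whitespace-separated field of a line
# together with its awk field number.
show_fields() {
    printf '%s\n' "$1" |
        awk '{ for (i = 1; i <= NF; i++) printf "$%d=%s\n", i, $i }'
}

# Assumed sample line (reconstructed, not taken from the real log)
show_fields ' Temperature   298.150 Kelvin.  Pressure   1.00000 Atm.'
```

With a line laid out like this, `$2` is the temperature, `$(NF-1)` the pressure and `$NF` its unit.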

# 4  
Old 06-01-2015
Super-intelligent!!! Thanks a million! It works like a charm! Reading the three numbers in particular is something I tried many times without any success.
Could you please explain a little about "
Code:
/Eigenvalues -- /       {gsub (/\.[0-9][0-9][0-9][0-9][0-9]/,"& ")}

"? It's genius!

---------- Post updated at 02:34 AM ---------- Previous update was at 01:54 AM ----------

Hi RudiC, thanks for the smart code that saves me from heavy manual work. Only one place doesn't fit the format of the input file: the log contains multiple matches for /SCF Done/, and only the last match is the correct value. The script prints all of them (two here; they happen to have the same value, but that is not necessarily so in other cases). Would you please help me fix it? Many thanks for your great work!

# 5  
Old 06-01-2015
As I said, that's an incomplete proposal only, to be expanded. I've run out of stamina a bit.

Quote:
Originally Posted by liuzhencc
.
.
.
Could you please explain a little bit on "
Code:
/Eigenvalues -- /       {gsub (/\.[0-9][0-9][0-9][0-9][0-9]/,"& ")}

"? It's genius!
This one looks for every occurrence of a literal dot followed by five digits and appends a space (the field separator, FS) to each. Because $0 is modified, awk splits the record again, so new fields $4 and $5 are created, which are then printed. Should $4 and $5 already exist because the integer parts are shorter and space-padded, it doesn't hurt, as awk by default treats a run of spaces as a single separator.
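
A small demonstration of that substitution, on a made-up line in which the three values are fused together (the sample text is an assumption built from the numbers in this thread, and `split_eigen` is a hypothetical helper name):

```shell
# Hypothetical helper: insert a space after every ".ddddd" group, let awk
# re-split the record, and print fields 3 to 5, one per line.
split_eigen() {
    printf '%s\n' "$1" | awk '{
        gsub(/\.[0-9][0-9][0-9][0-9][0-9]/, "& ")   # "&" stands for the matched text
        print $3; print $4; print $5
    }'
}

# Assumed sample: "Eigenvalues", "--", then three numbers with no separators
split_eigen ' Eigenvalues --  1284.738491536.053332170.19761'
```

Before the gsub the line has only three fields; afterwards it has five, because the inserted spaces act as separators when awk re-splits $0.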





Quote:
.
.
.
only the last match is the correct value.
.
.
.
Eliminating those duplicates would lead us away from immediate printing, towards parsing and storing the desired values and then printing them in the END section. That would be the major part of implementing what I mentioned: the template form file to be filled in with actual data. May take a while - no promises.
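
The store-then-print idea in miniature: each match only overwrites a variable, and nothing is printed until END, so the last "SCF Done:" value wins and the output order is fixed regardless of where the lines occur in the log. The input lines are invented for illustration and `summarize` is a hypothetical name; only the field positions (`$5`, `$NF`) are taken from the script above.

```shell
# Hypothetical helper: read log text on stdin, remember the values,
# and print them once, in a fixed order, after the last line.
summarize() {
    awk '/SCF Done:/    { energy = $5 }        # overwritten on every match
         /Multiplicity/ { mult = $NF "." }
         END { print "Electronic Energy"
               print energy
               print "Electronic degeneracy (multiplicity)"
               print mult }'
}

# Invented sample input with two "SCF Done:" matches
printf '%s\n' \
    ' Charge =  0 Multiplicity = 1' \
    ' SCF Done:  E(RB3LYP) =  -1715.80000000     A.U. after   12 cycles' \
    ' SCF Done:  E(RB3LYP) =  -1715.82310939     A.U. after    9 cycles' |
    summarize
```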
# 6  
Old 06-01-2015
It has already helped me a lot! Thanks!
I have tried to split the one awk into three, so that a separate awk could be inserted to read the last value of "SCF Done:", like
"
Code:
awk '/SCF D/{foundstring=$5} END{print foundstring}'

"

However, the other two parts fail and produce unexpected output.

Code:
awk '
FNR==1                  {gsub (/^.*\/|\..*$/, "", FILENAME)
                         print "Title of the job"
                         print FILENAME
                         print "Important Comment: The number of lines in the input file, and the"
                         print "*exact* typography for text input needs to be *rigorously* followed!!"
                         print "Temperatures (Next two lines: \"List\", or \"Interval\", then no of temps.)"
                         print "  List"
                         print "  1"
                         print "Values (If \"Interval\", give start and finish temp.)"
                         print "  298.15"
                         print "Pressure (in atm or bar - if given in atm, enter as a negative number. -1 is the G09 default)"
                         print " -1."
                         print "Calculated properties of the species (e.g. from Gaussian output)"
                         print "Electronic Energy"
                        }' feco4_s.log
awk '/SCF Done:/        {foundstring=$5} END{print foundstring}' feco4_s.log
awk '                   {print "Nature of species (Atom = 0, Linear = 1, or General = 3)"
                         print "  3"
                        }
/Rotational.*ber/       {print "Rotational factor (symmetry factor)"
                         print $NF
                        }
/Multiplicity/          {print "Electronic degeneracy (multiplicity)"
                         print $NF "."
                        }
/Molecular mass/        {print "Molecular Weight (amu)"
                         print $(NF-1)
                        }
/Frequencies/           {FC++;for (i=3; i<=5; i++) FR[FC,i]=$i}
/Eigenvalues -- /       {gsub (/\.[0-9][0-9][0-9][0-9][0-9]/,"& ")
                         printf "Moments of Inertia (in amu x au2; none for Atom, 1 for Linear, 3 for General)\n%s\n%s\n%s\n",
                                $3, $4, $5
                        }
END                     {print "Vibrational Frequencies (first line: number; then freqs. in cm-1)"
                         print 3 * FC
                         for (i=3; i<=5; i++) for (j=1; j<=FC; j++) print FR[j,i]
                        }
' feco4_s.log

Thanks again for your effort and your kindness!
# 7  
Old 06-01-2015
DON'T! That kills the logic.

Given we have a form like
Code:
Title of the job
FNMV
Important Comment: The number of lines in the input file, and the
   *exact* typography for text input needs to be *rigorously* followed !!!!!!!!!
Temperatures (Next two lines: "List", or "Interval", then no of temps.)
   List
   1
Values (If "Interval", give start and finish temp.)
TEMPV
Pressure (in atm or bar - if given in atm, enter as a negative number. -1 is the G09 default)
PRV
Calculated properties of the species (e.g. from Gaussian output)
Electronic Energy
ELENV
Nature of species (Atom = 0, Linear = 1, or General = 3)
  3
Rotational factor (symmetry factor)
ROTFACTV
Electronic degeneracy (multiplicity)
MULTIV
Molecular Weight (amu)
MOLWEIV
Moments of Inertia (in amu x au2; none for Atom, 1 for Linear, 3 for General)
MOMINV
Vibrational Frequencies (first line: number; then freqs. in cm-1)
VIBFREQV

produced from your desired output, this

Code:
awk '
                                                # parse form file, fill in values found in data file
                                                # do this first so we do not overwrite values by accident should a pattern fit
/FNMV/                  {print FNMV; next} 
/TEMPV/                 {print TEMPV; next}
/PRV/                   {print PRV; next}  
/ELENV/                 {print ELENV; next}   
/ROTFACTV/              {print ROTFACTV; next}
/MULTIV/                {print MULTIV; next} 
/MOLWEIV/               {print MOLWEIV; next}
/MOMINV/                {for (i=1; i<=3; i++)  print MOMINV[i]; next}
/VIBFREQV/              {print 3 * FC
                         for (i=3; i<=5; i++) for (j=1; j<=FC; j++) print FR[j,i]
                         next}

FNR != NR               {print; next}

                                                # parse data file, save relevant fields
NR==1                   {FNMV=FILENAME
                         gsub (/^.*\/|\..*$/, "", FNMV)
                        }
/SCF Done:/             {ELENV = $5}
/Molecular mass/        {MOLWEIV = $(NF-1)}
/ Temp.*Kelvin/         {TEMPV = $2}
/Kelvin.*Pressure/      {PRV = ($NF=="Atm."?"-":"") $(NF-1)}
/Multiplicity/          {MULTIV = $NF "."}
/Rotational.*ber/       {ROTFACTV = $NF}
/Frequencies --/        {FC++;for (i=3; i<=5; i++) FR[FC,i]=$i}   
/Eigenvalues -- /       {gsub (/\.[0-9][0-9][0-9][0-9][0-9]/,"& ")
                         for (i=1; i<=3; i++) MOMINV[i] = $(i+2)
                        }

' /tmp/feco4_s.log.txt templform

will yield (pretty close to what you requested?):

Code:
Title of the job
feco4_s
Important Comment: The number of lines in the input file, and the
   *exact* typography for text input needs to be *rigorously* followed !!!!!!!!!
Temperatures (Next two lines: "List", or "Interval", then no of temps.)
   List
   1
Values (If "Interval", give start and finish temp.)
298.150
Pressure (in atm or bar - if given in atm, enter as a negative number. -1 is the G09 default)
-1.00000
Calculated properties of the species (e.g. from Gaussian output)
Electronic Energy
-1715.82310939
Nature of species (Atom = 0, Linear = 1, or General = 3)
  3
Rotational factor (symmetry factor)
2.
Electronic degeneracy (multiplicity)
1.
Molecular Weight (amu)
167.91460
Moments of Inertia (in amu x au2; none for Atom, 1 for Linear, 3 for General)
1284.73849
1536.05333
2170.19761
Vibrational Frequencies (first line: number; then freqs. in cm-1)
21
45.7886
96.4952
353.2741
427.8395
499.7992
640.4272
2062.0748
66.7894
96.9991
363.5129
430.9072
537.9941
640.7995
2062.6585
79.7836
335.4020
422.7705
448.4787
595.3209
2042.2981
2151.4066
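
To cover the hundreds of files from the original question, the command above could be wrapped in a loop. This is only a sketch under assumptions: the awk program is saved in a file (called `fill_form.awk` here), the form is `templform` as above, the logs are named `*.log`, and each result is written next to its log as `NAME.in`. `make_inputs` and `fill_form.awk` are hypothetical names.

```shell
# Hypothetical wrapper: run the awk program on every log in a directory.
# Usage: make_inputs SCRIPT.awk TEMPLATE [DIR]
make_inputs() {
    script=$1 template=$2 dir=${3:-.}
    for log in "$dir"/*.log; do
        [ -e "$log" ] || continue                  # no logs found
        out=${log%.log}.in                         # foo.log -> foo.in
        # log first, template second, as in the command above
        awk -f "$script" "$log" "$template" > "$out"
    done
}

# Example call (assumed file names):
# make_inputs fill_form.awk templform /path/to/logs
```

Running awk once per log keeps the NR==1 and FNR != NR tests of the script valid, since they rely on the log being the first file and the form the second.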
