Hundreds of files need manual preparation. Can a shell script do it automatically?
Hello friends,
I have hundreds of files at hand. For each of them, I need to extract some data from a .log file and write these data into an input (.in) file.
Here I will explain in detail using the two attached files: some data is read from the .log file and written into the .in file. **Explanations are given between two stars.**
Here is the final .in file:
I understand this is non-trivial to do with bash, and that it would probably be better done in Python or some other more advanced programming language. However, I don't know any of them. So far I have searched for these data in the logs and put them into the .in file manually, but there are still hundreds of files waiting to be prepared.
If you could give me some help on this point, it would save me from desperation. Thank you very much in advance for your kind help!
Zhen
This is as far as I got on a first attempt; use it as a template to be enhanced. To get the lines of information in the correct order, it might be worthwhile to create a template form that is read and then filled in from the log files.
Code:
awk '
FNR==1 {gsub (/^.*\/|\..*$/, "", FILENAME)
print FILENAME
}
/SCF Done:/ {print "Electronic Energy"
print $5
}
/Molecular mass/ {print "Molecular Weight (amu)"
print $(NF-1)
}
/Temperature/ {print "Values (If \"Interval\", give start and finish temp.)"
print $3
}
/Pressure/ {print "Pressure (in atm or bar - if given in atm, enter as a negative number. -1 is the G09 default)"
print ($NF=="Atm."?"-":"") $(NF-1)
}
/Multiplicity/ {print "Electronic degeneracy (multiplicity)"
print $NF "."
}
/Rotational.*ber/ {print "Rotational factor (symmetry factor)"
print $NF
}
/Frequencies/ {FC++;for (i=3; i<=5; i++) FR[FC,i]=$i}
/Eigenvalues -- / {gsub (/\.[0-9][0-9][0-9][0-9][0-9]/,"& ")
printf "Moments of Inertia (in amu x au2; none for Atom, 1 for Linear, 3 for General)\n%s\n%s\n%s\n",
$3, $4, $5
}
END {print "Vibrational Frequencies (first line: number; then freqs. in cm-1)"
print 3 * FC
for (i=3; i<=5; i++) for (j=1; j<=FC; j++) print FR[j,i]
}
' /tmp/feco4_s.log.txt
resulting in
Code:
feco4_s
Electronic degeneracy (multiplicity)
1.
Electronic Energy
-1715.82310939
Electronic degeneracy (multiplicity)
1.
Electronic Energy
-1715.82310939
Values (If "Interval", give start and finish temp.)
Kelvin.
Pressure (in atm or bar - if given in atm, enter as a negative number. -1 is the G09 default)
-1.00000
Molecular Weight (amu)
167.91460
Moments of Inertia (in amu x au2; none for Atom, 1 for Linear, 3 for General)
1284.73849
1536.05333
2170.19761
Rotational factor (symmetry factor)
2.
Vibrational Frequencies (first line: number; then freqs. in cm-1)
21
45.7886
96.4952
353.2741
427.8395
499.7992
640.4272
2062.0748
66.7894
96.9991
363.5129
430.9072
537.9941
640.7995
2062.6585
79.7836
335.4020
422.7705
448.4787
595.3209
2042.2981
2151.4066
Super-intelligent!!! Thanks a million! It works like a charm! Especially reading the three numbers, which I had tried many times without any success.
Could you please explain a little bit on "
Hi RudiC, thanks for the smart code that saves me from heavy manual work. There is only one place that doesn't fit the format of the input: the log file contains multiple matches for /SCF Done/, and only the last match is the correct value. Here the script prints all of them (two here, with the same value, though they are not necessarily equal in other cases). Would you please help me to fix it? Many thanks for your great job!
Last edited by Scrutinizer; 06-01-2015 at 03:12 PM..
Reason: codE tags
This one looks for .nnnnn (a literal dot followed by five digits) and adds a space (the FS) after each match. When $0 is modified, it is split into fields again, so new fields $4 and $5 are created, which are then printed. Should $4 and $5 already exist because the integer parts are shorter and space padded, the extra space doesn't hurt, as awk treats a run of spaces as a single separator with the default FS.
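A minimal sketch of that field-splitting effect, using a made-up line in which three numbers run together (the exact spacing in a real log may differ; `line` and `split_fields` are names chosen here for illustration):

```shell
# Hypothetical input line: three fixed-width numbers fused together
line=' Eigenvalues --  1284.738491536.053332170.19761'

split_fields=$(printf '%s\n' "$line" | awk '
{
    before = NF                                   # field count before the edit
    gsub(/\.[0-9][0-9][0-9][0-9][0-9]/, "& ")     # append a space after each ".ddddd"
    print before, NF                              # modifying $0 re-splits the fields
    print $3; print $4; print $5
}')
printf '%s\n' "$split_fields"
```

The first output line shows the field count going from 3 to 5; the next three lines are the separated numbers.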
Quote:
... only the last match is the correct value. ...
Eliminating those duplicates would lead us away from immediate printing towards parsing and storing the desired values and then printing them in the END section. That would be the major part of implementing what I mentioned: the template form file to be filled in with actual data. May take a while - no promises.
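The store-then-print idea can be sketched for the energy value alone. The log excerpt below is invented for illustration (in particular the first energy value), and `log` and `last_energy` are hypothetical names:

```shell
# Hypothetical log excerpt with two "SCF Done:" lines; only the last value should survive
log=$(mktemp)
cat > "$log" <<'EOF'
 SCF Done:  E(RB3LYP) =  -1715.82310000     A.U. after   12 cycles
 Molecular mass:   167.91460 amu.
 SCF Done:  E(RB3LYP) =  -1715.82310939     A.U. after    1 cycles
EOF

last_energy=$(awk '
/SCF Done:/ { energy = $5 }   # overwrite on every match, print nothing yet
END {
    print "Electronic Energy"
    print energy               # holds the value from the last matching line
}' "$log")
printf '%s\n' "$last_energy"
rm -f "$log"
```

Because each match overwrites the variable, whatever is left when END runs comes from the last matching line, and the heading is printed only once.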
It has already helped me a lot! Thanks!
I have tried to split the awk script from one into three, so that a separate awk could be inserted to read the last value of "SCF Done:", like
"
However, the other two parts fail with unexpected output.
Code:
awk '
FNR==1 {gsub (/^.*\/|\..*$/, "", FILENAME)
print "Title of the job"
print FILENAME
print "Important Comment: The number of lines in the input file, and the"
print "*exact* typography for text input needs to be *rigorously* followed!!"
print "Temperatures (Next two lines: \"List\", or \"Interval\", then no of temps.)"
print " List"
print " 1"
print "Values (If \"Interval\", give start and finish temp.)"
print " 298.15"
print "Pressure (in atm or bar - if given in atm, enter as a negative number. -1 is the G09 default)"
print " -1."
print "Calculated properties of the species (e.g. from Gaussian output)"
print "Electronic Energy"
}' feco4_s.log
awk '/SCF Done:/ {foundstring=$5} END{print foundstring}' feco4_s.log
awk ' {print "Nature of species (Atom = 0, Linear = 1, or General = 3)"
print " 3"
}
/Rotational.*ber/ {print "Rotational factor (symmetry factor)"
print $NF
}
/Multiplicity/ {print "Electronic degeneracy (multiplicity)"
print $NF "."
}
/Molecular mass/ {print "Molecular Weight (amu)"
print $(NF-1)
}
/Frequencies/ {FC++;for (i=3; i<=5; i++) FR[FC,i]=$i}
/Eigenvalues -- / {gsub (/\.[0-9][0-9][0-9][0-9][0-9]/,"& ")
printf "Moments of Inertia (in amu x au2; none for Atom, 1 for Linear, 3 for General)\n%s\n%s\n%s\n",
$3, $4, $5
}
END {print "Vibrational Frequencies (first line: number; then freqs. in cm-1)"
print 3 * FC
for (i=3; i<=5; i++) for (j=1; j<=FC; j++) print FR[j,i]
}
' feco4_s.log
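One likely source of the unexpected output in the third awk above: its first block has no pattern, so it runs for every line of the log rather than once. A small sketch of the difference, with three arbitrary lines standing in for a log (`sample`, `per_line` and `once` are illustrative names):

```shell
# Three arbitrary input lines standing in for a log file
sample=$(printf 'line one\nline two\nline three\n')

# A block without a pattern runs once per input line
per_line=$(printf '%s\n' "$sample" | awk '{ print "Nature of species" }' | wc -l | tr -d ' ')

# A BEGIN block runs once, before any input is read
once=$(printf '%s\n' "$sample" | awk 'BEGIN { print "Nature of species" }' | wc -l | tr -d ' ')

echo "without pattern: $per_line, with BEGIN: $once"
```

Moving fixed header lines into a BEGIN block (or guarding them with FNR==1, as the first awk does) would print them once.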
With a template form file (called templform in the command below) like
Code:
Title of the job
FNMV
Important Comment: The number of lines in the input file, and the
*exact* typography for text input needs to be *rigorously* followed !!!!!!!!!
Temperatures (Next two lines: "List", or "Interval", then no of temps.)
List
1
Values (If "Interval", give start and finish temp.)
TEMPV
Pressure (in atm or bar - if given in atm, enter as a negative number. -1 is the G09 default)
PRV
Calculated properties of the species (e.g. from Gaussian output)
Electronic Energy
ELENV
Nature of species (Atom = 0, Linear = 1, or General = 3)
3
Rotational factor (symmetry factor)
ROTFACTV
Electronic degeneracy (multiplicity)
MULTIV
Molecular Weight (amu)
MOLWEIV
Moments of Inertia (in amu x au2; none for Atom, 1 for Linear, 3 for General)
MOMINV
Vibrational Frequencies (first line: number; then freqs. in cm-1)
VIBFREQV
produced from your desired output, this
Code:
awk '
# parse form file, fill in values found in data file
# do this first so we don''t overwrite values by accident should a pattern fit
/FNMV/ {print FNMV; next}
/TEMPV/ {print TEMPV; next}
/PRV/ {print PRV; next}
/ELENV/ {print ELENV; next}
/ROTFACTV/ {print ROTFACTV; next}
/MULTIV/ {print MULTIV; next}
/MOLWEIV/ {print MOLWEIV; next}
/MOMINV/ {for (i=1; i<=3; i++) print MOMINV[i]; next}
/VIBFREQV/ {print 3 * FC
for (i=3; i<=5; i++) for (j=1; j<=FC; j++) print FR[j,i]
next}
FNR != NR {print; next}
# parse data file, save relevant fields
NR==1 {FNMV=FILENAME
gsub (/^.*\/|\..*$/, "", FNMV)
}
/SCF Done:/ {ELENV = $5}
/Molecular mass/ {MOLWEIV = $(NF-1)}
/ Temp.*Kelvin/ {TEMPV = $2}
/Kelvin.*Pressure/ {PRV = ($NF=="Atm."?"-":"") $(NF-1)}
/Multiplicity/ {MULTIV = $NF "."}
/Rotational.*ber/ {ROTFACTV = $NF}
/Frequencies --/ {FC++;for (i=3; i<=5; i++) FR[FC,i]=$i}
/Eigenvalues -- / {gsub (/\.[0-9][0-9][0-9][0-9][0-9]/,"& ")
for (i=1; i<=3; i++) MOMINV[i] = $(i+2)
}
' /tmp/feco4_s.log.txt templform
will yield (pretty close to what you requested?):
Code:
Title of the job
feco4_s
Important Comment: The number of lines in the input file, and the
*exact* typography for text input needs to be *rigorously* followed !!!!!!!!!
Temperatures (Next two lines: "List", or "Interval", then no of temps.)
List
1
Values (If "Interval", give start and finish temp.)
298.150
Pressure (in atm or bar - if given in atm, enter as a negative number. -1 is the G09 default)
-1.00000
Calculated properties of the species (e.g. from Gaussian output)
Electronic Energy
-1715.82310939
Nature of species (Atom = 0, Linear = 1, or General = 3)
3
Rotational factor (symmetry factor)
2.
Electronic degeneracy (multiplicity)
1.
Molecular Weight (amu)
167.91460
Moments of Inertia (in amu x au2; none for Atom, 1 for Linear, 3 for General)
1284.73849
1536.05333
2170.19761
Vibrational Frequencies (first line: number; then freqs. in cm-1)
21
45.7886
96.4952
353.2741
427.8395
499.7992
640.4272
2062.0748
66.7894
96.9991
363.5129
430.9072
537.9941
640.7995
2062.6585
79.7836
335.4020
422.7705
448.4787
595.3209
2042.2981
2151.4066
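To run this over hundreds of logs, the awk program could be saved to a file and called in a loop. The sketch below uses a stripped-down stand-in for the full program (one placeholder only), two made-up logs, and hypothetical names (`work`, `fill.awk`, `a.log`, `b.log`); it assumes each .in file should sit next to its .log with the same base name:

```shell
# Work in a scratch directory with two made-up logs and a one-field template
work=$(mktemp -d)
printf ' SCF Done:  E(RB3LYP) =  -1.5  A.U.\n' > "$work/a.log"
printf ' SCF Done:  E(RB3LYP) =  -2.5  A.U.\n' > "$work/b.log"
printf 'Electronic Energy\nELENV\n' > "$work/templform"

# Stripped-down stand-in for the full awk program, saved as a file
cat > "$work/fill.awk" <<'EOF'
/ELENV/     { print ELENV; next }   # template line: substitute stored value
FNR != NR   { print; next }         # any other template line: copy through
/SCF Done:/ { ELENV = $5 }          # log line: remember the last energy
EOF

# One pass per log; the output name swaps the .log suffix for .in
for log in "$work"/*.log; do
    awk -f "$work/fill.awk" "$log" "$work/templform" > "${log%.log}.in"
done

cat "$work/a.in" "$work/b.in"
# remove the scratch directory with: rm -r "$work"
```

The `FNR != NR` test relies on the log being the first file and containing at least one line, so that line numbers within the template differ from the running total.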