G/AWK and ksh troubles


 
# 1  
12-06-2011
G/AWK and ksh troubles

Alright, so I've been banging my head against the wall for the past 7 hours trying to figure this out. What I'm trying to do is "unwrap" periodic coordinates from a molecular simulation to put them back in their unit cell box. I've accomplished that little bit of magic easily enough, but I have to change the Unit Cell Vectors manually each time I want to run the program (see code below):
Code:
cat $project.xyz | gawk '
{ getline
    natoms = $1
        system("rm unwrapcoord")
    #print "Number of atoms/lines is: " , natoms
        getline
        for (i = 1; i <= natoms; ++i) {
                getline
                #print $0
                if ($2 > 26.680)
                        while ($2 > 26.680) {
                                $2 -= 26.680
                } else  while ($2 < 0) {
                                $2 += 26.680
                }
                if ($3 > 25.5422)
                        while ( $3 >= 25.5422 ) {
                                $3 -= 25.5422
                } else while ($3 < 0) {
                                $3 += 25.5422
                }
                if ($4 > 31.6148)
                        while ($4 >= 31.6148) {
                                $4 -= 31.6148
                } else while ($4 < 0) {
                                $4 += 31.6148
                }
                system("touch unwrapcoord")
                print $1, $2, $3, $4 >> "unwrapcoord"
                #print "New X: ", $2
                #print "New Y: ", $3
                #print "New Z: ", $4
        }
}'

And here are the first few lines of $project.xyz:

Code:
2250
$project.cif
C          4.68047        2.79457        0.68952
C          7.15344        2.79202        0.70501
C          9.58372        2.78180        0.67751
C         11.98973        2.76648        0.63293
....

What I am trying to do is set it up so that the shell script (or gawk, I'm not picky) reads the original CIF file to extract the Unit Cell Vectors and makes everything automatic/streamlined. I can get the Unit Cell Vectors into my ksh script using:
Code:
head -15 $project.cif | gawk '/_cell_length_a/ {print $2}'

but I can't find a way to pass this value to gawk, or to have gawk extract it on its own, without breaking the rest of the unwrapping code.

Again, I've been at this for hours and am hoping someone out there has some ideas. Thank you to anyone who can help.
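For the "have gawk get it on its own" route, one option is to hand awk both files and let it read the CIF first. This is a minimal sketch, not the poster's code, and it rests on assumptions: the CIF stores the lengths on lines such as "_cell_length_a   26.6800" (an uncertainty in parentheses such as "26.680(3)" is stripped), the .xyz file has the atom count on line 1 and a title on line 2, and the function name unwrap_with_cif is hypothetical. It uses plain awk (gawk runs the same program) and wraps each coordinate into [0, L) with int(), so a value exactly on a positive cell boundary comes out as 0 rather than as L. The FNR == NR test only recognizes the first file correctly if the CIF is not empty.

Code:
```shell
# Sketch (hypothetical helper): awk reads the cell lengths from the CIF,
# then wraps the coordinates of the .xyz file into the unit cell.
# Assumes CIF lines like "_cell_length_a   26.6800" and an .xyz file with
# the atom count on line 1, a title on line 2, then "symbol x y z" lines.
unwrap_with_cif() {
    awk '
    # Reduce x into [0, L); int() truncates toward zero, so fix negatives.
    function wrap(x, L,    r) {
        r = x - L * int(x / L)
        if (r < 0)  r += L
        if (r >= L) r -= L        # guard against rounding up to exactly L
        return r
    }
    # First file (the CIF): FNR == NR holds only while reading it.
    FNR == NR {
        if ($1 ~ /^_cell_length_[abc]$/) {
            v = $2
            sub(/\(.*/, "", v)    # drop an uncertainty such as "(3)"
            cell[substr($1, length($1))] = v + 0
        }
        next
    }
    # Second file (the .xyz): check the cell, then skip the two header lines.
    FNR == 1 {
        if (!(cell["a"] > 0 && cell["b"] > 0 && cell["c"] > 0)) {
            print "cell lengths not found in CIF" > "/dev/stderr"
            exit 1
        }
        next
    }
    FNR == 2 { next }
    NF >= 4 {
        printf "%s %.5f %.5f %.5f\n", $1, wrap($2, cell["a"]), wrap($3, cell["b"]), wrap($4, cell["c"])
    }
    ' "$1" "$2"
}
```

Usage would be along the lines of: unwrap_with_cif "$project.cif" "$project.xyz" > unwrapcoord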

---------- Post updated at 02:33 AM ---------- Previous update was at 02:19 AM ----------

I know ksh has a built-in "read" command that is used in "while read line" loops, but I couldn't find much information on it, and my advisor told me that AWK is easier for parsing a file even if it is a bit slower at the math. I'm not opposed to writing everything in ksh with read, so if that's an option, I'm OK with that. I just didn't know how to even start with it.
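For the ksh route, the built-in read command can at least take over the text parsing of the CIF. A minimal sketch, assuming the same "_cell_length_a   26.6800" layout; the function name read_cell_lengths is hypothetical, and it sets the ucva/ucvb/ucvc variable names used later in this thread. The floating-point wrapping is still left to awk, since not every ksh (ksh88, for example) or bash can do non-integer arithmetic.

Code:
```shell
# Sketch (hypothetical helper): pull the three cell lengths out of a CIF
# with the shell's own read builtin; no awk is needed for this part.
read_cell_lengths() {
    while read -r key value rest; do
        case $key in
            _cell_length_a) ucva=${value%%\(*} ;;   # strip a "(3)"-style uncertainty
            _cell_length_b) ucvb=${value%%\(*} ;;
            _cell_length_c) ucvc=${value%%\(*} ;;
        esac
    done < "$1"    # redirecting the loop (not piping) keeps the variables set
}
```

Usage: read_cell_lengths "$project.cif"; echo "$ucva $ucvb $ucvc"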
# 2  
12-06-2011
If you want to pass a variable to gawk, you can do it this way...
Code:
gawk -v var="somevalue" '{print var}' ...

I didn't go through your code... It seems you are stuck on passing the variable to awk.

Let us know if you need anything more...

--ahamed
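Besides -v, awk can also read exported shell variables through its ENVIRON array. A small sketch comparing the two; the function names are hypothetical and 26.6800 is simply the a-length from this thread. One practical difference: -v interprets backslash escape sequences in the value and ENVIRON does not, which does not matter for plain numbers like these.

Code:
```shell
# Sketch: two ways to hand a shell value to awk (function names are hypothetical).
ucva=26.6800

# 1. -v sets the awk variable before any input is read, so even BEGIN sees it.
show_with_v() {
    awk -v UCA="$ucva" 'BEGIN { print UCA }'
}

# 2. A variable placed in awk's environment is available as ENVIRON["name"].
show_with_environ() {
    UCA=$ucva awk 'BEGIN { print ENVIRON["UCA"] }'
}
```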
# 3  
12-06-2011
Tried That

I did that before, and it transfers the values, but the rest of the GAWK script breaks: getline doesn't grab the first row of my concatenated coordinates. My code:
Code:
#!/bin/ksh
project=${1:-$project}    # name given on the command line, else an exported $project
ucva=`head -15 ${project}.cif | gawk ' /_cell_length_a/ {print $2}'`
ucvb=`head -15 ${project}.cif | gawk ' /_cell_length_b/ {print $2}'`
ucvc=`head -15 ${project}.cif | gawk ' /_cell_length_c/ {print $2}'`
echo "From KSH: " $ucva $ucvb $ucvc

cat test.xyz | gawk -v UCA="$ucva" -v UCB="$ucvb" -v UCC="$ucvc" '
{       print "From GAWK script: ", UCA, UCB, UCC
        getline
        natoms = $1
        system("touch unwrapcoord")
        system("rm unwrapcoord")
        print "Number of atoms/lines is: " , natoms
        getline
        for (i = 1; i <= natoms; ++i) {
                getline
                #print $0
                if ($2 > UCA) while ($2 > UCA) {
                        $2 -= UCA
                } else  while ($2 < 0) {
                        $2 += UCA
                }
                if ($3 > UCB) while ( $3 >= UCB ) {
                        $3 -= UCB
                } else while ($3 < 0) {
                        $3 += UCB
                }
                if ($4 > UCC) while ($4 >= UCC) {
                        $4 -= UCC
                } else while ($4 < 0) {
                        $4 += UCC
                }
                system("touch unwrapcoord")
                print $1, $2, $3, $4 >> "unwrapcoord"
                #print "New X: ", $2
                #print "New Y: ", $3
                #print "New Z: ", $4
        }
}'

And here are the test coordinates I'm using (running tests on 2k+ coordinates was taking too long, so I made a small sample of test data):
Code:
13
test.xyz
C         29.79490     -219.04889       16.97810
C         29.68951     -218.11404       15.73500
O         30.71856     -217.45837       15.41253
C         29.69085     -218.15799       18.19906
C         30.07130     -218.62616       19.60655
C         31.54004     -219.08261       19.78486
O         32.36285     -218.16895       20.00300
O         31.86954     -220.26163       19.36312
H         28.94621     -219.77812       16.99074
H         30.43015     -217.42186       17.95341
O        188.97899     -299.12930      -35.16546
H        188.52623     -298.39215      -34.75984
H        188.99127     -298.92932      -36.14583

But when I run the script, getline grabs the title line (test.xyz) instead of the atom count on the first line. Output from the script:
Code:
dec014@linux:~/Desktop/CP2KINPUT_SCRIPT/CP2KINPUT_V2> ./ksh.tst project
From KSH:  26.6800 25.5422 31.6148
From GAWK script: 26.6800 25.5422 31.6148
Number of atoms/lines is:  test.xyz
^C

And then it goes into an infinite loop in the calculation phase when it hits the last coordinate. I was sure I had tried this last night, and that's why I spent hours banging my head against the wall: I had read on this forum about passing variables and was pretty sure I'd tried it. Is there a way to change the second part of my GAWK script so it behaves the way it's supposed to?
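A likely explanation, consistent with the output above: when the rule starts, awk has already read line 1 ("13") into $0, so the first getline replaces it with line 2 and natoms becomes "test.xyz". Because that is not a number, i <= natoms is a string comparison, and "1", "2", ... all sort before "t", so the loop never stops. Once the input runs out, getline returns 0 and leaves $0 unchanged, so the last atom is processed again and again. The two behaviours can be checked in isolation with this sketch; the function names are hypothetical.

Code:
```shell
# Sketch: the two awk behaviours behind the endless loop (hypothetical helpers).

# 1. $0 already holds line 1 when the rule runs. Each plain getline reads the
#    NEXT line and returns 1, or returns 0 at end of input and leaves $0 as is.
getline_demo() {
    printf '13\ntest.xyz\n' |
        awk 'NR == 1 { r1 = getline; r2 = getline; print r1, r2, $0 }'
}

# 2. A non-numeric field compares as a string: 1 <= "test.xyz" is true
#    because "1" sorts before "t". Adding 0 forces a numeric value (here 0).
compare_demo() {
    echo 'test.xyz' |
        awk '{ as_string = (1 <= $1); as_number = (1 <= $1 + 0); print as_string, as_number }'
}
```

getline_demo prints "1 0 test.xyz" and compare_demo prints "1 0". So the fix is either to drop the extra getline and let pattern rules such as NR == 1 and NR == 2 handle the header lines, or at least to use natoms = $1 + 0 and stop the loop when getline returns 0 or less.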
# 4  
12-06-2011
Try this...
Code:
...
gawk -v UCA="$ucva" -v UCB="$ucvb" -v UCC="$ucvc" '
/^[COH]/{ 
        print "From GAWK script: ", UCA, UCB, UCC
        #getline << you don't need this
        natoms = $1
        system("touch unwrapcoord")
        system("rm unwrapcoord")
        print "Number of atoms/lines is: "
        ...
}' test.xyz

HTH
--ahamed

Last edited by ahamed101; 12-06-2011 at 01:40 PM. Reason: Updated the code!
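Filling in the rest of that idea, a complete record-driven version might look like the sketch below. It is an illustration under assumptions, not ahamed's exact code: the function name unwrap_xyz is hypothetical, the header lines are skipped by line number instead of /^[COH]/ (so other elements work too), and the result goes to stdout so the shell can redirect it once instead of using system("rm"), system("touch") and >>. The wrap() helper keeps the subtract/add loops of the original but maps every coordinate into [0, L), so values exactly on a cell boundary may come out differently from the original script.

Code:
```shell
# Sketch (hypothetical helper): record-driven unwrapping, no getline and no
# system() calls. Arguments: cell lengths a, b, c, then the .xyz file.
unwrap_xyz() {
    awk -v UCA="$1" -v UCB="$2" -v UCC="$3" '
    # Subtract/add L until x lies in [0, L), like the original while loops.
    # A missing or non-positive L would loop forever, so stop instead.
    function wrap(x, L) {
        x += 0; L += 0
        if (L <= 0) {
            print "bad cell length: " L > "/dev/stderr"
            exit 1
        }
        while (x >= L) x -= L
        while (x < 0)  x += L
        return x
    }
    FNR == 1 { natoms = $1 + 0; next }   # line 1: atom count, forced numeric
    FNR == 2 { next }                    # line 2: title line, ignored
    FNR <= natoms + 2 && NF >= 4 {       # atom lines only
        printf "%s %.5f %.5f %.5f\n", $1, wrap($2, UCA), wrap($3, UCB), wrap($4, UCC)
    }
    ' "$4"
}
```

Usage would be along the lines of: unwrap_xyz "$ucva" "$ucvb" "$ucvc" test.xyz > unwrapcoord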
# 5  
12-06-2011
You could replace this:
Code:
                if ($2 > UCA) while ($2 > UCA) {
                        $2 -= UCA
                } else  while ($2 < 0) {
                        $2 += UCA
                }
                if ($3 > UCB) while ( $3 >= UCB ) {
                        $3 -= UCB
                } else while ($3 < 0) {
                        $3 += UCB
                }
                if ($4 > UCC) while ($4 >= UCC) {
                        $4 -= UCC
                } else while ($4 < 0) {
                        $4 += UCC
                }

with this
Code:
                $2=$2%UCA<0||$2>0&&$2%UCA==0 ? UCA+$2%UCA : $2%UCA
                $3=$3%UCB<0||$3==UCB         ? UCB+$3%UCB : $3%UCB
                $4=$4%UCC<0||$4==UCC         ? UCC+$4%UCC : $4%UCC

This would be much more efficient when the UC* values are small and $2-$4 are large.
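A quick way to check that the one-line replacements keep the original boundary behaviour is to feed them a few edge values. A sketch that copies the three expressions verbatim, assuming a cell length of 10 on all three axes for readability; the function name wrap_mod is hypothetical.

Code:
```shell
# Sketch (hypothetical helper): run the modulo expressions from this post on
# one line "X x y z". Column 2 follows the X rule, columns 3 and 4 the Y/Z rule.
# Arguments: x y z UCA UCB UCC
wrap_mod() {
    echo "X $1 $2 $3" | awk -v UCA="$4" -v UCB="$5" -v UCC="$6" '{
        $2=$2%UCA<0||$2>0&&$2%UCA==0 ? UCA+$2%UCA : $2%UCA
        $3=$3%UCB<0||$3==UCB         ? UCB+$3%UCB : $3%UCB
        $4=$4%UCC<0||$4==UCC         ? UCC+$4%UCC : $4%UCC
        print
    }'
}
```

With these inputs the original loops give the same results: 20 ends at 10 under the X rule but at 0 under the Y/Z rule, exactly 10 stays at 10 for Y, and -3 becomes 7.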
# 6  
12-08-2011
Alright, I've got 2 questions...

@Ahamed101
Why did my script work before (without the cell vector variables), even with that extra getline?

@Chubler_XL
I'm all about making programs more efficient. The Unit Cell Vectors are at most 10-20 times smaller than the coordinates, so would your code still be more efficient with only an order-of-magnitude difference? I've only had basic programming classes, and most of what I know is self-taught. Can you explain what your code does and why it is more efficient?
# 7  
12-08-2011
It should be close to 5 times faster. With the existing code you keep subtracting the Unit Cell value until the coordinate is within range, which can take 10 to 20 separate subtractions.

The updated code uses the modulo operator (%) to calculate the remainder after dividing by the Unit Cell value, so it is basically one or two division operations instead of many subtractions or additions.

The if-then-else operator <expr>?<true value>:<false value> deals with the border conditions (e.g. negative values wrapping to positive and edge values ending up as either zero or the Unit Cell value).

$2 is handled differently from $3 and $4 because your original code uses a > test for $2 and >= for $3 and $4.
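The reason for the ?: test is that awk's % keeps the sign of the left operand (like C's fmod()), so a negative coordinate gives a negative remainder that still needs one cell length added. A small sketch using values from the test data above (y = -219.04889 with b = 25.5422, and x = 29.79490 with a = 26.6800); the function name mod_demo is hypothetical. For that y, the original loop needs 9 additions, while % gets there in one step.

Code:
```shell
# Sketch (hypothetical helper): print the raw awk remainder x % L and the
# value after adding L back when the remainder is negative.
mod_demo() {
    awk -v x="$1" -v L="$2" 'BEGIN {
        r = x % L                  # sign follows x, as with C fmod()
        printf "%.4f %.4f\n", r, (r < 0 ? r + L : r)
    }'
}
```

mod_demo -219.04889 25.5422 prints "-14.7113 10.8309", and mod_demo 29.79490 26.6800 prints "3.1149 3.1149".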