Help with AWK and Scripting!


 
# 1  
08-12-2010

Hi,
This is the first time I am working with awk, and I am not familiar with its commands. I have managed to do most of my work, but one task is left, and I need your help!

I have to extract only the matrix (written within []) from a text file. For example:
Code:
1JTJ_0006_ACGC_NPNP_A_12_15.pdb  0.61 0.54  [     0.43 0.51 0.71 0.81]
1JTJ_0011_ACGC_NPNP_A_12_15.pdb  0.46 0.44  [0.37      0.30 0.42 0.70]
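One way to pull out just the bracketed part of each line is awk's `match()`, which sets `RSTART` and `RLENGTH` to the position and length of the match. This is a minimal sketch, assuming each matrix row contains exactly one `[...]` pair; the function name `extract_brackets` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print the text between "[" and "]" on every line
# that contains a bracketed part; all other lines are skipped.
extract_brackets() {
    awk '
        match($0, /\[[^]]*\]/) {
            # RSTART/RLENGTH include both brackets, so drop one char on each side
            print substr($0, RSTART + 1, RLENGTH - 2)
        }
    ' "$@"
}
```

Usage: `extract_brackets 1JTJ.pair.txt` prints one matrix row per line, with the blanks inside the brackets preserved.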

I have thousands of text files in a folder. I decided to use awk because I found an awk command on the internet that extracts the substring within []. For each file, the processed output should be stored in a file named after part of the input filename, with _out.txt appended. I have almost completed the work, but my output looks like this:

Code:
     0.43 0.51 0.71 0.81
0.37      0.30 0.42 0.70

I have to process this matrix further in MATLAB, so I just want square brackets around it, like this:

Code:
[    0.43 0.51 0.71 0.81
0.37      0.30 0.42 0.70 ]

Here is what I have done so far:
Code:
for f in *.pair.txt;do awk -v RS=[ -F] '/]/{split (FILENAME,d,".");}{print   $1  >> d[1]"_out.txt"}' $f;done

I tried {print "[" > d[1]"out.txt"} in many places, but it didn't work.
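A likely reason `print "["` misbehaves in the main action is that a pattern-action block runs once per input record; anything that should happen exactly once belongs in `BEGIN` (before the first record) or `END` (after the last). A minimal sketch, assuming the rows have already been extracted one per line and that a closing bracket on a line of its own is acceptable for your MATLAB input; `wrap_rows` is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: wrap already-extracted matrix rows in brackets.
wrap_rows() {
    awk '
        BEGIN { printf "[" }   # runs once, before any input is read
              { print }        # runs once per row
        END   { print "]" }    # runs once, after the last row
    ' "$@"
}
```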

If there is any other easy way of doing my task, please let me know.

Thanks,
Sriji

Last edited by Scott; 08-17-2010 at 03:45 AM. Reason: Added code tags, removed formatting
# 2  
08-12-2010
I might have written the awk a bit differently, but working with what you've got, this should add the leading and trailing square brackets.

Code:
awk  -F "]" -v RS="["  '
        NR == 1 {                              # establish file name just once at beginning
                split( FILENAME, d, ".");
                outfile = d[1] "_out.txt";
                printf( "[" ) >outfile;         # opening bracket to the output
        }
        /\]/{                                   # for each record, print the last one if there, save this one
                if( last )
                        print last >outfile;    # print the one from the last time
                last = $1;                     # save this one til next record is read or end
        }
        END {
                printf( "%s]\n", last ) >outfile;       # print the last line of the matrix and closing bracket
        }
        ' $f

The script saves the current matrix row and prints it only when the next row is read. At the end of the input file, the held row (the last row of the matrix) is printed together with the closing bracket.
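The hold-one-row idea can also be written with a row counter instead of testing `last` for truth. With `if( last )`, a held row that awk reads as numeric zero (for example a one-value row such as `0` or `0.00`) could be skipped. A sketch of the counter version, assuming rows arrive one per line on standard input and output goes to standard output rather than a file; `bracket_matrix` is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: print rows with "[" before the first and "]" after
# the last, holding each row back until we know whether another follows.
bracket_matrix() {
    awk '
        {
            if (n++ > 0)
                print held        # a newer row exists, so the held one is not last
            else
                printf "["        # first row: emit the opening bracket
            held = $0
        }
        END {
            if (n > 0)
                print held "]"    # last row gets the closing bracket
        }
    ' "$@"
}
```

A single-row input comes out as `[row]`, and empty input produces no output at all.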
# 3  
08-13-2010
Thank you so much!!!

---------- Post updated 08-13-10 at 01:58 PM ---------- Previous update was 08-12-10 at 09:46 PM ----------

Hi,
Can you help me with this?
The matrix in the input file looks like this (blank spaces indicate a missing value in the matrix):

Code:
[    4   5  6  7 ]
[3       7  9  10]
[4   5      11 9]

I want the output file to look like this:
Code:
[0  4   5  6   7
 3  0   7  9   10
 4  5   0  11  9]

How can I insert a 0 in each blank diagonal position, and how can this be done along with the previous code?


Srijit

Last edited by Scott; 08-17-2010 at 03:47 AM. Reason: Code tags
# 4  
08-13-2010
I'd write it like this (this completely replaces the previous example). There might be a better way, but this is straightforward.

It assumes that a pair of spaces indicates the need for a zero: four spaces insert two zero values, six spaces insert three, and so on. It also assumes that an opening bracket followed by a space, or a space followed by a closing bracket, indicates the need for a zero at the beginning or end of the row. If this isn't quite right, it should be straightforward to make the needed changes.

Code:
for f in *.pair.txt
do
        awk '
        BEGIN {printf( "[" );   }               # opening bracket
  
        /[.*]/  {
                if( last )
                        printf( "%s\n",  last );  # print the last one we saw

                gsub( ".*\\[", "[", $0 );       # trash all before [, but keep [
                gsub( "].*", "]", $0 );         # trash all after ], but keep ]
                gsub( "\\[ ", "0.0 ", $0 );     # opening bracket, space becomes 0 space
                gsub( "\\[", "", $0 );          # ditch opening bracket
                gsub( " ]", " 0.0", $0 );       # trailing space, bracket becomes 0
                gsub( "]", "", $0 );            # ditch trailing bracket
                gsub( "  ", " 0.0 ", $0 );      # two spaces becomes 0 space
                gsub( "  ", " ", $0 );          # cleanup if two or more 0s inserted

                last = $0;               # save to add trailing ] if this is the last one
        } 

        END {
                printf( "%s]\n", last );
        }
        ' $f >${f%%.*}_out.txt  
done
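The redirection `>${f%%.*}_out.txt` uses shell parameter expansion: `%%.*` removes the longest suffix that starts with a dot, so `ACGG_NNNP.pairwiseRMSDs.txt` becomes `ACGG_NNNP`, the same prefix that `split(FILENAME, d, ".")` puts in `d[1]`. A sketch of the same loop with the name built in a small helper and the file name quoted; `out_name` and `process_all` are hypothetical names, and the one-line awk program is only a placeholder for whichever script from this thread you use:

```shell
#!/bin/sh
# Hypothetical helper: "ACGG_NNNP.pairwiseRMSDs.txt" -> "ACGG_NNNP_out.txt".
out_name() {
    printf '%s_out.txt\n' "${1%%.*}"
}

# Hypothetical driver: run one awk program per input file, writing each
# result to its own *_out.txt file in the current directory.
process_all() {
    for f in *.pair.txt; do
        [ -e "$f" ] || continue            # the glob matched no files
        # Placeholder program: print only lines containing "[...]".
        awk '/\[.*\]/' "$f" > "$(out_name "$f")"
    done
}
```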

# 5  
08-13-2010
Code:
grep -Eo "\[(.[^]]*)\]" file | sed '1s/[ \t]*\]//;$s/^\[[ \t]*//'
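As written, that one-liner removes the closing bracket from the first row and the opening bracket from the last row, which gives the requested layout for a two-row matrix; any middle rows keep both brackets. A sketch of a variant for any number of rows, assuming a `grep` that supports `-o` (GNU and BSD grep do; POSIX does not require it) and one bracket pair per matrix row; `matrix_rows` is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: extract every "[...]" part, strip the brackets from
# each row, then put "[" before the first row and "]" after the last one.
matrix_rows() {
    grep -o '\[[^]]*]' "$@" |
        sed -e 's/^\[//' -e 's/]$//' |
        sed -e '1s/^/[/' -e '$s/$/]/'
}
```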

# 6  
08-16-2010
Incorrect Output

Hi Agama,

I ran your second script. It gives me the wrong output: it even prints the text lines from the input.

Here is a sample input:

Code:
#### ACGG_NNNP.pairwiseRMSDs.txt
#### Output from pdb_extract.py
#### Created 2010-08-12 13:59:31.122708

5 structures aligned

1OSW_0002_ACGG_NNNP_A_10_13.pdb.pdb most representative structure of pool
 with lowest average pair-wise RMSD of 0.81

    Mean global RMSD: 0.91  (0.81 to 0.98A)
 Mean global bb RMSD: 0.72  (0.65 to 0.82A)

                                 Avg  Avgbb  Pair-Wise RMSD Matrix:
Structure:                       RMSD RMSD  (Top-Right=Heavy atom alignment, Bottom-Left=Backbone atom alignment)
-------------------------------  ---- ----   0_13 0_13 0_13 0_13 0_13
1OSW_0002_ACGG_NNNP_A_10_13.pdb  0.81 0.65  [     0.84 0.77 0.95 0.70]
1OSW_0005_ACGG_NNNP_A_10_13.pdb  0.85 0.76  [0.85      1.01 0.59 0.94]
1OSW_0015_ACGG_NNNP_A_10_13.pdb  0.98 0.68  [0.43 0.83      1.10 1.04]
1OSW_0019_ACGG_NNNP_A_10_13.pdb  0.95 0.82  [0.91 0.51 0.94      1.14]
1OSW_0021_ACGG_NNNP_A_10_13.pdb  0.95 0.67  [0.42 0.83 0.52 0.91     ]

END

Can you please run the code on this input and check whether it prints just the matrix, with 0 inserted in the blank spaces along the diagonal? If not, can you tell me what changes I should make to get that output? The input above contains a 5x5 matrix with blanks along the diagonal.
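Because the blanks in this file sit exactly on the diagonal, a different technique from counting spaces is to let awk split each row on whitespace (the blank simply disappears, leaving n-1 values) and insert a zero at position k of the k-th row. This is a sketch of that swapped-in approach, assuming every row of the n x n matrix is missing exactly its diagonal entry and that each awk call processes one file; `fill_diagonal` is a hypothetical name, and it also adds the brackets:

```shell
#!/bin/sh
# Hypothetical helper: for each line holding a "[...]" row, split the bracketed
# values on whitespace and put 0.00 at the diagonal position (row number k).
fill_diagonal() {
    awk '
        match($0, /\[[^]]*\]/) {
            body = substr($0, RSTART + 1, RLENGTH - 2)
            n = split(body, v)        # split on runs of blanks; the blank diagonal is dropped
            k++                       # k = matrix row number = diagonal column
            row = ""; sep = ""
            for (i = 1; i <= n + 1; i++) {
                if (i == k)     val = "0.00"
                else if (i < k) val = v[i]
                else            val = v[i - 1]
                row = row sep val
                sep = " "
            }
            if (k > 1) print held     # flush the previous row
            else       printf "["     # opening bracket before the first row
            held = row
        }
        END { if (k > 0) print held "]" }
    ' "$@"
}
```

Run on the sample above, the header lines are ignored because they contain no `[...]` part.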

Thanks,
Srijit

Last edited by Scott; 08-17-2010 at 03:49 AM. Reason: Code tags, please...
# 7  
08-16-2010
Quote:
Originally Posted by SriJit
Hi Agama,

I ran your second code. It gives me the wrong output. It even prints the texts in the input.

Yes, I ran the script against data that always had a matrix component, so I didn't notice the mistake. Add backslashes to escape the square brackets, and the script will print just the matrix:

Code:
/\[.*\]/  {
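Without the backslashes, `[.*]` is a bracket expression: it matches any single `.` or `*` character, so every line containing a dot (file names, dates, RMSD values) is selected. A small sketch to compare the two patterns on the same input; `show_difference` is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical demo: count how many lines each pattern selects.
#   /[.*]/    bracket expression: any line containing "." or "*"
#   /\[.*\]/  literal "[" ... literal "]": only the matrix rows
show_difference() {
    awk '
        /[.*]/    { loose++ }
        /\[.*\]/  { strict++ }
        END { printf "unescaped: %d  escaped: %d\n", loose, strict }
    ' "$@"
}
```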

Quote:
Here is a sample input,
--------------- snip -----------------------------------------------------
Code:
-------------------------------  ---- ----   0_13 0_13 0_13 0_13 0_13
1OSW_0002_ACGG_NNNP_A_10_13.pdb  0.81 0.65  [     0.84 0.77 0.95 0.70]
1OSW_0005_ACGG_NNNP_A_10_13.pdb  0.85 0.76  [0.85      1.01 0.59 0.94]
1OSW_0015_ACGG_NNNP_A_10_13.pdb  0.98 0.68  [0.43 0.83      1.10 1.04]
1OSW_0019_ACGG_NNNP_A_10_13.pdb  0.95 0.82  [0.91 0.51 0.94      1.14]
1OSW_0021_ACGG_NNNP_A_10_13.pdb  0.95 0.67  [0.42 0.83 0.52 0.91     ]

END


The way I originally wrote the code assumed that one extra space indicated a missing 0. (Please use code tags so that spacing is preserved.) Your example here shows a run of at least four spaces where a value is missing, so the code needs to be a bit different:

Code:
        awk '
        BEGIN {printf( "[" );   }               # opening bracket

        /\[.*\]/  {          
                if( last )
                        printf( "%s\n",  last );  # print the last one we saw

                gsub( ".*\\[", "[", $0 );       # trash all before [, but keep [
                gsub( "].*", "]", $0 );         # trash all after ], but keep ]
                gsub( "\\[", "", $0 );          # ditch opening bracket
                gsub( "]", "", $0 );            # ditch trailing bracket

                gsub( "    ", " 0.00", $0 );  # four spaces becomes 0.00 

                gsub( " $", "", $0 );          # cleanup trailing space if there
                gsub( "^ ", "", $0 );          # cleanup leading space if there

                last = $0;               # save to add trailing ] if this is the last one
        }

        END {
                printf( "%s]\n", last );
        }
        ' $f

The pattern line, the four-space substitution, and the leading/trailing-space cleanup lines are new or changed. Several lines have been removed as they became unnecessary.
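In the sample data the blank runs are five or six spaces wide, so replacing the first four spaces with ` 0.00` can leave a double space inside a row (for example `0.85 0.00  1.01`). MATLAB should read that fine, but if single spacing is wanted, one extra `gsub` squeezes repeated blanks. A sketch of the post's substitutions plus that extra step, written as a hypothetical `fill_blanks` function that reads one bracket-less row per line:

```shell
#!/bin/sh
# Hypothetical helper: the four-space rule from this post applied to rows that
# have already had their brackets removed, plus a final squeeze of blanks.
fill_blanks() {
    awk '
        {
            gsub(/    /, " 0.00")   # four spaces become " 0.00", as in the post
            gsub(/ +/, " ")         # extra step: collapse runs of spaces to one
            sub(/^ /, "")           # drop a leading space, if any
            sub(/ $/, "")           # drop a trailing space, if any
            print
        }
    ' "$@"
}
```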