Help with AWK and Scripting!


 
# 8  
Old 08-16-2010
If your awk supports gensub() (a GNU awk extension), try this code.

Code:
awk '
BEGIN {printf "["}  
/\[/ {if ( last ) { printf "%s\n",  last}
       last=gensub(/.+\[(.+)\]/,"\\1","g")
      } 
END {printf( "%s]\n", last )}
' urfile

[ 0.84 0.77 0.95 0.70
0.85 1.01 0.59 0.94
0.43 0.83 1.10 1.04
0.91 0.51 0.94 1.14
0.42 0.83 0.52 0.91 ]
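
For an awk without gensub(), roughly the same thing can be sketched with two sub() calls. This is only a sketch, and it makes the same assumption as the code above: the opening and closing brackets of each row are on the same line. The function name flatten_rows and the sample rows are made up for illustration.

```shell
# flatten_rows: hypothetical helper, reads rows on stdin.
# Assumes each matrix row has both [ and ] on one line.
flatten_rows() {
    awk '
    BEGIN { printf "[" }                 # opening bracket of the whole matrix
    /\[.*\]/ {
        if (seen) printf "%s\n", last    # flush the previous row
        row = $0
        sub(/^.*\[/, "", row)            # drop everything up to and including [
        sub(/\].*$/, "", row)            # drop ] and anything after it
        last = row
        seen = 1                         # flag, so a row equal to "0" is not skipped
    }
    END { printf "%s]\n", last }         # closing bracket after the final row
    '
}

# Made-up sample rows
printf '%s\n' \
    'a.pdb 0.81 0.65 [0.84 0.77]' \
    'b.pdb 0.85 0.76 [0.85 1.01]' | flatten_rows
```

The last row is held back until the next one arrives (or END) so that the closing ] can be appended to it.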

# 9  
Old 08-17-2010
Thanks

Sweet...Thanks
# 10  
Old 08-18-2010
Again a problem with the code

Hi Agama,

I have another problem with the code. Some of my inputs look like this:

------------------------------- ---- ---- 0_13 0_13 0_13 0_13 0_13
1OSW_0002_ACGG_NNNP_A_10_13.pdb 0.81 0.65 [ 0.84 0.77 0.95 0.70........
12 12 34 5 5654 6 7.....]
1OSW_0005_ACGG_NNNP_A_10_13.pdb 0.85 0.76 [0.85 1.01 0.59 0.94]
1OSW_0015_ACGG_NNNP_A_10_13.pdb 0.98 0.68 [0.43 0.83 1.10 1.04]
1OSW_0019_ACGG_NNNP_A_10_13.pdb 0.95 0.82 [0.91 0.51 0.94 1.14]
1OSW_0021_ACGG_NNNP_A_10_13.pdb 0.95 0.67 [0.42 0.83 0.52 0.91 ]

END
The rest of the lines in the matrix will look like the first line of the matrix.

That is, if the matrix is larger than 30*30, each row of the matrix has
only 30 columns on one line; the rest are wrapped to a second line,
but still within a single [] for that row.

For this type of input, your awk code doesn't write anything to the output file, although it does create the output file.

Can you please modify the code, or let me know how to modify it?

Thanks,
Sri
# 11  
Old 08-18-2010
Yes, the original wouldn't have printed anything, as it only picked up matrix
data when both the opening bracket and the closing bracket were on the same line.
The version below assumes that they can be split, and that there can be multiple
lines that have neither an opening nor a closing bracket. It also assumes that if
the 'blank indicates a zero' set of spaces exists at the end of the line, those
blanks are present. If not, it will not add 0.00 correctly.

Try this (with the standard disclaimer: no guarantee that it will work).

Code:
        awk '
        BEGIN {printf( "[" );   }               # opening bracket

        /]/  {
                if( last )
                        printf( "%s\n",  last );  # print the last one we saw

                if( partial )                           # add current line to partial buffer
                        buffer = partial " " $0;
                else
                        buffer = $0;                    # no partial, just use current line

                gsub( ".*\\[", "[", buffer );      # trash all before [, but keep [
                gsub( "].*", "]", buffer );        # trash all after ], but keep ]
                gsub( "\\[", "", buffer );         # ditch opening bracket
                gsub( "]", "", buffer );           # ditch trailing bracket
                gsub( "    ", " 0.00", buffer );   # four spaces becomes 0.00 space
                gsub( "  ", " ", buffer );         # cleanup if multiple spaces
                gsub( " $", "", buffer );          # cleanup trailing space if there
                gsub( "^ ", "", buffer );          # cleanup leading space if there

                last = buffer;               # save to add trailing ] if this is the last one
                join = 0;
                partial = "";

                next;
        }

        /\[/ {                                  # beginning of matrix, but not end
                gsub( "^.*\\[", "[", $0 );      # ditch beginning junk
                partial = $0;                   # start a partial buffer
                join = 1;                       # join next line(s) if not end of matrix
                next;
        }

        join == 1 {
                partial = partial " " $0;       # add this line to the partial matrix
                next;
        }

        END {
                printf( "%s]\n", last );
        }
        ' "$f"

I ran a quick test with data that I dummied up. It seems to give sane output, but I didn't look too closely. You should be able to tweak this if it's not just right.
# 12  
Old 08-19-2010
Incorrect output

Hi,

The code works fine in reading the lines. But since it prints 0.00 whenever it sees spaces within [], the output goes wrong when the input is like this:

1FCG_234_455 35 36 [1 2 3 ....30
31 32...60
61 .... ]


I am unable to show you the spacing here, but in the input, if there are more than 30 columns per row, the columns after the 30th are wrapped to the next line, and the first number on the second line starts in the same position as the first number on the first line. For example, 31 on the second line of the example sits in the same column as 1; 1, 31 and 61 are all in the same column of the text file. So there are empty spaces up to that position.

The empty spaces at the beginning of each line within [] are substituted with 0.00, so the matrix grows: a 60*60 matrix becomes 90*90. I need to manipulate this matrix, so it has to stay the same size, with 0 only along the diagonal.

Is it possible to modify your code to make this work? I tried but didn't succeed.


Thanks,
Srijit
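
The effect described above can be reproduced with a small sketch. It assumes a two-line row whose continuation line is indented with blanks up to the column of the first number, joins the lines with a single space, and then applies the same four-spaces-to-0.00 substitution as the earlier code. The function name count_padding_zeros and the sample data are made up for illustration.

```shell
# count_padding_zeros: hypothetical helper, prints how many times the
# "four spaces becomes 0.00" rule fires on one wrapped row read from stdin.
count_padding_zeros() {
    awk '
    { joined = (NR == 1 ? $0 : joined " " $0) }   # join wrapped lines
    END {
        sub(/^.*\[/, "", joined)                  # drop everything up to [
        sub(/\].*$/, "", joined)                  # drop ] and anything after
        n = gsub("    ", " 0.00", joined)         # indentation is counted as blank fields
        print n
    }'
}

# The continuation line is padded with 10 blanks so that 4 lines up under 1
printf '%s\n' \
    'id 35 36 [1 2 3' \
    '          4 5 6]' | count_padding_zeros
```

With this sample the substitution fires twice, although the row contains no blank fields at all; the padding alone accounts for the extra 0.00 entries.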

---------- Post updated at 10:49 AM ---------- Previous update was at 10:33 AM ----------

Would it be possible to do what I asked in the previous post in the following way?

First, let it insert 0.00 for every run of four spaces it sees in the input.

Then, maybe with another awk command or in the same one, it can check the output of the previous step for consecutive zeroes and remove them. In my input, 0.00 will only appear along the diagonal.

Since I am new to awk, I am unable to write the code for my problem myself.

Thanks,
Srijit

Last edited by SriJit; 08-19-2010 at 11:40 AM..
# 13  
Old 08-19-2010
Quote:
Originally Posted by SriJit
Then, maybe with another awk command or in the same one, it can check the output of the previous step for consecutive zeroes and remove them. In my input, 0.00 will only appear along the diagonal.
It can all be done with a single awk. It just would have been helpful to know that the continuation lines were indented. In the future, if you place "code tags" around programmes or output, it preserves spacing and can illustrate things like the multiple blanks. To insert code tags, click on the '#' button at the top of the edit window and then type/paste your text between the tags that are inserted in the window.

Here is a revision to the awk that will allow for multiple spaces at the beginning of a continued line. It assumes that the text is aligned below the numbers like this:
Code:
leading junk tokens on first line [01.00 02.00 03.00 04.00 05.00 06.00 07.00 08.00 09.00 10.00 11.00 12.00 13.00
                                   14.00 15.00 16.00 17.00 18.00 19.00 20.00 21.00 22.00 23.00 24.00 25.00 26.00
                                   27.00 28.00 29.00 30.00 31.00 32.00 33.00 34.00 35.00 36.00 37.00 38.00 39.00]

The programme will properly handle the case where the blanks in the data are at the beginning of a continued record -- it does not blindly delete all blanks at the beginning of the line to provide for this.
Code:
        awk '
        BEGIN {printf( "[" );   }               # opening bracket

        /]/  {
                if( last )
                        printf( "%s\n",  last );  # print the last one we saw

                if( partial )                           # add current line to partial buffer
                        buffer = partial " " substr( $0, indent );      # ditch leading spaces, but dont trash the "blank == 0"
                else
                        buffer = $0;                    # no partial, just use current line

                gsub( ".*\\[", "[", buffer );      # trash all before [, but keep [
                gsub( "].*", "]", buffer );        # trash all after ], but keep ]
                gsub( "\\[", "", buffer );         # ditch opening bracket
                gsub( "]", "", buffer );           # ditch trailing bracket
                gsub( "  +", " 0.00 ", buffer );   # two or more spaces causes 0.00 to insert
                gsub( "  ", " ", buffer );         # cleanup if multiple spaces
                gsub( " $", "", buffer );          # cleanup trailing space if there
                gsub( "^ ", "", buffer );          # cleanup leading space if there

                last = buffer;               # save to add trailing ] if this is the last one
                join = 0;
                partial = "";

                next;
        } 

        /\[/ {                                  # beginning of matrix, but not end
                indent = index( $0, "[" ) + 1;  # number of spaces to skip for secondary lines
                gsub( "^.*\\[", "[", $0 );      # ditch beginning junk
                partial = $0;                   # start a partial buffer
                join = 1;                       # join next line(s) if not end of matrix
                next;
        }

        join == 1 {
                buffer = substr( $0, indent )   # ditch leading spaces, but dont trash the "blank == 0"
                partial = partial " " buffer;   # add this line to the partial matrix
                next;
        }
        
        END {
                printf( "%s]\n", last );
        }
        '  "$f"

Hope this works better for you.
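
The indent handling in the revision can be looked at in isolation. The sketch below only shows the index()/substr() step: it records the column just after the [ on the first line and cuts every line at that column, so that only blanks to the right of it survive. The function name strip_indent and the sample lines are made up for illustration, and the continuation lines are assumed to be aligned under the first number.

```shell
# strip_indent: hypothetical helper showing only the index()/substr() step.
strip_indent() {
    awk '
    /\[/ {
        indent = index($0, "[") + 1      # column of the first character after [
        print substr($0, indent)
        next
    }
    { print substr($0, indent) }         # continuation line: cut at the same column
    '
}

# 03.00 is aligned under 01.00 (6 leading blanks)
printf '%s\n' \
    'junk [01.00 02.00' \
    '      03.00 04.00]' | strip_indent
```

Blanks beyond the indent column are kept, which is what lets a blank first field on a continuation line still be read as a zero later.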
# 14  
Old 08-20-2010
Still need one more change

Hi,

I still have to make a small change. I get the output as:
[0.00 0.20 0.49 0.31 0.32 1.04 1.64 1.64 1.47 3.00 3.00 3.00 3.00 3.00 1.41 1.38 1.47 2.37 1.78 1.94 1.67 2.12 3.00 2.95 2.96 3.02 2.94 1.28 1.41 1.55 0.00 3.30 3.05 2.85

Green color values - second line of the same row.
Black first line.

For the above, the first row of the matrix spanned two lines in the input. As intended, the code reads the second line too and writes it to the output file on the same line. But it inserts an extra 0.00 when it reads the beginning of the second line. I have highlighted it above.
How should I modify the code?

Thanks,
Srijit
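
One possible cause, offered only as an assumption since the actual input is not shown: if the first line of a row ends with a trailing blank, or the continuation line starts one column to the left of the computed indent, then the single space used to join the lines produces two adjacent blanks, and the "two or more spaces" rule inserts 0.00 there. The sketch below reproduces that with made-up data; the function name join_rows and its trim option are hypothetical.

```shell
# join_rows: hypothetical helper. Pass "trim" to strip trailing blanks
# from each line before joining; otherwise lines are joined as they are.
join_rows() {
    awk -v mode="${1:-}" '
    {
        line = $0
        if (mode == "trim") sub(/ +$/, "", line)      # drop trailing blanks
        joined = (NR == 1 ? line : joined " " line)   # join with one space
    }
    END {
        gsub("  +", " 0.00 ", joined)   # same rule as the revised code
        gsub("  ", " ", joined)         # cleanup if multiple spaces
        print joined
    }'
}

# The first line ends with one trailing blank (an assumption about the real data)
printf '%s\n' '1.41 1.55 ' '3.30 3.05' | join_rows        # extra 0.00 appears
printf '%s\n' '1.41 1.55 ' '3.30 3.05' | join_rows trim   # no extra 0.00
```

Trimming would also discard a genuine blank field at the very end of a line, so checking the input for trailing blanks (for example with cat -A or od -c) before changing the code is probably the safer first step.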