counting lines containing two column field values with awk


 
# 1  
Old 06-22-2011

Hello everybody,
I'm trying to count the number of consecutive lines in a text file that share two particular column values. Such lines may appear in several blocks within the file, but I only want a single block to be counted.

This was my first approach to tackle the problem (I'm a beginner, so be gentle :) ) ...

# find line in which pattern INT appears for the first time in file topology
Code:
NR_1st_int=$(awk -v var="$INT" '$0~var {print NR}' topology | sed '1 !d')

# print molecule number of field 5 in that line into MOL
Code:
MOL_int=$(awk -v line_int="$NR_1st_int" 'NR==line_int {print $5}' topology)

# count the number of lines in which INT and MOL appear together IN THE WHOLE FILE
Code:
NINT=$(awk -v var1="$INT" -v var2="$MOL_int" 'BEGIN { count=0 } { if (( $4 == var1 ) && ( $5 == var2 )) count++} END{print count}' topology)

The main problem is that this (bad) code does not restrict the number NINT to a single block of occurring lines, which I need.

I've tried working with some loops but I messed it up every time. Can you help me out?
# 2  
Old 06-22-2011
Can you please post sample input and the expected output? I believe sample data explains the problem better.
# 3  
Old 06-22-2011
Of course... The text file looks something like this:

1 ABC
1 ABC
1 ABC
2 SOL
2 SOL
2 SOL
5 ABC
5 ABC
5 ABC
5 ABC
2 SOL
2 SOL
2 SOL
3 SOL
3 SOL
3 SOL

What I need the script to do is find the first occurrence of SOL in column 2 and count the number of times I consecutively find SOL in column 2 until the number in column 1 changes.

In effect, I want to count the number of atoms in one particular molecule (SOL). So in this particular case the output should be the integer value "3".

The problem is that in this example "2" and "SOL" occur together on 6 lines of the file (since there are two blocks of three lines each). I need to find only the first block of lines and count the atoms.
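
A minimal sketch of that rule, assuming whitespace-separated input with the number in column 1 and the name in column 2 as in the sample above. The function name `count_first_block` and the awk variables `name`, `id`, `found`, and `n` are placeholders, not from the thread. It remembers column 1 of the first line whose column 2 equals the name, counts while both columns stay the same, and stops reading at the first line where either changes.

```shell
# Sample data from the post, held in a shell variable for the demonstration.
sample=$(cat <<'EOF'
1 ABC
1 ABC
1 ABC
2 SOL
2 SOL
2 SOL
5 ABC
5 ABC
5 ABC
5 ABC
2 SOL
2 SOL
2 SOL
3 SOL
3 SOL
3 SOL
EOF
)

# Count the lines of the FIRST block only; data is read from stdin.
count_first_block() {
    awk -v name="$1" '
        !found && $2 == name { found = 1; id = $1 }   # first hit: remember column 1
        found && ($2 != name || $1 != id) { exit }     # block ended: stop reading
        found { n++ }                                  # still inside the first block
        END { print n + 0 }                            # prints 0 if name never occurs
    '
}

printf '%s\n' "$sample" | count_first_block SOL   # prints 3
```

With a real file the call would be `count_first_block SOL < topology`. Because of the `exit`, the later "2 SOL" block is never read, so it cannot inflate the count.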
# 4  
Old 06-22-2011
So the output you are expecting is

ABC -> 3 times,

SOL -> 3 times?

Put plainly: count only the first occurrence of SOL (bearing in mind that column 1 must keep a single value within the block).
# 5  
Old 06-22-2011
Code:
% VAR=SOL;  awk '/'$VAR'/ { while (match($0, "'$VAR'")) 
                { s = s $0 "\n";  getline } printf "%s", s; exit}' testfile | wc -l
3
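
A getline-free sketch of the same idea, offered as an alternative rather than as the poster's method: it counts the first run of consecutive lines that match the pattern and lets awk print the number itself, so no `wc -l` is needed. Passing the pattern with `-v` avoids splicing the shell variable into the program text. The function name `first_run` and the awk variables `pat`, `seen`, and `n` are placeholders. One caveat with the loop version: at end of input `getline` returns 0 and leaves `$0` unchanged, so a matching block that runs to the last line of the file would never leave the `while` loop.

```shell
# Count the first run of consecutive lines matching a pattern; data on stdin.
first_run() {
    awk -v pat="$1" '
        $0 ~ pat { n++; seen = 1; next }   # inside a run of matching lines
        seen { exit }                      # first non-matching line after the run
        END { print n + 0 }                # also reached when the run ends at EOF
    '
}

printf '1 ABC\n2 SOL\n2 SOL\n2 SOL\n5 ABC\n2 SOL\n' | first_run SOL   # prints 3
```

Like the original one-liner, this looks only at whether the line matches and ignores the value in column 1.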

# 6  
Old 06-22-2011
@panyam: yes, basically

@yazu: thanks, this will probably do it!

I have problems adapting your solution to my problem though. What does this part do?

{ s = s $0 "\n"; getline }

In fact I need to apply it to files like this, where the atoms of several molecules (SOL, GLN, etc.) need to be counted...

 
(...)
ATOM 55772 OW SOL 5362 104.470 140.450 35.110 1.00 0.00
ATOM 55773 HW1 SOL 5362 105.040 141.040 34.530 1.00 0.00
ATOM 55774 HW2 SOL 5362 103.620 140.230 34.630 1.00 0.00
ATOM 55775 OW SOL 5363 47.240 47.300 89.850 1.00 0.00
ATOM 55776 HW1 SOL 5363 46.450 47.900 90.020 1.00 0.00
ATOM 55777 HW2 SOL 5363 46.930 46.410 89.530 1.00 0.00
ATOM 55778 OW SOL 5364 122.910 41.340 72.190 1.00 0.00
ATOM 55779 HW1 SOL 5364 122.410 41.120 71.360 1.00 0.00
ATOM 55780 HW2 SOL 5364 123.530 42.110 72.010 1.00 0.00
ATOM 55781 OW SOL 5365 121.590 102.910 60.970 1.00 0.00
ATOM 55782 HW1 SOL 5365 120.940 102.300 60.520 1.00 0.00
ATOM 55783 HW2 SOL 5365 121.150 103.770 61.190 1.00 0.00
ATOM 55757 NE2 GLN 5358 83.370 21.490 106.870 1.00 0.00
ATOM 55758 1HE2 GLN 5358 83.890 22.060 107.530 1.00 0.00
ATOM 55759 2HE2 GLN 5358 83.280 20.500 107.080 1.00 0.00
ATOM 55760 C GLN 5358 84.800 25.100 108.100 1.00 0.00
ATOM 55761 O1 GLN 5358 84.860 26.360 108.290 1.00 0.00
ATOM 55762 O2 GLN 5358 85.820 24.380 107.860 1.00 0.00
ATOM 55763 OW SOL 5359 73.720 119.460 110.240 1.00 0.00
ATOM 55764 HW1 SOL 5359 73.060 119.910 110.850 1.00 0.00
ATOM 55765 HW2 SOL 5359 73.240 119.130 109.430 1.00 0.00
ATOM 55766 OW SOL 5365 27.480 87.690 57.920 1.00 0.00
ATOM 55767 HW1 SOL 5365 26.550 87.980 57.690 1.00 0.00
ATOM 55768 HW2 SOL 5365 28.130 88.360 57.580 1.00 0.00
ATOM 55769 OW SOL 5364 78.820 91.080 70.170 1.00 0.00
ATOM 55770 HW1 SOL 5364 78.180 90.310 70.120 1.00 0.00
ATOM 55771 HW2 SOL 5364 78.360 91.870 70.560 1.00 0.00
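
For records like these, one possible single-pass sketch treats column 4 as the molecule name and column 5 as the molecule number (an assumption read off the listing above) and, for each requested name, counts only the first block of lines sharing that name and number. The function name `atoms_per_molecule` and all awk variable names are placeholders.

```shell
# A short excerpt of the listing above, held in a variable for the demonstration.
excerpt=$(cat <<'EOF'
ATOM 55772 OW SOL 5362 104.470 140.450 35.110 1.00 0.00
ATOM 55773 HW1 SOL 5362 105.040 141.040 34.530 1.00 0.00
ATOM 55774 HW2 SOL 5362 103.620 140.230 34.630 1.00 0.00
ATOM 55775 OW SOL 5363 47.240 47.300 89.850 1.00 0.00
ATOM 55776 HW1 SOL 5363 46.450 47.900 90.020 1.00 0.00
ATOM 55777 HW2 SOL 5363 46.930 46.410 89.530 1.00 0.00
ATOM 55757 NE2 GLN 5358 83.370 21.490 106.870 1.00 0.00
ATOM 55758 1HE2 GLN 5358 83.890 22.060 107.530 1.00 0.00
ATOM 55759 2HE2 GLN 5358 83.280 20.500 107.080 1.00 0.00
ATOM 55760 C GLN 5358 84.800 25.100 108.100 1.00 0.00
ATOM 55761 O1 GLN 5358 84.860 26.360 108.290 1.00 0.00
ATOM 55762 O2 GLN 5358 85.820 24.380 107.860 1.00 0.00
EOF
)

# $1: space-separated molecule names, e.g. "SOL GLN"; data is read from stdin.
atoms_per_molecule() {
    awk -v names="$1" '
        BEGIN { k = split(names, order, " "); for (i = 1; i <= k; i++) want[order[i]] = 1 }
        {
            key = $4 SUBSEP $5                      # name + number identify a block
            if (key != prev) {                      # a new block starts on this line
                if (cur != "") done[cur] = 1        # first block of that name is complete
                cur = ""
                if (($4 in want) && !($4 in done)) cur = $4
                prev = key
            }
            if (cur != "") count[cur]++             # only lines of a first block are counted
        }
        END { for (i = 1; i <= k; i++) print order[i], count[order[i]] + 0 }
    '
}

printf '%s\n' "$excerpt" | atoms_per_molecule "SOL GLN"
# prints:
# SOL 3
# GLN 6
```

With the full file the call would be `atoms_per_molecule "SOL GLN" < testfile`. The GLN figure reflects only the lines visible in the excerpt, which may start partway through that molecule.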
# 7  
Old 06-22-2011
Quote:
What does this part do?
{ s = s $0 "\n"; getline }
s (initially unset, which awk treats as the empty string "") is concatenated with the
current line and a newline character. Then getline reads the next line of input and makes it
the current line ($0). This repeats for as long as the current line matches the value of $VAR.
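
A small illustration of that behaviour, using made-up three-line input (the function name `getline_demo` is a placeholder): after a plain `getline`, `$0` holds the next line and `NR` has advanced; at end of input `getline` returns 0 and leaves `$0` as it was.

```shell
getline_demo() {
    printf 'a\nb\nc\n' | awk '
        NR == 1 { first = $0; getline; print first, $0, NR }   # $0 is now "b", NR is 2
        NR == 3 { r = (getline); print r, $0 }                  # no more input: r is 0, $0 still "c"
    '
}

getline_demo
# prints:
# a b 2
# 0 c
```

The second output line is why a loop built on an unchecked `getline` needs some other exit condition: once input is exhausted, `$0` simply stops changing.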

I don't quite understand your task but maybe this can help:

Code:
for ATOM in SOL GLN ; do
{
        awk '/'$ATOM'/ { while (match($0, "'$ATOM'"))
                { s = s $0 "\n";  getline } printf "%s", s; exit}' | wc -l
} < testfile
done
12
6

where testfile is the file with your data.

Sorry for my English.