counting lines containing two column field values with awk

06-22-2011


Hello everybody,
I'm trying to count the number of consecutive lines in a text file which have two distinctive column field values. These lines may appear in several line blocks within the file, but I only want a single block to be counted.

This was my first approach to tackle the problem (I'm a beginner, so be gentle) ...

# find line in which pattern INT appears for the first time in file topology

Code:

NR_1st_int=$(awk -v var="$INT" '$0 ~ var {print NR; exit}' topology)

# print molecule number of field 5 in that line into MOL

Code:

MOL_int=$(awk -v line_int="$NR_1st_int" 'NR==line_int {print $5}' topology)

# count number of lines that INT and MOL appear IN THE FILE

Code:

NINT=$(awk -v var1="$INT" -v var2="$MOL_int" 'BEGIN { count=0 } { if (( $4 == var1 ) && ( $5 == var2 )) count++} END{print count}' topology)

The main problem is that this (bad) code does not restrict the number NINT to a single block of occurring lines, which I need.

I've tried working with some loops but I messed it up every time. Can you help me out?

origamisven


06-22-2011


Can you please post sample input and the expected output? I believe sample data explains the problem better.

panyam


06-22-2011


Of course... The text file looks something like this:

1 ABC

2 SOL

5 ABC

2 SOL

3 SOL

What I need the script to do is find the first occurrence of SOL in column 2 and count the number of times I consecutively find SOL in column 2 until the number in column 1 changes.

In fact I want to count constant atom numbers for a particular molecule (SOL). So in this particular case the output should be the integer value "3".

The problem is that in this example "2" and "SOL" occur together on 6 lines (since there are two blocks of three lines each). I need to find only the first block of lines and count the atoms.
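The requirement described above can be sketched in a single awk pass. This is only a minimal sketch: `count_first_block` is a hypothetical helper name, and the sample data is made up to match the description (two separate blocks of "2 SOL"), since the listing in the post appears shortened.

```shell
# count_first_block FILE NAME
# Hypothetical helper: counts the first run of consecutive lines whose
# column 2 equals NAME and whose column 1 keeps the value seen on the
# first matching line.
count_first_block() {
    awk -v name="$2" '
        started == 0 && $2 == name { started = 1; id = $1 }  # first match: remember column 1
        started {
            if ($2 != name || $1 != id) exit                  # the block has ended
            count++
        }
        END { print count + 0 }                               # prints 0 if NAME never occurs
    ' "$1"
}

# Made-up sample data with two separate blocks of "2 SOL"
sample=$(mktemp)
printf '%s\n' '1 ABC' '1 ABC' '1 ABC' '2 SOL' '2 SOL' '2 SOL' \
              '5 ABC' '2 SOL' '2 SOL' '2 SOL' '3 SOL' > "$sample"
count_first_block "$sample" SOL    # prints 3
rm -f "$sample"
```

Because awk exits as soon as the first block ends, the second block of "2 SOL" lines is never read.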

origamisven


06-22-2011


So the output you are expecting is

ABC -> 3 times,

SOL -> 3 times?

Put simply: count only the first occurrence of SOL (keeping in mind that column 1 should be unique).


panyam


06-22-2011


Code:

% VAR=SOL;  awk '/'$VAR'/ { while (match($0, "'$VAR'"))
                { s = s $0 "\n";  getline } printf "%s", s; exit}' testfile | wc -l
3


yazu


06-22-2011


@panyam: yes, basically

@yazu: thanks, this will probably do it!

I have problems adapting your solution to my problem though. What does this part do?

{ s = s $0 "\n"; getline }

In fact I need to apply it to files like this, where the atoms of several molecules SOL, GLN, etc need to be counted...

(...)

ATOM 55772 OW SOL 5362 104.470 140.450 35.110 1.00 0.00

ATOM 55773 HW1 SOL 5362 105.040 141.040 34.530 1.00 0.00

ATOM 55774 HW2 SOL 5362 103.620 140.230 34.630 1.00 0.00

ATOM 55775 OW SOL 5363 47.240 47.300 89.850 1.00 0.00

ATOM 55776 HW1 SOL 5363 46.450 47.900 90.020 1.00 0.00

ATOM 55777 HW2 SOL 5363 46.930 46.410 89.530 1.00 0.00

ATOM 55778 OW SOL 5364 122.910 41.340 72.190 1.00 0.00

ATOM 55779 HW1 SOL 5364 122.410 41.120 71.360 1.00 0.00

ATOM 55780 HW2 SOL 5364 123.530 42.110 72.010 1.00 0.00

ATOM 55781 OW SOL 5365 121.590 102.910 60.970 1.00 0.00

ATOM 55782 HW1 SOL 5365 120.940 102.300 60.520 1.00 0.00

ATOM 55783 HW2 SOL 5365 121.150 103.770 61.190 1.00 0.00

ATOM 55757 NE2 GLN 5358 83.370 21.490 106.870 1.00 0.00

ATOM 55758 1HE2 GLN 5358 83.890 22.060 107.530 1.00 0.00

ATOM 55759 2HE2 GLN 5358 83.280 20.500 107.080 1.00 0.00

ATOM 55760 C GLN 5358 84.800 25.100 108.100 1.00 0.00

ATOM 55761 O1 GLN 5358 84.860 26.360 108.290 1.00 0.00

ATOM 55762 O2 GLN 5358 85.820 24.380 107.860 1.00 0.00

ATOM 55763 OW SOL 5359 73.720 119.460 110.240 1.00 0.00

ATOM 55764 HW1 SOL 5359 73.060 119.910 110.850 1.00 0.00

ATOM 55765 HW2 SOL 5359 73.240 119.130 109.430 1.00 0.00

ATOM 55766 OW SOL 5365 27.480 87.690 57.920 1.00 0.00

ATOM 55767 HW1 SOL 5365 26.550 87.980 57.690 1.00 0.00

ATOM 55768 HW2 SOL 5365 28.130 88.360 57.580 1.00 0.00

ATOM 55769 OW SOL 5364 78.820 91.080 70.170 1.00 0.00

ATOM 55770 HW1 SOL 5364 78.180 90.310 70.120 1.00 0.00

ATOM 55771 HW2 SOL 5364 78.360 91.870 70.560 1.00 0.00

origamisven


06-22-2011


Quote:

What does this part do?
{ s = s $0 "\n"; getline }

s (initially unset, which awk treats as the empty string "") is concatenated
with the current line and a newline character. Then getline reads the next line
from the input and makes it the current line ($0). This repeats for as long as
the current line matches the value of $VAR.
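One caveat with this kind of loop: a plain getline returns 0 at end of input and leaves $0 unchanged, so if the matching block runs up to the last line of the file, the while condition never becomes false. A minimal sketch of the same accumulate-and-stop idea that checks the return value of getline is shown here. `first_match_block` is a hypothetical helper name, and the pattern is passed with -v instead of being spliced into the awk program.

```shell
# first_match_block FILE PATTERN
# Hypothetical helper: prints the first run of consecutive lines that
# match PATTERN (treated as a regular expression), then stops.
first_match_block() {
    awk -v pat="$2" '
        $0 ~ pat {
            do {
                s = s $0 "\n"                     # append the current line to the buffer
            } while ((getline) > 0 && $0 ~ pat)   # stop at end of input or at the first non-match
            printf "%s", s
            exit
        }
    ' "$1"
}
```

Under these assumptions, `first_match_block testfile SOL | wc -l` should give the same count as the one-liner above, while also terminating when the file ends inside the matching block.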

I don't quite understand your task but maybe this can help:

Code:

for ATOM in SOL GLN ; do
{
        awk '/'$ATOM'/ { while (match($0, "'$ATOM'"))
                { s = s $0 "\n";  getline } printf "%s", s; exit}' | wc -l
} < testfile
done
12
6

where testfile is the file with your data.
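The loop above counts the whole first run of SOL lines (12 here), not the atoms of a single molecule. If the goal is the number of atoms in the first molecule of each type, one possible sketch is to also watch the molecule number. It assumes the layout of the listing above (residue name in column 4, molecule number in column 5), and `atoms_per_molecule` is a hypothetical helper name.

```shell
# atoms_per_molecule FILE
# Hypothetical helper: for each residue name in column 4, counts the lines of
# the first block in which columns 4 and 5 both stay constant.
atoms_per_molecule() {
    awk '
        $1 != "ATOM" { next }                    # skip anything that is not an ATOM record
        {
            key = $4 SUBSEP $5
            if (key != cur) {                    # a new molecule starts on this line
                cur = key
                counting = !($4 in count)        # only the first block of each name is counted
                if (counting) order[++n] = $4    # remember names in order of appearance
            }
            if (counting) count[$4]++
        }
        END { for (i = 1; i <= n; i++) print order[i], count[order[i]] }
    ' "$1"
}
```

Called as `atoms_per_molecule testfile`, this prints one line per residue name with the atom count of its first molecule. If the file starts in the middle of a molecule, the count for that name only covers the lines that are present.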

Sorry for my English.


yazu
