Filter datablocks meeting criteria

01-21-2016

Hello,

I am trying to separate valid data blocks from invalid ones. In the input, the data blocks are separated by one or more blank rows. The criteria are:

1) second column value must be 30 or more for the row to be valid and considered for calculation and output.

2) the sum of all valid (>=30) second columns within a data block has to be 250 or more.

3) once the blocks are defined I want to assign a unique name to each block

So if my input is

Code:

The desired output should be

Code:

Block1	2395	84
Block1	2400	52
Block1	2402	106
Block1	2403	110
Block1	2404	98
		
		
Block2	1770	38
Block2	2424	54
Block2	2425	44
Block2	2426	105
Block2	2427	52
Block2	2431	142
Block2	2434	58
Block2	2439	39
Block2	2440	38
Block2	2441	71
Block2	2443	46
Block2	2446	51
Block2	2449	32

I can filter and sum the second column for the entire file, but I'm not able to capture the individual data blocks and sum each one separately.

Code:

awk '$1==" "{b=1;i=i+1} $2 > 29 { sum[i] += $2 ; b=0} END { print sum[i] }' file

Please assist.

sheetalk

01-21-2016

Try this:

Code:

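awk '
        BEGIN {
                X = 1           # index of the current input block
                F = 1           # nonzero: the next blank line advances X
        }
        # Keep only rows whose second column is 30 or more.
        $2 >= 30 {
                F = 1
                ++C
                A[X,C] = $0     # store the row under its block index
                T[X] += $2      # running total of valid values in block X
                M[X] = C        # highest row counter used by block X
        }
        # A blank line ends the block; F keeps repeated blank lines from counting twice.
        /^[ \t]*$/ && F {
                ++X
                F = 0
        }
        END {
                for ( m = 1; m <= X; m++ )
                {
                        if ( T[m] >= 250 )
                        {
                                ++B     # output blocks are numbered consecutively
                                for ( n = 1; n <= M[m]; n++ )
                                {
                                        if ( A[m,n] )
                                                print "Block"B, A[m,n]
                                }
                                printf "\n"
                        }
                }
        }
' OFS='\t' file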
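
To see why a given block is kept or dropped, it can help to print only the per-block totals of the valid rows, which is roughly what the one-liner in the question was aiming for. The following is a minimal sketch, not part of the solution above: `block_sums` and `sample_input` are hypothetical helper names, and the sample rows are made up for illustration.

```shell
# Made-up sample: four blocks separated by one or more blank lines.
sample_input() {
    printf '100\t40\n101\t200\n102\t10\n103\t30\n\n'
    printf '200\t25\n201\t29\n\n'
    printf '300\t100\n301\t100\n302\t49\n303\t20\n\n\n'
    printf '400\t250\n'
}

# Print the sum of all second-column values >= 30 for every input block.
block_sums() {
    awk '
        !NF      { inblock = 0; next }              # blank line: close the current block
        !inblock { inblock = 1; ++n; sum[n] = 0 }   # first row of a new block
        $2 >= 30 { sum[n] += $2 }                   # only valid rows count toward the total
        END      { for (i = 1; i <= n; i++) print "block " i ": " sum[i] }
    ' "$@"
}
```

Running `sample_input | block_sums` reports 270, 0, 249 and 250 for the four blocks, so with the 250 threshold only the first and the last block would be printed by the filter.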

Yoda

01-21-2016

Try:

Code:

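awk '
# Blank line: the current block has ended, so print it if its total is 250 or more.
!NF     {if (SUM >= 250) {++BLK
                          for (c=1; c<=CNT; c++) print "Block" BLK, M[c]
                         }
         SUM = CNT = 0
         next
        }
# Keep only rows whose second column is 30 or more.
$2 >=30 {SUM += $2
         M[++CNT] = $0
        }
# The last block may not be followed by a blank line.
END     {if (SUM >= 250) {BLK++
                          for (c=1; c<=CNT; c++) print "Block" BLK, M[c]
                         }
        }
' OFS='\t' file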
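
The same idea can also be sketched with awk's paragraph mode, where an empty `RS` makes every blank-line-separated block a single record. This is only an alternative under stated assumptions: the separator lines must be truly empty (a line holding only tabs or spaces would not split blocks in this mode), and `filter_blocks` and `sample_input` are hypothetical names with made-up sample rows.

```shell
# Made-up sample: four blocks separated by one or more empty lines.
sample_input() {
    printf '100\t40\n101\t200\n102\t10\n103\t30\n\n'
    printf '200\t25\n201\t29\n\n'
    printf '300\t100\n301\t100\n302\t49\n303\t20\n\n\n'
    printf '400\t250\n'
}

# Paragraph mode: each record is one block, each field ($i) is one line of it.
filter_blocks() {
    awk -v RS='' -v FS='\n' -v OFS='\t' '
    {
        sum = cnt = 0
        for (i = 1; i <= NF; i++) {
            split($i, col, " ")            # split the line on runs of whitespace
            if (col[2] + 0 >= 30) {        # criterion 1: row is valid
                sum += col[2]
                keep[++cnt] = $i
            }
        }
        if (sum >= 250) {                  # criterion 2: block total is 250 or more
            ++blk                          # criterion 3: consecutive block names
            for (i = 1; i <= cnt; i++) print "Block" blk, keep[i]
        }
    }' "$@"
}
```

With the sample above, `sample_input | filter_blocks` prints three `Block1` rows (100, 101, 103) and one `Block2` row (400); the block totalling 249 is dropped, which shows the effect of the `>=` boundary.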

RudiC
