Filter datablocks meeting criteria


 
# 1  
Old 01-21-2016

Hello,

I am trying to separate valid data blocks from invalid ones. In the input, data blocks are separated by one or more blank rows. The criteria are:

1) The second-column value must be 30 or more for a row to be valid and considered for calculation and output.

2) The sum of all valid (>=30) second-column values within a data block must be 250 or more.

3) Once the valid blocks are identified, I want to assign a unique name to each block.




So if my input is


Code:
2395	84
2400	52
2402	106
2403	110
2404	98



	
2322	32
1633	2
1634	6
1636	7
1637	3
1638	1
1639	2
1640	6
1641	4
1657	1
1668	4
622	2
2321	22
421	1
619	1
620	1
625	1
	
1764	28
1769	27
1770	38
1771	25
1776	18
1777	24
2424	54
2425	44
2426	105
2427	52
2431	142
2434	58
2435	24
2439	39
2440	38
2441	71
2443	46
2446	51
2447	29
2449	32
2450	29
2451	10

The desired output should be

Code:
Block1	2395	84
Block1	2400	52
Block1	2402	106
Block1	2403	110
Block1	2404	98
		
		
Block2	1770	38
Block2	2424	54
Block2	2425	44
Block2	2426	105
Block2	2427	52
Block2	2431	142
Block2	2434	58
Block2	2439	39
Block2	2440	38
Block2	2441	71
Block2	2443	46
Block2	2446	51
Block2	2449	32

I can filter and sum the second column for the entire file, but I am not able to detect the individual data blocks and sum each one separately:

Code:
awk '$1==" "{b=1;i=i+1} $2 > 29 { sum[i] += $2 ; b=0} END { print sum[i] }' file

Please assist.
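
A note on why the attempt above never starts a new block: with awk's default field splitting, leading blanks are stripped and a field never contains whitespace, so `$1 == " "` is false on every line, including blank ones. An empty or whitespace-only line has `NF == 0` instead. The small demo below prints both tests for each line; the `show_fields` name and the four-line sample input are made up for illustration and are not from the thread.

```shell
# Hypothetical helper: for each input line, print the line number, NF,
# and whether the two candidate "blank line" tests are true (1) or false (0).
show_fields() {
    awk '{ print NR, NF, ($1 == " "), (NF == 0) }'
}

# Line 1 has data, line 2 is empty, line 3 holds a single tab, line 4 has data.
printf '2395\t84\n\n\t\n2400\t52\n' | show_fields
```

The third column stays 0 on every line, while the last column is 1 on both the empty line and the tab-only line, which is why a test on `NF` is the more reliable way to spot a block separator.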
# 2  
Old 01-21-2016
Try this:
Code:
awk '
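        BEGIN {
                X = 1           # index of the current input block
                F = 1           # nonzero when the next blank line should start a new block
        }
        $2 >= 30 {              # criterion 1: keep rows whose second column is 30 or more
                F = 1
                ++C             # running row counter, never reset between blocks
                A[X,C] = $0
                T[X] += $2      # sum of valid second-column values in block X
                M[X] = C        # highest row counter seen in block X
        }
        /^[ \t]*$/ && F {       # first blank line after valid rows: move to the next block
                ++X
                F = 0
        }
        END {
                for ( m = 1; m <= X; m++ )
                {
                        if ( T[m] >= 250 )      # criterion 2: block total of 250 or more
                        {
                                ++B             # output block number
                                for ( n = 1; n <= M[m]; n++ )
                                {
                                        if ( A[m,n] )   # skip counters that belong to other blocks
                                                print "Block"B, A[m,n]
                                }
                                printf "\n"
                        }
                }
        }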
' OFS='\t' file
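
To see which blocks pass criterion 2 before trusting the filtered output, it can help to print the per-block totals first. The following is only a sketch, assuming that empty and whitespace-only lines both count as separators; the `block_sums` name, the output layout and the sample input are my own and not from the thread.

```shell
# Hypothetical helper: print "<block number> <valid row count> <sum of valid column 2>"
# for every block of the input, where a row is valid when column 2 >= 30.
block_sums() {
    awk '
        function report() {             # emit totals for the block just finished
            if (rows) print ++blk, cnt + 0, sum + 0
            rows = cnt = sum = 0
        }
        !NF      { report(); next }     # empty or whitespace-only line ends a block
                 { rows++ }             # any data row marks the block as non-empty
        $2 >= 30 { cnt++; sum += $2 }   # criterion 1: only rows with column 2 >= 30 count
        END      { report() }           # the last block may lack a trailing blank line
    ' "$@"
}

# Three small blocks: totals 260, 32 and 250.
printf '1\t100\n2\t160\n\n\n3\t32\n4\t2\n\t\n5\t28\n6\t250\n' | block_sums
```

Run as `block_sums file` on the sample input from the first post, this should report totals of 450, 32 and 770 for the three blocks, so only the first and third reach the 250 threshold.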

This User Gave Thanks to Yoda For This Post:
# 3  
Old 01-21-2016
Try
Code:
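awk '
!NF     {if (SUM >= 250) {++BLK                 # blank line ends a block; keep it if its valid rows total 250 or more
                          for (c=1; c<=CNT; c++) print "Block" BLK, M[c]
                         }
         SUM = CNT = 0                          # reset for the next block
         next
        }
$2 >=30 {SUM += $2                              # valid row: add to the block total and remember the row
         M[++CNT] = $0
        }
END     {if (SUM >= 250) {BLK++                 # the last block may not be followed by a blank line
                          for (c=1; c<=CNT; c++) print "Block" BLK, M[c]
                         }
        }
' OFS='\t' file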
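
The single-pass approach above repeats the block-printing logic, once for a blank line and once in END. One way to avoid the repetition is an awk user-defined function. The sketch below is an illustration only: the `filter_blocks` and `emit` names and the sample input are hypothetical, the threshold is written as `>= 250` to match criterion 2 as stated in the question, and tab output is an assumption based on the desired result shown in the first post.

```shell
# Hypothetical wrapper around a single-pass awk filter; reads the named files or stdin.
filter_blocks() {
    awk '
        BEGIN { OFS = "\t" }            # tab-separated output (assumed from the desired result)
        function emit(   c) {           # c is a local loop counter
            if (sum >= 250) {           # criterion 2: block total of valid rows
                ++blk
                for (c = 1; c <= cnt; c++) print "Block" blk, row[c]
            }
            sum = cnt = 0               # reset for the next block
        }
        !NF      { emit(); next }       # empty or whitespace-only line closes the block
        $2 >= 30 { sum += $2; row[++cnt] = $0 }   # criterion 1: keep valid rows only
        END      { emit() }             # handle a final block with no trailing blank line
    ' "$@"
}

# Three small blocks; only the first (total 260) and third (total 250) qualify.
printf '1\t100\n2\t160\n3\t5\n\n\n4\t32\n\t\n5\t28\n6\t250\n' | filter_blocks
```

Like the version above, this prints no blank line between output blocks; adding `print ""` after the loop in `emit` would insert one if the separator is wanted.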

This User Gave Thanks to RudiC For This Post: