Count lines with awk if statements


 
# 1  
Old 12-09-2016

Hi Everybody,

I want to count lines in many files, but only the lines that meet a condition. I have something like this:

Code:
cat /path1/usr/STAT/GPRS/ESTCOL_GPRS_2016* | awk 'BEGIN{FS=",";}{ if (substr($5,1,8)=='$DATE'){a[FILENAME]++} END{for(i in a)print a[i]}}'
DATE=$(date +%Y%m%d -d "1 day ago")

But it has a bug. Can anybody help me? Thank you.

Last edited by Scrutinizer; 12-09-2016 at 04:37 PM.. Reason: icode tags => code tags
# 2  
Old 12-09-2016
Hi,

Can you try like this?
Code:
DATE=$(date +%Y%m%d -d "1 day ago")
awk -F, -v y="$DATE" '$0 ~ y { a[FILENAME]++ } END { for (i in a) print i, a[i] }' /path1/usr/STAT/GPRS/ESTCOL_GPRS_2016*

If you need a stricter match, compare substr($5,1,8) against the date instead of matching the whole line.
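
A minimal sketch of the stricter variant, assuming the date sits in the first 8 characters of field 5 as in the first post. The sample file, its contents, and the scratch directory are made up for illustration, and the date is hard-coded so the result is reproducible.

```shell
# Hypothetical sample data in a scratch directory (not the thread's real files).
tmpdir=$(mktemp -d)
printf '%s\n' \
  'A,B,C,D,20161208150007,20161209,1,2' \
  'A,B,C,D,20161209150007,20161208,1,2' \
  'A,B,C,D,20161208235959,20161209,1,2' > "$tmpdir/sample.csv"

# Fixed date for the demo; in practice: DATE=$(date +%Y%m%d -d "1 day ago") (GNU date).
DATE=20161208

# -v y="$DATE" hands the shell variable's value to awk as the awk variable y.
# Comparing substr($5,1,8) ignores the date when it appears in other fields
# (for example $6 on the second line), which the whole-line match does not.
strict=$(awk -F, -v y="$DATE" 'substr($5,1,8) == y { n++ } END { print n + 0 }' "$tmpdir/sample.csv")
loose=$(awk -F, -v y="$DATE" '$0 ~ y { n++ } END { print n + 0 }' "$tmpdir/sample.csv")

echo "strict=$strict loose=$loose"   # prints: strict=2 loose=3
rm -rf "$tmpdir"
```

The two counts differ only because the second sample line carries the date in field 6 rather than field 5.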
# 3  
Old 12-09-2016
Thank you, it works!!!

Is there some way to save the results in an array and then sum them to get a single value?
# 4  
Old 12-09-2016
Quote:
Originally Posted by Elly
Thank you, it works!!!
Is there some way to save the results in an array and then sum them to get a single value?
Hello Elly,

I'm not sure I fully understand what you mean.
i- If you want only the number of matches per file for the given date, the following may help.
Code:
DATE=$(date +%Y%m%d -d "1 day ago")
awk -F, -v y="$DATE" '$0 ~ y { a[FILENAME]++ } END { for (i in a) print a[i] }' /path1/usr/STAT/GPRS/ESTCOL_GPRS_2016*

ii- If you want a collective sum across all the files processed, the following may help (not tested, though).
Code:
DATE=$(date +%Y%m%d -d "1 day ago")
awk -F, -v y="$DATE" '$0 ~ y { a[FILENAME]++ } END { for (i in a) SUM += a[i]; print SUM }' /path1/usr/STAT/GPRS/ESTCOL_GPRS_2016*

Thanks,
R. Singh
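
If only the grand total is needed, the per-file array is not strictly required; a single counter should give the same number. A minimal sketch, with made-up file names and contents and a hard-coded date:

```shell
# Hypothetical input files in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'x,20161208,y' 'x,20161207,y' > "$tmpdir/f1"
printf '%s\n' 'x,20161208,y' 'x,20161208,y' 'x,20161206,y' > "$tmpdir/f2"
DATE=20161208

# Per-file counts in an array, summed in the END block.
via_array=$(awk -F, -v y="$DATE" \
  '$0 ~ y { a[FILENAME]++ } END { for (i in a) sum += a[i]; print sum + 0 }' \
  "$tmpdir"/f1 "$tmpdir"/f2)

# One running counter across all files; "+ 0" prints 0 when nothing matched.
via_counter=$(awk -F, -v y="$DATE" '$0 ~ y { n++ } END { print n + 0 }' \
  "$tmpdir"/f1 "$tmpdir"/f2)

echo "via_array=$via_array via_counter=$via_counter"   # prints: via_array=3 via_counter=3
rm -rf "$tmpdir"
```

The array form is still the one to use when the per-file numbers are wanted as well as the total.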
# 5  
Old 12-09-2016
Hi RavinderSingh13, thank you very much,

I ran some tests with your help, and for my case this way is much more convenient:

Code:
awk 'BEGIN{FS=",";}{ if (substr($5,1,8)=="20161208") a[$2]++ } END { for (i in a) { print i "," a[i]}}'

For a file with lines like this:
Code:
COMGPRS,ALK_01P,COMGPRS_ALK_01P_095398.dat,processed_cdr_20161209144744_00101038.cdr,20161209150007,20161209,225,51535

the result is:
Code:
ALK_01P,2540

But the value 2,540 is not correct; it should be 2,498. If I replace
Code:
a[$2]++

with
Code:
a[$4]++

this gives me all the $4 values, which contain strings like processed_cdr_20161209144744_00101038.cdr. If I sum all these lines, I get the correct number, 2,498. So I guess the problem is the increment (++); I need the sum of all these $4 lines.

Thank you very much



Moderator's Comments:
Please use CODE tags for data/results as well, as required by forum rules!

Last edited by RudiC; 12-10-2016 at 12:30 PM.. Reason: Added CODE tags.
# 6  
Old 12-10-2016
Hi,

Quote:
this bring me all lines $4 that contains strings like
It depends on what you have in $5 and on whether the if condition succeeds.
The counter is incremented only when the if condition is true.

I tried the following and it looks fine:

Code:
cat f1
COMGPRS,ALK_01P,COMGPRS_ALK_01P_095398.dat,processed_cdr_20161209144744_00101038.cdr,20161209150007 ,20161209,225,51535
COMGPRS,ALK_01P,COMGPRS_ALK_01P_095398.dat,processed_cdr_20161208144744_00101038.cdr,20161208150007 ,20161209,225,51535

Code:
cat f2
COMGPRS,ALK_01P,COMGPRS_ALK_01P_095398.dat,processed_cdr_20161209144744_00101038.cdr,20161209150007 ,20161209,225,51535
COMGPRS,ALK_01P,COMGPRS_ALK_01P_095398.dat,processed_cdr_20161208144744_00101038.cdr,20161208150007 ,20161209,225,51535
COMGPRS,ALK_01P,COMGPRS_ALK_01P_095398.dat,processed_cdr_20151209144744_00101038.cdr,20161209150007 ,20161209,225,51535
COMGPRS,ALK_01P,COMGPRS_ALK_01P_095398.dat,processed_cdr_20151208144744_00101038.cdr,20161208150007 ,20161209,225,51535

Code:
cat f3
COMGPRS,ALK_01P,COMGPRS_ALK_01P_095398.dat,processed_cdr_20161209144744_00101038.cdr,20161209150007 ,20161209,225,51535
COMGPRS,ALK_01P,COMGPRS_ALK_01P_095398.dat,processed_cdr_20161209144744_00101038.cdr,20161208150007 ,20161209,225,51535
COMGPRS,ALK_01P,COMGPRS_ALK_01P_095398.dat,processed_cdr_20161209144744_00101038.cdr,20161209150007 ,20161209,225,51535

Code:
awk -F, '{ if (substr($5,1,8)=="20161208") a[$2]++ } END { for (i in a) { print i "," a[i]}}' *

This gives the output below, because $5 matches in only 4 lines across those 3 files and $2 is the same in all of the matching lines.
Quote:
ALK_01P,4
Code:
awk -F, '{ if (substr($5,1,8)=="20161208") a[$4]++ } END { for (i in a) { print i "," a[i]}}' *

This gives the output below, because $5 matches in the same 4 lines as above, BUT $4 differs among the matching lines.
Code:
processed_cdr_20151208144744_00101038.cdr,1
processed_cdr_20161209144744_00101038.cdr,1
processed_cdr_20161208144744_00101038.cdr,2

If it does not help, please share sample input & expected output.
# 7  
Old 12-10-2016
You lost me. I couldn't imagine WHAT you really need.

In post#1, you cat all matching files into a pipe to awk and then sum into array a indexed by FILENAME. As there's only ONE single stream (by cat), there will be just one element with index "-".
- This has been cured in the proposals by greet_sed and RavinderSingh13.
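
The effect described above can be checked with a small sketch. The file names and contents are made up, and since the exact value awk reports for FILENAME on standard input may vary between implementations, the sketch only counts how many distinct keys end up in the array.

```shell
# Two hypothetical input files in a scratch directory.
tmpdir=$(mktemp -d)
printf 'a\nb\n' > "$tmpdir/f1"
printf 'c\n' > "$tmpdir/f2"

# awk program: index an array by FILENAME, then report the number of distinct keys.
count_keys='{ a[FILENAME]++ } END { for (i in a) k++; print k + 0 }'

# Through cat, awk sees one anonymous stream, so only one key exists.
piped=$(cat "$tmpdir"/f1 "$tmpdir"/f2 | awk "$count_keys")

# With the files as arguments, awk sets FILENAME for each file.
direct=$(awk "$count_keys" "$tmpdir"/f1 "$tmpdir"/f2)

echo "piped=$piped direct=$direct"   # prints: piped=1 direct=2
rm -rf "$tmpdir"
```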

Still your problem is not clear.

The count of lines with substr ($5,1,8) matching $DATE CANNOT depend on the index ($2 / $4 ?) of the a array. WHY should there be different counts (2540 <-> 2498)?

And, 20161208 doesn't match $5 in your sample, so count must be zero.
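
Both points can be illustrated with a short sketch. The first data line is the sample from post #5; the other two lines are hypothetical variants added only so that something matches, and total_by is a helper name invented for this example.

```shell
tmpdir=$(mktemp -d)
# Line 1: sample from post #5. Lines 2-3: hypothetical variants whose $5 starts with 20161208.
cat > "$tmpdir/data.csv" <<'EOF'
COMGPRS,ALK_01P,COMGPRS_ALK_01P_095398.dat,processed_cdr_20161209144744_00101038.cdr,20161209150007,20161209,225,51535
COMGPRS,ALK_01P,COMGPRS_ALK_01P_095398.dat,processed_cdr_20161208144744_00101038.cdr,20161208150007,20161209,225,51535
COMGPRS,ALK_01P,COMGPRS_ALK_01P_095398.dat,processed_cdr_20161208150000_00101039.cdr,20161208151500,20161209,225,51535
EOF

# First 8 characters of $5 in the posted sample line: not 20161208.
prefix=$(awk -F, 'NR == 1 { print substr($5,1,8) }' "$tmpdir/data.csv")

# Sum of all per-key counts for the same condition; $1 is the field number used as key.
total_by() {
  awk -F, -v k="$1" \
    'substr($5,1,8) == "20161208" { a[$k]++ } END { for (i in a) s += a[i]; print s + 0 }' \
    "$tmpdir/data.csv"
}
by2=$(total_by 2)
by4=$(total_by 4)

echo "prefix=$prefix by2=$by2 by4=$by4"   # prints: prefix=20161209 by2=2 by4=2
rm -rf "$tmpdir"
```

The key only decides how the matching lines are grouped; the grand total comes out the same for $2 and $4 on identical input.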

Why don't you take a step back and start over, carefully (re)formulating your specification, supplying a reasonable set of input data and a desired output format, and the logic connecting the two?