Performance of calculating total number of matching records in multiple files

# 1  
Old 08-12-2015

Hello Friends,

I've been trying to calculate the total number of records matching a certain pattern across multiple data record (DR) files.

Let's say I have one folder created per day since the beginning of July, like the following:

drwxrwxrwx 2 mmsuper med 65536 Jul  1 23:59 20150701
drwxrwxrwx 2 mmsuper med 65536 Jul  2 23:59 20150702
drwxrwxrwx 2 mmsuper med 65536 Jul  3 23:59 20150703
drwxrwxrwx 2 mmsuper med 65536 Jul  4 23:59 20150704
drwxrwxrwx 2 mmsuper med 65536 Jul  5 23:59 20150705
drwxrwxrwx 2 mmsuper med 65536 Jul  6 23:59 20150706
drwxrwxrwx 2 mmsuper med 65536 Jul  7 23:59 20150707
drwxrwxrwx 2 mmsuper med 65536 Jul  8 23:59 20150708
drwxrwxrwx 2 mmsuper med 65536 Jul  9 23:59 20150709
drwxrwxrwx 2 mmsuper med 65536 Jul 10 23:59 20150710
drwxrwxrwx 2 mmsuper med 65536 Jul 11 23:59 20150711
drwxrwxrwx 2 mmsuper med 65536 Jul 12 23:59 20150712
drwxrwxrwx 2 mmsuper med 65536 Jul 13 23:59 20150713

Each folder has thousands of files such as:

-rw-r--r-- 1 mmsuper med  7691 Jul  1 15:30 cdr_2015070103_30306.txt
-rw-r--r-- 1 mmsuper med  2276 Jul  1 15:30 cdr_2015070103_30307.txt
-rw-r--r-- 1 mmsuper med  2633 Jul  1 15:30 cdr_2015070103_30308.txt
-rw-r--r-- 1 mmsuper med  2682 Jul  1 15:31 cdr_2015070103_30309.txt
-rw-r--r-- 1 mmsuper med  2622 Jul  1 15:31 cdr_2015070103_30310.txt
-rw-r--r-- 1 mmsuper med  5592 Jul  1 15:31 cdr_2015070103_30311.txt
-rw-r--r-- 1 mmsuper med  3029 Jul  1 15:31 cdr_2015070103_30313.txt
-rw-r--r-- 1 mmsuper med  6940 Jul  1 15:31 cdr_2015070103_30312.txt
-rw-r--r-- 1 mmsuper med  2610 Jul  1 15:31 cdr_2015070103_30314.txt
-rw-r--r-- 1 mmsuper med  5350 Jul  1 15:32 cdr_2015070103_30315.txt
-rw-r--r-- 1 mmsuper med  2949 Jul  1 15:32 cdr_2015070103_30316.txt

Unfortunately, each file has hundreds or several thousand rows (data records). The field separator is a comma, and one of the fields holds a whole SOAP request. For example, the following is one charging request record (I have shortened it):

20150712124542,20150712124542,,20150430000000,,20150712124542,0,,,1,<soap:Envelope xmlns:soap=""><soap:Body><ns2:chargeSubscription xmlns:ns2=
<faultString>Sending exception</faultString><msisdn>9647512971064</msisdn><paymentMethod>2</paymentMethod><accountId>23399568</accountId><customerId>23669092</customerId><imsi>418400202171510</imsi>
</receiverSubscriber><subscriptionId>0</subscriptionId><subscriptionStartDate>20150712124542</subscriptionStartDate><billCycleStartDate xmlns:xsi="" xsi:nil="true"/>,

I need to calculate the total number of records matching specific conditions, i.e. $1 = charging, $9 = subscriber, and so on. I have calculated it, but only under one folder:

nawk -F, '{if ($1=="charging" && $9=="9647512971064" && $29~/<faultString>[Ss]ending [Ee]xception<\/faultString>/) c++} END{print c}' cdr_201507*txt

Here is my question:

As there are hundreds of thousands of files under a few directories, what is the fastest way to calculate the total number of matches with a single one-liner command or script?

Should I first find the files with a find command in a FOR loop and trigger nawk afterwards like the following?

for j in `find . -type f -name "cdr_201507*txt" 2>/dev/null`; 
do nawk ...

I would appreciate your suggestions. I checked but could not find a specific answer on this forum or on others.

Kind Regards

Last edited by EAGL€; 08-12-2015 at 11:08 AM..
# 2  
Old 08-12-2015
Your suggested method would start a new nawk on every file and probably be very slow.

Using the + terminator instead of ; with find's -exec passes many files to each invocation. It may still end up as several invocations of nawk because of argument-length limits, in which case each invocation produces only a partial count and the partial counts have to be added up somehow. One solution is to have find just cat all the files and pipe the result to nawk.

find . -type f -name 'cdr_201507*txt' -exec cat {} + | nawk -F, '
  $1=="charging" && $9=="9647512971064" && $29~/<faultString>[Ss]ending [Ee]xception<\/faultString>/ {c++}
  END{print c}'
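
If the extra cat is a concern, the other route mentioned above (let each batch report a partial count and add the partial counts afterwards) might look like the following. This is only a sketch: the function name count_matches is made up for illustration, and it calls plain awk, so on a system like the one in this thread you would substitute nawk.

```shell
# Hypothetical helper (not from the thread): count matching records below
# the directory given as $1. Uses "awk"; substitute "nawk" where needed.
count_matches() {
  # find hands the files to awk in large batches ({} +). Each batch prints
  # its own partial count; "c + 0" forces a numeric 0 when nothing matched.
  find "$1" -type f -name 'cdr_201507*txt' -exec awk -F, '
    $1 == "charging" && $9 == "9647512971064" &&
    $29 ~ /<faultString>[Ss]ending [Ee]xception<\/faultString>/ { c++ }
    END { print c + 0 }' {} + |
  # The second awk adds the partial counts, however many batches there were.
  awk '{ total += $1 } END { print total + 0 }'
}
```

Calling count_matches . from the directory that holds the daily folders prints a single total. Whether this is faster than the cat pipeline depends on the system, so it is worth timing both on a single day's folder first.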
