I think I understand now! Here's code with changes, and I found the performance bug. I tested with 1.5 million records, which took just under 4 seconds on my small laptop. Rounding up to 4 seconds per 1.5 million records, the full 35 million should take just under 100 seconds (35 / 1.5 is about 23.3, and 23.3 x 4 seconds is about 93 seconds) -- a few seconds faster than the 12 hours (and lots of memory) that the bug was causing it to take.
The script still wants a 0 to show the number of addresses seen for the first time (over all of the input) rather than over the longer interval. The number of unique addresses (col 3) is the number observed in the current interval, without regard to the previous interval.
Have fun and let me know how it goes!
EDIT: It just occurred to me that my performance tests were writing output to /dev/null, so your times might be longer, since the script will need to do real I/O to write the results someplace. Still, it should be better than 12 hours.
Last edited by agama; 02-23-2012 at 10:31 PM..
Reason: Additional thought
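To illustrate the two counts described in the post above, here is a minimal Python sketch. It is not the script from this thread. It assumes the input is a sequence of (epoch timestamp, address) pairs already sorted by time and a fixed interval length in seconds; the function name `summarize` and the output layout are hypothetical. One set holds the addresses of the current interval and a second set holds every address seen so far, so each record costs only a couple of set operations.

```python
def summarize(records, interval):
    """Yield (interval_start, unique_in_interval, first_seen) per interval.

    records:  iterable of (epoch_seconds, address), assumed sorted by time.
    interval: interval length in seconds (an assumption of this sketch).
    """
    seen_ever = set()   # every address seen so far, over all of the input
    current = set()     # addresses seen in the current interval only
    new_count = 0       # addresses seen for the first time in this interval
    start = None

    for ts, addr in records:
        bucket = ts - ts % interval          # start of the interval holding ts
        if start is None:
            start = bucket
        elif bucket != start:
            # Interval finished: report it and reset the per-interval state.
            yield start, len(current), new_count
            start, current, new_count = bucket, set(), 0
        current.add(addr)
        if addr not in seen_ever:
            seen_ever.add(addr)
            new_count += 1

    if start is not None:
        yield start, len(current), new_count


# Hypothetical sample data: two 10-second intervals.
sample = [
    (0, "10.1.1.1"), (1, "10.1.1.1"), (2, "10.1.2.4"),
    (10, "10.1.1.1"), (11, "12.1.5.6"),
]
result = list(summarize(sample, 10))
print(result)  # [(0, 2, 2), (10, 2, 1)]
```

In the second interval the unique count is 2 without regard to the previous interval, while only one address (12.1.5.6) is new over all of the input. Intervals with no records are simply skipped in this sketch.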
I'm trying to figure out a way to count the number of words in the following file:
cal 2002 > file1
Is there any way to do this without using wc, but instead using the cut command? (1 Reply)
Hi,
Please help me count the records below (by 1st field) from samplefile:
Expected output:
Count Descr
-------------------------------------------
7 Mean manager
14 ... (7 Replies)
Please find the below program. It contains the purpose of the program itself.
/* Program : Write a program to count the number of words in a given text file */
/* Date : 12-June-2010 */
# include <stdio.h>
# include <stdlib.h>
# include <string.h>
int main( int argc, char *argv[] )
{... (6 Replies)
Hi,
I have a big file (~960MB) having epoch time values (~50 million entries) which looks like
897393601
897393601
897393601
897393601
897393602
897393602
897393602
897393602
897393602
897393603
897393603
897393603
897393603
and so on....each time stamp has more than one... (6 Replies)
Hi,
I have a very big (with around 1 million entries) txt file with IPv4 addresses in the standard format, i.e. a.b.c.d
The file looks like
10.1.1.1
10.1.1.1
10.1.1.1
10.1.2.4
10.1.2.4
12.1.5.6
.
.
.
.
and so on....
There are duplicate/multiple entries for some IP... (3 Replies)
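The question above is cut off, so the following is only a sketch under the assumption that the goal is to count how many times each address occurs. It uses `collections.Counter` from the Python standard library; the function name `count_addresses` and the sample list are hypothetical.

```python
from collections import Counter


def count_addresses(lines):
    """Return a Counter mapping each non-empty line (an address) to its count."""
    return Counter(line.strip() for line in lines if line.strip())


# Sample taken from the addresses shown in the question.
sample = ["10.1.1.1", "10.1.1.1", "10.1.1.1", "10.1.2.4", "10.1.2.4", "12.1.5.6"]
counts = count_addresses(sample)
for addr, n in counts.items():
    print(addr, n)
```

Because the function accepts any iterable of lines, an open file object can be passed in directly, so a file with around a million entries is read one line at a time rather than loaded whole.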
Ok say I wanted to count every Y in a data file.
Then set Y as my delimiter so that I can separate my file by taking all the contents that occur BEFORE the first Y and store them in a variable so that I may use this content later on in my program. Then I could do the same thing with the next Y's... (5 Replies)
Hi All,
I have a small problem of counting the number of times a particular entry occurs in a horizontal string of elements and in a vertical field (column of entries). For example:
AATGGTCCTG
Expected output:
A=2 C=2 G=3 T=3
I have an idea to do this but I don't know how to do that if these entries... (1 Reply)
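One possible way to get the counts in the example above, sketched in Python with `collections.Counter`. The function name `count_symbols` is hypothetical, and the vertical case is assumed to be one entry per line, since the question is truncated at that point.

```python
from collections import Counter


def count_symbols(entries):
    """Count each character across one string or a column of strings."""
    if isinstance(entries, str):
        entries = [entries]          # treat a single horizontal string as one row
    counts = Counter()
    for entry in entries:
        counts.update(entry.strip())  # adds one count per character
    return counts


horizontal = count_symbols("AATGGTCCTG")
# Assumed vertical layout: the same elements, one per line.
vertical = count_symbols(["A", "A", "T", "G", "G", "T", "C", "C", "T", "G"])

line = " ".join(f"{k}={horizontal[k]}" for k in sorted(horizontal))
print(line)  # A=2 C=2 G=3 T=3
```

Both layouts go through the same counter, so the horizontal string and the vertical column give identical results.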
I want to count lines of a file using AWK (only) and not in the END part like this awk 'END{print FNR}' because I want to use it.
Does anyone know of a way?
Thanks a lot. (7 Replies)
Hi All ,
I got stuck on the scenario below. If anyone can help me, that would be really helpful.
I have a target HDFS file layout. I need to know the number of columns in that file.
Target_RECRD_layout
{
ABC_ID EN NOTNULLABLE,
ABC_COUNTRY CHARACTER ENCODING ASCII NOTNULLABLE,
... (5 Replies)
Dear community,
I have an already filtered log on my machine, something like:
WARN 2016.03.10 10:59:01.136 logging.LogAlarmListener raise ALARMWARNINGRAISED Alarm NODE-NetworkAccessGroup.Client.41283 SERVICEDOWN-41283.WC severity WARNING raised: Service 41283.WC protocoltype client is down... (13 Replies)