Counting duplicate entries in a file using awk

# 1  
Old 10-13-2010

Hi,

I have a very big txt file (around 1 million entries) containing IPv4 addresses in the standard dotted format, i.e. a.b.c.d

The file looks like

10.1.1.1
10.1.1.1
10.1.1.1
10.1.2.4
10.1.2.4
12.1.5.6
.
.
.
.

and so on....

Some IP addresses appear more than once. Since the file is so big, I want an awk/sed script that counts the number of times each IP occurs and writes the result to an output file in the following format:

10.1.1.1 3
10.1.2.4 2
12.1.5.6 1
.
.
.

and so on...

Any help would be highly appreciated.

Thanks !
# 2  
Old 10-13-2010
Is the file sorted? Have you considered "uniq -c"?
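
For reference, the "uniq -c" route might look like the sketch below. uniq only collapses adjacent duplicate lines, so an unsorted file has to go through sort first. The function name count_sorted is made up for illustration, and the trailing awk step is just one way to swap the columns into the "IP count" layout asked for in post #1.

```shell
# count_sorted is a hypothetical helper name, not part of the thread.
# Usage: count_sorted FILE
count_sorted() {
    # sort groups identical lines together so uniq -c can count them;
    # LC_ALL=C makes the sort order byte-wise and locale-independent.
    # uniq -c prints "count line", so awk swaps the two fields.
    LC_ALL=C sort "$1" | uniq -c | awk '{ print $2, $1 }'
}
```

Calling it as count_sorted ips.txt > counts.txt (both file names are placeholders) would write the result to an output file.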
# 3  
Old 10-13-2010
Using awk
Code:
awk 'NF{a[$NF]++}END{for(i in a)print i,a[i]}' file | sort
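
Spelled out with comments, the same one-liner could be read roughly as follows. This is only an expanded sketch of the command above: the function name count_ips and the variable names count and ip are chosen here for readability and are not from the original post.

```shell
# count_ips is a hypothetical wrapper name, not part of the thread.
# Usage: count_ips FILE
count_ips() {
    awk '
        # NF is zero on blank lines, so those are skipped.
        # $NF is the last field; with one IP per line that is the IP itself.
        NF { count[$NF]++ }

        # After the last line, print every IP with its tally.
        # The order of "for (ip in count)" is unspecified in awk,
        # which is why the output is piped through sort.
        END {
            for (ip in count)
                print ip, count[ip]
        }
    ' "$1" | LC_ALL=C sort
}
```

Because the counts live in an awk array, the input does not need to be sorted first. The final sort is lexical, so if numeric ordering of the octets matters, something like sort -t. -k1,1n -k2,2n -k3,3n -k4,4n could be used instead.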

# 4  
Old 10-13-2010
No, the file is not sorted!

Thanks !!