Count total duplicates


 
# 8  
03-25-2015
Hello mikloz,

Then the following may help you.
Code:
awk '{A[$1]++} END{for(i in A) if(A[i]>1) S+=A[i]-1; print S+0 "/" NR}' Input_file

Thanks,
R. Singh
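
For anyone who wants to try the first-field count, here is a minimal sketch on made-up data (the contents of Input_file are purely illustrative). The awk program is written with S+=A[i]-1 and S+0. This gives the same count as the ternary form, but prints 0 instead of an empty string when no first field repeats.

```shell
# Hypothetical sample data: first field "apple" x3, "pear" x2, "plum" x1.
printf '%s\n' 'apple 1' 'apple 2' 'pear 3' 'apple 4' 'pear 5' 'plum 6' > Input_file

# Count every record whose first field was already used by an earlier record,
# then print "duplicates/total records".
awk '{A[$1]++} END{for(i in A) if(A[i]>1) S+=A[i]-1; print S+0 "/" NR}' Input_file
# prints 3/6 (apple contributes 2, pear contributes 1)

# With no repeated first fields, S is never assigned; S+0 turns that into 0.
printf '%s\n' 'apple 1' 'pear 2' | awk '{A[$1]++} END{for(i in A) if(A[i]>1) S+=A[i]-1; print S+0 "/" NR}'
# prints 0/2
```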
# 9  
03-25-2015
Or:
Code:
awk 'l[$0]++{d++}END{printf("%d/%d\n",d,NR)}' file.txt
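
Note that this one-liner keys the array on the whole line ($0), while the earlier one keys it on the first field ($1), so the two can give different answers. A small illustrative run shows the difference. The file file.txt and its contents are invented for the example, and the first-field command is shown with S+0 so that it always prints a number.

```shell
# Hypothetical sample: "apple 1" and "pear 3" each occur twice as whole lines,
# but the first field "apple" occurs three times.
printf '%s\n' 'apple 1' 'apple 1' 'pear 3' 'apple 4' 'pear 3' 'plum 6' > file.txt

# Whole-line duplicates (key $0): the 2nd "apple 1" and the 2nd "pear 3".
awk 'l[$0]++{d++}END{printf("%d/%d\n",d,NR)}' file.txt
# prints 2/6

# First-field duplicates (key $1): two extra "apple" records plus one extra "pear".
awk '{A[$1]++} END{for(i in A) if(A[i]>1) S+=A[i]-1; print S+0 "/" NR}' file.txt
# prints 3/6
```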

# 10  
03-25-2015
Perfect! Thank you!

Quote:
Originally Posted by Don Cragun
Or:
Code:
awk 'l[$0]++{d++}END{printf("%d/%d\n",d,NR)}' file.txt

Don Cragun, could you please explain it? I suppose you create a list and add each line to it. If the line is already in the list, you increment the d variable. Is that correct?
# 11  
03-25-2015
Quote:
Originally Posted by mikloz
Perfect! Thank you!

Don Cragun, could you please explain it? I suppose you create a list and add each line to it. If the line is already in the list, you increment the d variable. Is that correct?
Yes, the script:
Code:
awk 'l[$0]++{d++}END{printf("%d/%d\n",d,NR)}' file.txt

can be rewritten as:
Code:
awk '		# Run awk with the following script...
l[$0]++ {	# Set array l[] indexed by the contents of the current input line to
		# the number of times this line has been seen so far and return the
		# number of times this line had been seen before this line.  If the
		# value returned is not zero and is not the empty string, execute
		# the commands in this section.  (This will happen any time this line
		# has been seen before.)

	d++	# Increment the number of duplicates seen.
}
END {		# After all lines have been read from all input files given to
		# this invocation of awk, run the commands in this section.

	printf("%d/%d\n", d, NR)	# Print the number of duplicates seen and
				# the Number of Records read from all of the input files
				# given to this invocation of awk.
}' file.txt	# End the script and specify the input file(s) to be processed by this
		# invocation of awk.

I hope this helps.
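
Here is a small sketch for watching the post-increment at work, on invented input (file.txt and the variable name before are only for illustration). The first command prints the value l[$0]++ returns for each line, which is exactly what the pattern in the one-liner tests. The last two commands show that, as the END comment says, the l[] array and NR carry over across all input files given to one awk invocation.

```shell
# Hypothetical input: "a" occurs three times, "b" twice.
printf '%s\n' a b a a b > file.txt

# Show the value l[$0]++ returns for each line: the number of earlier copies.
# A nonzero value is what makes the pattern true and runs d++.
awk '{ before = l[$0]++; print $0, before, (before ? "duplicate" : "first") }' file.txt
# a 0 first
# b 0 first
# a 1 duplicate
# a 2 duplicate
# b 1 duplicate

awk 'l[$0]++{d++}END{printf("%d/%d\n",d,NR)}' file.txt
# prints 3/5

# Given the same file twice, every line of the second copy was seen before.
awk 'l[$0]++{d++}END{printf("%d/%d\n",d,NR)}' file.txt file.txt
# prints 8/10
```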
# 12  
03-26-2015
Got it! Thanks!
# 13  
03-26-2015
Great explanation, thanks.