Shell Programming and Scripting: Count total duplicates
Post 302939600 by ken6503 on Thursday 26th of March 2015 10:55:07 PM
Quote:
Originally Posted by Don Cragun
Yes, the script:
Code:
awk 'l[$0]++{d++}END{printf("%d/%d\n",d,NR)}' file.txt

can be rewritten as:
Code:
awk '		# Run awk with the following script...
l[$0]++ {	# Set array l[] indexed by the contents of the current input line to
		# the number of times this line has been seen so far and return the
		# number of times this line had been seen before this line.  If the
		# value returned is not zero and is not the empty string, execute
		# the commands in this section.  (This will happen any time this line
		# has been seen before.)

	d++	# Increment the number of duplicates seen.
}
END {		# After all lines have been read from all input files given to
		# this invocation of awk, run the commands in this section.

	printf("%d/%d\n", d, NR)	# Print the number of duplicates seen and
				# the Number of Records read from all of the input files
				# given to this invocation of awk.
}' file.txt	# End the script and specify the input file(s) to be processed by this
		# invocation of awk.

I hope this helps.
Great explanation, thanks.
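
For anyone who wants to check the behaviour described above, here is a minimal sketch that runs the same awk script on a small sample. The sample lines and the `tmp` and `result` variable names are made up for illustration and do not come from the thread.

```shell
# Build a small sample input (hypothetical data, not from the thread).
tmp=$(mktemp)
printf '%s\n' apple banana apple cherry banana apple > "$tmp"

# Same script as in the quoted post: l[$0]++ yields the count seen *before*
# this line, so the action runs only for the 2nd, 3rd, ... copy of a line.
result=$(awk 'l[$0]++ { d++ } END { printf("%d/%d\n", d, NR) }' "$tmp")
echo "$result"    # prints 3/6

rm -f "$tmp"
```

Here "apple" appears three times (two repeats) and "banana" twice (one repeat), so d ends at 3 out of 6 records. Note that d counts every repeated line, not the number of distinct lines that have duplicates.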
 

FDUPES(1)						      General Commands Manual							 FDUPES(1)

NAME
       fdupes - finds duplicate files in a given set of directories

SYNOPSIS
       fdupes [ options ] DIRECTORY ...

DESCRIPTION
       Searches the given path for duplicate files. Such files are found by
       comparing file sizes and MD5 signatures, followed by a byte-by-byte
       comparison.

OPTIONS
       -r --recurse
              for every directory given follow subdirectories encountered
              within

       -R --recurse:
              for each directory given after this option follow
              subdirectories encountered within (note the ':' at the end of
              option; see the Examples section below for further explanation)

       -s --symlinks
              follow symlinked directories

       -H --hardlinks
              normally, when two or more files point to the same disk area
              they are treated as non-duplicates; this option will change
              this behavior

       -n --noempty
              exclude zero-length files from consideration

       -f --omitfirst
              omit the first file in each set of matches

       -A --nohidden
              exclude hidden files from consideration

       -1 --sameline
              list each set of matches on a single line

       -S --size
              show size of duplicate files

       -m --summarize
              summarize duplicate files information

       -q --quiet
              hide progress indicator

       -d --delete
              prompt user for files to preserve, deleting all others (see
              CAVEATS below)

       -N --noprompt
              when used together with --delete, preserve the first file in
              each set of duplicates and delete the others without prompting
              the user

       -v --version
              display fdupes version

       -h --help
              displays help

SEE ALSO
       md5sum(1)

NOTES
       Unless -1 or --sameline is specified, duplicate files are listed
       together in groups, each file displayed on a separate line. The
       groups are then separated from each other by blank lines.

       When -1 or --sameline is specified, spaces and backslash characters
       (\) appearing in a filename are preceded by a backslash character.

EXAMPLES
       fdupes a --recurse: b
              will follow subdirectories under b, but not those under a.

       fdupes a --recurse b
              will follow subdirectories under both a and b.

CAVEATS
       If fdupes returns with an error message such as fdupes: error
       invoking md5sum it means the program has been compiled to use an
       external program to calculate MD5 signatures (otherwise, fdupes uses
       internal routines for this purpose), and an error has occurred while
       attempting to execute it. If this is the case, the specified program
       should be properly installed prior to running fdupes.

       When using -d or --delete, care should be taken to insure against
       accidental data loss.

       When used together with options -s or --symlinks, a user could
       accidentally preserve a symlink while deleting the file it points to.

       Furthermore, when specifying a particular directory more than once,
       all files within that directory will be listed as their own
       duplicates, leading to data loss should a user preserve a file
       without its "duplicate" (the file itself!).

AUTHOR
       Adrian Lopez <adrian2@caribe.net>
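
The DESCRIPTION above says duplicates are found by comparing sizes and MD5 signatures, then confirmed byte by byte. As a rough illustration of the signature step only, the sketch below groups files by MD5 hash using find, md5sum and awk. It is not how fdupes itself is implemented: it skips the size check and the byte-by-byte confirmation, assumes GNU md5sum output (32 hex digits, two separator characters, then the path) and file names without newlines or backslashes, and the demo directory and file names are invented for the example.

```shell
# Hypothetical demo directory: two identical files and one different file.
dir=$(mktemp -d)
printf 'same\n'  > "$dir/a.txt"
printf 'same\n'  > "$dir/b.txt"
printf 'other\n' > "$dir/c.txt"

# Hash every regular file, sort so equal hashes are adjacent and paths are
# in a stable order, then print each path whose hash was already seen.
dupes=$(find "$dir" -type f -exec md5sum {} + |
  sort |
  awk '{
    hash = $1
    path = substr($0, 35)          # path follows 32 hex digits + 2 separators
    if (seen[hash]++) print path   # every file after the first with this hash
  }')
echo "$dupes"                      # lists b.txt, the second copy of a.txt

rm -rf "$dir"
```

The `seen[hash]++` test is the same idiom as `l[$0]++` in the thread's script, applied to file hashes instead of input lines. Going by the options listed above, running fdupes with -r and -f on the same directory should report a comparable set, since -f omits the first file in each set of matches.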