Awk: count unique elements in a field and sum their occurrence across the entire file



04-18-2018


Very useful information. Thanks!

beca123456
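For readers arriving from the thread title: the usual awk approach to counting how often each distinct value of a field occurs is an associative array keyed on that field, incremented once per line and printed in an END block, for example awk '{count[$2]++} END {for (k in count) print k, count[k]}' file. The sketch below mirrors that idea in Python as an illustration only. It assumes whitespace-separated fields (awk's default splitting), and count_field_values is a hypothetical helper name, not something from the original thread.

```python
from collections import Counter
from typing import Iterable


def count_field_values(lines: Iterable[str], field: int) -> Counter:
    """Count how often each distinct value appears in a 1-based field.

    Mirrors the awk idiom count[$N]++ with whitespace splitting.
    Unlike awk, lines too short to have the field are skipped rather
    than counted under an empty-string key.
    """
    counts: Counter = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) >= field:
            counts[parts[field - 1]] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample data: the original thread's input is not shown.
    sample = [
        "id1 apple 3",
        "id2 pear 5",
        "id3 apple 1",
        "id4 plum 2",
        "id5 apple 7",
    ]
    counts = count_field_values(sample, 2)
    for value, n in counts.most_common():
        print(value, n)
    print("unique values:", len(counts))
    print("total occurrences:", sum(counts.values()))
```

On the sample this prints apple 3, pear 1 and plum 1, followed by 3 unique values and 5 total occurrences. The number of keys gives the unique count, and the sum of the values gives the total number of lines that had the field.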


10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

Use strings from nth field from one file to match strings in entire line in another file, awk

I cannot seem to get what should be a simple awk one-liner to work correctly and cannot figure out why. I would like to use patterns from a specific field in one file as regex to search for matching strings in the entire line ($0) of another file. I would like to output the lines of File2 which...

2. Shell Programming and Scripting

Count of unique lines in field 4

When I use the below awk to count the unique lines in $4 for the input it seems to work. The answer is 3 because $4 is only unique 3 times in all the entries. However, when I use the same on actual data I get 56,536 and I know the answer should be 56,548. My question is there a better way to...

3. Shell Programming and Scripting

awk to count using each unique value

I'm looking for an awk script that will take the unique values in column 5, then print and count the unique values in column 6. CA001011500 11111 11111 -9999 201301 AAA CA001012040 11111 11111 -9999 201301 AAA CA001012573 11111 11111 -9999 201301 BBB CA001012710 11111 11111 -9999 201301...

4. Shell Programming and Scripting

Looping through entire directory and count unique values

Hello, I'm a complete newbie to coding; please help with this problem. I have multiple files in a directory, and I have to loop through the contents of each file and extract the number of unique isoforms in that file. Each file is tab-delimited and only the line with the first parent (column 3)...

5. Shell Programming and Scripting

awk sum entire string

Hi, I am trying to carry out a sum on a file (totals.txt). The file looks like: So far I have this command; this returns 20610. However, I want it to return 000000206100. Any help would be great, thanks!

6. Shell Programming and Scripting

awk if statement not printing entire field

I have an input that looks like this: chr1 mm9_knownGene utr3 3204563 3206102 0 - . gene_id "Xkr4"; transcript_id "uc007aeu.1"; chr1 mm9_knownGene utr3 4280927 4283061 0 - . gene_id "Rp1"; transcript_id "uc007aew.1"; chr1 mm9_knownGene ...

7. Shell Programming and Scripting

awk and count sum ?

I have an input.txt file which has 3 fields separated by commas: place, os and timediff in seconds. tampa,win7, 2575 tampa,win7, 157619 tampa,win7, 3352 dallas,vista,604799 greenbay,winxp, 14400 greenbay,win7 , 518400 san jose,winxp, 228121 san jose,winxp, 70853 san jose,winxp, 193514...

8. Shell Programming and Scripting

Printing entire field, if at least one row is matching by AWK

Dear all, I have been trying to print an entire field, if the first line of the field is matching. For example, my input looks something like this. aaa ddd zzz 123 987 126 24 0.650 985 354 9864 0.32 0.333 4324 000 I am looking for a pattern,...

9. UNIX for Dummies Questions & Answers

How to search unique occurrence in a file?

Hi, I have to search for and count unique occurrences of DE numbers in bold below in a file which has content like below. Proc Tran F-BUY Item Tkey Q5JV Item Tsid JTIZ9 Item Tdat 20091001 Item Tset 20091001 Item Tbkr 5 Item Tshs 2 Item Tprc 897.0 Item Tcom 2000.0 Item Tcm1 20091001...

10. Shell Programming and Scripting

Getting Sum, Count and Distinct Count of a file

Hi all, this is a UNIX question. I have a large flat file with millions of records. col1|col2|col3 1|a|b 2|c|d 3|e|f 3|g|h footer**** I am supposed to calculate the sum of col1 1+2+3+3=9, count of col1 1,2,3,3=4, and distinct count of col1 1,2,3=3 I would like it if you avoid...
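Item 10 asks for the sum, count and distinct count of col1 in a pipe-delimited file that has a header line and a footer record. A minimal single-pass Python sketch follows. It assumes the first line is the header and that the trailer line starts with "footer" (both inferred from the snippet, not confirmed by the thread), and col1_stats is a hypothetical name.

```python
from typing import Iterable, Tuple


def col1_stats(lines: Iterable[str]) -> Tuple[int, int, int]:
    """Return (sum, count, distinct count) of the first pipe-delimited column.

    Assumptions: line 0 is a header such as col1|col2|col3, and the
    trailer record starts with "footer". Blank lines are ignored.
    """
    total = 0
    count = 0
    seen = set()
    for i, line in enumerate(lines):
        line = line.strip()
        if i == 0 or not line or line.startswith("footer"):
            continue  # skip header, blank lines and the footer record
        value = int(line.split("|", 1)[0])  # col1 only
        total += value
        count += 1
        seen.add(value)  # the set size is the distinct count
    return total, count, len(seen)


if __name__ == "__main__":
    data = ["col1|col2|col3", "1|a|b", "2|c|d", "3|e|f", "3|g|h", "footer****"]
    print(col1_stats(data))  # (9, 4, 3)
```

For a real file, pass the open file object (for example, with open("data.txt") as fh: col1_stats(fh), where data.txt is a placeholder name). The file is streamed line by line, so memory grows only with the number of distinct col1 values. That matters when the file has millions of records.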

LEARN ABOUT FREEBSD

pagesizes

PAGESIZES(5)							File Formats Manual						      PAGESIZES(5)

NAME

       pagesizes - HylaFAX page size definitions

DESCRIPTION

       The pagesizes database defines the page dimensions and guaranteed reproducible areas (GRA) for well-known page sizes. The GRA is the portion
       of the page that is guaranteed to be imaged during facsimile transmission. This region is typically less than the full page dimensions
       because of paper roller contacts and other mechanical aspects of the printing process in a facsimile machine.

       All HylaFAX programs that require page size information read the information from this database using a page size name. Documents should
       be prepared such that the full page dimensions are employed with the imaged area contained within the GRA.

       The system-wide default page size to use in preparing documents for transmission is given by the ``default'' entry in the database. (NB:
       the default entry should be placed last so that inverse matches find the real page size name and not the default entry.)

       The page size database is an ASCII file with the following format.  Each entry consists of whitespace-separated fields:
       name  abbrev  width  height  gra-width  gra-height  top-margin  left-margin
       Fields have the following interpretation:

       name	   the full name for the page size; e.g. ISO A4;

       abbrev	   an abbreviated version of the full name for use in compact listings such as the receive queue listing printed by faxstat(1);

       width	   the full width of the page;

       height	   the full height of the page;

       gra-width   the width of the GRA;

       gra-height  the height of the GRA;

       top-margin  the margin between the top of the full page and the top of the GRA;

       left-margin the margin between the left side of the full page and the left side of the GRA.

       The first two fields must be separated from the subsequent fields by a tab character (possibly followed by more whitespace); this is done
       to easily permit blank characters to be included in names. Otherwise, fields can be separated by any amount of any kind of whitespace.
       Numbers are all base 10 and in basic measurement units (BMU), defined as 1/1200 x 25.4 millimeters (that is, 1/1200 inch) for paper output
       with a scale factor of one. All fields must be present on a single line; otherwise the entry is ignored. Comments are introduced by the ``#''
       character and continue to the end of the line.
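The entry format described above can be illustrated with a small parser. This is a sketch under stated assumptions and not HylaFAX's own code. PageSize, parse_entry and bmu_to_mm are hypothetical names, and malformed lines return None, mirroring the rule that incomplete entries are ignored.

```python
from dataclasses import dataclass
from typing import Optional

MM_PER_BMU = 25.4 / 1200  # 1 BMU = 1/1200 inch


@dataclass
class PageSize:
    name: str
    abbrev: str
    width: int        # all dimensions in BMU
    height: int
    gra_width: int
    gra_height: int
    top_margin: int
    left_margin: int


def parse_entry(line: str) -> Optional[PageSize]:
    """Parse one pagesizes line (sketch); None for comments, blanks or bad entries."""
    line = line.split("#", 1)[0]  # '#' starts a comment that runs to end of line
    if not line.strip():
        return None
    try:
        # name and abbrev may contain blanks, so each ends at a tab
        name, rest = line.split("\t", 1)
        abbrev, rest = rest.lstrip().split("\t", 1)
        # the six numeric fields: base 10, any whitespace between them
        numbers = [int(tok) for tok in rest.split()]
    except ValueError:
        return None  # missing tab separator or non-numeric field
    if len(numbers) != 6:
        return None  # every field must be present on the one line
    return PageSize(name.strip(), abbrev.strip(), *numbers)


def bmu_to_mm(bmu: float) -> float:
    """Convert basic measurement units to millimetres."""
    return bmu * MM_PER_BMU
```

Splitting on the first two tabs and only then on arbitrary whitespace is what lets a name such as "ISO A4" keep its embedded blank.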

       Page size database lookups are either by name or by page dimensions. Lookups by name are done by sequentially scanning the entries in the
       database for the first entry that has a matching abbreviation or substring of the page size name field. The string comparisons ignore case;
       so, for example, ``a4'' would match a full name of ``ISO A4''. Lookups by dimension scan the entire database and return the page with the
       closest dimensions using a straightforward distance metric. If the difference in dimensions of the closest match is greater than 1/2 inch
       on each side, then no page entry is returned for a lookup by dimension.
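The two lookup modes can be sketched as follows. The man page does not name its "straightforward distance metric", so squared Euclidean distance is assumed here. The 1/2-inch rule is read as "reject if either side differs by more than 600 BMU" (1/2 inch at 1200 BMU per inch), which is also an interpretation. Page, lookup_by_name and lookup_by_size are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional

HALF_INCH_BMU = 600  # 1/2 inch at 1200 BMU per inch


@dataclass
class Page:
    name: str
    abbrev: str
    width: int   # BMU
    height: int  # BMU


def lookup_by_name(db: List[Page], query: str) -> Optional[Page]:
    """First entry whose abbrev equals, or whose name contains, query (case-insensitive)."""
    q = query.lower()
    for page in db:  # sequential scan: earlier entries win
        if page.abbrev.lower() == q or q in page.name.lower():
            return page
    return None


def lookup_by_size(db: List[Page], width: int, height: int) -> Optional[Page]:
    """Closest entry by squared Euclidean distance (assumed metric), or None."""
    best: Optional[Page] = None
    best_d = 0
    for page in db:
        d = (page.width - width) ** 2 + (page.height - height) ** 2
        if best is None or d < best_d:  # strict '<': ties keep the earlier entry
            best, best_d = page, d
    if best is None:
        return None
    # Assumed reading of the 1/2-inch rule: reject if either side is too far off.
    if abs(best.width - width) > HALF_INCH_BMU or abs(best.height - height) > HALF_INCH_BMU:
        return None
    return best


if __name__ == "__main__":
    # Illustrative BMU values computed from nominal sizes, not copied from a real file.
    db = [
        Page("ISO A4", "A4", 9921, 14031),
        Page("North American Letter", "NA-LET", 10200, 13200),
        Page("default", "A4", 9921, 14031),
    ]
    print(lookup_by_name(db, "a4").name)              # ISO A4
    print(lookup_by_size(db, 10150, 13250).abbrev)    # NA-LET
    print(lookup_by_size(db, 20000, 20000))           # None
```

Because the size scan keeps the first of several equally close entries, a default entry placed last never shadows the real entry it duplicates. This is consistent with the NB about placing the default entry last.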

SEE ALSO

       hylafax-client(1), faxmail(1), faxstat(1), sendfax(1), sgi2fax(1), textfmt(1).

								 December 5, 1994						      PAGESIZES(5)

UNIX for Beginners Questions & Answers