Awk: count unique elements in a field and sum their occurrences across the entire file
Hi,
I'm sure it's an easy one, but it drives me insane.
Input ("|"-separated):
I would like to count the occurrences of each capital letter in $2 across the entire file, where duplicates within a single record count as 1.
I am trying to get this output (tab-separated; it does not matter whether it is sorted by $1):
What I tried so far:
But I get:
There is a conflict between the array 'count' (used to count each letter only once per field per record) and the array 'total' (used to sum the number of letters across the file).
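One way around that conflict, sketched minimally: keep a small per-record set that is emptied at the start of every line, and only bump the file-wide total the first time a letter appears in that set. The input and attempted code are not reproduced here, so this assumes $2 holds a run of characters such as `ABA` or `A,B,A` (anything that is not A-Z is ignored). The function name `count_caps` is hypothetical; `split("", seen)` is a portable way to empty an array, and `LC_ALL=C` keeps `[A-Z]` from matching lowercase letters in some locales.

```shell
# Hypothetical helper: for each capital letter in field 2 of a "|"-separated
# file, count how many records contain it (repeats within one record count once).
count_caps() {
  LC_ALL=C awk -F'|' '
  {
    split("", seen)                    # empty the per-record set (portable "delete seen")
    for (i = 1; i <= length($2); i++) {
      c = substr($2, i, 1)
      if (c ~ /[A-Z]/ && !(c in seen)) {
        seen[c] = 1                    # first time this letter appears in this record
        total[c]++                     # so it adds exactly 1 to the file-wide total
      }
    }
  }
  END {
    for (c in total) printf "%s\t%d\n", c, total[c]   # tab-separated, order unspecified
  }' "$@"
}
```

Usage: `count_caps input.txt | sort` if a sorted listing is wanted, since `for (c in total)` visits keys in no particular order.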
Hi all, this is a UNIX question.
I have a large flat file with millions of records.
col1|col2|col3
1|a|b
2|c|d
3|e|f
3|g|h
footer****
I am supposed to calculate the sum of col1 (1+2+3+3=9), the count of col1 (1,2,3,3=4), and the distinct count of col1 (1,2,3=3).
I would like it if you avoid...
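A single-pass sketch, assuming the header is always line 1 and the trailer is the only line starting with `footer` (both taken from the sample above; the function name `col1_stats` is hypothetical). It keeps one array entry per distinct col1 value, so memory grows with the number of distinct values rather than with the number of records.

```shell
# Hypothetical helper: sum, count, and distinct count of col1 in a
# "|"-separated file with one header line and a "footer..." trailer.
col1_stats() {
  awk -F'|' '
  NR == 1   { next }                 # skip the header line (col1|col2|col3)
  /^footer/ { next }                 # skip the trailer (assumed to start with "footer")
  {
    sum += $1                        # running total of col1
    n++                              # number of data records
    if (!($1 in seen)) {             # first time this col1 value appears
      seen[$1] = 1
      distinct++
    }
  }
  END { printf "sum=%.0f count=%d distinct=%d\n", sum, n, distinct }
  ' "$@"
}
```

On the sample above this prints `sum=9 count=4 distinct=3`; `%.0f` is used for the sum so very large totals are not squeezed through an integer conversion.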
Hi,
I have to search for and count the unique occurrences of the DE numbers in bold below in a file whose content looks like this:
Proc Tran F-BUY
Item Tkey Q5JV
Item Tsid JTIZ9
Item Tdat 20091001
Item Tset 20091001
Item Tbkr 5
Item Tshs 2
Item Tprc 897.0
Item Tcom 2000.0
Item Tcm1 20091001...
Dear all,
I have been trying to print an entire field (column) if the first line of that field matches.
For example, my input looks something like this:
aaa ddd zzz
123 987 126
24 0.650 985
354 9864 0.32
0.333 4324 000
I am looking for a pattern,...
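A sketch assuming whitespace-separated columns and that the (truncated) pattern is supplied as an awk regular expression; `print_matching_columns` is a hypothetical name. Line 1 decides which columns to keep, and every line, including the header, is then printed with only those columns, joined by `OFS`.

```shell
# Hypothetical helper: print every column whose first-line value matches
# the regex given as the first argument; remaining arguments are input files.
print_matching_columns() {
  pat=$1
  shift
  awk -v pat="$pat" '
  NR == 1 {
    for (i = 1; i <= NF; i++)
      if ($i ~ pat) keep[++k] = i      # remember columns whose header matches
  }
  k > 0 {
    line = ""; sep = ""
    for (j = 1; j <= k; j++) {
      line = line sep $(keep[j])       # field text is copied as-is (0.650 stays 0.650)
      sep = OFS
    }
    print line
  }' "$@"
}
```

Usage: `print_matching_columns '^ddd$' input.txt` prints the `ddd` column, header included.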
I have an input.txt file with 3 fields separated by commas:
place, os, and timediff (in seconds)
tampa,win7, 2575
tampa,win7, 157619
tampa,win7, 3352
dallas,vista,604799
greenbay,winxp, 14400
greenbay,win7 , 518400
san jose,winxp, 228121
san jose,winxp, 70853
san jose,winxp, 193514...
Hi, I am trying to carry out a sum on a file (totals.txt).
The file looks like:
So far I have this command:
This returns 20610.
However, I want it to return 000000206100.
Any help would be great.
Thanks!
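The file and command are not shown, so this sketch assumes the amounts are in column 1 and that a plain `s += $1` sum is what produced 20610; `sum_padded` is a hypothetical name. Zero-padding to a fixed width of 12 is done with `printf "%012d"`. Note that this turns 20610 into `000000020610`; the requested `000000206100` is 10 times 20610, so getting that exact value would need an extra scaling step that the post does not explain.

```shell
# Hypothetical helper: sum column 1 and print the total zero-padded to 12 digits.
sum_padded() {
  awk '{ s += $1 } END { printf "%012d\n", s }' "$@"
}
```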
Hello,
I'm a complete newbie to coding; please help with this problem.
I have multiple files in a directory, and I have to loop through the contents of each file and extract the number of unique isoforms in that file. Each file is tab-delimited and only the line with the first parent (column 3)...
I'm looking for an awk script that will take the unique values in column 5, then print and count the unique values in column 6.
CA001011500 11111 11111 -9999 201301 AAA
CA001012040 11111 11111 -9999 201301 AAA
CA001012573 11111 11111 -9999 201301 BBB
CA001012710 11111 11111 -9999 201301...
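The request can be read several ways; this sketch takes one reading: for each distinct value in column 5, list each distinct column 6 value seen with it and how many lines carry that pair. Default whitespace splitting is assumed, and `count_col6_by_col5` is a hypothetical name. The pair is stored as a single key joined with awk's `SUBSEP` and split back apart in `END`.

```shell
# Hypothetical helper: count lines per distinct (column 5, column 6) pair.
count_col6_by_col5() {
  awk '
  { n[$5 SUBSEP $6]++ }                # composite key: column 5 + column 6
  END {
    for (k in n) {
      split(k, part, SUBSEP)           # recover the two column values
      printf "%s\t%s\t%d\n", part[1], part[2], n[k]
    }
  }' "$@"
}
```

Output order is unspecified, so pipe through `sort` for a stable listing.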
When I use the awk below to count the unique lines in $4 for the input, it seems to work. The answer is 3 because $4 is unique only 3 times across all the entries. However, when I use the same command on the actual data I get 56,536, and I know the answer should be 56,548. My question is: is there a better way to...
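Since the awk command isn't shown, one thing worth pinning down is which meaning of "unique" is intended: the number of distinct $4 values, or the number of $4 values that occur exactly once. The two can differ on real data. This hypothetical `unique_stats` sketch (default whitespace splitting assumed) reports both in one pass.

```shell
# Hypothetical helper: report both "distinct $4 values" and
# "$4 values that occur exactly once".
unique_stats() {
  awk '
  { c[$4]++ }                          # occurrences of each $4 value
  END {
    d = 0; o = 0
    for (k in c) {
      d++                              # every key is one distinct value
      if (c[k] == 1) o++               # values seen on exactly one line
    }
    printf "distinct=%d once=%d\n", d, o
  }' "$@"
}
```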
I cannot seem to get what should be a simple awk one-liner to work correctly and cannot figure out why. I would like to use patterns from a specific field in one file as regex to search for matching strings in the entire line ($0) of another file.
I would like to output the lines of File2 which...
Discussion started by: jvoot
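The usual two-file awk idiom is sketched here, assuming the patterns sit in field 1 of File1 (the post does not say which field) and that each one is a valid awk regular expression; `match_by_patterns` is a hypothetical name. While `NR == FNR` (still reading the first file) patterns are stored; for File2, each line is printed once if any stored pattern matches anywhere in `$0`.

```shell
# Hypothetical helper: print lines of DATAFILE ($2) whose whole line matches
# any regex taken from field 1 of PATFILE ($1).
# Caveat: if PATFILE is empty, NR == FNR stays true for DATAFILE and nothing prints.
match_by_patterns() {
  awk '
  NR == FNR { if ($1 != "") pats[$1]; next }     # load patterns from the first file
  {
    for (p in pats)
      if ($0 ~ p) { print; next }                # print each matching line only once
  }' "$1" "$2"
}
```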
Locale::Codes::LangExt(3pm) Perl Programmers Reference Guide Locale::Codes::LangExt(3pm)
NAME
Locale::Codes::LangExt - standard codes for language extension identification
SYNOPSIS
use Locale::Codes::LangExt;
$lext = code2langext('acm'); # $lext gets 'Mesopotamian Arabic'
$code = langext2code('Mesopotamian Arabic'); # $code gets 'acm'
@codes = all_langext_codes();
@names = all_langext_names();
DESCRIPTION
The "Locale::Codes::LangExt" module provides access to standard codes used for identifying language extensions, such as those defined in
the IANA language registry.
Most of the routines take an optional additional argument which specifies the code set to use. If not specified, the default IANA language
registry codes will be used.
SUPPORTED CODE SETS
There are several different code sets you can use for identifying language extensions. A code set may be specified using either a name, or
a constant that is automatically exported by this module.
For example, the following two calls are equivalent:
$lext = code2langext('acm','alpha');
$lext = code2langext('acm',LOCALE_LANGEXT_ALPHA);
The code sets currently supported are:
alpha
This is the set of three-letter (lowercase) codes from the IANA language registry, such as 'acm' for Mesopotamian Arabic.
This is the default code set.
ROUTINES
code2langext ( CODE [,CODESET] )
langext2code ( NAME [,CODESET] )
langext_code2code ( CODE ,CODESET ,CODESET2 )
all_langext_codes ( [CODESET] )
all_langext_names ( [CODESET] )
Locale::Codes::LangExt::rename_langext ( CODE ,NEW_NAME [,CODESET] )
Locale::Codes::LangExt::add_langext ( CODE ,NAME [,CODESET] )
Locale::Codes::LangExt::delete_langext ( CODE [,CODESET] )
Locale::Codes::LangExt::add_langext_alias ( NAME ,NEW_NAME )
Locale::Codes::LangExt::delete_langext_alias ( NAME )
Locale::Codes::LangExt::rename_langext_code ( CODE ,NEW_CODE [,CODESET] )
Locale::Codes::LangExt::add_langext_code_alias ( CODE ,NEW_CODE [,CODESET] )
Locale::Codes::LangExt::delete_langext_code_alias ( CODE [,CODESET] )
These routines are all documented in the Locale::Codes::API man page.
SEE ALSO
Locale::Codes
The Locale-Codes distribution.
Locale::Codes::API
The list of functions supported by this module.
http://www.iana.org/assignments/language-subtag-registry
The IANA language subtag registry.
AUTHOR
See Locale::Codes for full author history.
Currently maintained by Sullivan Beck (sbeck@cpan.org).
COPYRIGHT
Copyright (c) 2011-2013 Sullivan Beck
This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
perl v5.18.2 2013-11-04 Locale::Codes::LangExt(3pm)