Post 302937066 by Gussifinknottle on Tuesday 3rd of March 2015 05:08:22 AM
Count occurrence of string (based on type) in a column using awk

Hello,

I have a single-column table that looks like this:

Code:
AA
BB
CC
XY
PQ
RS
AA
BB
CC
XY
RS

I would like total counts of the entries, grouped by the set each entry belongs to:

If the entry is in
Code:
{AA, BB, CC} --> count it as Type1

or if the entry is in
Code:
{XY, RS, PQ} --> count it as Type2

So, for the table above, the output should be Type1 = 6 and Type2 = 5.

I can run a simple awk command for each type and then put the results together, but that does not seem very efficient. Any (one-liner) suggestions?

Many thanks!
~Guss

Last edited by Gussifinknottle; 03-08-2015 at 06:46 AM.. Reason: Typo
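
One possible way to get both counts in a single pass is sketched below. It is only an illustration, not an answer taken from the thread: the function name count_types is made up for the example, and it assumes the codes sit in the first whitespace-separated field of each line. A lookup array maps each code to its type, so the file is read once and entries outside both sets are ignored.

```shell
# count_types is a hypothetical helper name, not something from the thread.
# Usage: count_types FILE   (or pipe the column in on standard input)
count_types() {
  awk '
    BEGIN {
      # Build a code -> type lookup from the two sets given in the question.
      n = split("AA BB CC", t1, " "); for (i = 1; i <= n; i++) type[t1[i]] = "Type1"
      n = split("XY RS PQ", t2, " "); for (i = 1; i <= n; i++) type[t2[i]] = "Type2"
    }
    # Assumption: the code is the first field; unknown codes are skipped.
    ($1 in type) { count[type[$1]]++ }
    END {
      # %d prints 0 for a type that never occurred.
      printf "Type1 = %d\nType2 = %d\n", count["Type1"], count["Type2"]
    }
  ' "$@"
}

# Sample data from the post; prints "Type1 = 6" and "Type2 = 5".
printf '%s\n' AA BB CC XY PQ RS AA BB CC XY RS | count_types
```

Because the sets live in the lookup array, adding a third type should only need one more split line and one more field in the final printf.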
 
