Count the repetition of a Field in File


 
# 1  
Old 08-16-2009
Count the repetition of a Field in File

Hi,
Thanks for keeping such a helpful platform active and alive.
I am new to this forum and also to Unix.
I want to know how to count the repetitions of a field in a file. Solutions in awk, sed, perl, or shell script are all welcome.

Input File------------------
abc,12345
pqr,51223
mno,72121
stu,34567
aaa,12345
pqp,11224
plm,72121
zxy,88888
fgh,12345
jkl,88888

Output File-----------------
abc,12345,3
pqr,51223,1
mno,72121,2
stu,34567,1
aaa,12345,3
pqp,11224,1
plm,72121,2
zxy,88888,2
fgh,12345,3
jkl,88888,2

As 12345 occurs 3 times in the file as the second field, "3" is appended as the last field on every line where it appears.
Thanks in advance for the solution.

Ace
# 2  
Old 08-16-2009
Here is what I have so far. Of course, you'll surely get other replies that do the same in a simpler way.
Code:
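#!/bin/sh

# Count each second-field value. -s4 skips the first 4 characters, so this
# only works while the first field is exactly 3 characters plus the comma.
sort -t',' -k2,2n file | uniq -c -s4 > tmp

# Append the matching count to every line, keeping the input order.
while IFS= read -r line; do
  echo "$line,$(grep ",${line##*,}\$" tmp | awk '{print $1}')"
done < file

exit 0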

Your data file needs to be named file and be in the same directory as the script.
I use a tmp file to keep the number of occurrences of the second field.
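
The temporary-file idea above can also be sketched without the fixed `-s4` offset, which only holds while the first field is exactly three characters long. What follows is a minimal sketch, not the poster's script: it assumes a POSIX-style shell with `cut`, `sort`, `uniq`, `awk`, and `mktemp` available, two comma-separated fields per line, and no spaces in the second field. The function name `append_counts` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print each line of a two-field, comma-separated file
# followed by the number of times its second field occurs in that file.
append_counts() {
  input=$1
  counts=$(mktemp) || return 1   # assumption: mktemp is available

  # Build a "count value" table from the second field alone, so the width
  # of the first field no longer matters.
  cut -d, -f2 "$input" | sort | uniq -c > "$counts"

  # Walk the input in its original order and look up each count.
  while IFS= read -r line; do
    key=${line##*,}              # last field; equals field 2 for two-field lines
    # Concatenating "" forces a string comparison, so 012 and 12 stay distinct.
    n=$(awk -v k="$key" '($2 "") == k { print $1 }' "$counts")
    printf '%s,%s\n' "$line" "$n"
  done < "$input"

  rm -f "$counts"
}
```

Calling `append_counts file` on the sample data from the first post should give the requested output. It runs one `awk` per input line, so it suits small files only.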
# 3  
Old 08-16-2009
OK, another one:

Code:
awk -F, 'NR==FNR{a[$2]++;next}{print $0 "," a[$2]}' file file

Regards
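
For readers new to the idiom, the one-liner above can be spread out with comments. This is a sketch of the same two-pass approach, assuming an awk that sets `FNR` (for example nawk, gawk, or any POSIX awk). The wrapper name `count_field2` and the array name `seen` are made up for illustration.

```shell
#!/bin/sh
# Hypothetical wrapper around the two-pass awk idiom: the same file is
# passed twice, once to count and once to print.
count_field2() {
  awk -F, '
    # Pass 1: NR (lines read overall) equals FNR (lines read in the current
    # file) only while the first file argument is being read.
    NR == FNR { seen[$2]++; next }

    # Pass 2: print each line followed by the count of its second field.
    { print $0 "," seen[$2] }
  ' "$1" "$1"
}
```

Calling `count_field2 file` is equivalent to running the one-liner on `file file`. The input has to be a regular file because it is read twice, so this form cannot read from a pipe.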
# 4  
Old 08-17-2009
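How about the Perl below:

Code:
use strict;
use warnings;

my (@lines,%cnt);
while(<DATA>){
	chomp;
	next unless length;	# ignore blank lines
	my @tmp=split(",",$_);
	push @lines,$_;	# remember lines in input order
	$cnt{$tmp[1]}++;	# count each value of the second field
}
# print every line with the count of its second field appended
print join(",",$_,$cnt{(split(",",$_))[1]}),"\n" for @lines;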
__DATA__
abc,12345
pqr,51223
mno,72121
stu,34567
aaa,12345
pqp,11224
plm,72121
zxy,88888
fgh,12345
jkl,88888

# 5  
Old 08-17-2009
Quote:
Originally Posted by tukuyomi
Hi Tukuyomi,
Thanks for the solution, but the output deviates from the expected result and some input lines are missing. The output was like this:
1 pqp,11224
3 aaa,12345
1 stu,34567
1 pqr,51223
2 mno,72121
2 jkl,88888

Can you please amend it if possible?

---------- Post updated at 03:40 AM ---------- Previous update was at 03:31 AM ----------

Hi Frank,
This awk script does not produce the expected output. It just prints the input back with a trailing "," field separator and no count. Can you please correct it?
# 6  
Old 08-17-2009
Quote:
Originally Posted by indian.ace

Hi Frank,
This awk script does not produce the expected output. It just prints the input back with a trailing "," field separator and no count. Can you please correct it?
This is what I get:

Code:
$ cat file
abc,12345
pqr,51223
mno,72121
stu,34567
aaa,12345
pqp,11224
plm,72121
zxy,88888
fgh,12345
jkl,88888
$ awk -F, 'NR==FNR{a[$2]++;next}{print $0 "," a[$2]}' file file
abc,12345,3
pqr,51223,1
mno,72121,2
stu,34567,1
aaa,12345,3
pqp,11224,1
plm,72121,2
zxy,88888,2
fgh,12345,3
jkl,88888,2

Am I missing something?
# 7  
Old 08-17-2009
Franklin, this is what I am getting. As you know much more about this, you may be able to tell whether I am doing something wrong. I am running Solaris 10.
root@sunmc01>cat file
abc,12345
pqr,51223
mno,72121
stu,34567
aaa,12345
pqp,11224
plm,72121
zxy,88888
fgh,12345
jkl,88888
root@sunmc01>awk -F, 'NR==FNR{a[$2]++;next}{print $0 "," a[$2]}' file file
abc,12345,
pqr,51223,
mno,72121,
stu,34567,
aaa,12345,
pqp,11224,
plm,72121,
zxy,88888,
fgh,12345,
jkl,88888,
abc,12345,
pqr,51223,
mno,72121,
stu,34567,
aaa,12345,
pqp,11224,
plm,72121,
zxy,88888,
fgh,12345,
jkl,88888,
root@sunmc01>
Thanks for your consistent support.
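
The thread does not record a resolution. One likely explanation, offered as an assumption rather than something the posters confirmed, is that the default `/usr/bin/awk` on Solaris 10 is the old awk, which does not set `FNR`. In that case `NR==FNR` never matches, no counts are collected, and every line of both file arguments is printed with an empty count, which is what the output above shows. The sketch below prefers `/usr/xpg4/bin/awk` or `/usr/bin/nawk` when they exist and uses a single-pass variant that does not need `FNR` at all. The function names `pick_awk` and `count_field2_single_pass` are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: prefer a POSIX-capable awk at the usual Solaris
# locations; otherwise fall back to whatever "awk" is on PATH.
pick_awk() {
  for candidate in /usr/xpg4/bin/awk /usr/bin/nawk; do
    if [ -x "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  printf '%s\n' awk
}

# Single-pass variant: remember every line and its second field while
# counting, then print everything from the END block. No FNR is needed,
# but the whole file is held in memory.
count_field2_single_pass() {
  "$(pick_awk)" -F, '
    { line[NR] = $0; key[NR] = $2; seen[$2]++ }
    END { for (i = 1; i <= NR; i++) print line[i] "," seen[key[i]] }
  ' "$1"
}
```

On Solaris, running the original one-liner with `nawk` or `/usr/xpg4/bin/awk` in place of `awk` is the smaller change. The single-pass form is mainly useful when the input arrives on a pipe and cannot be read twice.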
