11-21-2007
Finding the most common entry in a column
Hi,
I have a file with 3 columns in it that are comma separated and it has about 5000 lines. What I want to do is find the most common value in column 3 using awk or a shell script or whatever works! I'm totally stuck on how to do this.
e.g.
value1,value2,bob
value1,value2,bob
value1,value2,bob
value1,value2,dave
value1,value2,james
Clearly, in the above example the most popular value in column 3 is "bob", but how would I write a script to work this out?
Many thanks
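One possible approach, offered as a minimal sketch rather than the definitive answer: let awk tally every value seen in column 3, then report the value with the highest tally. The function name `most_common_col3` is made up for illustration, and the sketch assumes plain comma-separated fields with no quoting or embedded commas. If two values tie for first place, which one is printed is unspecified, because awk does not define the order of `for (v in count)`.

```shell
#!/bin/sh
# most_common_col3 FILE
# Prints the most frequent value in column 3 of a comma-separated file,
# followed by its count. The function name is hypothetical.
most_common_col3() {
    awk -F, '
        NF >= 3 { count[$3]++ }      # tally each column-3 value
        END {
            for (v in count)         # pick the value with the largest tally
                if (count[v] > max) { max = count[v]; best = v }
            if (max) print best, max # print nothing for an empty file
        }
    ' "$1"
}

# Demo using the sample rows from the question
sample=$(mktemp)
printf '%s\n' \
    'value1,value2,bob' \
    'value1,value2,bob' \
    'value1,value2,bob' \
    'value1,value2,dave' \
    'value1,value2,james' > "$sample"
most_common_col3 "$sample"    # prints: bob 3
rm -f "$sample"
```

A pipeline such as `cut -d, -f3 file | sort | uniq -c | sort -rn | head -n 1` should give the same answer with the count printed first. The awk version reads the file once and needs no sorting, which matters little at 5000 lines.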
slapd.replog
SLAPD.REPLOG(5) File Formats Manual SLAPD.REPLOG(5)
NAME
slapd.replog - slapd replication log format
SYNOPSIS
slapd.replog slapd.replog.lock
DESCRIPTION
The file slapd.replog is produced by the stand-alone LDAP daemon, slapd(8), when changes are made to its local database that are to be
propagated to one or more replica slapds. The file consists of zero or more records, each one corresponding to a change, addition, or
deletion from the slapd database. The file is meant to be read and processed by slurpd(8), the stand-alone LDAP update replication daemon.
The records are separated by a blank line. Each record has the following format.
The record begins with one or more lines indicating the replicas to which the change is to be propagated:
replica: <hostname[:portnumber]>
Next, the time the change took place is given as the number of seconds since 00:00:00 GMT, Jan. 1, 1970, with an optional decimal extension
to make times unique. Note that slapd does not make times unique, but slurpd makes all times unique in its copies of the replog
files.
time: <integer[.integer]>
Next, the distinguished name of the entry being changed is given:
dn: <distinguishedname>
Next, the type of change being made is given:
changetype: <[modify|add|delete|modrdn]>
Finally, the change information itself is given, the format of which depends on what kind of change was specified above. For a changetype
of modify, the format is one or more of the following:
add: <attributetype>
<attributetype>: <value1>
<attributetype>: <value2>
...
-
Or, for a replace modification:
replace: <attributetype>
<attributetype>: <value1>
<attributetype>: <value2>
...
-
Or, for a delete modification:
delete: <attributetype>
<attributetype>: <value1>
<attributetype>: <value2>
...
-
If no attributetype lines are given, the entire attribute is to be deleted.
For a changetype of add, the format is:
<attributetype1>: <value1>
<attributetype1>: <value2>
...
<attributetypeN>: <value1>
<attributetypeN>: <value2>
For a changetype of modrdn, the format is:
newrdn: <newrdn>
deleteoldrdn: 0 | 1
where a value of 1 for deleteoldrdn means to delete the values forming the old rdn from the entry, and a value of 0 means to leave the
values as non-distinguished attributes in the entry.
For a changetype of delete, no additional information is needed in the record.
The format of the values is the LDAP Directory Interchange Format described in ldif(5).
Access to the slapd.replog file is synchronized through the use of flock(3) on the file slapd.replog.lock. Any process reading or writing
this file should obey this locking convention.
EXAMPLE
The following sample slapd.replog file contains one record each for the add, modify, and modrdn change types.
replica: truelies.rs.itd.umich.edu
replica: judgmentday.rs.itd.umich.edu
time: 797612941
dn: cn=Babs Jensen,dc=example,dc=com
changetype: add
objectclass: person
cn: babs
cn: babs jensen
sn: jensen

replica: truelies.rs.itd.umich.edu
replica: judgmentday.rs.itd.umich.edu
time: 797612973
dn: cn=Babs Jensen,dc=example,dc=com
changetype: modify
add: description
description: the fabulous babs

replica: truelies.rs.itd.umich.edu
replica: judgmentday.rs.itd.umich.edu
time: 797613020
dn: cn=Babs Jensen,dc=example,dc=com
changetype: modrdn
newrdn: cn=Barbara J Jensen
deleteoldrdn: 0
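Because records are separated by a blank line, a file in this format can be read with awk's paragraph mode (an empty RS). The following is only a sketch under stated assumptions: the function name `summarize_replog` is hypothetical, values are assumed to be plain single-line `name: value` pairs (no folded LDIF lines and no base64 `dn::` values), and the flock(3) locking convention described above is not implemented. It prints the time, changetype, and dn of each record, separated by tabs.

```shell
#!/bin/sh
# summarize_replog FILE
# Prints "time<TAB>changetype<TAB>dn" for each blank-line-separated record.
# Hypothetical helper: it does not take the slapd.replog.lock lock.
summarize_replog() {
    awk '
        BEGIN { RS = ""; FS = "\n" }   # paragraph mode: one record per block
        {
            ts = dn = ct = ""
            for (i = 1; i <= NF; i++) {
                # keep only the first occurrence of each header line
                if      (ts == "" && $i ~ /^time: /)       ts = substr($i, 7)
                else if (dn == "" && $i ~ /^dn: /)         dn = substr($i, 5)
                else if (ct == "" && $i ~ /^changetype: /) ct = substr($i, 13)
            }
            print ts "\t" ct "\t" dn
        }
    ' "$1"
}

# Demo with two records abbreviated from the example above
log=$(mktemp)
printf '%s\n' \
    'replica: truelies.rs.itd.umich.edu' \
    'time: 797612941' \
    'dn: cn=Babs Jensen,dc=example,dc=com' \
    'changetype: add' \
    'cn: babs' \
    '' \
    'replica: truelies.rs.itd.umich.edu' \
    'time: 797613020' \
    'dn: cn=Babs Jensen,dc=example,dc=com' \
    'changetype: modrdn' \
    'newrdn: cn=Barbara J Jensen' \
    'deleteoldrdn: 0' > "$log"
summarize_replog "$log"
rm -f "$log"
```

A real consumer would also need to handle the LDIF features described in ldif(5) and obey the locking convention before reading the file.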
FILES
slapd.replog
slapd replication log file
slapd.replog.lock
lockfile for slapd.replog
SEE ALSO
ldap(3), ldif(5), slapd(8), slurpd(8)
ACKNOWLEDGEMENTS
OpenLDAP is developed and maintained by The OpenLDAP Project (http://www.openldap.org/). OpenLDAP is derived from University of Michigan
LDAP 3.3 Release.
OpenLDAP 2.1.X RELEASEDATE SLAPD.REPLOG(5)