Help with awk in counting characters based on a column
Posted by Homa on Monday, 26 November 2012, 08:31:01 AM in Shell Programming and Scripting (post 302735807)

Hello,
I am using awk on Ubuntu 12.04.

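I have a file with 2172 rows and 44707 columns; a sample is shown below. ABO and GPO are the names of my populations: the population is the part of column 1 before the underscore, so ABO_1 and ABO_2 both belong to ABO.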
Code:
ABO_1  1  2
ABO_1  1  2
ABO_2  1  1 
ABO_2  1  2
GPO_1   1  1 
GPO_1  2  2
GPO_2   1  0 
GPO_2  2  0

For each population and each data column, I want to count the 1s and the 2s, ignoring any 0s, and print 0 when there is no 1 (or no 2). The output should look like this:
Code:
4 0 2 2 
1 3 1 1

Here, 4 0 are the counts of 1s and 2s in the second column for the first population (ABO), and 1 3 are the counts of 1s and 2s in the third column for ABO. The second pair on each line (2 2 and 1 1) gives the same counts for GPO, and so on for further populations.

Thank you very much for your help.

Last edited by Homa; 11-26-2012 at 09:40 AM. Reason: Please use code tags for data and code samples
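One possible awk answer, given as a minimal sketch rather than a tested production script. It assumes, as the sample suggests, that the population is the part of column 1 before the first underscore and that populations are printed in the order they first appear; the wrapper name count_alleles is hypothetical. The file is read once: for every population, column and value it increments a counter, anything other than 1 or 2 is skipped, and the END block prints one line per data column with the populations side by side.

```shell
#!/bin/sh
# count_alleles (hypothetical name): reads rows like "ABO_1 1 2 ..." from the
# given files (or stdin) and prints, for each data column, the number of 1s
# and 2s per population. Assumes population = column 1 up to the first "_".
count_alleles() {
    awk '
    NF < 2 { next }                       # skip blank or incomplete lines
    {
        # Population name = part of column 1 before the first underscore.
        split($1, parts, "_")
        pop = parts[1]
        # Remember populations in order of first appearance.
        if (!(pop in seen)) { seen[pop] = 1; order[++npop] = pop }
        for (i = 2; i <= NF; i++) {
            # Count only 1s and 2s; 0s (and anything else) are ignored.
            if ($i == 1)      cnt[pop, i, 1]++
            else if ($i == 2) cnt[pop, i, 2]++
        }
        if (NF > maxnf) maxnf = NF
    }
    END {
        # One output line per data column, populations side by side.
        for (i = 2; i <= maxnf; i++) {
            line = ""
            for (p = 1; p <= npop; p++) {
                pop = order[p]
                if (p > 1) line = line " "
                # Adding 0 prints 0 for a count that was never incremented.
                line = line (cnt[pop, i, 1] + 0) " " (cnt[pop, i, 2] + 0)
            }
            print line
        }
    }' "$@"
}
```

Run it as count_alleles input.txt (or pipe the data in). Only a few counters per population and column are stored and the rows themselves are never kept, so 44707 columns should fit comfortably in memory.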
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

awk count characters, sum, and divide by another column

Hi All, I am another biologist attempting to parse a large txt file containing several million lines like: tucosnp 56762 T Y 228 228 60 23 .CcCcc,,..c.c,cc,,.C... What I need to do is get the frequency of periods (.) plus commas (,) in column 9, and populate this number into another... (1 Reply)
Discussion started by: peromhc

2. Shell Programming and Scripting

Counting rows line by line from a specific column using Awk

Dear UNIX community, I would like to count characters from a specific row and have them displayed line-by-line. I have a file called testAwk2.csv which contains the following data: rabbit penguin goat giraffe emu ostrich I would like to count in the middle row individually... (4 Replies)
Discussion started by: vnayak

3. Shell Programming and Scripting

counting lines containing two column field values with awk

Hello everybody, I'm trying to count the number of consecutive lines in a text file which have two distinctive column field values. These lines may appear in several line blocks within the file, but I only want a single block to be counted. This was my first approach to tackle the problem (I'm... (6 Replies)
Discussion started by: origamisven

4. Shell Programming and Scripting

Sed or awk : pattern selection based on special characters

Hello All, I am here again scratching my head over pattern selection with special characters. I have a large file with around 200 entries and I have to select a single line based on a pattern. I am able to do that: Code: cat mytest.txt | awk -F: '/myregex/ { print $2}' ... (6 Replies)
Discussion started by: usha rao

5. Shell Programming and Scripting

Pick the column value based on another column using awk or CUT

My scenario is that I need to pick a value from the third column based on the fourth column's value: if the fourth column's value is 1, then take the first value of the third column. The third column's values (2|3|4|6|1) are concatenated. Please help me resolve this issue. Source column1 column2 column3 column4... (2 Replies)
Discussion started by: Ganesh L

6. Shell Programming and Scripting

awk to sum a column based on duplicate strings in another column and show split totals

Hi, I have a similar input format: A_1 2 B_0 4 A_1 1 B_2 5 A_4 1, and I am looking to print it in this output format with headers. Can you suggest an awk solution? I want awk because I am already using awk for some pattern matching on the parent file to print column 1 of my input. Thanks! letter number_of_letters... (5 Replies)
Discussion started by: prashob123

7. Shell Programming and Scripting

Precede and Append characters using sed/awk based on a pattern

I have an input file similar to the one shown below. Pattern: data followed by two blank lines, followed by data again, followed by two blank lines, followed by data again, etc. The first three lines after every blank-line combination (2 blank lines between data) should be... (2 Replies)
Discussion started by: bikerboy

8. Shell Programming and Scripting

awk to print column number while ignoring alpha characters

I have the following script that will print column 4 ("25") when column 1 contains "123". However, I need to ignore the alpha characters that are contained in the input file. If I were to ignore the characters my output would be column 3. What is the best way to print my column of interest... (3 Replies)
Discussion started by: ncwxpanther

9. Shell Programming and Scripting

Awk: split column if special characters

Hi, I have data like this: Gene1,Gene2 snp1 Gene3 snp2 Gene4 snp3 I'd like to split the line if it contains a comma and then print the remaining information for the respective gene. My code: awk '{ if($1 ~ /,/){ n = split($0, t, ",") (7 Replies)
Discussion started by: genome

10. UNIX for Beginners Questions & Answers

Awk/sed summation of one column based on some entry in first column

Hi All, I have an input file as shown below. Input file 6 ddk/djhdj/djhdj/Q 10 0.5 dhd/jdjd.djd.nd/QB 01 0.5 hdhd/jd/jd/jdj/Q 10 0.5 512 hd/hdh/gdh/Q 01 0.5 jdjd/jd/ud/j/QB 10 0.5 HD/jsj/djd/Q 01 0.5 71 hdh/jjd/dj/jd/Q 10 0.5 ... (5 Replies)
Discussion started by: kshitij
Bio::PopGen::Population(3pm)				User Contributed Perl Documentation			      Bio::PopGen::Population(3pm)

NAME
Bio::PopGen::Population - A population of individuals

SYNOPSIS
    use Bio::PopGen::Population;
    use Bio::PopGen::Individual;
    my $population = Bio::PopGen::Population->new();
    my $ind = Bio::PopGen::Individual->new(-unique_id => 'id');
    $population->add_Individual($ind);

    for my $ind ( $population->get_Individuals ) {
        # iterate through the individuals
    }

    for my $name ( $population->get_marker_names ) {
        my $marker = $population->get_Marker($name);
    }

    my $num_inds = $population->get_number_individuals;
    my $homozygote_f = $population->get_Frequency_Homozygotes;
    my $heterozygote_f = $population->get_Frequency_Heterozygotes;

    # make a population haploid by making fake chromosomes through
    # haplotypes -- ala allele 1 is on chrom 1 and allele 2 is on chrom 2
    # the number of individuals created will thus be 2 x number in
    # population
    my $happop = $population->haploid_population;

DESCRIPTION
This is a collection of individuals. We will have ways of generating Bio::PopGen::MarkerI objects from it so that allele frequencies can be calculated for the various statistical tests.

FEEDBACK
Mailing Lists
User feedback is an integral part of the evolution of this and other Bioperl modules. Send your comments and suggestions preferably to the Bioperl mailing list. Your participation is much appreciated.

    bioperl-l@bioperl.org                  - General discussion
    http://bioperl.org/wiki/Mailing_lists  - About the mailing lists

Support
Please direct usage questions or support issues to the mailing list:

    bioperl-l@bioperl.org

rather than to the module maintainer directly. Many experienced and responsive experts will be able to look at the problem and quickly address it. Please include a thorough description of the problem with code and data examples if at all possible.

Reporting Bugs
Report bugs to the Bioperl bug tracking system to help us keep track of the bugs and their resolution. Bug reports can be submitted via email or the web:

    https://redmine.open-bio.org/projects/bioperl/

AUTHOR - Jason Stajich
Email jason-at-bioperl.org

CONTRIBUTORS
Matthew Hahn, matthew.hahn-at-duke.edu

APPENDIX
The rest of the documentation details each of the object methods. Internal methods are usually preceded with an underscore (_).

new
    Title   : new
    Usage   : my $obj = Bio::PopGen::Population->new();
    Function: Builds a new Bio::PopGen::Population object
    Returns : an instance of Bio::PopGen::Population
    Args    : -individuals => array ref of individuals (optional)
              -name        => population name (optional)
              -source      => a source tag (optional)
              -description => a short description string of the population (optional)

name
    Title   : name
    Usage   : my $name = $pop->name
    Function: Get the population name
    Returns : string representing population name
    Args    : [optional] string representing population name

description
    Title   : description
    Usage   : my $description = $pop->description
    Function: Get the population description
    Returns : string representing population description
    Args    : [optional] string representing population description

source
    Title   : source
    Usage   : my $source = $pop->source
    Function: Get the population source
    Returns : string representing population source
    Args    : [optional] string representing population source

annotation
    Title   : annotation
    Usage   : my $annotation_collection = $pop->annotation;
    Function: Get/set a Bio::AnnotationCollectionI for this population
    Returns : Bio::AnnotationCollectionI object
    Args    : [optional set] Bio::AnnotationCollectionI object

set_Allele_Frequency
    Title   : set_Allele_Frequency
    Usage   : $population->set_Allele_Frequency('marker' => { 'allele1' => 0.1});
    Function: Sets an allele frequency for a Marker for this Population.
              This allows the Population to not have individual genotypes
              but rather a set of overall allele frequencies
    Returns : Count of the number of markers
    Args    : -name      => (string) marker name
              -allele    => (string) allele name
              -frequency => (double) allele frequency - must be between 0 and 1
              OR
              -frequencies => { 'marker1' => { 'allele1' => 0.01,
                                               'allele2' => 0.99},
                                'marker2' => ...
                              }

add_Individual
    Title   : add_Individual
    Usage   : $population->add_Individual(@individuals);
    Function: Add individuals to a population
    Returns : count of the current number in the object
    Args    : Array of Individuals

remove_Individuals
    Title   : remove_Individuals
    Usage   : $population->remove_Individuals(@ids);
    Function: Remove individual(s) from a population
    Returns : count of the current number in the object
    Args    : Array of ids

get_Individuals
    Title   : get_Individuals
    Usage   : my @inds = $pop->get_Individuals();
    Function: Return the individuals, optionally restricted by a criterion
    Returns : Array of Bio::PopGen::IndividualI objects
    Args    : none to get all the individuals, OR
              -unique_id => To get an individual with a specific id
              -marker    => To only get individuals which have a genotype
                            for a specific marker name

get_Genotypes
    Title   : get_Genotypes
    Usage   : my @genotypes = $pop->get_Genotypes(-marker => $name)
    Function: Get the genotypes for all the individuals for a specific
              marker name
    Returns : Array of Bio::PopGen::GenotypeI objects
    Args    : -marker => name of the marker

get_marker_names
    Title   : get_marker_names
    Usage   : my @names = $pop->get_marker_names;
    Function: Get the names of the markers
    Returns : Array of strings
    Args    : [optional] boolean flag to ignore internal cache status

get_Marker
    Title   : get_Marker
    Usage   : my $marker = $population->get_Marker($name)
    Function: Get a Bio::PopGen::Marker object based on this population
    Returns : Bio::PopGen::MarkerI object
    Args    : name of the marker

get_number_individuals
    Title   : get_number_individuals
    Usage   : my $count = $pop->get_number_individuals;
    Function: Get the count of the number of individuals
    Returns : integer >= 0
    Args    : none

set_number_individuals
    Title   : set_number_individuals
    Usage   : $pop->set_number_individuals($num);
    Function: Fixes the number of individuals; call this with 0 to unset.
              Only use this if you know what you are doing: it is only
              relevant when you are just adding allele frequency data for
              a population and want to calculate something like theta
    Returns : none
    Args    : individual count; calling it with undef or 0 will reset the
              value to return a number calculated from the number of
              individuals stored for this population.

get_Frequency_Homozygotes
    Title   : get_Frequency_Homozygotes
    Usage   : my $freq = $pop->get_Frequency_Homozygotes;
    Function: Calculate the frequency of homozygotes in the population
    Returns : fraction between 0 and 1
    Args    : $markername

get_Frequency_Heterozygotes
    Title   : get_Frequency_Heterozygotes
    Usage   : my $freq = $pop->get_Frequency_Heterozygotes;
    Function: Calculate the frequency of heterozygotes in the population
    Returns : fraction between 0 and 1
    Args    : $markername

haploid_population
    Title   : haploid_population
    Usage   : my $pop = $population->haploid_population;
    Function: Make a new population where all the individuals are haploid -
              effectively an individual out of each chromosome an
              individual has.
    Returns : Bio::PopGen::PopulationI
    Args    : None

perl v5.14.2                    2012-03-02              Bio::PopGen::Population(3pm)
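The methods get_Frequency_Homozygotes and get_Frequency_Heterozygotes are documented only as returning a fraction between 0 and 1. To make the quantity concrete, here is a small awk sketch of the usual definitions for a single marker: the share of individuals whose two alleles are identical, and the share whose alleles differ. The input format (one line per individual: id, allele 1, allele 2) and the helper name genotype_freqs are hypothetical, and this is not the BioPerl implementation; in particular it does not handle missing alleles the way BioPerl may.

```shell
#!/bin/sh
# genotype_freqs (hypothetical name): illustrates homozygote/heterozygote
# frequency for ONE marker. Assumed input: "individual allele1 allele2" per
# line. Not the BioPerl code; missing-data handling is deliberately ignored.
genotype_freqs() {
    awk '
    NF < 3 { next }                  # skip blank or incomplete lines
    {
        n++                          # individuals seen
        if ($2 == $3) hom++          # both alleles identical: homozygote
        else          het++          # alleles differ: heterozygote
    }
    END {
        if (n == 0) { print "genotype_freqs: no individuals" > "/dev/stderr"; exit 1 }
        # Uninitialized counters act as 0, so each fraction is in [0, 1].
        printf "homozygotes %.4f\nheterozygotes %.4f\n", hom / n, het / n
    }' "$@"
}
```

For example, four individuals with genotypes A/A, A/G, G/G and T/T give a homozygote frequency of 0.75 and a heterozygote frequency of 0.25; the two fractions always sum to 1 under this definition.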
All times are GMT -4.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.