Sponsored Content
Top Forums Shell Programming and Scripting Count of unique lines in field 4 Post 302960172 by cmccabe on Wednesday 11th of November 2015 05:42:20 PM
Old 11-11-2015
Count of unique lines in field 4

When I use the below awk to count the unique lines in $4 for the input it seems to work. The answer is 3 because $4 is only unique 3 times in all the entries. However, when I use the same on actual data I get 56,536 and I know the answer should be 56,548. My question is there a better way to count the unique lines? Thank you Smilie.

input
Code:
chr1    955543    955763    chr1:955543-955763    AGRN-6|gc=75    1    20
chr1    955543    955763    chr1:955543-955763    AGRN-6|gc=75    2    20
chr1    955543    955763    chr1:955543-955763    AGRN-6|gc=75    3    22
chr1    955543    955763    chr1:955543-955763    AGRN-6|gc=75    4    22
chr1    955543    955763    chr1:955543-955763    AGRN-6|gc=75    5    22
chr1    957571    957852    chr1:957571-957852    AGRN-7|gc=61.2    1    186
chr1    957571    957852    chr1:957571-957852    AGRN-7|gc=61.2    2    201
chr1    957571    957852    chr1:957571-957852    AGRN-7|gc=61.2    3    201
chr1    957571    957852    chr1:957571-957852    AGRN-7|gc=61.2    271    176
chr1    957571    957852    chr1:957571-957852    AGRN-7|gc=61.2    272    175
chr1    957571    957852    chr1:957571-957852    AGRN-7|gc=61.2    273    175
chr1    957571    957852    chr1:957571-957852    AGRN-7|gc=61.2    274    175
chr1    970621    970740    chr1:970621-970740    AGRN-8|gc=57.1    46    280
chr1    970621    970740    chr1:970621-970740    AGRN-8|gc=57.1    47    280

 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Compare Tab Separated Field with AWK to all and print lines of unique fields.

Hi. I have a tab separated file that has a couple nearly identical lines. When doing: sort file | uniq > file.new It passes through the nearly identical lines because, well, they still are unique. a) I want to look only at field x for uniqueness and if the content in field x is the... (1 Reply)
Discussion started by: rocket_dog
1 Replies

2. Shell Programming and Scripting

Unique Field

I have this input file tilenet_test:clar_r5_performance:server_2:4.80762:0%:APM00083103999-009E,APM00083103999-009F tilenet_int:clar_r5_performance:server_2:4.80762:0%:APM00083103999-00C4... (3 Replies)
Discussion started by: greycells
3 Replies

3. UNIX for Dummies Questions & Answers

Print unique lines without sort or unique

I would like to print unique lines without sort or unique. Unfortunately the server I am working on does not have sort or unique. I have not been able to contact the administrator of the server to ask him to add it for several weeks. (7 Replies)
Discussion started by: cokedude
7 Replies

4. Shell Programming and Scripting

awk to count using each unique value

Im looking for an awk script that will take the unique values in column 5, then print and count the unique values in column 6. CA001011500 11111 11111 -9999 201301 AAA CA001012040 11111 11111 -9999 201301 AAA CA001012573 11111 11111 -9999 201301 BBB CA001012710 11111 11111 -9999 201301... (4 Replies)
Discussion started by: ncwxpanther
4 Replies

5. Shell Programming and Scripting

Count occurrence of column one unique value having unique second column value

Hello Team, I need your help on the following: My input file a.txt is as below: 3330690|373846|108471 3330690|373846|108471 0640829|459725|100001 0640829|459725|100001 3330690|373847|108471 Here row 1 and row 2 of column 1 are identical but corresponding column 2 value are... (4 Replies)
Discussion started by: angshuman
4 Replies

6. Shell Programming and Scripting

awk joining multiple lines based on field count

Hi Folks, I have a file with fields as follows which has last field in multiple lines. I would like to combine a line which has three fields with single field line for as shown in expected output. Please help. INPUT hname01 windows appnamec1eda_p1, ... (5 Replies)
Discussion started by: shunya
5 Replies

7. Shell Programming and Scripting

awk to remove lines where field count is greather than 1 in two fields

I am trying to remove all the lines and spaces where the count in $4 or $5 is greater than 1 (more than 1 letter). The file and the output are tab-delimited. Thank you :). file X 5811530 . G C NLGN4X 17 10544696 . GA G MYH3 9 96439004 . C ... (1 Reply)
Discussion started by: cmccabe
1 Replies

8. UNIX for Beginners Questions & Answers

Count unique column

Hello, I am trying to count unique rows in my file based on 4 columns (2-5) and to output its frequency in a sixth column. My file is tab delimited My input file looks like this: Colum1 Colum2 Colum3 Colum4 Coulmn5 1.1 100 100 a b 1.1 100 100 a c 1.2 200 205 a d 1.3 300 301 a y 1.3 300... (6 Replies)
Discussion started by: nans
6 Replies

9. UNIX for Beginners Questions & Answers

Print lines based upon unique values in Nth field

For some reason I am having difficulty performing what should be a fairly easy task. I would like to print lines of a file that have a unique value in the first field. For example, I have a large data-set with the following excerpt: PS003,001 MZMWR/ L-DWD// * PS003,001... (4 Replies)
Discussion started by: jvoot
4 Replies

10. UNIX for Beginners Questions & Answers

Awk: count unique elements in a field and sum their occurence across the entire file

Hi, Sure it's an easy one, but it drives me insane. input ("|" separated): 1|A,B,C,A 2|A,D,D 3|A,B,B I would like to count the occurence of each capital letters in $2 across the entire file, knowing that duplicates in each record count as 1. I am trying to get this output... (5 Replies)
Discussion started by: beca123456
5 Replies
Bio::Graphics::Glyph::stackedplot(3pm)			User Contributed Perl Documentation		    Bio::Graphics::Glyph::stackedplot(3pm)

NAME
Bio::Graphics::Glyph::stackedplot - The stackedplot glyph SYNOPSIS
See L<Bio::Graphics::Panel> and L<Bio::Graphics::Glyph>. DESCRIPTION
The stackedplot glyph can be used to draw quantitative feature data using a stacked column plot. It differs from the xyplot glyph in that the plot applies to a single top level feature, not a group of subfeatures. The data to be graphed is derived from an attribute called "data_series." The data to be graphed is represented as a list of arrays: ( [1, 2, 8], [6, 1, 1], [10,8, 0], [1, 1, 1], ) Each array is a column in the stacked plot. Its values become the subdivisions of the column. In this example, there are four columns, each of which has three subdivisions. You can add labels to the columns and change the colors of the subdivisions. To assign data to a feature, you can add a "series" tag: $snp1 = Bio::SeqFeature::Generic ->new (-start => 500,-end=>501, -display_name =>'example', -tag=> { series => [ [10,20,30], [30,30,0], [5,45,10], [5,45,10], [5,45,10], [50,0,50], ], } ); Note that the series tag must consist of an array of arrays. If you are using a gff3 representation, you can load a database with data that looks like this: chr1 test feature 1 1000 . . . series=10 20 30;series=30 30 0;series=5 45 10... If you are using a gff2 representation, you can load a database with data that looks like this: chr1 test feature 1 1000 . . . series 10 20 30; series 30 30 0 series 5 45 10... Or you can pass a callback to the -series option: $panel->add_track(@data, -glyph => 'stackedplot', -series => sub { my $feature = shift; return [ [10,20,30], [30,30,0], [5,45,10], ] } ); OPTIONS The following options are standard among all Glyphs. See Bio::Graphics::Glyph for a full explanation. Option Description Default ------ ----------- ------- -fgcolor Foreground color black -outlinecolor Synonym for -fgcolor -bgcolor Background color turquoise -fillcolor Synonym for -bgcolor -linewidth Line width 1 -height Height of glyph 10 -font Glyph font gdSmallFont -label Whether to draw a label 0 (false) -description Whether to draw a description 0 (false) -hilite Highlight color undef (no color) In addition, the alignment glyph recognizes all the options of the xyplot glyph, as well as the following glyph-specific option: Option Description Default ------ ----------- ------- -fixed_gap Vertical distance between 8 the rectangle that shows the start:end range of the feature and the fixed width stacked plot. -series_colors A list giving a series of red,blue,green,orange, color names for the data brown,grey,black series (the values inside each stacked column). -column_labels A list of labels to print -none- underneath each column. -column_width The width of each column. 8 -column_spacing Spacing between each 2 column. -min_score Minimum score for the 0.0 sum of the members of each data series. -max_score Maximum score for the 1.0 sum of the members of each data series. -scale_font Font to use for the scale. gdTinyFont -column_font Font to use for the column gdSmallFont labels. -draw_scale Whether to draw a scale to true right of the columns. Note that -min_score and -max_score represent the minimum and maximum SUM of all the values in the data series. For example, if your largest column contains the series (10,20,30), then the -max_score is 60. EXAMPLE
To understand how this glyph works, try running and modifying the following example: #!/usr/bin/perl use strict; use warnings; use Bio::Graphics; use Bio::SeqFeature::Generic; my $segment = Bio::Graphics::Feature->new(-start=>1,-end=>700); my $snp1 = Bio::SeqFeature::Generic ->new (-start => 500,-end=>590, -display_name =>'fred', -tag=> { series => [ [10,20,30], [30,30,0], [5,45,10], [5,45,10], [5,45,10], [50,0,50], ], }, -source=>'A test', ); my $snp2 = Bio::SeqFeature::Generic->new(-start => 300, -end => 301, -display_name => 'rs12345', -tag=> { series => [ [30,20,10 ], [80,10,10 ], ], }, -source=>'Another test', ); my $panel = Bio::Graphics::Panel->new(-segment=>$segment,-width=>800); $panel->add_track($segment,-glyph=>'arrow',-double=>1,-tick=>2); $panel->add_track([$snp1,$snp2], -height => 50, -glyph => 'stackedplot', -fixed_gap => 12, -series_colors => [qw(red blue lavender)], -column_labels => [qw(a b c d e f g)], -min_score => 0, -max_score => 100, -column_width => 8, -column_font => 'gdMediumBoldFont', -scale_font => 'gdTinyFont', -label => 1, -description=>1, ); print $panel->png; BUGS
Please report them. SEE ALSO
Bio::Graphics::Panel, Bio::Graphics::Track, Bio::Graphics::Glyph::transcript2, Bio::Graphics::Glyph::anchored_arrow, Bio::Graphics::Glyph::arrow, Bio::Graphics::Glyph::box, Bio::Graphics::Glyph::primers, Bio::Graphics::Glyph::segments, Bio::Graphics::Glyph::toomany, Bio::Graphics::Glyph::transcript, AUTHOR
Lincoln Stein <lstein@cshl.org> Copyright (c) 2006 Cold Spring Harbor Laboratory This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself. See DISCLAIMER.txt for disclaimers of warranty. perl v5.14.2 2012-02-20 Bio::Graphics::Glyph::stackedplot(3pm)
All times are GMT -4. The time now is 07:23 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy