


Bio::Graphics::Glyph::whiskerplot(3pm)			User Contributed Perl Documentation		    Bio::Graphics::Glyph::whiskerplot(3pm)

NAME

       Bio::Graphics::Glyph::whiskerplot - The whiskerplot glyph

SYNOPSIS

	 See Bio::Graphics::Panel and Bio::Graphics::Glyph.

DESCRIPTION

       This glyph is used for drawing features associated with numeric data using "box and whisker" style data points, which display the mean
       value, extreme ranges and first and third quartiles (or standard deviation). The boxes drawn by this glyph are similar to
       <http://www.abs.gov.au/websitedbs/D3310116.NSF/0/3c35ac1e828c23ef4a2567ac0020ec8a?OpenDocument>, except that they are oriented vertically
       so that the position and height of the box indicate the mean value and spread of the data, and the width indicates the genomic extent of
       the value.

       Like the xyplot glyph (from which it inherits), the whiskerplot glyph is designed to work on a single feature group that contains
       subfeatures. It is the subfeatures that carry the score information. The best way to arrange for this is to create an aggregator for the
       feature. We'll take as an example a histogram of repeat density in which intervals are spaced every megabase and the score indicates the
       number of repeats in the interval; we'll assume that the database has been loaded in such a way that each interval is a distinct feature
       with the method name "density" and the source name "repeat". Furthermore, all the repeat features are grouped together into a single
       group (the name of the group is irrelevant). If you are using Bio::DB::GFF and Bio::Graphics directly, the sequence of events would look
       like this:

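	 use strict;
	 use Bio::DB::GFF;
	 use Bio::DB::GFF::Aggregator;
	 use Bio::Graphics;

	 # group the individual "density:repeat" interval features into one "repeat_density" feature
	 my $agg = Bio::DB::GFF::Aggregator->new(-method    => 'repeat_density',
	                                         -sub_parts => 'density:repeat');
	 my $db  = Bio::DB::GFF->new(-dsn         => 'my_database',
	                             -aggregators => $agg);
	 my $segment  = $db->segment('Chr1');
	 my @features = $segment->features('repeat_density');

	 # pad the panel so the scale is not cut off at the image edge
	 my $panel = Bio::Graphics::Panel->new(-segment   => $segment,
	                                       -pad_left  => 40,
	                                       -pad_right => 40);
	 $panel->add_track(\@features,
	                   -glyph => 'whiskerplot',
	                   -scale => 'both',
	                  );
	 print $panel->png;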

       If you are using the Generic Genome Browser, add this to the configuration file:

	 aggregators = repeat_density{density:repeat}
		       clone alignment etc

       Note that it is a good idea to add some padding to the left and right of the panel; otherwise the scale will be partially cut off by the
       edge of the image.
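       The repeat-density setup above assumes each megabase interval already exists as a "density:repeat" feature whose score is its repeat
       count. A minimal pure-Perl sketch of how such counts might be computed from repeat start positions, assuming 1-based coordinates and
       fixed-width bins (1_000_000 bp for the megabase case); bin_counts is a hypothetical helper, not part of Bio::Graphics:

```perl
use strict;
use warnings;

# Hypothetical helper: count positions falling into fixed-width bins along a sequence.
# Returns a list of [start, end, count] triples covering 1 .. $seq_length.
sub bin_counts {
    my ($bin_size, $seq_length, @positions) = @_;
    my $nbins  = int(($seq_length - 1) / $bin_size) + 1;
    my @counts = (0) x $nbins;
    for my $pos (@positions) {
        next if $pos < 1 || $pos > $seq_length;    # skip positions off the sequence
        $counts[ int(($pos - 1) / $bin_size) ]++;
    }
    my @bins;
    for my $i (0 .. $nbins - 1) {
        my $start = $i * $bin_size + 1;
        my $end   = $start + $bin_size - 1;
        $end = $seq_length if $end > $seq_length;  # the last bin may be short
        push @bins, [$start, $end, $counts[$i]];
    }
    return @bins;
}
```

       Each triple could then become a subfeature, for example Bio::SeqFeature::Generic->new(-start=>$start,-end=>$end,-score=>$count) added to
       a parent with add_SeqFeature, as Example 1 does.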

       The mean (or median) of the data will be taken from the feature score. The range and quartile data must either be provided in a feature tag
       named "range", or must be generated dynamically by a -range callback option passed to add_track. The data returned by the tag or option
       should be an array reference containing the following five fields:

	[$median,$range_low,$range_high,$quartile_low,$quartile_high]

       where $range_low and $range_high correspond to the low and high value of the "whiskers" and $quartile_low and $quartile_high correspond to
       the low and high value of the "box."
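       The glyph only consumes these five numbers; this page does not say how they should be computed. A minimal sketch that derives them from
       a list of raw sample values, assuming the whiskers are the minimum and maximum and the box edges are the first and third quartiles by
       linear interpolation (one of several common quartile definitions). The names quantile and range_from_values are hypothetical:

```perl
use strict;
use warnings;

# Linear-interpolation quantile of an already sorted array reference, 0 <= $p <= 1.
sub quantile {
    my ($sorted, $p) = @_;
    my $h  = (@$sorted - 1) * $p;
    my $lo = int($h);
    my $hi = $lo + 1 < @$sorted ? $lo + 1 : $lo;
    return $sorted->[$lo] + ($h - $lo) * ($sorted->[$hi] - $sorted->[$lo]);
}

# Build [$median, $range_low, $range_high, $quartile_low, $quartile_high]
# from raw values, in the field order the "range" tag expects.
sub range_from_values {
    my @sorted = sort { $a <=> $b } @_;
    die "range_from_values needs at least one value\n" unless @sorted;
    return [
        quantile(\@sorted, 0.5),     # median
        $sorted[0],                  # low whisker: minimum
        $sorted[-1],                 # high whisker: maximum
        quantile(\@sorted, 0.25),    # bottom of the box
        quantile(\@sorted, 0.75),    # top of the box
    ];
}
```

       The returned array reference can be stored in the "range" tag of each subfeature, as Example 1 does with -tag => { range=>$range }.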

       If $median is undef or missing, the score field of the feature is used instead. It may be useful to repeat the median in the score field
       in any case, so that the graph can carry out its own minimum and maximum range calculations.
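       When preparing data outside the glyph, the same fallback can be applied up front. A minimal sketch, assuming only the rule stated above
       (an undef or absent median is replaced by the score); effective_range is a hypothetical name and this is not the glyph's own code:

```perl
use strict;
use warnings;

# Return a copy of a range array whose median slot (index 0) is
# filled from the feature score when it is undef or the array is empty.
sub effective_range {
    my ($range, $score) = @_;
    my @fields = @{ $range || [] };
    $fields[0] = $score unless defined $fields[0];
    return \@fields;
}
```

       Passing the same value as both the median and -score, as Example 1 does with -score=>$y and make_range($y), keeps the two consistent.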

       See EXAMPLES for three ways of generating an image.

   OPTIONS
       The following options are standard among all Glyphs.  See Bio::Graphics::Glyph for a full explanation.

	 Option         Description                    Default
	 ------         -----------                    -------

	 -fgcolor       Foreground color               black

	 -outlinecolor  Synonym for -fgcolor

	 -bgcolor       Background color               turquoise

	 -fillcolor     Synonym for -bgcolor

	 -linewidth     Line width                     1

	 -height        Height of glyph                10

	 -font          Glyph font                     gdSmallFont

	 -label         Whether to draw a label        0 (false)

	 -description   Whether to draw a description  0 (false)

	 -hilite        Highlight color                undef (no color)

       In addition, the whiskerplot glyph recognizes all the options of the xyplot glyph, as well as the following glyph-specific option:

	 Option         Description                    Default
	 ------         -----------                    -------

	 -range         Callback to return median,     none (data comes from the feature's "range" tag)
	                range and quartiles for each
	                subfeature
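       Instead of the random values used in Example 2, a -range callback could look up precomputed statistics. Example 2 shows the callback
       receiving the feature as its first argument; this sketch additionally assumes the subfeature's start coordinate is a usable lookup key.
       The table contents reuse two rows of the Example 3 data, and make_range_lookup and %stats_by_start are hypothetical names:

```perl
use strict;
use warnings;

# Hypothetical precomputed statistics keyed by subfeature start:
# start => [$median, $range_low, $range_high, $quartile_low, $quartile_high]
my %stats_by_start = (
    0  => [2500, 2209, 2769, 2368, 2619],
    20 => [2304, 2051, 2553, 2163, 2435],
);

# Return a closure usable as the value of -range.
sub make_range_lookup {
    my ($stats) = @_;
    return sub {
        my $feature = shift;                  # the feature is the first argument
        my $range = $stats->{ $feature->start };
        return $range if $range;
        # no stored statistics: draw a degenerate box at the score
        my $s = $feature->score;
        return [$s, $s, $s, $s, $s];
    };
}
```

       In a track this would be passed as -range => make_range_lookup(\%stats_by_start) in the add_track call.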

EXAMPLES

       Here are three examples of how to use this glyph.

   Example 1: Incorporating the numeric data in each subfeature
	#!/usr/bin/perl
	use strict;

	use Bio::Graphics;
	use Bio::SeqFeature::Generic;

	my $bsg = 'Bio::SeqFeature::Generic';

	my $feature = $bsg->new(-start=>0,-end=>1000);

	for (my $i=0;$i<1000;$i+=20) {
	  my $y = (($i-500)/10)**2;
	  my $range = make_range($y);
	  my $part = $bsg->new(-start=>$i,-end=>$i+16,
			      -score=>$y,-tag => { range=>$range });
	  $feature->add_SeqFeature($part);
	}

	my $panel = Bio::Graphics::Panel->new(-length=>1000,-width=>800,-key_style=>'between',
					     -pad_left=>40,-pad_right=>40);
	$panel->add_track($feature,
			 -glyph=>'arrow',
			 -double=>1,
			 -tick=>2);

	$panel->add_track($feature,
			 -glyph=>'whiskerplot',
			 -scale=>'both',
			 -height=>200,
			 -min_score => -500,
			 -key  =>'Whiskers',
			 -bgcolor => 'orange',
			);
	print $panel->png;

	sub make_range {
	  my $score	   = shift;
	  my $range_top    = $score + 5*sqrt($score) + rand(50);
	  my $range_bottom = $score - 5*sqrt($score) - rand(50);
	  my $quartile_top    = $score + 2*sqrt($score) + rand(50);
	  my $quartile_bottom = $score - 2*sqrt($score) - rand(50);
	  return [$score,$range_bottom,$range_top,$quartile_bottom,$quartile_top];
	}
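       Because make_range adds independent rand(50) jitter to each bound, a quartile can land outside its whisker when the score is small (the
       540-556 line of Example 3, range -5,38,-27,52, is one such case). A minimal sketch of an ordering check plus a variant that cannot
       produce that inversion; both function names are hypothetical:

```perl
use strict;
use warnings;

# True when range_low <= quartile_low <= median <= quartile_high <= range_high.
sub range_is_ordered {
    my ($median, $range_low, $range_high, $quartile_low, $quartile_high) = @{ $_[0] };
    return $range_low     <= $quartile_low
        && $quartile_low  <= $median
        && $median        <= $quartile_high
        && $quartile_high <= $range_high;
}

# Like make_range, but the whiskers are pushed outward from the box edges,
# so the ordering always holds. Assumes $score >= 0, as in the examples.
sub make_ordered_range {
    my $score  = shift;
    my $spread = sqrt($score);
    my $quartile_top    = $score + 2*$spread + rand(50);
    my $quartile_bottom = $score - 2*$spread - rand(50);
    my $range_top       = $quartile_top    + 3*$spread + rand(50);
    my $range_bottom    = $quartile_bottom - 3*$spread - rand(50);
    return [$score, $range_bottom, $range_top, $quartile_bottom, $quartile_top];
}
```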

   Example 2: Generating the range data with a callback
	#!/usr/bin/perl
	use strict;

	use Bio::Graphics;
	use Bio::SeqFeature::Generic;

	my $bsg = 'Bio::SeqFeature::Generic';
	my $feature = $bsg->new(-start=>0,-end=>1000);

	for (my $i=0;$i<1000;$i+=20) {
	  my $y = (($i-500)/10)**2;
	  my $part = $bsg->new(-start=>$i,-end=>$i+16,-score=>$y);
	  $feature->add_SeqFeature($part);
	}

	my $panel = Bio::Graphics::Panel->new(-length=>1000,-width=>800,-key_style=>'between',
					     -pad_left=>40,-pad_right=>40);
	$panel->add_track($feature,
			 -glyph=>'arrow',
			 -double=>1,
			 -tick=>2);

	$panel->add_track($feature,
			 -glyph=>'whiskerplot',
			 -scale=>'both',
			 -height=>200,
			 -min_score => -500,
			 -key  =>'Whiskers',
			 -bgcolor => 'orange',
			 -range => \&make_range,    # a code reference; &make_range would call the sub immediately
			);
	print $panel->png;

	sub make_range {
	  my $feature = shift;
	  my $score	   = $feature->score;
	  my $range_top    = $score + 5*sqrt($score) + rand(50);
	  my $range_bottom = $score - 5*sqrt($score) - rand(50);
	  my $quartile_top    = $score + 2*sqrt($score) + rand(50);
	  my $quartile_bottom = $score - 2*sqrt($score) - rand(50);
	  return [$score,$range_bottom,$range_top,$quartile_bottom,$quartile_top];
	}

   Example 3: Generating the image from a FeatureFile
       The file, saved as test.gff:
	    [general]
	    pixels = 840
	    pad_left = 40
	    pad_right = 40

	    [contig]
	    glyph     = arrow
	    double    = 1
	    tick      = 2

	    [data]
	    glyph     = whiskerplot
	    scale     = both
	    height    = 200
	    min_score = -500
	    max_score = 2800
	    key       = Whiskers
	    bgcolor   = orange

	    chr1   .	   contig  1	   1000    .	   .	   .	   Contig chr1
	    chr1   .	   data    0	   16	   2500    .	   .	   Dataset data1; range 2209,2769,2368,2619
	    chr1   .	   data    20	   36	   2304    .	   .	   Dataset data1; range 2051,2553,2163,2435
	    chr1   .	   data    40	   56	   2116    .	   .	   Dataset data1; range 1861,2384,1983,2253
	    chr1   .	   data    60	   76	   1936    .	   .	   Dataset data1; range 1706,2181,1819,2059
	    chr1   .	   data    80	   96	   1764    .	   .	   Dataset data1; range 1516,1995,1646,1849
	    chr1   .	   data    100	   116	   1600    .	   .	   Dataset data1; range 1359,1834,1513,1699
	    chr1   .	   data    120	   136	   1444    .	   .	   Dataset data1; range 1228,1654,1330,1565
	    chr1   .	   data    140	   156	   1296    .	   .	   Dataset data1; range 1105,1520,1198,1385
	    chr1   .	   data    160	   176	   1156    .	   .	   Dataset data1; range 983,1373,1062,1270
	    chr1   .	   data    180	   196	   1024    .	   .	   Dataset data1; range 853,1184,914,1116
	    chr1   .	   data    200	   216	   900	   .	   .	   Dataset data1; range 722,1093,801,965
	    chr1   .	   data    220	   236	   784	   .	   .	   Dataset data1; range 621,945,724,859
	    chr1   .	   data    240	   256	   676	   .	   .	   Dataset data1; range 532,833,605,742
	    chr1   .	   data    260	   276	   576	   .	   .	   Dataset data1; range 433,714,485,653
	    chr1   .	   data    280	   296	   484	   .	   .	   Dataset data1; range 331,600,418,545
	    chr1   .	   data    300	   316	   400	   .	   .	   Dataset data1; range 275,535,336,459
	    chr1   .	   data    320	   336	   324	   .	   .	   Dataset data1; range 198,434,270,374
	    chr1   .	   data    340	   356	   256	   .	   .	   Dataset data1; range 167,378,219,322
	    chr1   .	   data    360	   376	   196	   .	   .	   Dataset data1; range 114,303,118,249
	    chr1   .	   data    380	   396	   144	   .	   .	   Dataset data1; range 39,248,87,197
	    chr1   .	   data    400	   416	   100	   .	   .	   Dataset data1; range 17,173,68,141
	    chr1   .	   data    420	   436	   64	   .	   .	   Dataset data1; range -14,125,18,84
	    chr1   .	   data    440	   456	   36	   .	   .	   Dataset data1; range -8,74,11,64
	    chr1   .	   data    460	   476	   16	   .	   .	   Dataset data1; range -46,77,0,43
	    chr1   .	   data    480	   496	   4	   .	   .	   Dataset data1; range -40,43,-7,36
	    chr1   .	   data    500	   516	   0	   .	   .	   Dataset data1; range -43,0,-43,22
	    chr1   .	   data    520	   536	   4	   .	   .	   Dataset data1; range -6,52,-4,54
	    chr1   .	   data    540	   556	   16	   .	   .	   Dataset data1; range -5,38,-27,52
	    chr1   .	   data    560	   576	   36	   .	   .	   Dataset data1; range -43,109,18,66
	    chr1   .	   data    580	   596	   64	   .	   .	   Dataset data1; range -1,134,3,112
	    chr1   .	   data    600	   616	   100	   .	   .	   Dataset data1; range 49,186,69,124
	    chr1   .	   data    620	   636	   144	   .	   .	   Dataset data1; range 79,225,71,169
	    chr1   .	   data    640	   656	   196	   .	   .	   Dataset data1; range 124,289,120,266
	    chr1   .	   data    660	   676	   256	   .	   .	   Dataset data1; range 154,378,197,320
	    chr1   .	   data    680	   696	   324	   .	   .	   Dataset data1; range 220,439,249,396
	    chr1   .	   data    700	   716	   400	   .	   .	   Dataset data1; range 291,511,331,458
	    chr1   .	   data    720	   736	   484	   .	   .	   Dataset data1; range 350,627,400,572
	    chr1   .	   data    740	   756	   576	   .	   .	   Dataset data1; range 446,718,502,633
	    chr1   .	   data    760	   776	   676	   .	   .	   Dataset data1; range 515,833,576,777
	    chr1   .	   data    780	   796	   784	   .	   .	   Dataset data1; range 606,959,724,856
	    chr1   .	   data    800	   816	   900	   .	   .	   Dataset data1; range 747,1058,799,1004
	    chr1   .	   data    820	   836	   1024    .	   .	   Dataset data1; range 817,1231,958,1089
	    chr1   .	   data    840	   856	   1156    .	   .	   Dataset data1; range 961,1341,1069,1225
	    chr1   .	   data    860	   876	   1296    .	   .	   Dataset data1; range 1103,1511,1219,1385
	    chr1   .	   data    880	   896	   1444    .	   .	   Dataset data1; range 1218,1660,1338,1535
	    chr1   .	   data    900	   916	   1600    .	   .	   Dataset data1; range 1377,1828,1496,1703
	    chr1   .	   data    920	   936	   1764    .	   .	   Dataset data1; range 1547,2020,1674,1858
	    chr1   .	   data    940	   956	   1936    .	   .	   Dataset data1; range 1691,2188,1824,2043
	    chr1   .	   data    960	   976	   2116    .	   .	   Dataset data1; range 1869,2376,2019,2225
	    chr1   .	   data    980	   996	   2304    .	   .	   Dataset data1; range 2040,2554,2178,2418

       The script to render it:
	    #!/usr/bin/perl

	    use strict;
	    use Bio::Graphics::FeatureFile;

	    my $data = Bio::Graphics::FeatureFile->new(-file=>'test.gff');

	    my(undef,$panel) = $data->render;
	    print $panel->png;
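       In the file above, each data line carries its median in the GFF score column and four more numbers in a "range" attribute. Judging from
       the values (each score lies between the first and second numbers and between the third and fourth), the attribute appears to list
       range_low, range_high, quartile_low, quartile_high in that order; that reading is an assumption. A minimal sketch that turns one such
       line into the five-field array, using the hypothetical name parse_whisker_line:

```perl
use strict;
use warnings;

# Parse one whitespace-separated GFF line and return
# [$median, $range_low, $range_high, $quartile_low, $quartile_high],
# taking the median from the score column (column 6).
# Returns undef for lines without a four-number "range" attribute.
sub parse_whisker_line {
    my ($line) = @_;
    my @cols = split ' ', $line, 9;         # 8 columns, then the group/attribute text
    return unless @cols == 9;
    my ($score, $group) = @cols[5, 8];
    my ($list) = $group =~ /\brange\s+(-?[\d.]+(?:,-?[\d.]+){3})/ or return;
    my @bounds = split /,/, $list;          # assumed order: low, high, quartile low, quartile high
    return [$score, @bounds];
}
```

       Bio::Graphics::FeatureFile parses the file itself when the script above loads it; a helper like this is only for inspecting or reusing
       the numbers elsewhere.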

BUGS

       Please report them.

SEE ALSO

       Bio::Graphics::Panel, Bio::Graphics::Track, Bio::Graphics::Glyph::transcript2, Bio::Graphics::Glyph::anchored_arrow,
       Bio::Graphics::Glyph::arrow, Bio::Graphics::Glyph::box, Bio::Graphics::Glyph::primers, Bio::Graphics::Glyph::segments,
       Bio::Graphics::Glyph::toomany, Bio::Graphics::Glyph::transcript

AUTHOR

       Lincoln Stein <lstein@cshl.org>

       Copyright (c) 2001 Cold Spring Harbor Laboratory

       This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.  See DISCLAIMER.txt for
       disclaimers of warranty.

perl v5.14.2							    2012-02-20				    Bio::Graphics::Glyph::whiskerplot(3pm)
