Sponsored Content
Top Forums Shell Programming and Scripting How to remove a subset of data from a large dataset based on values on one line Post 302575977 by davegen on Wednesday 23rd of November 2011 10:28:49 AM
Old 11-23-2011
How to remove a subset of data from a large dataset based on values on one line

Hello. I was wondering if anyone could help. I have a file containing a large table in the format:

marker1 marker2 marker3 marker4
position1 position2 position3 position4
genotype1 genotype2 genotype3 genotype4

with marker being a name, position a numeric measure of distance and genotype also a number.

I need to remove columns based on the values in the "position" line i.e. I need a script to take each position and remove adjacent columns that are within a certain distance of that marker, which is indicated by the value the position line which is a measure of distance. So if the file looked like this

rs1 rs2 rs3 rs4 rs5
1 2 3 4 5
2 3 1 1 2

and I was dealing with rs3 and the distance I wanted to remove was 1, I would want the output:

rs1 rs3 rs5
1 3 5
2 1 2

Does anyone know any way can do this? I appreciate any help and I hope I haven't been too confusing!
 

10 More Discussions You Might Find Interesting

1. Programming

I have C++ exe file( no source code) and need to run many large dataset under unix, b

I have C++ exe file( no source code) and need to run many large dataset under unix, but how to know the memeroy usage for one dataset?http://www.codeproject.com/script/Forums/Images/New.gif I think "top" is not good and if using the profiler, it seems no free download, any ideas? (1 Reply)
Discussion started by: Danielwang1986
1 Replies

2. Shell Programming and Scripting

remove a specific line in a LARGE file

Hi guys, i have a really big file, and i want to remove a specific line. sed -i '5d' fileThis doesn't really work, it takes a lot of time... The whole script is supposed to remove every word containing less than 5 characters and currently looks like this: #!/bin/bash line="1"... (2 Replies)
Discussion started by: blubbiblubbkekz
2 Replies

3. Shell Programming and Scripting

Remove duplicate line detail based on column one data

My input file: AVI.out <detail>named as the RRM .</detail> AVI.out <detail>Contains 1 RRM .</detail> AR0.out <detail>named as the tellurite-resistance.</detail> AWG.out <detail>Contains 2 HTH .</detail> ADV.out <detail>named as the DENR family.</detail> ADV.out ... (10 Replies)
Discussion started by: patrick87
10 Replies

4. Shell Programming and Scripting

How to extract a subset from a huge dataset

Hi, All I have a huge file which has 450G. Its tab-delimited format is as below x1 A 50020 1 x1 B 50021 8 x1 C 50022 9 x1 A 50023 10 x2 D 50024 5 x2 C 50025 7 x2 F 50026 8 x2 N 50027 1 : : Now, I want to extract a subset from this file. In this subset, column 1 is x10, column 2 is... (3 Replies)
Discussion started by: cliffyiu
3 Replies

5. Shell Programming and Scripting

Find line number of bad data in large file

Hi Forum. I was trying to search the following scenario on the forum but was not able to. Let's say that I have a very large file that has some bad data in it (for ex: 0.0015 in the 12th column) and I would like to find the line number and remove that particular line. What's the easiest... (3 Replies)
Discussion started by: pchang
3 Replies

6. UNIX for Advanced & Expert Users

How to extract subset file from dataset?

Hello I have a data set which looks like this : progeny sire dam gender 12 1 3 M 13 2 4 F 14 2 5 F 15 6 5 ... (13 Replies)
Discussion started by: sajmar
13 Replies

7. Shell Programming and Scripting

How to read file line by line and compare subset of 1st line with 2nd?

Hi all, I have a log file say Test.log that gets updated continuously and it has data in pipe separated format. A sample log file would look like: <date1>|<data1>|<url1>|<result1> <date2>|<data2>|<url2>|<result2> <date3>|<data3>|<url3>|<result3> <date4>|<data4>|<url4>|<result4> What I... (3 Replies)
Discussion started by: pat_pramod
3 Replies

8. Shell Programming and Scripting

Selecting random columns from large dataset in UNIX

Dear folks I have a large data set which contains 400K columns. I decide to select 50K determined columns from the whole 400K columns. Is there any command in unix which could do this process for me? I need to also mention that I store all of the columns id in one file which may help to select... (5 Replies)
Discussion started by: sajmar
5 Replies

9. Shell Programming and Scripting

Reoccuring peak values in large data file and print the line..

Hi i have some large data files that contain several fields and rows the data in a field have a numeric value that is in a sine wave pattern what i would like todo is locate each peak and pick the highest value and print that complete line. the data looks something like this it is field nr4 which... (4 Replies)
Discussion started by: ninjaunx
4 Replies

10. Shell Programming and Scripting

Parsing a subset of data from a large matrix

I do have a large matrix of the following format and it is tab delimited ch-ab1-20 ch-bb2-23 ch-ab1-34 ch-ab1-24 er-cc1-45 bv-cc1-78 ch-ab1-20 0 2 3 4 5 6 ch-bb2-23 3 0 5 ... (6 Replies)
Discussion started by: Kanja
6 Replies
Bio::Map::Mappable(3pm) 				User Contributed Perl Documentation				   Bio::Map::Mappable(3pm)

NAME
Bio::Map::Mappable - An object representing a generic map element that can have multiple locations in several maps. SYNOPSIS
# a map element in two different positions on the same map $map1 = Bio::Map::SimpleMap->new(); $position1 = Bio::Map::Position->new(-map => $map1, -value => 100); $position2 = Bio::Map::Position->new(-map => $map1, -value => 200); $mappable = Bio::Map::Mappable->new(-positions => [$position1, $position2] ); # add another position on a different map $map2 = Bio::Map::SimpleMap->new(); $position3 = Bio::Map::Position->new(-map => $map2, $value => 50); $mappable->add_position($position3); # get all the places our map element is found, on a particular map of interest foreach $pos ($mappable->get_positions($map1)) { print $pos->value, " "; } DESCRIPTION
This object handles the notion of a generic map element. Mappables are entities with one or more positions on one or more maps. This object is a pure perl implementation of Bio::Map::MappableI. That interface implements some of its own methods so check the docs there for those. FEEDBACK
Mailing Lists User feedback is an integral part of the evolution of this and other Bioperl modules. Send your comments and suggestions preferably to the Bioperl mailing list. Your participation is much appreciated. bioperl-l@bioperl.org - General discussion http://bioperl.org/wiki/Mailing_lists - About the mailing lists Support Please direct usage questions or support issues to the mailing list: bioperl-l@bioperl.org rather than to the module maintainer directly. Many experienced and reponsive experts will be able look at the problem and quickly address it. Please include a thorough description of the problem with code and data examples if at all possible. Reporting Bugs Report bugs to the Bioperl bug tracking system to help us keep track of the bugs and their resolution. Bug reports can be submitted via the web: https://redmine.open-bio.org/projects/bioperl/ AUTHOR - Sendu Bala Email bix@sendu.me.uk APPENDIX
The rest of the documentation details each of the object methods. Internal methods are usually preceded with a _ new Title : new Usage : my $mappable = Bio::Map::Mappable->new(); Function: Builds a new Bio::Map::Mappable object Returns : Bio::Map::Mappable Args : -name => string : name of the mappable element -id => string : id of the mappable element name Title : name Usage : $mappable->name($new_name); my $name = $mappable->name(); Function: Get/Set the name for this Mappable Returns : A scalar representing the current name of this Mappable Args : none to get string to set id Title : id Usage : my $id = $mappable->id(); $mappable->id($new_id); Function: Get/Set the id for this Mappable. Returns : A scalar representing the current id of this Mappable Args : none to get string to set in_map Title : in_map Usage : if ($mappable->in_map($map)) {...} Function: Tests if this mappable is found on a specific map Returns : boolean Args : L<Bio::Map::MapI> Comparison methods equals Title : equals Usage : if ($mappable->equals($other_mappable)) {...} my @equal_positions = $mappable->equals($other_mappable); Function: Finds the positions in this mappable that are equal to any comparison positions. Returns : array of L<Bio::Map::PositionI> objects Args : arg #1 = L<Bio::Map::MappableI> OR L<Bio::Map::PositionI> to compare this one to (mandatory) arg #2 = optionally, one or more of the key => value pairs below -map => MapI : a Bio::Map::MapI to only consider positions on the given map -relative => RelativeI : a Bio::Map::RelativeI to calculate in terms of each Position's relative position to the thing described by that Relative less_than Title : less_than Usage : if ($mappable->less_than($other_mappable)) {...} my @lesser_positions = $mappable->less_than($other_mappable); Function: Finds the positions in this mappable that are less than all comparison positions. Returns : array of L<Bio::Map::PositionI> objects Args : arg #1 = L<Bio::Map::MappableI> OR L<Bio::Map::PositionI> to compare this one to (mandatory) arg #2 = optionally, one or more of the key => value pairs below -map => MapI : a Bio::Map::MapI to only consider positions on the given map -relative => RelativeI : a Bio::Map::RelativeI to calculate in terms of each Position's relative position to the thing described by that Relative greater_than Title : greater_than Usage : if ($mappable->greater_than($other_mappable)) {...} my @greater_positions = $mappable->greater_than($other_mappable); Function: Finds the positions in this mappable that are greater than all comparison positions. Returns : array of L<Bio::Map::PositionI> objects Args : arg #1 = L<Bio::Map::MappableI> OR L<Bio::Map::PositionI> to compare this one to (mandatory) arg #2 = optionally, one or more of the key => value pairs below -map => MapI : a Bio::Map::MapI to only consider positions on the given map -relative => RelativeI : a Bio::Map::RelativeI to calculate in terms of each Position's relative position to the thing described by that Relative overlaps Title : overlaps Usage : if ($mappable->overlaps($other_mappable)) {...} my @overlapping_positions = $mappable->overlaps($other_mappable); Function: Finds the positions in this mappable that overlap with any comparison positions. Returns : array of L<Bio::Map::PositionI> objects Args : arg #1 = L<Bio::Map::MappableI> OR L<Bio::Map::PositionI> to compare this one to (mandatory) arg #2 = optionally, one or more of the key => value pairs below -map => MapI : a Bio::Map::MapI to only consider positions on the given map -relative => RelativeI : a Bio::Map::RelativeI to calculate in terms of each Position's relative position to the thing described by that Relative contains Title : contains Usage : if ($mappable->contains($other_mappable)) {...} my @container_positions = $mappable->contains($other_mappable); Function: Finds the positions in this mappable that contain any comparison positions. Returns : array of L<Bio::Map::PositionI> objects Args : arg #1 = L<Bio::Map::MappableI> OR L<Bio::Map::PositionI> to compare this one to (mandatory) arg #2 = optionally, one or more of the key => value pairs below -map => MapI : a Bio::Map::MapI to only consider positions on the given map -relative => RelativeI : a Bio::Map::RelativeI to calculate in terms of each Position's relative position to the thing described by that Relative overlapping_groups Title : overlapping_groups Usage : my @groups = $mappable->overlapping_groups($other_mappable); my @groups = Bio::Map::Mappable->overlapping_groups(@mappables); Function: Look at all the positions of all the supplied mappables and group them according to overlap. Returns : array of array refs, each ref containing the Bio::Map::PositionI objects that overlap with each other Args : arg #1 = L<Bio::Map::MappableI> OR L<Bio::Map::PositionI> to compare this one to, or an array ref of such objects (mandatory) arg #2 = optionally, one or more of the key => value pairs below -map => MapI : a Bio::Map::MapI to only consider positions on the given map -relative => RelativeI : a Bio::Map::RelativeI to calculate in terms of each Position's relative position to the thing described by that Relative -min_pos_num => int : the minimum number of positions that must be in a group before it will be returned [default is 1] -min_mappables_num => int : the minimum number of different mappables represented by the positions in a group before it will be returned [default is 1] -min_mappables_percent => number : as above, but the minimum percentage of input mappables [default is 0] -min_map_num => int : the minimum number of different maps represented by the positions in a group before it will be returned [default is 1] -min_map_percent => number : as above, but the minimum percentage of maps known by the input mappables [default is 0] -require_self => 1|0 : require that at least one of the calling object's positions be in each group [default is 1, has no effect when the second usage form is used] -required => @mappables : require that at least one position for each mappable supplied in this array ref be in each group disconnected_intersections Title : disconnected_intersections Usage : @positions = $mappable->disconnected_intersections($other_mappable); @positions = Bio::Map::Mappable->disconnected_intersections(@mappables); Function: Make the positions that are at the intersection of each group of overlapping positions, considering all the positions of the supplied mappables. Returns : new Bio::Map::Mappable who's positions on maps are the calculated disconnected unions Args : arg #1 = L<Bio::Map::MappableI> OR L<Bio::Map::PositionI> to compare this one to, or an array ref of such objects (mandatory) arg #2 = optionally, one or more of the key => value pairs below -map => MapI : a Bio::Map::MapI to only consider positions on the given map -relative => RelativeI : a Bio::Map::RelativeI to calculate in terms of each Position's relative position to the thing described by that Relative -min_pos_num => int : the minimum number of positions that must be in a group before the intersection will be calculated and returned [default is 1] -min_mappables_num => int : the minimum number of different mappables represented by the positions in a group before the intersection will be calculated and returned [default is 1] -min_mappables_percent => number : as above, but the minimum percentage of input mappables [default is 0] -min_map_num => int : the minimum number of different maps represented by the positions in a group before the intersection will be calculated and returned [default is 1] -min_map_percent => number : as above, but the minimum percentage of maps known by the input mappables [default is 0] -require_self => 1|0 : require that at least one of the calling object's positions be in each group [default is 1, has no effect when the second usage form is used] -required => @mappables : require that at least one position for each mappable supplied in this array ref be in each group disconnected_unions Title : disconnected_unions Usage : my @positions = $mappable->disconnected_unions($other_mappable); my @positions = Bio::Map::Mappable->disconnected_unions(@mappables); Function: Make the positions that are the union of each group of overlapping positions, considering all the positions of the supplied mappables. Returns : new Bio::Map::Mappable who's positions on maps are the calculated disconnected unions Args : arg #1 = L<Bio::Map::MappableI> OR L<Bio::Map::PositionI> to compare this one to, or an array ref of such objects (mandatory) arg #2 = optionally, one or more of the key => value pairs below -map => MapI : a Bio::Map::MapI to only consider positions on the given map -relative => RelativeI : a Bio::Map::RelativeI to calculate in terms of each Position's relative position to the thing described by that Relative -min_pos_num => int : the minimum number of positions that must be in a group before the union will be calculated and returned [default is 1] -min_mappables_num => int : the minimum number of different mappables represented by the positions in a group before the union will be calculated and returned [default is 1] -min_mappables_percent => number : as above, but the minimum percentage of input mappables [default is 0] -min_map_num => int : the minimum number of different maps represented by the positions in a group before the union will be calculated and returned [default is 1] -min_map_percent => number : as above, but the minimum percentage of maps known by the input mappables [default is 0] -require_self => 1|0 : require that at least one of the calling object's positions be in each group [default is 1, has no effect when the second usage form is used] -required => @mappables : require that at least one position for each mappable supplied in this array ref be in each group tuple Title : tuple Usage : Do Not Use! Function: tuple was supposed to be a private method; this method no longer does anything Returns : warning Args : none Status : deprecated, will be removed in next version annotation Title : annotation Usage : $mappable->annotation($an_col); my $an_col = $mappable->annotation(); Function: Get the annotation collection (see Bio::AnnotationCollectionI) for this annotatable object. Returns : a Bio::AnnotationCollectionI implementing object, or undef Args : none to get, OR a Bio::AnnotationCollectionI implementing object to set perl v5.14.2 2012-03-02 Bio::Map::Mappable(3pm)
All times are GMT -4. The time now is 02:52 AM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy