Post 302702517 by empyrean on Tuesday 18th of September 2012 12:21:02 PM
Parse tab delimited file, check condition and delete row

I am fairly new to programming and am trying to solve this problem. I have a tab-delimited file like this:

CHROM POS REF ALT 10_sample.bam 11_sample.bam 12_sample.bam 13_sample.bam 14_sample.bam 15_sample.bam 16_sample.bam
tg93 77 T C T T T T T
tg93 79 C - C C C - -
tg93 79 C G C C C C G C
tg93 80 G A G G G G A A G
tg93 81 A C A A A A C C C
tg93 86 C A C C A A A A C
tg93 105 A G A A A A A G A
tg93 108 A G A A A A G A A
tg93 114 T C T T T T T C T
tg93 131 A C A A A A A A A
tg93 136 G C C G C C G G G
tg93 150 CTCTC - CTCTC - CTCTC CTCTC
The columns in the heading are:

CHROM - name
POS - position
REF - reference
ALT - alternate
10_sample.bam to 16_sample.bam - samples

I want to count how many times the letters in the REF and ALT columns occur among the samples. If either of them occurs fewer than two times, I need to delete that row. For example, in the first row I have 'T' in REF and 'C' in ALT. Across the 7 samples there are 5 T's, 2 blanks and no C, so I need to delete this row. In the second row, REF is 'C' and ALT is '-'. Across the seven samples there are 3 C's, 2 '-'s and 2 blanks, so we keep this row because both C and - occur at least two times. Blanks are always ignored while counting.
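The rule for a single row could be sketched roughly as below. This is only an illustration in Python, not a tested solution for the real file: the function name `keep_row` and the `min_count` parameter are made up here, and the positions of the blank cells in the two example rows are assumed, since only the counts are given.

```python
def keep_row(ref, alt, samples, min_count=2):
    """Return True when both REF and ALT occur at least min_count times.

    Blank sample cells are ignored, as described in the post.
    """
    calls = [s for s in samples if s.strip()]  # drop blank cells
    return calls.count(ref) >= min_count and calls.count(alt) >= min_count


# First example row: REF=T, ALT=C, five T's and two blanks (blank positions assumed).
print(keep_row("T", "C", ["T", "T", "", "T", "T", "", "T"]))  # False

# Second example row: REF=C, ALT=-, three C's, two '-' and two blanks.
print(keep_row("C", "-", ["C", "", "C", "C", "-", "", "-"]))  # True
```

Comparing whole cell values rather than single characters means a multi-letter REF such as CTCTC is counted the same way as a single base.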

The final file after filtering is

#CHROM POS REF ALT 10_sample.bam 11_sample.bam 12_sample.bam 13_sample.bam 14_sample.bam 15_sample.bam 16_sample.bam
tg93 79 C - C C C - -
tg93 80 G A G G G G A A G
tg93 81 A C A A A A C C C
tg93 86 C A C C A A A A C
tg93 108 A G A A A A G A A
tg93 136 G C C G C C G G G
I am able to read the columns into arrays and display them in my code, but I am not sure how to write the loops that read the bases, count their occurrences and decide which rows to keep. Can anyone tell me how I should proceed? An example that I can modify would also be helpful.

Thank you for the help!
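One possible starting point for the loop over the whole file is sketched here in Python. It is a rough example under several assumptions: the function name `filter_rows` is hypothetical, the first line is taken to be the header, columns 3 and 4 are REF and ALT, everything from column 5 onward is a sample, and the small four-sample table is invented for the demonstration rather than copied from the real data.

```python
import csv
import io


def filter_rows(lines, min_count=2):
    """Keep the header plus every row where REF and ALT each occur
    at least min_count times among the non-blank sample cells."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader, None)
    if header is None:  # empty input
        return []
    kept = [header]
    for row in reader:
        if len(row) < 4:  # skip malformed or empty lines
            continue
        ref, alt = row[2], row[3]
        calls = [cell for cell in row[4:] if cell.strip()]  # ignore blanks
        if calls.count(ref) >= min_count and calls.count(alt) >= min_count:
            kept.append(row)
    return kept


# Made-up demo data with four sample columns; the third sample of the first row is blank.
sample = (
    "CHROM\tPOS\tREF\tALT\ts1\ts2\ts3\ts4\n"
    "tg93\t77\tT\tC\tT\tT\t\tT\n"
    "tg93\t79\tC\t-\tC\tC\t-\t-\n"
    "tg93\t105\tA\tG\tA\tA\tG\tA\n"
    "tg93\t136\tG\tC\tC\tG\tC\tG\n"
)

for row in filter_rows(io.StringIO(sample)):
    print("\t".join(row))
```

For a real file, an open file handle can be passed in place of the `io.StringIO` object. Note that the row at position 108, as displayed above, contains only one G among its samples, so a strict reading of the stated rule would drop it even though it appears in the expected output; it is worth confirming which of the two is intended.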
 
