Counts not matching in file
Shell Programming and Scripting, post 302960234 by cmccabe on Thursday 12th of November 2015 02:59:26 PM

I cannot figure out why there are 56,548 unique entries in test.bed, yet perl and awk see only 56,543, and that number is what my analysis sees as well. What happened to the 5 missing? Thank you :).

The file is attached as well.

Code:
cmccabe@DTV-A5211QLM:~/Desktop/NGS/bed/bedtools$ wc -l test.bed
56548 test.bed

cmccabe@DTV-A5211QLM:~/Desktop/NGS/bed/bedtools$ perl -nae '$seen{$F[3]}++;
    END{
        print "There are ", scalar keys %seen, " unique fourth fields\n";
    }' test.bed
There are 56543 unique fourth fields

cmccabe@DTV-A5211QLM:~/Desktop/NGS/bed/bedtools$ awk '$4!=d{c++;d=$4}END{print c}' test.bed
56543
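
One thing worth checking, offered as a sketch rather than a diagnosis of the attached file: wc -l counts every line, whereas the perl one-liner counts distinct values of the fourth field, so the two numbers differ whenever some fourth-field value appears on more than one line. The awk command above counts runs of adjacent equal values, so it agrees with perl only when repeated values sit next to each other. The snippet below uses a made-up four-line sample (the temporary file, coordinates, and names such as geneA are assumptions, not taken from test.bed) to show the three counts side by side and list the repeated values.

```shell
# Hypothetical 4-column BED sample; the real test.bed is not reproduced here.
sample=$(mktemp)
printf 'chr1\t100\t200\tgeneA\nchr1\t300\t400\tgeneB\nchr2\t100\t200\tgeneA\nchr2\t500\t600\tgeneC\n' > "$sample"

# Total number of lines (what wc -l reports).
total=$(wc -l < "$sample" | tr -d ' ')

# Number of distinct fourth fields (what the perl one-liner reports).
distinct=$(awk '!seen[$4]++ {n++} END {print n+0}' "$sample")

# Number of runs of adjacent equal fourth fields (what the awk one-liner reports).
runs=$(awk '$4!=d {c++; d=$4} END {print c+0}' "$sample")

# Fourth-field values that occur on more than one line, with their line counts.
dups=$(awk '{count[$4]++} END {for (k in count) if (count[k] > 1) print k, count[k]}' "$sample")

echo "lines=$total distinct=$distinct runs=$runs"
echo "repeated: $dups"
rm -f "$sample"
```

To try this on the real data, replace "$sample" with test.bed in the awk commands. If repeated names are indeed the cause, the last command should list the values that account for the gap of 5.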

 

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

counts

How can i do a simple record count in my shell script? i just want to count the number of records i receive from a specific file. (11 Replies)
Discussion started by: k@ssidy
11 Replies

2. Solaris

file size counts??

Hello experts, I do - $ ls -lhtr logs2007* Is it possible that i can get the results of- totals size in MB/KB for ALL "logs2007*" note: in the same directory I have "logs2006*" & "logs2007*" files. (4 Replies)
Discussion started by: thepurple
4 Replies

3. UNIX for Dummies Questions & Answers

counts

To start I have a table that has ticketholders. Each ticket holder has a unique number and each ticket holder is associated to a so called household number. You can have multiple guests w/i a household. I would like to create 3 flags (form a, for a household that has 1-4 gst) form b 5-8 gsts... (3 Replies)
Discussion started by: sbr262
3 Replies

4. Shell Programming and Scripting

Perl script that counts lines of a file

I am working on this script, but hit a bump. Looking for a little help figuring out the last part: open(MY_FILE, $ARGV) or die $COUNTER = 1; $LINE = <FILE>; while ($LINE, <FILE>) { # Adds leading zeros for numbers 1 digit long if ($COUNTER<10){ print "000"; } # Adds... (2 Replies)
Discussion started by: Breakology
2 Replies

5. Shell Programming and Scripting

Counts a number of unique word contained in the file and print them in alphabetical order

What should be the Shell script that counts a number of unique word contained in a file and print them in alphabetical order line by line? (7 Replies)
Discussion started by: proactiveaditya
7 Replies

6. UNIX for Dummies Questions & Answers

how to get distinct counts in a column of a file

If i have a file sample.txt with more than 10 columns and 11th column as following data. would it be possible to get the distinct counts of values in single shot,Thank you. Y Y N N N P P o Expected Result: Value count Y 2 N 3 P 2 (2 Replies)
Discussion started by: Ariean
2 Replies

7. UNIX for Dummies Questions & Answers

Hardcoding & Record counts in a file

HI , I am having a huge comma delimiter file, I have to append the following four lines before the starting of the file through a shell script. FILE NAME = TEST_LOAD DATETIME = CURRENT DATE TIME LOAD DATE = CURRENT DATE RECORD COUNT = TOTAL RECORDS IN FILE Source data 1,2,3,4,5,6,7... (7 Replies)
Discussion started by: shruthidwh
7 Replies

8. Shell Programming and Scripting

word counts for a single line xml file

I have any XML ouput file(file name TABLE.xml), where the data is loaded in A SINGLE LINE, I need help in writting a ksh shell script which gives me the word counts of word <TABLE-ROW> This is my input file. <?xml version="1.0" encoding="UTF-8"?><!--Generated by Ascential Software... (4 Replies)
Discussion started by: pred55
4 Replies

9. Shell Programming and Scripting

New file should store all the 7 existing filenames and their record counts and ftp th

Hi, I need help regarding below concern. There is a script and it has 7 existing files(in a path say,. usr/appl/temp/file1.txt) and I need to create one new blank file say “file_count.txt” in the same script itself. Then the new file <file_count.txt> should store all the 7 filenames and... (1 Reply)
Discussion started by: pr293
1 Replies

10. Shell Programming and Scripting

Output counts of all matching strings lessthan a number using awk

The awk below is supposed to count all the matching $5 strings and count how many $7 values is less than 20. I don't think I need the portion in bold as I do not need any decimal point or format, but can not seem to get the correct counts. Thank you :). file chr5 77316500 77316628 ... (6 Replies)
Discussion started by: cmccabe
6 Replies
Bio::Tools::Run::BEDTools(3pm)				User Contributed Perl Documentation			    Bio::Tools::Run::BEDTools(3pm)

NAME
       Bio::Tools::Run::BEDTools - Run wrapper for the BEDTools suite of programs *BETA*

SYNOPSIS
         # use a BEDTools program
         $bedtools_fac = Bio::Tools::Run::BEDTools->new( -command => 'subtract' );
         $result_file = $bedtools_fac->run( -bed1 => 'genes.bed', -bed2 => 'mask.bed' );

         # if IO::Uncompress::Gunzip is available...
         $result_file = $bedtools_fac->run( -bed1 => 'genes.bed.gz', -bed2 => 'mask.bed.gz' );

         # be more strict
         $bedtools_fac->set_parameters( -strandedness => 1 );

         # and even more...
         $bedtools_fac->set_parameters( -minimum_overlap => 1e-6 );

         # create a Bio::SeqFeature::Collection object
         $features = $bedtools_fac->result( -want => 'Bio::SeqFeature::Collection' );

DEPRECATION WARNING
       Most executables from BEDTools v>=2.10.1 can read GFF and VCF formats in addition to BED format. This requires the use of a new input file param, shown in the following documentation, '-bgv', in place of '-bed' for the executables that can do this.

       This behaviour breaks existing scripts.

DESCRIPTION
       This module provides a wrapper interface for Aaron R. Quinlan and Ira M. Hall's utilities "BEDTools" that allow for (among other things):

       o   Intersecting two BED files in search of overlapping features.

       o   Merging overlapping features.

       o   Screening for paired-end (PE) overlaps between PE sequences and existing genomic features.

       o   Calculating the depth and breadth of sequence coverage across defined "windows" in a genome.

       (see <http://code.google.com/p/bedtools/> for manuals and downloads).

OPTIONS
       "BEDTools" is a suite of 17 command-line executables. This module attempts to provide commands and options comprehensively. You can browse the choices like so:

         $bedtools_fac = Bio::Tools::Run::BEDTools->new;

         # all BEDTools commands
         @all_commands = $bedtools_fac->available_parameters('commands');
         @all_commands = $bedtools_fac->available_commands; # alias

         # just for default command ('bam_to_bed')
         @btb_params = $bedtools_fac->available_parameters('params');
         @btb_switches = $bedtools_fac->available_parameters('switches');
         @btb_all_options = $bedtools_fac->available_parameters();

       Reasonably mnemonic names have been assigned to the single-letter command line options. These are the names returned by "available_parameters", and can be used in the factory constructor like typical BioPerl named parameters.

       A number of options are mutually exclusive, and the interpretation of intent is based on the last-passed option reaching BEDTools, with potentially unpredictable results. This module will prevent inconsistent switches and parameters from being passed.

       See <http://code.google.com/p/bedtools/> for details of BEDTools options.

FILES
       When a command requires filenames, these are provided to the "run" method, not the constructor ("new()"). To see the set of files required by a command, use "available_parameters('filespec')" or the alias "filespec()":

         $bedtools_fac = Bio::Tools::Run::BEDTools->new( -command => 'pair_to_bed' );
         @filespec = $bedtools_fac->filespec;

       This example returns the following array:

         #bedpe
         #bam
         bed
         #out

       This indicates that the bed ("BEDTools" BED format) file MUST be specified, and that the out, bedpe ("BEDTools" BEDPE format) and bam ("SAM" binary format) files MAY be specified. (Note that in this case you MUST provide ONE of bedpe OR bam; the module at this stage does not allow this information to be queried.) Use these in the "run" call like so:

         $bedtools_fac->run( -bedpe => 'paired.bedpe', -bgv => 'genes.bed', -out => 'overlap' );

       The object will store the program's STDERR output for you in the "stderr()" attribute:

         handle_bed_warning($bedtools_fac) if ($bedtools_fac->stderr =~ /Usage:/);

       For the commands 'fasta_from_bed' and 'mask_fasta_from_bed', STDOUT will also be captured in the "stdout()" attribute by default. All other commands can be forced to capture program output in STDOUT by setting the -out filespec parameter to '-'.

FEEDBACK
   Mailing Lists
       User feedback is an integral part of the evolution of this and other Bioperl modules. Send your comments and suggestions preferably to the Bioperl mailing list. Your participation is much appreciated.

         bioperl-l@bioperl.org                  - General discussion
         http://bioperl.org/wiki/Mailing_lists  - About the mailing lists

   Support
       Please direct usage questions or support issues to the mailing list:

       bioperl-l@bioperl.org

       rather than to the module maintainer directly. Many experienced and responsive experts will be able to look at the problem and quickly address it. Please include a thorough description of the problem with code and data examples if at all possible.

   Reporting Bugs
       Report bugs to the Bioperl bug tracking system to help us keep track of the bugs and their resolution. Bug reports can be submitted via the web:

         http://redmine.open-bio.org/projects/bioperl/

AUTHOR - Dan Kortschak
       Email dan.kortschak adelaide.edu.au

CONTRIBUTORS
       Additional contributors names and emails here

APPENDIX
       The rest of the documentation details each of the object methods. Internal methods are usually preceded with a _

   new()
        Title   : new
        Usage   : my $obj = new Bio::Tools::Run::BEDTools();
        Function: Builds a new Bio::Tools::Run::BEDTools object
        Returns : an instance of Bio::Tools::Run::BEDTools
        Args    :

   run()
        Title   : run
        Usage   : $result = $bedtools_fac->run(%params);
        Function: Run a BEDTools command.
        Returns : Command results (file, IO object or Bio object)
        Args    : Dependent on filespec for command.
                  See $bedtools_fac->filespec and BEDTools Manual.
                  Also accepts -want => '(raw|format|<object_class>)' - see want().
        Note    : gzipped inputs are allowed if IO::Uncompress::Gunzip
                  is available

                  Command                 <in>               <out>

                  annotate                bgv ann(s)         #out
                  graph_union             bg_files           #out
                  fasta_from_bed          seq bgv            #out
                  mask_fasta_from_bed     seq bgv            #out
                  bam_to_bed              bam                #out
                  bed_to_IGV              bgv                #out
                  merge                   bgv                #out
                  sort                    bgv                #out
                  links                   bgv                #out
                  b12_to_b6               bed                #out
                  overlap                 bed                #out
                  group_by                bed                #out
                  bed_to_bam              bgv                #out
                  shuffle                 bgv genome         #out
                  slop                    bgv genome         #out
                  complement              bgv genome         #out
                  genome_coverage         bed genome         #out
                  window                  bgv1 bgv2          #out
                  closest                 bgv1 bgv2          #out
                  coverage                bgv1 bgv2          #out
                  subtract                bgv1 bgv2          #out
                  pair_to_pair            bedpe1 bedpe2      #out
                  intersect               bgv1|bam bgv2      #out
                  pair_to_bed             bedpe|bam bgv      #out

                  bgv* signifies any of BED, GFF or VCF. ann is a bgv.

                  NOTE: Replace 'bgv' with 'bed' unless $use_bgv is set.

   want()
        Title   : want
        Usage   : $bedtools_fac->want( $class )
        Function: make factory return $class, or 'raw' results in file
                  or 'format' for result format
                  All commands can return Bio::Root::IO
                  commands returning:      can return object:
                  - BED or BEDPE           - Bio::SeqFeature::Collection
                  - sequence               - Bio::SeqIO
        Returns : return wanted type
        Args    : [optional] string indicating class or raw of wanted result

   result()
        Title   : result
        Usage   : $bedtools_fac->result( [-want => $type|$format] )
        Function: return result in wanted format
        Returns : results
        Args    : [optional] hashref of wanted type
        Note    : -want arg does not persist between result() calls when
                  specified in result(); for persistence, use want()

   _determine_format()
        Title   : _determine_format( $has_run )
        Usage   : $bedtools_fac->_determine_format
        Function: determine the format of output for current options
        Returns : format of BEDTools output
        Args    : [optional] boolean to indicate result exists

   _read_bed()
        Title   : _read_bed()
        Usage   : $bedtools_fac->_read_bed
        Function: return a Bio::SeqFeature::Collection object from a BED file
        Returns : Bio::SeqFeature::Collection
        Args    :

   _read_bedpe()
        Title   : _read_bedpe()
        Usage   : $bedtools_fac->_read_bedpe
        Function: return a Bio::SeqFeature::Collection object from a BEDPE file
        Returns : Bio::SeqFeature::Collection
        Args    :

   _validate_file_input()
        Title   : _validate_file_input
        Usage   : $bedtools_fac->_validate_file_input( -type => $file )
        Function: validate file type for file spec
        Returns : file type if valid type for file spec
        Args    : hash of filespec => file_name

   version()
        Title   : version
        Usage   : $version = $bedtools_fac->version()
        Function: Returns the program version (if available)
        Returns : string representing location and version of the program

perl v5.12.3                                2011-06-18                      Bio::Tools::Run::BEDTools(3pm)