Unique entries based on a range of numbers. Post: 302885883

Shell Programming and Scripting: Post 302885883 by flyfisherman on Tuesday 28th of January 2014 05:32:32 PM

Quote:

Originally Posted by bartus11

Perl approach:

Code:

#!/usr/bin/perl
use strict;
use warnings;

open my $input, "<", $ARGV[0] or die "cannot open file $ARGV[0]: $!";

my %ranges;
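while (my $line = <$input>) {
  next if $. == 1;    # skip the header line of the input file
  chomp $line;
  my ($alg, $pred, $lower, $upper) = split /[ \t]+/, $line;
  # look for a stored range that already contains this row's start point
  my $range = (grep {$lower>=(split /:/, $_)[0] && $lower<=(split /:/, $_)[1]} keys %ranges)[0];
  if ( !$range ) {
    # no match: store a new range, then absorb any stored ranges it encloses
    push @{$ranges{"$lower:$upper"}{algs}}, $alg;
    $ranges{"$lower:$upper"}{pred} = $pred;
    search_and_include($lower, $upper, \%ranges);
  } else {
    # match: record this algorithm under the existing range
    push @{$ranges{$range}{algs}}, $alg;
    $ranges{$range}{pred} = $pred;
  }
}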

foreach my $range (keys %ranges) {
  print "Algorithm\tpredicted_gene\tstart_point\tend_point\tNumber_of_algorithms_predicting_this_site\n";
  my $algs = join ", ", @{$ranges{$range}{algs}};
  my $algs_count = scalar @{$ranges{$range}{algs}};
  my ($lower, $upper) = split /:/, $range;
  print join "\t", $algs, $ranges{$range}{pred}, $lower, $upper, $algs_count;
  print "\n";
}

sub search_and_include {
  my ($lower_inc, $upper_inc, $ranges) = @_;
  # absorb every stored range that lies entirely inside the new range
  foreach my $range (keys %$ranges) {
    my ($lower, $upper) = split /:/, $range;
    # compare bounds numerically and skip the new range itself
    if ($lower >= $lower_inc && $upper <= $upper_inc && ($lower != $lower_inc || $upper != $upper_inc)) {
      push @{$ranges->{"$lower_inc:$upper_inc"}{algs}}, @{$ranges->{$range}{algs}};
      delete $ranges->{$range};
    }
  }
}

Run it like this:

Code:

./script.pl file

Hi,

Thank you very much for the help. Your script works well, with two minor problems. First, it prints the header before every row of the output. Second, although it works very well on the simplified example, it does not work on the real samples. I've attached two files, the input and the expected output. I'd appreciate it if you could modify the script.

Thank you very much in advance.
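
The repeated header comes from the header print sitting inside the foreach loop over the ranges. A minimal sketch of one possible fix follows, assuming the same %ranges layout the script builds (keys of the form "lower:upper", values holding algs and pred). The sub name format_report and the sample data are hypothetical and only illustrate the idea. The sketch does not address the second problem, since the attached real samples are not shown here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: builds the report from a %ranges-style hash
# (keys "lower:upper", values { algs => [...], pred => ... }).
sub format_report {
  my ($ranges) = @_;
  # Emit the header exactly once, before iterating over the ranges.
  my $out = join("\t", qw(Algorithm predicted_gene start_point end_point
                          Number_of_algorithms_predicting_this_site)) . "\n";
  # Sort numerically by start point so the row order is stable
  # (an assumption; the original script iterates in hash order).
  foreach my $range (sort { (split /:/, $a)[0] <=> (split /:/, $b)[0] } keys %$ranges) {
    my @algs = @{ $ranges->{$range}{algs} };
    my ($lower, $upper) = split /:/, $range;
    $out .= join("\t", join(", ", @algs), $ranges->{$range}{pred},
                 $lower, $upper, scalar @algs) . "\n";
  }
  return $out;
}

# Made-up sample data, for illustration only.
my %sample = (
  "100:200" => { algs => ["algA", "algB"], pred => "gene1" },
  "300:400" => { algs => ["algC"],         pred => "gene2" },
);
print format_report(\%sample);
```

In the original script the same effect can be had by moving the header print statement to just above the foreach loop.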

flyfisherman

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

To get unique numbers from two files

here i have two files: file 1 1 2 3 4 5 5 6 7 8 9 file 2 4 5 6 6 8 8

2. Shell Programming and Scripting

read numbers from file and output which numbers belongs to which range

Howdy experts, We have some ranges of numbers which belong to a particular group, as below. GroupNo StartRange EndRange Group0125 935300 935399 Group2006 935400 935476 937430 937459 Group0324 935477 935549 ...

3. UNIX for Dummies Questions & Answers

Getting unique list of numbers using grep

Hi, I am going to fetch a list of numbers that starts with "0032" from a file with a format like the given below: " 0032459999 0032458888 0032457777 0032451111 0032452222 0032453333 0032459999 0032458888 0032457777 0032451111 0032452222 0032453333 " I want to get a unique...

4. Shell Programming and Scripting

How to generate 10.000 unique numbers?

Hello, can anybody give me a hint on how to generate a lot of numbers that are not identical, via scripting etc.?

5. Shell Programming and Scripting

unique random numbers awk

Hi, I have a small piece of awk code (see below) that generates random numbers. gawk -F"," 'BEGIN { srand(); for (i = 1; i <= 30; i++) printf("%s AM329_%04d\n",$0,int(36 * rand())+1) }' OFS=, AM329_hole_names.csv The code works fine and generates alphanumeric numbers like AM329_0001,...

6. UNIX for Dummies Questions & Answers

Grep for a range of numbers?

I am trying to extract specific information from a large *.sam file (it's originally 28Gb). I want to extract all lines that are on chr3 somewhere in the range of 112,937,439-113,437,438. Here is a sample line from my file so you can get a feel for what each line looks like: seq.4 0 ...

7. Shell Programming and Scripting

How to create individual entries from a range of numbers?

I want to create entries based on the series as in examples below: Input: 2dat3 grht-5&&-15 3dat3 grht-16&&-30 4dat3 ftht-4&&-12 5sat3 ftht-16&&-20 Output: 2dat3 grht-5 2dat3 grht-6 2dat3 grht-7 2dat3 grht-8

8. UNIX for Dummies Questions & Answers

Sorting and saving values based on unique entries

Hi all, I wanted to save the values of a file that contains unique entries based on a specific column (column 4). my sample file looks like the following: input file: 200006-07file.txt 145 35 10 3 147 35 12 4 146 36 11 3 145 34 12 5 143 31 15 4 146 30 14 5 desired output files:...

9. Shell Programming and Scripting

Remove duplicate entries based on the range

I have file like this: chr start end chr15 99874874 99875874 chr15 99875173 99876173 aa1 chr15 99874923 99875923 chr15 99875173 99876173 aa1 chr15 99874962 99875962 chr15 99875173 99876173 aa1 chr1 ...

10. Shell Programming and Scripting

Printing unique numbers from each file

I have some files named file1, file2, file3......etc. These files are in a folder f1. The contents of the files are shown below. I would like to count the unique pairs of third column in each file. Some files have no data. It should be printed as zero. Your help would be appreciated. file1 ARG...

LEARN ABOUT DEBIAN

regexp::reggrp

Regexp::RegGrp(3pm)					User Contributed Perl Documentation				       Regexp::RegGrp(3pm)

NAME

       Regexp::RegGrp - Groups a regular expressions collection

VERSION

       Version 1.002

DESCRIPTION

       Groups regular expressions to one regular expression

SYNOPSIS

	   use Regexp::RegGrp;

	   my $reggrp = Regexp::RegGrp->new(
	       {
		   reggrp	   => [
		       {
			   regexp => '%name%',
			   replacement => 'John Doe',
			   modifier    => $modifier
		       },
		       {
			   regexp => '%company%',
			   replacement => 'ACME',
			   modifier    => $modifier
		       }
		   ],
		   restore_pattern => $restore_pattern
	       }
	   );

	   $reggrp->exec( $scalar );

       To return a scalar without changing the input simply use (e.g. example 2):

	   my $ret = $reggrp->exec( $scalar );

       The first argument must be a hashref. The keys are:

       reggrp (required)
	   Arrayref of hashrefs. The keys of each hashref are:

	   regexp (required)
		   A regular expression

	   replacement (optional)
		   Scalar or sub.

		   A replacement for the regular expression match. If not set, nothing will be replaced unless "store" is set. In this case the
		   match is replaced by something like sprintf("\x01%d\x01", $idx) where $idx is the index of the stored element in the store_data
		   arrayref. If "store" is set the default is:

		       sub {
			   return sprintf( "\x01%d\x01", $_[0]->{store_index} );
		       }

		   If a custom restore_pattern is passed to the constructor you MUST also define a replacement. Otherwise it is undefined.

		   If you define a subroutine as replacement a hashref is passed to this subroutine. This hashref has four keys:

		   match       Scalar. The match of the regular expression.

		   submatches  Arrayref of submatches.

		   store_index The next index. You need this if you want to create a placeholder and store the replacement in the
			       $self->{store_data} arrayref.

		   opts        Hashref of custom options.

	   modifier (optional)
		   Scalar. The default is 'sm'.

	   store (optional)
		   Scalar or sub. If you define a subroutine a hashref is passed to this subroutine. This hashref has three keys:

		   match       Scalar. The match of the regular expression.

		   submatches  Arrayref of submatches.

		   opts        Hashref of custom options.

		   A replacement for the regular expression match. It will not replace the match directly. The replacement will be stored in the
		   $self->{store_data} arrayref. The placeholders in the text can easily be restored later with the restore_stored method.

       restore_pattern (optional)
	   Scalar or Regexp object. The default restore pattern is

	       qr~\x01(\d+)\x01~

	   This means that the restore_stored method looks for \x010\x01, \x011\x01, ... and replaces the matches with
	   $self->{store_data}->[0], $self->{store_data}->[1], ...
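
The store and restore mechanism described above can be illustrated without the module. The following is a minimal plain-Perl sketch, not Regexp::RegGrp's actual implementation. The sub names store_matches and restore_stored_text and the sample text are hypothetical; only the placeholder format (index wrapped in \x01 bytes) and the default restore pattern are taken from the documentation.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every match of $re with a
# "\x01<index>\x01" placeholder and keep the matched text in @$store.
sub store_matches {
  my ($text, $re, $store) = @_;
  $text =~ s/($re)/push @$store, $1; sprintf("\x01%d\x01", $#$store)/ge;
  return $text;
}

# Hypothetical helper: put the stored strings back, using the
# documented default restore pattern.
sub restore_stored_text {
  my ($text, $store) = @_;
  $text =~ s~\x01(\d+)\x01~$store->[$1]~g;
  return $text;
}

my @store;
my $masked = store_matches('say "hello" and "bye"', qr/"[^"]*"/, \@store);
# ... other substitutions could run on $masked here without touching
# the quoted strings, which are now hidden behind placeholders ...
my $restored = restore_stored_text($masked, \@store);
print "$restored\n";
```

The point of the placeholder is that later substitutions cannot accidentally alter the stored text, because only the index between the two \x01 bytes remains in the working string.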

EXAMPLES

       Example 1
	   Common usage.

	       #!/usr/bin/perl

	       use strict;
	       use warnings;

	       use Regexp::RegGrp;

	       my $reggrp = Regexp::RegGrp->new(
		   {
		       reggrp	       => [
			   {
			       regexp => '%name%',
			       replacement => 'John Doe'
			   },
			   {
			       regexp => '%company%',
			       replacement => 'ACME'
			   }
		       ]
		   }
	       );

	       open( INFILE, 'unprocessed.txt' );
	       open( OUTFILE, '>processed.txt' );

	       my $txt = join( '', <INFILE> );

	       $reggrp->exec( $txt );

	       print OUTFILE $txt;
	       close(INFILE);
	       close(OUTFILE);

       Example 2
	   A scalar is requested by the context. The input will remain unchanged.

	       #!/usr/bin/perl

	       use strict;
	       use warnings;

	       use Regexp::RegGrp;

	       my $reggrp = Regexp::RegGrp->new(
		   {
		       reggrp	       => [
			   {
			       regexp => '%name%',
			       replacement => 'John Doe'
			   },
			   {
			       regexp => '%company%',
			       replacement => 'ACME'
			   }
		       ]
		   }
	       );

	       open( INFILE, 'unprocessed.txt' );
	       open( OUTFILE, '>processed.txt' );

	       my $unprocessed = join( '', <INFILE> );

	       my $processed = $reggrp->exec( $unprocessed );

	       print OUTFILE $processed;
	       close(INFILE);
	       close(OUTFILE);

AUTHOR

       Merten Falk, "<nevesenin at cpan.org>"

BUGS

       Please report any bugs or feature requests through the web interface at http://github.com/nevesenin/regexp-reggrp-perl/issues
       <http://github.com/nevesenin/regexp-reggrp-perl/issues>.

SUPPORT

       You can find documentation for this module with the perldoc command.

       perldoc Regexp::RegGrp

COPYRIGHT & LICENSE
       Copyright 2010, 2011 Merten Falk, all rights reserved.

       This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

perl v5.14.2							    2012-02-18						       Regexp::RegGrp(3pm)
