Perl: filtering lines based on duplicate values in a column

Post 302558129 by polsum on Thursday 22nd of September 2011 09:16:09 PM

Hi, I have a file like the one below. I need to eliminate every line whose first-column value is repeated 10 times.

Code:

13 18 1 + chromosome 1, 122638287 AGAGTATGGTCGCGGTTG
13 18 1 + chromosome 1, 128904080 AGAGTATGGTCGCGGTTG
13 18 1 - chromosome 14, 13627938 CAACCGCGACCATACTCT
13 18 1 + chromosome 1, 187172197 AGAGTATGGTCGCGGTTG
13 18 1 - chromosome X, 38407155 CAACCGCGACCATACTCT
13 18 1 + chromosome 9, 13503259 AGAGTATGGTCGCGGTTG
13 18 1 - chromosome 2, 105480832 CAACCGCGACCATACTCT
13 18 1 + chromosome 9, 49045535 AGAGTATGGTCGCGGTTG
13 18 1 + chromosome 1, 178729626 AGAGTATGGTCGCGGTTG
13 18 1 - chromosome X, 55081462 CAACCGCGACCATACTCT
9 17 2 + chromosome 10, 101398385 GCCAGTTCTACAGTCCG
9 17 2 - chromosome 3, 103818009 CGGACTGTAGAACTGGC
9 17 2 - chromosome 16, 94552245 CGGACTGTAGAACTGGC
4 18 1 - chromosome 18, 70056996 TACCCAACAACACATAGT

The value 13 in the first column is repeated on 10 consecutive lines. I need to eliminate all of those lines from the output.

So the desired output would be:

Code:

9 17 2 + chromosome 10, 101398385 GCCAGTTCTACAGTCCG
9 17 2 - chromosome 3, 103818009 CGGACTGTAGAACTGGC
9 17 2 - chromosome 16, 94552245 CGGACTGTAGAACTGGC
4 18 1 - chromosome 18, 70056996 TACCCAACAACACATAGT

Thank you very much in advance. If possible, a solution in Perl would be much appreciated.

polsum
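
One possible approach, as a minimal sketch rather than an answer taken from the thread: buffer each run of consecutive lines that share a first-column value, and keep the run only if it is shorter than 10 lines. The sub name `filter_runs`, its `$limit` parameter, and the shortened demo lines are illustrative assumptions. The sketch also assumes that runs longer than 10 lines should be dropped too, which the post does not state.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): drop every run of consecutive
# lines sharing a first-column value when the run has $limit or more lines.
sub filter_runs {
    my ($limit, @lines) = @_;
    my (@kept, @run);
    my $current;

    # Flush the buffered run: keep it only if it is shorter than $limit.
    my $flush = sub {
        push @kept, @run if @run < $limit;
        @run = ();
    };

    for my $line (@lines) {
        my ($key) = split ' ', $line;     # first whitespace-separated field
        $key = '' unless defined $key;    # blank line: use an empty key
        $flush->() if defined $current && $key ne $current;
        $current = $key;
        push @run, $line;
    }
    $flush->();                           # handle the last run in the input
    return @kept;
}

# Made-up demo input: only the first column matters here, so the other
# fields are shortened placeholders rather than the real data.
my @sample = (
    (map { "13 18 1 hit$_\n" } 1 .. 10),  # ten consecutive 13s: dropped
    "9 17 2 a\n", "9 17 2 b\n", "9 17 2 c\n",
    "4 18 1 d\n",
);
print filter_runs(10, @sample);  # prints the three 9-lines and the 4-line
```

To filter a real file instead of the demo data, passing `<>` in place of `@sample` on the last line should let Perl read the files named on the command line, or standard input if none are given.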
