Post 302608064 by gimley on Friday 16th of March 2012 04:35:23 AM
Script for identifying and deleting dupes in a line

I am compiling a synonym dictionary which has the following structure:
Headword=Synonym1,Synonym2 and so on, with each synonym separated by a comma.
As is usual in such cases, preparing the synonyms by hand means that some of them get repeated, which results in dupes, as in the example below:
Code:
arrogance=affectation,affected manners,airs,array,boastfulness,boasting,bombast,braggadocio,bravado,brazenness,bumptiousness,conceit,contempt,contemptuousness,contumeliousness,contumely,coxcombry,crowing,dandyism,dash,disdain,disdainfulness,display,egotism,fanfare,fanfaronade,fatuousness,flourish,foppery,foppishness,frills and furbelows,frippery,gall,getting on one's high horse,glitter,gloating,haughtiness,hauteur,high notions,highfalutin' ways,loftiness,nerve,ostentation,overconfidence,pageantry,panache,parade,pomp,pomposity,pompousness,presumption,presumptuousness,pretension,pretentiousness,pride,putting on the dog,putting one's nose in the air,scorn,scornfulness,self-importance,shamelessness,show,showiness,affected manners,airs,array,snobbery,snobbishness,superciliousness,swagger,vainglory,vanity,affected manners

As can be seen, "affected manners" is repeated, and so are quite a few other synonyms.
I have written a script which basically does the following:
places each synonym on its own line by replacing the comma with a CR/LF
sorts the synonym set
puts the sorted unique synonyms back into the structure Headword=syn1,syn2 etc.
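The script itself is not shown here, so the following is only a minimal sketch of what those three steps might look like in shell. The function name `sort_dedupe_line` is made up for illustration. It handles one line at a time and starts several processes for each line, which may explain part of the cost on a large file.

```shell
#!/bin/sh
# Hypothetical helper: sort-based de-duplication of ONE dictionary line.
# $1 is a line of the form Headword=syn1,syn2,...
sort_dedupe_line() {
    head=${1%%=*}    # text before the first "="
    syns=${1#*=}     # text after the first "="
    # Step 1: one synonym per line; step 2: sort and keep unique entries;
    # step 3: join the lines back together with commas.
    joined=$(printf '%s\n' "$syns" | tr ',' '\n' | LC_ALL=C sort -u | paste -s -d, -)
    printf '%s=%s\n' "$head" "$joined"
}
```

Note that this approach also reorders the synonyms alphabetically, which may or may not be wanted.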
Although it works, it is expensive and time-consuming, considering that the number of synonym sets is around 100,00.
A Perl or awk script which does the job faster would be really appreciated. Please note that a given headword can have up to 100 synonyms, each separated by a comma.
Many thanks for a faster solution.
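One possible approach, offered as a sketch rather than a definitive answer: a single awk process can read the whole file and use an associative array to remember which synonyms it has already seen on the current line, so no sorting and no per-line external commands are needed. The function name `dedupe_synonyms` is hypothetical, and the sketch assumes that the headword never contains "=" and that duplicates are exact matches (same case, same spacing).

```shell
#!/bin/sh
# Hypothetical helper: remove repeated synonyms from lines of the form
# Headword=syn1,syn2,... keeping the first occurrence and the original order.
dedupe_synonyms() {
    awk '
    {
        eq = index($0, "=")            # position of the first "="
        if (eq == 0) { print; next }   # not a dictionary line: pass it through
        head = substr($0, 1, eq - 1)
        n = split(substr($0, eq + 1), syn, ",")
        split("", seen)                # portable way to empty the array
        out = ""
        for (i = 1; i <= n; i++) {
            # skip empty fields and synonyms already emitted for this headword
            if (syn[i] == "" || (syn[i] in seen)) continue
            seen[syn[i]] = 1
            out = (out == "") ? syn[i] : out "," syn[i]
        }
        print head "=" out
    }' "$@"
}
```

It could be run as `dedupe_synonyms dictionary.txt > dictionary.dedup.txt`, where both file names are placeholders. Unlike the sort-based method, the surviving synonyms stay in their original order.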
 
