...
...
I have been trying to find all IDs for entries with duplicate names in the 2nd and 3rd columns, along with a count of how many times each name is duplicated, if any,
...
...
If Perl is an option, then you could do something like this for a descriptive answer:
From the output above, if you wanted to see only the key (1st column in your data file) and nothing else, then maybe something like this could work:
It would help if you posted the output you want to see.
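As a rough illustration of the grouping idea, here is a minimal sketch in Python rather than Perl. It assumes whitespace-separated records with the ID in column 1, and it treats the pair of values in columns 2 and 3 as the "name"; that reading of the question is an assumption. The sample rows and the function name `find_duplicate_names` are hypothetical, since the actual data file is not shown.

```python
from collections import defaultdict


def find_duplicate_names(lines):
    """Map each (col2, col3) name seen more than once to the IDs that carry it."""
    ids_by_name = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or short lines
        record_id, name = fields[0], (fields[1], fields[2])
        ids_by_name[name].append(record_id)
    # Keep only names that occur on more than one line.
    return {name: ids for name, ids in ids_by_name.items() if len(ids) > 1}


# Hypothetical sample data; the real input file is not shown in the thread.
sample = [
    "101 john smith",
    "102 jane doe",
    "103 john smith",
    "104 mary jones",
    "105 john smith",
]

for (first, last), ids in find_duplicate_names(sample).items():
    print(f"{first} {last}: {len(ids)} occurrences, IDs {', '.join(ids)}")
```

To show only the IDs, print `ids` alone; to key on a single column instead of the pair, change the `name` tuple accordingly.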
Last edited by durden_tyler; 07-06-2017 at 02:54 PM.
Hello,
My text file has input of the form
abc dft45.xml
ert rt653.xml
abc ert57.xml
I need to write a Perl script/shell script to find duplicates in the first column and write them into a text file of the form...
abc dft45.xml
abc ert57.xml
Can someone help me, please? (5 Replies)
Given a file such as this I need to remove the duplicates.
00060011 PAUL BOWSTEIN ad_waq3_921_20100826_010517.txt
00060011 PAUL BOWSTEIN ad_waq3_921_20100827_010528.txt
0624-01 RUT CORPORATION ad_sade3_10_20100827_010528.txt
0624-01 RUT CORPORATION ... (13 Replies)
Hi,
I need an awk script (or whatever shell construct) that would take data like below and get the max value of the 3rd column when grouping by the 1st column.
clientname,day-of-month,max-users
-----------------------------------
client1,20120610,5
client2,20120610,2
client3,20120610,7... (3 Replies)
Hi,
I have a file (sorted by sort) with 8 tab delimited columns. The first column contains duplicated fields and I need to merge all these identical lines.
My input file:
comp100002 aaa bbb ccc ddd eee fff ggg
comp100003 aba aba aba aba aba aba aba
comp100003 fff fff fff fff fff fff fff... (5 Replies)
Hi, I have a file with about 13000 lines and 4 columns. I need to search the 3rd column for a word that begins with "SAP-" and move it to the next (4th) column, because the 3rd column needs to stay empty.
Thanks in advance. :)
89653 36891 OTR-60 SAP-2
89653 36892 OTR-10 SAP-2... (2 Replies)
input
"A","B","C,D","E","F"
"S","T","U,V","W","X"
"AA","BB","CC,DD","EEEE","FFF"
required output:
"A","B","C,D","C,D","F"
"S","T","U,V","U,V","X"
"AA","BB","CC,DD","CC,DD","FFF"
I tried using awk, but the double quotes are not preserved for every field. Any help to solve this is much... (5 Replies)
Hi Experts,
Please bear with me, I need help.
I am learning awk and am stuck on one issue.
First point: I want to sum the values in columns 7, 9, 11, 13 and 15 for rows that have duplicate values in column 5. No action is to be taken for rows where the value in column 5 is unique.
Second point : For... (1 Reply)
Hello Team,
My source data (input) is like below
EPIC1 router EPIC2 Targetdefinition
Exp1 Expres rtr1 Router
SQL SrcQual Exp1 Expres
rtr1 Router EPIC1 Targetdefinition
My output is like
SQL SrcQual Exp1 Expres
Exp1 Expres rtr1 Router
rtr1 Router EPIC1 Targetdefinition... (5 Replies)
Discussion started by: sekhar.lsb
SWISS::GNs(3pm) User Contributed Perl Documentation SWISS::GNs(3pm)
Name
SWISS::GNs.pm
Description
SWISS::GNs represents the GN lines within a SWISS-PROT + TrEMBL entry as specified in the user manual
http://www.expasy.org/sprot/userman.html. The GNs object is a container object which holds a list of SWISS::GeneGroup objects.
Inherits from
SWISS::ListBase.pm
Attributes
"list"
Each list element is a SWISS::GeneGroup object.
"and" (deprecated, for old format only)
Delimiter used between genes. Defaults to " AND ".
"or" (deprecated, for old format only)
Delimiter used between gene names. Defaults to " OR ".
Methods
Standard methods
new
fromText
toText
Reading/Writing methods
text [($newText)]
Sets the text of the GN line to the parameter if it is present, and returns the (unwrapped) text of the line. Also sets the 'and' and 'or'
delimiters to the first occurrences of the words "AND" and "OR" in the line, conserving the case.
lowercase (deprecated, for old format only)
Sets the GNs::and and GNs::or delimiters to their lower case values.
uppercase (deprecated, for old format only)
Sets the GNs::and and GNs::or delimiters to their upper case values.
getFirst()
Returns first gene name in gene line
getTags($target)
Returns evidence tags associated with $target
$target is a string
isPresent($target)
Returns 1 if $target is present in the GN line
$target is a string
needsReCasing($target)
If $target is present in the GN line, but wrongly cased, method returns the matching name in its current case
$target is a string
replace($newName, $target, $evidenceTag)
Replaces the first GN object in the GN line whose text attribute is $target with a new GN object whose text attribute is set to
$newName and whose evidenceTags attribute is set using values obtained by splitting $evidenceTag on /, / (as the name is not being changed,
programs should keep the old tag and add the new tag). Does nothing if $target is not found.
delete($target)
Removes synonym/single-member gene group matching $target. Note that if a "Name" is deleted, the first "Synonym" will be promoted to
"Name"
addAsNewSynonym($newName, $target, $evidenceTag, $location)
Adds a new GN object (with text attribute set to new $newName, and evidenceTags attribute set to ($evidenceTag)), as a synonym to the
first gene group in which $target is a gene name. Does nothing if $target is not found. Will not add a duplicate gene name.
$location determines where in the gene group the new object is added: if $location == 1, 2, 3, ..., the new object is added in the 1st, 2nd, 3rd, ...
position; if $location == 0, the new object is added before $target; if $location == -1, the new object is added after $target (default); if
$location == -2, the new object is added at the end of the gene group. Note that if the new synonym is inserted in the first position, it will become
the "Name" and the previous "Name" will be downgraded to first "Synonym".
addAsNewGeneGroup($newName, $target, $evidenceTag, $location)
Adds a new GeneGroup object, comprising 1 GN object (with text attribute set to new $newName, and evidenceTags attribute set to
($evidenceTag)). Will not add a duplicate gene name. $location and $target determine where in GNs line new group is added: if
$location == 1, 2, 3, ..., new object added in the 1st, 2nd, 3rd, ... position; if $location == 0, new object added before $target; if
$location == -1, new object added after $target (default); if $location == -2, new object added at end of GNs line. Does nothing if
$target is not found, and $location == 0 or -1; otherwise $target does not need to be set.
replaceGeneGroup($newGeneGroup, $target)
Replaces the first gene group containing $target with $newGeneGroup. Creating the $newGeneGroup correctly is the user's responsibility
getGeneGroup($target)
Returns the first gene group that contains $target
setToOr()
Returns a new GNs object, but with all GNs objects in a single gene group. Needed when adding 'C' to 'A and B', when the relationship
of 'C' to 'A' and 'B' is unknown: the universal use of ' or ' is the default delimiter for TrEMBL entries.
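The $location argument described for addAsNewSynonym and addAsNewGeneGroup can be illustrated with a minimal Python sketch. This is not the module's Perl code: it models a gene group (or GN line) as a flat list of strings, and the function name `add_with_location`, the exact-match comparison of names, and the clamping of out-of-range positions are all assumptions made for illustration. The no-op rule follows the addAsNewGeneGroup description (a missing target only matters when $location is 0 or -1).

```python
def add_with_location(items, new_name, target=None, location=-1):
    """Return a new list with new_name inserted per location, or an unchanged copy on a no-op."""
    result = list(items)
    if new_name in result:
        return result  # never add a duplicate name
    if location >= 1:
        # Explicit 1-based position; clamping to the list length is an assumption.
        index = min(location - 1, len(result))
    elif location == -2:
        index = len(result)  # append at the end
    elif location in (0, -1):
        if target not in result:
            return result  # target-relative insert with no target: do nothing
        # 0 inserts before the target, -1 (the default) inserts after it.
        index = result.index(target) + (1 if location == -1 else 0)
    else:
        raise ValueError(f"unsupported location: {location}")
    result.insert(index, new_name)
    return result


names = ["CysA1", "CysA", "Rv3117"]  # hypothetical group contents
print(add_with_location(names, "NewSyn", "CysA"))       # after target (default)
print(add_with_location(names, "NewSyn", "CysA", 0))    # before target
print(add_with_location(names, "NewSyn", location=1))   # first position
print(add_with_location(names, "NewSyn", location=-2))  # end of list
```

Returning a copy rather than mutating the input keeps the sketch easy to test; the real methods operate on the GNs object in place.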
TRANSITION
The format of the GN line will change in 2004 from:
GN (CYSA1 OR CYSA OR RV3117 OR MT3199 OR MTCY164.27) AND (CYSA2 OR
GN RV0815C OR MT0837 OR MTV043.07C).
to:
GN Name=CysA1; Synonyms=CysA; OrderedLocusNames=Rv3117, MT3199;
GN ORFNames=MtCY164.27;
GN and
GN Name=CysA2; OrderedLocusNames=Rv0815c, MT0837; ORFNames=MTV043.07c;
This module supports both formats. To convert an entry from the old to the new format, do:
$entry->GNs->is_old_format(0);
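To make the old-format structure concrete, here is a minimal Python sketch of how such a line maps onto a list of gene groups. It is an illustration only, not how SWISS::GNs parses entries: the function name `parse_old_gn` is hypothetical, and the sketch assumes the "GN" line prefixes have already been removed and the wrapped lines joined into one string.

```python
import re


def parse_old_gn(text):
    """Split unwrapped old-format GN text into gene groups (lists of gene names)."""
    text = text.strip().rstrip(".")  # drop the terminating period
    groups = []
    # Groups are separated by AND; names inside a group are separated by OR.
    for chunk in re.split(r"\s+AND\s+", text, flags=re.IGNORECASE):
        chunk = chunk.strip()
        if chunk.startswith("(") and chunk.endswith(")"):
            chunk = chunk[1:-1]  # remove the parentheses around a multi-name group
        names = [n for n in re.split(r"\s+OR\s+", chunk, flags=re.IGNORECASE) if n]
        groups.append(names)
    return groups


old = ("(CYSA1 OR CYSA OR RV3117 OR MT3199 OR MTCY164.27) AND "
       "(CYSA2 OR RV0815C OR MT0837 OR MTV043.07C).")
for group in parse_old_gn(old):
    print(group)
```

The old format does not record which names are synonyms, ordered locus names, or ORF names, so a sketch like this can only recover the grouping, not the categories used by the new format.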
perl v5.10.1 2006-01-26 SWISS::GNs(3pm)