awk to match keyword and return matches and unique fields
I am trying to use awk to find a keyword and return the rows that match it, along with $1 and $2. These two fields are the unique IDs, and they appear only once per group. Thank you.
file
Code:
name 31 Index Chromosomal Position Gene Inheritance
122 2106725 TSC2 AD
124 2115481 TSC2 AD
121 2105400 TSC2 AD
82 135782221 TSC1 AD
81 135782026 TSC1 AD
126 2138218 TSC2 AD
123 2113107 TSC2 AD
125 2126142 TSC2 AD
name2 12 Index Chromosomal Position Gene Inheritance
1 43396568 SLC2A1 AD, AR
name3 20 Index Chromosomal Position Gene Inheritance
188 2135240 TSC1 AD
179 2103379 TSC1 AD
191 2137899 TSC2 AD
181 2110617 TSC2 AD
190 2137857 TSC2 AD
189 2137806 TSC2 AD
186 2133798 TSC2 AD
187 2135074 TSC2 AD
180 2105400 TSC2 AD
183 2122822 TSC2 AD
192 2138218 TSC2 AD
185 2125937 TSC2 AD
184 2125788 TSC2 AD
193 2138269 TSC2 AD
182 2112981 TSC2 AD
Desired output
Code:
name 31 Index Chromosomal Position Gene Inheritance
82 135782221 TSC1 AD
81 135782026 TSC1 AD
name3 20 Index Chromosomal Position Gene Inheritance
188 2135240 TSC1 AD
179 2103379 TSC1 AD
191 2137899 TSC1 AD
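One way to approach this, as a minimal sketch and not a tested solution for the real file: remember each header line (the one that carries $1 and $2) and print it only when a row in that group matches the keyword. This assumes the layout shown above, with whitespace-separated fields, "Index" as $3 on header lines, and the gene name as $3 on data rows. The function name filter_by_gene is hypothetical. Note that in the posted input row 191 is TSC2, so it is not printed for the keyword TSC1.

```shell
# Hypothetical helper: print each group header once, followed by the rows
# in that group whose gene column ($3) equals the keyword passed as $1.
filter_by_gene() {
    awk -v kw="$1" '
        $3 == "Index" { hdr = $0; next }   # header line: remember it, print later
        $3 == kw {                         # data row whose gene matches the keyword
            if (hdr != "") {               # first match in this group:
                print hdr                  #   emit the header (holds $1 and $2)
                hdr = ""                   #   and make sure it is printed only once
            }
            print
        }
    '
}

# Small sample in the same layout as the posted file (assumed layout).
printf '%s\n' \
    'name 31 Index Chromosomal Position Gene Inheritance' \
    '122 2106725 TSC2 AD' \
    '82 135782221 TSC1 AD' \
    'name2 12 Index Chromosomal Position Gene Inheritance' \
    '1 43396568 SLC2A1 AD, AR' \
    'name3 20 Index Chromosomal Position Gene Inheritance' \
    '188 2135240 TSC1 AD' \
    '191 2137899 TSC2 AD' |
    filter_by_gene TSC1
```

On the real data this would be run as filter_by_gene TSC1 < file. Groups with no matching row (name2 here) print nothing, because their header is only remembered and never emitted.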
Bio::LiveSeq::Gene(3pm) User Contributed Perl Documentation Bio::LiveSeq::Gene(3pm)
NAME
Bio::LiveSeq::Gene - Range abstract class for LiveSeq
SYNOPSIS
# documentation needed
DESCRIPTION
This is used as storage for all object references concerning a particular gene.
AUTHOR - Joseph A.L. Insana
Email: Insana@ebi.ac.uk, jinsana@gmx.net
APPENDIX
The rest of the documentation details each of the object methods. Internal methods are usually preceded with a _
new
Title : new
Usage : $gene = Bio::LiveSeq::Gene->new(-name => "name",
-features => $hashref,
-upbound => $min,
-downbound => $max);
Function: generates a new Bio::LiveSeq::Gene
Returns : reference to a new object of class Gene
Errorcode -1
Args : one string and one hashreference containing all features defined
for the Gene and the references to the LiveSeq objects for those
features.
Two labels for defining boundaries of the gene. Usually the
boundaries will reflect max span of transcript, exon... features,
while the DNA sequence will be created with some flanking regions
(e.g. with the EMBL_SRS::gene2liveseq routine).
If these two labels are not given, they will default to the start
and end of the DNA object.
Note : the format of the hash has to be like
DNA => reference to LiveSeq::DNA object
Transcripts => reference to array of transcripts objrefs
Translations => reference to array of translations objrefs
Exons => ....
Introns => ....
Prim_Transcripts => ....
Repeat_Units => ....
Repeat_Regions => ....
Only DNA and Transcripts are mandatory
verbose
Title : verbose
Usage : $self->verbose(0)
Function: Sets verbose level for how ->warn behaves
-1 = silent: no warning
0 = reduced: minimal warnings
1 = default: all warnings
2 = extended: all warnings + stack trace dump
3 = paranoid: a warning becomes a throw and the program dies
Note: a quick way to set all LiveSeq objects at the same verbosity
level is to change the DNA level object, since they all look to
that one if their verbosity_level attribute is not set.
But the method offers fine tuning possibility by changing the
verbose level of each object in a different way.
So for example, after $loader= and $gene= have been retrieved
by a program, the command $gene->verbose(0); would
set the default verbosity level to 0 for all objects.
Returns : the current verbosity level
Args : -1,0,1,2 or 3
perl v5.14.2 2012-03-02 Bio::LiveSeq::Gene(3pm)