Script to search and extract the gene sub-location from a gff file.
Hi, my problem is that I have two files. File no. 1 is a gff text file (say gi1) that holds gene information like this:
********************
Code:
gene 39389788..39395643
/gene="RPSA"
/note="Derived by automated computational analysis using
gene prediction method: BestRefseq."
/db_xref="GeneID:3921"
/db_xref="HGNC:6502"
/db_xref="HPRD:01038"
/db_xref="MIM:150370"
mRNA join(39389788..39389839,39390696..39390861,
39391681..39391799,39393855..39394100,39394750..39394878,
39394997..39395162,39395375..39395643)
/gene="RPSA"
/product="ribosomal protein SA, transcript variant 1"
/note="Derived by automated computational analysis using
gene prediction method: BestRefseq."
/transcript_id="NM_002295.4"
/db_xref="GI:70609879"
/db_xref="GeneID:3921"
/db_xref="HGNC:6502"
/db_xref="HPRD:01038"
/db_xref="MIM:150370"
mRNA join(39390696..39390861,39391681..39391799,
39393855..39394100,39394750..39394878,39394997..39395162,
39395375..39395643)
/gene="RPSA"
/product="ribosomal protein SA, transcript variant 2"
/exception="unclassified transcription discrepancy"
/note="Derived by automated computational analysis using
gene prediction method: BestRefseq."
/transcript_id="NM_001012321.1"
/db_xref="GI:59859884"
/db_xref="GeneID:3921"
/db_xref="HGNC:6502"
/db_xref="HPRD:01038"
/db_xref="MIM:150370"
CDS join(39390729..39390861,39391681..39391799,
39393855..39394100,39394750..39394878,39394997..39395162,
39395375..39395469)
/gene="RPSA"
/note="Derived by automated computational analysis using
gene prediction method: BestRefseq."
/codon_start=1
/product="40S ribosomal protein SA"
/protein_id="NP_001012321.1"
/db_xref="GI:59859885"
/db_xref="CCDS:CCDS2686.1"
/db_xref="GeneID:3921"
/db_xref="HGNC:6502"
/db_xref="HPRD:01038"
/db_xref="MIM:150370"
CDS join(39390729..39390861,39391681..39391799,
39393855..39394100,39394750..39394878,39394997..39395162,
39395375..39395469)
/gene="RPSA"
/note="Derived by automated computational analysis using
gene prediction method: BestRefseq."
/codon_start=1
/product="40S ribosomal protein SA"
/protein_id="NP_002286.2"
/db_xref="GI:9845502"
/db_xref="CCDS:CCDS2686.1"
/db_xref="GeneID:3921"
/db_xref="HGNC:6502"
/db_xref="HPRD:01038"
/db_xref="MIM:150370"
gene 39391466..39391614
/gene="SNORA6"
/note="Derived by automated computational analysis using
gene prediction method: BestRefseq."
/db_xref="GeneID:574040"
/db_xref="HGNC:32591"
ncRNA 39391466..39391614
/gene="SNORA6"
/ncRNA_class="snoRNA"
/product="small nucleolar RNA, H/ACA box 6"
/note="Derived by automated computational analysis using
gene prediction method: BestRefseq."
/transcript_id="NR_002325.1"
/db_xref="GI:68510025"
/db_xref="GeneID:574040"
/db_xref="HGNC:32591"
gene 39394155..39394308
/gene="SNORA62"
/note="Derived by automated computational analysis using...
*****************************************
File no. 2 is a mapped txt file like this:
*********************************
Code:
Gene_input_file: f3
sno_input_file: chr3
319 found_in_gene 52698648..52707224 at 52704105 and_count: 5457
68 found_in_gene 52698648..52707224 at 52705463 and_count: 6815
82 found_in_gene 52698648..52707224 at 52701967 and_count: 3319
124 found_in_gene 39793218..40244467 at 40222682 and_count: 429464
202 found_in_gene 9443305..10558922 at 10110734 and_count: 667429
228 found_in_gene 46262602..46896241 at 46629723 and_count: 367121
..and so on.
**************************************
So I need to extract the region from file 2, say 52698648..52707224 for id 319, which begins at position 52704105 in the gff file, and then search for it in file 1 to find the sub-location within this gene, say whether it is in cDNA, mRNA, etc. If it is not found, the output should be:
Code:
'319 not found Intron'
Otherwise, if it is found, the output should be:
Code:
'319 found_in mRNA'
Please help me with a shell script or Perl (or both). I am new to the Linux world.
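One possible approach is sketched below in Python (not the shell or Perl asked for, but the logic carries over). It rests on several assumptions that are not stated in the post: feature lines in file 1 look like `mRNA join(...)` with locations that may continue on the following lines; a hit counts as "found" when its `at` position falls inside any listed range of a non-gene feature lying within the gene region; multiple matching feature types are joined with commas. The function names and the sample hit lines are hypothetical, since the positions in the file 2 excerpt do not fall inside the file 1 excerpt.

```python
import re

# Feature line: a type word followed by a location, e.g. "mRNA join(1..5,".
# Note lines such as 'gene prediction method: ...' do not match (no digits).
FEATURE_RE = re.compile(r"^(\w+)\s+((?:complement\(|join\()*<?\d+\.\.>?\d+.*)$")
RANGE_RE = re.compile(r"(\d+)\.\.>?(\d+)")
HIT_RE = re.compile(r"^(\S+)\s+found_in_gene\s+(\d+)\.\.(\d+)\s+at\s+(\d+)")

def parse_features(lines):
    """Return [(feature_type, [(start, end), ...]), ...] from file 1."""
    features = []
    pending = None  # [type, location text] while a join(...) is still open
    for raw in lines:
        line = raw.strip()
        if pending is not None:
            pending[1] += line          # continuation of a multi-line location
        else:
            m = FEATURE_RE.match(line)
            if not m:
                continue                # qualifier or note text, skip it
            pending = [m.group(1), m.group(2)]
        if pending[1].count("(") <= pending[1].count(")"):
            ranges = [(int(a), int(b)) for a, b in RANGE_RE.findall(pending[1])]
            features.append((pending[0], ranges))
            pending = None
    return features

def classify(hit_id, gene_start, gene_end, pos, features):
    """Build the output line for one hit from file 2."""
    found = []
    for ftype, ranges in features:
        if ftype == "gene":
            continue
        lo = min(a for a, _ in ranges)
        hi = max(b for _, b in ranges)
        if lo < gene_start or hi > gene_end:
            continue                    # feature lies outside this gene region
        if any(a <= pos <= b for a, b in ranges) and ftype not in found:
            found.append(ftype)
    if found:
        return f"{hit_id} found_in {','.join(found)}"
    return f"{hit_id} not found Intron"

def report(feature_lines, hit_lines):
    """Classify every parsable hit line; header lines are ignored."""
    features = parse_features(feature_lines)
    out = []
    for raw in hit_lines:
        m = HIT_RE.match(raw.strip())
        if m:
            hit_id, start, end, pos = m.group(1), *map(int, m.groups()[1:])
            out.append(classify(hit_id, start, end, pos, features))
    return out

# Features copied from the file 1 excerpt; the hit lines are made up.
SAMPLE_FEATURES = """gene 39389788..39395643
/gene="RPSA"
/note="Derived by automated computational analysis using
gene prediction method: BestRefseq."
mRNA join(39389788..39389839,39390696..39390861,
39391681..39391799,39393855..39394100,39394750..39394878,
39394997..39395162,39395375..39395643)
/gene="RPSA"
CDS join(39390729..39390861,39391681..39391799,
39393855..39394100,39394750..39394878,39394997..39395162,
39395375..39395469)
/gene="RPSA"
"""
SAMPLE_HITS = """Gene_input_file: f3
1 found_in_gene 39389788..39395643 at 39390800 and_count: 1012
2 found_in_gene 39389788..39395643 at 39390000 and_count: 212
3 found_in_gene 39389788..39395643 at 39389800 and_count: 12
"""

for line in report(SAMPLE_FEATURES.splitlines(), SAMPLE_HITS.splitlines()):
    print(line)
```

For real data, `report(open("gi1"), open("file2.txt"))` should work the same way, with `file2.txt` standing in for whatever the mapped file is called. One caveat of this sketch: a feature of another gene nested inside the region (such as the SNORA6 ncRNA inside RPSA) is also reported as a match.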
Good evening All,
I have a perl script to pull out all occurrences of file names beginning with xx and ending in .p. I will then loop through all 1K files in a directory. I can grep for xx*.p files but it gives me the entire line. I wish to output to a single column with only the hits found. ... (3 Replies)
Create a script that copies files from one specified directory to another specified directory, in the order they were created in the original directory between specified times. Copy the files at a specified interval. (2 Replies)
Hi,
I am logging to a linux server through a user "user1" in /home directory.
There is a script in a directory in 'root' for which all permissions are available including the directory. This script when executed creates a file in the directory.
When the script is added to crontab, on... (1 Reply)
Hey,
I've been trying to break a massive fasta formatted file into files containing each gene separately. Could anyone help me? I've tried to use the following code but I've received errors every time:
for i in *.rtf.out
do
awk '/^>/{f=++d".fasta"} {print > $i.out}' $i
done (1 Reply)
I have a file with
<suit:run date="Trump Tue 06/19/2012 11:41 AM EDT" machine="garg-ln" build="19921" level="beta" release="6.1.5" os="Linux">
Need to find word "build" then
extract build number, which is 19921 also
release number, which is 6.1.5 then
concatenate them to one variable as... (6 Replies)
I have hundreds of files to process. In each file
I need to look for a pattern then
extract value(s) from next line and then
search for value(s) selected from point (2) in the same file at a specific position.
HEADER ELECTRON TRANSPORT 18-MAR-98 1A7V
TITLE CYTOCHROME... (7 Replies)
Hi
This is my third post and I am very impressed with the replies to my previous posts.
Hoping for the same with the query below:
How to find an existing file location and directory location on a Solaris box (1 Reply)
Hi,
I have a log file which is the output from a xml script :
<?xml version="1.0" ?>
<!DOCTYPE svc_result SYSTEM "MLP_SVC_RESULT_320.DTD">
<svc_result ver="3.2.0">
<slia ver="3.0.0">
<pos>
<msid type="MSISDN" enc="ASC">8093078040</msid>
<poserr>
... (4 Replies)
I have the following data set about the snps ID txt file
POS ID
78599583 rs987435
33395779 rs345783
189807684 rs955894
33907909 rs6088791
75664046 rs11180435
218890658 rs17571465
127630276 rs17011450
90919465 rs6919430
and a gene... (7 Replies)
I want to search a small string in a large string and find the locations of the string. For this I used grep "string" -ob <file name where the large string is stored>. Now this gives me the locations of that string. Now how do I store these locations in a text file.
Please use CODE tags as... (7 Replies)