Snps annotation


 
Thread Tools Search this Thread
Top Forums UNIX for Beginners Questions & Answers Snps annotation
# 1  
Old 01-28-2017
Snps annotation

I have the following Snps data
Code:
 CHROM	POS	ID
chr7	78599583	rs987435
chr15	33395779	rs987436
chr1	189807684	rs987437
chr20	33907909	rs987438
chr12	75664046	rs987439

and the following gene data
Code:
genename    name    chrom   strand  txstart txend
MAGI2	NM_001301128	chr7	-	77484309	78920826	77486567	78920572	21
MRPS18C	NM_001297769	chr4	+	84596108	84601900	84596254	84601206	5
MRPS18C	NM_001297767	chr4	+	84596108	84601900	84596254	84601374	5
GCOM1	NM_001018091	chr15	+	55671393	55797047	55671524	55794059	13
FAHD1	NM_001018104	chr16	+	1817225	1830204	1817231	1828128	3
FAM134B	NM_001034850	chr5	-	16526146	16670167	16527849	16670080	9

I'm trying to find the snp whose postion (POS) lie between the txstart and txend in the gene data
so for this data I have the first SNP will lie between MAGI2 txstart and txend

the out put should be
Code:
gene      ID        CHROM  POS       txstart      txend
MAGI2 rs987435 chr7 78599583 77484309 78920826

I have tried the following but it didn't work
Code:
awk '
FNR==1 {next}
FNR == NR {
        POS[++k]=$1
        ID[k]=$2
        next
}
{       for(i = 1; i <= k; i++)
                if(POS[i]>=$5 && POS[i]<=$6)
                        print $1, ID[i], $3, POS[i], $5, $6
}' pre_snpinfo_tumor.txt  refFlat.txt

snps data= pre_snoinfo_tumor.txt
gene dat=refFlat.txt

Last edited by marwah; 01-29-2017 at 05:10 AM..
# 2  
Old 01-29-2017
You might have mentioned that the code you quoted above came from a different thread (Map snps into a ref gene file) in which your first input file had two fields (with headings POS and ID) instead of the three fields (with headings CHROM, POS, and ID) that you are showing in this thread.

Did you think that I wouldn't recognize code I suggested for you a week ago if you reposted it in a different thread without attribution?

Why don't you try to modify that code to print the header line you now say that you want and to align the fields captured from the first input file with where that data is located in your new specification for that input file, instead of just saying that code that you implied you wrote doesn't work?
# 3  
Old 01-29-2017
you are so hard on me , by the way i'm not trying to confuse you or test your memory, I don't know how to modeify the code , If you can help please do so
# 4  
Old 01-29-2017
Quote:
Originally Posted by marwah
you are so hard on me , by the way i'm not trying to confuse you or test your memory, I don't know how to modeify the code , If you can help please do so
I'm not trying to be hard on you. This site is here to help you and many, many others learn how to write code to do what you want or need to do. It is not here to act as your unpaid programming staff.

I didn't mean to imply that you were trying to confuse me or test me. I meant to say that copying someone's code without attribution and implying it is your own is plagiarism.

If you don't understand how that code works and what needs to be done to it to make it work with a different input file format, why don't you look at the awk man page on your system and try to figure out what the code is doing. If you can't figure it out, ask questions and the volunteers here will be happy to try to help you learn how it works and suggest what might need to be changed to work with your new file format.

Someone might also show you how to add a BEGIN clause to your awk script to print the heading you want in your output file. Maybe even something like:
Code:
awk '
BEGIN {	print "gene      ID        CHROM  POS       txstart      txend"
}
FNR == 1 {
	next
}
FNR == NR {
	POS[++k] = $2
	ID[k] = $3
	next
}
{	for(i = 1; i <= k; i++)
		if(POS[i] >= $5 && POS[i] <= $6)
			print $1, ID[i], $3, POS[i], $5, $6
}' pre_snpinfo_tumor.txt  refFlat.txt

As always, since you haven't bothered to tell us what operating system you're using, if you're using a Solaris/SunOS system change awk to /usr/xpg4/bin/awk or nawk.

With the sample input files you provided in post #1 in this thread, the above code produces the output:
Code:
gene      ID        CHROM  POS       txstart      txend
MAGI2 rs987435 chr7 78599583 77484309 78920826

exactly as requested.
# 5  
Old 01-29-2017
I'm sorry I wasn't trying to say that this is my code, I just forgot to write it is not mine, I'm using terminal in mac

---------- Post updated at 11:46 PM ---------- Previous update was at 11:36 PM ----------

is there an error in the code because I didn't get the correct result

---------- Post updated at 11:54 PM ---------- Previous update was at 11:46 PM ----------

Code:
 after for(i = 1; i <= k; i++) it lists all my files in the directory

awk: cmd. line:12: >= $5 && POS[i] <= $6)
awk: cmd. line:12: ^ syntax error
awk: cmd. line:12: >= $5 && POS[i] <= $6)
awk: cmd. line:12:                      ^ syntax error
awk: cmd. line:13: t $1, ID[i], $3, POS[i], $5, $6
awk: cmd. line:13:     ^ syntax error
awk: cmd. line:13: t $1, ID[i], $3, POS[i], $5, $6
awk: cmd. line:13:            ^ syntax error
awk: cmd. line:13: t $1, ID[i], $3, POS[i], $5, $6
awk: cmd. line:13:                        ^ syntax error

error I got

Last edited by marwah; 01-29-2017 at 02:00 AM..
# 6  
Old 01-29-2017
There is absolutely nothing wrong with the code I suggested in post #4 in this thread. When run on macOS Sierra release 10.12.3 using the shells bash, ksh, sh, and zsh it produces exactly the output requested in post #1 in this thread as shown in post #4 in this thread.

One might guess that you either made a mistake when copying the code, you used an editor that converted the UNIX line terminators required in a shell script into DOS or old MAC line separators, or used an editor that converted some of the characters that are part of the syntax of awk statements into "pretty-printed" versions of those characters that are not recognized by awk.

If you are absolutely positive that you didn't make any mistakes when copying the code, please:
  1. tell us exactly what release of OS X or macOS you're using,
  2. tell us what shell you're using,
  3. show us the exact command that you used to invoke that awk script, and
  4. show us the output you get from the command:
    Code:
    od -bc < filename

    (where filename is the name of the file containing your awk script).
# 7  
Old 01-29-2017
1) mac OS 10.12.2
2) shell is /bin/bash

can you explain 3,4 what exactly do you want

I copied and paste the code in the terminal that is it

by the way I'm working under a server from my University using SSH so the data are in that server
and I type unix code to get my result, put in this case it gave me errors

Last edited by marwah; 01-29-2017 at 03:00 AM..
Login or Register to Ask a Question

Previous Thread | Next Thread

2 More Discussions You Might Find Interesting

1. UNIX for Advanced & Expert Users

Map snps into a ref gene file

I have the following data set about the snps ID txt file POS ID 78599583 rs987435 33395779 rs345783 189807684 rs955894 33907909 rs6088791 75664046 rs11180435 218890658 rs17571465 127630276 rs17011450 90919465 rs6919430 and a gene... (7 Replies)
Discussion started by: marwah
7 Replies

2. UNIX for Dummies Questions & Answers

How to average value if they have the same annotation names?

Hi I have a file like this input_fileCR387793 -0.8 CR387793 -5.5 CR387794 -5.3 CR387795 -0.9 AR388755 -3.0 AR388755 3.8 AR388755 4.5 Each line has annotation name and its correlated value. The annotation name and the value are seperated by a space. I want to average the value if the lines... (4 Replies)
Discussion started by: yuejian
4 Replies
Login or Register to Ask a Question