Snps annotation

01-28-2017

Registered User

11, 0

Join Date: Jan 2017

Last Activity: 29 January 2017, 1:48 PM EST

Posts: 11

Thanks Given: 0

Thanked 0 Times in 0 Posts

Snps annotation

I have the following Snps data

Code:

 CHROM	POS	ID
chr7	78599583	rs987435
chr15	33395779	rs987436
chr1	189807684	rs987437
chr20	33907909	rs987438
chr12	75664046	rs987439

and the following gene data

Code:

genename    name    chrom   strand  txstart txend
MAGI2	NM_001301128	chr7	-	77484309	78920826	77486567	78920572	21
MRPS18C	NM_001297769	chr4	+	84596108	84601900	84596254	84601206	5
MRPS18C	NM_001297767	chr4	+	84596108	84601900	84596254	84601374	5
GCOM1	NM_001018091	chr15	+	55671393	55797047	55671524	55794059	13
FAHD1	NM_001018104	chr16	+	1817225	1830204	1817231	1828128	3
FAM134B	NM_001034850	chr5	-	16526146	16670167	16527849	16670080	9

I'm trying to find the snp whose postion (POS) lie between the txstart and txend in the gene data
so for this data I have the first SNP will lie between MAGI2 txstart and txend

the out put should be

Code:

gene      ID        CHROM  POS       txstart      txend
MAGI2 rs987435 chr7 78599583 77484309 78920826

I have tried the following but it didn't work

Code:

awk '
FNR==1 {next}
FNR == NR {
        POS[++k]=$1
        ID[k]=$2
        next
}
{       for(i = 1; i <= k; i++)
                if(POS[i]>=$5 && POS[i]<=$6)
                        print $1, ID[i], $3, POS[i], $5, $6
}' pre_snpinfo_tumor.txt  refFlat.txt

snps data= pre_snoinfo_tumor.txt
gene dat=refFlat.txt

Last edited by marwah; 01-29-2017 at 05:10 AM..

marwah

View Public Profile for marwah

Find all posts by marwah

01-29-2017

Registered User

12,315, 4,560

Join Date: Jul 2012

Last Activity: 22 November 2019, 4:29 PM EST

Location: San Jose, CA, USA

Posts: 12,315

Thanks Given: 952

Thanked 4,560 Times in 3,818 Posts

You might have mentioned that the code you quoted above came from a different thread (Map snps into a ref gene file) in which your first input file had two fields (with headings POS and ID) instead of the three fields (with headings CHROM, POS, and ID) that you are showing in this thread.

Did you think that I wouldn't recognize code I suggested for you a week ago if you reposted it in a different thread without attribution?

Why don't you try to modify that code to print the header line you now say that you want and to align the fields captured from the first input file with where that data is located in your new specification for that input file, instead of just saying that code that you implied you wrote doesn't work?

Don Cragun

View Public Profile for Don Cragun

Find all posts by Don Cragun

01-29-2017

Registered User

11, 0

Join Date: Jan 2017

Last Activity: 29 January 2017, 1:48 PM EST

Posts: 11

Thanks Given: 0

Thanked 0 Times in 0 Posts

you are so hard on me , by the way i'm not trying to confuse you or test your memory, I don't know how to modeify the code , If you can help please do so

marwah

View Public Profile for marwah

Find all posts by marwah

01-29-2017

Registered User

12,315, 4,560

Join Date: Jul 2012

Last Activity: 22 November 2019, 4:29 PM EST

Location: San Jose, CA, USA

Posts: 12,315

Thanks Given: 952

Thanked 4,560 Times in 3,818 Posts

Quote:

Originally Posted by marwah

you are so hard on me , by the way i'm not trying to confuse you or test your memory, I don't know how to modeify the code , If you can help please do so

I'm not trying to be hard on you. This site is here to help you and many, many others learn how to write code to do what you want or need to do. It is not here to act as your unpaid programming staff.

I didn't mean to imply that you were trying to confuse me or test me. I meant to say that copying someone's code without attribution and implying it is your own is plagiarism.

If you don't understand how that code works and what needs to be done to it to make it work with a different input file format, why don't you look at the awk man page on your system and try to figure out what the code is doing. If you can't figure it out, ask questions and the volunteers here will be happy to try to help you learn how it works and suggest what might need to be changed to work with your new file format.

Someone might also show you how to add a BEGIN clause to your awk script to print the heading you want in your output file. Maybe even something like:

Code:

awk '
BEGIN {	print "gene      ID        CHROM  POS       txstart      txend"
}
FNR == 1 {
	next
}
FNR == NR {
	POS[++k] = $2
	ID[k] = $3
	next
}
{	for(i = 1; i <= k; i++)
		if(POS[i] >= $5 && POS[i] <= $6)
			print $1, ID[i], $3, POS[i], $5, $6
}' pre_snpinfo_tumor.txt  refFlat.txt

As always, since you haven't bothered to tell us what operating system you're using, if you're using a Solaris/SunOS system change awk to /usr/xpg4/bin/awk or nawk.

With the sample input files you provided in post #1 in this thread, the above code produces the output:

Code:

gene      ID        CHROM  POS       txstart      txend
MAGI2 rs987435 chr7 78599583 77484309 78920826

exactly as requested.

Don Cragun

View Public Profile for Don Cragun

Find all posts by Don Cragun

01-29-2017

Registered User

11, 0

Join Date: Jan 2017

Last Activity: 29 January 2017, 1:48 PM EST

Posts: 11

Thanks Given: 0

Thanked 0 Times in 0 Posts

I'm sorry I wasn't trying to say that this is my code, I just forgot to write it is not mine, I'm using terminal in mac

---------- Post updated at 11:46 PM ---------- Previous update was at 11:36 PM ----------

is there an error in the code because I didn't get the correct result

---------- Post updated at 11:54 PM ---------- Previous update was at 11:46 PM ----------

Code:

 after for(i = 1; i <= k; i++) it lists all my files in the directory

awk: cmd. line:12: >= $5 && POS[i] <= $6)
awk: cmd. line:12: ^ syntax error
awk: cmd. line:12: >= $5 && POS[i] <= $6)
awk: cmd. line:12:                      ^ syntax error
awk: cmd. line:13: t $1, ID[i], $3, POS[i], $5, $6
awk: cmd. line:13:     ^ syntax error
awk: cmd. line:13: t $1, ID[i], $3, POS[i], $5, $6
awk: cmd. line:13:            ^ syntax error
awk: cmd. line:13: t $1, ID[i], $3, POS[i], $5, $6
awk: cmd. line:13:                        ^ syntax error

error I got

Last edited by marwah; 01-29-2017 at 02:00 AM..

marwah

View Public Profile for marwah

Find all posts by marwah

01-29-2017

Registered User

12,315, 4,560

Join Date: Jul 2012

Last Activity: 22 November 2019, 4:29 PM EST

Location: San Jose, CA, USA

Posts: 12,315

Thanks Given: 952

Thanked 4,560 Times in 3,818 Posts

There is absolutely nothing wrong with the code I suggested in post #4 in this thread. When run on macOS Sierra release 10.12.3 using the shells bash, ksh, sh, and zsh it produces exactly the output requested in post #1 in this thread as shown in post #4 in this thread.

One might guess that you either made a mistake when copying the code, you used an editor that converted the UNIX line terminators required in a shell script into DOS or old MAC line separators, or used an editor that converted some of the characters that are part of the syntax of awk statements into "pretty-printed" versions of those characters that are not recognized by awk.

If you are absolutely positive that you didn't make any mistakes when copying the code, please:

tell us exactly what release of OS X or macOS you're using,
tell us what shell you're using,
show us the exact command that you used to invoke that awk script, and
show us the output you get from the command:
Code:
```
od -bc < filename
```
(where filename is the name of the file containing your awk script).

Don Cragun

View Public Profile for Don Cragun

Find all posts by Don Cragun

01-29-2017

Registered User

11, 0

Join Date: Jan 2017

Last Activity: 29 January 2017, 1:48 PM EST

Posts: 11

Thanks Given: 0

Thanked 0 Times in 0 Posts

1) mac OS 10.12.2
2) shell is /bin/bash

can you explain 3,4 what exactly do you want

I copied and paste the code in the terminal that is it

by the way I'm working under a server from my University using SSH so the data are in that server
and I type unix code to get my result, put in this case it gave me errors

Last edited by marwah; 01-29-2017 at 03:00 AM..

marwah

View Public Profile for marwah

Find all posts by marwah

UNIX for Beginners Questions & Answers

Snps annotation

2 More Discussions You Might Find Interesting

1. UNIX for Advanced & Expert Users

Map snps into a ref gene file

Discussion started by: marwah

2. UNIX for Dummies Questions & Answers

How to average value if they have the same annotation names?

Discussion started by: yuejian