Awk script


 
# 8  
Old 02-20-2011
No change on my side; it still works fine.
Code:
$ cat infile
scaffold_1 phytozome6 gene 12632 13612 . + . ID=POPTR_0001s00200;Name=POPTR_0001s00200
scaffold_1 phytozome6 mRNA 12632 13612 . + . ID=PAC:18235173;Name=POPTR_0001s00200.1;PACid=18235173;Parent=POPTR_0001s00200
scaffold_1 phytozome6 5'-UTR 12632 12638 . + . Parent=PAC:18235173;PACid=18235173
scaffold_1 phytozome6 CDS 12639 12650 . + 0 Parent=PAC:18235173;PACid=18235173
scaffold_1 phytozome6 CDS 12768 12891 . + 0 Parent=PAC:18235173;PACid=18235173
scaffold_1 phytozome6 CDS 13117 13226 . + 2 Parent=PAC:18235173;PACid=18235173
scaffold_1 phytozome6 CDS 13310 13384 . + 0 Parent=PAC:18235173;PACid=18235173
scaffold_1 phytozome6 3'-UTR 13385 13612 . + . Parent=PAC:18235173;PACid=18235173

$ awk -F"[=;]" 'NR==1{s=$2 ".1";print;FS=";";OFS=";";next}{sub(/=.*/,"="s,$1)}1' infile
scaffold_1 phytozome6 gene 12632 13612 . + . ID=POPTR_0001s00200;Name=POPTR_0001s00200
scaffold_1 phytozome6 mRNA 12632 13612 . + . ID=POPTR_0001s00200.1;Name=POPTR_0001s00200.1;PACid=18235173;Parent=POPTR_0001s00200
scaffold_1 phytozome6 5'-UTR 12632 12638 . + . Parent=POPTR_0001s00200.1;PACid=18235173
scaffold_1 phytozome6 CDS 12639 12650 . + 0 Parent=POPTR_0001s00200.1;PACid=18235173
scaffold_1 phytozome6 CDS 12768 12891 . + 0 Parent=POPTR_0001s00200.1;PACid=18235173
scaffold_1 phytozome6 CDS 13117 13226 . + 2 Parent=POPTR_0001s00200.1;PACid=18235173
scaffold_1 phytozome6 CDS 13310 13384 . + 0 Parent=POPTR_0001s00200.1;PACid=18235173
scaffold_1 phytozome6 3'-UTR 13385 13612 . + . Parent=POPTR_0001s00200.1;PACid=18235173

Does your awk support -F"[=;]"?
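One quick way to check this is a one-line test. The sketch below assumes a POSIX-style awk on the PATH; the helper name check_fs and the sample string a=b;c are made up for illustration.

```shell
# Split a made-up record on either "=" or ";" by using a bracket
# expression as the field separator. If regex separators are
# supported, the second field is "b".
check_fs() {
  printf '%s\n' 'a=b;c' | awk -F'[=;]' '{ print $2 }'
}
check_fs
```

If this prints b, the -F"[=;]" form used above should behave the same way on that system.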
# 9  
Old 02-20-2011
Yes, it supports that FS. Thank you for your help, but I want each subset handled separately, keyed on its own POPTR_ name (those are gene names). For example:
Code:
scaffold_997    phytozome6      gene    1687    2351    .       -       .       ID=POPTR_0997s00200;Name=POPTR_0997s00200
scaffold_997    phytozome6      mRNA    1687    2351    .       -       .       ID=PAC:18226942;Name=POPTR_0997s00200.1;PACid=18226942;Parent=POPTR_0997s00200
scaffold_997    phytozome6      CDS     2240    2317    .       -       0       Parent=PAC:18226942;PACid=18226942
scaffold_997    phytozome6      5'-UTR  2318    2351    .       -       .       Parent=PAC:18226942;PACid=18226942
scaffold_997    phytozome6      CDS     2078    2111    .       -       0       Parent=PAC:18226942;PACid=18226942
scaffold_997    phytozome6      3'-UTR  1687    1866    .       -       .       Parent=PAC:18226942;PACid=18226942
scaffold_997    phytozome6      CDS     1867    1997    .       -       2       Parent=PAC:18226942;PACid=18226942

should be changed to
Code:
scaffold_997    phytozome6      gene    1687    2351    .       -       .       ID=POPTR_0997s00200;Name=POPTR_0997s00200
scaffold_997    phytozome6      mRNA    1687    2351    .       -       .       ID=POPTR_0997s00200.1;Name=POPTR_0997s00200.1;PACid=18226942;Parent=POPTR_0997s00200
scaffold_997    phytozome6      CDS     2240    2317    .       -       0       Parent=POPTR_0997s00200.1;PACid=18226942
scaffold_997    phytozome6      5'-UTR  2318    2351    .       -       .       Parent=POPTR_0997s00200.1;PACid=18226942
scaffold_997    phytozome6      CDS     2078    2111    .       -       0       Parent=POPTR_0997s00200.1;PACid=18226942
scaffold_997    phytozome6      3'-UTR  1687    1866    .       -       .       Parent=POPTR_0997s00200.1;PACid=18226942
scaffold_997    phytozome6      CDS     1867    1997    .       -       2       Parent=POPTR_0997s00200.1;PACid=18226942

Do you think you can adjust your command? I really like it and would prefer it to the following one:
Code:
awk '{if(substr($9,1,9)=="ID=POPTR_") n=substr($9,4,16)".1"; gsub("ID=PAC:"substr($9,8,8),"ID="n);
  gsub("Parent=PAC:"substr($9,12,8),"Parent="n); print; }' input>output


Last edited by Scott; 02-20-2011 at 12:47 PM..
# 10  
Old 02-20-2011
I don't care what your code is doing; I wrote mine to produce the output you want.

I tried your latest input with my command and the output is still correct. I don't see any issue.

So you need to point out the error that I should fix.
# 11  
Old 02-21-2011
Here are the input and the expected output. Your code does not generate this output for a file with more than one gene:
Code:
sscaffold_1	phytozome6	gene	12632	13612	.	+	.	ID=POPTR_0001s00200;Name=POPTR_0001s00200
scaffold_1	phytozome6	mRNA	12632	13612	.	+	.	ID=PAC:18235173;Name=POPTR_0001s00200.1;PACid=18235173;Parent=POPTR_0001s00200
scaffold_1	phytozome6	5'-UTR	12632	12638	.	+	.	Parent=PAC:18235173;PACid=18235173
scaffold_1	phytozome6	CDS	12639	12650	.	+	0	Parent=PAC:18235173;PACid=18235173
scaffold_1	phytozome6	CDS	12768	12891	.	+	0	Parent=PAC:18235173;PACid=18235173
scaffold_1	phytozome6	CDS	13117	13226	.	+	2	Parent=PAC:18235173;PACid=18235173
scaffold_1	phytozome6	CDS	13310	13384	.	+	0	Parent=PAC:18235173;PACid=18235173
scaffold_1	phytozome6	3'-UTR	13385	13612	.	+	.	Parent=PAC:18235173;PACid=18235173
scaffold_1	phytozome6	gene	19769	22804	.	+	.	ID=POPTR_0001s00210;Name=POPTR_0001s00210
scaffold_1	phytozome6	mRNA	19769	22804	.	+	.	ID=PAC:18238552;Name=POPTR_0001s00210.1;PACid=18238552;Parent=POPTR_0001s00210
scaffold_1	phytozome6	5'-UTR	19769	19827	.	+	.	Parent=PAC:18238552;PACid=18238552
scaffold_1	phytozome6	CDS	19828	20136	.	+	0	Parent=PAC:18238552;PACid=18238552
scaffold_1	phytozome6	CDS	22190	22516	.	+	0	Parent=PAC:18238552;PACid=18238552
scaffold_1	phytozome6	3'-UTR	22517	22804	.	+	.	Parent=PAC:18238552;PACid=18238552
scaffold_1	phytozome6	gene	74076	75893	.	+	.	ID=POPTR_0001s00220;Name=POPTR_0001s00220
scaffold_1	phytozome6	mRNA	74076	75893	.	+	.	ID=PAC:18237390;Name=POPTR_0001s00220.1;PACid=18237390;Parent=POPTR_0001s00220
scaffold_1	phytozome6	CDS	74076	74235	.	+	0	Parent=PAC:18237390;PACid=18237390
scaffold_1	phytozome6	CDS	74359	74634	.	+	2	Parent=PAC:18237390;PACid=18237390
scaffold_1	phytozome6	CDS	75259	75893	.	+	2	Parent=PAC:18237390;PACid=18237390
scaffold_1	phytozome6	gene	80191	81289	.	-	.	ID=POPTR_0001s00230;Name=POPTR_0001s00230
scaffold_1	phytozome6	mRNA	80191	81289	.	-	.	ID=PAC:18235601;Name=POPTR_0001s00230.1;PACid=18235601;Parent=POPTR_0001s00230
scaffold_1	phytozome6	CDS	81161	81289	.	-	0	Parent=PAC:18235601;PACid=18235601
scaffold_1	phytozome6	CDS	80191	80385	.	-	0	Parent=PAC:18235601;PACid=18235601

Desired output:
Code:
scaffold_1	phytozome6	gene	12632	13612	.	+	.	ID=POPTR_0001s00200;Name=POPTR_0001s00200
scaffold_1	phytozome6	mRNA	12632	13612	.	+	.	ID=POPTR_0001s00200.1;Name=POPTR_0001s00200.1;PACid=18235173;Parent=POPTR_0001s00200
scaffold_1	phytozome6	5'-UTR	12632	12638	.	+	.	Parent=POPTR_0001s00200.1;PACid=18235173
scaffold_1	phytozome6	CDS	12639	12650	.	+	0	Parent=POPTR_0001s00200.1;PACid=18235173
scaffold_1	phytozome6	CDS	12768	12891	.	+	0	Parent=POPTR_0001s00200.1;PACid=18235173
scaffold_1	phytozome6	CDS	13117	13226	.	+	2	Parent=POPTR_0001s00200.1;PACid=18235173
scaffold_1	phytozome6	CDS	13310	13384	.	+	0	Parent=POPTR_0001s00200.1;PACid=18235173
scaffold_1	phytozome6	3'-UTR	13385	13612	.	+	.	Parent=POPTR_0001s00200.1;PACid=18235173
scaffold_1	phytozome6	gene	19769	22804	.	+	.	ID=POPTR_0001s00210;Name=POPTR_0001s00210
scaffold_1	phytozome6	mRNA	19769	22804	.	+	.	ID=POPTR_0001s00210.1;Name=POPTR_0001s00210.1;PACid=18238552;Parent=POPTR_0001s00210
scaffold_1	phytozome6	5'-UTR	19769	19827	.	+	.	Parent=POPTR_0001s00210.1;PACid=18238552
scaffold_1	phytozome6	CDS	19828	20136	.	+	0	Parent=POPTR_0001s00210.1;PACid=18238552
scaffold_1	phytozome6	CDS	22190	22516	.	+	0	Parent=POPTR_0001s00210.1;PACid=18238552
scaffold_1	phytozome6	3'-UTR	22517	22804	.	+	.	Parent=POPTR_0001s00210.1;PACid=18238552
scaffold_1	phytozome6	gene	74076	75893	.	+	.	ID=POPTR_0001s00220;Name=POPTR_0001s00220
scaffold_1	phytozome6	mRNA	74076	75893	.	+	.	ID=POPTR_0001s00220.1;Name=POPTR_0001s00220.1;PACid=18237390;Parent=POPTR_0001s00220
scaffold_1	phytozome6	CDS	74076	74235	.	+	0	Parent=POPTR_0001s00220.1;PACid=18237390
scaffold_1	phytozome6	CDS	74359	74634	.	+	2	Parent=POPTR_0001s00220.1;PACid=18237390
scaffold_1	phytozome6	CDS	75259	75893	.	+	2	Parent=POPTR_0001s00220.1;PACid=18237390
scaffold_1	phytozome6	gene	80191	81289	.	-	.	ID=POPTR_0001s00230;Name=POPTR_0001s00230
scaffold_1	phytozome6	mRNA	80191	81289	.	-	.	ID=POPTR_0001s00230.1;Name=POPTR_0001s00230.1;PACid=18235601;Parent=POPTR_0001s00230
scaffold_1	phytozome6	CDS	81161	81289	.	-	0	Parent=POPTR_0001s00230.1;PACid=18235601
scaffold_1	phytozome6	CDS	80191	80385	.	-	0	Parent=POPTR_0001s00230.1;PACid=18235601

I hope you can clearly see the issue now. Thank you.

Last edited by shen; 02-21-2011 at 02:40 AM..
# 12  
Old 02-21-2011
Code:
awk -F \; '/gene/{split($(NF-1),a,"=");print;next}{sub(/=.*/,"="a[2]".1",$1)}1 ' infile

# 13  
Old 02-22-2011
Code:
awk -F"=|;" '/Name=/{sub($2,$4);v=$2}{$2=v}1' file


Last edited by yinyuemi; 02-22-2011 at 03:40 AM..
# 14  
Old 02-22-2011
Thanks rdcwayx, you are really talented. I just want to remove the space after each ; because your output looks like this:
Code:
Parent=POPTR_0001s00230.1; PACid=18235601
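
The extra space most likely appears because assigning to $1 makes awk rebuild the record with OFS, which defaults to a single space. A minimal sketch of the same command with OFS set to ";" follows; the function name fix_ids is hypothetical, and the /gene/ test is kept from the original command, so it assumes that only gene lines contain the word "gene".

```shell
# Same logic as the one-liner in post #12, with OFS set so that the
# rebuilt record keeps ";" between attributes instead of "; ".
fix_ids() {
  awk 'BEGIN { FS = OFS = ";" }
       # gene line: remember the gene name, print the line unchanged
       /gene/ { split($(NF-1), a, "="); print; next }
       # other lines: replace the ID=/Parent= value in the first field
       { sub(/=.*/, "=" a[2] ".1", $1) }
       1' "$1"
}
```

It would be called as fix_ids infile > outfile. Because a[2] is refreshed on every gene line, each block of mRNA/CDS/UTR lines picks up the name of the gene line before it.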

---------- Post updated at 04:04 AM ---------- Previous update was at 04:01 AM ----------

Quote:
Originally Posted by yinyuemi
Code:
awk -F"=|;" '/Name=/{sub($2,$4);v=$2}{$2=v}1' file

Thank you yinyuemi, but your output is missing the ; and = separators:
Code:
sscaffold_1     phytozome6      gene    12632   13612   .       +       .       ID POPTR_0001s00200 Name POPTR_0001s00200
scaffold_1      phytozome6      mRNA    12632   13612   .       +       .       ID POPTR_0001s00200.1 Name POPTR_0001s00200.1 PACid 18235173 Parent POPTR_0001s00200
scaffold_1      phytozome6      5'-UTR  12632   12638   .       +       .       Parent POPTR_0001s00200.1 PACid 18235173
scaffold_1      phytozome6      CDS     12639   12650   .       +       0       Parent POPTR_0001s00200.1 PACid 18235173
scaffold_1      phytozome6      CDS     12768   12891   .       +       0       Parent POPTR_0001s00200.1 PACid 18235173
scaffold_1      phytozome6      CDS     13117   13226   .       +       2       Parent POPTR_0001s00200.1 PACid 18235173
scaffold_1      phytozome6      CDS     13310   13384   .       +       0       Parent POPTR_0001s00200.1 PACid 18235173
scaffold_1      phytozome6      3'-UTR  13385   13612   .       +       .       Parent POPTR_0001s00200.1 PACid 18235173
scaffold_1      phytozome6      gene    19769   22804   .       +       .       ID POPTR_0001s00210 Name POPTR_0001s00210
scaffold_1      phytozome6      mRNA    19769   22804   .       +       .       ID POPTR_0001s00210.1 Name POPTR_0001s00210.1 PACid 18238552 Parent POPTR_0001s00210
scaffold_1      phytozome6      5'-UTR  19769   19827   .       +       .       Parent POPTR_0001s00210.1 PACid 18238552
scaffold_1      phytozome6      CDS     19828   20136   .       +       0       Parent POPTR_0001s00210.1 PACid 18238552
scaffold_1      phytozome6      CDS     22190   22516   .       +       0       Parent POPTR_0001s00210.1 PACid 18238552
scaffold_1      phytozome6      3'-UTR  22517   22804   .       +       .       Parent POPTR_0001s00210.1 PACid 18238552
scaffold_1      phytozome6      gene    74076   75893   .       +       .       ID POPTR_0001s00220 Name POPTR_0001s00220
scaffold_1      phytozome6      mRNA    74076   75893   .       +       .       ID POPTR_0001s00220.1 Name POPTR_0001s00220.1 PACid 18237390 Parent POPTR_0001s00220
scaffold_1      phytozome6      CDS     74076   74235   .       +       0       Parent POPTR_0001s00220.1 PACid 18237390
scaffold_1      phytozome6      CDS     74359   74634   .       +       2       Parent POPTR_0001s00220.1 PACid 18237390
scaffold_1      phytozome6      CDS     75259   75893   .       +       2       Parent POPTR_0001s00220.1 PACid 18237390
scaffold_1      phytozome6      gene    80191   81289   .       -       .       ID POPTR_0001s00230 Name POPTR_0001s00230
scaffold_1      phytozome6      mRNA    80191   81289   .       -       .       ID POPTR_0001s00230.1 Name POPTR_0001s00230.1 PACid 18235601 Parent POPTR_0001s00230
scaffold_1      phytozome6      CDS     81161   81289   .       -       0       Parent POPTR_0001s00230.1 PACid 18235601
scaffold_1      phytozome6      CDS     80191   80385   .       -       0       Parent POPTR_0001s00230.1 PACid 18235601
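
The separators disappear because FS is a regex matching both = and ;, and assigning to $2 makes awk rebuild the record with the default OFS (a space), so the original delimiters cannot be restored. One possible workaround, sketched below, avoids field assignment and edits $0 directly. The function name rename_pac is hypothetical, and the sketch assumes every PAC id has the form PAC: followed by digits and that the mRNA line carrying Name= comes before its CDS/UTR lines, as in the sample data.

```shell
# Replace ID=PAC:<digits> and Parent=PAC:<digits> with the most recent
# Name= value, without touching any field (so ";" and "=" are preserved).
rename_pac() {
  awk '
    # gene and mRNA lines both carry Name=; the mRNA value (with ".1")
    # is the one still in effect when its child lines are read
    match($0, /Name=[^;]*/) { name = substr($0, RSTART + 5, RLENGTH - 5) }
    {
      sub(/ID=PAC:[0-9]+/, "ID=" name)
      sub(/Parent=PAC:[0-9]+/, "Parent=" name)
      print
    }' "$1"
}
```

It would be called as rename_pac file > output. Since the name is read from the Name= attribute instead of being built by appending .1, the same approach should also cope with transcripts numbered .2 and above, though that case does not occur in the sample shown here.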


Last edited by shen; 02-22-2011 at 05:10 AM..