awk reformat file


 
# 8  
Old 12-05-2013
Thanks Akshay!
Could you please explain your script a little bit more?
# 9  
Old 12-05-2013
Quote:
Originally Posted by yifangt
Thanks Akshay!
Could you please explain your script a little bit more?

/^>/ ? ---> checks whether the line begins with ">". If it does, the next test applies:
NR == 1 ? ---> checks whether this is the first line of the file. If it is, print the line followed by the record separator ($0 RS). If it is not, print the record separator, the line, and the record separator again (RS $0 RS). Here $0 is the current line and RS is "\n".
If the line does not begin with ">", gsub(/[[:space:]]/,x) removes all whitespace from it, and the whole line ($0) is printed without a trailing newline. That is what joins the n sequence rows of a record into one row.
END{printf RS} ---> at the end, print a final "\n".
RS ---> "\n" by default.
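
For readers who find the nested ternary hard to follow, the same logic can be spelled out with explicit if/else. This is only a sketch, not the script posted in the thread: the function name join_fasta is a placeholder, and it uses printf "%s" rather than passing the line as the format string, so that a "%" in a header would be printed literally.

```shell
# Hypothetical helper: joins wrapped FASTA sequence lines into one line per record.
join_fasta() {
    awk '
    /^>/ {                                        # header line
        if (NR == 1) printf "%s%s", $0, RS        # first line of the file: header, newline
        else         printf "%s%s%s", RS, $0, RS  # later headers: end the previous sequence first
        next
    }
    {
        gsub(/[[:space:]]/, "")                   # strip spaces and tabs from the sequence line
        printf "%s", $0                           # no newline, so the next line is appended
    }
    END { printf "%s", RS }                       # terminate the last sequence line
    ' "$@"
}
```

It would be called as join_fasta infile.fasta, or fed from a pipe.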

Last edited by Akshay Hegde; 12-05-2013 at 03:37 PM..
# 10  
Old 12-05-2013
Yes, that space was in your input file. If you can't be sure the input has no unwanted spaces, use a gsub as Akshay Hegde does.
If you are afraid of TAB chars in your first line, use another placeholder char, such as \001:
Code:
awk 'BEGIN{RS=">"} NR>1 {sub("\n","\001"); gsub("\n",""); sub ("\001","\n"); print RS$0}' file
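
The same idea with each step commented, as a sketch. The function name join_fasta_rs is a placeholder. It assumes ">" appears only at the start of header lines, since every ">" starts a new record, and like the one-liner it does not strip spaces.

```shell
# Hypothetical helper: one awk record per FASTA entry, split on ">".
join_fasta_rs() {
    awk '
    BEGIN { RS = ">" }          # each record is now one header plus its sequence lines
    NR > 1 {                    # record 1 is the empty text before the first ">"
        sub("\n", "\001")       # protect the newline that ends the header line
        gsub("\n", "")          # delete every remaining newline, joining the sequence
        sub("\001", "\n")       # restore the header newline
        print RS $0             # put the ">" back; print appends the final newline
    }
    ' "$@"
}
```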

# 11  
Old 12-05-2013
The new version is less brain-twisting, but it has a bug: a number is prefixed to every sequence line that gets joined:
Code:
>YAL069W-1.334 Putative promoter
1CCACACCACACCCACACACCCCACACCACACCCACACACC 0ACACCACACCCACACACACA1ACAGCCCTAATCTAACCCACAGCCCTAATCTAACCC 
>YAL068C-7235.2170 Putative ABC sequence
1TACGAGAATAATTTTACGAGAATAATTT 0ACGTAAATGAAGTT1TATATATAAATATATATAAA 
>gi|31044174|gb|AY143560.1| Tintinnopsis
0GAAACTGCGAATGGCTCATTAAAA0TAATTCTAGAGCTAATACATGCTG0AGCATCTGCTATTGTGGTGACTCATAGT
>gi|31044185|gb|AY143571.1|  
1ATTACCCAATCCTATTACCCAATCCT 0GGGCACCACCAG

I have never used printf combined with the conditional (ternary) operator. I rewrote it as:
Code:
awk '{ printf (/^>/ ? (NR == 1 ? $0 RS : RS $0 RS) : gsub(/[[:space:]]/,$0) $0) }' infile.fasta

The printf () and the x variable confused me very much.
# 12  
Old 12-05-2013
Use the first one. Sorry, I didn't notice it.

Code:
awk '!/^>/{gsub(/[[:space:]]/,x)}{printf /^>/? NR == 1 ? $0 RS : RS $0 RS : $0}END{printf RS}'

gsub(/[[:space:]]/,x) --> if the line contains whitespace, replace it with x. Since x is never set, it is an empty string, so this simply removes the whitespace. You could equally use gsub(/[[:space:]]/,y), gsub(/[[:space:]]/,z), or even just gsub(/[[:space:]]/,"").
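
The stray digits reported earlier in the thread are consistent with gsub's return value: gsub returns the number of substitutions it made, so writing gsub(...) $0 inside the printf expression concatenates that count with the line. A small sketch on made-up input (count_and_strip is just an illustrative name):

```shell
# gsub returns the number of replacements; x is never assigned, so it is "".
count_and_strip() {
    awk '{ n = gsub(/[[:space:]]/, x); print n, $0 }' "$@"
}

printf 'AC GT \nACGT\n' | count_and_strip
# prints:
# 2 ACGT
# 0 ACGT
```

Calling gsub in its own action, before the printf, keeps the count out of the output.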

Last edited by Akshay Hegde; 12-05-2013 at 03:42 PM..
# 13  
Old 01-01-2014
Hello,

I hope this also helps with the same problem.


Code:
awk '/^>/ {if (f != "") print f; print; f=""; next} {f = f $1} END {if (f != "") print f}' file_name

Output will be as follows.

Code:
>YAL069W-1.334 Putative promoter
CCACACCACACCCACACACCACACCACACCCACACACACAACAGCCCTAATCTAACCC
>YAL068C-7235.2170 Putative ABC sequence
TACGAGAATAATTTACGTAAATGAAGTTTATATATAAA
>gi|31044174|gb|AY143560.1| Tintinnopsis
GAAACTGCGAATGGCTCATTAAAATAATTCTAGAGCTAATACATGCTGAGCATCTGCTATTGTGGTGACTCATAGT
>gi|31044185|gb|AY143571.1|
ATTACCCAATCCTGGGCACCACCAG


Where input file is:

Code:
>YAL069W-1.334 Putative promoter
CCACACCACACCCACACACC
ACACCACACCCACACACACA
ACAGCCCTAATCTAACCC
>YAL068C-7235.2170 Putative ABC sequence
TACGAGAATAATTT
ACGTAAATGAAGTT
TATATATAAA
>gi|31044174|gb|AY143560.1| Tintinnopsis
GAAACTGCGAATGGCTCATTAAAA
TAATTCTAGAGCTAATACATGCTG
AGCATCTGCTATTGTGGTGACTCATAGT
>gi|31044185|gb|AY143571.1|
ATTACCCAATCCT
GGGCACCACCAG


Thanks,
R. Singh

Last edited by RavinderSingh13; 01-01-2014 at 11:49 AM..
# 14  
Old 01-01-2014
This is cool too!
Happy New Year!!!