Hello:
I tried a Perl one-liner to reformat a FASTA file.
infile.fasta
outfile.fasta
This reminds me of an old post:
It is one step from what I need, but I could not easily work out the awk script.
Any help, please? Thanks a lot!
Yes, except for the spaces within the sequence rows, which may come from spaces in my original file.
Could you explain the sub() and gsub() calls in the script? I understand the two functions in general, but I am not sure how they work together here.
My assumption: sub("\n","\t") replaces the first newline with a tab; gsub("\n","") removes all the remaining newlines within each record; and sub("\t","\n") then turns the tab back into a newline. Is that right? (What if there is already a tab between the ">" and the DNA sequence?)
I am also nervous about space/tab characters in the header lines (i.e. lines with the ">" character): there may be tabs in a header line.
Thanks!
He is setting RS to >, so awk reads in blocks delimited by > instead of blocks delimited by \n. This means the first "line", as far as awk is concerned, will look like this:
The first sub() matches the first \n it finds, but no further, and changes it to a tab. It does this so the position can be found again later (instead of being removed like the rest).
The gsub() matches all further newlines, deleting them:
The final sub() turns the tab back into a newline:
...then the program prints it, sticking RS -- which is > -- onto the front first.
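The script itself is not reproduced above, so here is a minimal reconstruction from the steps just described. The exact wording of the original one-liner is an assumption, the `NR > 1` guard (to skip the empty record before the first `>`) is my addition, and the sample sequences are made up for illustration.

```shell
# Hypothetical sample input; the sequences are invented for this sketch.
printf '>seq1 desc\nACGT\nTTGA\n>seq2\nGGCC\nAA\n' > infile.fasta

awk 'BEGIN { RS = ">" }      # read blocks delimited by ">" instead of lines
NR > 1 {                     # record 1 is the empty text before the first ">"
    sub("\n", "\t")          # mark the end of the header line with a tab
    gsub("\n", "")           # delete every remaining newline in the record
    sub("\t", "\n")          # turn the first tab back into a newline
    print RS $0              # put the ">" back on the front
}' infile.fasta > outfile.fasta

cat outfile.fasta
```

Each record should come out as one header line followed by one unbroken sequence line. Note that the last sub() replaces the first tab in the record, so a header that already contains a tab would be split at the wrong place.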
This should work nicely if you are using GNU awk on Linux, but awk on other systems may have a record-size limitation of 1 or 2 kilobytes.
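If the record-size limit is a worry, one possible workaround is to leave RS alone and process the file line by line, so no record is ever longer than one input line. This is only a sketch, not the script discussed above; the `pending` flag is a name I made up, and the sample data is invented.

```shell
# Hypothetical sample input; the sequences are invented for this sketch.
printf '>seq1 desc\nACGT\nTTGA\n>seq2\nGGCC\nAA\n' > infile.fasta

awk '
/^>/ {                          # header line
    if (pending) printf "\n"    # finish the previous sequence line first
    pending = 0
    print                       # header is printed unchanged
    next
}
{
    printf "%s", $0             # sequence line: print without a newline
    pending = 1
}
END { if (pending) printf "\n" }
' infile.fasta > outfile.fasta

cat outfile.fasta
```

Because headers are printed untouched, tabs or spaces inside them are not a problem with this approach.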
Thanks Corona688!
Just to confirm: awk works on each record individually, as delimited by RS if RS is set, and otherwise line by line, right?
You submitted your answer while I was replying. My concern is that a tab may be embedded in the first line of each record. I was thinking of treating the first row as the first field ($1) and the rest as another field ($2).
Thanks!
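One way to follow that idea, sketched here under assumptions: set FS to a newline as well, so the header line becomes $1 and every sequence line becomes its own field ($2, $3, ...), and no tab placeholder is needed. Stripping blanks from the sequence is an optional extra aimed at the stray spaces mentioned earlier; the sample data is invented.

```shell
# Hypothetical sample input with a tab in the header and a space in the sequence.
printf '>seq1\tdesc with tab\nACGT\nTT GA\n>seq2\nGGCC\n' > infile.fasta

awk 'BEGIN { RS = ">"; FS = "\n" }   # records split on ">", fields on newline
NR > 1 {                             # skip the empty record before the first ">"
    seq = ""
    for (i = 2; i <= NF; i++)        # join all sequence lines
        seq = seq $i
    gsub(/[ \t]/, "", seq)           # optional: drop blanks inside the sequence
    print ">" $1                     # header is kept exactly as it was
    print seq
}' infile.fasta > outfile.fasta

cat outfile.fasta
```

The header is never modified here, so a tab inside it survives intact. The record-size caveat for non-GNU awk still applies, since each record is still a whole FASTA entry.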