How to append two fasta files?


 
Thread Tools Search this Thread
Top Forums UNIX for Beginners Questions & Answers How to append two fasta files?
# 8  
Old 06-13-2019
Scrutinizer:
Could you elaborate this part ?

Code:
FS=RS; RS=">";

I tried FS=RS=">"; but it did not work. What's the difference between the two: FS=RS; RS=">"; vs FS=RS=">";
# 9  
Old 06-13-2019
Hi yifangt,

The default value for RS is '\n'
So after FS=RS; RS=">" FS is '\n' and RS is '>'

If we use FS=RS=">", then they both become equal to '>'

S.
# 10  
Old 06-13-2019
Post #5 is interesting.
I found it can be enhanced, by a ^ anchor (match at the beginning only), and by inserting a ^ character that becomes an anchor for the following grep
Code:
tr $'\n>' $' \n' <file1|  grep -vf <(sed -n 's/^>/^/p' file2) | cat  <(tr $'\n>' $' \n' <file2) - |  tr  -s $' \n' $'\n>'

Post #7 did not work for me.? Did not further analyze it.
Here is an embedded multi-line awk code that works for me:
Code:
#!/bin/sh
awk '
{
  ishead=($1~/^>/)
}
FNR==NR {
  # file1: store everything in F[head]
  if (ishead) {
     head=$1
     sep=""
  } else {
     F[head]=(F[head] sep $1)
     sep=ORS
  }
  next
}
{
  # file2: print everything, delete corresponding F[head] from file1
  if (ishead) {
    delete F[$1]
  }
  print
}
END {
  # print the remaining F[head] from file1
  for (head in F) {
    print head
    print F[head]
  }
}
' file1 file2

I chose $1 not $0 because I found some trailing spaces in the given examples; spaces are stripped if $1 is used.
# 11  
Old 06-13-2019
Post #7 probably did not work for you because there are excess trailing spaces that need to be removed in the file samples, but that are not going to be present in the actual FASTA files, see the note underneath...

--
Here they are without the spaces:
File1:
Code:
>Contig_1:90600-91187
GACCGTCATCAATTCCTGTTCCTTGCCCTTGACGACCTCATCCACGTCCTTGATGGCCTT
>Contig_24:26615-28387
TTCGCCGCGCTCCAAACGGGCGATCTCCTCGGCGCGGGCCGCCAGGATCAGCGCCG
>Contig_98:35323-35886
GACGAAGCGCTCGCCAAGGCCGAAGAAGAAGGCCTGGATCTGGTCGAAATCCAGCCGCAG               
AAGGCCATCAAGGACGTGGATGAGGTCGTCAAGGGCAAGGA

File2:
Code:
>Contig_1:90600-91187
AAGGCCATCAAGGACGTGGATGAGGTCGTCAAGGGCAAGGAACAGGAATTGATGACGGTC
AAGGCCATCAAGGACGTGGATGAGGTCGTCAAGGGCAAGGAACAGG
AAGGCCATCAAGGACGTGGATGAGGTCGTCAAGGGCAAGGA
>Contig_24:26615-28387
GCTGCGGCGCTGATCCTGGCGGCCCGCGCCGAGGAGATCGCCCGTTTGGAGCGCGGCGAA

Of course another thing is the order that is mixed up when because it is undefined in the array structure in awk. That could easily be fixed of course if need be.

Last edited by Scrutinizer; 06-13-2019 at 06:12 PM..
This User Gave Thanks to Scrutinizer For This Post:
# 12  
Old 06-13-2019
Ah now I see - and you do not have the $1 workaround because you set a different RS...
After trimming the input files it worked.
This User Gave Thanks to MadeInGermany For This Post:
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Help with reformat single-line multi-fasta into multi-line multi-fasta

Input File: >Seq1 ASDADAFASFASFADGSDGFSDFSDFSDFSDFSDFSDFSDFSDFSDFSDFSD >Seq2 SDASDAQEQWEQeqAdfaasd >Seq3 ASDSALGHIUDFJANCAGPATHLACJHPAUTYNJKG ...... Desired Output File >Seq1 ASDADAFASF ASFADGSDGF SDFSDFSDFS DFSDFSDFSD FSDFSDFSDF SD >Seq2 (4 Replies)
Discussion started by: patrick87
4 Replies

2. Shell Programming and Scripting

Append string to all the files inside a directory excluding subdirectories and .zip files

Hii, Could someone help me to append string to the starting of all the filenames inside a directory but it should exclude .zip files and subdirectories. Eg. file1: test1.log file2: test2.log file3 test.zip After running the script file1: string_test1.log file2: string_test2.log file3:... (4 Replies)
Discussion started by: Ravi Kishore
4 Replies

3. Shell Programming and Scripting

Unzip all the files with subdirectories present and append a part of string from the main .zip files

Hi frnds, My requirement is I have a zip file with name say eg: test_ABC_UH_ccde2a_awdeaea_20150422.zip within that there are subdirectories on each directory we again have .zip files and in that we have files like mama20150422.gz and so on. Iam in need of a bash script so that it unzips... (0 Replies)
Discussion started by: Ravi Kishore
0 Replies

4. UNIX for Dummies Questions & Answers

Append file name to fasta file headers in Linux

How do we append the file name to fasta file headers in multiple fasta-files in Linux? (10 Replies)
Discussion started by: Mauve
10 Replies

5. UNIX for Dummies Questions & Answers

Append Files

Hi All, I have to append 2 lines at the end of a text file. If those 2 lines are already there then do not append else append the 2 lines to the text file. Eg: I have a text file, file.txt This text file might look like this, /home/kp/make.jsp /home/pk/model.jsp I have to append... (1 Reply)
Discussion started by: pavan_test
1 Replies

6. UNIX for Dummies Questions & Answers

Breaking a fasta formatted file into multiple files containing each gene separately

Hey, I've been trying to break a massive fasta formatted file into files containing each gene separately. Could anyone help me? I've tried to use the following code but i've recieved errors every time: for i in *.rtf.out do awk '/^>/{f=++d".fasta"} {print > $i.out}' $i done (1 Reply)
Discussion started by: Ann Mc Cartney
1 Replies

7. Shell Programming and Scripting

append to two files

I tried to write a script ( not working) to append first value from mylist to a file called my myfirstResult and to another called mysecondResult awk ' {print $1} >> myfirsResult ' < mylist awk ' {print $1} >> mysecondResult ' < mylist $ cat mylist A 02/16/2012 B 02/19/2012 C... (3 Replies)
Discussion started by: Sara_84
3 Replies

8. UNIX for Dummies Questions & Answers

renaming (renumbering) fasta files

I have a fasta file that looks like this: >Noname ACCAAAATAATTCATGATATACTCAGATCCATCTGAGGGTTTCACCACTTGTAGAGCTAT CAGAAGAATGTCAATCAACTGTCCGAGAAAAAAGAATCCCAGG >Noname ACTATAAACCCTATTTCTCTTTCTAAAAATTGAAATATTAAAGAAACTAGCACTAGCCTG ACCTTTAGCCAGACTTCTCACTCTTAATGCTGCGGACAAACAGA ... I want to... (2 Replies)
Discussion started by: Oyster
2 Replies

9. UNIX for Dummies Questions & Answers

grep FASTA files

I would like to extract the sequences larger than 10 bases but shorter than 18 along with the identifier from a FASTA file that looks like this: > Seq I ACGACTAGACGATAGACGATAGA > Seq 2 ACGATGACGTAGCAGT > Seq 3 ACGATACGAT I know I can extract the IDs alone with the following code grep... (3 Replies)
Discussion started by: Xterra
3 Replies

10. UNIX for Dummies Questions & Answers

append two files

Hi, I have two files where 1 contains data and the other contains strings eg file 1 -0.00000 0.00000 0.00000 0.00000 0.00000 0.80000 0.50000 0.50000 0.60000 0.50000 0.50000 0.20000 -0.00000 0.00000 0.40000 file 2 F F F F F F T T T T T T T T T How to I append file2 to file 1 to... (1 Reply)
Discussion started by: princessotes
1 Replies
Login or Register to Ask a Question