Concatenating sequence length to another file


# 1  
Old 2 Weeks Ago
Concatenating sequence length to another file

I want to add the sequence length of File_1.fa and File_2.fa as the fifth column in File_1_pos.txt and File_2_pos.txt respectively, using awk and bash. Can anyone help me? Thanks

Get sequence length of each file
Code:
File_1.fa
File_2.fa

Add each sequence length as the last (fifth) column of the matching _pos.txt file
File_1_pos.txt
Code:
File_1_pos       253     164
File_1_pos      738     827

File_2_pos.txt
Code:
File_2_pos      1494    1583
File_2_pos      1785    1874

Expected Output
File_1_pos.txt
Code:
1   File_1_pos       253     164    8126
2  File_1_pos      738     827    8126

File_2_pos.txt
Code:
1   File_2_pos      1494    1583    9655
2   File_2_pos      1785    1874    9655

I tried this but I didn't get my expected output
Code:
for file in *.fa; do a=`awk '/^>/ {if (seqlen){print seqlen};next; } { seqlen += length($0)}END{print seqlen}' $file` | awk -F, '{$1=++i OFS"\t" $1;}1' ${file%.*}pos.txt | awk 'BEGIN{OFS="\t"}{print $1,$2,$3,$4,a}'; done


Last edited by Ibk; 2 Weeks Ago at 04:57 PM..
# 2  
Old 2 Weeks Ago
Please post input file samples. How do you calculate the "sequence length"? Is there one or more of them per file?
# 3  
Old 2 Weeks Ago
These are examples of the input sequences. There is just one long sequence per file. Thanks
>File_1.fa
Code:
TTGAAAGGGGGCCCGGGGGATCTCCCCCGCGGTAACTGGTCACAGTTGCCGCGGACGGAGATCATCCCCC
GGTTACCCCCTTTCGACGCGGGTACTGCGATAGTGCCACCCCAGTCCTTCCTACTCCCGACTCCCGACCC
CAACCCAGGTTCCTTGGAACAGGAACACCAATTTATTCATCCCTTGGATGCTGACTAATCAGAGGAACGT
CAGCATTTTCCGGCCCAGGCTAAGAGAAGTAGATAAGTTAGAATCTAAATTATTTATCATCCCCTTGACG
AATTCGCGTTGGAAAAGCACCTCTCACTTGCCGCTCTTCACACCCATCATTCTAATTCGGCCCCTGTGTT

>File_2.fa
Code:
GAGCCCCTTGTTGAAGTGTTTCCCTCCATCGCGACGTGGTTGGAGATCTAAGTTAACCGACTCCGACGAA
ACTACCATCATGCCTCCCCGATTATGTGATGCTTTCTGCCCTGCTGGGTGGAGCATCCTCGGGTTGAGAA
ATCTTTCTTCCTTTTACCTTGGACTCCGGTCCCCCGGTCTAAGCCGCTTGGAATAAGACAGGGTTATCTT
CACTCCTCTTCTTTTCTACTTCACAGTGTTCTATGCTGTGAAAGGGTATGTGTCGCCCCTTCCTTCTTCG


Last edited by vgersh99; 2 Weeks Ago at 03:51 PM..
# 4  
Old 2 Weeks Ago
Try
Code:
$ wc -cl *.fa | awk '
FILENAME == "-" {sub (/\.fa$/, "", $3)
                 T[$3] = $2 - $1
                 next
                }
FNR == 1        {IX = FILENAME
                 sub (/_[^_]*\..*$/, "", IX)
                }

                {print FNR, $0, T[IX] > (FILENAME ".new")
                }
' - OFS="\t" fil*pos.txt

$ cf *.new

---------- file_1_pos.txt.new: ----------

1    File_1_pos    253     164    350
2    File_1_pos    738     827    350

---------- file_2_pos.txt.new: ----------

1    File_2_pos    1494    1583    280
2    File_2_pos    1785    1874    280

Copy exactly as given; then mv the ".new" files over the old ".txt" files
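
The `$2 - $1` step in the script above relies on `wc -cl` printing the line count first and the byte count second, so bytes minus newlines gives the number of sequence characters. A minimal sketch of that arithmetic on a throwaway file follows. The function name `seq_len` and the temporary file are made up for illustration, and it assumes Unix line endings, a newline at the end of every line, and no `>` header line (a header would be counted as sequence):

```shell
#!/bin/sh
# seq_len FILE: print the number of non-newline bytes in FILE.
# Hypothetical helper, not part of the solution posted in the thread.
seq_len() {
    # wc prints the line count before the byte count, whatever the
    # order of the -c and -l options.
    wc -cl < "$1" | awk '{print $2 - $1}'
}

tmp=$(mktemp)
printf 'ACGT\nACG\n' > "$tmp"   # 7 bases + 2 newlines = 9 bytes
seq_len "$tmp"                  # prints 7
rm -f "$tmp"
```

If the real .fa files start with a `>` header, that line has to be excluded before counting, which `wc` alone cannot do.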




EDIT: Given there are any number of .fa files, and each has a corresponding _pos.txt file, you could try
Code:
$ wc -cl *.fa |  
awk '
FILENAME == "-" {if ($3 == "total") next
                 sub (/\.fa$/, "", $3)
                 T[$3] = $2 - $1
                 ARGV[ARGC++] = $3 "_pos.txt"
                 next
                }
FNR == 1        {IX = FILENAME
                 sub (/_[^_]*\..*$/, "", IX)
                }
                {print FNR, $0, T[IX] > (FILENAME ".new")
                }
' - OFS="\t"
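
The `ARGV[ARGC++] = ...` line is what lets this version run without naming the _pos.txt files on the command line: every name read from stdin is appended to awk's argument list, and awk opens those files once stdin is exhausted. A small sketch of just that mechanism, using made-up stand-in files (`A_pos.txt`, `B_pos.txt`) and made-up lengths, and testing `NR == FNR` rather than `FILENAME == "-"` to recognise the stdin pass:

```shell
#!/bin/sh
# Scratch directory with two tiny stand-in position files (hypothetical data).
dir=$(mktemp -d)
printf 'x 1 2\n' > "$dir/A_pos.txt"
printf 'y 3 4\n' > "$dir/B_pos.txt"

# Stdin carries "length name" pairs. While reading it (NR == FNR), each
# name is appended to ARGV, so awk opens <name>_pos.txt afterwards.
result=$(cd "$dir" && printf '10 A\n20 B\n' | awk '
NR == FNR { len[$2] = $1; ARGV[ARGC++] = $2 "_pos.txt"; next }
FNR == 1  { key = FILENAME; sub(/_pos\.txt$/, "", key) }
          { print FNR, $0, len[key] }
' -)
printf '%s\n' "$result"
rm -rf "$dir"
```

This prints `1 x 1 2 10` followed by `1 y 3 4 20`: each file's own line number, its original line, and the length queued for it.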


Last edited by RudiC; 2 Weeks Ago at 05:16 PM..
# 5  
Old 1 Week Ago
Thank you RudiC, I have tried the code but it did not give me the required output. The input files are separate files, and so are the expected output files.
# 6  
Old 1 Week Ago
Code:
for file in *.fa
do
  awk '
  NR==FNR {
     # first file (*.fa): skip FASTA header lines (/^>/), if the real
     # files have any; the posted samples show none
     if ($0 ~ /^>/) next

     seqlen += length($0)    # accumulate the sequence length
     next
  }
  {print FNR, $0, seqlen}    # second file: line number, line, length
  ' OFS="\t" "$file" "${file%.*}_pos.txt" > "t_$$"

  mv -f "t_$$" "${file%.*}_pos.txt"
done
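
For comparison, the one-liner in post #1 fails for three reasons: the `a=...` assignment is itself one stage of a pipeline, so it runs in a subshell and its value is lost; a shell variable is not visible inside a single-quoted awk program unless it is passed in; and `${file%.*}pos.txt` drops the underscore from the file name. A sketch of that approach with the value handed over via `-v`, assuming one sequence per file. The function name `add_len` is made up for illustration, and it writes to stdout instead of overwriting its input:

```shell
#!/bin/sh
# add_len FASTA POS: print POS with a line number prepended and the
# sequence length of FASTA appended (hypothetical helper).
add_len() {
    # Sum the lengths of all non-header lines of the FASTA file.
    len=$(awk '!/^>/ { n += length($0) } END { print n + 0 }' "$1")
    # -v hands the shell value to awk as the awk variable "len".
    awk -v len="$len" -v OFS='\t' '{ print FNR, $0, len }' "$2"
}

# Possible use with the naming scheme from the thread:
# for f in *.fa; do add_len "$f" "${f%.*}_pos.txt" > "${f%.*}_pos.txt.new"; done
```

Writing to a `.new` file first keeps the original _pos.txt intact until the result has been checked.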

This User Gave Thanks to rdrtx1 For This Post:
Ibk (1 Week Ago)
# 7  
Old 1 Week Ago
Quote:
Originally Posted by Ibk
... The input files are separate files as well as the expected output.
So - which files exist, and by what feature are the files connected / related?