Concatenating sequence length to another file


 
# 1  
Old 03-05-2019

I want to add the sequence length of File_1.fa and File_2.fa as the fifth column of File_1_pos.txt and File_2_pos.txt, respectively, using awk and bash. Can anyone help me? Thanks.

Get sequence length of each file
Code:
File_1.fa
File_2.fa

Add the sequence length as the last column of the matching _pos.txt file
File_1_pos.txt
Code:
File_1_pos       253     164
File_1_pos      738     827

File_2_pos.txt
Code:
File_2_pos      1494    1583
File_2_pos      1785    1874

Expected Output
File_1_pos.txt
Code:
1   File_1_pos       253     164    8126
2  File_1_pos      738     827    8126

File_2_pos.txt
Code:
1   File_2_pos      1494    1583    9655
2   File_2_pos      1785    1874    9655

I tried this, but I didn't get my expected output:
Code:
for file in *.fa; do a=`awk '/^>/ {if (seqlen){print seqlen};next; } { seqlen += length($0)}END{print seqlen}' $file` | awk -F, '{$1=++i OFS"\t" $1;}1' ${file%.*}pos.txt | awk 'BEGIN{OFS="\t"}{print $1,$2,$3,$4,a}'; done


Last edited by Ibk; 03-05-2019 at 05:57 PM..
# 2  
Old 03-05-2019
Please post input file samples. How do you calculate the "sequence length"? Is there one or more of them per file?
# 3  
Old 03-05-2019
These are examples of the input sequences. There is just one long sequence per file. Thanks.
>File_1.fa
Code:
TTGAAAGGGGGCCCGGGGGATCTCCCCCGCGGTAACTGGTCACAGTTGCCGCGGACGGAGATCATCCCCC
GGTTACCCCCTTTCGACGCGGGTACTGCGATAGTGCCACCCCAGTCCTTCCTACTCCCGACTCCCGACCC
CAACCCAGGTTCCTTGGAACAGGAACACCAATTTATTCATCCCTTGGATGCTGACTAATCAGAGGAACGT
CAGCATTTTCCGGCCCAGGCTAAGAGAAGTAGATAAGTTAGAATCTAAATTATTTATCATCCCCTTGACG
AATTCGCGTTGGAAAAGCACCTCTCACTTGCCGCTCTTCACACCCATCATTCTAATTCGGCCCCTGTGTT

>File_2.fa
Code:
GAGCCCCTTGTTGAAGTGTTTCCCTCCATCGCGACGTGGTTGGAGATCTAAGTTAACCGACTCCGACGAA
ACTACCATCATGCCTCCCCGATTATGTGATGCTTTCTGCCCTGCTGGGTGGAGCATCCTCGGGTTGAGAA
ATCTTTCTTCCTTTTACCTTGGACTCCGGTCCCCCGGTCTAAGCCGCTTGGAATAAGACAGGGTTATCTT
CACTCCTCTTCTTTTCTACTTCACAGTGTTCTATGCTGTGAAAGGGTATGTGTCGCCCCTTCCTTCTTCG
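
For files like the two samples above, the sequence length is simply the number of residue characters. A minimal sketch of one way to count them with awk, assuming that any FASTA header lines start with ">" and should not be counted (the function name seq_len is made up for illustration):

```shell
# seq_len FILE: print the number of sequence characters in FILE.
# Assumption: header lines start with ">" and are skipped; every other
# line is sequence data, wrapped over any number of lines.
seq_len() {
    awk '!/^>/ { n += length($0) } END { print n + 0 }' "$1"
}
```

For the five-line File_1.fa sample (70 characters per line) this should give 350, the value that also appears in the output of post #4.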


Last edited by vgersh99; 03-05-2019 at 04:51 PM..
# 4  
Old 03-05-2019
Try
Code:
$ wc -cl *.fa | awk '
FILENAME == "-" {sub (/\.fa$/, "", $3)
                 T[$3] = $2 - $1
                 next
                }
FNR == 1        {IX = FILENAME
                 sub (/_[^_]*\..*$/, "", IX)
                }

                {print FNR, $0, T[IX] > (FILENAME ".new")
                }
' - OFS="\t" fil*pos.txt

$ cf *.new

---------- file_1_pos.txt.new: ----------

1    File_1_pos    253     164    350
2    File_1_pos    738     827    350

---------- file_2_pos.txt.new: ----------

1    File_2_pos    1494    1583    280
2    File_2_pos    1785    1874    280

Copy exactly as given; then mv the ".new" files over the old ".txt" files
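
The renaming step can be scripted as well. A minimal sketch, assuming the ".new" files sit in the current directory and are named after the original file plus ".new", as in the output above (the function name replace_with_new is hypothetical):

```shell
# replace_with_new: move every X.new over X in the current directory.
replace_with_new() {
    for new in ./*.new; do
        [ -e "$new" ] || continue     # the glob matched nothing
        mv -f "$new" "${new%.new}"    # strip the ".new" suffix
    done
}
```

Check the ".new" files before calling it, because the old ".txt" files are overwritten without a backup.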




EDIT: Given there are any number of .fa files, and each has a corresponding _pos.txt file, you could try
Code:
$ wc -cl *.fa |  
awk '
FILENAME == "-" {if ($3 == "total") next
                 sub (/\.fa$/, "", $3)
                 T[$3] = $2 - $1
                 ARGV[ARGC++] = $3 "_pos.txt"
                 next
                }
FNR == 1        {IX = FILENAME
                 sub (/_[^_]*\..*$/, "", IX)
                }
                {print FNR, $0, T[IX] > (FILENAME ".new")
                }
' - OFS="\t"
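
Both variants take the length from wc -cl, which prints the line count followed by the byte count; subtracting the first from the second removes the newline characters. The following is only a sketch of that arithmetic. It assumes the file has no ">" header line, no carriage returns, and a newline at the end of every line (the function name wc_seq_len is made up here):

```shell
# wc_seq_len FILE: bytes minus newlines, i.e. the residue count
# under the assumptions stated above.
wc_seq_len() {
    wc -cl < "$1" | awk '{ print $2 - $1 }'
}
```

Five lines of 70 characters are 355 bytes including 5 newlines, hence the 350 shown for the first sample. If the real files do start with a header line, its characters would be counted too, so the result would be too large.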


Last edited by RudiC; 03-05-2019 at 06:16 PM..
# 5  
Old 03-07-2019
Thank you RudiC. I have tried the code, but it did not give me the required output. The input files are separate files, and so are the expected output files.
# 6  
Old 03-07-2019
Code:
for file in *.fa
do
  pos=${file%.*}_pos.txt
  awk '
  NR==FNR {
     # first file (*.fa): skip any ">" header lines, they are not sequence data
     # (is there a record separator /^>/ in the *.fa files not mentioned in samples?)
     if ($0 ~ /^>/) next

     seqlen += length($0)
     next
  }
  # second file (*_pos.txt): line number, original line, sequence length
  {print FNR, $0, seqlen}
  ' OFS="\t" "$file" "$pos" > "t_$$" &&
  mv -f "t_$$" "$pos"
done
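
The attempt in post #1 leaves the last column empty because a shell variable such as a is not visible inside an awk program (and there the assignment is also piped into the next command instead of being run on its own). The value has to be handed over explicitly, for example with -v. A minimal sketch closer to that original approach, assuming one sequence per file and the X.fa / X_pos.txt naming used in this thread (the function name add_len is hypothetical, and it writes to standard output rather than editing in place):

```shell
# add_len FASTA POSFILE: print POSFILE with a leading line number and
# a trailing sequence-length column, tab-separated.
add_len() {
    # Count sequence characters, skipping any ">" header lines.
    len=$(awk '!/^>/ { n += length($0) } END { print n + 0 }' "$1")
    # -v copies the shell value into the awk variable "len".
    awk -v len="$len" -v OFS='\t' '{ print NR, $0, len }' "$2"
}
```

It could be called as add_len File_1.fa File_1_pos.txt > File_1_pos.txt.new, with the result checked before it replaces the original.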

# 7  
Old 03-07-2019
Quote:
Originally Posted by Ibk
... The input files are separate files as well as the expected output.
So - which files exist, and by what feature are the files connected / related?