Today (Saturday) We will make some minor tuning adjustments to MySQL.

You may experience 2 up to 10 seconds "glitch time" when we restart MySQL. We expect to make these adjustments around 1AM Eastern Daylight Saving Time (EDT) US.


Concatenating sequence length to another file


Login or Register to Reply

 
Thread Tools Search this Thread
# 1  
Concatenating sequence length to another file

I want to add the sequence length of File_1.fa and File _2.fa to form the form the fifth column in File_1_pos.txt and File_2_poa.txt respectively using awk and bash. Can anyone help me? Thanks

Get sequence length of each file
Code:
File_1.fa
File_2.fa

Add the sequence length to be the third column of File 3
File_1_pos.txt
Code:
File_1_pos       253     164
File_1_pos      738     827

File_2_pos.txt
Code:
File_2_pos      1494    1583
File_2_pos      1785    1874

Expected Output
File_1_pos.txt
Code:
1   File_1_pos       253     164    8126
2  File_1_pos      738     827    8126

File_2_poa.txt
Code:
1   File_2_pos      1494    1583    9655
2   File_2_pos      1785    1874    9655

I tried this but I dint get my expected output
Code:
for file in *.fa; do a=`awk '/^>/ {if (seqlen){print seqlen};next; } { seqlen += length($0)}END{print seqlen}' $file` | awk -F, '{$1=++i OFS"\t" $1;}1' ${file%.*}pos.txt | awk 'BEGIN{OFS="\t"}{print $1,$2,$3,$4,a}'; done


Last edited by Ibk; 03-05-2019 at 05:57 PM..
# 2  
Please post input file samples. How do you calculate the "sequence length"? Is there one or more of them per file?
# 3  
These are examples of the input sequence. Just one long sequence per file. thanks
>File_1.fa
Code:
TTGAAAGGGGGCCCGGGGGATCTCCCCCGCGGTAACTGGTCACAGTTGCCGCGGACGGAGATCATCCCCC
GGTTACCCCCTTTCGACGCGGGTACTGCGATAGTGCCACCCCAGTCCTTCCTACTCCCGACTCCCGACCC
CAACCCAGGTTCCTTGGAACAGGAACACCAATTTATTCATCCCTTGGATGCTGACTAATCAGAGGAACGT
CAGCATTTTCCGGCCCAGGCTAAGAGAAGTAGATAAGTTAGAATCTAAATTATTTATCATCCCCTTGACG
AATTCGCGTTGGAAAAGCACCTCTCACTTGCCGCTCTTCACACCCATCATTCTAATTCGGCCCCTGTGTT

>File_2.fa
Code:
GAGCCCCTTGTTGAAGTGTTTCCCTCCATCGCGACGTGGTTGGAGATCTAAGTTAACCGACTCCGACGAA
ACTACCATCATGCCTCCCCGATTATGTGATGCTTTCTGCCCTGCTGGGTGGAGCATCCTCGGGTTGAGAA
ATCTTTCTTCCTTTTACCTTGGACTCCGGTCCCCCGGTCTAAGCCGCTTGGAATAAGACAGGGTTATCTT
CACTCCTCTTCTTTTCTACTTCACAGTGTTCTATGCTGTGAAAGGGTATGTGTCGCCCCTTCCTTCTTCG


Last edited by vgersh99; 03-05-2019 at 04:51 PM..
# 4  
Try
Code:
$ wc -cl *.fa | awk '
FILENAME == "-" {sub (".fa", "", $3)
                 T[$3] = $2 - $1
                 next
                }
FNR == 1        {IX = FILENAME
                 sub (/_[^_]*\..*$/, "", IX)
                }

                {print FNR, $0, T[IX] > (FILENAME ".new")
                }
' - OFS="\t" fil*pos.txt

$ cf *.new

---------- file_1_pos.txt.new: ----------

1    File_1_pos    253     164    350
2    File_1_pos    738     827    350

---------- file_2_pos.txt.new: ----------

1    File_2_pos    1494    1583    280
2    File_2_pos    1785    1874    280

Copy exactly as given; then mv the ".new" files over the old ".txt" files




EDIT: Given there are any number of .fa files, and each has a corresponding _pos.txt file, you could try
Code:
$ wc -cl *.fa |  
awk '
FILENAME == "-" {if ($3 == "total") next
                 sub (".fa", "", $3)
                 T[$3] = $2 - $1
                 ARGV[ARGC++] = $3 "_pos.txt"
                 next
                }
FNR == 1        {IX = FILENAME
                 sub (/_[^_]*\..*$/, "", IX)
                }
                {print FNR, $0, T[IX] > (FILENAME ".new")
                }
' - OFS="\t"


Last edited by RudiC; 03-05-2019 at 06:16 PM..
# 5  
Thank you Rudic, I have tried the code but did not give me the required output. The input files are separate files as well as the expected output.
# 6  
Code:
for file in *.fa;
do
  awk '
  NR==FNR {
     if ($0 ~ /^>/ && why_print_seqlen_for_this_line) {if (seqlen) print seqlen; next;}
     # is there a record separator /^>/ in the *.fa files not mentioned in samples?

     seqlen += length($0);
     next;
  }
  {print FNR, $0, seqlen}
  ' OFS="\t" $file ${file%.*}_pos.txt > t_$$

  mv -f t_$$ ${file%.*}_pos.txt
done

This User Gave Thanks to rdrtx1 For This Post:
# 7  
Quote:
Originally Posted by Ibk
... The input files are separate files as well as the expected output.
So - which files exist, and by what feature are the files connected / related?
Login or Register to Reply

|
Thread Tools Search this Thread
Search this Thread:
Advanced Search

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Concatenating 3 files into a single file

I have 3 files File1 C1 C2 c3 File 2 C1 c2 c3 File 3 C1 c2 c3 Now i want to have File1 as C1 c2 c3 I File2 as C1 c2 c3 O File3 as c1 c2 c3 D and these 3 files should be concatenated into a single file how can it be done in unix script? (3 Replies)
Discussion started by: Codesearcher
3 Replies

2. Shell Programming and Scripting

Concatenating contents of a file with members in a directory

Hi, I have a unix file with the below structure - CustId1 CustName1 CustPhn1 /u/home/xmldata/A000001 CustId2 CustName2 CustPhn2 /u/home/xmldata/A000002 CustId3 CustName3 CustPhn3 /u/home/xmldata/A000003 Then I have another unix directory /u/home/xmldata This directory has... (3 Replies)
Discussion started by: Simanto
3 Replies

3. Shell Programming and Scripting

find common entries and match the number with long sequence and cut that sequence in output

Hi all, I have a file like this ID 3BP5L_HUMAN Reviewed; 393 AA. AC Q7L8J4; Q96FI5; Q9BQH8; Q9C0E3; DT 05-FEB-2008, integrated into UniProtKB/Swiss-Prot. DT 05-JUL-2004, sequence version 1. DT 05-SEP-2012, entry version 71. FT COILED 59 140 ... (1 Reply)
Discussion started by: manigrover
1 Replies

4. Shell Programming and Scripting

Flat file-make field length equal to header length

Hello Everyone, I am stuck with one issue while working on abstract flat file which i have to use as input and load data to table. Input Data- ------ ------------------------ ---- ----------------- WFI001 Xxxxxx Control Work Item A Number of Records ------ ------------------------... (5 Replies)
Discussion started by: sonali.s.more
5 Replies

5. Shell Programming and Scripting

Concatenating fixed length lines in shell script

I have a peculiar file with record format like given below. Each line is wrapped to next lines after certain number of characters. I want to concatenate all wrapped lines into 1. Input:(wrapped after 10 columns) This is li ne1 This is li ne2 and this line is too lo ng Shortline ... (8 Replies)
Discussion started by: kmanyam
8 Replies

6. Shell Programming and Scripting

Concatenating File and String for Sendmail

I want o add a variable in addition to a file which will be send with sendmail. I have problems to find the correct syntax for concatenating this variable called $MyVariable. sendmail mai@domain.com </tmp/errormessage.txt $MyVariable] Thanks for your help! (2 Replies)
Discussion started by: high5
2 Replies

7. UNIX for Dummies Questions & Answers

Convert a tab delimited/variable length file to fixed length file

Hi, all. I need to convert a file tab delimited/variable length file in AIX to a fixed lenght file delimited by spaces. This is the input file: 10200002<tab>US$ COM<tab>16/12/2008<tab>2,3775<tab>2,3783 19300978<tab>EURO<tab>16/12/2008<tab>3,28523<tab>3,28657 And this is the expected... (2 Replies)
Discussion started by: Everton_Silveir
2 Replies

8. UNIX for Dummies Questions & Answers

What the command to find out the record length of a fixed length file?

I want to find out the record length of a fixed length file? I forgot the command. Any body know? (9 Replies)
Discussion started by: tranq01
9 Replies

9. Shell Programming and Scripting

Concatenating the two lines in a file

hi My requirement is i have a file with some records like this file name ::xyz a=1 b=100,200 ,300,400 ,500,600 c=700,800 d=900 i want to change my file a=1 b=100,200,300,400 c=700,800 d=900 if record starts with " , " that line should fallows the previous line.please give... (6 Replies)
Discussion started by: srivsn
6 Replies

10. Shell Programming and Scripting

Concatenating values in a File

Hi All, I have a ',' delimited file and i would like concatenate a new value at a specific column. Example :- xXXX,XYZ,20071005,ABC,DEF,123 xXXX,XYZ,20071005,ABC,DEF,123 xXXX,XYZ,20071005,ABC,DEF,123 The output that i want is xXXX,XYZ,20071005001,ABC,DEF,123... (7 Replies)
Discussion started by: amitkhiare
7 Replies

Featured Tech Videos