reformat data with a shell script


 
# 1  
08-07-2009

Can anyone help me with a shell script that can do the following:

I have data in FASTA format: each entry starts with a header line beginning with ">", followed by one or more lines of sequence characters.
Code:
>ALLLY
GGCCCCTCGAGCCTCGAACCGGAACCTCCAAATCCGAGACGCTCTGCTTATGAGGACCTC
GAAATATGCCGGCCAGTGAAAAAATCTTGTGGCTTTGAGGGCTTTTGGTTGGCCAGGGGC
AGTAAAAATCTCGGAGAGCTGACACCAAGTCCTCCCCTGCCACGTAGCAGTGGTAAAGTC
CGAAGCTCAAATTCCGAGAATTGAGCTCTGTTGATTCTTAGAACTGGGGTTCTTAGAAGT
>BLLLK
CTGGTCTCAGTCTGGTACTGAAGTCAGGAATGGCTTAAGGTGAAATCGTGGTCCTCTGGT
GAAGCTCAGCGAAGACCCCCTCGCCTTGTTTATGACAAGAGAACTTCTGGGGGCGGGAGG
AAGAGTCCCTGTTACGATGCTGATCATCATTGAGCTTTTGCTGAGCAGAAAACTCTTTAG
TACTCAAGGTCGAGAGTCTCTGGTGGTCTGCCTGGCACCAGGCACCTTCCTACAACCCTA
GTTTTCCAAAAGGACAAAGCCTGGGGCAGGCGACGTCCTAGCTCGCATTTGAACAGGGCC
GCGGGCCAGCAGAGATGCGCGATGCCCAACTCTTTCCAAGAGCACCTCGCGTCCCGAACC

I want to reformat the data so that the entire sequence of each entry is printed on one line, preceded by the entry name (e.g. ALLLY) and a tab:
Code:
ALLLY GGCCCCTCGAGCCTCGAACCGGAACCTCCAAATCCGAGACGCTCTGCTTATGAGGACCTCGAAATATGCCGGCCAGTGAAAAAATCTTGTGGCTTTGAGGGCTTTTGGTTGGCCAGGGGCAGTAAAAATCTCGGAGAGCTGACACCAAGTCCTCCCCTGCCACGTAGCAGTGGTAAAGTCCGAAGCTCAAATTCCGAGAATTGAGCTCTGTTGATTCTTAGAACTGGGGTTCTTAGAAGT
BLLLK CTGGTCTCAGTCTGGTACTGAAGTCAGGAATGGCTTAAGGTGAAATCGTGGTCCTCTGGTGAAGCTCAGCGAAGACCCCCTCGCCTTGTTTATGACAAGAGAACTTCTGGGGGCGGGAGGAAGAGTCCCTGTTACGATGCTGATCATCATTGAGCTTTTGCTGAGCAGAAAACTCTTTAGTACTCAAGGTCGAGAGTCTCTGGTGGTCTGCCTGGCACCAGGCACCTTCCTACAACCCTAGTTTTCCAAAAGGACAAAGCCTGGGGCAGGCGACGTCCTAGCTCGCATTTGAACAGGGCCGCGGGCCAGCAGAGATGCGCGATGCCCAACTCTTTCCAAGAGCACCTCGCGTCCCGAACC

Any suggestion or working script is highly appreciated.

biobee

Last edited by Franklin52; 08-07-2009 at 02:18 PM.. Reason: adding code tags, please use code tags!
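Whichever solution is used, a quick sanity check is that the output has exactly one line per ">" header and that no sequence characters were lost or duplicated. Below is a minimal sketch of such a check as a shell function; the name check_join and the file names are hypothetical, and it assumes the output separates name and sequence with a tab and ends with a newline.

```shell
# check_join SRC DST (hypothetical helper): succeed only if DST has one line
# per ">" header in SRC and the same number of sequence characters.
check_join() {
    src=$1
    dst=$2
    # Record count: header lines in the input vs. lines in the output.
    headers=$(grep -c '^>' "$src")
    lines=$(wc -l < "$dst" | tr -d ' ')
    # Sequence length: non-header lines of the input vs. field 2 onward
    # of each output line, with newlines removed in both cases.
    in_len=$(grep -v '^>' "$src" | tr -d '\n' | wc -c | tr -d ' ')
    out_len=$(cut -f2- "$dst" | tr -d '\n' | wc -c | tr -d ' ')
    if [ "$headers" -eq "$lines" ] && [ "$in_len" -eq "$out_len" ]; then
        echo "OK: $headers records, $in_len sequence characters"
    else
        echo "MISMATCH: headers=$headers lines=$lines in=$in_len out=$out_len" >&2
        return 1
    fi
}
```

For example: check_join data.txt outputfile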
# 2  
08-07-2009
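Here's what I would do under vi (Vim, where \t in a replacement inserts a tab). Append a tab to each header line, join every sequence line onto the line above it with no added space, then strip the leading ">" (the empty pattern in s/// reuses ^>).
Code:
:g/^>/s/$/\t/
:v/^>/-1j!
:g/^>/s///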

# 3  
08-07-2009
This is a basic job for awk:
Code:
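#!/usr/bin/awk -f
# join.awk: print each FASTA record as "name<TAB>sequence" on one line
/^>/ { if (have) print header "\t" seq    # flush the previous record
       header = substr($0, 2); seq = ""; have = 1; next }
     { seq = seq $0 }                     # append this sequence line
END  { if (have) print header "\t" seq }  # flush the last record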

Code:
chmod a+rx join.awk
./join.awk < inputfile > outputfile
cat some | ./join.awk | somecmd
#...
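A different way to do the same join in awk is to make ">" the record separator, so that each awk record already holds one header line plus its sequence lines. This is a sketch rather than the script above: it assumes ">" appears only at the start of header lines, and the function name fasta2tab is hypothetical.

```shell
# fasta2tab (hypothetical name): print each FASTA record from the named files
# (or stdin) as "name<TAB>sequence" on one line.
fasta2tab() {
    awk 'BEGIN { RS = ">" }             # every > starts a new record
         NR > 1 {                       # record 1 is the empty text before the first >
             n = split($0, line, "\n")  # line[1] is the header, the rest is sequence
             seq = ""
             for (i = 2; i <= n; i++) seq = seq line[i]
             print line[1] "\t" seq
         }' "$@"
}
```

Use it as fasta2tab data.txt > outputfile, or cat some | fasta2tab | somecmd. Splitting each record on newlines rather than on whitespace keeps any description text after the entry name in the first column.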

# 4  
08-07-2009
Here's one way to do it with Perl:

Code:
$
$ cat data.txt
>ALLLY
GGCCCCTCGAGCCTCGAACCGGAACCTCCAAATCCGAGACGCTCTGCTTATGAGGACCTC
GAAATATGCCGGCCAGTGAAAAAATCTTGTGGCTTTGAGGGCTTTTGGTTGGCCAGGGGC
AGTAAAAATCTCGGAGAGCTGACACCAAGTCCTCCCCTGCCACGTAGCAGTGGTAAAGTC
CGAAGCTCAAATTCCGAGAATTGAGCTCTGTTGATTCTTAGAACTGGGGTTCTTAGAAGT
>BLLLK
CTGGTCTCAGTCTGGTACTGAAGTCAGGAATGGCTTAAGGTGAAATCGTGGTCCTCTGGT
GAAGCTCAGCGAAGACCCCCTCGCCTTGTTTATGACAAGAGAACTTCTGGGGGCGGGAGG
AAGAGTCCCTGTTACGATGCTGATCATCATTGAGCTTTTGCTGAGCAGAAAACTCTTTAG
TACTCAAGGTCGAGAGTCTCTGGTGGTCTGCCTGGCACCAGGCACCTTCCTACAACCCTA
GTTTTCCAAAAGGACAAAGCCTGGGGCAGGCGACGTCCTAGCTCGCATTTGAACAGGGCC
GCGGGCCAGCAGAGATGCGCGATGCCCAACTCTTTCCAAGAGCACCTCGCGTCCCGAACC
$
$
$ perl -ne 'chomp; if (/^>/) {s/^>//; print $. != 1 ? "\n":"",$_,"\t"} else {print} END {print "\n"}' data.txt
ALLLY   GGCCCCTCGAGCCTCGAACCGGAACCTCCAAATCCGAGACGCTCTGCTTATGAGGACCTCGAAATATGCCGGCCAGTGAAAAAATCTTGTGGCTTTGAGGGCTTTTGGTTGGCCAGGGGCAGTAAAAATCTCGGAGAGCTGACACCAAGTCCTCCCCTGCCACGTAGCAGTGGTAAAGTCCGAAGCTCAAATTCCGAGAATTGAGCTCTGTTGATTCTTAGAACTGGGGTTCTTAGAAGT
BLLLK   CTGGTCTCAGTCTGGTACTGAAGTCAGGAATGGCTTAAGGTGAAATCGTGGTCCTCTGGTGAAGCTCAGCGAAGACCCCCTCGCCTTGTTTATGACAAGAGAACTTCTGGGGGCGGGAGGAAGAGTCCCTGTTACGATGCTGATCATCATTGAGCTTTTGCTGAGCAGAAAACTCTTTAGTACTCAAGGTCGAGAGTCTCTGGTGGTCTGCCTGGCACCAGGCACCTTCCTACAACCCTAGTTTTCCAAAAGGACAAAGCCTGGGGCAGGCGACGTCCTAGCTCGCATTTGAACAGGGCCGCGGGCCAGCAGAGATGCGCGATGCCCAACTCTTTCCAAGAGCACCTCGCGTCCCGAACC
$
$

tyler_durden
# 5  
08-07-2009
re:Tyler

Hi Tyler,

Thanks for the Perl one-liner. I ran it, and it gives me an error:

perl -ne 'chomp; if (/^>/) {s/^>//; print $. != 1 ? "\n":"",$_,"\t"} else {print} END {print "\n"}' data.txt

Can't find string terminator "'" anywhere before EOF at -e line 1.

---------- Post updated at 08:55 AM ---------- Previous update was at 08:46 AM ----------

Hi Tyler,
It works on Unix, so it's fine now.

Thanks
# 6  
08-09-2009
Code:
sed -n '/^>/{
1{h;}
1!{x;s/\n/	/;s/^>//;s/\n//g;p;d;}
}
/^>/!{
${H;x;s/\n/	/;s/^>//;s/\n//g;p;d;}
$!{H;}
}'

