Sequence in one single line


 
# 1  
Old 06-18-2010

My file looks like this (60 characters per line):
Quote:
>GHXCZCC01AJ8CJ
TTGATGTGCCAGCTGCCGTTGGTGTGTATCAGCTGGATTTTCTGGGACGC
CCCGGGGC
>GHXCZCC01APUO5
TGATGTGCCAGCTGCCGTTGGTGTGTATCAGCTGGATTTCTGGGACGCCC
CGGGGCGA
>GHXCZCC01AQSRP
TTGATGTTGCCAGCTGCCGTTGGTGTGTATCAGCTGGATTTTCTGGGACG
CCCCGGGG
But I need something like this (the entire sequence in one line):
Quote:
>GHXCZCC01AJ8CJ
TTGATGTGCCAGCTGCCGTTGGTGTGTATCAGCTGGATTTTCTGGGACGC CCCGGGGC
>GHXCZCC01APUO5
TGATGTGCCAGCTGCCGTTGGTGTGTATCAGCTGGATTTCTGGGACGCCC CGGGGCGA
>GHXCZCC01AQSRP
TTGATGTTGCCAGCTGCCGTTGGTGTGTATCAGCTGGATTTTCTGGGACG CCCCGGGG
The sequences are of different lengths.
Any help will be very much appreciated!
# 2  
Old 06-18-2010
Try this:
Code:
awk '{printf("%s%s",$0,NR%3==2?" ":"\n")}' file

# 3  
Old 06-18-2010
Franklin

Thank you very much. However, there is a problem with the code. I only need to remove the newline at the end of each sequence line and leave the ID lines intact. With your code, this is what I get:
Quote:
>GHL8OVD01A44BI
TTGATGTGCCAGCTGCCGTTGGTGTTGATCAGCTGGACGTTCTGATTGGCGCCTGGGTTA AGAGACCAGCAAGCCTGCTCACGGTGCGGCTGGCAGCCTCCCCGGTGGTGTGGGTTTTCG
CGTCAACGCCGGCAAATAGCAGCAGCACTACCAAGACCTTTGCCCAGTTCCCCACCATGG
AGAAATACGCTATGCCCCGCCAGGACTCCCCCAGTGGGCACCAGCGATCATGTCCAAGAT GGCTTGTGGGATCCGGAGCAGCTGCGCTACTACCAATGCCGTCGTAGGGGACCAGTTCAT
CATCATATCCTGTCGA
>GHL8OVD01AL2MI TTGATGTGCCAGCTGCCGTTGGTGTTGATCAGCTGGAC
>GHL8OVD01A9GNF
TTGATGTGCCAGCTGCCGTTGGTGTTGATCAGCTGGACGTT >GHL8OVD01AYRUS
TTGATGTGCCAGCTGCCGTTGGTGTTGATCAGCTGG
>GHL8OVD01AL5XY TTGATGTGCCAGCTGCCGTTGGTGTTGATCAGCTGGACGTTCTGATTGGCGCCTGGGTTA
AAGAGACCAGCAAGCCTGCTCACGGTGCGGCTGGCAGCCTCCCCGGTGGTGTGGGTTCTC
GCGTCAACGCCGGCAAAATAGCAGCAGCACTACCAAGACCTTTGCCCAGTTCCCCACCAT GGAGAAATACGCTATGCCCGCCAGGACTCCCCAGTGGGCACCAGCGATCATGTCCAAGAT
GGCTTGTGGGATCCGGAGCAGCTGCGCTACTACCAATGCCGTCGTAGGGGACCAGTTCAT
CATCATATCCTGTCGA >GHL8OVD01BGQC9
TTGATGTGCCAGCTGCCGTTGGTGTTGATCAGCTGGAC
>GHL8OVD01ALZ08 TTGATGTGCCAGCTGCCGTTGGTGTTGATCAGCTGGAC
>GHL8OVD01BP7ZA
TTGATGTGCCAGCTGCCGTTGGTGTTGATCAGCTGGAC >GHL8OVD01AL8W3
TTGATGTGCCAGCTGCCGTTGGTGTTGATCAGCTGGACGTT
>GHL8OVD01ALXG8 TTGATGTGCCAGCTGCCGTTGGTGTTGATCAGCTGGACTTTCTGATTGGCGCCTAGGTCA
AAGAGACCAGCAAACCTGCCCACGGTGCGGCCGGCAGCCTCCCCGGTGGTGTGGGTTTTC
GCGTCAACGCCGGCAAATAGCAGCAGCACTACCAAGACCTTTGCCCAGTTCCCCACCATG GAGAAATACGCTATGCCCGCCAGGACTCCCCAGTGGGCACCAGCGATCATGTCCAAGATG
GCTTGTGGGATCCGGAGCAGCTGCGCTACTACCAATGCCGTCGTAGGGGACCAGTTCATC
ATCATATCCTGTCGA >GHL8OVD01AL5V0
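The output above drifts out of step because the records do not all span three lines, so a fixed NR%3 rule cannot line up with the ">" headers. A quick way to confirm this on a given file is to count the sequence lines under each header. This is only a sketch: the function name count_seq_lines is made up for illustration, and it assumes every header starts with ">" in column one.

```shell
# count_seq_lines: hypothetical helper, not from the thread.
# Prints each header followed by the number of sequence lines under it.
count_seq_lines() {
  awk '
    /^>/ {                        # a new record starts here
      if (id != "") print id, n   # report the previous record
      id = $0; n = 0
      next
    }
    { n++ }                       # any other line is a sequence line
    END { if (id != "") print id, n }
  ' "$@"
}
```

Run it as count_seq_lines file; if the counts differ from record to record, any solution based on a fixed line count will fail on that file.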
# 4  
Old 06-18-2010
You could try something like this:
Code:
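awk '/^>/ {                          # header line
  print (buff ? buff RS : null) $0   # flush the previous sequence, then print the header
  buff = null; next
  }
{ 
  buff = buff ? buff FS $0 : $0      # append this sequence line to the buffer
  }
END { print buff }' infile           # flush the last sequence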

# 5  
Old 06-18-2010
radoulov

Almost there! The problem is that there is a gap every 60 characters, and the sequence should be continuous instead. I have uploaded 3 files (the input, the output and the ideal result) so you can see what is happening and what I would like to accomplish.
Thank you very much!
# 6  
Old 06-18-2010
I believe taking out the "FS" in the following line would do it:

Code:
  buff = buff ? buff FS $0 : $0

Change to:
Code:
  buff = buff ? buff $0 : $0
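For reference, the script from post #4 with that change folded in might look like the sketch below. The wrapper name unwrap_fasta is made up for illustration, and the empty-buffer tests are a small addition so that an empty input prints nothing; the approach itself is the one from post #4.

```shell
# unwrap_fasta: hypothetical wrapper name, not from the thread.
# Joins the wrapped sequence lines of each record into one line.
unwrap_fasta() {
  awk '
    /^>/ {                          # header line
      if (buff != "") print buff    # flush the previous sequence first
      print                         # then print the header unchanged
      buff = ""
      next
    }
    { buff = buff $0 }              # sequence line: append with no separator
    END { if (buff != "") print buff }
  ' "$@"
}
```

Usage would be unwrap_fasta infile > outfile, or as a filter at the end of a pipeline.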

# 7  
Old 06-19-2010
What about this one?
Code:
awk '{printf "%s", (/^>/ ? (NR==1 ? "" : RS) $0 RS : $0)} END {if (NR) print ""}' file
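Whichever version is used, the result can be sanity-checked: in correctly unwrapped output the lines strictly alternate between header and sequence. A minimal sketch of such a check follows; the name check_unwrapped is made up, and it assumes every record has a non-empty sequence.

```shell
# check_unwrapped: hypothetical helper, not from the thread.
# Succeeds (exit status 0) only if lines alternate header, sequence, header, ...
check_unwrapped() {
  awk '
    NR % 2 == 1 && !/^>/ { bad = 1 }   # odd-numbered lines must be headers
    NR % 2 == 0 &&  /^>/ { bad = 1 }   # even-numbered lines must be sequence
    END { exit (bad || NR % 2) }       # an odd line count is also an error
  ' "$@"
}
```

For example, check_unwrapped outfile && echo OK prints OK only when every header is followed by exactly one sequence line.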
