Split single file into multiple files based on the number in the column


 
# 1  
Old 12-20-2009

Dear All,

I would like to split a file of the following format into multiple files based on the number in the 6th column (numbers 1, 2, 3...):
Code:
ATOM      1  N   GLY A   1      -3.198  27.537  -5.958  1.00  0.00           N  
ATOM      2  CA  GLY A   1      -2.199  28.399  -6.617  1.00  0.00           C  
ATOM      3  C   GLY A   1      -2.168  29.706  -5.855  1.00  0.00           C  
ATOM      4  O   GLY A   2      -3.205  30.358  -5.782  1.00  0.00           O  
ATOM      5  H1  GLY A   2      -3.280  26.649  -6.428  1.00  0.00           H  
ATOM      6  HA2 GLY A   3      -1.220  27.923  -6.579  1.00  0.00           H  
ATOM      7  HA3 GLY A   3      -2.492  28.588  -7.649  1.00  0.00           H  
ATOM      8  N   SER A   3      -1.051  30.010  -5.194  1.00  0.00           N  
ATOM      9  CA  SER A   4      -1.141  30.319  -3.777  1.00  0.00           C  
ATOM     10  C   SER A   4       0.107  31.009  -3.229  1.00  0.00           C  
ATOM     11  O   SER A   5       1.081  31.273  -3.935  1.00  0.00           O  
ATOM     12  CB  SER A   5      -1.242  29.003  -2.978  1.00  0.00           C  
ATOM     13  OG  SER A   5      -2.210  28.079  -3.427  1.00  0.00           O  
ATOM     14  H   SER A   5      -0.165  29.571  -5.395  1.00  0.00           H  
ATOM     15  HA  SER A   5      -2.021  30.936  -3.581  1.00  0.00           H  
ATOM     16  HB2 SER A   6      -0.271  28.504  -2.981  1.00  0.00           H  
ATOM     17  HB3 SER A   6      -1.481  29.244  -1.942  1.00  0.00           H

I would like to get separate files based on the information in the 6th column:
Code:
File 1:

ATOM      1  N   GLY A   1      -3.198  27.537  -5.958  1.00  0.00           N  
ATOM      2  CA  GLY A   1      -2.199  28.399  -6.617  1.00  0.00           C  
ATOM      3  C   GLY A   1      -2.168  29.706  -5.855  1.00  0.00           C  
ATOM      4  O   GLY A   2      -3.205  30.358  -5.782  1.00  0.00           O  
ATOM      5  H1  GLY A   2      -3.280  26.649  -6.428  1.00  0.00           H  

File 2:

ATOM      4  O   GLY A   2      -3.205  30.358  -5.782  1.00  0.00           O  
ATOM      5  H1  GLY A   2      -3.280  26.649  -6.428  1.00  0.00           H  
ATOM      6  HA2 GLY A   3      -1.220  27.923  -6.579  1.00  0.00           H  
ATOM      7  HA3 GLY A   3      -2.492  28.588  -7.649  1.00  0.00           H  
ATOM      8  N   SER A   3      -1.051  30.010  -5.194  1.00  0.00           N  

File 3:

ATOM      6  HA2 GLY A   3      -1.220  27.923  -6.579  1.00  0.00           H  
ATOM      7  HA3 GLY A   3      -2.492  28.588  -7.649  1.00  0.00           H  
ATOM      8  N   SER A   3      -1.051  30.010  -5.194  1.00  0.00           N  
ATOM      9  CA  SER A   4      -1.141  30.319  -3.777  1.00  0.00           C  
ATOM     10  C   SER A   4       0.107  31.009  -3.229  1.00  0.00           C  

File 4:

ATOM      9  CA  SER A   4      -1.141  30.319  -3.777  1.00  0.00           C  
ATOM     10  C   SER A   4       0.107  31.009  -3.229  1.00  0.00           C  
ATOM     11  O   SER A   5       1.081  31.273  -3.935  1.00  0.00           O  
ATOM     12  CB  SER A   5      -1.242  29.003  -2.978  1.00  0.00           C  
ATOM     13  OG  SER A   5      -2.210  28.079  -3.427  1.00  0.00           O  
ATOM     14  H   SER A   5      -0.165  29.571  -5.395  1.00  0.00           H  
ATOM     15  HA  SER A   5      -2.021  30.936  -3.581  1.00  0.00           H  

File 5:

ATOM     11  O   SER A   5       1.081  31.273  -3.935  1.00  0.00           O  
ATOM     12  CB  SER A   5      -1.242  29.003  -2.978  1.00  0.00           C  
ATOM     13  OG  SER A   5      -2.210  28.079  -3.427  1.00  0.00           O  
ATOM     14  H   SER A   5      -0.165  29.571  -5.395  1.00  0.00           H  
ATOM     15  HA  SER A   5      -2.021  30.936  -3.581  1.00  0.00           H  
ATOM     16  HB2 SER A   6      -0.271  28.504  -2.981  1.00  0.00           H  
ATOM     17  HB3 SER A   6      -1.481  29.244  -1.942  1.00  0.00           H

I would be very grateful if you could please write me a few lines of bash/awk/sed/csplit code that goes through the file and outputs multiple files. The file format given above (PDB) is used to describe 3D protein structures.

I thank you for your help in advance.

Thanks,
Tomas

Last edited by Scott; 12-20-2009 at 09:21 PM.. Reason: Please use code tags
# 2  
Old 12-20-2009
Hi,

You mentioned:

"...multiple files based on the number in the 6th column (numbers 1, 2, 3...):"

But your expected output looks different: each file there holds the lines for two consecutive numbers.

Do you mean this?

Code:
$ awk '{print > ($6 ".txt")}' input.txt

Code:
$ cat 1.txt
ATOM      1  N   GLY A   1      -3.198  27.537  -5.958  1.00  0.00           N
ATOM      2  CA  GLY A   1      -2.199  28.399  -6.617  1.00  0.00           C
ATOM      3  C   GLY A   1      -2.168  29.706  -5.855  1.00  0.00           C

$ cat 2.txt
ATOM      4  O   GLY A   2      -3.205  30.358  -5.782  1.00  0.00           O
ATOM      5  H1  GLY A   2      -3.280  26.649  -6.428  1.00  0.00           H

Hope this helps.
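
If the input has many distinct values in column 6, awk can run into the per-process limit on open files. A minimal sketch that sidesteps this by appending each line and closing the output file straight away, so only one file is open at a time. The name input.txt and the three sample records (taken from the question) are only for illustration, and because `>>` appends, any old output files are removed first:

```shell
#!/bin/sh
# Three sample records from the question: residue numbers 1, 1 and 2.
cat > input.txt <<'EOF'
ATOM      1  N   GLY A   1      -3.198  27.537  -5.958  1.00  0.00           N
ATOM      2  CA  GLY A   1      -2.199  28.399  -6.617  1.00  0.00           C
ATOM      4  O   GLY A   2      -3.205  30.358  -5.782  1.00  0.00           O
EOF

# ">>" appends, so clear leftovers from an earlier run.
rm -f 1.txt 2.txt

# Append each line to <column 6>.txt and close that file immediately.
awk '{ out = $6 ".txt"; print >> out; close(out) }' input.txt
```

This is slower than keeping the files open, so it is probably only worth it when the number of distinct values is large.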
# 3  
Old 12-21-2009
Code:
awk '{print > ("file" $6)}' infile

...see previous answer.
# 4  
Old 12-21-2009
Code:
$ awk '{print > ("file" $6)} $6 > 1 {print > ("file" ($6 - 1))}' urfile

$ cat file1
ATOM      1  N   GLY A   1      -3.198  27.537  -5.958  1.00  0.00           N
ATOM      2  CA  GLY A   1      -2.199  28.399  -6.617  1.00  0.00           C
ATOM      3  C   GLY A   1      -2.168  29.706  -5.855  1.00  0.00           C
ATOM      4  O   GLY A   2      -3.205  30.358  -5.782  1.00  0.00           O
ATOM      5  H1  GLY A   2      -3.280  26.649  -6.428  1.00  0.00           H
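
To reproduce the expected output from the question exactly (file N holds residues N and N+1, with no file for the last residue on its own), one possible approach is to read the input twice. The sketch below assumes the residue number sits in the fixed PDB columns 23-26, which also works when the chain ID and residue number run together and whitespace splitting on $6 would fail. The names sample.pdb and pair<N> are made up for the example:

```shell
#!/bin/sh
# Sample input: the first eight records from the question (residues 1 to 3).
cat > sample.pdb <<'EOF'
ATOM      1  N   GLY A   1      -3.198  27.537  -5.958  1.00  0.00           N
ATOM      2  CA  GLY A   1      -2.199  28.399  -6.617  1.00  0.00           C
ATOM      3  C   GLY A   1      -2.168  29.706  -5.855  1.00  0.00           C
ATOM      4  O   GLY A   2      -3.205  30.358  -5.782  1.00  0.00           O
ATOM      5  H1  GLY A   2      -3.280  26.649  -6.428  1.00  0.00           H
ATOM      6  HA2 GLY A   3      -1.220  27.923  -6.579  1.00  0.00           H
ATOM      7  HA3 GLY A   3      -2.492  28.588  -7.649  1.00  0.00           H
ATOM      8  N   SER A   3      -1.051  30.010  -5.194  1.00  0.00           N
EOF

# Clear leftovers from an earlier run (hypothetical output names).
rm -f pair0 pair1 pair2 pair3

# Pass 1 finds the largest residue number. Pass 2 writes each line for
# residue r to pair<r> and pair<r-1>, skipping pair0 and pair<max>.
awk '
    { r = substr($0, 23, 4) + 0 }            # residue number, PDB columns 23-26
    NR == FNR { if (r > max) max = r; next } # first pass only
    r < max { print > ("pair" r) }
    r > 1   { print > ("pair" (r - 1)) }
' sample.pdb sample.pdb
```

With this sample, pair1 gets atoms 1 to 5 and pair2 gets atoms 4 to 8, matching "File 1" and "File 2" in the question.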
