Split command


 
# 1  
Old 03-13-2013

Hi, I have a sequence file which looks like this:
Code:
# PH01000000
PH01000000G0240 P.he_genemodel_v1.0 CDS 120721 121773 . - . ID=PH01000000G0240.CDS;Parent=PH01000000G0240
PH01000001G0190 P.he_genemodel_v1.0 mRA 136867 137309 . - . ID=PH01000001G0190.mRNA;Parent=PH01000001G0190
.............................................
PH01278028G0010 P.he_genemodel_v1.0 CDS 27 501.. . - . ID=PH01278028G0010;Description="oereed"
PH01278104G0010 P.he_genemodel_v1.0 CDS 34 171 . - . ID=PH01278104G0010.CDS;Parent=PH01278104G0010

I want to split the first column into 2 columns, taking the first 10 characters as column 1 and the remainder as column 2, and keep the remaining columns as they are.

Code:
PH01000000  G0240 P.he_genemodel_v1.0 CDS 120721 121773 . - . ID=PH01000000G0240.CDS;Parent=PH01000000G0240
PH01000001  G0190 P.he_genemodel_v1.0 mRA 136867 137309 . - . ID=PH01000001G0190.mRNA;Parent=PH01000001G0190
.............................................
PH01278028   G0010 P.he_genemodel_v1.0 CDS 27 501.. . - . ID=PH01278028G0010;Description="oereed"
PH01278104   G0010 P.he_genemodel_v1.0 CDS 34 171 . - . ID=PH01278104G0010.CDS;Parent=PH01278104G0010

I am doing this because I want to modify the first column and then merge it again after the modification.
So is it possible to first split the 1st column into 2 and then merge them again after my modification?

What command can I use to split and merge them?

Last edited by Scott; 03-13-2013 at 01:23 PM.. Reason: Please use code tags, and a meaningful thread title
# 2  
Old 03-13-2013
One way would be
Code:
sed 's:.:& :10' file

to split, and
Code:
 sed 's: ::' file

to merge again.
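
As a rough sketch of the round trip, the two commands can be tried on a single line before running them on the whole file. The variable names below (line, split_line, merged_line) are made up for illustration; the sample line is shortened from message #1.

```shell
# Hypothetical sample: one data line shortened from message #1.
line='PH01000000G0240 P.he_genemodel_v1.0 CDS 120721 121773'

# Split: the numeric flag 10 makes s replace only the 10th match of ".",
# i.e. the 10th character, with itself followed by a space.
split_line=$(printf '%s\n' "$line" | sed 's:.:& :10')
printf '%s\n' "$split_line"

# Merge: delete the first space on the line, which is the one added above.
merged_line=$(printf '%s\n' "$split_line" | sed 's: ::')
printf '%s\n' "$merged_line"
```

The merge step removes the first space it finds, so it only undoes the split when the first 10 characters contain no space. A header line such as "# PH01000000" would not survive the round trip unchanged; an address like /^#/! in front of each s command is one way to skip such lines.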
This User Gave Thanks to RudiC For This Post:
# 3  
Old 03-13-2013
What makes you think you need to split the 1st column before modifying it?

Why not just modify the 1st 10 characters on the line instead of splitting, modifying the 1st 10 characters on the line, and merging?
This User Gave Thanks to Don Cragun For This Post:
# 4  
Old 03-13-2013
My required result is:
Code:
string0G0240 P.he_genemodel_v1.0 CDS 120721 121773 . - . ID=PH01000000G0240.CDS;Parent=PH01000000G0240
string1G0190 P.he_genemodel_v1.0 mRA 136867 137309 . - . ID=PH01000001G0190.mRNA;Parent=PH01000001G0190
.............................................
string278028G0010 P.he_genemodel_v1.0 CDS 27 501.. . - . ID=PH01278028G0010;Description="oereed"
string278104G0010 P.he_genemodel_v1.0 CDS 34 171 . - . ID=PH01278104G0010.CDS;Parent=PH01278104G0010

So for this to happen, I need to replace the prefix
PH01 with string directly in the first column.
But if I do that,

entries like
PH01278028G0010 will become string278028G0010, as I require,
while entries like
PH01000000G0240 will become string000000G0240, which I want to be string0G0240.
So I thought I would split after the first 10 characters and do a selective replacement on the first column only.

Is my approach too roundabout for this situation?
Thanks, by the way, for your feedback!
# 5  
Old 03-13-2013
siya,
Your description of what you are trying to do is not at all clear. Looking at the "required result" in message #4 in this thread, I'm guessing that you want to replace PH01 immediately followed by up to four zeros with string. If that is what you want, the following awk script will do that for you:
Code:
awk 'match($1, /^PH010{0,4}/) {
        $1 = "string" substr($1, RLENGTH+1)
}
1' input

If you are using a Solaris/SunOS system, use /usr/xpg4/bin/awk or nawk instead of awk.

If the file input contains the data specified in message #1 in this thread, the output is:
Code:
string0G0240 P.he_genemodel_v1.0 CDS 120721 121773 . - . ID=PH01000000G0240.CDS;Parent=PH01000000G0240
string1G0190 P.he_genemodel_v1.0 mRA 136867 137309 . - . ID=PH01000001G0190.mRNA;Parent=PH01000001G0190
.............................................
string278028G0010 P.he_genemodel_v1.0 CDS 27 501.. . - . ID=PH01278028G0010;Description="oereed"
string278104G0010 P.he_genemodel_v1.0 CDS 34 171 . - . ID=PH01278104G0010.CDS;Parent=PH01278104G0010

which matches what you specified in message #4 in this thread.
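
To see what the {0,4} bound does, here is a small sketch using sed instead of awk (my substitution, not the script above; sed spells the interval as \{0,4\}, and it anchors at the start of the line rather than at $1). The variable names are hypothetical, and the IDs are the first-column values as they appear in message #1. With those six-digit values a bound of 4 appears to leave one zero too many, while a bound of 5 gives the output listed in message #4, so the bound may need adjusting for the real data.

```shell
# First-column values copied from message #1.
ids='PH01000000G0240
PH01000001G0190
PH01278028G0010'

# 0\{0,4\} matches between zero and four zeros right after PH01.
with_four=$(printf '%s\n' "$ids" | sed 's/^PH010\{0,4\}/string/')

# 0\{0,5\} matches up to five zeros, so at least one of the six digits survives.
with_five=$(printf '%s\n' "$ids" | sed 's/^PH010\{0,5\}/string/')

printf 'bound 4:\n%s\n\nbound 5:\n%s\n' "$with_four" "$with_five"
```

The interval is greedy: it takes as many zeros as the bound allows and stops at the first non-zero digit, which is why IDs such as PH01278028 only lose the PH01 prefix.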
This User Gave Thanks to Don Cragun For This Post:
# 6  
Old 03-13-2013
Hi,
Sorry for the confusion!!

I want to basically convert ONLY the first column of my entire sequence
from

Code:
PH01000000G0240                   to                       string0G0240
PH01000001G0190                   to                       string1G0190
PH01000002G0120                   to                       string2G0120

,....
....

PH01270000G0010                   to                       string270000G0010   
PH01278028G0014                   to                       string278028G0014   
PH012781040010                     to                       string278104G0010


With respect to the code, why does it have {0,4} in the initial part?

I didn't understand this part of the code: awk 'match($1, /^PH010{0,4}/)
Please advise.
Thanks

Last edited by siya@; 03-13-2013 at 06:50 PM..
# 7  
Old 03-13-2013
Apparently my script didn't work for you. That is because you won't describe in English the transformation that is to be performed. I explained in my last post what the script I gave you would do. And, it made all of the transformations your 5 examples showed.

But it will not insert the G shown in red in your new example. That G did not appear at all in the 1st string, whether we broke it into an initial 10-character field and a 2nd field with the remaining characters or left it as a single field.

PLEASE explain in English what you are trying to do instead of giving a small set of inconsistent examples!
 