Split command | Unix Linux Forums | UNIX for Dummies Questions & Answers



UNIX for Dummies Questions & Answers If you're not sure where to post a UNIX or Linux question, post it here. All UNIX and Linux newbies welcome !!



#1
03-13-2013
siya@
Registered User
 
Join Date: Jan 2013
Last Activity: 19 June 2014, 11:50 PM EDT
Posts: 23
Thanks: 10
Thanked 0 Times in 0 Posts
Split command

Hi, I have a sequence file that looks like this:

Code:
# PH01000000
PH01000000G0240 P.he_genemodel_v1.0 CDS 120721 121773 . - . ID=PH01000000G0240.CDS;Parent=PH01000000G0240
PH01000001G0190 P.he_genemodel_v1.0 mRA 136867 137309 . - . ID=PH01000001G0190.mRNA;Parent=PH01000001G0190
.............................................
PH01278028G0010 P.he_genemodel_v1.0 CDS 27 501.. . - . ID=PH01278028G0010;Description="oereed"
PH01278104G0010 P.he_genemodel_v1.0 CDS 34 171 . - . ID=PH01278104G0010.CDS;Parent=PH01278104G0010

I want to split the first column into two columns, taking the first 10 characters as column 1 and the remainder as column 2, and keep the remaining columns as they are:


Code:
PH01000000  G0240 P.he_genemodel_v1.0 CDS 120721 121773 . - . ID=PH01000000G0240.CDS;Parent=PH01000000G0240
PH01000001  G0190 P.he_genemodel_v1.0 mRA 136867 137309 . - . ID=PH01000001G0190.mRNA;Parent=PH01000001G0190
.............................................
PH01278028   G0010 P.he_genemodel_v1.0 CDS 27 501.. . - . ID=PH01278028G0010;Description="oereed"
PH01278104   G0010 P.he_genemodel_v1.0 CDS 34 171 . - . ID=PH01278104G0010.CDS;Parent=PH01278104G0010

I am doing this because I want to modify the first column and, after the modification, merge the two parts again.
So is it possible to first split the first column into two and then, after my modification, merge them again?

What command can I use to split and merge them?

Last edited by Scott; 03-13-2013 at 12:23 PM.. Reason: Please use code tags, and a meaningful thread title
#2
03-13-2013
RudiC, Forum Advisor
Registered User
 
Join Date: Jul 2012
Last Activity: 30 July 2014, 4:55 PM EDT
Location: Aachen, Germany
Posts: 3,936
Thanks: 63
Thanked 935 Times in 887 Posts
One way would be
Code:
sed 's:.:& :10' file

to split, and
Code:
sed 's: ::' file

to merge again.
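For anyone trying RudiC's two commands, here is a minimal sketch that wraps them in shell functions and checks that splitting and then merging gives back the original lines. The function names split_first10 and merge_first10 are made up for illustration, and the sample lines are copied from the question. The merge step assumes the first 10 characters contain no spaces and that your modification does not add any, because it simply deletes the first space on each line:

```shell
#!/bin/sh
# Sketch only: hypothetical helper names, sample lines from the question.

split_first10() {
    # Numeric flag 10: replace only the 10th match of "." (the 10th
    # character) with itself ("&") followed by a space.
    sed 's:.:& :10' "$@"
}

merge_first10() {
    # Delete the first space on each line. This is the space added by
    # split_first10 only if the first 10 characters contained no spaces
    # and your modification did not add any.
    sed 's: ::' "$@"
}

sample='PH01000000G0240 P.he_genemodel_v1.0 CDS 120721 121773 . - . ID=PH01000000G0240.CDS;Parent=PH01000000G0240
PH01278104G0010 P.he_genemodel_v1.0 CDS 34 171 . - . ID=PH01278104G0010.CDS;Parent=PH01278104G0010'

printf '%s\n' "$sample" | split_first10

# Splitting and then merging should give back the original lines.
roundtrip=$(printf '%s\n' "$sample" | split_first10 | merge_first10)
if [ "$roundtrip" = "$sample" ]; then
    echo "round trip OK"
fi
```

To work on a real file you could run split_first10 file > split.txt, edit split.txt, and then run merge_first10 split.txt > merged.txt (the file names here are hypothetical).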
#3
03-13-2013
Don Cragun, Forum Staff
Moderator
 
Join Date: Jul 2012
Last Activity: 31 July 2014, 1:20 AM EDT
Location: San Jose, CA, USA
Posts: 4,199
Thanks: 164
Thanked 1,430 Times in 1,213 Posts
What makes you think you need to split the 1st column before modifying it?

Why not just modify the 1st 10 characters on the line instead of splitting, modifying the 1st 10 characters on the line, and merging?
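Don's suggestion can be sketched without any split/merge step: take the first 10 characters with substr(), rewrite them, and print them back in front of the rest of the line. The rewrite rule used here (PH01 plus a six-digit number becomes string plus that number without its leading zeros) is an assumption based on the examples siya@ posts later in this thread, and the function name edit_first10 is made up for illustration:

```shell
#!/bin/sh
# Sketch: edit the first 10 characters of each line in place (no split).
# Assumed rule: PH01 + six digits -> "string" + that number without its
# leading zeros. Replace the body of the if-block with your own edit.

edit_first10() {
    awk '{
        head = substr($0, 1, 10)   # e.g. PH01000001
        tail = substr($0, 11)      # e.g. G0190 P.he_genemodel_v1.0 ...
        if (head ~ /^PH01[0-9][0-9][0-9][0-9][0-9][0-9]$/) {
            # Adding 0 turns "000001" into the number 1, dropping the zeros.
            head = "string" (substr(head, 5) + 0)
        }
        print head tail
    }' "$@"
}

printf '%s\n' \
    'PH01000000G0240 P.he_genemodel_v1.0 CDS 120721 121773' \
    'PH01000001G0190 P.he_genemodel_v1.0 mRA 136867 137309' \
    'PH01278028G0010 P.he_genemodel_v1.0 CDS 27 501' |
    edit_first10
```

Because the rest of the line is printed exactly as read, the spacing (or tabs) between the other columns is left untouched.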
#4
03-13-2013
siya@
My required result is:
string0G0240 P.he_genemodel_v1.0 CDS 120721 121773 . - . ID=PH01000000G0240.CDS;Parent=PH01000000G0240
string1G0190 P.he_genemodel_v1.0 mRA 136867 137309 . - . ID=PH01000001G0190.mRNA;Parent=PH01000001G0190
.............................................
string278028G0010 P.he_genemodel_v1.0 CDS 27 501.. . - . ID=PH01278028G0010;Description="oereed"
string278104G0010 P.he_genemodel_v1.0 CDS 34 171 . - . ID=PH01278104G0010.CDS;Parent=PH01278104G0010

So to get this result, I need to replace PH01 with string directly in the first column.
But if I do that,
the entry
PH01278028G0010 becomes string278028G0010, as required,
but the entry
PH01000000G0240 becomes string000000G0240, whereas I want string0G0240.
So I thought I would split after the first 10 characters and do a selective replacement only on the first column.

Is my approach too roundabout?
Thanks for your feedback!!
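The split / modify / merge plan can also run as a single pipeline. A minimal sketch, assuming the edit is the one implied by the required result above (strip PH01 and the leading zeros of the six-digit number, keeping at least one digit); split_edit_merge is a hypothetical name. The first command reproduces the problem with a plain substitution:

```shell
#!/bin/sh
# Sketch of the split / edit / merge idea. The edit rule is inferred from
# the required result, not stated explicitly in the post.

split_edit_merge() {
    # 1) sed inserts a space after character 10: PH01000000 G0240 ...
    # 2) awk edits field 1 only (PH01000000 -> string0); assigning to $1
    #    rebuilds the line with single spaces between fields.
    # 3) sed deletes the first space again: string0G0240 ...
    sed 's:.:& :10' "$@" |
        awk '$1 ~ /^PH01[0-9]+$/ { $1 = "string" (substr($1, 5) + 0) } 1' |
        sed 's: ::'
}

sample='PH01000000G0240 P.he_genemodel_v1.0 CDS 120721 121773
PH01278028G0010 P.he_genemodel_v1.0 CDS 27 501'

echo "Replacing PH01 on the whole ID keeps the zeros:"
printf '%s\n' "$sample" | sed 's/^PH01/string/'

echo "Split, edit column 1, merge:"
printf '%s\n' "$sample" | split_edit_merge
```

Because assigning to $1 makes awk rebuild the line with single spaces, tab-separated input (GFF files are normally tab-separated) would come out space-separated; check which separator your file uses first.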
#5
03-13-2013
Don Cragun
siya,
Your description of what you are trying to do is not at all clear. Looking at the "required result" in message #4 in this thread, I'm guessing that you want to replace PH01 immediately followed by up to four zeros with string. If that is what you want, the following awk script will do that for you:

Code:
awk 'match($1, /^PH010{0,4}/) {
        $1 = "string" substr($1, RLENGTH+1)
}
1' input

If you are using a Solaris/SunOS system, use /usr/xpg4/bin/awk or nawk instead of awk.

If the file named input contains the data specified in message #1 in this thread, the output is:

Code:
string0G0240 P.he_genemodel_v1.0 CDS 120721 121773 . - . ID=PH01000000G0240.CDS;Parent=PH01000000G0240
string1G0190 P.he_genemodel_v1.0 mRA 136867 137309 . - . ID=PH01000001G0190.mRNA;Parent=PH01000001G0190
.............................................
string278028G0010 P.he_genemodel_v1.0 CDS 27 501.. . - . ID=PH01278028G0010;Description="oereed"
string278104G0010 P.he_genemodel_v1.0 CDS 34 171 . - . ID=PH01278104G0010.CDS;Parent=PH01278104G0010

which matches what you specified in message #4 in this thread.
#6
03-13-2013
siya@
Hi,
Sorry for the confusion!!

Basically, I want to convert ONLY the first column of my entire sequence, like this:

PH01000000G0240 to string0G0240
PH01000001G0190 to string1G0190
PH01000002G0120 to string2G0120

...

PH01270000G0010 to string270000G0010
PH01278028G0014 to string278028G0014
PH012781040010 to string278104G0010


Regarding the code, why does it have {0,4} at the beginning?

I didn't understand this part of the code: awk 'match($1, /^PH010{0,4}/)
Please do advise.
Thanks

Last edited by siya@; 03-13-2013 at 05:50 PM..
#7
03-13-2013
Don Cragun
Apparently my script didn't work for you. That is because you won't describe in English the transformation that is to be performed. I explained in my last post what the script I gave you would do, and it made all of the transformations your 5 examples showed.

But it will not insert the G in string278104G0010 in your new example (PH012781040010 to string278104G0010). That G does not appear anywhere in the input string, whether we break it into an initial 10-character field and a second field with the remaining characters, or leave it as a single field.

PLEASE explain in English what you are trying to do instead of giving a small set of inconsistent examples!
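Inconsistencies like the missing G in PH012781040010 can be caught before any renaming is attempted. A minimal sketch, assuming every valid first field is PH01, six digits, G and four digits (a format inferred from the sample lines, not stated in the thread); check_ids is a hypothetical name:

```shell
#!/bin/sh
# Sketch: report lines whose first field does not look like
# PH01 + 6 digits + G + 4 digits (assumed format, from the samples).
# Exits with status 1 if any such line is found.

check_ids() {
    awk '$1 !~ /^PH01[0-9][0-9][0-9][0-9][0-9][0-9]G[0-9][0-9][0-9][0-9]$/ {
        printf "line %d: unexpected ID %s\n", NR, $1
        bad = 1
    }
    END { exit bad }' "$@"
}

printf '%s\n' \
    'PH01000000G0240 P.he_genemodel_v1.0 CDS 120721 121773' \
    'PH012781040010 P.he_genemodel_v1.0 CDS 34 171' |
    check_ids || echo "some IDs need attention"
```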