UNIX for Beginners Questions & Answers

How to split a column based on |?

#1 BioBing, 5 Days Ago

Hi all,

Newbie here, so please bear with my stupid question.

I have spent far too long today trying to figure this out, so I hope that someone here can help me move on.

I have some annotation data for a transcriptome where I want to split a column containing NCBI accession IDs into a column with GIs and a column with REFs (I still want to keep the rest of the columns, just split the IDs).

This is an example of a row in my data set:

Code:
TRINITY_DN17272_c0_g1_i1	gi|242003970|ref|XP_002422928.1|	57.5	127	54	0	2	382	65	191	1.2e-41	176.0

What I want:

Code:
TRINITY_DN17272_c0_g1_i1	242003970   XP_002422928.1   57.5	127	54	0	2	382	65	191	1.2e-41	176.0

I have tried the following:

Code:
cut -d'|' -f2,4 File.m8 | tr '|' '\t'

It splits perfectly, too perfectly, because I lose everything else besides the two columns:

Code:
242003970   XP_002422928.1

Can someone please help me out?

Thanks! Birgitte

#2 RudiC, 4 Days Ago
Your target field separator is not clear - do you want the two elements to be each a field of its own with a <TAB> between them, or both making up one field, separated by a space or two?
For the first case, try

Code:
awk '{split ($2, T, "|"); $2 = T[2] OFS T[4]}1' OFS="\t" file
TRINITY_DN17272_c0_g1_i1    242003970    XP_002422928.1    57.5    127    54    0    2    382    65    191    1.2e-41    176.0
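
The second case (both IDs kept in one field, separated by a space) is not shown above. A minimal sketch of that variant, assuming the input is tab-separated and the IDs always sit in column 2. The sample row is inlined with printf so the snippet runs without the real file, and the variable names are only for illustration:

```shell
# Sample row from post #1, built with printf so the tabs are explicit.
row=$(printf 'TRINITY_DN17272_c0_g1_i1\tgi|242003970|ref|XP_002422928.1|\t57.5\t127\t54\t0\t2\t382\t65\t191\t1.2e-41\t176.0')

# Split the line on tabs only (FS) and write tabs back out (OFS).
# Inside column 2, split on "|" and keep pieces 2 and 4 joined by one space.
one_field=$(printf '%s\n' "$row" |
  awk 'BEGIN { FS = OFS = "\t" } { split($2, T, "|"); $2 = T[2] " " T[4] } 1')

printf '%s\n' "$one_field"
```

Setting FS to a tab as well keeps the row at 12 tab-separated fields, because the space inside column 2 is no longer treated as a separator when the file is read back.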

#3 Aia, 4 Days Ago

Code:
perl -lpae '$F[1] = join "\t", (split /\|/, $F[1])[1,3]; $_ = join "\t", @F' BioBing.example

#4 Corona688, 4 Days Ago
A less brute-force method:

Code:
tr '|' '\t' < infile > outfile

#5 Aia, 4 Days Ago
Quote:
Originally Posted by Corona688 View Post
A less brute-force method:

Code:
tr '|' '\t' < infile > outfile

Hi Corona688,

The OP requires more than just replacing every pipe symbol with a tab.
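
To see why, it may help to number the fields that tr alone leaves behind. A small sketch on the sample row from post #1 (inlined with printf; the variable names are only for illustration):

```shell
# Sample row from post #1, built with printf so the tabs are explicit.
row=$(printf 'TRINITY_DN17272_c0_g1_i1\tgi|242003970|ref|XP_002422928.1|\t57.5\t127\t54\t0\t2\t382\t65\t191\t1.2e-41\t176.0')

# Turn every pipe into a tab, then print each tab-separated field with its index.
numbered=$(printf '%s\n' "$row" | tr '|' '\t' |
  awk -F'\t' '{ for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i }')

printf '%s\n' "$numbered"
```

The gi and ref labels remain as fields 2 and 4, and the trailing pipe leaves an empty field 6, so the row ends up with 16 fields instead of the 13 that were asked for.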

#6 rbatte1, 4 Days Ago
If your columns are fixed, then perhaps you could:
Code:
tab=$(printf "\t")
tr '|' "$tab" < input_file | cut -d "$tab" -f1,3,5,7-

Does that help? What I'm trying to do is convert every | to a tab and then cut out the fields you want, using the tab character as the separator. I hope it does what you wanted. I've set the variable tab just to make it clear. Feel free to replace it with whatever works best for you.

Kind regards,
Robin
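
The tr-then-cut pipeline above can be checked end to end on the sample row. This is only a sketch: it assumes every row has the same gi|...|ref|...| layout as the example, because a row with a different number of pipes would shift the field numbers. The row is inlined with printf and the variable names result and nfields are illustrative:

```shell
# Sample row from post #1, built with printf so the tabs are explicit.
row=$(printf 'TRINITY_DN17272_c0_g1_i1\tgi|242003970|ref|XP_002422928.1|\t57.5\t127\t54\t0\t2\t382\t65\t191\t1.2e-41\t176.0')

tab=$(printf '\t')

# Convert pipes to tabs, then keep field 1, the GI (3), the REF (5),
# and everything from field 7 on; fields 2, 4 and 6 are "gi", "ref" and
# the empty field left by the trailing pipe.
result=$(printf '%s\n' "$row" | tr '|' "$tab" | cut -d "$tab" -f1,3,5,7-)

# Count the tab-separated fields that survive.
nfields=$(printf '%s\n' "$result" | awk -F'\t' '{ print NF }')

printf '%s\n' "$result"
printf 'fields: %s\n' "$nfields"
```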

#7 Aia, 4 Days Ago
Since gi and ref seem to always be there, perhaps another suggestion:

Code:
perl -pe 's/\|?(?:\s+gi|ref)?\|\s*/\t/g' file

Output:

Code:
TRINITY_DN17272_c0_g1_i1	242003970	XP_002422928.1	57.5	127	54	0	2	382	65	191	1.2e-41	176.0