How to split a column based on |?

#1  BioBing (06-19-2017)

Hi all,

Newbie here, so please bear with my stupid question.

I have spent far too long today trying to figure this out, so I hope that someone here can help me move on.

I have some annotation data for a transcriptome where I want to split a column containing NCBI accession IDs into one column with GIs and one with REFs (I still want to keep the rest of the columns, just split the IDs).

This is an example of a row in my data set:

Code:
TRINITY_DN17272_c0_g1_i1	gi|242003970|ref|XP_002422928.1|	57.5	127	54	0	2	382	65	191	1.2e-41	176.0

What I want:

Code:
TRINITY_DN17272_c0_g1_i1	242003970   XP_002422928.1   57.5	127	54	0	2	382	65	191	1.2e-41	176.0

I have tried the following:

Code:
cut -d'|' -f2,4 File.m8 | tr '|' '\t'

It splits perfectly, in fact too perfectly, because I lose everything else besides the two columns:

Code:
242003970   XP_002422928.1

Can someone please help me out?

Thanks! Birgitte
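
A short sketch of why the cut attempt above drops the other columns, using the sample row rebuilt with printf. The variable names (row, first, picked, rest) are only for illustration. With -d'|', cut treats the whole line as pipe-delimited, so the tab-separated columns end up inside the first and last pipe fields, and -f2,4 never selects them.

```shell
# Sample row from the post, rebuilt with real tabs (printf expands \t).
row=$(printf 'TRINITY_DN17272_c0_g1_i1\tgi|242003970|ref|XP_002422928.1|\t57.5\t127\t54\t0\t2\t382\t65\t191\t1.2e-41\t176.0')

# With -d'|' the tabs are ordinary characters, so the line has five fields.
first=$(printf '%s\n' "$row" | cut -d'|' -f1)     # query ID, a tab, and "gi"
picked=$(printf '%s\n' "$row" | cut -d'|' -f2,4)  # the two IDs, still joined by "|"
rest=$(printf '%s\n' "$row" | cut -d'|' -f5)      # a leading tab plus the ten score columns

printf 'field 1:    %s\n' "$first"
printf 'fields 2,4: %s\n' "$picked"
printf 'field 5:    %s\n' "$rest"
```

Anything that should survive therefore has to be selected explicitly, or the split has to be limited to the ID column.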

#2  RudiC (06-19-2017)
Your target field separator is not clear - do you want the two elements to be each a field of its own with a <TAB> between them, or both making up one field, separated by a space or two?
For the first case, try

Code:
awk '{split ($2, T, "|"); $2 = T[2] OFS T[4]}1' OFS="\t" file
TRINITY_DN17272_c0_g1_i1    242003970    XP_002422928.1    57.5    127    54    0    2    382    65    191    1.2e-41    176.0
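
The command above relies on awk's default whitespace splitting for input. A variant that pins both the input and output separators to a tab might look like the sketch below; it assumes the file is strictly tab-separated, and the variable names row and out are only for illustration.

```shell
# Sample row from the first post, rebuilt with real tabs.
row=$(printf 'TRINITY_DN17272_c0_g1_i1\tgi|242003970|ref|XP_002422928.1|\t57.5\t127\t54\t0\t2\t382\t65\t191\t1.2e-41\t176.0')

# FS and OFS are both a tab, so a field containing spaces would stay intact.
out=$(printf '%s\n' "$row" |
  awk 'BEGIN { FS = OFS = "\t" }
       {
         split($2, T, "|")     # T[1]="gi", T[2]=GI number, T[3]="ref", T[4]=accession
         $2 = T[2] OFS T[4]    # assigning to $2 makes awk rebuild the record with OFS
         print
       }')

printf '%s\n' "$out"
```

Only the second column is touched, so a pipe character elsewhere in the line would be left alone.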


#3  Aia (06-19-2017)

Code:
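# -l chomps each input line and puts the newline back on output; without it,
# the re-joined lines would be printed without a line break between them
perl -lpae '$F[1] = join "\t", (split /\|/, $F[1])[1,3] and $_ = join "\t", @F' BioBing.example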

#4  Corona688 (06-19-2017)
A less brute-force method:

Code:
tr '|' '\t' < infile > outfile

#5  Aia (06-19-2017)
Quote:
Originally Posted by Corona688 View Post
A less brute-force method:

Code:
tr '|' '\t' < infile > outfile

Hi Corona688,

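The OP requires more than just replacing every pipe symbol with a tab: tr alone would leave the literal gi and ref labels as columns of their own, plus an empty column from the trailing pipe.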
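
To make the difference concrete, here is a small check on the sample row; the variable names (row, count, extras) are only for illustration. It counts the tab-separated columns that tr alone produces and prints the three that were not asked for.

```shell
# Sample row from the first post, rebuilt with real tabs.
row=$(printf 'TRINITY_DN17272_c0_g1_i1\tgi|242003970|ref|XP_002422928.1|\t57.5\t127\t54\t0\t2\t382\t65\t191\t1.2e-41\t176.0')

# Number of tab-separated columns once every "|" has become a tab.
count=$(printf '%s\n' "$row" | tr '|' '\t' | awk -F'\t' '{ print NF }')

# Columns 2, 4 and 6 are the unwanted ones; commas make the empty one visible.
extras=$(printf '%s\n' "$row" | tr '|' '\t' | cut -f2,4,6 | tr '\t' ',')

printf 'columns after tr alone: %s\n' "$count"
printf 'unwanted columns:       %s\n' "$extras"
```

The requested output has 13 columns, so the three extras still have to be dropped in a second step.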

#6  rbatte1 (06-19-2017)
If your columns are fixed, then perhaps you could:
Code:
tab=$(printf "\t")
tr '|' "$tab" < input_file | cut -d "$tab" -f1,3,5,7-

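Does that help? The idea is to convert every | to a tab and then cut out the fields you want, using the tab as the delimiter. After the conversion, gi and ref are fields 2 and 4 and the trailing | leaves an empty field 6, which is why the selection is 1,3,5,7-. I hope it does what you wanted. I've set the variable tab just to make it clear. Feel free to replace it with whatever works best for you.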




Kind regards,
Robin
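
A quick way to see where the field numbers 1,3,5,7- come from is to number the fields after the tr step. This is only a sketch on the sample row; the variable names row and out are illustrative.

```shell
# Sample row from the first post, rebuilt with real tabs.
row=$(printf 'TRINITY_DN17272_c0_g1_i1\tgi|242003970|ref|XP_002422928.1|\t57.5\t127\t54\t0\t2\t382\t65\t191\t1.2e-41\t176.0')
tab=$(printf '\t')

# One field per line, numbered: 2 is "gi", 4 is "ref", 6 is empty.
printf '%s\n' "$row" | tr '|' "$tab" | tr "$tab" '\n' | head -n 7 |
  awk '{ print NR ": " $0 }'

# Keeping fields 1, 3, 5 and 7 onwards gives the requested layout.
out=$(printf '%s\n' "$row" | tr '|' "$tab" | cut -d "$tab" -f1,3,5,7-)
printf '%s\n' "$out"
```

This only holds if every row has the same gi|...|ref|...| shape and no other pipe characters; a row with a different ID layout would shift the field numbers.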

#7  Aia (06-20-2017)
Since gi and ref seem to always be there, perhaps another suggestion:

Code:
perl -pe 's/\|?(?:\s+gi|ref)?\|\s*/\t/g' file

Output:

Code:
TRINITY_DN17272_c0_g1_i1        242003970       XP_002422928.1  57.5    127     54      0       2       382     65      191     1.2e-41 176.0
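
The same idea can also be sketched with POSIX sed instead of Perl. This is a swapped-in technique, not the command above: it matches the literal gi|...|ref|...| shape once per line and keeps only the two captured IDs. It assumes every ID has exactly that shape; a row with a different layout would pass through unchanged. The variable names row, tab and out are illustrative.

```shell
# Sample row from the first post, rebuilt with real tabs.
row=$(printf 'TRINITY_DN17272_c0_g1_i1\tgi|242003970|ref|XP_002422928.1|\t57.5\t127\t54\t0\t2\t382\t65\t191\t1.2e-41\t176.0')
tab=$(printf '\t')   # literal tab, since \t in a sed replacement is not portable

# \([^|]*\) captures a run of non-pipe characters; the trailing "|" is consumed too.
out=$(printf '%s\n' "$row" | sed "s/gi|\([^|]*\)|ref|\([^|]*\)|/\1${tab}\2/")

printf '%s\n' "$out"
```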
