How to transpose a column into rows?


 
# 1  
Old 12-28-2018

Hello guys,
First of all, happy holidays and happy new year.
I'm new to bioinformatics and this is my first post in this forum, so I apologize in advance for any mistakes.
I'm writing to ask for your help with the following problem.
I have a file like this:

Code:
gene1	GO:0016491|GO:0055114
gene2	GO:0004665|GO:0006571|GO:0008977|GO:0055114
gene3	GO:0005515
gene4	GO:0016491|GO:0055114
gene5	GO:0004427|GO:0009678|GO:0015992|GO:0016020

I have to modify the file in order to have an output like this:
Code:
gene1      GO:0016491
gene1      GO:0055114
gene2      GO:0004665
gene2      GO:0006571
gene2      GO:0008977
gene2      GO:0055114


Could anyone help me to modify the file, please?
Thank you all in advance for your help.
Sal

Moderator's Comments:
Mod Comment edit by bakunin: Please use CODE-tags not only for code but also for data and terminal output. Thank you.

Last edited by bakunin; 12-28-2018 at 06:02 AM..
# 2  
Old 12-28-2018
Quote:
Originally Posted by Salvatore_espos
Could anyone help me to modify the file, please?
Yes. My suggestion is to enter "column" and "row" as keywords into the advanced search form, select "Search titles only" (directly under the keywords) and hit <ENTER>. You will get the same plethora of hits that I got, because this question has been asked here over and over again. If questions remain after that, you are welcome to come back and ask them.

I hope this helps.

bakunin
# 3  
Old 12-28-2018
Hi Bakunin,
thank you for your suggestion. However, I have already searched through different discussions here. There are many examples, but I did not find anything helpful for me.
Could you suggest a specific discussion that I can read in depth, please?
Thank you for replying to me.
# 4  
Old 12-28-2018
Quote:
Originally Posted by Salvatore_espos
There are many examples, but I did not find anything helpful for me.
This is because you didn't describe your problem well enough. I mean, you know what you have and you know what you would like to get out of it, but as far as I can see you haven't thought through what it would take to get from here to there. There is no programming involved, just plain thinking:

You have this:
Code:
gene1	GO:0016491|GO:0055114
gene2	GO:0004665|GO:0006571|GO:0008977|GO:0055114
gene3	GO:0005515
gene4	GO:0016491|GO:0055114
gene5	GO:0004427|GO:0009678|GO:0015992|GO:0016020

and want to get this:

Code:
gene1      GO:0016491
gene1      GO:0055114
gene2      GO:0004665
gene2      GO:0006571
gene2      GO:0008977
gene2      GO:0055114

Now, the first step is: which lines of the output correspond to which line of the input? Obviously this line:

Code:
gene2	GO:0004665|GO:0006571|GO:0008977|GO:0055114

accounts for these:

Code:
gene2      GO:0004665
gene2      GO:0006571
gene2      GO:0008977
gene2      GO:0055114

I suppose the reason why "gene3", "gene4" and "gene5" are missing from what you showed as output is that it is simply a sample and you cut it off somewhere - yes? Or are there filters in place you haven't told us about? If so, which ones?

Now, concentrating on transforming the one line, what did we find:

1) the input line has a first field (like "gene1", "gene2", etc.), which should show up as first part of the respective output line(s). The field is delimited by the start of the line and the first tab character, if i interpret your data correctly.

2) The second field consists of several sub-fields which are delimited by pipe characters ("|"). For every such sub-field there should be a separate line in the output with the first field and the respective sub-field.
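
Points 1) and 2) could be sketched roughly as below. This is only an illustration, assuming the two columns are separated by a single tab as interpreted above; the function name split_go is made up, while split() is standard awk.

```shell
# split_go is a hypothetical helper name; it reads tab-separated lines on stdin.
split_go() {
  awk -F'\t' '{
    n = split($2, go, "|")      # break the second field on the pipe character
    for (i = 1; i <= n; i++)
      print $1 "\t" go[i]       # one output line per sub-field, prefixed by field 1
  }'
}

# Usage example with two lines of the sample data:
printf 'gene1\tGO:0016491|GO:0055114\ngene3\tGO:0005515\n' | split_go
```

A line whose second field is empty yields no output here, because split() returns 0 for an empty string.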

If this description is correct, the code necessary to implement it already "springs out" of it, no? So try that and show your efforts, then we will go over what you have written and - if necessary - correct it. There are some questions you should answer for yourself, though, just to know whether you have to guard against such possibilities in your code:

a) Could there be input lines with no second fields, like "gene2" here:

Code:
gene1	GO:0016491|GO:0055114
gene2	
gene3	GO:0005515

And, if yes, what do you want to do with them?

b) Will the sub-fields in the second field always be of this form ("G"-"O"-":" plus 7 digits 0-9, as in your sample) or might there be something else, like:

Code:
gene1	GO:0016491|who knows|GO:0055114

If yes, what should be done with these?

c) could there be "double entries" like one of these:

Code:
gene1	GO:0016491|GO:0016491
gene2	GO:0005515
gene2	GO:0005515

Again, if yes: what do you want to do with these?

d) What about long lines? Might it happen that the line has so many sub-fields that it is broken into the next line like this:

Code:
gene1	GO:0016491|GO:0016492|GO:0016493|GO:0016494|GO:0016495|....(a long line)
		GO:0016600|GO:0016601
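
To make questions a) to d) concrete, here is one way they could be turned into guards. It is a sketch only: each guard encodes just one possible answer (report and skip, drop duplicates, treat a line with an empty first field as a continuation), and the right policy depends on the data. The helper name split_go_guarded is made up, and tab-separated input is assumed.

```shell
# split_go_guarded is a hypothetical helper; every policy below is an assumption.
split_go_guarded() {
  awk -F'\t' '
    $1 != "" { gene = $1 }               # (d) an empty first field continues the previous gene
    NF < 2 || $NF == "" {                # (a) no second field: report it and move on
      if ($1 != "") print "no GO terms: " $1 > "/dev/stderr"
      next
    }
    {
      n = split($NF, go, "|")            # the GO list is assumed to be the last field
      for (i = 1; i <= n; i++) {
        if (go[i] == "") continue        # empty piece left by a trailing "|"
        if (go[i] !~ /^GO:[0-9]+$/) {    # (b) not "GO:" followed by digits
          print "line " NR ": skipping \"" go[i] "\"" > "/dev/stderr"
          continue
        }
        if (!seen[gene, go[i]]++)        # (c) print each gene/term pair only once
          print gene "\t" go[i]
      }
    }
  '
}
```

Sending the complaints to standard error keeps the regular output clean, so it can still be redirected into a file.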

I am sure you get what this aims at and you surely know your data better than me, so maybe some of my points are moot - but it pays to make oneself aware of the point being moot. So sit down, analyse your problem, try to write some code and show it here. We gladly help, but we help you help yourself, we won't do your work for you.

I hope this helps.

bakunin
# 5  
Old 12-28-2018
Welcome to the forum.



On top of what bakunin suggested, you'll find quite a few threads dealing with similar topics and giving you ideas / starting points at the lower left of this page under "More UNIX and Linux Forum Topics You Might Find Helpful". Especially this one comes close to a solution to your problem.

Last edited by RudiC; 12-28-2018 at 11:50 AM..
# 6  
Old 12-29-2018
Hi guys,
First of all, I would like to say thank you to you both. I'm sorry if I didn't explain the problem well. Of course I had thought about it, but it was not easy to explain it fully. I will try to explain the problem, and what I've done, better.
I read the thread RudiC suggested, and I tried the following code on my table:

Code:
BEGIN { FS = OFS = "|" }
{ for (fld = 1; fld <= NF; fld++) {
  print $0
  }
}

After that my file changed from its native form:
Code:
VIT_AGLc1g09770.1    GO:0016491|GO:0055114
VIT_AGLc9g366030.1    GO:0004665|GO:0006571|GO:0008977|GO:0055114
VIT_AGLc6g304750.1    GO:0005515
VIT_AGLc1g09770.2    GO:0016491|GO:0055114
VIT_AGLc11g42510.1    GO:0004427|GO:0009678|GO:0015992|GO:0016020
VIT_AGLc11g41480.1    GO:0004672|GO:0005524|GO:0006468
VIT_AGLc1g09770.3    GO:0016491|GO:0055114
VIT_AGLc6g304750.3    GO:0005515
VIT_AGLc1g09770.4    GO:0016491|GO:0055114
VIT_AGLc5g276360.1    GO:0004672|GO:0005524|GO:0006468

To:

Code:
VIT_AGLc1g09770.1    GO:0016491|GO:0055114
VIT_AGLc1g09770.1    GO:0016491|GO:0055114
VIT_AGLc9g366030.1    GO:0004665|GO:0006571|GO:0008977|GO:0055114
VIT_AGLc9g366030.1    GO:0004665|GO:0006571|GO:0008977|GO:0055114
VIT_AGLc9g366030.1    GO:0004665|GO:0006571|GO:0008977|GO:0055114
VIT_AGLc9g366030.1    GO:0004665|GO:0006571|GO:0008977|GO:0055114
VIT_AGLc6g304750.1    GO:0005515
VIT_AGLc1g09770.2    GO:0016491|GO:0055114
VIT_AGLc1g09770.2    GO:0016491|GO:0055114


Hence, each time the second column of my table contains more than one field (separated by |), the whole row is printed once for each field.
This is part of what I was looking for and I was happy to get it. However, the second column should contain only one field per row.
Indeed, the expected result should look like this:

Code:
VIT_AGLc1g09770.1    GO:0016491
VIT_AGLc1g09770.1    GO:0055114
VIT_AGLc9g366030.1    GO:0004665
VIT_AGLc9g366030.1    GO:0006571
VIT_AGLc9g366030.1    GO:0008977
VIT_AGLc9g366030.1    GO:0055114
VIT_AGLc6g304750.1    GO:0005515
VIT_AGLc1g09770.2    GO:0016491
VIT_AGLc1g09770.2    GO:0055114

Anyway, I'm grateful to you both.

Last edited by RudiC; 12-29-2018 at 05:23 PM.. Reason: Added (a few) CODE tags.
# 7  
Old 12-29-2018
Code:
awk -F'\t|\\|' '{for(i=2;i<=NF;i++)print $1"\t"$i}'
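
The one-liner above treats both the tab and the pipe as field separators, so $1 is the ID and every further field is one GO term. It reads standard input unless a file name is appended. The table in post #6 is displayed with blanks between the columns; if the real file contains blanks rather than a tab, a variant along the following lines might help. It is a sketch under that assumption, and the wrapper name explode is made up.

```shell
# Hypothetical wrapper. FS is a regular expression matching either a run of
# blanks/tabs or a single pipe, so space-padded columns are split as well.
explode() {
  awk -F'[ \t]+|[|]' '{
    for (i = 2; i <= NF; i++)
      if ($i != "") print $1 "\t" $i    # skip the empty field that trailing blanks would create
  }' "$@"
}

# Usage example with one line of the data from post #6:
printf 'VIT_AGLc1g09770.1    GO:0016491|GO:0055114\n' | explode
```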
