Hello guys,
First of all, happy holidays and happy new year.
I'm new to bioinformatics and this is my first time posting in this forum, so sorry if I make some mistakes.
I'm writing to ask for your help with a problem:
I have a file like this:
I have to modify the file in order to have an output like this:
Could anyone help me modify the file, please?
Thank you all in advance for your help.
Sal
Moderator's Comments:
edit by bakunin: Please use CODE-tags not only for code but also for data and terminal output. Thank you.
Yes. My suggestion is to enter "column" and "row" as keywords into the advanced search form, select "Search titles only" (directly under the keywords) and hit <ENTER>. You will be presented with the same plethora of hits as I was, because we have had this question over and over again. If there are still questions left, you are welcome to come back and ask them.
Hi Bakunin,
Thank you for your suggestion. However, I have already searched through different discussions here. There are many examples, but I did not find anything helpful for me.
Could you suggest a specific discussion that I can read in depth, please?
Thank you for replying.
"There are many examples, but I did not find anything helpful for me."
This is because you didn't describe your problem well enough. I mean, you know what you have and you know what you would like to get out of it, but as far as I can see you haven't thought through what it would take to get from here to there. There is no programming involved, just plain thinking:
You have this:
and want to get this:
Now, the first step is: which lines of the output correspond to which input lines? Obviously this line:
accounts for these:
I suppose the reason why "gene3", "gene3" and "gene5" are missing from what you showed as output is that it is simply a sample and you cut it off somewhere, yes? Or are there filters in place you haven't told us about? If so, which ones?
Now, concentrating on transforming the one line, what did we find:
1) The input line has a first field (like "gene1", "gene2", etc.), which should show up as the first part of the respective output line(s). The field is delimited by the start of the line and the first tab character, if I interpret your data correctly.
2) The second field consists of several sub-fields which are delimited by pipe characters ("|"). For every such sub-field there should be a separate line in the output with the first field and the respective sub-field.
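To make points 1) and 2) concrete, here is a minimal sketch in Python (chosen only for illustration; the same logic carries over to awk or a shell loop). It assumes the layout as interpreted above: a tab between the two fields and "|" between sub-fields. The function name `split_subfields` and the sample records are hypothetical and are not the original poster's data.

```python
def split_subfields(lines, field_sep="\t", sub_sep="|"):
    """Yield one 'first-field<TAB>sub-field' line per sub-field of the second field."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        # Everything before the first tab is field 1, the rest is field 2.
        first, _, second = line.partition(field_sep)
        # One output line per pipe-delimited sub-field.
        for sub in second.split(sub_sep):
            yield f"{first}{field_sep}{sub}"


# Hypothetical sample input, made up for this example.
sample = [
    "gene1\tGO:000001|GO:000002",
    "gene2\tGO:000003",
]

for out_line in split_subfields(sample):
    print(out_line)
```

This sketch does no validation at all: a line without a second field, for instance, would simply come out with an empty second column.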
If this is correct as I described it, the necessary code to implement it already "springs out" of that, no? So try that and show your efforts, then we will go over what you have written and, if necessary, correct it. Some questions you should answer for yourself, though, just to know if you have to guard against such possibilities in your code:
a) Could there be input lines with no second fields, like "gene2" here:
And, if yes, what do you want to do with them?
b) Will the sub-fields in the second field always be of this form ("G"-"O"-":" plus 6 digits 0-9) or might there be something else, like:
If yes, what should be done with these?
c) Could there be "double entries" like one of these:
Again, if yes: what do you want to do with these?
d) What about long lines? Might it happen that the line has so many sub-fields that it is broken into the next line like this:
I am sure you get what this aims at and you surely know your data better than me, so maybe some of my points are moot - but it pays to make oneself aware of the point being moot. So sit down, analyse your problem, try to write some code and show it here. We gladly help, but we help you help yourself, we won't do your work for you.
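As a rough illustration of what guarding against questions a) to c) might look like, here is a Python sketch. Everything about the policy is an assumption made for the example, not something the original poster specified: lines without a second field and sub-fields of an unexpected form are set aside and reported, and repeated gene/sub-field pairs are emitted only once. The name `split_guarded`, the pattern `GO:\d+` (any number of digits, since the exact length depends on the real data) and the sample records are hypothetical. Wrapped long lines, question d), are not handled.

```python
import re


def split_guarded(lines, pattern=r"GO:\d+"):
    """Return (rows, rejects): unique (gene, sub-field) pairs and the skipped items."""
    valid = re.compile(pattern)
    rows, rejects, seen = [], [], set()
    for line in lines:
        gene, _, second = line.rstrip("\n").partition("\t")
        if not gene:
            continue  # blank line or empty first field
        if not second:
            rejects.append((gene, ""))  # question a): no second field
            continue
        for sub in second.split("|"):
            if not valid.fullmatch(sub):
                rejects.append((gene, sub))  # question b): unexpected form
            elif (gene, sub) not in seen:
                seen.add((gene, sub))  # question c): keep the first occurrence only
                rows.append((gene, sub))
    return rows, rejects


# Hypothetical sample input covering the three cases.
sample = [
    "gene1\tGO:000001|GO:000002|GO:000001",  # repeated sub-field in one line
    "gene2",                                  # no second field
    "gene3\tGO:000003|foo",                   # sub-field of another form
    "gene1\tGO:000002",                       # pair already seen on an earlier line
]

rows, rejects = split_guarded(sample)
for gene, sub in rows:
    print(f"{gene}\t{sub}")
```

Returning the rejects separately instead of dropping them silently makes it easy to check whether any of these cases actually occur in the real file.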
On top of what bakunin suggested, you'll find quite a few threads dealing with similar topics and giving you ideas / starting points at the lower left of this page under "More UNIX and Linux Forum Topics You Might Find Helpful". Especially this one comes close to a solution to your problem.
Hi guys,
First of all, I would like to say thank you to both of you. I'm sorry if I didn't explain the problem well. Of course I had thought about it, but it was not easy to explain it fully. I will try to explain the problem, and what I've done, better.
I read what RudiC suggested and tried the following code on my table:
After that my file changed from its native form:
To:
Hence, whenever the second column of my table contains more than one field (separated by |), it prints the row once for each field.
This is part of what I was looking for, and I was happy to get it. However, the second column should contain only one field in each row.
Indeed, the expected result should look like this:
Anyway, I'm grateful to you both.
Last edited by RudiC; 12-29-2018 at 05:23 PM..
Reason: Added (a few) CODE tags.