[Solved] Insert tabs as delimiter


 
# 1  
Old 01-27-2014

Hello all,

I have an unstructured file with spaces as the delimiter, which I want to structure.

The output file should have exactly 5 columns with tab as the delimiter.
The 4th column can have only 3 values (biological_process, cellular_component, molecular_function).

Here is the input structure (with only spaces as delimiters).
Cols 1, 2 and 4 are single words; cols 3 and 5 can contain multiple words separated by spaces. The keyword in col 4 (biological_process, cellular_component or molecular_function) separates col 3 from col 5.

Sample input
Code:
goterm1_oneword gene1_oneword some description multiple words biological_process some description multiple words
goterm2_oneword gene1_oneword some description biological_process some description multiple words
goterm2_oneword gene1_oneword some description multiple words cellular_component  some description multiple words

Output format with tab delimiter. I have placed a [tab] wherever there should be an actual \t instead of a space.
Code:
goterm1_oneword[tab]gene1_oneword[tab]some description multiple words[tab]biological_process[tab]some description multiple words
goterm2_oneword[tab]gene1_oneword[tab]some description[tab]biological_process[tab]some description multiple words
goterm2_oneword[tab]gene1_oneword[tab]some description multiple words[tab]cellular_component[tab]some description multiple words

# 2  
Old 01-27-2014
Code:
tr ' ' '\t' < inputfile

# 3  
Old 01-27-2014
Quote:
Originally Posted by anbu23
Code:
tr ' ' '\t' < inputfile

This is not as simple as replacing every space with a tab: col 3 and col 5 contain spaces.

Anything between col 2 and col 4 is col 3, and anything after col 4 is col 5, if that makes it easier to understand.
# 4  
Old 01-27-2014
Hello,

The following may help.

Code:
awk '{$2=" "$2} {$3=" "$3" "$4" "$5" "$6} {$4==""} {$5==""} {$6==""} gsub(/ /,"\t",$0);{print}' file_name

Output will be as follows.


Code:
goterm1_oneword         gene1_oneword           some    description     multiple        words   description     multiple        words   biological_process      some  description      multiple        words
goterm1_oneword         gene1_oneword           some    description     multiple        words   description     multiple        words   biological_process      some  description      multiple        words
goterm2_oneword         gene1_oneword           some    description     biological_process      some    description     biological_process      some    description   multiple words
goterm2_oneword         gene1_oneword           some    description     biological_process      some    description     biological_process      some    description   multiple words
goterm2_oneword         gene1_oneword           some    description     multiple        words   description     multiple        words   cellular_component      some  description      multiple        words
goterm2_oneword         gene1_oneword           some    description     multiple        words   description     multiple        words   cellular_component      some  description      multiple        words



EDIT:

Please let us know the expected output in more detail; that will help us understand the requirement.



Thanks,
R. Singh

Last edited by RavinderSingh13; 01-27-2014 at 01:27 PM..
# 5  
Old 01-27-2014
Hi Ravinder,

Col 4 should be identified by the keywords biological_process, cellular_component and molecular_function. Without them it is not possible to isolate col 3, because it can have a variable number of words.

---------- Post updated at 01:46 PM ---------- Previous update was at 01:28 PM ----------

Code:
goterm1_oneword gene1_oneword description of gene biological_process this should be col 5 

goterm2_oneword gene1_oneword gene description biological_process col 5 

goterm2_oneword gene1_oneword describing the gene product multiple words cellular_component  col 5 this is

Output

Code:
goterm1_oneword[tab]gene1_oneword[tab]description of gene[tab]biological_process[tab]this should be col 5 

goterm2_oneword[tab]gene1_oneword[tab]gene description[tab]biological_process[tab]col 5 

goterm2_oneword[tab]gene1_oneword[tab]describing the gene product multiple words[tab]cellular_component[tab]col 5 this is

---------- Post updated at 01:48 PM ---------- Previous update was at 01:46 PM ----------

I have modified the input and output above in case that makes the problem easier to understand. Thanks.
# 6  
Old 01-27-2014
something along these lines...
awk -f rita.awk myFile
rita.awk:
Code:
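BEGIN {
   OFS="\t"                                  # write tab-separated output
   nk=split("biological_process cellular_component molecular_function", t, FS)
   for(i in t)
     keys[t[i]]                              # referencing the element creates the key
}
{
   f3=frest=""
   for(i=3; !($i in keys); i++)              # collect col 3; note there is no i<=NF bound
      f3=(f3)?f3 FS $i:$i

   key=$i                                    # col 4: the keyword that stopped the loop

   for(i=i+1; i<=NF; i++)                    # col 5: everything after the keyword
      frest=(frest)?frest FS $i:$i

   print $1, $2, f3, key, frest
}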
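
For comparison, here is a rough alternative sketch that is not vgersh99's method: instead of walking the fields one by one, it locates the keyword with awk's match() and cuts the line around it. The function name to_tsv is hypothetical. It assumes the first space-delimited occurrence of a keyword is col 4 (so a keyword appearing as col 2 would be misread), and it reports lines without any keyword on stderr rather than guessing.

```shell
# Hypothetical helper: reads space-delimited lines on stdin,
# writes five tab-delimited columns on stdout.
to_tsv() {
  awk '
    BEGIN { OFS = "\t" }
    {
      # Pad with a trailing space so a keyword at end of line still matches.
      line = $0 " "
      if (!match(line, / (biological_process|cellular_component|molecular_function) /)) {
        print "no keyword on line " FNR | "cat 1>&2"
        next
      }
      head = substr(line, 1, RSTART - 1)            # col 1, col 2 and col 3
      key  = substr(line, RSTART + 1, RLENGTH - 2)  # the keyword without its blanks
      tail = substr(line, RSTART + RLENGTH)         # col 5

      # Splitting on " " uses default field splitting, which trims extra blanks.
      n = split(head, w, " ")
      col3 = ""
      for (i = 3; i <= n; i++) col3 = col3 (i > 3 ? " " : "") w[i]

      m = split(tail, v, " ")
      col5 = ""
      for (i = 1; i <= m; i++) col5 = col5 (i > 1 ? " " : "") v[i]

      print w[1], w[2], col3, key, col5
    }
  '
}

# Example run on one of the sample lines from post #5.
to_tsv <<'EOF'
goterm2_oneword gene1_oneword gene description biological_process col 5
EOF
```

Because unmatched lines are skipped with a message, a malformed input line shows up immediately instead of producing a row with the wrong number of columns.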

# 7  
Old 01-27-2014
The code works perfectly for a few lines, then exits with this error:

Code:
awk: rita.awk:9:  (FILENAME=GOdesc_switchgrass_forAgriGO_sortedongenes.txt FNR=471) fatal:  attempt to access field -2147483648

How do I identify the line at FNR=471?
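
FNR is awk's per-file record number, so FNR=471 is line 471 of the input file. The huge negative field number suggests that the loop on line 9 of rita.awk ran past NF: if a line contains none of the three keywords, `!($i in keys)` never becomes false and i keeps growing until it overflows. That is a reading of the error message, not something confirmed in the thread. A minimal sketch for inspecting the line and for listing every line that lacks a keyword follows; the function names show_line and lines_without_key are hypothetical, and the file name is the one from the error message.

```shell
# Print a single line of a file by its line number (FNR in awk terms).
show_line() {            # usage: show_line 471 file
  awk -v n="$1" 'FNR == n { print; exit }' "$2"
}

# Report every line where no field from the third onward is a keyword.
lines_without_key() {    # usage: lines_without_key file
  awk '
    BEGIN {
      n = split("biological_process cellular_component molecular_function", t, " ")
      for (j = 1; j <= n; j++) keys[t[j]]
    }
    {
      found = 0
      for (i = 3; i <= NF; i++)        # bounded by NF, so it cannot run away
        if ($i in keys) { found = 1; break }
      if (!found) print FNR ": " $0
    }
  ' "$1"
}

f=GOdesc_switchgrass_forAgriGO_sortedongenes.txt
if [ -f "$f" ]; then
  show_line 471 "$f"
  lines_without_key "$f"
fi
```

If line 471 does turn out to lack a keyword, changing the loop condition in rita.awk to `i<=NF && !($i in keys)` would stop the runaway, although such lines still need a decision on how they should be printed.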