Text columns processing using awk


 
# 1  
Old 07-17-2014

I'm trying to build an awk statement to print from a file (file1):


Code:
A    1,2,3    *    
 A    4,5,6     **
 B    1    
 B    4,5    *

Another file like this:
Code:
 A    1,2,3    *    3    1    0.333
 A    4,5,6     **    3    2    0.666
 B    1        1    0    0
 B    4,5    *    2    1    0.5

In this new file, the first three columns are the same as in the original file. The fourth column must contain the number of comma-separated elements in column 2. The fifth column must contain the number of characters in column 3. The last column contains the ratio of column 5 to column 4.


I'm trying the following code:
Code:
awk  -F ',' '{print $1"\t"$2"\t"$3"\t"NF-1"\t"length($3)"\t"(length($3)/ NF-1)}' file1 > file2

But the output is unexpected (it seems that the second column is split, and thus all the calculations are wrong).
Code:
  
 ~$ cat file2 
 A    1    2    3    *        2    4    0.333333 
 A    4    5    6     **    2    5    0.666667 
 B    1                0    0    -1 
 B    4    5    *        1    0    -1


Thank you for your help.

EDIT
I actually fixed one of my errors, but I still have no idea why the fourth column doesn't print correctly. Here's my new code and output:

Code:
awk '{print $1"\t"$2"\t"$3"\t"(NF","$2 -1)"\t"length($3)"\t"(length($3)/(NF","$2-1))}' file1 > file2

Code:
:~$ cat file2 
 A	1,2,3	*	3,0	1	0.333333 
 A	4,5,6	**	3,3	2	0.666667 
 B	1		2,0	0	0 
 B	4,5	*	3,3	1	0.333333


Last edited by dovah; 07-17-2014 at 08:27 AM.. Reason: code review
# 2  
Old 07-17-2014
In the second example, where FS is left at its default, NF has no relation to the number of comma-separated elements in $2, and $2-1 is unlikely to do what you want: awk converts $2 to a number using only its leading numeric part before subtracting.

Compare:
Code:
$ awk '{print $2, $2-1}' file
1,2,3 0
4,5,6 3
1 0
4,5 3

It is probably best to use split() on $2, with a comma as the separator, to get the fields that you want.
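
A minimal sketch of that suggestion, assuming the four sample rows from post #1 in whitespace-separated form (the inline input and the names `n` and `parts` are illustrative, not from the thread). split() returns the number of pieces it produced, so its return value is the count wanted for column 4:

```shell
# Sample rows from post #1, whitespace-separated (illustrative input).
sample='A 1,2,3 *
A 4,5,6 **
B 1
B 4,5 *'

# split() fills the array "parts" and returns how many
# comma-separated pieces $2 contains; print $2 next to that count.
printf '%s\n' "$sample" | awk '{ n = split($2, parts, ","); print $2, n }'
```

For these rows the counts come out as 3, 3, 1 and 2, independent of NF.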
# 3  
Old 07-17-2014
Code:
awk '{split($2,a,",")}' file1; awk '{OFS="\t"; print $1, $2, $3, length(a), length($3), length($3)/length($2)}' file1 > file2


I tried this, but I get the following output:
Code:
:~$ cat file2
A    1,2,3    *    0    1    0.2
A    4,5,6    **    0    2    0.4
B    1        0    0    0
B    4,5    *    0    1    0.333333




Close, but not correct. Column 4 has zeros.
# 4  
Old 07-17-2014
Why are you using two separate awk statements? The first awk produces no output and therefore has no effect. You should integrate the two.
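
To illustrate the point, here is a small sketch assuming a single sample row (the inline input and the names `n` and `parts` are illustrative). Each awk invocation is a separate process, so anything computed by the first command is gone by the time the second one runs; there the same name is just an uninitialized variable. Doing the split and the print in one action keeps the count available:

```shell
row='A 1,2,3 *'

# Two processes: n is set in the first awk, which then exits silently.
printf '%s\n' "$row" | awk '{ n = split($2, parts, ",") }'
# The second awk starts with no variables, so n is empty here.
printf '%s\n' "$row" | awk '{ print "count=[" n "]" }'   # prints count=[]

# One process: the count is still available when print runs.
printf '%s\n' "$row" | awk '{ n = split($2, parts, ","); print "count=[" n "]" }'   # prints count=[3]
```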
# 5  
Old 07-17-2014
Code:
awk 'BEGIN{OFS="\t"} {l2=split($2,a,","); print $1, $2, $3, l2, length($3), length($3)/l2}' file1 > file2

This works. Thanks for the tips!