awk split and awk calculation in the same command


 
# 1  
Old 03-22-2016

I am trying to run the awk below. When I split the input and then run a second awk that performs a calculation on the split output, there are no issues. When I try to combine the two into one command, the output is not correct. Is the split not working, or did I do it wrong? Thank you :).

input
Code:
 
chr1    955543    955763    chr1:955543-955763    AGRN-6|gc=75    1    15
chr1    955543    955763    chr1:955543-955763    AGRN-6|gc=75    2    16
chr1    955543    955763    chr1:955543-955763    AGRN-6|gc=75    3    16
chr1    955543    955763    chr1:955543-955763    AGRN-6|gc=75    4    14
chr1    955543    955763    chr1:955543-955763    AGRN-6|gc=75    5    17

The split step: awk '{split($5,a,"-"); print $1,$2,$3,$4,a[1]}' input > split

split (splits $5 on the "-" and prints $1, $2, $3, $4, and a[1], the first piece of the split)

Code:
chr1 955543 955763 chr1:955543-955763 AGRN
chr1 955543 955763 chr1:955543-955763 AGRN
chr1 955543 955763 chr1:955543-955763 AGRN
chr1 955543 955763 chr1:955543-955763 AGRN
chr1 955543 955763 chr1:955543-955763 AGRN

If I use that file (split) as the input to the calculation awk, the output is correct.

output ($5, the count of lines with the same $5, and the sum of $3-$2)
Code:
AGRN 5 1100
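
The calculation awk for this step is not shown in the post. A minimal sketch of what it might look like, assuming it is keyed on $5 of the already-split file, is below. The variable names (sample, count, len, result) are made up for illustration, and the inline sample stands in for the split file.

```shell
# Stand-in for the "split" file from the post (gene name already isolated in $5).
sample='chr1 955543 955763 chr1:955543-955763 AGRN
chr1 955543 955763 chr1:955543-955763 AGRN
chr1 955543 955763 chr1:955543-955763 AGRN
chr1 955543 955763 chr1:955543-955763 AGRN
chr1 955543 955763 chr1:955543-955763 AGRN'

# Count the lines per $5 and sum the interval length ($3 - $2) per $5.
result=$(printf '%s\n' "$sample" | awk '
    { count[$5]++; len[$5] += ($3 - $2) }
    END { for (g in count) print g, count[g], len[g] }')

echo "$result"   # AGRN 5 1100
```

Each of the five lines contributes 955763 - 955543 = 220, which gives the 1100 in the output above.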

If I perform the split and the calculation in the same awk, as below, the output is wrong:

Code:
awk '{split($5,a,"-"); print $1,$2,$3,$4,a[1]} {c1[a1]++; c2[a1]+=($3-$2)}
    END{for (e in c1) print e, c1[e], c2[e]}' split

output
Code:
chr1 955543 955763 chr1:955543-955763 AGRN
chr1 955543 955763 chr1:955543-955763 AGRN
chr1 955543 955763 chr1:955543-955763 AGRN
chr1 955543 955763 chr1:955543-955763 AGRN
chr1 955543 955763 chr1:955543-955763 AGRN
 5 1100

# 2  
Old 03-22-2016
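Use a[1] instead of a1. In awk, a1 is a separate, uninitialized variable, so every line is counted under the empty-string key; that is why the last output line has no name in front of "5 1100".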
This User Gave Thanks to rdrtx1 For This Post:
# 3  
Old 03-22-2016
Try c1[a[1]] instead of c1[a1] (and likewise c2[a[1]] instead of c2[a1]).
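
To illustrate the difference between the two subscripts, here is a small sketch that runs both forms on one line of the original input. The variable names (line, wrong, right) and the brackets printed around the key are only for this illustration.

```shell
line='chr1 955543 955763 chr1:955543-955763 AGRN-6|gc=75 1 15'

# a1 is an unset scalar variable, so the array key is the empty string.
wrong=$(printf '%s\n' "$line" |
    awk '{ split($5, a, "-"); c[a1]++ } END { for (k in c) print "[" k "]", c[k] }')

# a[1] is the first piece produced by split(), here the gene name.
right=$(printf '%s\n' "$line" |
    awk '{ split($5, a, "-"); c[a[1]]++ } END { for (k in c) print "[" k "]", c[k] }')

echo "$wrong"   # [] 1
echo "$right"   # [AGRN] 1
```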
This User Gave Thanks to Scrutinizer For This Post:
# 4  
Old 03-25-2016
Thank you both :).
# 5  
Old 05-19-2016
awk not calculating the same on input file

The awk below produces different results now that some lines of my input have changed from the old input.
I cannot seem to fix this and need some expert help. Thank you :).

awk
Code:
awk '{split($5,a,"-"); print $1,$2,$3,$4,a[1]} {c1[a[1]]++; c2[a[1]]+=($3-$2)}
     END{for (e in c1) print e, c1[e], c2[e]}' input

old input
Code:
chr1    955543    955763    chr1:955543-955763    AGRN-6|gc=75    1    15
chr1    955543    955763    chr1:955543-955763    AGRN-6|gc=75    2    16
chr1    955543    955763    chr1:955543-955763    AGRN-6|gc=75    3    16
chr1    955543    955763    chr1:955543-955763    AGRN-6|gc=75    4    14
chr1    955543    955763    chr1:955543-955763    AGRN-6|gc=75    5    17

output -- this is correct
Code:
chr1 955543 955763 chr1:955543-955763 AGRN
chr1 955543 955763 chr1:955543-955763 AGRN
chr1 955543 955763 chr1:955543-955763 AGRN
chr1 955543 955763 chr1:955543-955763 AGRN
chr1 955543 955763 chr1:955543-955763 AGRN
AGRN 5 1100

Using the awk on the attached file:


out.txt (attachment) -- does not count $5 or sum $3-$2 for each $5 as it did before

Code:
chr7 121738788 121738930 chr7:121738788-121738930 AASS
chr7 121738788 121738930 chr7:121738788-121738930 AASS
chr7 121738788 121738930 chr7:121738788-121738930 AASS
chr7 121741414 121741502 chr7:121741414-121741502 AASS
chr7 121769404 121769601 chr7:121769404-121769601 AASS
chr7 121769404 121769601 chr7:121769404-121769601 AASS
chr7 121769404 121769601 chr7:121769404-121769601 AASS
chr7 121769404 121769601 chr7:121769404-121769601 AASS
chr7 121769404 121769601 chr7:121769404-121769601 AASS
chr7 121769404 121769601 chr7:121769404-121769601 AASS
chr7 121769404 121769601 chr7:121769404-121769601 AASS
chr7 121769404 121769601 chr7:121769404-121769601 AASS
chr7 121769404 121769601 chr7:121769404-121769601 AASS
chr7 121769404 121769601 chr7:121769404-121769601 AASS
chr7 121769404 121769601 chr7:121769404-121769601 AASS
chr7 121769404 121769601 chr7:121769404-121769601 AASS
chr7 121769404 121769601 chr7:121769404-121769601 AASS
chr7 121769404 121769601 chr7:121769404-121769601 AASS
chr7 121769404 121769601 chr7:121769404-121769601 AASS

# 6  
Old 05-19-2016
Seems to work for me - when applied to a small subset of your input.
Code:
.
.
.
chr17 67160204 67160305 chr17:67160204-67160305 ABCA10
ABCA1 1 144
AASS 19 3469
ABCA10 14 1414

# 7  
Old 05-19-2016
On a small subset the awk seems to work, but with the actual input (attached, which is much larger) the awk prints the split lines first and then the calculation below them. Do I need to perform the split separately in order not to see it in the output? Or can I print the results of the split to one file and then use that file to output the calculations? Thank you :).


Code:
. . . 
chr17 67160204 67160305 chr17:67160204-67160305 ABCA10  -- up to here is split 
ABCA1 1 144 
AASS 19 3469 
ABCA10 14 1414


Last edited by cmccabe; 05-19-2016 at 01:56 PM.. Reason: added details
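
The thread ends without an answer to this question. One possible approach, offered only as a sketch and not taken from any reply, is to redirect the per-line print to a file inside awk, so that only the summary reaches standard output. The out variable, the temporary file, and the trailing sort are assumptions added for this illustration; the sample lines are copied from the posts above.

```shell
sample='chr1 955543 955763 chr1:955543-955763 AGRN-6|gc=75 1 15
chr1 955543 955763 chr1:955543-955763 AGRN-6|gc=75 2 16
chr7 121738788 121738930 chr7:121738788-121738930 AASS
chr7 121741414 121741502 chr7:121741414-121741502 AASS'

tmp=$(mktemp)   # hypothetical destination for the per-line split output

summary=$(printf '%s\n' "$sample" | awk -v out="$tmp" '
    {
        split($5, a, "-")
        print $1, $2, $3, $4, a[1] > out   # split lines go to the file
        c1[a[1]]++                         # count of lines per name
        c2[a[1]] += ($3 - $2)              # summed length per name
    }
    END { for (e in c1) print e, c1[e], c2[e] }' | sort)

echo "$summary"
```

If the split lines are not needed at all, deleting the print statement from the first block should have the same effect on standard output. The sort is there only because the order of a for (e in c1) loop is not defined in awk.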