Awk: Append new elements to an array


 
# 1  
Old 04-11-2014

Hi all,

I'm working on a bash script that merges the elements of a set of files and counts how many times each element is present. The last field of each line is the file name.

Sample files:
Code:
head -5 *.tab

==> 3J373_P15Ac1y2_01_LS.tab <==
Code:
chr1    1956362 1956362 G       A       hom     3J373_P15Ac1y2_01_LS.tab
chr1    1957037 1957037 T       C       hom     3J373_P15Ac1y2_01_LS.tab
chr1    1960926 1960926 T       C       hom     3J373_P15Ac1y2_01_LS.tab
chr1    17359676        17359676        C       A       hom     3J373_P15Ac1y2_01_LS.tab
chr1    17371152        17371152        T       C       het     3J373_P15Ac1y2_01_LS.tab

==> 7D300_P15Ac1y2_01_GATK.tab <==
Code:
chr1    1956362 1956362 G       A       het     7D300_P15Ac1y2_01_GATK.tab
chr1    1957037 1957037 T       C       het     7D300_P15Ac1y2_01_GATK.tab
chr1    1959107 1959107 G       C       het     7D300_P15Ac1y2_01_GATK.tab
chr1    1959699 1959699 G       A       het     7D300_P15Ac1y2_01_GATK.tab
chr1    17359676        17359676        C       A       hom     7D300_P15Ac1y2_01_GATK.tab
.
.
.

Up to several dozen files...

Here is my code:

Code:
cat *.tab \
    | awk 'BEGIN {FS="\t";OFS="\t"} {s[$1":"$2"-"$3";"$4"/"$5]=$0; c[$1":"$2"-"$3";"$4"/"$5]++} END {for (i in s) print i,c[i],$7}' \
    | sort -V \
    > CommonVariants.bed

Output file:
Code:
cat CommonVariants.bed

Code:
chr1:1956362-1956362;G/A    36    7D300_P15Ac1y2_01_LS.tab
chr1:1957037-1957037;T/C    36    7D300_P15Ac1y2_01_LS.tab
chr1:1957112-1957112;C/T    2    7D300_P15Ac1y2_01_LS.tab
chr1:1959107-1959107;G/C    2    7D300_P15Ac1y2_01_LS.tab
chr1:1959138-1959138;G/C    2    7D300_P15Ac1y2_01_LS.tab
chr1:1959549-1959549;G/A    2    7D300_P15Ac1y2_01_LS.tab
chr1:1959699-1959699;G/A    4    7D300_P15Ac1y2_01_LS.tab
chr1:1959789-1959789;A/G    3    7D300_P15Ac1y2_01_LS.tab
chr1:1960674-1960674;C/T    6    7D300_P15Ac1y2_01_LS.tab
chr1:1960926-1960926;T/C    18    7D300_P15Ac1y2_01_LS.tab
chr1:1961144-1961144;C/T    2    7D300_P15Ac1y2_01_LS.tab
chr1:1961408-1961408;C/T    6    7D300_P15Ac1y2_01_LS.tab
chr1:1961466-1961466;C/T    2    7D300_P15Ac1y2_01_LS.tab
chr1:17359676-17359676;C/A    36    7D300_P15Ac1y2_01_LS.tab

I can create the index and count the lines. However, I can't figure out how to append the file names in the $7 column.
I guess I have to replace "$7" with an array in the awk statement, but this is beyond me.

I really appreciate any help.

Thank you in advance

Last edited by Scrutinizer; 04-11-2014 at 08:40 AM.. Reason: Additional code tags
# 2  
Old 04-11-2014
Code:
cat *.tab \
    | awk 'BEGIN {FS="\t";OFS="\t"} {s[$1":"$2"-"$3";"$4"/"$5]=$0; c[$1":"$2"-"$3";"$4"/"$5]++; a[$1":"$2"-"$3";"$4"/"$5] = $7} END {for (i in s) print i,c[i],a[i]}' \
    | sort -V \
    > CommonVariants.bed

# 3  
Old 04-11-2014
Code:
chr1:1956362-1956362;G/A

is present in both 3J373_P15Ac1y2_01_LS.tab and 7D300_P15Ac1y2_01_GATK.tab, yet the output specifies: 7D300_P15Ac1y2_01_LS.tab. How does that work?
# 4  
Old 04-11-2014
Thank you SriniShoo, but I think I didn't explain it properly. I need the names of all files where the index was found.

Expected output:
Code:
chr1:1959138-1959138;G/C    2    7D300_P15Ac1y2_01_LS.tab, 3H682_P15Ac1y2_01_LS.tab
chr1:1959549-1959549;G/A    2    7D300_P15Ac1y2_01_LS.tab, 3H682_P15Ac1y2_01_LS.tab
chr1:1959699-1959699;G/A    4    7D300_P15Ac1y2_01_LS.tab, 3H682_P15Ac1y2_01_LS.tab, 3J188_P15Ac1y2_01_LS.tab, 3J270_P15Ac1y2_01_GATK.tab
chr1:1959789-1959789;A/G    3    7D300_P15Ac1y2_01_LS.tab, 3H682_P15Ac1y2_01_LS.tab, 3J188_P15Ac1y2_01_LS.tab

Thank you again


---------- Post updated at 02:08 PM ---------- Previous update was at 02:05 PM ----------

Scrutinizer, that's my problem. It always displays the file name of the last file where it found the index.
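
The behaviour described here can be reproduced with a tiny sketch. In the END block, the fields still hold the last record read (this is what gawk and mawk do; some older awk implementations may not preserve $0 in END), so every output line gets the same file name. The two-line input and the names fileA.tab and fileB.tab are made up for illustration.

```shell
# Two records with the same key but different file names (hypothetical data).
# c[] counts per key; $2 in END is simply the second field of the LAST record.
out=$(printf 'k1\tfileA.tab\nk1\tfileB.tab\n' |
  awk -F'\t' '{ c[$1]++ } END { for (i in c) print i, c[i], $2 }')
printf '%s\n' "$out"
```

With gawk or mawk this should print `k1 2 fileB.tab`: the count is right, but only the last file name survives, which is why the names have to be collected per key while the records are being read.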
# 5  
Old 04-11-2014
Try something like:
Code:
awk '
  {
    i=$1":"$2"-"$3";"$4"/"$5
    c[i]++
  } 
  !P[i,$7]++ {
    F[i]=F[i] (F[i]?", ":x) $7
  } 
  END {
    for (i in c) print i,c[i],F[i]
  }
' FS='\t' OFS='\t' *.tab

Single line (too long) version:
Code:
awk '{i=$1":"$2"-"$3";"$4"/"$5; c[i]++} !P[i,$7]++{F[i]=F[i] (F[i]?", ":x) $7} END{for (i in c) print i,c[i],F[i]}' FS='\t' OFS='\t' *.tab
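
To see the effect on a small scale, here is a minimal sketch that runs the same program on four made-up records read from standard input. The file names a.tab and b.tab and the coordinates are hypothetical, and the separators are passed with -F and -v instead of as trailing assignments, which should be equivalent for this purpose. The final sort is only there to make the output order predictable, since `for (i in c)` visits keys in an unspecified order.

```shell
# Hypothetical miniature input: fields 1-5 form the key, field 7 is the file name.
# The key chr1:100-100;G/A occurs once in a.tab and twice in b.tab.
out=$(printf '%s\n' \
  'chr1 100 100 G A hom a.tab' \
  'chr1 100 100 G A het b.tab' \
  'chr1 100 100 G A het b.tab' \
  'chr1 200 200 T C hom a.tab' |
  tr ' ' '\t' |
  awk -F'\t' -v OFS='\t' '
    { i = $1 ":" $2 "-" $3 ";" $4 "/" $5; c[i]++ }    # count every occurrence
    !P[i,$7]++ { F[i] = F[i] (F[i] ? ", " : x) $7 }   # record each file once per key
    END { for (i in c) print i, c[i], F[i] }
  ' | sort)
printf '%s\n' "$out"
```

The first key is counted three times but lists b.tab only once, so the count column and the number of file names can differ when a key is repeated within one file.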


Last edited by Scrutinizer; 04-11-2014 at 09:29 AM..
# 6  
Old 04-11-2014
SOLVED

It works!

Awesome! Brilliant! Wonderful!

Thank you so much!!!

---------- Post updated at 03:44 PM ---------- Previous update was at 03:22 PM ----------

What does this mean?
Code:
!P[i,$7]++{F[i]=F[i] (F[i]?", ":x) $7}

# 7  
Old 04-11-2014
You are welcome...

Code:
!P[i,$7]++                   # P[i,$7] counts how often file name $7 has been seen for key i.
                             # The first time, the array element does not exist yet (value 0),
                             # so the negation is true and the action runs. The post-increment
                             # then makes the element >0, so on later occurrences the negation
                             # becomes 0 (false)...
                             # This is a way to add a file name to F[i] only the first time.

F[i]=F[i] (F[i]?", ":x) $7   # Append $7 to F[i], but put a separator ( ", " ) in between if
                             # F[i] is already non-empty. x is an unset variable, i.e. "".
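
The pattern can also be watched in isolation. The sketch below, on a made-up three-line input, prints for each record the key, the value of `!P[$1]++`, and the counter after the increment. The variable name `first` is introduced here only for illustration.

```shell
# For each record: key, result of the negated post-increment, counter afterwards.
out=$(printf 'a\na\nb\n' |
  awk '{ first = !P[$1]++; print $1, first, P[$1] }')
printf '%s\n' "$out"
```

The expression is 1 (true) only on the first occurrence of each key, here on lines one and three, which is exactly when the action that appends to F[i] is allowed to run.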


Last edited by Scrutinizer; 04-11-2014 at 11:03 AM..