Array V-Lookup using UNIX bash


 
# 1  
Old 04-28-2018

Hey everyone,

I am trying to extract values from one column of a tab-delimited text file and overlay them onto a second tab-delimited text file, using a VLOOKUP-style script in Unix bash.

These are the first few rows of the first input file, IN1:

Code:
rsid    chromosome    position    allele1    allele2
rs4471    1    82154    T    T
rs313    1    752721    G    G
rs125    1    768448    G    G
rs112    1    798959    G    G
rs668    1    800007    C    C
rs497    1    838555    A    A
rs447    1    846808    T    C
rs753    1    854250    A    G
rs1330    1    861808    G    G
rs111    1    873558    T    G
rs171    1    887162    T    T


These are the first rows of the second input file, IN2:

Code:
rsid    chromosome    position    alleles
rs4471    1    734462    AA
rs125    1    752721    AG
rs497    1    760998    CC
rs1330    1    776546    AA



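I need to bring the IN2 column 4 values into IN1, matching on the column 1 values. Each two-letter value in IN2 column 4 has to be split across IN1 columns 4 and 5 (for example, AG becomes A and G). Note that IN2 is smaller than IN1.

If a column 1 value of IN1 is missing from IN2, the row in IN1 should keep its original values. In this example there is no entry for rs313, rs112, rs668, etc., so their original values in IN1 are retained.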


Desired output, OUT:

Code:
rsid    chromosome    position    allele1    allele2
rs4471    1    82154    A    A
rs313    1    752721    G    G
rs125    1    768448    A    G
rs112    1    798959    G    G
rs668    1    800007    C    C
rs497    1    838555    C    C
rs447    1    846808    T    C
rs753    1    854250    A    G
rs1330    1    861808    A    A
rs111    1    873558    T    G
rs171    1    887162    T    T



The rows for which values were found in IN2 are rs4471, rs125, rs497 and rs1330. I'm using Ubuntu 16.04.

Thanks in advance.
# 2  
Old 04-28-2018
Hi, try:
Code:
awk '
  NR==1 {
    next
  } 

  NR==FNR {
    A[$1]=substr($4,1,1)
    B[$1]=substr($4,2)
    next
  } 

  $1 in A {
    $4=A[$1]
    $5=B[$1]
  }
  1
' OFS='\t' IN2 IN1
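
For readers following along, here is the same program with a comment on each rule, run against a shortened copy of the sample data. This is only a sketch: the temporary directory and the trimmed three-row files are assumptions made for illustration, not part of the original post.

```shell
#!/usr/bin/env bash
# Sketch only: workdir and the shortened sample files are illustrative assumptions.
workdir=$(mktemp -d)

# Tab-delimited samples taken from the first rows of IN1 and IN2.
printf 'rsid\tchromosome\tposition\tallele1\tallele2\n' >  "$workdir/IN1"
printf 'rs4471\t1\t82154\tT\tT\n'                       >> "$workdir/IN1"
printf 'rs313\t1\t752721\tG\tG\n'                       >> "$workdir/IN1"
printf 'rs125\t1\t768448\tG\tG\n'                       >> "$workdir/IN1"

printf 'rsid\tchromosome\tposition\talleles\n' >  "$workdir/IN2"
printf 'rs4471\t1\t734462\tAA\n'               >> "$workdir/IN2"
printf 'rs125\t1\t752721\tAG\n'                >> "$workdir/IN2"

awk '
  NR==1 { next }                 # very first line read is the IN2 header: skip it
  NR==FNR {                      # true only while reading the first file (IN2)
    A[$1] = substr($4, 1, 1)     # first letter of "alleles", keyed by rsid
    B[$1] = substr($4, 2)        # the rest of "alleles"
    next                         # do not fall through to the IN1 rules
  }
  $1 in A {                      # IN1 row whose rsid was seen in IN2
    $4 = A[$1]                   # assigning to a field rebuilds the line with OFS
    $5 = B[$1]
  }
  1                              # always-true pattern: print every IN1 line
' OFS='\t' "$workdir/IN2" "$workdir/IN1" > "$workdir/OUT"

cat "$workdir/OUT"
```

The order of the file arguments matters: IN2 has to be read first so that the lookup arrays are complete before the first IN1 row is processed.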

# 3  
Old 04-28-2018
Thanks Scrutinizer, works beautifully!
# 4  
Old 04-28-2018
Try
Code:
awk 'NR == FNR {REP[$1] = $4; next} $1 in REP && FNR > 1 {$4 = substr (REP[$1], 1, 1); $5 = substr ( REP[$1], 2, 1)} 1' OFS="\t" file2 file1
rsid    chromosome    position    allele1    allele2
rs4471   1    82154     A    A
rs313    1    752721    G    G
rs125    1    768448    A    G
rs112    1    798959    G    G
rs668    1    800007    C    C
rs497    1    838555    C    C
rs447    1    846808    T    C
rs753    1    854250    A    G
rs1330   1    861808    A    A
rs111    1    873558    T    G
rs171    1    887162    T    T

# 5  
Old 04-28-2018
Hello Geneanalys,

Could you please try the following and let me know if it helps? (Tested with GNU awk.)
Code:
awk 'FNR==NR{num=split($NF,array,"");val="";for(i=1;i<=num;i++){val=val?val OFS array[i]:array[i]};a[$1]=val;next} FNR>1 && ($1 in a){print $1,$2,$3,a[$1];next} 1'  in2  in1 | column -t

Thanks,
R. Singh
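
Splitting on an empty separator, as `split($NF,array,"")` does above, is as far as I know not something POSIX awk specifies, which is presumably why the post mentions GNU awk. A more portable sketch walks the string with `substr` instead, and prints the position column from the IN1 row. The temporary directory and the shortened sample files are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: workdir and the shortened sample files are illustrative assumptions.
workdir=$(mktemp -d)
printf 'rsid\tchromosome\tposition\tallele1\tallele2\nrs4471\t1\t82154\tT\tT\nrs313\t1\t752721\tG\tG\nrs125\t1\t768448\tG\tG\n' > "$workdir/in1"
printf 'rsid\tchromosome\tposition\talleles\nrs4471\t1\t734462\tAA\nrs125\t1\t752721\tAG\n' > "$workdir/in2"

awk -F'\t' -v OFS='\t' '
  FNR == NR {                          # first file (in2): build the lookup
    val = ""
    n = length($NF)
    for (i = 1; i <= n; i++)           # one output field per character
      val = val (i > 1 ? OFS : "") substr($NF, i, 1)
    a[$1] = val
    next
  }
  FNR > 1 && ($1 in a) {               # in1 data row with a match in in2
    print $1, $2, $3, a[$1]            # keep the chromosome and position from in1
    next
  }
  1                                    # header and unmatched rows pass through
' "$workdir/in2" "$workdir/in1" > "$workdir/out"

cat "$workdir/out"
```

Because the output stays tab-delimited, piping it through `column -t` remains optional and only affects how it looks on screen.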

Last edited by RavinderSingh13; 04-28-2018 at 04:39 PM..
# 6  
Old 04-28-2018
Scrutinizer,

Could you please post an explanation of the commands?

---------- Post updated at 03:36 PM ---------- Previous update was at 03:28 PM ----------

Scrutinizer or RudiC,

Is there a way to insert a counter to see the number of rows for which there were values in IN2?
# 7  
Old 04-28-2018
Try
Code:
awk 'NR == FNR {REP[$1] = $4; next} $1 in REP && FNR > 1 {$4 = substr (REP[$1], 1, 1); $5 = substr (REP[$1], 2, 1); CNT++} 1; END {print CNT + 0}' OFS="\t" file2 file1
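
One thing to be aware of: the counter above is printed to standard output, so it ends up as the last line of the data if the result is redirected to a file. A possible variation, sketched here under the assumption that the count should stay separate, sends it to standard error instead. The temporary directory, the shortened sample files and the `count` file name are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Sketch only: workdir, the shortened sample files and "count" are illustrative assumptions.
workdir=$(mktemp -d)
printf 'rsid\tchromosome\tposition\tallele1\tallele2\nrs4471\t1\t82154\tT\tT\nrs313\t1\t752721\tG\tG\nrs125\t1\t768448\tG\tG\n' > "$workdir/file1"
printf 'rsid\tchromosome\tposition\talleles\nrs4471\t1\t734462\tAA\nrs125\t1\t752721\tAG\n' > "$workdir/file2"

awk '
  NR == FNR {REP[$1] = $4; next}            # first file: remember alleles by rsid
  $1 in REP && FNR > 1 {                    # matching data row in the second file
    $4 = substr(REP[$1], 1, 1)
    $5 = substr(REP[$1], 2, 1)
    CNT++                                   # one more row replaced
  }
  1                                         # print every line of the second file
  END {print (CNT + 0) > "/dev/stderr"}     # count goes to stderr, 0 if no match
' OFS="\t" "$workdir/file2" "$workdir/file1" > "$workdir/OUT" 2> "$workdir/count"

cat "$workdir/OUT"
echo "rows replaced: $(cat "$workdir/count")"
```

Writing `CNT + 0` forces a numeric result, so the script reports 0 rather than an empty line when nothing matched.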
