Search id from second file and append in first


 
# 1  
02-10-2014

Hello,

For each record in file1, I want to take the value (or a substring of it) in the second column, search for it in file2, and append the first matching line from file2 to the end of that file1 record. Both files are tab-delimited.

Lines with KOG in column 13 do not need to be searched, since their values will not be found in file2.

Here is the logic I have in mind, which needs to be translated into code.

For every line of file1 that does not contain the keyword 'KOG' in column 13:

Extract the search string. When the second column starts with sp| or tr|, the search string is the part between the first and second '|'. For example, when the value is sp|P32770|NRP1_YEAST, the string to search for is P32770; when the value is tr|N1PNC6|N1PNC6_MYCP1, it is N1PNC6.

If the second column does not start with sp| or tr| (it has values like NP_001059837 or AEW46684), search for the entire string.

If the search string is found, append the entire first matching line of file2 to the end of the corresponding record in file1, separated by a tab.
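The key-extraction rule above can be sketched as a small awk function. This is an illustration only: `extract_keys` and `search_key` are hypothetical names, and the function looks only at column 2 (the KOG filter is left out to keep it short).

```shell
#!/bin/bash
# extract_keys (hypothetical): print the search key for each tab-delimited record on stdin.
extract_keys() {
  awk -F'\t' '
    function search_key(v,    parts) {   # extra parameter = awk-style local variable
      if (v ~ /^(sp|tr)\|/) {            # sp|P32770|NRP1_YEAST -> P32770
        split(v, parts, "|")
        return parts[2]
      }
      return v                           # NP_001059837, AEW46684 -> searched as a whole
    }
    { print search_key($2) }'
}

printf 'a\tsp|P32770|NRP1_YEAST\nb\ttr|N1PNC6|N1PNC6_MYCP1\nc\tAEW46684\n' | extract_keys
# P32770
# N1PNC6
# AEW46684
```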


Many of the second-column values repeat, so it would help if each distinct value were searched only once. For example, AEW46684 occurs 119 times in the second column of file1, so searching for it just once could save a lot of computation on these huge files.
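A minimal sketch of the "search each value only once" idea, under two assumptions that the thread does not confirm: file1 (the smaller file) has few enough distinct keys to hold in memory, and in file2 the ID appears as a whole tab- or |-separated token. Instead of a substring search per record, file1's distinct keys go into a hash, file2 is read once, and reading stops as soon as every key has been found. `append_first_match` is a hypothetical name, and `nextfile` requires an awk that supports it, such as gawk.

```shell
#!/bin/bash
# append_first_match FILE1 FILE2 (hypothetical helper, not from the thread).
# Assumes FILE1's distinct keys fit in memory and that the ID appears in FILE2
# as a whole tab- or |-separated token. Needs an awk with nextfile (e.g. gawk).
append_first_match() {
  awk -F'\t' -v OFS='\t' '
    function search_key(v,    parts) {     # sp|P32770|NRP1_YEAST -> P32770
      if (v ~ /^(sp|tr)\|/) { split(v, parts, "|"); return parts[2] }
      return v                              # NP_001059837, AEW46684 -> unchanged
    }
    FNR == 1 { pass++ }                     # 1 = FILE1, 2 = FILE2, 3 = FILE1 again

    # Pass 1: collect each distinct key once, skipping KOG records
    pass == 1 {
      if ($13 !~ /KOG/ && $2 != "") {
        k = search_key($2)
        if (!(k in want)) { want[k] = 1; remaining++ }
      }
      next
    }

    # Pass 2: remember only the FIRST FILE2 line containing each wanted key
    pass == 2 {
      if (remaining == 0) nextfile          # nothing left to look for
      n = split($0, tok, /[\t|]/)
      for (t = 1; t <= n; t++)
        if ((tok[t] in want) && !(tok[t] in hit)) {
          hit[tok[t]] = $0
          if (--remaining == 0) nextfile    # every key found: skip the rest of FILE2
        }
      next
    }

    # Pass 3: reprint FILE1, appending the cached line after a tab
    {
      k = ($13 ~ /KOG/ || $2 == "") ? "" : search_key($2)
      if (k in hit) print $0, hit[k]
      else print
    }
  ' "$1" "$2" "$1"
}
```

Usage would be `append_first_match file1_samp.txt file2_samp.txt > result1.txt`. If the ID sits inside file2 in some other form, only the `split()` in pass 2 should need to change.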

Since the records have many columns, I have attached sample inputs and the expected output.
# 2  
02-10-2014
What have you tried?
# 3  
02-10-2014
Code:
awk '$13!~KOG && FNR == NR { OFS="\t"; f[++j] = $0; next } { for(i = 1; i <= j; i++) if(index(f[i], $2)) { $15 = f[i] ; break } print}'  file2_samp.txt file1_samp.txt

This is what I have tried, but it's not working, and it lacks the substring-extraction step.
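A few things stop this attempt from working. In awk an unquoted `KOG` is an uninitialized variable, not the text KOG, and an empty dynamic regex matches every string, so `$13!~KOG` is always false: the first block never runs, `f` stays empty, and every line of both files is printed unchanged. The `FNR == NR` test would also apply the KOG check to file2_samp.txt, because that is the first file on the command line, and `FS` is never set to a tab. The quoting issue can be checked with a made-up value, NOG0001, that does not contain KOG:

```shell
# Column 2 holds NOG0001, which does NOT contain KOG.
# Unquoted KOG is an empty awk variable, so the regex is "" and always matches: a = 0.
# /KOG/ is a real regex literal: b = 1.
printf 'x\tNOG0001\n' | awk -F'\t' '{ a = ($2 !~ KOG); b = ($2 !~ /KOG/); print a, b }'
# prints: 0 1
```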
# 4  
02-10-2014
It's not very clean, but try this.
You can optimize it to some extent depending on which file you expect to be bigger.
The if/else block in the first section (the sp|/tr| split) is the substring logic you were asking for.

Code:
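awk 'BEGIN{ FS = OFS = "\t" }            # both files are tab-delimited
  NR==FNR{                               # first file (file1_samp.txt): build the search keys
    if($14 ~ /KOG/){                     # KOG records are not searched (the question says column 13)
      k=""
    }else if($2 ~ /^(sp|tr)\|/){         # sp|P32770|NRP1_YEAST -> P32770
      split($2, arr, "|")
      k=arr[2]
    }
    else k=$2                            # NP_001059837, AEW46684 -> whole value
    data[++i]=$0; key[i]=k
    next
  }
  {                                      # second file (file2_samp.txt)
    for(j=1;j<=i;j++)
      if(key[j] && !(j in hit) && index($0,key[j])){   # literal substring test, first match only
        data[j]=data[j]"\t"$0
        hit[j]=1
      }
  }
  END{
    for(j=1;j<=i;j++) print data[j]
  }
' file1_samp.txt file2_samp.txt > result1.txt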

--ahamed
# 5  
02-11-2014
ahamed, the code has been running for 6 hours without producing any output so far. file1 is 29 MB and file2 is 8 GB. Is there a way to speed things up? I also think searching for each distinct string just once would save a lot of time.

Still no output. Help, please?

Last edited by gina.lizar; 02-11-2014 at 01:19 PM..
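A rough count suggests why the nested-loop awk answer takes so long at these sizes. It compares every line of file2 against every stored file1 record. Assuming a hypothetical average of 200 bytes per line (the real figure is not given in the thread):

$$
N_1 \approx \frac{29 \times 10^{6}\,\text{B}}{200\,\text{B/line}} \approx 1.45 \times 10^{5}\ \text{records},
\qquad
N_2 \approx \frac{8 \times 10^{9}\,\text{B}}{200\,\text{B/line}} = 4 \times 10^{7}\ \text{lines}
$$

$$
N_1 \times N_2 \approx 1.45 \times 10^{5} \times 4 \times 10^{7} = 5.8 \times 10^{12}\ \text{substring tests}
$$

Even at a hypothetical 10^8 tests per second, that is about 5.8 × 10^4 s, roughly 16 hours. That script also writes nothing until its END block runs, which would explain the empty result1.txt. A single pass that looks up each file2 line in a hash of file1's keys reads each of the roughly 4 × 10^7 lines only once.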
# 6  
02-11-2014
Did you check result1.txt?
Anyway, try this:

Code:
#!/bin/bash
# Cache of search key -> first matching line of file2_samp.txt
# (associative arrays need bash 4 or later)
declare -A cache

while IFS= read -r line            # read the whole record, tabs and backslashes intact
do
  rest=${line#*$'\t'}              # drop column 1
  sec=${rest%%$'\t'*}              # column 2
  pat=${sec#*|}; pat=${pat%|*}     # sp|P32770|NRP1_YEAST -> P32770; AEW46684 stays as-is
  [ -z "$pat" ] && { printf '%s\n' "$line" >> result1.txt; continue; }
  if [ -z "${cache[$pat]+set}" ]; then
    # search each distinct key only once; -F = literal string, -m1 = stop at first match
    cache[$pat]=$(grep -F -m1 -- "$pat" file2_samp.txt)
  fi
  printf '%s\t%s\n' "$line" "${cache[$pat]}" >> result1.txt
done < file1_samp.txt


--ahamed
# 7  
02-11-2014
Try testing with way smaller but consistent sample files.
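One way to follow this advice, sketched under assumptions: take the first N records of file1 and keep only the file2 lines that contain one of their keys, so the small pair still produces matches. `grep -F -f` searches for all the fixed strings in a single pass over file2. It keeps every matching line, not just the first, which is fine for a test sample. `make_samples` and the output file names are hypothetical.

```shell
#!/bin/bash
# make_samples FILE1 FILE2 N (hypothetical helper, not from the thread).
# Writes small1.txt (first N records of FILE1) and small2.txt (only the FILE2
# lines that contain one of small1.txt's search keys).
make_samples() {
  head -n "$3" "$1" > small1.txt
  # same key rule as in the question; KOG records (column 13) are skipped
  awk -F'\t' '$13 !~ /KOG/ {
      k = $2
      if (k ~ /^(sp|tr)\|/) { split(k, p, "|"); k = p[2] }
      if (k != "") print k
    }' small1.txt | sort -u > small_keys.txt
  # -F: fixed strings, -f: read the patterns from a file; FILE2 is scanned once
  grep -F -f small_keys.txt -- "$2" > small2.txt
}

# make_samples file1_samp.txt file2_samp.txt 1000
```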