Using columns from 2 files and extracting string


 
# 22  
Old 11-04-2011
The output is incorrect: it gives GTC, but it should be GTT. Please see the following record in the files.

file1$4=file2$1=SNPSTER1_0001:7:32:86:1332#0/1
file2$10 = GATTTATCTTGTTCCTCTGCAGCAGGTTGTCCAGAT

file2$6 = 32M4S


s1=0 (no leading S)
s2=4 (ignore the trailing AGAT)
m1=32 (GATTTATCTTGTTCCTCTGCAGCAGGTTGTCC)
m2=0

In this case file1$2=5868
file1$8=5893
file1$9=5896


file2$4=5869

Here (file2$4 + s1 + m1) = 5869 + 0 + 32 = 5901 > file1$9 = 5896

So we choose m1 = GATTTATCTTGTTCCTCTGCAGCAGGTTGTCC for the substring operation (m2 is absent anyway).


answer = substring(GATTTATCTTGTTCCTCTGCAGCAGGTTGTCC, 26, 3) = GTT
where 26 = file1$8 - file1$2 + 1 and 3 = file1$9 - file1$8.
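
The arithmetic for this record can be checked on its own, outside the main script. This is only a sketch: the values are hard-coded from the record above, and the variable names (seq, f1_2, f1_8, f1_9) are made up for illustration.

```shell
# Values copied from the record above; variable names are illustrative only.
result=$(awk -v seq=GATTTATCTTGTTCCTCTGCAGCAGGTTGTCCAGAT \
             -v s1=0 -v m1=32 \
             -v f1_2=5868 -v f1_8=5893 -v f1_9=5896 \
  'BEGIN {
    # m1 segment: skip the leading S (none here), keep the 32 M characters
    m1s = substr(seq, s1 + 1, m1)
    # start = 5893 - 5868 + 1 = 26, length = 5896 - 5893 = 3
    print substr(m1s, f1_8 - f1_2 + 1, f1_9 - f1_8)
  }')
echo "$result"   # GTT
```

Note that awk's substr is 1-based, which is why the segment starts at s1 + 1.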
# 23  
Old 11-04-2011
This code below should take care of this:

Code:
awk 'NR == FNR {
  c = x
  while (match($6, /[0-9]*[SM]/)) {
    p1 = substr($6, RSTART + RLENGTH - 1, 1)    
    p1 == "M" && !c++ && t[$1, "S"]++ 
    f2[$1, p1, ++t[$1, p1]] = substr($6, RSTART, RLENGTH - 1) 
    $6 = substr($6, RSTART + RLENGTH)
    }    
  f2s1[$1] = ($1, "S", 1) in f2 ? f2[$1, "S", 1] : 0 
  f2s2[$1] = ($1, "S", 2) in f2 ? f2[$1, "S", 2] : 0
  f2m1[$1] = ($1, "M", 1) in f2 ? f2[$1, "M", 1] : 0
  f2m2[$1] = ($1, "M", 2) in f2 ? f2[$1, "M", 2] : 0 
  f2m1s[$1] = substr($10, f2s1[$1], f2m1[$1])
  f2m2s[$1] = substr($10, length($10) - f2m2[$1] - f2s2[$1], f2m2[$1])
  f2_4[$1] = $4
  next    
  }
$4 in f2s1 {
  _sub = (f2_4[$4] + f2s1[$4] + f2m1[$4]) > $9 ? f2m1s[$4] : f2m2s[$4]
  print $0, substr(_sub, $8 - $2 + 1, $9 - $8)
  }' file2_sample.txt file1_sample.txt

If you look at the code and try to understand how it works, you should be able to debug it yourself ...

Last edited by radoulov; 11-04-2011 at 12:59 PM..
# 24  
Old 11-04-2011
I had to make a couple more adjustments; it is a great learning experience. Thanks a ton, radoulov.
# 25  
Old 11-05-2011
Hi radoulov,

Would you please help me debug this case?
For $6=1S5M145N29M1S, f2s2[$1] prints 0 when it should be 1.
As a result, f2m2s[$1] is shifted by one position and does not drop the last character of $10.
This happens in every case that has both a leading and a trailing S. In all other cases
f2s2[$1] works just fine.
I also checked the while loop that populates f2: the value s2=1 is being stored successfully.
I have attached the record.

Thanks,
Alpesh



I used this code.
Code:
awk 'NR == FNR {
  c=x
  while (match($6, /[0-9]*[SM]/)) {
    p1 = substr($6, RSTART + RLENGTH - 1, 1)    
    p1 == "M" && !c++ && t[$1, "S"]++ 

    f2[$1, p1, ++t[$1, p1]] = substr($6, RSTART, RLENGTH - 1) 
    $6 = substr($6, RSTART + RLENGTH)

    }    
  f2s1[$1] = ($1, "S", 1) in f2 ? f2[$1, "S", 1] : 0 
  f2s2[$1] = ($1, "S", 2) in f2 ? f2[$1, "S", 2] : 0
  f2m1[$1] = ($1, "M", 1) in f2 ? f2[$1, "M", 1] : 0
  f2m2[$1] = ($1, "M", 2) in f2 ? f2[$1, "M", 2] : 0 
  f2m1s[$1] = substr($10, f2s1[$1] +1 , f2m1[$1])
  print f2m1s[$1]
  f2m2s[$1] = substr($10, length($10) - f2m2[$1] - f2s2[$1] +1 , f2m2[$1])
  print f2m2s[$1]
  f2_4[$1] = $4
  next    
  }
$4 in f2s1 {
  _sub = (f2_4[$4] + f2s1[$4] + f2m1[$4]) > $9 ? f2m1s[$4] : f2m2s[$4]
  print $0, substr(_sub, $8 - $2 + 1, $9 - $8)
  }' f2.txt f1.txt
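
To make the symptom concrete, here is a small sketch of the f2m2s slice on its own. The sequence is a made-up 36-character stand-in for $10 (x + 5 A + 29 C + y, matching 1S5M...29M1S), not the attached record, and tail_slice is a hypothetical helper name.

```shell
# tail_slice S2: slice the m2 segment the same way the script does,
# using S2 as the trailing S length. The sequence is invented for illustration.
tail_slice() {
  awk -v m2=29 -v s2="$1" 'BEGIN {
    seq = "x" "AAAAA"                       # leading S (1) + first M (5)
    for (i = 1; i <= m2; i++) seq = seq "C" # second M (29)
    seq = seq "y"                           # trailing S (1)
    print substr(seq, length(seq) - m2 - s2 + 1, m2)
  }'
}

tail_slice 1   # correct s2: 29 C characters, the trailing y is dropped
tail_slice 0   # s2 wrongly 0: slice starts one position late and ends in y
```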

# 26  
Old 11-05-2011
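Yes, that is another bug :)
With p1 == "M" first, c is not incremented when the first token is an S, so the first M after a leading S still bumps the S counter and the trailing S ends up in slot 3 instead of slot 2.
Try swapping p1 == "M" and !c++: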

Code:
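awk 'NR == FNR {
  # First file (f2.txt): split $6 into its S and M lengths.
  c=x                     # x is never set, so this resets c for each record
  while (match($6, /[0-9]*[SM]/)) {
    p1 = substr($6, RSTART + RLENGTH - 1, 1)    # token type: S or M
    # c counts tokens. If the very first token is an M there is no leading S,
    # so skip S slot 1 and let a later S land in slot 2 (the trailing S).
    !c++ && p1 == "M" && t[$1, "S"]++ 

    f2[$1, p1, ++t[$1, p1]] = substr($6, RSTART, RLENGTH - 1)   # token length
    $6 = substr($6, RSTART + RLENGTH)           # drop the consumed token

    }    
  f2s1[$1] = ($1, "S", 1) in f2 ? f2[$1, "S", 1] : 0 
  f2s2[$1] = ($1, "S", 2) in f2 ? f2[$1, "S", 2] : 0
  f2m1[$1] = ($1, "M", 1) in f2 ? f2[$1, "M", 1] : 0
  f2m2[$1] = ($1, "M", 2) in f2 ? f2[$1, "M", 2] : 0 
  # Slice $10: m1 follows the leading S, m2 ends just before the trailing S.
  f2m1s[$1] = substr($10, f2s1[$1] +1 , f2m1[$1])
  f2m2s[$1] = substr($10, length($10) - f2m2[$1] - f2s2[$1] +1 , f2m2[$1])
  f2_4[$1] = $4
  next    
  }
# Second file (f1.txt): use m1 when it extends past $9, otherwise m2.
$4 in f2s1 {
  _sub = (f2_4[$4] + f2s1[$4] + f2m1[$4]) > $9 ? f2m1s[$4] : f2m2s[$4]
  print $0, substr(_sub, $8 - $2 + 1, $9 - $8)
  }' f2.txt f1.txt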
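
As a rough illustration of why the order matters, the sketch below runs only the tokenizing loop on a single $6 value and reports which slot each S token lands in. slots is a hypothetical helper written for this post, and the "old"/"new" switch is an assumption added for the comparison; it is not part of the script above.

```shell
# slots CIGAR ORDER: print the slot number and length of every S token.
# ORDER=old uses  p1 == "M" && !c++ ...   (the previous version)
# ORDER=new uses  !c++ && p1 == "M" ...   (the swapped version)
slots() {
  awk -v cigar="$1" -v order="$2" 'BEGIN {
    c = 0; out = ""
    while (match(cigar, /[0-9]*[SM]/)) {
      p1 = substr(cigar, RSTART + RLENGTH - 1, 1)
      if (order == "old") { p1 == "M" && !c++ && t["S"]++ }
      else                { !c++ && p1 == "M" && t["S"]++ }
      n = ++t[p1]
      if (p1 == "S")
        out = out (out == "" ? "" : " ") "S" n "=" substr(cigar, RSTART, RLENGTH - 1)
      cigar = substr(cigar, RSTART + RLENGTH)
    }
    print out
  }'
}

slots 1S5M145N29M1S old   # S1=1 S3=1  (slot 2 is empty, so f2s2 falls back to 0)
slots 1S5M145N29M1S new   # S1=1 S2=1
slots 32M4S new           # S2=4       (no leading S, trailing S still in slot 2)
```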

This is the code with debug statements that I've used:

Code:
awk 'NR == FNR {
  c=x
  #debug
  print "debug:", $6
  while (match($6, /[0-9]*[SM]/)) {
    p1 = substr($6, RSTART + RLENGTH - 1, 1)    
    !c++ && p1 == "M" && t[$1, "S"]++ 

    f2[$1, p1, ++t[$1, p1]] = substr($6, RSTART, RLENGTH - 1) 
    $6 = substr($6, RSTART + RLENGTH)

    }    
  f2s1[$1] = ($1, "S", 1) in f2 ? f2[$1, "S", 1] : 0 
  f2s2[$1] = ($1, "S", 2) in f2 ? f2[$1, "S", 2] : 0
  f2m1[$1] = ($1, "M", 1) in f2 ? f2[$1, "M", 1] : 0
  f2m2[$1] = ($1, "M", 2) in f2 ? f2[$1, "M", 2] : 0 
  f2m1s[$1] = substr($10, f2s1[$1] +1 , f2m1[$1])
  #debug
  for (i in f2)
    print "debug: f2:" i, f2[i]
  print "debug: f2s1[$1]", f2s1[$1]
  print "debug: f2s2[$1]", f2s2[$1]
  print "debug: f2m1[$1]", f2m1[$1]
  print "debug: f2m2[$1]", f2m2[$1]
  f2m2s[$1] = substr($10, length($10) - f2m2[$1] - f2s2[$1] +1 , f2m2[$1])
  print "debug: f2m2s[$1]", f2m2s[$1]
  f2_4[$1] = $4
  next    
  }
$4 in f2s1 {
  _sub = (f2_4[$4] + f2s1[$4] + f2m1[$4]) > $9 ? f2m1s[$4] : f2m2s[$4]
  print $0, substr(_sub, $8 - $2 + 1, $9 - $8)
  }' f2.txt f1.txt
