Using columns from 2 files and extracting string


 
# 15  
Old 10-20-2011
Outstanding!! Thank you.
# 16  
Old 10-27-2011
Hi All,

Thanks for all the help. I am now facing a small complication with the last problem. The substring to be extracted must take the padded S values (column 6 in file2) into account.
This column has values like 36M, 5S31M, 32M4S etc. The number associated with a leading S must be added to the starting position of the substring in the original string. So for 5S31M, 5 must be added to ($8-$2) in the print statement of the code. 32M4S should be treated as before, since it does not have a leading S. By "leading S" I mean that the 'S' segment must come at the very beginning of the column 6 value: if the value is 2S30M4S, only the 2 (associated with the leading S) is to be used in the calculation, not the 4. I hope I'm clear.

Thanks again,
Alpesh

---------- Post updated at 12:03 PM ---------- Previous update was at 09:52 AM ----------
This is what I came up with. Doesn't work, please help debug/change.
Code:
awk 'NR == FNR {
   f2[$1] = $10;
   f2[$2]=$6;next
   }
 $4 in f2 {
awk '{x=f2[$6];gsub(/[0-9]+[^0-9S]/,z);p=$1+0};
   print $0 "\t" substr(f2[$4], $8 - $2 +1 + $p, $9 - $8)
   }'  file2_truncated.txt file1_truncated.txt  | head


Last edited by Franklin52; 10-27-2011 at 03:02 PM.. Reason: Please use code tags for data and code samples, thank you
# 17  
Old 10-28-2011
Hi radoulov and vgersh99,

Can you help me out please?

Thanks,
Alpesh
# 18  
Old 10-28-2011
not tested
Code:
awk 'NR == FNR {
    f2[$1] = $10
    # number attached to a leading S in $6, or 0 when $6 has no leading S
    f2pad[$1] = match($6, /^[0-9]+S/) ? substr($6, 1, RLENGTH - 1) : 0
    next
  }
  $4 in f2 {
      print $0, substr(f2[$4], $8 - $2 + f2pad[$4], $9 - $8)
}' file2_truncated.txt file1_truncated.txt | head
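
The leading-S test can be tried on its own before running it against the real files. This is only a sketch: `leading_s` is a hypothetical helper name, and the four inputs are the sample column 6 values mentioned earlier in the thread.

```shell
# Hypothetical helper: print the number attached to a leading S, or 0 if none.
leading_s() {
  printf '%s\n' "$1" | awk '{
    # match() returns non-zero and sets RLENGTH only when $1 starts with "<digits>S"
    print (match($1, /^[0-9]+S/) ? substr($1, 1, RLENGTH - 1) : 0)
  }'
}

for v in 36M 5S31M 32M4S 2S30M4S; do
  printf '%s -> %s\n' "$v" "$(leading_s "$v")"
done
# prints:
# 36M -> 0
# 5S31M -> 5
# 32M4S -> 0
# 2S30M4S -> 2
```

Note that match() takes the string first and the regular expression second; anchoring the pattern with ^ is what keeps a trailing S (as in 32M4S) from being counted.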

# 19  
Old 10-29-2011
Quote:
Originally Posted by alpesh
Hi radoulov and vgersh99,

Can you help me out please?

Just to add that if you post an example of the expected output, based on the provided input, you'll probably receive a quicker answer.
# 20  
Old 11-04-2011
Hi radoulov,

I will try to explain my question with two examples. Sorry if it's a lengthy read; I'm sure the answer will take you much less time than reading the question. This is a continuation of the substring code that you helped me with earlier. I have attached file1 and file2 samples for testing.

File2$6 can contain 'M' and 'S' along with other letters and numbers. There can be no S, or at most two. There has to be at least one 'M' and at most two. As a rule, we ignore the S and take the M.
S will only be present at the beginning and/or the end of $6, never in the middle.
For example, 23S4M9S, 1S34M1S and 34M2S are valid, but 23M1S12M is invalid.

Let's take an example with file1$4=SNPSTER1_0001:7:60:876:131#0/1

So if file2$6 is 20M769N15M1S for the $10 string ATAGCCAATATCCCCAACAGGTTGAGGGAACTGTTT, we divide it into 4 segments.

s1=0=first 0 characters to ignore since there is no leading S
s2=1=last 1 character to ignore = T
m1=20=first 20 characters after s1,ATAGCCAATATCCCCAACAG
m2=15=last 15 characters before s2, GTTGAGGGAACTGTT





So we have 2 strings (m1 and m2), and the substring is to be extracted from one of them based on the following condition.

if (file2$4+s1+m1) > file1$9
choose string m1 for substring operation
else
choose string m2 for substring operation



Substring operation

when file2$1=file1$4,

print file1$0 , the substring of m1 or m2 with parameters file1$8 - file1$2 + 1, file1$9 - file1$8



In this case file1$2=15735490
file1$8=15735496
file1$9=15735497

file2$4=15734702



(file2$4+s1+m1)=15734722 is less than file1$9=15735497

so we choose m2=GTTGAGGGAACTGTT for substring operation.

answer = substring (GTTGAGGGAACTGTT,7,1) = G

##########################################################

Another example

file1$4=file2$1=SNPSTER1_0001:7:115:1082:672#0/1
file2$10 = ATCTTGGGCCGCGAGCATCTTCAACCGCAAAATTTG

file2$6=1S24M186N11M


s1=1 ignore first character 'A'
s2=0
m1=24 , TCTTGGGCCGCGAGCATCTTCAAC
m2=11, CGCAAAATTTG

In this case file1$2=4044310
file1$8=4044316
file1$9=4044317


file2$4=4044311

Here (file2$4+s1+m1)=4044336 > file1$9=4044317

So we choose m1=TCTTGGGCCGCGAGCATCTTCAAC for the substring operation


answer = substring (TCTTGGGCCGCGAGCATCTTCAAC,7,1) = G
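
The two worked examples above can be checked with a small stand-alone sketch. It is not the code from this thread: `pick_base` is a hypothetical function name, the values are passed as arguments instead of being read from file1 and file2, and it assumes column 6 contains two M segments as in both examples.

```shell
# Hypothetical helper. Arguments, in order:
#   file2$6  file2$10  file2$4  file1$2  file1$8  file1$9
# Prints: m1 string, m2 string, extracted substring.
pick_base() {
  awk -v c="$1" -v seq="$2" -v pos="$3" -v p2="$4" -v p8="$5" -v p9="$6" 'BEGIN {
    # s1: number on a leading S, s2: number on a trailing S (0 when absent)
    s1 = match(c, /^[0-9]+S/) ? substr(c, 1, RLENGTH - 1) + 0 : 0
    s2 = match(c, /[0-9]+S$/) ? substr(c, RSTART, RLENGTH - 1) + 0 : 0
    # collect the M lengths from left to right
    n = 0
    while (match(c, /[0-9]+M/)) {
      m[++n] = substr(c, RSTART, RLENGTH - 1) + 0
      c = substr(c, RSTART + RLENGTH)
    }
    # m1 starts right after the leading S; m2 ends right before the trailing S
    m1s = substr(seq, s1 + 1, m[1])
    m2s = substr(seq, length(seq) - s2 - m[2] + 1, m[2])
    chosen = (pos + s1 + m[1] > p9) ? m1s : m2s
    print m1s, m2s, substr(chosen, p8 - p2 + 1, p9 - p8)
  }'
}

pick_base 20M769N15M1S ATAGCCAATATCCCCAACAGGTTGAGGGAACTGTTT \
          15734702 15735490 15735496 15735497
# prints: ATAGCCAATATCCCCAACAG GTTGAGGGAACTGTT G

pick_base 1S24M186N11M ATCTTGGGCCGCGAGCATCTTCAACCGCAAAATTTG \
          4044311 4044310 4044316 4044317
# prints: TCTTGGGCCGCGAGCATCTTCAAC CGCAAAATTTG G
```

Since awk's substr() counts positions from 1, the start of m1 is s1 + 1 and the start of m2 is length - s2 - m2 + 1.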


############################################################


Thanks,
Alpesh
# 21  
Old 11-04-2011
Try this:

Code:
awk 'NR == FNR {
  # lead is 1 when $6 starts with an S segment, so that S number 1 is the leading one
  lead = ($6 ~ /^[0-9]+S/)
  while (match($6, /[0-9]*[SM]/)) {
    p1 = substr($6, RSTART + RLENGTH - 1, 1)
    f2[$1, p1, ++t[$1, p1]] = substr($6, RSTART, RLENGTH - 1)
    $6 = substr($6, RSTART + RLENGTH)
    }
  f2s1[$1] = lead ? f2[$1, "S", 1] : 0
  f2s2[$1] = ($1, "S", lead + 1) in f2 ? f2[$1, "S", lead + 1] : 0
  f2m1[$1] = ($1, "M", 1) in f2 ? f2[$1, "M", 1] : 0
  f2m2[$1] = ($1, "M", 2) in f2 ? f2[$1, "M", 2] : 0
  # m1 starts right after the leading S; m2 ends right before the trailing S
  f2m1s[$1] = substr($10, f2s1[$1] + 1, f2m1[$1])
  f2m2s[$1] = substr($10, length($10) - f2m2[$1] - f2s2[$1] + 1, f2m2[$1])
  f2_4[$1] = $4
  next
  }
$4 in f2s1 {
  _sub = (f2_4[$4] + f2s1[$4] + f2m1[$4]) > $9 ? f2m1s[$4] : f2m2s[$4]
  print $0, substr(_sub, $8 - $2 + 1, $9 - $8)
  }' file2_sample.txt file1_sample.txt
