Using columns from 2 files and extracting string
# 1  
Old 10-19-2011

Hi All,

I have 2 files with a common column.

File 1 looks like
Code:
NAME   START     POS1  POS2  
N1          1234        1236    1237 
N2          1245        1248    1250
..
..

File 2 looks like
Code:
NAME   STRING  
N1         ABCDEFGH
N2         EFGHBCD
N3         PQRSSTUV
..
......

The string named N1 starts at position (1234+1)=1235 and continues up to 1242, so it has 8 characters, as given in the second file. I am interested in extracting the sub-string between 1236 (POS1 for N1) and 1237 (POS2 for N1), not including the start, i.e. position 1237 for N1 and positions 1249 and 1250 for N2.
So the expected answer is D for N1 and HB for N2. The answer can appear as a new column in file1.

It seems fairly simple, but I'm struggling to find the right commands. Please help.

Thanks,
Alpesh

Last edited by radoulov; 10-19-2011 at 06:09 PM.. Reason: Code tags.
# 2  
Old 10-19-2011
Could you please elaborate? How do the length of the string ABCDEFGH and the positions POS1 and POS2 relate to each other?
# 3  
Old 10-19-2011
The string ABCDEFGH starts at the value in the START column plus 1. So it starts at position 1235: the character A is at position 1235 and the last character, H, is at 1242. I want to extract the sub-string that starts at POS1 and ends at POS2, not including POS1. So in this case the result would be the character at position 1237, which is D. For N2, the sub-string is between 1248 and 1250, i.e. positions 1249 and 1250 (1248 is not included), and the result is HB.
# 4  
Old 10-19-2011
If A is at position 1235, D is at position 1238, not at 1237. Or am I missing something?
# 5  
Old 10-19-2011
Sorry, you are correct!
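
To make the position arithmetic in posts #3 to #5 easy to check, here is a minimal sketch that lists every character of a string next to its absolute position. It assumes, as stated in post #3, that the first character sits at START+1. The function name `positions` is made up for this illustration and does not appear anywhere in the thread.

```shell
#!/bin/sh
# positions START STRING  (hypothetical helper, for illustration only)
# Prints "absolute_position character" for each character of STRING,
# assuming the first character is at START + 1.
positions() {
  awk -v start="$1" -v s="$2" 'BEGIN {
    for (i = 1; i <= length(s); i++)
      print start + i, substr(s, i, 1)   # awk string indexes are 1-based
  }'
}

positions 1234 ABCDEFGH
```

For N1 this prints 1235 next to A and 1242 next to H. Position 1237 holds C and position 1238 holds D, which is the discrepancy pointed out in post #4.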
# 6  
Old 10-19-2011
I'm not sure the 'START' position really matters, since you're extracting from POS1 to POS2 anyway. Or are POS1 and POS2 relative to START?
Something along these lines; debug with a sample file1:
Code:
N1          1234        2    5
N2          1245        3    6

Code:
nawk 'FNR==NR{fs[$1]=$3;fe[$1]=$4;next} $1 in fs {print substr($2,fs[$1],fe[$1]-fs[$1])}' file1 file2

This User Gave Thanks to vgersh99 For This Post:
# 7  
Old 10-19-2011
This is what I get with your files:

Code:
awk 'NR == FNR {
  f2[$1] = $2; next
  }
$1 in f2 {
  print $0, substr(f2[$1], $3 - $2, $4 - $3) 
  }' file2 file1

Code:
% awk 'NR == FNR {
  f2[$1] = $2; next
  }
$1 in f2 {
  print $0, substr(f2[$1], $3 - $2, $4 - $3)
  }' file2 file1
NAME   START     POS1  POS2   
N1          1234        1236    1237  B
N2          1245        1248    1250 GH

You may need to adjust the numbers :)

Or this:

Code:
% awk 'NR == FNR {
  f2[$1] = $2; next
  }
$1 in f2 {
  print $0, substr(f2[$1], $3 - $2 + 2, $4 - $3)
  }' file2 file1
NAME   START     POS1  POS2   
N1          1234        1236    1237  D
N2          1245        1248    1250 BC
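
Neither offset above gives HB for N2. As a hedged sketch only: if the rule clarified in posts #3 to #5 is taken literally (the first character is at START+1, POS1 is excluded, POS2 is included), then the character at absolute position p has string index p - START, so the sub-string starts at index POS1 - START + 1 and has length POS2 - POS1. The function name `extract_between` and the single-space sample data are assumptions made for this illustration; the header line of file1 is passed through unchanged.

```shell
#!/bin/sh
# extract_between FILE1 FILE2  (hypothetical helper, for illustration only)
# FILE1: NAME START POS1 POS2    FILE2: NAME STRING
extract_between() {
  awk 'NR == FNR { str[$1] = $2; next }   # first pass (FILE2): remember STRING by NAME
       FNR == 1  { print; next }          # FILE1 header: print as is
       $1 in str {
         # start index = POS1 - START + 1 (POS1 excluded), length = POS2 - POS1
         print $0, substr(str[$1], $3 - $2 + 1, $4 - $3)
       }' "$2" "$1"
}

tmp=$(mktemp -d)
printf 'NAME START POS1 POS2\nN1 1234 1236 1237\nN2 1245 1248 1250\n' > "$tmp/file1"
printf 'NAME STRING\nN1 ABCDEFGH\nN2 EFGHBCD\nN3 PQRSSTUV\n' > "$tmp/file2"
extract_between "$tmp/file1" "$tmp/file2"
rm -rf "$tmp"
```

With this offset the sample data gives C for N1 (the character at position 1237, consistent with the correction in post #4) and HB for N2, which matches the answer the original poster expected. If POS1 should be included instead, drop the `+ 1` and add 1 to the length.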

This User Gave Thanks to radoulov For This Post: