Using columns from 2 files and extracting string Post: 302571016

Post 302571016 by radoulov on Saturday 5th of November 2011 01:12:28 PM (Shell Programming and Scripting forum)

Yes, that is another bug :)

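Try swapping p1 == "M" and !c++, so that !c++ comes first. Because && short-circuits, c is then incremented on every token, and the test succeeds only when the very first token is an M: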

Code:

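awk 'NR == FNR {                          # first file, f2.txt: build the lookup tables
  c = x                                   # reset the token counter (x is never assigned, so c becomes empty)
  while (match($6, /[0-9]*[SM]/)) {       # take $6 apart one <length><S|M> token at a time
    p1 = substr($6, RSTART + RLENGTH - 1, 1)    # the operation letter, S or M
    !c++ && p1 == "M" && t[$1, "S"]++     # first token is M: no leading S, so reserve S slot 1
    f2[$1, p1, ++t[$1, p1]] = substr($6, RSTART, RLENGTH - 1)   # store the length in the next slot
    $6 = substr($6, RSTART + RLENGTH)     # drop the token just consumed
    }
  f2s1[$1] = ($1, "S", 1) in f2 ? f2[$1, "S", 1] : 0   # leading S length, or 0
  f2s2[$1] = ($1, "S", 2) in f2 ? f2[$1, "S", 2] : 0   # trailing S length, or 0
  f2m1[$1] = ($1, "M", 1) in f2 ? f2[$1, "M", 1] : 0   # first M length, or 0
  f2m2[$1] = ($1, "M", 2) in f2 ? f2[$1, "M", 2] : 0   # second M length, or 0
  f2m1s[$1] = substr($10, f2s1[$1] + 1, f2m1[$1])      # first M segment of $10
  f2m2s[$1] = substr($10, length($10) - f2m2[$1] - f2s2[$1] + 1, f2m2[$1])   # second M segment, counted back from the end of $10
  f2_4[$1] = $4                           # remember column 4 for the range test below
  next
  }
$4 in f2s1 {                              # second file, f1.txt: $4 matches a key from f2.txt
  _sub = (f2_4[$4] + f2s1[$4] + f2m1[$4]) > $9 ? f2m1s[$4] : f2m2s[$4]   # first M segment if $9 falls before its end, else the second
  print $0, substr(_sub, $8 - $2 + 1, $9 - $8)   # append the requested slice of the chosen segment
  }' f2.txt f1.txt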
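To see why the operand order matters, here is a minimal sketch (not part of the posted script) that runs both orders over a few made-up strings of the kind found in $6. It assumes those strings contain only S and M operations (they look like SAM CIGAR strings, but that is an assumption); the helper name slots and the sample strings are hypothetical. With p1 == "M" && !c++, c is only incremented on M tokens, so after a leading S the first M still looks like the first token and reserves an extra S slot:

```shell
#!/bin/sh
# slots ORDER: read CIGAR-like strings (S and M operations only) on stdin and
# print ORDER, the string, and the values stored in slots (S,1) and (S,2),
# i.e. the leading and trailing S lengths as the script above sees them.
# ORDER is "fixed" (!c++ first) or "buggy" (p1 == "M" first). Hypothetical helper.
slots() {
  awk -v order="$1" '{
    s = $1
    c = 0
    split("", t); split("", f)          # clear per-record counters and storage
    while (match(s, /[0-9]*[SM]/)) {
      op = substr(s, RSTART + RLENGTH - 1, 1)
      if (order == "fixed")
        !c++ && op == "M" && t["S"]++   # c advances on every token
      else
        op == "M" && !c++ && t["S"]++   # c advances only on M tokens
      f[op, ++t[op]] = substr(s, RSTART, RLENGTH - 1)
      s = substr(s, RSTART + RLENGTH)
    }
    s1 = ("S", 1) in f ? f["S", 1] : 0
    s2 = ("S", 2) in f ? f["S", 2] : 0
    print order, $1, s1, s2
  }'
}

for order in fixed buggy; do
  printf '%s\n' '5S20M' '20M5S' '3S20M4S' | slots "$order"
done
```

With the original order, 3S20M4S puts its trailing clip in slot 3, so slot (S,2) is empty and the lookup falls back to 0; with !c++ first it lands in slot 2. The other two samples come out the same either way.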

This is the code with debug statements that I've used:

Code:

awk 'NR == FNR {
  c=x
  #debug
  print "debug:", $6
  while (match($6, /[0-9]*[SM]/)) {
    p1 = substr($6, RSTART + RLENGTH - 1, 1)    
    !c++ && p1 == "M" && t[$1, "S"]++ 

    f2[$1, p1, ++t[$1, p1]] = substr($6, RSTART, RLENGTH - 1) 
    $6 = substr($6, RSTART + RLENGTH)

    }    
  f2s1[$1] = ($1, "S", 1) in f2 ? f2[$1, "S", 1] : 0 
  f2s2[$1] = ($1, "S", 2) in f2 ? f2[$1, "S", 2] : 0
  f2m1[$1] = ($1, "M", 1) in f2 ? f2[$1, "M", 1] : 0
  f2m2[$1] = ($1, "M", 2) in f2 ? f2[$1, "M", 2] : 0 
  f2m1s[$1] = substr($10, f2s1[$1] +1 , f2m1[$1])
  #debug
  for (i in f2)
    print "debug: f2:" i, f2[i]
  print "debug: f2s1[$1]", f2s1[$1]
  print "debug: f2s2[$1]", f2s2[$1]
  print "debug: f2m1[$1]", f2m1[$1]
  print "debug: f2m2[$1]", f2m2[$1]
  f2m2s[$1] = substr($10, length($10) - f2m2[$1] - f2s2[$1] +1 , f2m2[$1])
  print "debug: f2m2s[$1]", f2m2s[$1]
  f2_4[$1] = $4
  next    
  }
$4 in f2s1 {
  _sub = (f2_4[$4] + f2s1[$4] + f2m1[$4]) > $9 ? f2m1s[$4] : f2m2s[$4]
  print $0, substr(_sub, $8 - $2 + 1, $9 - $8)
  }' f2.txt f1.txt
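A related debugging tip, as a minimal sketch that is not part of the posted code: a key written as f2[$1, p1, n] is stored as one string whose parts are joined by SUBSEP (by default "\034", a non-printing character), so print "debug: f2:" i shows the parts without a visible separator on most terminals. Splitting each key on SUBSEP gives a readable dump. The helper name show_keys and the sample entries are hypothetical:

```shell
#!/bin/sh
# show_keys: fill a small f2-style array and print its keys readably.
# A key written as f2[a, b, c] is stored as the single string a SUBSEP b SUBSEP c;
# splitting on SUBSEP restores the separate parts. Sample entries are hypothetical.
show_keys() {
  awk 'BEGIN {
    f2["r1", "S", 1] = 5
    f2["r1", "M", 1] = 20
    for (i in f2) {
      split(i, k, SUBSEP)               # k[1] = id, k[2] = S or M, k[3] = slot
      printf "debug: f2[%s,%s,%s] = %s\n", k[1], k[2], k[3], f2[i]
    }
  }' | sort                             # for-in order is unspecified, so sort the output
}

show_keys
```

The same split(i, k, SUBSEP) line can replace the plain print inside the for (i in f2) loop of the debug version.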

radoulov

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

Extracting columns from different files for later merging

Hello! I want to extract columns from two files and later combine them for plotting with gnuplot. If the files file1 and file2 look like: file1: a, 0.62,x b, 0.61,x file2: a, 0.43,x b, 0,49,x The desired output is a 0.62 0.62 b 0.61 0.49 Thank you in advance!

2. Shell Programming and Scripting

Extracting a string from one file and searching the same string in other files

Hi, Need to extract a string from one file and search the same in other files. Ex: I have file1 of hundred lines with no delimiters not even space. I have 3 more files. I should get 1 to 10 characters say substring from each line of file1 and search that string in rest of the files and get...

3. Shell Programming and Scripting

Append string to columns from 2 files

Hi Having a file as follows file1.txt Date (dd/mm)Time Server IP Error Code =========================================================================== 10/04/2008 10:10 ServerA xxx.xxx.xxx.xxx 6 10/04/2008 10:10 ServerB ...

4. Shell Programming and Scripting

extracting columns from 2 files

Hello, I have 2 files file1 & file2 = a1 b1 a2 b2 a3 b3 ... = c1 d1 c2 d2 c3 d3 ... I need to compare if b(i)=c(j), i,j=1,2,3,4,... If yes, write a(i) d(j) in output file3, one pair per line

5. UNIX for Dummies Questions & Answers

Extracting columns from multiple files with awk

hi everyone! I already posted it in scripts, I'm sorry, it's doubled. I'd like to extract a single column from 5 different files and put them together in an output file. I saw a similar question for 2 input files, and the line of code worked very well; the code is: awk 'NR==FNR{a=$2; next}...

6. Shell Programming and Scripting

Extracting columns from multiple files with awk

hi everyone! I'd like to extract a single column from 5 different files and put them together in an output file. I saw a similar question for 2 input files, and the line of code worked very well; the code is: awk 'NR==FNR{a=$2; next} {print a, $2}' file1 file2. I added the file3, file4 and...

7. Shell Programming and Scripting

extracting columns falling within specific ranges for multiple files

Hi, I need to create weekly files from daily records stored in individual monthly filenames from 1999-2010. my sample file structure is like the ones below: daily record stored per month: 199901.xyz, 199902.xyz, 199903.xyz, 199904.xyz ...199912.xyz records inside 199901.xyz (original data...

8. Shell Programming and Scripting

Compare columns of multiple files and print those unique string from File1 in an output file.

Hi, I have multiple files that each contain one column of strings: File1: 123abc 456def 789ghi File2: 123abc 456def 891jkl File3: 234mno 123abc 456def In total I have 25 of these type of file.

9. Shell Programming and Scripting

Extracting data from specific rows and columns from multiple csv files

I have a series of csv files in the following format eg file1 Experiment Name,XYZ_07/28/15, Specimen Name,Specimen_001, Tube Name, Control, Record Date,7/28/2015 14:50, $OP,XYZYZ, GUID,abc, Population,#Events,%Parent All Events,10500, P1,10071,95.9 Early Apoptosis,1113,11.1 Late...

10. Shell Programming and Scripting

Joining files using awk not extracting all columns from File 2

Hello All I'm joining two files using Awk by Left outer join on the file 1 File 1 1 AA 2 BB 3 CC 4 DD File 2 1 IND 100 200 300 2 AUS 400 500 600 5 USA 700 800 900

LEARN ABOUT MOJAVE

bytes5.18

bytes(3pm)						 Perl Programmers Reference Guide						bytes(3pm)

NAME

       bytes - Perl pragma to force byte semantics rather than character semantics

NOTICE

       This pragma reflects early attempts to incorporate Unicode into perl and has since been superseded. It breaks encapsulation (i.e. it
       exposes the innards of how the perl executable currently happens to store a string), and use of this module for anything other than
       debugging purposes is strongly discouraged. If you feel that the functions here within might be useful for your application, this possibly
       indicates a mismatch between your mental model of Perl Unicode and the current reality. In that case, you may wish to read some of the perl
       Unicode documentation: perluniintro, perlunitut, perlunifaq and perlunicode.

SYNOPSIS

	   use bytes;
	   ... chr(...);       # or bytes::chr
	   ... index(...);     # or bytes::index
	   ... length(...);    # or bytes::length
	   ... ord(...);       # or bytes::ord
	   ... rindex(...);    # or bytes::rindex
	   ... substr(...);    # or bytes::substr
	   no bytes;

DESCRIPTION

       The "use bytes" pragma disables character semantics for the rest of the lexical scope in which it appears.  "no bytes" can be used to
       reverse the effect of "use bytes" within the current lexical scope.

       Perl normally assumes character semantics in the presence of character data (i.e. data that has come from a source that has been marked as
       being of a particular character encoding). When "use bytes" is in effect, the encoding is temporarily ignored, and each string is treated
       as a series of bytes.

       As an example, when Perl sees "$x = chr(400)", it encodes the character in UTF-8 and stores it in $x. Then it is marked as character data,
       so, for instance, "length $x" returns 1. However, in the scope of the "bytes" pragma, $x is treated as a series of bytes - the bytes that
       make up the UTF8 encoding - and "length $x" returns 2:

	   $x = chr(400);
	   print "Length is ", length $x, "\n";     # "Length is 1"
	   printf "Contents are %vd\n", $x;         # "Contents are 400"
	   {
	       use bytes; # or "require bytes; bytes::length()"
	       print "Length is ", length $x, "\n"; # "Length is 2"
	       printf "Contents are %vd\n", $x;     # "Contents are 198.144"
	   }

       chr(), ord(), substr(), index() and rindex() behave similarly.
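The 198.144 in the example above is the two-byte UTF-8 encoding of code point 400 (U+0190): the lead byte 110xxxxx carries the top 5 bits of the code point and the continuation byte 10xxxxxx carries the low 6 bits. A minimal sketch of that arithmetic in shell and awk, valid only for code points 128 to 2047 (the two-byte range); utf8_2byte is a hypothetical helper name, not part of Perl or the bytes pragma:

```shell
#!/bin/sh
# utf8_2byte CODEPOINT: print the two UTF-8 bytes of a code point in the range
# 128-2047 (U+0080 to U+07FF) as dotted decimals, like the %vd format in Perl.
# Hypothetical helper; code points outside that range are rejected (exit 1).
utf8_2byte() {
  awk -v cp="$1" 'BEGIN {
    if (cp < 128 || cp > 2047) exit 1
    b1 = 192 + int(cp / 64)   # lead byte 110xxxxx carries the top 5 bits
    b2 = 128 + cp % 64        # continuation byte 10xxxxxx carries the low 6 bits
    printf "%d.%d\n", b1, b2
  }'
}

utf8_2byte 400   # prints 198.144
```

For 400, int(400 / 64) = 6 and 400 % 64 = 16, giving 192 + 6 = 198 and 128 + 16 = 144; utf8_2byte 233 (é) gives 195.169 the same way.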

       For more on the implications and differences between character semantics and byte semantics, see perluniintro and perlunicode.

LIMITATIONS

       bytes::substr() does not work as an lvalue().

SEE ALSO

       perluniintro, perlunicode, utf8

perl v5.18.2							    2013-11-04								bytes(3pm)
