If Perl is an option, then here's a sample program:
Code:
$
$ cat process_files.pl
#!/usr/bin/perl
use strict;
use warnings;

# Set the file names
my $search_file = "file1";
my $data_file   = "file2";
my $temp_file   = "file1.tmp";

# $sf = The search file; the file from which the search terms are read.
# $df = The data file; the file scanned for each search term to fetch
#       the values of AF and FDP.
# $tf = The temporary file; where the final results are stored temporarily.
#       After all processing is done, the temp file replaces the search file
#       (see the renames at the end). This is much safer than editing the
#       search file in place.
open(my $sf, "<", $search_file) or die "Can't open $search_file: $!";
open(my $df, "<", $data_file)   or die "Can't open $data_file: $!";
open(my $tf, ">", $temp_file)   or die "Can't open $temp_file: $!";

# Loop through the search file
while (my $line1 = <$sf>) {
    chomp $line1;
    # For each line in the search file, extract the search term, which is
    # the second whitespace-separated "word" from the left.
    my ($search_term) = $line1 =~ /^\S+\s+(\S+)/;

    # A line without a second word (e.g. a blank line) has no search term;
    # copy it through unchanged so it is never used as an empty pattern.
    if (!defined $search_term) {
        print $tf "$line1\n";
        next;
    }

    # Initialize AF and FDP to zero-length strings. If the search term is
    # not found in the data file, we use this fact to print "NOT DETECTED".
    my ($af, $fdp) = ("", "");

    # Now loop through the data file looking for the search term.
    while (my $line2 = <$df>) {
        chomp $line2;
        # If the search term is found in a line of the data file, set AF and
        # FDP and stop reading the data file. \Q...\E makes the term match
        # literally, so the "." in "c.49G>A" is not a regex wildcard.
        if ($line2 =~ /\Q$search_term\E/) {
            ($af, $fdp) = $line2 =~ /AF=(.*?);.*FDP=(.*?);/;
            # If the line has no AF=...;...FDP=...; part, both are undef.
            $af  //= "";
            $fdp //= "";
            last;
        }
    }

    # If AF and FDP were set, print their values, else "NOT DETECTED".
    if ($af eq "" and $fdp eq "") {
        print $tf "$line1 NOT DETECTED\n";
    } else {
        print $tf "$line1 READS=[$fdp] AF=[$af]\n";
    }

    # Now rewind: set the pointer back to the beginning of the data file, so
    # the search for the next term starts from the top of the data file.
    seek($df, 0, 0) or die "Can't seek in $data_file: $!";
}

close($sf) or die "Can't close $search_file: $!";
close($df) or die "Can't close $data_file: $!";
close($tf) or die "Can't close $temp_file: $!";

# At this point, we have the original file "file1" and a temp file "file1.tmp"
# with the desired output. The following statements keep a backup of the
# original file as "file1.orig" and then move the temp file into its place.
my $sf_orig = "file1.orig";
rename($search_file, $sf_orig) or die "Can't rename $search_file to $sf_orig: $!";
rename($temp_file, $search_file) or die "Can't rename $temp_file to $search_file: $!";
$
$ perl process_files.pl
$
$ cat file1
AKT1 c.49G>A p.E17K NOT DETECTED
AKT1 c.155T>G p.L52R NOT DETECTED
APC c.4033G>T p.E1345* READS=[1999] AF=[0.248124]
EGFR c.2237_2255delAATTAAGAGAAGCAACATCinsT p.E746_S752delinsV READS=[1963] AF=[0.0,0.0,0.0,0.139582]
$
$
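The program rereads file2 from disk for every line of file1 (that is what the `seek` rewinds for). If file2 fits in memory, one variation is to read it once into an array and scan the array instead; every term is still compared against every data line, but the file is read only once. This is a minimal sketch under that assumption: the helper names `slurp_lines`, `first_match` and `annotate` are hypothetical, and `index` is used in place of a regex so the search term is matched as a literal substring.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read a whole file into an array of chomped lines.
sub slurp_lines {
    my ($path) = @_;
    open(my $fh, "<", $path) or die "Can't open $path: $!";
    chomp(my @lines = <$fh>);
    close($fh) or die "Can't close $path: $!";
    return \@lines;
}

# Hypothetical helper: return the first line that contains $term as a
# literal substring (index() does no regex interpretation), or undef.
sub first_match {
    my ($lines, $term) = @_;
    for my $line (@$lines) {
        return $line if index($line, $term) >= 0;
    }
    return;
}

# Hypothetical helper: build the output line for one search-file line,
# using the same AF/FDP pattern and "NOT DETECTED" rule as the program.
sub annotate {
    my ($line1, $data_lines) = @_;
    my ($search_term) = $line1 =~ /^\S+\s+(\S+)/;
    return $line1 unless defined $search_term;    # no term: copy through

    my $hit = first_match($data_lines, $search_term);
    my ($af, $fdp) = defined $hit ? ($hit =~ /AF=(.*?);.*FDP=(.*?);/) : ();
    $af  //= "";
    $fdp //= "";
    return $af eq "" && $fdp eq ""
        ? "$line1 NOT DETECTED"
        : "$line1 READS=[$fdp] AF=[$af]";
}

# Example driver (only runs when given two file names):
#   perl sketch.pl file1 file2 > file1.tmp
if (@ARGV == 2) {
    my ($search_file, $data_file) = @ARGV;
    my $data = slurp_lines($data_file);
    for my $line1 (@{ slurp_lines($search_file) }) {
        my $out = annotate($line1, $data);
        print "$out\n";
    }
}
```

The script name `sketch.pl` is hypothetical; after it writes file1.tmp, the same two `rename` calls as in the program above can put the result in place. The trade-off is memory: the whole of file2 is held in the array for the duration of the run.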
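The pattern `/AF=(.*?);.*FDP=(.*?);/` relies on AF appearing before FDP and on FDP's value being followed by `;`. It also matches inside any longer key that ends in `AF` (a hypothetical `XAF=9;` earlier on the line would be captured as AF). If the data lines carry `;`-separated `KEY=VALUE` pairs, which is an assumption since file2 is not shown, splitting them into a hash first avoids those order and substring issues. `parse_kv` is a hypothetical helper name, and the data line in the example is made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: collect KEY=VALUE pairs from a line. A key must start
# at the beginning of the line or right after ";" or whitespace, so a key
# such as "XAF" is never mistaken for "AF". Values stop at ";" or whitespace.
sub parse_kv {
    my ($line) = @_;
    my %kv;
    while ($line =~ /(?:^|[;\s])(\w+)=([^;\s]*)/g) {
        $kv{$1} //= $2;    # keep the first occurrence of each key
    }
    return \%kv;
}

# Example on a made-up data line (not from the original post):
my $kv = parse_kv("rec1 c.4033G>T XAF=9;AF=0.25;FDP=1999");
printf "READS=[%s] AF=[%s]\n", $kv->{FDP} // "", $kv->{AF} // "";
# prints "READS=[1999] AF=[0.25]"
```

Inside the program, `parse_kv($line2)` could stand in for the AF/FDP regex, with a missing key falling back to "" via `//` just as before.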
Hi Friends,
I have a file with the following values:
xyz.txt,12345.xml
abc.txt,04567.xml
cde.txt,12134.xml
I would like to extract each 2nd-column value twice, as shown in the example below:
12345,12345.xml
04567,04567.xml
12134,12134.xml
Please advise!
In the forums one of...
I have read another post about this issue and am wondering how to adapt it
to my own, much simpler, issue.
I have a file of user IDs like so:
333333
321321
546465
...etc
I need to take each number and use it to print the records whose 5th field matches the user ID pulled from the...
Hi everyone,
I have two comma-separated files, file1 and file2.
file1 is:
Header1,Header2,Header3,Header4,Header5,Header6,Header7,Header8,Header9,Header10
Code7,,,,,,,,,
Code5,,,,,,,,,
Code3,,,,,,,,,
Code9,,,,,,,,,
Code2,,,,,,,,,
file2...
Hello friends,
I have a space-separated text file with many columns (the number of columns varies from row to row). From each line, I need to collect all the values from the 18th column to the end, group them into pairs, and number the pairs like this:
1. 18th-col-value 19th-col-value 2. 20th-col-value ...
In the awk below, I am trying to print expName only if another tag, planExecuted, is true. In addition to expName, I am also printing planShortID. For some reason the word experiment gets printed, so I remove it with sed. I have attached the complete index.html as well as included a sample of it...
The awk below is used with the attached index.html and matches the specific user id in the sub portion with the path /rundb/api/v1/plugin/49/. The command runs, but the output is blank. Something in the file structure must have changed, as it used to work.
So, using the first line in the output:
...
I am trying to use awk to match the NM_ in file with $1 of id, which is tab-delimited. The NM_ will always be on the line of file that starts with > and will come after the second _. When an NM_ matches $1 of id, the value of $2 in id is substituted for, or used to update, the NM_. Each NM_...
I am trying to use awk to find all the $2 values in file2 (~30MB, tab-delimited) that fall between $2 and $3 in file1 (~2GB, tab-delimited).
I have just found out that I need to use $1, $2, and $3 from file1: $1 and $2 of file2 must match $1 of file1 and be in the range...
In the awk below, which executes as is, I am trying to add a condition that will extract the text or value after FR= for each line of file1 compared to file2. As it is, the lines between the two files are either a match, Missing in file 1, or Missing in file2, but I cannot add the...