Help with parsing file with combination of pattern


 
# 1  
Old 10-01-2014

I have a file1 like

Code:
    prt1|als28.1 prt3|als53.1 prt2|als550.1 prt1|bls9.2 prt2|als7.2 prt2|bls0.2
    prt2|als872.1 prt1|bls871.1    prt2|als6.2    prt4|als22.1 prt2|bls43.2

I want to create a file2 from this file by pairing every reference pattern (prt1) with every other pattern (prt2, prt3, ...) on the same line. The number of patterns can differ from line to line in file1. The first line of file1 has two prt1 entries, so taking each one as the reference gives these pairs:

Code:
prt1|als28.1 prt3|als53.1; prt1|als28.1 prt2|als550.1; prt1|als28.1 prt2|als7.2; prt1|als28.1 prt2|bls0.2; prt1|bls9.2 prt3|als53.1; prt1|bls9.2 prt2|als550.1; prt1|bls9.2 prt2|als7.2; prt1|bls9.2 prt2|bls0.2

A combination of two reference patterns, such as

Code:
prt1|als28.1 prt1|bls9.2

should be ignored. So the output for the first line of file1 in file2 (the result) will be

Code:
    prt1|als28.1 prt3|als53.1
    prt1|als28.1 prt2|als550.1
    prt1|als28.1 prt2|als7.2
    prt1|als28.1 prt2|bls0.2
    prt1|bls9.2 prt3|als53.1
    prt1|bls9.2 prt2|als550.1
    prt1|bls9.2 prt2|als7.2
    prt1|bls9.2 prt2|bls0.2

Likewise, the output for the second line will be

Code:
    prt1|bls871.1 prt2|als872.1
    prt1|bls871.1 prt2|als6.2
    prt1|bls871.1 prt4|als22.1
    prt1|bls871.1 prt2|bls43.2

I can't figure out exactly how to do this. Any suggestions or programs would be helpful. This is the one I wrote:

Code:
    #!/usr/bin/perl
    use strict;
    use warnings;
    open F1,$ARGV[0] or die "\n can not open file $ARGV[0]\n";
    my $pattern1 = $ARGV[1];      # reference pattern, e.g. prt1
    my $otherpattern = $ARGV[2];  # pattern to pair it with, e.g. prt2
    while (my $line=<F1>)
    {
        # prints only the first match per line, not every combination
        if ($line=~/ ($pattern1\S+)/i) { print $1; }
        {
            if ($line=~/  ($otherpattern\S+)/i)
            {
                print "\t".$1."\n";
            }
            else
            {
                if ($line=~ m/\bNo pairs found\b/g)
                {
                    print "\t".$line;
                    print "\t"."No pairs Found"."\n";
                }
            }
        }
    }
    close F1;


Last edited by Scrutinizer; 10-01-2014 at 03:41 PM.. Reason: CODE tags
# 2  
Old 10-01-2014
How about
Code:
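awk     '       {c++
                 # A collects the prt1 (reference) fields, B all other fields
                 for (i=1; i<=NF; i++) if ($i ~ /prt1/) A[$i]
                                         else           B[$i]
                 # one output file per input line: file1, file2, ...
                 out = "file" c
                 for (i in A) for (j in B) print i, j > out
                 close(out)
                 delete A; delete B
                }
        ' file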
file1:
prt1|bls9.2 prt2|bls0.2
prt1|bls9.2 prt3|als53.1
prt1|bls9.2 prt2|als550.1
prt1|bls9.2 prt2|als7.2
prt1|als28.1 prt2|bls0.2
prt1|als28.1 prt3|als53.1
prt1|als28.1 prt2|als550.1
prt1|als28.1 prt2|als7.2
file2:
prt1|bls871.1 prt4|als22.1
prt1|bls871.1 prt2|als872.1
prt1|bls871.1 prt2|bls43.2
prt1|bls871.1 prt2|als6.2

Does the output order matter?
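
In case the order does matter: awk's `for (i in A)` visits array keys in an unspecified order, which is why the listings above do not follow the field order of the input. A minimal sketch that keeps the input order and prints everything to standard output, so it can be redirected into a single file2. It assumes the reference is always `prt1`; the function name `pairs_in_order` is made up for illustration. Unlike the version above, it does not drop duplicate fields.

```shell
# pairs_in_order INPUT: print reference/other pairs in input order
# (hypothetical helper name, not from the thread)
pairs_in_order() {
    awk '
    {
        na = nb = 0
        # indexed arrays keep the order in which fields appear on the line
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^prt1[|]/) A[++na] = $i
            else                 B[++nb] = $i
        }
        # pair every reference field with every non-reference field
        for (i = 1; i <= na; i++)
            for (j = 1; j <= nb; j++)
                print A[i], B[j]
    }' "$1"
}
```

Usage would be along the lines of `pairs_in_order file1 > file2`.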
This User Gave Thanks to RudiC For This Post:
# 3  
Old 10-01-2014
Quote:
Originally Posted by RudiC
Does the output order matter?
No, the order does not matter.
# 4  
Old 10-01-2014
If you are still interested in a Perl solution:

Code:
#!/usr/bin/perl

use strict;
use warnings;

my $filename = shift or die "Missing filename to operate on it" ;
my $re = shift or die "Missing regex to match";

open my $fh, '<', $filename or die "Could not open $filename: $!\n";

while (my $line = <$fh>) {
    chomp $line;
    print "Line #$.\n";
    my @fields = split ' ', $line;    # awk-style split: no empty first field on indented lines
    my @patterns = grep{/$re/} @fields;

    my %patterns = map{$_ => 1} @patterns;
    my @NF = grep(!defined $patterns{$_}, @fields);

    for my $pattern (@patterns) {
        for my $field (@NF) {
            print "$pattern $field\n";
        }
    }
    print "\n";
}
close $fh;

Result:

Code:
→ perl prog.pl filename prt1
Line #1
prt1|als28.1 prt3|als53.1
prt1|als28.1 prt2|als550.1
prt1|als28.1 prt2|als7.2
prt1|als28.1 prt2|bls0.2
prt1|bls9.2 prt3|als53.1
prt1|bls9.2 prt2|als550.1
prt1|bls9.2 prt2|als7.2
prt1|bls9.2 prt2|bls0.2

Line #2
prt1|bls871.1 prt2|als872.1
prt1|bls871.1 prt2|als6.2
prt1|bls871.1 prt4|als22.1
prt1|bls871.1 prt2|bls43.2
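
One caveat with selecting the reference by an unanchored regex: `prt1` would also match a field such as `prt10|x` (a hypothetical label, not present in the sample data). A minimal awk sketch that compares the label before the `|` exactly, with the reference passed in as a variable. The function name `pairs_for` and the variable `ref` are assumptions for illustration only.

```shell
# pairs_for REF INPUT: pair every field labelled exactly REF with every other field
# (hypothetical helper name, not from the thread)
pairs_for() {
    awk -v ref="$1" '
    {
        na = nb = 0
        for (i = 1; i <= NF; i++) {
            # the label is the text before the first "|"
            split($i, part, "|")
            if (part[1] == ref) A[++na] = $i
            else                B[++nb] = $i
        }
        for (i = 1; i <= na; i++)
            for (j = 1; j <= nb; j++)
                print A[i], B[j]
    }' "$2"
}
```

It could be called as `pairs_for prt1 filename`, mirroring the argument order of the Perl script reversed (reference first, then file).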
