Find and select complete paragraph


 
# 8  
Old 12-07-2015
Got Perl?

Code:
#!/usr/bin/perl
use strict;
use warnings;

my $sep = $/ = "== :: ==\n";
my $pattern = "Permanent";
my $hat = 0;

while(<>) {
    if(/$pattern/){
        print $sep if $hat == 0 and ++$hat;
        print;
    }
}

Save as filter.pl
Run as perl filter.pl april_2015.txt
or
perl filter.pl april_2015.txt > permanent.txt
or
perl filter.pl april_2015.txt may_2015.txt june_2015.txt > permanent.txt
or
perl filter.pl *_2015.txt > permanent.txt
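For comparison, here is a minimal Python sketch of the same record-filtering idea: split the input on the "== :: ==" separator and keep only the records that contain the pattern. The function name filter_records is made up for illustration. The sketch assumes the whole file fits in memory and ends with a separator line, and it does plain substring matching where the Perl script uses a regex.

```python
SEP = "== :: ==\n"  # record separator used in the sample data


def filter_records(text, pattern, sep=SEP):
    """Return the records of `text` containing `pattern`, each followed by `sep`.

    Hypothetical helper; assumes `text` ends with a separator line.
    """
    matches = []
    for record in text.split(sep):
        # Plain substring test (the Perl script uses a regex match instead)
        if pattern in record:
            matches.append(record + sep)
    # Emit one leading separator, but only if at least one record matched
    return sep + "".join(matches) if matches else ""


if __name__ == "__main__":
    sample = (
        "== :: ==\nNature: Permanent\nAmt: 21000 INR\n"
        "== :: ==\nNature: Bridal\nAmt: 19200 INR\n== :: ==\n"
    )
    print(filter_records(sample, "Permanent"), end="")
```

Like the Perl version, it prints the separator once before the first matching record and lets each record carry its own trailing separator.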
# 9  
Old 12-08-2015
Not very efficient, but here is another approach. It needs an awk that accepts a multi-character RS, such as gawk or mawk.
Code:
awk 'BEGIN { printf "== :: =="; RS = "=="; FS = OFS = "\n" } $4 ~ /Permanent/ { printf "%s", $1 OFS $2 OFS $3 OFS $4 OFS $5 OFS $6 OFS $7 OFS "== :: ==" } END { print "" }' file

# 10  
Old 12-20-2015
I tried the code suggested by Aia and it's working as expected.
However, I would like to run this Perl script in a loop, as I have around 830 different patterns like "Permanent".

I tried the following:

Code:
cat filter.pl
#!/usr/bin/perl
use strict;
use warnings;

my $sep = $/ = "== :: ==\n";
my $pattern = `$1`;
my $hat = 0;

while(<>) {
if(/$pattern/){
print $sep if $hat == 0 and ++$hat;
print;
 }
}

and made the following loop:

Code:
for service in `cat input_patterns`
do
echo ${service}
perl filter.pl ${service} file>>pattern.out
done

However, it's not working.
I am sure I am doing something wrong.
Kindly suggest.
Moderator's Comments:
Mod Comment Please use CODE tags (not ICODE tags) for full-line and multi-line sample input, output, and code segments.

Last edited by Don Cragun; 12-20-2015 at 03:15 PM.. Reason: Change ICODE tags to CODE tags.
# 11  
Old 12-20-2015
Quote:
Originally Posted by anushree.a
...

and made the following loop:

Code:
for service in `cat input_patterns`
do
echo ${service}
perl filter.pl ${service} file>>pattern.out
done

However, it's not working.
I am sure I am doing something wrong.
Kindly suggest.
...
The Perl program processes the data file to search for one pattern. Inside the loop, it runs as many times as there are patterns.
So, if your "input_patterns" file has 830 lines, then you process the data file 830 times, searching for one pattern each time!

To give you an analogy, let's say you want to go grocery shopping.
Do you do the following?
(1) Go to grocery store, buy eggs, come back.
(2) Then go to the same grocery store, buy milk, come back.
(3) Then go to the same grocery store, buy meat, come back.
(4) Then go to the same grocery store, buy drinks, come back.
...

I'm sure you see how inefficient this is, yet you're doing something similar in your code.

While this kind of code might work at a small scale (small data file, small pattern file), the inefficiency of repeated scanning adds up at a large scale.
Imagine searching for 10,000 patterns in a million-line data file.
Do you want to scan a million-line file 10,000 times, looking for one pattern each time?
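A minimal Python sketch of the "one trip to the store" alternative: combine all patterns into a single alternation and scan the data once. The function name records_matching_any is hypothetical, and the sketch assumes the patterns are literal strings (hence re.escape) and that the data fits in memory.

```python
import re

SEP = "== :: ==\n"  # record separator used in the sample data


def records_matching_any(text, patterns, sep=SEP):
    """Scan `text` once and return the records containing any of `patterns`.

    Hypothetical helper; patterns are treated as literal strings.
    """
    if not patterns:
        return []  # an empty alternation would match every record
    # One compiled regex for all patterns: "Permanent|Geometric|..."
    combined = re.compile("|".join(re.escape(p) for p in patterns))
    return [rec for rec in text.split(sep) if rec and combined.search(rec)]


if __name__ == "__main__":
    sample = (
        "== :: ==\nNature: Permanent\n== :: ==\nNature: Bridal\n"
        "== :: ==\nNature: Geometric\n== :: ==\n"
    )
    for rec in records_matching_any(sample, ["Permanent", "Geometric"]):
        print(SEP + rec, end="")
```

Each record is tested once against the combined pattern, so the data is read a single time no matter how many patterns there are.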
# 12  
Old 12-20-2015
Quote:
Originally Posted by anushree.a
I tried the code suggested by Aia and it's working as expected.
However, I would like to run this Perl script in a loop, as I have around 830 different patterns like "Permanent".


Code:
cat filter.pl
#!/usr/bin/perl
use strict;
use warnings;

my $sep = $/ = "== :: ==\n";
my $pattern = `$1`;
my $hat = 0;

while(<>) {
if(/$pattern/){
print $sep if $hat == 0 and ++$hat;
print;
 }
}

and made the following loop:

Code:
for service in `cat input_patterns`
do
echo ${service}
perl filter.pl ${service} file>>pattern.out
done

However, it's not working.
I am sure I am doing something wrong.
Kindly suggest.
This is not tested; however, if you want to use the Perl script that way, you need a different modification than the $pattern line in your script. The first command-line argument comes from shift (which takes it off @ARGV), not from $1.

Code:
#!/usr/bin/perl
use strict;
use warnings;

my $pattern = shift;
my $sep = $/ = "== :: ==\n";
my $hat = 0;


while(<>) {
    if(/$pattern/){
        print $sep if $hat == 0 and ++$hat;
        print;
    }
}

And then you can use the following shell script

Code:
#!/bin/bash

while IFS= read -r service; do
    perl filter.pl "$service" file.txt >> pattern.out
done < input_patterns

Of course, that might be slow: perl is started once per pattern, the data file is re-read each time, and pattern.out is opened, appended to and closed on every iteration. Your mileage may vary there.

Here's a Perl script that might work on its own.
Again, it is not tested, but you can try it with just a portion of the 800-plus patterns in input_patterns. It assumes one pattern per line.

Code:
#!/usr/bin/perl
use strict;
use warnings;

my $pattern;
my $pattern_source = shift;
{
    local $/ = undef;
    open my $fh, '<', $pattern_source or die "$!\n";
    $pattern = <$fh>;
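    $pattern =~ s/\R+\z//;    # drop trailing newline(s) so they are not part of the regex
    $pattern =~ s/\R+/|/g;    # join the remaining lines into one alternation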
    close $fh;
}
my $sep = $/ = "== :: ==\n";
my $hat = 0;
while(<>) {
    if(/$pattern/){
        print $sep if $hat == 0 and ++$hat;
        print;
    }
}

Use as perl filter.pl input_patterns april_2015.txt > pattern.out
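One difference worth knowing: the shell loop appends its matches grouped by pattern, while the combined-alternation script prints matching records in the order they appear in the data file. If the grouped layout matters, a sketch like the following keeps the single read of the data but collects records per pattern. The name group_by_pattern is hypothetical, patterns are matched as plain substrings, and everything is held in memory.

```python
SEP = "== :: ==\n"  # record separator used in the sample data


def group_by_pattern(text, patterns, sep=SEP):
    """Read `text` once and return {pattern: [matching records]}.

    Hypothetical helper; patterns are matched as plain substrings.
    """
    groups = {p: [] for p in patterns}
    for rec in text.split(sep):
        if not rec:
            continue  # skip the empty pieces around leading/trailing separators
        for p in patterns:
            if p in rec:
                groups[p].append(rec)
    return groups


if __name__ == "__main__":
    sample = (
        "== :: ==\nNature: Permanent\n== :: ==\nNature: Geometric\n"
        "== :: ==\nNature: Permanent\nAmt: 9500 INR\n== :: ==\n"
    )
    for pattern, records in group_by_pattern(sample, ["Geometric", "Permanent"]).items():
        print(pattern)
        for rec in records:
            print(SEP + rec, end="")
```

The data is still read only once; each record is checked against every pattern in memory, which is usually far cheaper than re-reading the file once per pattern.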
# 13  
Old 12-20-2015
Here's a Python script as well:

Code:
$ 
$ cat patterns.txt
Permanent
Geometric
$ 
$ cat april_2015.txt 
== :: ==
Gender: Female
Service: Tattoo
Nature: Permanent
Amt: 21000 INR
Date: 04/04/2015
Artist: Anushka
== :: ==
Gender: Female
Service: Makeup
Nature: Bridal
Amt: 19200 INR
Date: 05/04/2015
Artist: Jenn
== :: ==
Gender: Male
Service: Tattoo
Nature: Permanent
Amt: 9500 INR
Date: 05/04/2015
Artist: Anushka
== :: ==
Gender: Male
Service: Tattoo
Nature: Geometric
Amt: 9500 USD
Date: 05/04/2015
Artist: Kat Von D
== :: ==
$ 
$ cat process_files.py
#!/usr/bin/env python
from sys import argv

# Accept file names as input parameters
pattern_file = argv[1]
data_file = argv[2]

# Load patterns from pattern_file, one per line, skipping blank lines
patterns = set()
with open(pattern_file, 'rt') as f:
    for line in f:
        line = line.rstrip('\n')
        if line:
            patterns.add(line)

# Read data_file; print data chunk if pattern was found
chunk = []
print_the_rest = False
with open(data_file, 'rt') as f:
    for line in f:
        line = line.rstrip('\n')
        if line == "== :: ==":
            # Separator: start buffering a new chunk
            chunk = [line]
            print_the_rest = False
        else:
            chunk.append(line)
            # partition() copes with lines that have no ': ' or more than one
            param, _, value = line.partition(': ')
            if value in patterns:
                # Match: flush the buffered lines, then print the rest of the chunk
                for item in chunk:
                    print(item)
                chunk = []
                print_the_rest = True
            elif print_the_rest:
                print(line)

$ 
$ python process_files.py patterns.txt april_2015.txt
== :: ==
Gender: Female
Service: Tattoo
Nature: Permanent
Amt: 21000 INR
Date: 04/04/2015
Artist: Anushka
== :: ==
Gender: Male
Service: Tattoo
Nature: Permanent
Amt: 9500 INR
Date: 05/04/2015
Artist: Anushka
== :: ==
Gender: Male
Service: Tattoo
Nature: Geometric
Amt: 9500 USD
Date: 05/04/2015
Artist: Kat Von D
$ 
$


Last edited by durden_tyler; 12-20-2015 at 07:22 PM..
# 14  
Old 12-21-2015
Dear Aia,

Thank you for your support.
The script is now written and has been sent to the testing team.

Also, I am grateful to all the other friends who helped me here...
Thanks for your time and effort.

Bye.
Anu.