Bash script to extract paragraph with globs in it


 
# 1  
Old 02-27-2017
Bash script to extract paragraph with globs in it

Hi,

It's been a long time since I last wrote a Bash script, so I am really struggling here and need the gurus to help me out.

Code:
uname -a
Linux lxserv01 2.6.18-417.el5

I have a text file containing blocks of text, all laid out in a similar manner:

Code:
******* BEGIN MESSAGE *******

       Station / User:  129   800013   Batch Processing
 SDate / Time / PDate:  26.02.2017 17:07:05   26.02.2017
       Current System:  XXXXXX Production System       
   Institution Number:  00000043
Application / Version:  abc-inw   30.66.36   Release A (OMNI)
        Function Name:  FindOriginalPresentment

Warning !

Original presentment Not Found !

Institution No (Original Tran): [00000043]
Charge back slip: [70527509216]
Acquirer Reference: [85470355344549150697093]
Presentment Slip: [N/A]
Transaction Class: [002 - Clearing transactions]
Transaction Category: [001 - Presentments]
File Institution No: [00000043]
File No: [00041926]

******* END MESSAGE *******

******* BEGIN MESSAGE *******

       Station / User:  129   800013   Batch Processing
 SDate / Time / PDate:  26.02.2017 17:06:59   26.02.2017
       Current System:  XXXXXX Production System       
   Institution Number:  00000043
Application / Version:  abc-inw   30.66.36   Release A (OMNI)

Information message !

Exception Processing - Sundry Types!

Date: [20170226]
Time: [17:06:59]

003','040

******* END MESSAGE *******

******* BEGIN MESSAGE *******

       Station / User:  129   800013   Batch Processing
 SDate / Time / PDate:  26.02.2017 17:07:05   26.02.2017
       Current System:  XXXXXX Production System       
   Institution Number:  00000043
Application / Version:  abc-inw   30.66.36   Release A (OMNI)
        Function Name:  FindOriginalPresentment

Warning !

Original presentment Not Found !

Institution No (Original Tran): [00000043]
Charge back slip: [70527509216]
Acquirer Reference: [85470355344549150697093]
Presentment Slip: [N/A]
Transaction Class: [002 - Clearing transactions]
Transaction Category: [001 - Presentments]
File Institution No: [00000043]
File No: [00041926]

******* END MESSAGE *******

******* BEGIN MESSAGE *******

       Station / User:  129   800013   Batch Processing
 SDate / Time / PDate:  26.02.2017 17:06:59   26.02.2017
       Current System:  XXXXXX Production System        
   Institution Number:  00000043
Application / Version:  abc-inw   30.66.36   Release A (OMNI)

Information message !

Exception Processing - Sundry Types!

Date: [20170226]
Time: [17:06:59]

003','040

******* END MESSAGE *******

Each 'BEGIN MESSAGE' line and the subsequent 'END MESSAGE' line delimit a block. If a block contains the text 'Original presentment Not Found !', the script should print the entire block, from BEGIN to END. I started with a simple command to search for the BEGIN and END lines, but the bash script gives me errors when it hits the glob characters (the asterisks) in the BEGIN/END markers.

Help please, I am lost here.

Thanks a lot.
# 2  
Old 02-27-2017
Please post your attempt so people in here can analyse it and possibly propose corrections and / or enhancements.


EDIT: If you don't insist on a bash solution, try
Code:
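awk '
$0 ~ ST                         {TMP = GLPR = ""}       # block start: reset buffer and flag
$0 ~ GL                         {GLPR = 1}              # search text seen in this block
$0 ~ ST, $0 ~ EN                {TMP = TMP $0 ORS}      # collect every line from start to end marker
($0 ~ EN) && GLPR               {print TMP}             # block end: print it if the text was seen
' ST="BEGIN MESSAGE" EN="END MESSAGE" GL="Original presentment Not Found !"  file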


Last edited by RudiC; 02-27-2017 at 08:27 AM..
# 3  
Old 02-27-2017
Hi,

I think I have a script that will do the trick for you:

Code:
#!/bin/bash
  
input=example.txt
tmp=`/bin/tempfile`

while read -r line
do
        case "$line" in
                "******* BEGIN MESSAGE *******")
                        echo "$line" > "$tmp"
                        ;;
                "******* END MESSAGE *******")
                        echo "$line" >> "$tmp"

                        if /bin/grep ^Original\ presentment\ Not\ Found\ \!$ "$tmp" >/dev/null 2>/dev/null
                        then
                                /bin/cat "$tmp"
                                echo
                        fi
                        ;;
                *)
                        echo "$line" >> "$tmp"
                        ;;
        esac

done < "$input"

Here is the output from a test run. In this case, example.txt was populated with the exact example text you provided in your post.

Code:
$ ./script.sh 
******* BEGIN MESSAGE *******

Station / User:  129   800013   Batch Processing
SDate / Time / PDate:  26.02.2017 17:07:05   26.02.2017
Current System:  XXXXXX Production System
Institution Number:  00000043
Application / Version:  abc-inw   30.66.36   Release A (OMNI)
Function Name:  FindOriginalPresentment

Warning !

Original presentment Not Found !

Institution No (Original Tran): [00000043]
Charge back slip: [70527509216]
Acquirer Reference: [85470355344549150697093]
Presentment Slip: [N/A]
Transaction Class: [002 - Clearing transactions]
Transaction Category: [001 - Presentments]
File Institution No: [00000043]
File No: [00041926]

******* END MESSAGE *******

******* BEGIN MESSAGE *******

Station / User:  129   800013   Batch Processing
SDate / Time / PDate:  26.02.2017 17:07:05   26.02.2017
Current System:  XXXXXX Production System
Institution Number:  00000043
Application / Version:  abc-inw   30.66.36   Release A (OMNI)
Function Name:  FindOriginalPresentment

Warning !

Original presentment Not Found !

Institution No (Original Tran): [00000043]
Charge back slip: [70527509216]
Acquirer Reference: [85470355344549150697093]
Presentment Slip: [N/A]
Transaction Class: [002 - Clearing transactions]
Transaction Category: [001 - Presentments]
File Institution No: [00000043]
File No: [00041926]

******* END MESSAGE *******

$

This seems to print only two blocks, and they both contain the search string. The blocks that do not contain it are not written to standard output, which if I understand what you've written correctly is exactly what you're after.
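One side effect is visible in the output above: the indentation of lines such as "Station / User:" has gone. That happens because read strips leading and trailing whitespace while IFS has its default value. If the original spacing matters, clearing IFS for the read keeps each line as it was. A minimal sketch follows; the two function names are made up purely for illustration.

```shell
#!/bin/bash
# Hypothetical helpers, only to show the difference.

# Default IFS: leading and trailing blanks are stripped from each line.
copy_stripped() {
    while read -r line; do
        printf '%s\n' "$line"
    done
}

# Empty IFS for the read: each line is kept exactly as it was.
copy_verbatim() {
    while IFS= read -r line; do
        printf '%s\n' "$line"
    done
}

printf '       Station / User:  129\n' | copy_stripped
printf '       Station / User:  129\n' | copy_verbatim
```

Bear in mind that with verbatim lines, an exact comparison against a marker or search string only succeeds if the line carries no extra spaces around it.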

Hope this helps.
# 4  
Old 02-27-2017
@RudiC, your code runs perfectly. The reason I am insisting on a bash script is that I have worked with bash for some time and know the basics, and later on I can automate this with a script that others will use as well. My own code is just a simple print statement, so it's not even worth reading. That's why I didn't post it at the start.

@drysdalk, your script is giving me an error: "No such file or directory". I had created example.txt beforehand. Just to let you know, I have rights only under my own directory, so I have altered the script accordingly. Other than that, you have matched the BEGIN and END lines using fixed-length strings, which works fine in this case, but what should I do if the BEGIN and END lines are shorter or longer? That was the reason I was looking into using a regex.

Code:
#!/bin/bash

input=/home/dsiddiqui/basic_bash/example.txt
tmp=`/home/dsiddiqui/basic_bash/tempfile`

while read -r line
do
        case "$line" in
                "******* BEGIN MESSAGE *******")
                        echo "$line" > "$tmp"
                        ;;
                "******* END MESSAGE *******")
                        echo "$line" >> "$tmp"

                        if /bin/grep ^Original\ presentment\ Not\ Found\ \!$ "$tmp" >/dev/null 2>/dev/null
                        then
                                /bin/cat "$tmp"
                                echo
                        fi
                        ;;
                *)
                        echo "$line" >> "$tmp"
                        ;;
        esac

done < $input

# 5  
Old 02-27-2017
Hi,

I suspect the error is coming from this change you made:

tmp=`/home/dsiddiqui/basic_bash/tempfile`

Keep this line the way it was originally:

tmp=`/bin/tempfile`

The quotes I'm using here are backticks, and have the effect of running an external command, namely /bin/tempfile. This line does not simply set the filename directly by assigning text into a variable.

The purpose of the tempfile program is to generate a temporary file under /tmp that is guaranteed not to have existed already, so you don't have to worry about clobbering someone else's output. When run, it creates the file and returns the filename as output, so this basically results in the variable 'tmp' being set to your newly-created temporary file.
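To make the difference between the two kinds of assignment concrete, here is a small sketch. The variable names and the path /no/such/command are invented for the example, and echo stands in for tempfile so that it runs anywhere.

```shell
#!/bin/bash

# Plain assignment: the variable simply holds this text. Nothing is run
# and no file is created.
name=/tmp/some-name

# Command substitution: the command is run and its output is assigned.
# Backticks and $(...) do the same job here.
greeting=`echo hello`
greeting2=$(echo hello)

# If the command cannot be run, the variable ends up empty, and any later
# redirection to "$missing" would then fail.
missing=$(/no/such/command 2>/dev/null)

echo "name=$name greeting=$greeting greeting2=$greeting2 missing=[$missing]"
```

So a line such as tmp=`/some/path/tempfile` only works if an executable of that name really exists at that path.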

So if you try again with the original line and let us know how it goes, we can take things from there.

As for the variable-length markers: I would have expected the beginning and end lines of every block to be identical, as a clear way of demarcating one data section from another. If that is not the case, and you can give us some idea of how the block markers are expected to vary, I'll see what I can suggest.
# 6  
Old 02-27-2017
Hi drysdalk,

I get the errors below after putting the original line back:

Code:
[dsiddiqui@lxserv01 scripts]$ ./para.sh
./para.sh: line 4: /bin/tempfile: No such file or directory
./para.sh: line 10: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 13: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 10: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 13: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 10: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 13: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 10: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 22: : No such file or directory
./para.sh: line 13: : No such file or directory
[dsiddiqui@lxserv01 scripts]$

your script
Code:
[dsiddiqui@lxserv01 scripts]$ more para.sh
#!/bin/bash

input=/home/dsiddiqui/basic_bash/example.txt
tmp=`/bin/tempfile`

while read -r line
do
        case "$line" in
                "******* BEGIN MESSAGE *******")
                        echo "$line" > "$tmp"
                        ;;
                "******* END MESSAGE *******")
                        echo "$line" >> "$tmp"

                        if /bin/grep ^Original\ presentment\ Not\ Found\ \!$ "$tmp" >/dev/null 2>/dev/null
                        then
                                /bin/cat "$tmp"
                                echo
                        fi
                        ;;
                *)
                        echo "$line" >> "$tmp"
                        ;;
        esac

done < $input
[dsiddiqui@lxserv01 scripts]$

Regarding your question about the block markers: I found out that they will remain fixed. What I had in mind was the case where, instead of 7 stars at the start of the line, there could be 8, or for that matter any character any number of times, so that the only consistent parts would be 'BEGIN MESSAGE' and 'END MESSAGE'. We could also add a check: if the next marker line after a BEGIN MESSAGE is an END MESSAGE, then it is a block.

Hope I am making sense

Thanks a lot
# 7  
Old 02-27-2017
Hi,

OK, thanks for the detail. This would seem to imply that your Linux system doesn't have /bin/tempfile installed on it. It may be worth checking to see if it's in /usr/bin/tempfile instead, but failing that change the line to something like this:

tmp=/tmp/script.tmp

or any other filename that you are sure it is safe for you to use.
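If mktemp happens to be installed on your system (I can't tell from here, so treat that as an assumption), it does much the same job as tempfile and saves you from picking a fixed name. A rough sketch, in which the template name para.XXXXXX is only an example:

```shell
#!/bin/bash

# Create a unique temporary file and stop if that fails.
# mktemp replaces the XXXXXX with random characters.
tmp=$(mktemp "${TMPDIR:-/tmp}/para.XXXXXX") || exit 1

# Remove the temporary file again when the script exits.
trap 'rm -f "$tmp"' EXIT

echo "scratch data" > "$tmp"
cat "$tmp"
```

The trap is optional, but without it every run leaves a file behind.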

An alternative version of the script that purely checks for the presence of "BEGIN MESSAGE" or "END MESSAGE" anywhere on a line follows.

Note that this isn't strictly speaking 100% safe or reliable, since if for any reason any other line happened to contain either of these strings as part of its text this would trigger the checks in question, and cause that particular block (at a minimum) to get mangled.

It's always a good idea to absolutely strictly define strings like this if you can, since they are fundamental to how parsing a file can safely be done. But if the number of asterisks is variable then you may have to go with the less-strict version below.

Code:
#!/bin/bash

input=example.txt
tmp=`/bin/tempfile`     # if tempfile is not installed, use e.g. tmp=/tmp/script.tmp

while read -r line
do
        if echo "$line" | /bin/grep "BEGIN MESSAGE" >/dev/null 2>/dev/null
        then
                echo "$line" > "$tmp"
        elif echo "$line" | /bin/grep "END MESSAGE" >/dev/null 2>/dev/null
        then
                echo "$line" >> "$tmp"

                if /bin/grep ^Original\ presentment\ Not\ Found\ \!$ "$tmp" >/dev/null 2>/dev/null
                then
                        /bin/cat "$tmp"
                        echo
                fi
        else
                echo "$line" >> "$tmp"
        fi
done < "$input"
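A possible middle ground between the exact-string version and the match-anywhere version is an anchored regular expression: it accepts any number of asterisks but still requires the whole line to look like a marker. This is only a sketch. It assumes the decoration is always asterisks separated from the text by a single space, and the function names are invented. The patterns are kept in variables because the quoting rules for =~ differ between bash versions.

```shell
#!/bin/bash

# One or more asterisks, a space, the marker text, a space, one or more asterisks.
begin_re='^[*]+ BEGIN MESSAGE [*]+$'
end_re='^[*]+ END MESSAGE [*]+$'

# Succeed only if the argument is a whole-line marker.
is_begin() { [[ $1 =~ $begin_re ]]; }
is_end()   { [[ $1 =~ $end_re ]]; }

is_begin '******* BEGIN MESSAGE *******'    && echo "7 stars: begin marker"
is_begin '******** BEGIN MESSAGE ********'  && echo "8 stars: begin marker"
is_begin 'Note: BEGIN MESSAGE seen in text' || echo "plain text: not a marker"
```

In the loop above, the two grep tests could then be replaced by is_begin "$line" and is_end "$line". To allow characters other than asterisks, widen the bracket expressions accordingly.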

---------- Post updated at 02:30 PM ---------- Previous update was at 02:20 PM ----------

Hi,

One last version, this time avoiding a grep call for every input line (which may make things run a little faster if you have a great deal of data to get through). grep is still run once per block, to look for the search string:

Code:
#!/bin/bash

input=example.txt
tmp=`/bin/tempfile`     # if tempfile is not installed, use e.g. tmp=/tmp/script.tmp

while read -r line
do
        case "$line" in
                *BEGIN\ MESSAGE*)
                        echo "$line" > "$tmp"
                        ;;
                *END\ MESSAGE*)
                        echo "$line" >> "$tmp"

                        if /bin/grep ^Original\ presentment\ Not\ Found\ \!$ "$tmp" >/dev/null 2>/dev/null
                        then
                                /bin/cat "$tmp"
                                echo
                        fi
                        ;;
                *)
                        echo "$line" >> "$tmp"
                        ;;
        esac
done < "$input"

Again, the same caveats apply as outlined previously: if a line for any reason contains "BEGIN MESSAGE" or "END MESSAGE" as part of its own text and isn't itself a block marker, this would cause problems. So this is only safe if you're 100% sure that will never happen in your input.
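If you would rather not depend on a temporary file or on grep at all, the block can be collected in a shell variable instead. The following is a sketch along those lines rather than a tested drop-in replacement: the function name is invented, the search text must equal a whole line exactly, and the same caveat about marker text appearing inside a block still applies.

```shell
#!/bin/bash

# Print every BEGIN..END block read from standard input that contains
# a line equal to the search text given as the first argument.
print_matching_blocks() {
    local wanted=$1 line block="" inside=0 found=0

    while IFS= read -r line; do
        case "$line" in
            *"BEGIN MESSAGE"*)
                # Start a fresh block.
                block=$line
                inside=1
                found=0
                ;;
            *"END MESSAGE"*)
                # Close the block and print it if the text was seen.
                if [ "$inside" = 1 ] && [ "$found" = 1 ]; then
                    printf '%s\n%s\n\n' "$block" "$line"
                fi
                inside=0
                ;;
            *)
                # Collect lines only while inside a block.
                if [ "$inside" = 1 ]; then
                    block=$block$'\n'$line
                    [ "$line" = "$wanted" ] && found=1
                fi
                ;;
        esac
    done
}

# Small demonstration on inline data.
print_matching_blocks 'Original presentment Not Found !' <<'EOF'
******* BEGIN MESSAGE *******
Warning !
Original presentment Not Found !
******* END MESSAGE *******
******* BEGIN MESSAGE *******
Information message !
******* END MESSAGE *******
EOF
```

For a real file it would be called as print_matching_blocks 'Original presentment Not Found !' < example.txt.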