Extract paragraphs and count them


 
# 15  
03-13-2017
Hi,

Basically, print is a flag variable that we clear at the start of every block. We set it if, and only if, we encounter a line inside the block that makes the block worth printing (that is, an Error or Warning line). When we reach the end of the current block, we check whether print is set. If it is, we print out what we need. If it isn't, the block contains no errors or warnings, so there is nothing to print. Then we move on, unset the variable at the start of the next block, and so on.
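A minimal sketch of the flag pattern described above, reduced to its core. Unlike the script in this thread, it buffers each BEGIN MESSAGE ... END MESSAGE block in a variable instead of a temporary file, and prints the whole block only if a Warning or Error line set the flag. The function name filter_blocks and the sample data are hypothetical.

```bash
#!/bin/bash
# Sketch of the "print flag" pattern. filter_blocks and the sample
# data are hypothetical; the block is buffered in a variable rather
# than in a temporary file.
filter_blocks() {
    local line block="" print=""
    while IFS= read -r line; do
        case "$line" in
            *BEGIN\ MESSAGE*)
                print=""                    # clear the flag at the start of every block
                block="$line"$'\n'
                ;;
            *END\ MESSAGE*)
                block+="$line"$'\n'
                if [[ -n "$print" ]]; then  # only flagged blocks are printed
                    printf '%s' "$block"
                fi
                ;;
            Warning*|Error*)
                print=1                     # this block contains a Warning or Error line
                block+="$line"$'\n'
                ;;
            *)
                block+="$line"$'\n'
                ;;
        esac
    done
}

sample=$'BEGIN MESSAGE\nall fine\nEND MESSAGE\nBEGIN MESSAGE\nWarning !\nsomething failed\nEND MESSAGE'
filter_blocks <<< "$sample"
```

Only the second block is printed, because only it contains a line starting with Warning.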
# 16  
03-13-2017
Quote:
Originally Posted by drysdalk
Hi,

This solution is a bit less efficient since it now relies on external binaries rather than shell built-ins, but for every block that has a Warning or Error, this will print out the Institution ID and the text of the error or warning.

Code:
#!/bin/bash

IFS=''

input=EXTRN071_copy.txt
tmp=/tmp/script.tmp

echo institution,errormessage
while read -r line
do
        case "$line" in
                *BEGIN\ MESSAGE*)
                        unset print
                        echo "$line" > "$tmp"
                        ;;
                *END\ MESSAGE*)
                        echo "$line" >> "$tmp"

                        if [ "$print" == "1" ]
                        then
                                institution=$(/usr/bin/awk '$0 ~ /   Institution/ {sub(/\r$/,""); print $NF}' "$tmp")
                                errormessage=$(/bin/grep -E -A2 "^Warning|^Error" "$tmp" | /usr/bin/tail -1)
                                echo "$institution,$errormessage"
                        fi
                        ;;
                Warning*|Error*)
                        print=1
                        echo "$line" >> "$tmp"
                        ;;
                *)
                        echo "$line" >> "$tmp"
                        ;;
        esac
done < "$input"

Sample output:
Code:
$ ./script.sh 
institution,errormessage
00000029,Original presentment Not Found !
00000029,Non-financial original Slip Not Found !
00000029,Processing Failed For Transaction!
00000046,Transaction type of chargeback is not the same as that of original presentment.
00000046,Transaction type of chargeback is not the same as that of original presentment.
00000041,Original presentment Not Found !
00000041,Non-financial original Slip Not Found !
00000041,Processing Failed For Transaction!
00000041,Original presentment Not Found !
00000041,Non-financial original Slip Not Found !
00000041,Processing Failed For Transaction!
00000050,Original presentment Not Found !
00000050,Non-financial original Slip Not Found !
00000050,Processing Failed For Transaction!
00000050,Original presentment Not Found !
00000050,Non-financial original Slip Not Found !
00000050,Processing Failed For Transaction!
00000007,Original Transaction Not Found !
00000007,Processing Failed For Transaction!
00000007,No transactions processed!
00000007,PROCESSING ERROR! - check log for error messages.
$

Hope this helps in the meantime.

EDIT: If you want the output sorted, change the last line to:

done < "$input" | /usr/bin/sort
Your script is amazing. Thanks for that. It gives me pretty much the output I was looking for. Thanks a lot again.

However, I want to know: why are you using the print keyword, unsetting it, and then setting it again?
# 17  
03-13-2017
Hi,

I could have called it anything, yes; I just happened to call it print. (Bash has no print built-in anyway, although ksh and zsh do.) Because it's only ever used as a variable, set with print=1, cleared with unset print, and read back as $print, it can't interfere with any built-in or command of the same name that may or may not exist.
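To illustrate the point, here is a small sketch showing that bash keeps variables and commands in separate namespaces. The print function is hypothetical, defined only so there is a command of the same name to collide with. Note that the $ appears only when the variable is read, not when it is assigned or unset.

```bash
#!/bin/bash
# Hypothetical function named print, defined only for this demo.
print() { echo "the print function ran"; }

print=1                      # assignment: no $ here
echo "flag value: $print"    # expansion: $print reads the variable
print                        # a bare command word still runs the function
unset print                  # with no option, unset removes the variable; it only
                             # falls back to a function if no such variable exists
```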
# 18  
03-13-2017
Quote:
Originally Posted by drysdalk
Hi,

Basically, print is a flag variable that we clear at the start of every block. We set it if, and only if, we encounter a line inside the block that makes the block worth printing (that is, an Error or Warning line). When we reach the end of the current block, we check whether print is set. If it is, we print out what we need. If it isn't, the block contains no errors or warnings, so there is nothing to print. Then we move on, unset the variable at the start of the next block, and so on.
But what was the advantage of using print? If I understood correctly, we could have used any variable name for that matter, and using print is just putting some sort of a check in place?

And I guess I was wrong in calling print a keyword in bash.

---------- Post updated at 05:10 PM ---------- Previous update was at 05:02 PM ----------

@drysdalk I get what you are doing with the grep and tail commands, but could you please explain what you are doing with the awk command and the characters you are using in that statement?

# 19  
03-13-2017
Hi,

Sure, no problem. There are a few different parts to the line /usr/bin/awk '$0 ~ /   Institution/ {sub(/\r$/,""); print $NF}' "$tmp", so we'll take them in turn.

$0 ~ /   Institution/
This is pattern matching. We only want to process the current input line (represented by $0) any further if it matches the regular expression between the slashes (that's what ~ means in this context). Since /   Institution/ contains no special regex characters, that simply means the line must contain the exact string "   Institution" (that's the word 'Institution' with three spaces in front of it). If that check passes, we move on to the next part of the line.
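A quick way to see the pattern test in isolation, using hypothetical sample lines: only the line with three spaces before Institution reaches the { ... } action. The helper names sample_lines and institution_ids are made up for this sketch, and the carriage-return handling is left out here.

```bash
#!/bin/bash
# Hypothetical sample lines; only the first has three spaces before "Institution".
sample_lines() {
    printf '%s\n' \
        'Source   Institution 00000029' \
        'Source  Institution 00000099' \
        'Institution 00000001'
}

# Same pattern as the thread's awk command, minus the CR handling.
institution_ids() { awk '$0 ~ /   Institution/ { print $NF }'; }

sample_lines | institution_ids
```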

sub(/\r$/,"");
Now this was something I didn't expect to have to do, and it caught me out. As it turns out, the example file you've provided has Windows-style line endings (a carriage return followed by a line feed) rather than UNIX-style ones (a line feed only). Since the Institution ID is the last field on its line, it also picked up the trailing carriage return, which messed up the output.

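So what this awk substitution command does is look for a carriage-return character at the very end of the line ($ anchors the match to the end) and replace it with nothing. awk has already removed the line feed when it read the line, so what's left is a "normal" line from the perspective of a UNIX-style system. Because we modified $0, awk also re-splits the line into fields, so the last field no longer carries the stray carriage return.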
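A small sketch of the difference the substitution makes, run on one hypothetical CRLF line. It assumes an awk such as gawk or mawk, whose default field splitting treats spaces, tabs and newlines (but not \r) as separators; %q is used so the otherwise invisible carriage return shows up in the output.

```bash
#!/bin/bash
# Hypothetical CRLF line: printf writes a carriage return before the line feed.
crlf_line() { printf '   Institution 00000029\r\n'; }

# Without sub(), the last field still ends in a carriage return.
without=$(crlf_line | awk '{ print $NF }')
# With sub(), the CR is removed from $0 and awk re-splits the fields.
with=$(crlf_line | awk '{ sub(/\r$/, ""); print $NF }')

printf 'without: %q\n' "$without"
printf 'with:    %q\n' "$with"
```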

Now that the line has been stripped of the trailing carriage return that would interfere with our later output (and we already know it contains the exact string we're looking for), we move on to the last part of the awk line.

print $NF
This is the easiest one of the bunch. NF is the number of fields on the current line, so $NF is the last field, which in our case is the Institution Number.
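For completeness, a tiny sketch of NF versus $NF on a hypothetical input line (last_field is a made-up helper name): by default awk splits on runs of blanks, so this line has four fields and $NF is the fourth.

```bash
#!/bin/bash
# NF holds the number of fields on the current line, so $NF is the last one.
# The input line is hypothetical.
last_field() { awk '{ print NF, $NF }'; }

echo 'Institution Number : 00000046' | last_field
```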

So the full explanation of this awk line in English would be:
  • Look for lines that contain the exact string "   Institution"...
  • then strip the trailing carriage return (the Windows-specific part of the line ending) from them...
  • and finally print out the last field of the remaining line.

Hope this helps.

Last edited by drysdalk; 03-14-2017 at 05:51 AM.
# 20  
03-14-2017
Yes. I almost forgot to tell you that the log file has carriage-return characters at the end of each line. What I normally do at my end is run dos2unix and move on from there.
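If dos2unix isn't available, the same trailing-CR removal that the script's awk command performs can be applied to the whole file up front. A minimal sketch for plain CRLF text files; it is only a rough stand-in for dos2unix, and the to_unix helper and the temporary files are hypothetical.

```bash
#!/bin/bash
# Strip one trailing carriage return from every line of a file.
# to_unix is a hypothetical helper, not a standard command.
to_unix() { awk '{ sub(/\r$/, ""); print }' "$1"; }

workdir=$(mktemp -d)
printf 'line one\r\nline two\r\n' > "$workdir/crlf.txt"
to_unix "$workdir/crlf.txt" > "$workdir/unix.txt"
```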

One last question on this thread: say I want to search for a different pattern every time I run the script. (You provided me with a script for searching for a pattern in the other thread.) I have no problem entering a string with spaces in it, storing it in a variable and then grepping for that variable in the log file. However, the problem comes when there is an exclamation mark at the end of the string, which is normally the case in the log file, if you remember. The grep does not work when I include the exclamation mark in the stored variable. How do I overcome that problem?

Thanks a lot for all your help.
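The thread doesn't pin down why the exclamation mark broke the search. One thing worth ruling out, assuming the string is typed at an interactive bash prompt, is history expansion: there, ! can be expanded even inside double quotes, whereas single quotes suppress it (set +H turns it off entirely, and it is off by default in scripts). The ! character is not special to grep, so once the string reaches grep intact, grep -F simply matches it as fixed text. The search string and the log_lines helper below are hypothetical.

```bash
#!/bin/bash
# Single quotes: no history expansion of "!" when typed interactively.
search='Processing Failed For Transaction!'

# Hypothetical log lines for illustration.
log_lines() {
    printf '%s\n' 'Warning !' 'Processing Failed For Transaction!' 'Posting Phase Started'
}

# -F treats the search string as fixed text; -- guards against a leading "-".
log_lines | grep -F -- "$search"
```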
# 21  
03-14-2017
Hi,

I think I understand what you mean. Is this what you're after?

Code:
#!/bin/bash

IFS=''

input=EXTRN071_copy_UNIX.txt
tmp=/tmp/script.tmp

if [ "$1" == "" ]
then
        echo You must provide a search string/regex
        exit 1
else
        search="$1"
fi

echo institution,message
while read -r line
do
        case "$line" in
                *BEGIN\ MESSAGE*)
                        unset print
                        echo "$line" > "$tmp"
                        ;;
                *END\ MESSAGE*)
                        echo "$line" >> "$tmp"

                        if [ "$print" == "1" ]
                        then
                                institution=$(/usr/bin/awk '$0 ~ /   Institution/ {sub(/\r$/,""); print $NF}' "$tmp")
                                message=$(/bin/grep -E -A2 "$search" "$tmp" | /usr/bin/tail -1)
                                echo "$institution,$message"
                        fi
                        ;;
                $search)
                        print=1
                        echo "$line" >> "$tmp"
                        ;;
                *)
                        echo "$line" >> "$tmp"
                        ;;
        esac
done < "$input" | /usr/bin/sort

A sample session follows.

Code:
$ ./script.sh "Warning !"
institution,message
00000007,No transactions processed!
00000007,PROCESSING ERROR! - check log for error messages.
00000007,Processing Failed For Transaction!
00000029,Non-financial original Slip Not Found !
00000029,Original presentment Not Found !
00000029,Processing Failed For Transaction!
00000041,Non-financial original Slip Not Found !
00000041,Non-financial original Slip Not Found !
00000041,Original presentment Not Found !
00000041,Original presentment Not Found !
00000041,Processing Failed For Transaction!
00000041,Processing Failed For Transaction!
00000046,Transaction type of chargeback is not the same as that of original presentment.
00000046,Transaction type of chargeback is not the same as that of original presentment.
00000050,Non-financial original Slip Not Found !
00000050,Non-financial original Slip Not Found !
00000050,Original presentment Not Found !
00000050,Original presentment Not Found !
00000050,Processing Failed For Transaction!
00000050,Processing Failed For Transaction!
$ ./script.sh "Error!"
institution,message
00000007,Original Transaction Not Found !
$ ./script.sh "Information message !"
institution,message
00000004,Exception Processing - Sundry Types!
00000004,Phase 2  - Processing Institution\File Number 
00000004,Phase 3 - Processing Institution\File Number
00000004,Posting Phase Started
00000004,Processing Complete! Check Logs For Messages.
00000007,Exception Processing - Sundry Types!
00000029,Exception Processing - Sundry Types!
00000029,Phase 2  - Processing Institution\File Number 
00000029,Phase 3 - Processing Institution\File Number
00000029,Posting Phase Started
00000029,Processing Complete! Check Logs For Messages.
00000035,Exception Processing - Sundry Types!
00000035,Phase 2  - Processing Institution\File Number 
00000035,Phase 3 - Processing Institution\File Number
00000035,Posting Phase Started
00000035,Processing Complete! Check Logs For Messages.
00000036,Exception Processing - Sundry Types!
00000036,Phase 2  - Processing Institution\File Number 
00000036,Phase 3 - Processing Institution\File Number
00000036,Posting Phase Started
00000036,Processing Complete! Check Logs For Messages.
00000041,Exception Processing - Sundry Types!
00000041,Phase 2  - Processing Institution\File Number 
00000041,Phase 3 - Processing Institution\File Number
00000041,Posting Phase Started
00000041,Processing Complete! Check Logs For Messages.
00000043,Exception Processing - Sundry Types!
00000043,Phase 2  - Processing Institution\File Number 
00000043,Phase 3 - Processing Institution\File Number
00000043,Posting Phase Started
00000043,Processing Complete! Check Logs For Messages.
00000046,Exception Processing - Sundry Types!
00000046,Phase 2  - Processing Institution\File Number 
00000046,Phase 3 - Processing Institution\File Number
00000046,Posting Phase Started
00000046,Processing Complete! Check Logs For Messages.
00000049,Exception Processing - Sundry Types!
00000049,Phase 2  - Processing Institution\File Number 
00000049,Phase 3 - Processing Institution\File Number
00000049,Posting Phase Started
00000049,Processing Complete! Check Logs For Messages.
00000050,Exception Processing - Sundry Types!
00000050,Phase 2  - Processing Institution\File Number 
00000050,Phase 3 - Processing Institution\File Number
00000050,Posting Phase Started
00000050,Processing Complete! Check Logs For Messages.
00000054,Exception Processing - Sundry Types!
00000054,Phase 2  - Processing Institution\File Number 
00000054,Phase 3 - Processing Institution\File Number
00000054,Posting Phase Started
00000054,Processing Complete! Check Logs For Messages.
$
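One detail of the script above worth knowing: in the $search) branch of the case statement, the unquoted variable is used as a glob pattern that has to match the whole line, so "Warning !" only matches a line that is exactly Warning !, and characters such as * or ? in the search string act as wildcards. Quoting it ("$search") makes it literal, and *"$search"* matches it anywhere in the line. A small sketch comparing the three forms; the helper names are hypothetical.

```bash
#!/bin/bash
# Hypothetical helpers: each returns 0 if line $1 matches search $2.
matches_glob()      { case "$1" in $2)     return 0 ;; esac; return 1; }  # $2 is a glob, whole line
matches_literal()   { case "$1" in "$2")   return 0 ;; esac; return 1; }  # $2 is literal, whole line
matches_substring() { case "$1" in *"$2"*) return 0 ;; esac; return 1; }  # $2 is literal, anywhere

# Print y/n for each helper, in the order glob, literal, substring.
check() {
    local m
    for m in matches_glob matches_literal matches_substring; do
        if "$m" "$1" "$2"; then printf 'y '; else printf 'n '; fi
    done
    echo
}

check 'Warning !' 'Warning*'   # only the glob form matches
check 'Warning !' 'Warning'    # only the substring form matches
```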
