Extract paragraphs and count them


 
# 1  
Old 03-13-2017

Hi,

I have a text file made up of a number of paragraphs (blocks). I need to locate certain errors/warnings in it and extract and count them, but I do not know how many blocks contain each particular type of error/warning. My idea was to count the blocks in the complete file, then extract every block with a particular type of warning/error so that those blocks are removed from the original file; repeating this for each type would eventually leave a count of 0 in the original. Whatever I have found on Google about Perl or bash just confuses me, since I know little about actual scripting.

I am on
Linux 2.6.18-417.el5 #1 SMP Sat Nov 19 14:54:59 EST 2016 x86_64 x86_64 x86_64 GNU/Linux

I have also included the complete text file for the gurus.

Thanks a lot again.
# 2  
Old 03-13-2017
Hi,

If these errors or warnings can only ever occur once per section of your input file, then all you need to do is search for all instances of them and count how many you find. That tells you how many sections contain these errors. If they can occur multiple times per section, that of course complicates things.

If you can provide information on what these error/warning messages in your input are expected to look like, and if they will only ever appear once per paragraph, that would be a good way forward for starters. As of just now you haven't actually said exactly what it is in the file that constitutes the warning/error that you're interested in.
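The counting idea above can be sketched with grep. This is only a rough illustration: the sample file is made up, and it assumes the message text sits on a single line and appears at most once per section.

```shell
#!/bin/bash
# Hypothetical sample input; the real file layout may differ.
sample=$(mktemp)
cat > "$sample" <<'EOF'
BEGIN MESSAGE
Error! Original Transaction Not Found !
END MESSAGE
BEGIN MESSAGE
all fine here
END MESSAGE
BEGIN MESSAGE
Error! Original Transaction Not Found !
END MESSAGE
EOF

# grep -c prints the number of matching lines;
# -F treats the pattern as a fixed string rather than a regex.
count=$(grep -c -F 'Original Transaction Not Found' "$sample")
echo "sections with this error: $count"
rm -f "$sample"
```

If a message could appear more than once in a section, this would over-count, which is the complication mentioned above.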
# 3  
Old 03-13-2017
Hi @drysdalk,

Sorry about that, I completely forgot to describe the specific errors/warnings.

1. Each block starts with a BEGIN MESSAGE line and ends with an END MESSAGE line.

2. Each block has just one error or warning.

3. Each error/warning message is preceded by 'Warning !' (with a space before the exclamation mark) or 'Error!'.

E.g. 'Original Transaction Not Found !' -> This particular error message would occur only once in a block.

The challenge I am facing is how do I find out what that error/warning message text is.
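One possible way to discover which message texts are present is to pull out every line that begins with Warning or Error and tally the distinct ones. This is a sketch under assumptions: the sample data is invented ('Duplicate Record' is a made-up message), and it assumes the whole message sits on the same line as the Warning/Error prefix.

```shell
#!/bin/bash
# Hypothetical sample input following the three rules above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
BEGIN MESSAGE
Error! Original Transaction Not Found !
END MESSAGE
BEGIN MESSAGE
Warning ! Duplicate Record !
END MESSAGE
BEGIN MESSAGE
Error! Original Transaction Not Found !
END MESSAGE
EOF

# Keep only lines that start (after optional whitespace) with Warning or Error,
# strip the leading whitespace, then count identical lines, most frequent first.
summary=$(grep -E '^[[:space:]]*(Warning|Error)' "$sample" \
    | sed 's/^[[:space:]]*//' \
    | sort | uniq -c | sort -rn)
echo "$summary"
rm -f "$sample"
```

Each output line is a count followed by the message text, so the list doubles as an inventory of the message types in the file.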
# 4  
Old 03-13-2017
Hi,

OK, thanks. Do you just want to count how many errors/warnings there are for informational purposes, or do you need to find all blocks that contain them and print those blocks out in their entirety?
# 5  
Old 03-13-2017
Hi,

To the point -> I am actually looking for the entire block which gives me that error/warning.

Here is what people in my office normally do on Windows: open Notepad++, search for a warning/error text, and on the first occurrence cut that block out and move it to a new file, thereby reducing the overall count in the original file. For every further occurrence of that error/warning in the original file, we cut the block out and paste it into the same new file. This way each new file contains only the blocks for one particular error/warning, which gives us both the count of each error message and a sorted output.

When that is done, we do another manual step of finding the (tab)Institution Number: and the Acquirer Reference: fields, but that is a totally different requirement.

So you see what I meant. It's just a long and frustrating way of finding information, and it is repetitive and error-prone as well.

Sorry to be throwing all of the information at you at once, but I am just tired of this manual way. And thanks for all your help.
# 6  
Old 03-13-2017
Hi,

This is an adaptation of the script I provided to your question from the other week, which I think will do what you need.

Code:
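#!/bin/bash

input=EXTRN071_copy.txt
tmp=/tmp/script.tmp

# Collect each block in the temp file; print it only if it held a warning/error.
while read -r line
do
        case "$line" in
                *BEGIN\ MESSAGE*)
                        # New block: clear the flag and start the temp file afresh
                        unset print
                        printf '%s\n' "$line" > "$tmp"
                        ;;
                *END\ MESSAGE*)
                        printf '%s\n' "$line" >> "$tmp"

                        # End of block: print it if a warning/error was seen
                        if [ "$print" = "1" ]
                        then
                                /bin/cat "$tmp"
                                echo
                        fi
                        ;;
                Warning*|Error*)
                        # Remember that this block contains a warning/error
                        print=1
                        printf '%s\n' "$line" >> "$tmp"
                        ;;
                *)
                        printf '%s\n' "$line" >> "$tmp"
                        ;;
        esac
done < "$input"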

So the basic idea is:
  • Read in the file, and consider it one line at a time.
  • If the line contains "BEGIN MESSAGE", clear the 'print' variable and overwrite the temp file with the current line.
  • If the line contains "END MESSAGE", add the line to the temp file, and if the 'print' variable is set, print out the whole temp file to stdout.
  • If the line starts with "Error" or "Warning", set the variable 'print' to 1 and add the line to the temp file.
  • For any other line, add the line to the temp file.


Hope this does the trick. If not, let me know and I'll have another crack at it.

EDIT: If you need to preserve the spaces, tabs and other formatting at the start of the lines before the text begins, add a line like this at the top of the script after the shebang line:

IFS='' (that's two single-quotes, and not a double-quote)
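To illustrate what the IFS setting changes, here is a small sketch comparing the two ways of reading a line. The sample line is hypothetical, and the variant shown sets IFS for the read command only rather than for the whole script.

```shell
#!/bin/bash
# One sample line with four leading spaces (hypothetical content).
sample='    Institution Number: 001'

# Default IFS: read strips leading and trailing whitespace.
read -r stripped <<< "$sample"

# Empty IFS for this read only: the line is kept exactly as written.
IFS= read -r kept <<< "$sample"

printf '[%s]\n' "$stripped"   # [Institution Number: 001]
printf '[%s]\n' "$kept"       # [    Institution Number: 001]
```

One thing to watch: once leading whitespace is preserved, a case pattern such as Warning*|Error* only matches lines where the word starts in the first column. If the messages in the input are indented, the pattern would need a leading wildcard, for example *Warning*|*Error*.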

Last edited by drysdalk; 03-13-2017 at 12:09 PM..
# 7  
Old 03-13-2017
Thanks. You're a genius. I remember the other script you gave me and I did try to fiddle around with it, but always messed up trying to find error and warning messages.

The output does print out all the errors and warnings. Now, if I want to find a particular type of warning/error and its related blocks, do I need to redirect the output to a file and then search that? Is there a way to sort complete blocks, the way I would sort lines, run uniq, and then count them with wc -l? I'm sorry if I am asking a lot, but since you give me answers in a jiffy, I thought I'd take the risk.
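One way this grouping could be approached, sketched here with awk inside a bash script rather than taken from the thread: write each block to a file named after its warning/error line, and count blocks per type along the way. The sample data, the output file naming, and the 'Duplicate Record' message are all assumptions for illustration; it also assumes one message line per block, as described earlier.

```shell
#!/bin/bash
# Hypothetical sample input following the format described in the thread.
workdir=$(mktemp -d)
cat > "$workdir/input.txt" <<'EOF'
BEGIN MESSAGE
Error! Original Transaction Not Found !
END MESSAGE
BEGIN MESSAGE
Warning ! Duplicate Record !
END MESSAGE
BEGIN MESSAGE
no problem in this block
END MESSAGE
BEGIN MESSAGE
Error! Original Transaction Not Found !
END MESSAGE
EOF

awk -v outdir="$workdir" '
/BEGIN MESSAGE/ { block = ""; key = "" }        # start a fresh block
{ block = block $0 "\n" }                       # collect every line
/^[ \t]*(Warning|Error)/ {
    key = $0
    gsub(/[^A-Za-z0-9]+/, "_", key)             # make the message file-name safe
    sub(/^_/, "", key); sub(/_$/, "", key)      # trim stray underscores
}
/END MESSAGE/ {
    if (key != "") {                            # block had a warning/error
        file = outdir "/" key ".txt"
        printf "%s\n", block >> file            # append block plus a blank line
        close(file)
        count[key]++
    }
}
END { for (k in count) print count[k], k }
' "$workdir/input.txt" | sort -rn > "$workdir/summary.txt"

# One line per message type: count, then the derived file name stem.
cat "$workdir/summary.txt"
# The per-type files are left in $workdir for inspection.
```

This mirrors the manual Notepad++ routine: each output file holds only the blocks for one message type, and summary.txt gives the count per type. Blocks without a warning/error are simply skipped.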