Capturing multi-line records containing known value? Post: 302404854

Sponsored Content

Top Forums Shell Programming and Scripting Capturing multi-line records containing known value? Post 302404854 by cs03dmj on Wednesday 17th of March 2010 10:39:52 AM

03-17-2010

Registered User

Thanks for the inspiration, vgersh99! Revised code:

Code:

# FindR - Find(e)r - Find( )R(ecords)
# Finds certain records within decoded files given one or more search patterns...
# v3.0 - 20100317 - cs03dmj

# Usage Information:
usage() {
        echo "Usage: $0 decoded_filname search_pattern(s)"
        echo && echo "e.g. to find records containing a value:"
        echo "  $0 myfile monkey"
        echo && echo "e.g. to find records containing several values:"
        echo "  $0 myfile monkey banana"
        echo && echo "e.g. to find records containing specific data, use one or more regular expressions in quotes:"
        echo "  $0 mydecodedfile \"m.*y\" \"b.*a\""
        echo && echo "N.B. If output is redirected to a file, user messages will not be included..." && echo
        # Exit with an errored return code:
        exit 1
}

# If the number of parameters ($#) is less than two (i.e. filename and one search pattern) then display usage information:
[ $# -lt 2 ] && usage

# If the first paramater ($1) is not a valid file, report the problem then display usage information:
[ ! -f $1 ] && echo "File $1 does not exist." && usage

# Use nawk as awk can't handle large files:
nawk '
BEGIN {
        # Records are split by one or more blank lines, so set the Record Separator (RS) appropriately:
        RS = ""
        # ARGV is the built-in array of passed parameters.
        # ARGC is the number of elements in array ARGV
        # ARGV[0] = nawk
        # ARGV[1] = decoded_filename
        # Parse remaining parameters (ARGV[2 ...]) into "searches" array:
        for (i = 2; i < ARGC; i++) {
                searches[i] = ARGV[i]
                # Delete the parameter so is not treated as another input file:
                delete ARGV[i]
        }
        # Create a variable to count the total number of matches (if any):
        totalmatches = 0
}

# For every record:
{
        # For every search pattern in "searches" array:
        for (search in searches) {
                # If the record does not contain a matching search, skip to the next record:
                if ( $0 !~ searches[search] ) { next }
        }
        # If we have made it this far, all searches have been matched, so print the record followed by a newline:
        print $0 "\n"
        # Increment the total matches variable:
        totalmatches++
}

# At the end of the file:
END {
        # Notify the user (via stderr) how many matches have been found:
        print "...found " totalmatches " matching record(s) from " NR " total record(s)." > "/dev/stderr/"
}
' "$@" # <-- Here's where we pass all the parameters to nawk for processing, using $@ in quotes to ensure that whitespaces are kept and wildcards are not expanded by the shell.

cs03dmj

View Public Profile for cs03dmj

Find all posts by cs03dmj

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

AWK Multi-Line Records Processing

I am an Awk newbie and cannot wrap my brain around my problem: Given multi-line records of varying lengths separated by a blank line I need to skip the first two lines of every record and extract every-other line in each record unless the first line of the record has the word "(CONT)" in the...

2. Shell Programming and Scripting

AWK Multi-Line Records Numbering Problem

I have a set of files of multi-line records with the records separated by a blank line. I needed to add a record number to the front of each line followed by a colon and did the following: awk 'BEGIN {FS = "\n"; RS = ""}{for (i=1; i<=NF; i++)print NR,":",$i}' ~/Desktop/data98-1-25.txt >...

3. UNIX for Dummies Questions & Answers

Grep specific records from a file of records that are separated by an empty line

Hi everyone. I am a newbie to Linux stuff. I have this kind of problem which couldn't solve alone. I have a text file with records separated by empty lines like this: ID: 20 Name: X Age: 19 ID: 21 Name: Z ID: 22 Email: xxx@yahoo.com Name: Y Age: 19 I want to grep records that...

4. UNIX for Dummies Questions & Answers

Alphabetical sort for multi line records contains in a single file

Hi all, I So, I've got a monster text document comprising a list of various company names and associated info just in a long list one after another. I need to sort them alphabetically by name... The text document looks like this: Company Name: the_first_company's_name_here Address:...

5. Shell Programming and Scripting

Capturing the invalid records to error file

HI, I have a source file which has the below data. Tableid,table.txt sourceid,1,2,3,4,5,6 targetid,1,2,3,4,5,6 Tableid,table sourceid,1,2,3,4,5,6 targetid,1,2,3,4,5,6 Tableid,table.txt sourceid,1,2,3,4,5,6 targetid,1,2,3,4,5,6 Tableid,table sourceid,1,2,3,4,5,6 targetid,1,2,3,4,5,6...

6. Shell Programming and Scripting

Transpose multi-line records into a single row

Now that I've parsed out the data that I desire I'm left with variable length multi-line records that are field seperated by new lines (\n) and record seperated by a single empty line ("") At first I was considering doing something like this to append all of the record rows into a single row: ...

7. Shell Programming and Scripting

Joining multi-line output to a single line in a group

Hi, My Oracle query is returing below o/p ---------------------------------------------------------- Ins trnas value a lkp1 x a lkp1 y b lkp1 a b lkp2 x b lkp2 y ...

8. Shell Programming and Scripting

Multi-line filtering based on multi-line pattern in a file

I have a file with data records separated by multiple equals signs, as below. ========== RECORD 1 ========== RECORD 2 DATA LINE ========== RECORD 3 ========== RECORD 4 DATA LINE ========== RECORD 5 DATA LINE ========== I need to filter out all data from this file where the...

9. Shell Programming and Scripting

Help with reformat single-line multi-fasta into multi-line multi-fasta

Input File: >Seq1 ASDADAFASFASFADGSDGFSDFSDFSDFSDFSDFSDFSDFSDFSDFSDFSD >Seq2 SDASDAQEQWEQeqAdfaasd >Seq3 ASDSALGHIUDFJANCAGPATHLACJHPAUTYNJKG ...... Desired Output File >Seq1 ASDADAFASF ASFADGSDGF SDFSDFSDFS DFSDFSDFSD FSDFSDFSDF SD >Seq2

10. Shell Programming and Scripting

Multi line log files to single line format

I want to read the log file which was generate from other command . And the output was having multi line in log files for job name and server name. But i need to make all the logs on one line Source file 07/15/2018 17:02:00 TRANSLOG_1700 Server0005_SQL ...

LEARN ABOUT MINIX

roff

is  a  text formatter.	Its input consists of the text to be out-
put, intermixed with formatting commands.  A  formatting  command
is  a  line  containing  the  control character followed by a two
character command name, and possibly one or more arguments.   The
control  character is initially . (dot).  The formatted output is
produced on standard output.  The formatting commands are  listed
below, with being a number, being a character, and being a title.
A + before n means it may be signed,  indicating  a  positive  or
negative change from the current value.  Initial values for where
relevant, are given in parentheses.
  .ad	  Adjust right margin.
  .ar	  Arabic page numbers.
  .br	  Line break.  Subsequent text will begin on a new line.
  .bl n   Insert n blank lines.
  .bp +n  Begin new page and number it n. No n means +1.
  .cc c   Control character is set to c.
  .ce n   Center the next n input lines.
  .de zz  Define a macro called zz. A line with .. ends definition.
  .ds	  Double space the output. Same as .ls 2.
  .ef t   Even page footer title is set to t.
  .eh t   Even page header title is set to t.
  .fi	  Begin filling output lines as full as possible.
  .fo t   Footer titles (even and odd) are set to t.
  .hc c   The character c (e.g., %) tells roff where hyphens are permitted.
  .he t   Header titles (even and odd) are set to t.
  .hx	  Header titles are suppressed.
  .hy n   Hyphenation is done if n is 1, suppressed if it is 0. Default is 1.
  .ig	  Ignore input lines until a line beginning with .. is found.
  .in n   Indent n spaces from the left margin; force line break.
  .ix n   Same as .in but continue filling output on current line.
  .li n   Literal text on next n lines.  Copy to output unmodified.
  .ll +n  Line length (including indent) is set to n (65).
  .ls +n  Line spacing: n (1) is 1 for single spacing, 2 for double, etc.
  .m1 n   Insert n (2) blank lines between top of page and header.
  .m2 n   Insert n (2) blank lines between header and start of text.
  .m3 n   Insert n (1) blank lines between end of text and footer.
  .m4 n   Insert n (3) blank lines between footer and end of page.
  .na	  No adjustment of the right margin.
  .ne n   Need n lines.  If fewer are left, go to next page.
  .nn +n  The next n output lines are not numbered.
  .n1	  Number output lines in left margin starting at 1.
  .n2 n   Number output lines starting at n.  If 0, stop numbering.
  .ni +n  Indent line numbers by n (0) spaces.
  .nf	  No more filling of lines.
  .nx f   Switch input to file f.
  .of t   Odd page footer title is set to t.
  .oh t   Odd page header title is set to t.
  .pa +n  Page adjust by n (1).  Same as .bp
  .pl +n  Paper length is n (66) lines.
  .po +n  Page offset.	Each line is started with n (0) spaces.
  .ro	  Page numbers are printed in Roman numerals.
  .sk n   Skip n pages (i.e., make them blank), starting with next one.
  .sp n   Insert n blank lines, except at top of page.
  .ss	  Single spacing.  Equivalent to .ls 1.
  .ta	  Set tab stops, e.g., .ta 9 17 25 33 41 49 57 65 73 (default).
  .tc c   Tabs are expanded into c.  Default is space.
  .ti n   Indent next line n spaces; then go back to previous indent.
  .tr ab  Translate a into b on output.
  .ul n   Underline the letters and numbers in the next n lines.