bash: need to have egrep to return a text string if the search pattern has NOT been found

07-10-2012

Hello all,

After spending hours searching the web, I decided to create an account here. This is my first post, and I hope one of the experts can help.

I need to resolve a grep / sed / xargs / awk problem.

My input file is just like this:

----------------------------------

Code:

root@Ubuntu-12:~# cat myfile 
article1
data.........x
colour....blue
number.........15
name...smith
month...................july

article2
colour....yellow
number.........423489
something....x
month...................january

article3
colour....orange
number.........7
name....jason
month...................may
value.....4
much
more
lines
root@Ubuntu-12:~#

----------------------------------

This is the code I currently use (example):

Code:

grep "^article[0-9]$" -A5 myfile | while read x ; do echo "$x" | egrep "article|colour|number|name|month" | \
awk -F . '{print $NF}' ; done | xargs -L5 | \
awk 'BEGIN {printf("%15s %15s %15s %15s %15s\n" ,"Article", "Colours", "Numbers", "Names", "Month")} {printf("%15s %15s %15s %15s %15s\n", $1, $2, $3, $4, $5)}'

Unfortunately the output looks like this:

Code:

        Article         Colours         Numbers           Names           Month
       article1            blue              15           smith            july
       article2          yellow          423489         january        article3
         orange               7           jason             may

As we can see, the table is broken because we are egrep'ing for 5 values per article. That works for "article1", but the "name..." line is missing from "article2". As a result, "article3" ends up as the 5th column of row 2 rather than as column 1 of row 3.

So xargs hands misaligned groups of five values to awk, which shifts the table:

Code:

grep "^article[0-9]$" -A5 myfile | while read x ; do echo "$x" | egrep "article|colour|number|name|month" | awk -F . '{print $NF}' ; done | xargs -L5

Code:

article1 blue 15 smith july
article2 yellow 423489 january article3
orange 7 jason may
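
The shift can be reproduced without the input file at all. The sketch below is only an illustration (the variable name grouped is made up): it feeds xargs -L5 the fourteen values the pipeline produces, with article2 contributing four instead of five. xargs groups by line count alone, so it has no way to notice the gap.

```shell
# Fourteen values, one per line: article2 has no "name", so only four for it.
grouped=$(printf '%s\n' \
    article1 blue 15 smith july \
    article2 yellow 423489 january \
    article3 orange 7 jason may | xargs -L5)

# Every five input lines become one output line, whatever they contain.
printf '%s\n' "$grouped"
```

The second output line ends in "article3", which is exactly the misalignment seen in the table.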

------------------------------------

Now the question: is there a way to make egrep, when it searches for 5 strings but finds only 4, substitute a placeholder word such as "missing" for the one it did not find? That would keep xargs -L5 happy, and awk would keep the table format.

Or is there a more efficient way of doing this?

The input text file is just an example of a much larger file with hundreds of thousands of lines.

bash4ever

07-10-2012

What is the expected output, based on the data provided?

PS: The input data looks pretty random to me. Is there a formal file structure? Can you explain it?
There is no way that someone can write code to process the sample input provided - it's full of abstracts and random comments.

methyl

07-10-2012

This is how I would have approached it:

Code:

awk -F . '
    function dump( )
    {
        if( stuff["article"] )
            printf( "%s %s %s %s %s\n", stuff["article"], stuff["colour"], stuff["number"], stuff["name"], stuff["month"] );
        else
            printf( "Article Colours Numbers Names Month\n" );
        delete stuff;
    }

    /^article/ {
        dump( );
        stuff["article"] = $NF;
        next;
    }

    { stuff[$1] = $NF; }

    END { dump(); }

' input-file

EDIT: Crossed with Methyl; I made the assumption that the 'article' could be treated as a division. Of course, if that assumption is wrong it all goes out the window.

agama

07-11-2012

The expected output is:

Code:

Article         Colours         Numbers           Names           Month
article1        blue            15                smith           july
article2        yellow          423489            --MISSING--     january        
article3        orange          7                 jason           may

The only pattern in the big input file is that the word "article" starts the block of text I am interested in. Within the next 5 lines after "article", there should be the words "colour", "number", "name" and "month". But sometimes some of the 4 words I am looking for don't exist.

The ideal solution would create an output table and replace the missing word(s) with a replacement word, just to highlight that it does not exist.

bash4ever

07-11-2012

Small tweak to the previously posted script should do what you want:

Code:

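awk -F . '
    # Print the row collected for the previous article, or the header
    # on the first call (when no article has been seen yet).
    function dump( )
    {
        if( stuff["article"] )
            printf( "%10s %10s %10s %10s %10s\n", stuff["article"], stuff["colour"], stuff["number"], stuff["name"], stuff["month"] );
        else
            printf( "%10s %10s %10s %10s %10s\n", "Article", "Colours", "Numbers", "Names", "Month" );
        # Reset for the next block: any key not overwritten stays flagged.
        stuff["article"] = "";
        stuff["colour"] = stuff["number"] = stuff["name"] = stuff["month"]  = "+MISSING+";
    }
    # An article line closes the previous block and starts the next one.
    /^article/ {
        dump( );
        stuff["article"] = $NF;
        next;
    }

    # Any other line: key is the text before the first dot, value the text after the last.
    { stuff[$1] = $NF; }

    # Flush the final block.
    END { dump(); }

' input-file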
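
Neither version of the script limits itself to the five lines after each "article" line, which is how the layout was described earlier in the thread. What follows is a minimal sketch of that variant, not a tested drop-in replacement: the function name article_table, the counter left and the column widths are made up for illustration, and the --MISSING-- placeholder is taken from the expected output posted above. It assumes every article line looks like "article" followed by digits, as in the sample.

```shell
# article_table FILE: hypothetical helper, one output row per article block.
article_table() {
    awk -F'[.]+' '
        # Emit the row gathered for the current article, if there is one.
        function flush_row() {
            if (art != "")
                printf fmt, art, v["colour"], v["number"], v["name"], v["month"]
        }
        BEGIN {
            fmt = "%-10s %-10s %-10s %-12s %s\n"
            printf fmt, "Article", "Colours", "Numbers", "Names", "Month"
        }
        # An article line ends the previous block and opens a 5-line window.
        /^article[0-9]+$/ {
            flush_row()
            art = $0
            left = 5
            v["colour"] = v["number"] = v["name"] = v["month"] = "--MISSING--"
            next
        }
        # Inside the window, keep only the four keys of interest.
        left > 0 {
            left--
            if (NF == 2 && ($1 in v))
                v[$1] = $2
        }
        END { flush_row() }
    ' "$1"
}
```

It would be called as article_table myfile. Splitting on runs of dots (-F'[.]+') means a line such as "colour....blue" yields exactly two fields, so lines like "much" or blank lines are skipped by the NF == 2 check.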

agama

07-11-2012

The following approach leverages awk's multiline record abilities (assumes each article block is delimited by at least one blank line) and shamelessly pilfers agama's solution.

Code:

BEGIN {
    RS=""; FS="\n"; fmt="%-10s %-10s %-10s %-10s %-10s\n"
    printf fmt, "Article", "Colours", "Numbers", "Names", "Month"
}

{
    a["colour"] = a["number"] = a["name"] = a["month"] = "+MISSING+"
    for (i = 1; i <= NF; i++) {
        split($i, b, /\.+/)
        if (b[1] in a)
            a[b[1]] = b[2]
    }
    printf fmt, $1, a["colour"], a["number"], a["name"], a["month"]
}

Regards,
Alister
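
For anyone unfamiliar with the multiline record mode mentioned above, here is a small illustration on made-up data (the variable name records and the two sample blocks are only for the demo). With RS set to the empty string, awk treats each blank-line-separated block as one record, and with FS set to a newline each line of the block becomes one field.

```shell
# Two blocks separated by a blank line: each block is one record,
# and each line inside a block is one field.
records=$(printf 'article1\ncolour....blue\n\narticle2\nnumber.........7\nmonth...may\n' |
    awk 'BEGIN { RS = ""; FS = "\n" } { print NR ": " $1 " has " (NF - 1) " detail line(s)" }')

printf '%s\n' "$records"
```

This prints one line per block, reporting 1 detail line for article1 and 2 for article2. The program posted above is a bare awk script, so it can be saved to a file (say table.awk, a name chosen here for the example) and run with awk -f table.awk myfile.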

alister

07-12-2012

That's awesome. Thank you so much.

@ agama, there was just one little typo:

is:

Code:

printf( "%10s %10s %10s %10s %10s\n", "Article", "Colours", "Numbers", "Names", :Month" );

should be:

Code:

printf( "%10s %10s %10s %10s %10s\n", "Article", "Colours", "Numbers", "Names", "Month" );

bash4ever