bash: need to have egrep to return a text string if the search pattern has NOT been found


 
# 1  
Old 07-10-2012

Hello all,

After spending hours searching the web, I decided to create an account here. This is my first post and I hope one of the experts can help.

I need to resolve a grep / sed / xargs / awk problem.

My input file is just like this:

----------------------------------
Code:
root@Ubuntu-12:~# cat myfile 
article1
data.........x
colour....blue
number.........15
name...smith
month...................july

article2
colour....yellow
number.........423489
something....x
month...................january

article3
colour....orange
number.........7
name....jason
month...................may
value.....4
much
more
lines
root@Ubuntu-12:~#

----------------------------------

This is the code I currently use (example):
Code:
grep "^article[0-9]$" -A5 myfile | while read x ; do echo "$x" | egrep "article|colour|number|name|month" | \
awk -F . '{print $NF}' ; done | xargs -L5 | \
awk 'BEGIN {printf("%15s %15s %15s %15s %15s\n" ,"Article", "Colours", "Numbers", "Names", "Month")} {printf("%15s %15s %15s %15s %15s\n", $1, $2, $3, $4, $5)}'

Unfortunately the output looks like this:
Code:
        Article         Colours         Numbers           Names           Month
       article1            blue              15           smith            july
       article2          yellow          423489         january        article3
         orange               7           jason             may

As we can see, the table is misaligned because the pipeline expects 5 values per article. That works for "article1", but "article2" has no "name..." line. As a result, "article3" ends up as the 5th column of row 2 instead of the 1st column of row 3.

So xargs hands misgrouped lines to awk, which shifts the table:
Code:
grep "^article[0-9]$" -A5 myfile | while read x ; do echo "$x" | egrep "article|colour|number|name|month" | awk -F . '{print $NF}' ; done | xargs -L5

Code:
article1 blue 15 smith july
article2 yellow 423489 january article3
orange 7 jason may

------------------------------------

Now the question: is there a way to make egrep, when it searches for 5 strings but finds only 4, emit a replacement word such as "missing" for the absent one? That would keep xargs -L5 happy and let awk keep the table format.

Or is there a more efficient way of doing this?

The input text file is just an example for a much larger file with hundreds of thousands of lines.

Last edited by methyl; 07-10-2012 at 09:41 PM.. Reason: code tags on the data ; break very long lines; more code tags; somehow get this post readable
# 2  
Old 07-10-2012
What is the expected output, based on the data provided?

PS: The input data looks pretty random to me. Is there a formal file structure? Can you explain it?
There is no way that someone can write code to process the sample input provided: it's full of abstracts and random comments.

Last edited by methyl; 07-10-2012 at 09:39 PM..
# 3  
Old 07-10-2012
This is how I would have approached it:

Code:
awk -F . '
    function dump( )
    {
        if( stuff["article"] )
            printf( "%s %s %s %s %s\n", stuff["article"], stuff["colour"], stuff["number"], stuff["name"], stuff["month"] );
        else
            printf( "Article Colours Numbers Names Month\n" );
        delete stuff;
    }

    /^article/ {
        dump( );
        stuff["article"] = $NF;
        next;
    }

    { stuff[$1] = $NF; }

    END { dump(); }

' input-file

EDIT: Crossed with Methyl; I made the assumption that the 'article' could be treated as a division. Of course, if that assumption is wrong it all goes out the window.
# 4  
Old 07-11-2012
The expected output is:

Code:
Article         Colours         Numbers           Names           Month
article1        blue            15                smith           july
article2        yellow          423489            --MISSING--     january        
article3        orange          7                 jason           may


Moderator's Comments:
You were already asked by methyl to use code tags in your first post in this thread. Please do so, thanks.


---------- Post updated at 08:02 AM ---------- Previous update was at 07:43 AM ----------

The only pattern in the big input file is that the word "article" starts the block of text I am interested in. Then, within the next 5 lines after the word "article", there should be the words "colour", "number", "name" and "month". But sometimes some of the 4 words I am looking for don't exist.

The ideal solution would create an output table and replace the missing word(s) with a replacement word, just to highlight that it does not exist.
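The rule described above (keys are only expected within the 5 lines after "article", and absent ones get a replacement word) could be sketched in a single awk pass as follows. This is only a sketch under assumptions: the names report, emit, window and missing are made up for illustration, the pattern ^article[0-9]+$ assumes the block header is "article" followed by digits on a line of its own, and the column widths are arbitrary.

```shell
# report: read the data on stdin and print one table row per article block.
report() {
    awk -F'[.]+' -v window=5 -v missing='--MISSING--' '
        # Print the row collected for the current article, if there is one.
        function emit() {
            if (art != "")
                printf fmt, art, val["colour"], val["number"], val["name"], val["month"]
        }
        BEGIN {
            fmt = "%-12s %-12s %-12s %-12s %-12s\n"
            printf fmt, "Article", "Colours", "Numbers", "Names", "Month"
        }
        # A new block: flush the previous row and preset every key to the placeholder.
        /^article[0-9]+$/ {
            emit()
            art = $0; left = window
            val["colour"] = val["number"] = val["name"] = val["month"] = missing
            next
        }
        # Only the first "window" lines after the header are inspected.
        left > 0 {
            left--
            if (NF == 2 && ($1 in val))
                val[$1] = $2
        }
        END { emit() }
    '
}

# Sample data copied from the first post.
table=$(report <<'EOF'
article1
data.........x
colour....blue
number.........15
name...smith
month...................july

article2
colour....yellow
number.........423489
something....x
month...................january

article3
colour....orange
number.........7
name....jason
month...................may
value.....4
much
more
lines
EOF
)
printf '%s\n' "$table"
```

With the sample data this should give a header plus three rows, with the placeholder in the Names column for article2. Because the file is read once and no process is started per line, it should also cope better with very large inputs than a shell while-loop that runs egrep and awk for every line.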

Last edited by zaxxon; 07-11-2012 at 03:51 AM.. Reason: code tags
# 5  
Old 07-11-2012
Small tweak to the previously posted script should do what you want:

Code:
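awk -F . '
    # Print the header before the first article, otherwise the collected row,
    # then reset every field to the placeholder for the next block.
    function dump( )
    {
        if( stuff["article"] )
            printf( "%10s %10s %10s %10s %10s\n", stuff["article"], stuff["colour"], stuff["number"], stuff["name"], stuff["month"] );
        else
            printf( "%10s %10s %10s %10s %10s\n", "Article", "Colours", "Numbers", "Names", "Month" );
        stuff["article"] = "";
        stuff["colour"] = stuff["number"] = stuff["name"] = stuff["month"]  = "+MISSING+";
    }
    # A line starting with "article" closes the previous block and opens a new one.
    /^article/ {
        dump( );
        stuff["article"] = $NF;
        next;
    }

    # Any other line: the key is the text before the first dot, the value the text after the last dot.
    { stuff[$1] = $NF; }

    # Flush the final block.
    END { dump(); }

' input-file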


Last edited by agama; 07-12-2012 at 11:55 PM.. Reason: Corrected typo
# 6  
Old 07-11-2012
The following approach leverages awk's multiline record abilities (it assumes each article block is delimited by at least one blank line) and shamelessly pilfers agama's solution.

Code:
BEGIN {
    RS=""; FS="\n"; fmt="%-10s %-10s %-10s %-10s %-10s\n"
    printf fmt, "Article", "Colours", "Numbers", "Names", "Month"
}

{
    a["colour"] = a["number"] = a["name"] = a["month"] = "+MISSING+"
    for (i=1; i<=NF; i++) {
        split($i, b, /\.+/)
        if (b[1] in a)
            a[b[1]] = b[2]
    }
    printf fmt, $1, a["colour"], a["number"], a["name"], a["month"]
}

Regards,
Alister

Last edited by alister; 07-11-2012 at 11:28 PM..
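The multiline record mode that post #6 relies on can be seen in isolation with a tiny input. The following is a minimal sketch, not part of the original reply: setting RS to the empty string makes awk treat each blank-line-separated block as one record, and FS="\n" turns every line of the block into a field. The variable name blocks and the shortened input are made up for the demonstration.

```shell
# Three blocks separated by one or more blank lines, shortened from the sample data.
blocks=$(printf 'article1\ncolour....blue\n\narticle2\ncolour....yellow\nnumber.........423489\n\n\narticle3\n' |
    awk 'BEGIN { RS = ""; FS = "\n" }
         # $1 is the first line of the block, NF the number of lines in it.
         { printf "record %d: %s has %d line(s)\n", NR, $1, NF }')
printf '%s\n' "$blocks"
```

The script in post #6 is a bare awk program without an invocation. Saved to a file (articles.awk is a hypothetical name), it would be run as awk -f articles.awk myfile.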
# 7  
Old 07-12-2012
That's awesome. Thank you so much.

---------- Post updated at 07:54 AM ---------- Previous update was at 07:49 AM ----------

@ agama, there was just one little typo:

is:
Code:
printf( "%10s %10s %10s %10s %10s\n", "Article", "Colours", "Numbers", "Names", :Month" );

should be:
Code:
printf( "%10s %10s %10s %10s %10s\n", "Article", "Colours", "Numbers", "Names", "Month" );
