This is the code I currently use (example):
Unfortunately the output looks like this:
As we can see, the format breaks because we egrep for 5 values per article. That works for "article1", but "name...xx" is missing in "article2", so "article3" ends up as the 5th column of row 2 instead of column 1 of row 3.
So xargs passes misaligned groups of lines to awk, which shifts the table:
------------------------------------
Now the question: is there a way to make egrep, when it searches for 5 strings but finds only 4, substitute a placeholder such as "missing" for the one it did not find? That would keep xargs -L5 happy and let awk preserve the table format.
Or is there a more efficient way of doing this?
The input text file is just an example for a much larger file with hundreds of thousands of lines.
Last edited by methyl; 07-10-2012 at 09:41 PM..
Reason: code tags on the data ; break very long lines; more code tags; somehow get this post readable
What is the expected output, based on the data provided?
Ps. The input data looks pretty random to me. Is there a formal file structure? Can you explain it?
There is no way that someone can write code to process the sample input provided - it's full of abstracts and random comments.
EDIT: Crossed with Methyl; I made the assumption that the 'article' could be treated as a division. Of course, if that assumption is wrong it all goes out the window.
You were already asked to use code tags in your first post in this thread here by methyl. Please do so, thanks.
---------- Post updated at 08:02 AM ---------- Previous update was at 07:43 AM ----------
The only pattern from the big input file is that the word
is initiating the block of text I am interested in. Then within the next 5 lines after the word
there should be the words
. But sometimes some of the 4 words I am looking for don't exist.
The ideal solution would create an output table and replace the missing word(s) with a replacement word, just to highlight that it does not exist.
Last edited by zaxxon; 07-11-2012 at 03:51 AM..
Reason: code tags
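The behaviour described above (a start word, a window of 5 following lines, a placeholder for anything not found) can be sketched in a single awk pass, which avoids the egrep | xargs -L5 grouping altogether. This is only an illustration under assumptions: the start word `article` comes from the sample, while the labels `name`, `author`, `date`, `pages`, the `label: value` layout, and the function name `window_table` are hypothetical stand-ins for the real keywords.

```shell
#!/bin/sh
# Hypothetical sample: labels and the "label: value" layout are assumptions.
sample='random comment before the first block
article1
name: alpha
author: smith
date: 2012-07-01
pages: 10
abstract text that is not a field
article2
author: jones
date: 2012-07-02
pages: 7
first stray comment
second stray comment
name: outside-the-window
article3
name: gamma
date: 2012-07-03'

window_table() {
    awk -v window=5 '
    BEGIN { n = split("name author date pages", key, " ") }

    # Print one row for the current block, filling gaps with "missing".
    function emit(   k, row) {
        if (title == "") return
        row = title
        for (k = 1; k <= n; k++)
            row = row "\t" ((key[k] in val) ? val[key[k]] : "missing")
        print row
        title = ""
        split("", val)          # portable way to empty the array
    }

    # A line starting with the start word opens a new block.
    /^article/ { emit(); title = $1; left = window; next }

    # Only the next "window" lines after the start word are inspected.
    left > 0 {
        left--
        for (k = 1; k <= n; k++)
            if ($1 == (key[k] ":")) {
                v = $0
                sub(/^[^:]*:[ \t]*/, "", v)   # drop the label, keep the value
                val[key[k]] = v
            }
    }

    END { emit() }
    '
}

table=$(printf '%s\n' "$sample" | window_table)
printf '%s\n' "$table"
```

Because every row always has the same number of tab-separated columns, the table cannot shift when a label is absent. On a file with hundreds of thousands of lines this is also a single pass with one process instead of three.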
The following approach leverages awk's multiline record abilities (assumes each article block is delimited by at least one blank line) and shamelessly pilfers agama's solution.
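A minimal sketch of that multiline-record idea, not the poster's actual code. It assumes blocks are separated by blank lines, that the first line of each block is the article title, and that fields look like `label: value`; the labels `name`, `author`, `date`, `pages` and the function name `paragraph_table` are hypothetical.

```shell
#!/bin/sh
# Hypothetical sample: blank-line-delimited blocks with assumed labels.
sample='article1
name: alpha
author: smith
date: 2012-07-01
pages: 10

article2
author: jones
date: 2012-07-02
pages: 7

article3
name: gamma
author: lee
date: 2012-07-03
pages: 3'

paragraph_table() {
    awk '
    BEGIN {
        RS = ""      # paragraph mode: one record per blank-line-delimited block
        FS = "\n"    # one field per line of the block
        n = split("name author date pages", key, " ")
    }
    {
        split("", val)                       # reset values for this block
        for (i = 2; i <= NF; i++)
            for (k = 1; k <= n; k++)
                if (index($i, key[k] ":") == 1) {
                    v = substr($i, length(key[k]) + 2)
                    sub(/^[ \t]+/, "", v)    # trim space after the colon
                    val[key[k]] = v
                }
        row = $1                             # first line of the block is the title
        for (k = 1; k <= n; k++)
            row = row "\t" ((key[k] in val) ? val[key[k]] : "missing")
        print row
    }'
}

table=$(printf '%s\n' "$sample" | paragraph_table)
printf '%s\n' "$table"
```

Setting `RS` to the empty string is what makes awk treat each block as one record, so a missing label only affects its own row.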