Full Discussion: Parsing with keywords
Post 302696299 by bakunin on Tuesday 4th of September 2012 04:38:42 PM
This will take some work and it is going to get complex, so let us address one problem after the other. I suggest using sed for this sort of text manipulation.

The general approach is to retrieve one column after the other, collect the respective pieces in the hold space, and finally move the hold space to the pattern space and print the line.
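
As a minimal illustration of that collect-then-print pattern, here is a sketch on made-up input. The `name:` and `city:` lines and the helper name `collect` are assumptions for this example only, not the format from this thread. `h` starts a fresh record in the hold space, `H` appends a further column, and `x` brings the collected record back so it can be printed.

```shell
# Hypothetical two-line records; only the sed commands h, H and x matter here.
collect() {
    sed -n '
        /^name: / {
            s/^name: //
            h
        }
        /^city: / {
            s/^city: //
            H
            x
            s/\n/ /p
        }'
}

printf '%s\n' 'name: Alice' 'city: Paris' 'name: Bob' 'city: Oslo' | collect
```

Each record comes out as one line, with the newline that `H` inserted replaced by a space.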

We start by finding out where a "record" starts: we search for a line that contains nothing but a number. The next line is taken to be the title and the start of a new record. We trim the title to a fixed number of characters by first appending 20 spaces to it, then cutting everything after the first 20 characters. (I used 20 here; change it to whatever number you see fit, but you will have to change it in both substitute statements.) Finally we move the title into the hold space, which replaces whatever was left there from the previous record:

Code:
sed -n '/^[0-9][0-9]*$/ {
               n
               s/$/                    /
               s/^\(.\{20\}\).*$/\1/
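               x
               d
        }
        # Nothing is printed at this stage: the next line prints
        # only once a further column has been appended with H.
        x ; s/\n//gp'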
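
To see the pad-then-cut idiom on its own, here is a small sketch. The sample titles and the helper name `pad20` are made up for illustration; the two substitutions are the ones used for the title column.

```shell
# Pad-then-cut: append 20 spaces, then keep only the first 20 characters.
pad20() {
    sed -e 's/$/                    /' -e 's/^\(.\{20\}\).*$/\1/'
}

# The trailing "|" only makes the column width visible.
printf '%s\n' 'Short title' 'A title that is much longer than twenty characters' |
    pad20 | sed 's/$/|/'
```

Short titles are filled up with spaces and long ones are cut, so every line ends up exactly 20 characters wide before the `|`.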

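Next are the lines with "Pages". We strip the word "Pages" from them, then pad with spaces like the titles, this time to 15 characters. Because the closing x swaps the two buffers on every line, the "Pages" line has to follow the title line directly: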

Code:
sed -n '/^[0-9][0-9]*$/ {
               n
               s/$/                    /
               s/^\(.\{20\}\).*$/\1/
               x
               d
        }
        /^Pages/ {
               s/^Pages //
               s/$/               /
               s/^\(.\{15\}\).*$/\1/
               H
        }
        x ; s/\n//gp'

The authors are hard, because we have to infer which part is the first name and which is the family name. This can't be captured with a simple regexp. If it is always "John Doe" and never "Doe, John" (or vice versa), it is easy to retrieve either name, but if both forms are mixed you will have to correct the result by hand.

Another thing is that the line with the author names has nothing that distinguishes it. Is it always the line directly after the "Pages" line? If so, the following will work; otherwise I simply see no pattern to match on.

The names handling might need some explanation:

Code:
John Doe, Jane Doe, George Miller

Every last name is followed by a comma or the line end. I therefore append a comma at the line end, then throw out every word that isn't followed by a comma (the "not-last-names"), together with the spaces.

Code:
John Doe, Jane Doe, George Miller,
Doe,Doe,Miller,
Doe,Doe,Miller
Doe, Doe, Miller

Finally I remove the trailing comma and add a space after each remaining comma. Then the column is padded and trimmed to 25 characters and added to the hold space.

Code:
sed -n '/^[0-9][0-9]*$/ {
               n
               s/$/                    /
               s/^\(.\{20\}\).*$/\1/
               x
               d
        }
        /^Pages/ {
               s/^Pages //
               s/$/               /
               s/^\(.\{15\}\).*$/\1/
               H
               n
               s/$/,/
               s/[^ ,]*  *//g
               s/,$//
               s/,\([^ ]\)/, \1/g
               s/$/                         /
               s/^\(.\{25\}\).*$/\1/
               H
        }
        x ; s/\n//gp'
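
The name handling can be tried in isolation before wiring it into the full script. This is a sketch only: the helper name `lastnames` is made up, and the "throw out" step is written here as `s/[^ ,]*  *//g`, which deletes each word that is followed by a space along with the space itself. It assumes every author is written as "First Last".

```shell
# Reduce "First Last, First Last" to "Last, Last"; sample names as in the post.
lastnames() {
    sed -e 's/$/,/' \
        -e 's/[^ ,]*  *//g' \
        -e 's/,$//' \
        -e 's/,\([^ ]\)/, \1/g'
}

printf '%s\n' 'John Doe, Jane Doe, George Miller' | lastnames
```

For the sample line this prints `Doe, Doe, Miller`. A family name made of several words would be cut down to its last word, so such entries still need a manual check.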

You should be able to take it from there. Simply retrieve the abstract's text, replace everything between the first two and the last two words with "...", add this to the hold space, then output the whole line.

If you still have trouble, ask again and we will go over it again.
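
One possible starting point for the shortening step, as a sketch under assumptions: the abstract is taken to sit on a single line with single spaces between words, the sample sentence is invented, and `abbrev` is a made-up helper name.

```shell
# Keep the first two and the last two words, replace the middle with "...".
# A line with fewer than five single-spaced words does not match
# and is left as it is.
abbrev() {
    sed 's/^\([^ ]*  *[^ ]*\)  *.*  *\([^ ]*  *[^ ]*\)$/\1 ... \2/'
}

printf '%s\n' 'This paper presents a novel approach to parsing records' | abbrev
```

For the sample sentence this gives `This paper ... parsing records`. An abstract that spans several lines would first have to be joined into one line, for instance by collecting it in the hold space.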

I hope this helps.

bakunin