Please help with code for this.
I want to parse several huge files and summarize relevant information into columns.
The output columns are title, pagebegin, pageend, author1, author2, ..., author8, abstract. Column descriptions are as follows.
Title
The line that follows a line containing only a single integer value. In the example that value is 3.
Example
3
Building transformational leadership
title = Building transformational leadership
Pages
Preceded by keyword "Pages"
pagebegin will be first value after keyword "Pages"
pageend will be value after pagebegin and '-'
Example
Pages 309-323
pagebegin = 309
pageend = 323
Authors
The line immediately after the "Pages" line, with authors separated by commas. There can be up to 8 authors. Only the last name is needed.
Pages 309-323
Peter Sun, H. Anderson
author1 = Sun
author2 = Anderson
...
Abstract
Text between keywords "Abstract" and "Article Outline"
This will take some work and it is going to become complex, so let us address one problem after the other. I suggest using sed for this sort of text manipulation task.
The general way of addressing this is to retrieve one column after the other, collect the respective info in the hold space, and finally move the hold space into the pattern space and print the line.
We start by finding out where a "record" starts, searching for a line with a single number on it. The next line is assumed to be a title and the start of a new record. We clear the hold space and then trim the title to a fixed number of characters by first appending x spaces to it, then cutting everything after the first x characters. (I used 20 here; change it to whatever number you see fit. You will have to change it in both substitute statements.) Finally, collect the title into the hold space.
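The sed commands for this step are not shown here. As a rough illustration of the same logic (find a line holding a single integer, take the next line as the title, pad with spaces, cut to 20 characters), here is a minimal sketch in Python rather than sed. The names `fixed_width` and `extract_titles` are made up for this example, and the sample lines come from the question.

```python
import re

TITLE_WIDTH = 20  # column width used in the text above; adjust as needed


def fixed_width(text, width):
    """Append `width` spaces, then keep only the first `width` characters."""
    return (text + " " * width)[:width]


def extract_titles(lines):
    """Return the fixed-width title of every record found in `lines`."""
    titles = []
    expect_title = False
    for line in lines:
        stripped = line.strip()
        if expect_title:
            # The line right after the record marker is taken as the title.
            titles.append(fixed_width(stripped, TITLE_WIDTH))
            expect_title = False
        elif re.fullmatch(r"\d+", stripped):
            # A line holding nothing but an integer marks a new record.
            expect_title = True
    return titles


sample = ["3", "Building transformational leadership", "Pages 309-323"]
print(extract_titles(sample))
```

With the sample from the question the title is cut to "Building transformat"; shorter titles are padded with trailing spaces so the columns line up.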
Next are the lines with "Pages". We strip the keyword from them, then pad with spaces like the titles, this time to 15 characters.
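The same step can be sketched in Python (again, not the sed version the text describes). This assumes the line always looks like `Pages 309-323`; the helper names `parse_pages` and `pages_column` are hypothetical. Returning the two numbers separately also covers the pagebegin and pageend columns asked for in the question.

```python
import re

PAGES_WIDTH = 15  # column width used in the text above


def parse_pages(line):
    """Return (pagebegin, pageend) from a 'Pages 309-323' line, else None."""
    match = re.match(r"Pages\s+(\d+)\s*-\s*(\d+)", line.strip())
    if match is None:
        return None
    return match.group(1), match.group(2)


def pages_column(line, width=PAGES_WIDTH):
    """Drop the keyword and return the page range padded to `width`."""
    pages = parse_pages(line)
    if pages is None:
        return None
    text = "-".join(pages)
    return (text + " " * width)[:width]


pagebegin, pageend = parse_pages("Pages 309-323")
print(pagebegin, pageend)
print(repr(pages_column("Pages 309-323")))
```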
The authors are hard, because we have to infer which part is the first name and which is the family name. This can't be captured with a simple regexp. If it is always "John Doe" and never "Doe, John" (or vice versa), it is easy to retrieve the first (or second, respectively) name, but if both forms are mixed you will have to correct by hand.
Another problem is that nothing distinguishes the line with the author names. Is it always the line right after the "Pages" line? If so, the following will work; otherwise I simply see no pattern to match.
The name handling might need some explanation:
Every last name is followed by a comma or the line end. I therefore append a comma at the line end, then throw out every word that isn't followed by a comma: the "not-last-names".
Finally I remove the last comma and add spaces as necessary. Then the column is trimmed to 25 characters and added to the hold space.
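The comma trick described above can be sketched in Python (a stand-in for the sed version, not a copy of it). It assumes the "First Last, First Last" form throughout and caps the list at 8 authors as stated in the question; `last_names` and `authors_column` are hypothetical names.

```python
import re

AUTHORS_WIDTH = 25  # column width used in the text above
MAX_AUTHORS = 8     # limit given in the question


def last_names(line):
    """Keep only the words that are directly followed by a comma."""
    marked = line.strip() + ","  # make the final name comma-terminated too
    return re.findall(r"([^\s,]+)\s*,", marked)[:MAX_AUTHORS]


def authors_column(line, width=AUTHORS_WIDTH):
    """Join the last names and pad or cut the result to `width`."""
    text = ", ".join(last_names(line))
    return (text + " " * width)[:width]


names = last_names("Peter Sun, H. Anderson")
print(names)
print(repr(authors_column("Peter Sun, H. Anderson")))
```

For the separate author1 ... author8 columns the list returned by `last_names` can be used directly. As noted above, a line written as "Doe, John" would yield the wrong words and needs correcting by hand.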
You should be able to take it from there. Simply retrieve the abstract's text, replace everything between the first two and the last two words with "...", add this to the hold space, then output the whole line.
If you still have trouble, ask again and we will go over it again.
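For the abstract step, a minimal Python sketch of the idea might look like the following. It assumes the whole file has been read into one string and that both keywords appear literally; the function names and the sample text are invented for illustration.

```python
import re


def extract_abstract(text):
    """Return the text between 'Abstract' and 'Article Outline', or None."""
    match = re.search(r"Abstract(.*?)Article Outline", text, flags=re.DOTALL)
    if match is None:
        return None
    # Collapse line breaks and repeated spaces into single spaces.
    return " ".join(match.group(1).split())


def abbreviate(text):
    """Keep the first two and last two words, with '...' in between."""
    words = text.split()
    if len(words) <= 4:
        return " ".join(words)  # nothing to cut out
    return " ".join(words[:2] + ["..."] + words[-2:])


# Made-up sample record, only to show the shape of the input.
sample = (
    "Abstract\n"
    "This study examines how leaders build trust\n"
    "in teams.\n"
    "Article Outline\n"
    "1. Introduction\n"
)
abstract = extract_abstract(sample)
print(abstract)
print(abbreviate(abstract))
```

The non-greedy `.*?` stops at the first "Article Outline" after "Abstract", which matters once a file holds several records.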