A sample file.txt contains this data (actual text from Wikipedia):
In June 2000, ''Bookface, Inc.'' launched the website [URL="http://www.Bookface.com"]www.Bookface.com[/URL], a "Read on Demand" service precipitated both by the concurrent [[print on demand]] boom, and launching during the hype surrounding [[Stephen King]]'s online-only novella ''The Plant'', which had been <launched in July>, 1999.<ref>[http://www.kirjasto.sci.fi/sking.htm Stephen King Bio at ''Books & Writers'']. <Accessed January 27>, 2008</ref> Bookface delivered "whole books and excerpts to readers directly", with publishers including [[HarperCollins]], Penguin Puttnam, [[Random House]] and Time Warner Trade Publishing lined up to provide Bookface with content.<ref name="findarticles.com">[http://findarticles.com/p/articles/mi_m0EIN/is_2000_June_2/ai_62434142 Bookface.com Opens Books Online; Innovative Website Gives Readers Direct Access to Books; www.bookface.com to Launch With Involvement of Major Publishers], June 2, 2000. Accessed January 27, 2008</ref>
There are thousands of files; this is example data.
I'd like to extract the text between the <ref></ref> pairs. Note that some of the ref pairs start with <ref name="findarticles.com"> where the name= portion could be just about anything and ends in ">". Or there may be no name= at all and start with <ref>. They always end in </ref>. Also the text between the ref pairs may contain other < and > characters (though no nested <ref></ref> pairs). Finally, file.txt will be accessed as a string via readfile(), not via getline.
This is what I have so far (a code fragment from a longer awk script that does other, unrelated things, i.e. the readfile method is needed for other reasons):
This works, except when the text between the ref pairs contains "<" or ">", as in the first ref pair in the above data ("<Accessed January 27>").
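One possible approach, sketched under assumptions rather than taken from the original script: treat the whole file as one string, locate each opening tag with match(), and then use index() to find the first literal "</ref>" after it. Because the closing tag is searched as a fixed string, any "<" or ">" inside the body is simply copied through. The helper name extract_refs is hypothetical, and the sketch assumes there are no self-closing tags such as <ref name="x" />.

```shell
# extract_refs (hypothetical name): print the body of every
# <ref ...>...</ref> pair found on stdin, one body per print.
extract_refs() {
  awk '
    { text = text $0 "\n" }                    # slurp all input into one string
    END {
      # An opening tag is "<ref>" or "<ref " followed by anything up to ">".
      while (match(text, /<ref( [^>]*)?>/)) {
        text = substr(text, RSTART + RLENGTH)  # drop everything through the opening tag
        pos = index(text, "</ref>")            # first closing tag; refs do not nest
        if (pos == 0) break                    # unterminated ref: stop
        print substr(text, 1, pos - 1)         # the body, "<" and ">" included
        text = substr(text, pos + 6)           # continue after "</ref>"
      }
    }'
}
```

Usage would be `extract_refs < file.txt`. Inside a longer script, the same while loop should work on the string returned by readfile() in place of the slurped text.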
Hi!
I want to make a program that will generate code like this:
{{Navedi XYZ
|avtor=XYZ1
|naslov=XYZ2
|leto_izzida=XYZ3
|zalozba=XYZ4
|kraj=XYZ5
|isbn=XYZ6
|cobiss_id=XYZ7
}}
from input like this:
<b> ODGOVORNOST............. : <a... (5 Replies)
I have a regexp that I wish to match against every line of a file using awk.
But I do not want to substitute it or select the line.
I want to pull the matched text out and put it in a different file, line by line.
What is the correct awk usage to *extract* a regexp and put it in another... (11 Replies)
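A minimal sketch of one common way to do this, assuming only the first match on each line is wanted: match() sets RSTART and RLENGTH, and substr() then pulls out exactly the matched text. The helper name extract_matches and the file names in the usage line are made up for illustration.

```shell
# extract_matches REGEX (hypothetical name): for every input line that
# matches REGEX, print only the matched text, not the whole line.
extract_matches() {
  awk -v re="$1" 'match($0, re) { print substr($0, RSTART, RLENGTH) }'
}
```

For example, `extract_matches '[0-9]+' < input.txt > matches.txt` would write the first run of digits from each matching line to a second file. Note that awk processes backslash escapes in -v assignments, so a regex containing backslashes may need them doubled.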
Hello,
I am trying to convert a for statement into a single awk script, and I've got everything but one part.
I also need to execute an external script when "not found". How can I do that?
for TXT in `find debugme -name "*.txt"` ;do
FPATH=`echo $TXT | sed 's/\(.*\)\/\(.*\)/\1/'`
how... (7 Replies)
hi everyone
suppose my input file is
ABC-12345
ABCD-12345
BCD-123456
I want to search a file for a specific pattern, which looks like
-
so I used this command:
cat $file | awk ' { if ($0 ~ /-/) { print } }'
so it gives me the result as
ABCD-12345
BCD-12345
BCD-12345
... (31 Replies)
Hi, can you suggest something in this regard?
The sample.txt contains the data
name lines type
sam 12 txt
sam 24 xls
sam 36 pdf
ram 32 txt
ram 45 sxls
ram 58 word
sam 92 jpeg
sam 21 gif
sam 22 ltf
from the data i need to sum all line... (5 Replies)
Hi all,
Can someone tell me what's the (g)awk equal of this simple regex to find ip addresses in urls:
egrep "^http://{1,3}\.{1,3}\.{1,3}\.{1,3}(:{1,5})?/"
Input:
http://10.0.0.1/query.exe
http://11y10x09w:80/howaboutme
http://192.168.100.190:1234/takeme.gpg
Output:... (8 Replies)
Hi all the experts out there,
I am totally new to Perl and I was given an assignment to use Perl to find the 2nd element of each line in each curly bracket, which is made up of 5 elements.
The expected result should look like this:
Type: VCC Pin_name: AK32,AL32,AH21,.....
Type: NC Pin_name:... (2 Replies)
Experts and Informed folks,
Need some help here in parsing the log file.
1389675 Opera_ShirtCatalog INSERT INTO Opera_ShirtCatalog(COL1, COL2) VALUES (1, 'TEST1'), (2,'TEST2');
1389685 Opera_ShirtCatlog_Wom INSERT INTO Opera_ShirtCatlog_Wom(col1, col2, col3) VALUES (9,'Siz12, FormFit',... (12 Replies)
Hello I have a file like :
20120918000001413 | 1.17.163.89 | iSelfcare | MSISDN | N
20120918000001806 | 1.33.27.100 | iSelfcare | 5564 | N
....
I want to extract all lines whose 4th field (considering "|" the separator) contains something other than just digits. I want to do this using a... (5 Replies)
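A hedged sketch of how this could look in awk, assuming awk is acceptable here and that the blanks around each field should be ignored: split on "|", trim the 4th field, and print the line when that field is not made up entirely of digits. The helper name non_numeric_field4 is hypothetical.

```shell
# non_numeric_field4 (hypothetical name): print lines whose 4th
# "|"-separated field is anything other than digits only.
non_numeric_field4() {
  awk -F'|' '{
    f = $4
    gsub(/^[ \t]+|[ \t]+$/, "", f)   # trim blanks around the field
    if (f !~ /^[0-9]+$/) print       # an empty field also counts as non-numeric
  }'
}
```

With the two sample lines above, only the line containing MSISDN would be printed.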
Hello to all,
I have:
X="string 1-"
Y="-string 2"
Z="string 1-20-string 2"
The position of the number 20 could hold different numbers, but I'm interested only when the number is 15, 20, 45 or 70.
I want to include an IF within an awk code with a regex in the following way.
... (12 Replies)
Discussion started by: Ophiuchus
12 Replies
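One way the IF could be written, as a sketch only: build a dynamic regex from X, an alternation of the four numbers, and Y, anchored at both ends. This assumes X and Y contain no regex metacharacters (true for the sample values, since "-" is literal outside brackets). The helper name check_number and the "match"/"skip" labels are made up for illustration.

```shell
# check_number X Y (hypothetical name): label each input line by whether
# it is X, then one of 15, 20, 45 or 70, then Y.
check_number() {
  awk -v x="$1" -v y="$2" '{
    if ($0 ~ ("^" x "(15|20|45|70)" y "$"))
      print "match: " $0
    else
      print "skip: " $0
  }'
}
```

For example, `printf 'string 1-20-string 2\n' | check_number 'string 1-' '-string 2'` would print the line with the "match: " prefix.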
LEARN ABOUT OPENSOLARIS
tag
tag(3tcl)                                                            tag(3tcl)
NAME
tag - Manipulate tagged files
SYNOPSIS
tag option ?arg arg ...?
DESCRIPTION
The tag procedure provides a number of options for manipulating tagged files.
COMMANDS
tag readfile filename
Reads the file with the given filename and returns a list where each list element is a tag record, which is represented by a list of
label-value pairs, or label-value-endlabel triples.
The tag header is the first element returned.
tag writefile filename list
Takes a list in the format used internally in tcl programs for tagged data and writes it as a tagged file.
tag extract list tests
Takes a list in tagged format, and a list of conditions, and returns a new list in tagged format which contains those tag records
which match the conditions.
The tests is a list of test items, each of which is a list of the form { labelname condition matchvalue }
The conditions are
== String equals
!= String not equal
<= Less than or equal
-in Is the test value a member of the list given as the matchvalue
-contains
Does the match value contain the test value as a case insensitive substring.
-earlier
Date earlier
-later Date later - dates are in ISO format (yyyy-mm-dd [hh:mm:ss]).
-exists
Does the label exist in this record.
BUGS
tag readfile reads the whole file into memory before turning it into a list. Should be more memory efficient.
The -earlier and -later comparisons require TCL8.3
AUTHOR
John Lines (john@paladin.demon.co.uk)
July 3, 2000 tag(3tcl)