Full Discussion: awk or regex
Post 302311800 by smihael on Wednesday 29th of April 2009 03:01:45 PM

Hi!

I want to make a program that will generate code like this:
{{Navedi XYZ
|avtor=XYZ1
|naslov=XYZ2
|leto_izzida=XYZ3
|zalozba=XYZ4
|kraj=XYZ5
|isbn=XYZ6
|cobiss_id=XYZ7
}}

from input like this:
<b> ODGOVORNOST............. : <a href="http://cobiss2.izum.si/scripts/cobiss?ukaz=FFRM&mode=5&id=2047435453134563&PF1=AU&PF2=TI&PF3=PY&PF4=KW&CS=a&PF5=CB&run=yes&SS1=&quot;Tauber,%20Daniel%20A.&quot;">Tauber, Daniel A.</a> - zbiratelj</b>
<b> NASLOV.................. : #The #complete Linux kit</b>
<b> IMPRESUM................ : San Francisco [etc.] : Sybex, 1995</b>
<b> FIZIČNI OPIS............ : XXIII, 419 str. ; 23 cm + CD-ROM</b>
<b> ISBN.................... : 0-7821-1669-8</b>
<b> PREDMETNE OZNAKE........ : <a href="http://cobiss2.izum.si/scripts/cobiss?ukaz=FFRM&mode=5&id=2047435453134563&PF1=AU&PF2=TI&PF3=PY&PF4=KW&CS=a&PF5=SU&run=yes&SS5=&quot;racunalnistvo&quot;">računalništvo</a> // <a href="http://cobiss2.izum.si/scripts/cobiss?ukaz=FFRM&mode=5&id=2047435453134563&PF1=AU&PF2=TI&PF3=PY&PF4=KW&CS=a&PF5=SU&run=yes&SS5=&quot;operacijski%20sistemi&quot;">operacijski sistemi</a></b>
<b>// <a href="http://cobiss2.izum.si/scripts/cobiss?ukaz=FFRM&mode=5&id=2047435453134563&PF1=AU&PF2=TI&PF3=PY&PF4=KW&CS=a&PF5=SU&run=yes&SS5=&quot;linux&quot;">linux</a> // <a href="http://cobiss2.izum.si/scripts/cobiss?ukaz=FFRM&mode=5&id=2047435453134563&PF1=AU&PF2=TI&PF3=PY&PF4=KW&CS=a&PF5=SU&run=yes&SS5=&quot;unix&quot;">unix</a> // <a href="http://cobiss2.izum.si/scripts/cobiss?ukaz=FFRM&mode=5&id=2047435453134563&PF1=AU&PF2=TI&PF3=PY&PF4=KW&CS=a&PF5=SU&run=yes&SS5=&quot;programska%20oprema&quot;">programska oprema</a> // <a href="http://cobiss2.izum.si/scripts/cobiss?ukaz=FFRM&mode=5&id=2047435453134563&PF1=AU&PF2=TI&PF3=PY&PF4=KW&CS=a&PF5=SU&run=yes&SS5=&quot;Internet&quot;">Internet</a> //</b>
<b><a href="http://cobiss2.izum.si/scripts/cobiss?ukaz=FFRM&mode=5&id=2047435453134563&PF1=AU&PF2=TI&PF3=PY&PF4=KW&CS=a&PF5=SU&run=yes&SS5=&quot;komercialni%20sistemi&quot;">komercialni sistemi</a></b>
<b> UDK..................... : 681.3.06, 519.68</b>
<b> UDK ZA STATISTIKO....... : 62+66/69</b>
<b> VRSTA GRADIVA........... : monografska publikacija, tekstovno gradivo,</b>
<b>tiskano</b>


<b> COBISS.SI-ID............ : 2952</b>
In this example the code would be:
{{Navedi CD-ROM
|avtor= Daniel A. Tauber
|naslov=The complete Linux kit
|kraj=San Francisco
|zalozba=Sybex
|leto=1995
|cobiss_id=2952
|isbn=0-7821-1669-8
}}

This is needed on the Slovenian Wikisource: some users gave only a link to the record page in the national bibliographic system (COBISS, COBISS/OPAC), but we need to cite all of these details.
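
One possible approach, as a rough sketch rather than a finished tool: strip the HTML tags, split each line at the dotted "LABEL..... : value" separator, and pick out the fields by label. The function name cobiss_to_template and its argument (the word after "Navedi") are made up for illustration, the IMPRESUM layout "Place [etc.] : Publisher, Year" is assumed from the single sample record, and the output field names follow the worked example (leto, not leto_izzida).

```sh
#!/bin/sh
# Sketch: turn COBISS record lines (as in the post) into a {{Navedi ...}} template.
# cobiss_to_template and its "kind" argument are hypothetical names.
cobiss_to_template() {
    awk -v kind="${1:-XYZ}" '
    {
        gsub(/<[^>]*>/, "")               # drop HTML tags such as <b> and <a href=...>
        if (!match($0, /\.+ : /)) next    # skip lines without a dotted label
        label = substr($0, 1, RSTART - 1)
        value = substr($0, RSTART + RLENGTH)
        sub(/^[ \t]+/, "", label)
        sub(/[ \t]+$/, "", value)
    }
    label == "ODGOVORNOST" {
        sub(/ - .*$/, "", value)          # drop the role, e.g. " - zbiratelj"
        n = split(value, p, /, */)        # "Surname, Given" into two parts
        author = (n == 2) ? p[2] " " p[1] : value
    }
    label == "NASLOV" { gsub("#", "", value); title = value }
    label == "IMPRESUM" {
        # Assumed layout: Place [etc.] : Publisher, Year
        if (match(value, / : /)) {
            place = substr(value, 1, RSTART - 1)
            rest = substr(value, RSTART + RLENGTH)
            sub(/ *\[.*\]$/, "", place)
            if (match(rest, /, [0-9][0-9][0-9][0-9]$/)) {
                year = substr(rest, RSTART + 2)
                publisher = substr(rest, 1, RSTART - 1)
            } else publisher = rest
        }
    }
    label == "ISBN" { isbn = value }
    label == "COBISS.SI-ID" { id = value }
    END {
        print "{{Navedi " kind
        print "|avtor=" author
        print "|naslov=" title
        print "|kraj=" place
        print "|zalozba=" publisher
        print "|leto=" year
        print "|cobiss_id=" id
        print "|isbn=" isbn
        print "}}"
    }'
}
```

With the record saved to a file (record.html is a placeholder name), it could be called as: cobiss_to_template CD-ROM < record.html. Records with several authors, or with a differently punctuated IMPRESUM line, would need extra handling.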
 

regex(1F)                         FMLI Commands                         regex(1F)

NAME
       regex - match patterns against a string

SYNOPSIS
       regex [-e] [-v "string"] [pattern template] ... pattern [template]

DESCRIPTION
       The regex command takes a string from the standard input, and a list of pattern / template pairs, and runs regex() to compare the string against each pattern until there is a match. When a match occurs, regex writes the corresponding template to the standard output and returns TRUE. The last (or only) pattern does not need a template. If that is the pattern that matches the string, the function simply returns TRUE. If no match is found, regex returns FALSE.

       The argument pattern is a regular expression of the form described in regex(). In most cases, pattern should be enclosed in single quotes to turn off special meanings of characters. Note that only the final pattern in the list may lack a template.

       The argument template may contain the strings $m0 through $m9, which will be expanded to the part of pattern enclosed in ( ... )$0 through ( ... )$9 constructs (see examples below). Note that if you use this feature, you must be sure to enclose template in single quotes so that FMLI does not expand $m0 through $m9 at parse time. This feature gives regex much of the power of cut(1), paste(1), and grep(1), and some of the capabilities of sed(1). If there is no template, the default is $m0$m1$m2$m3$m4$m5$m6$m7$m8$m9.

OPTIONS
       The following options are supported:

       -e             Evaluates the corresponding template and writes the result to the standard output.

       -v "string"    Uses string instead of the standard input to match against patterns.

EXAMPLES
       Example 1 Cutting letters out of a string

       To cut the 4th through 8th letters out of a string (this example will output strin and return TRUE):

         `regex -v "my string is nice" '^.{3}(.{5})$0' '$m0'`

       Example 2 Validating input in a form

       In a form, to validate input to field 5 as an integer:

         valid=`regex -v "$F5" '^[0-9]+$'`

       Example 3 Translating an environment variable in a form

       In a form, to translate an environment variable which contains one of the numbers 1, 2, 3, 4, 5 to the letters a, b, c, d, e:

         value=`regex -v "$VAR1" 1 a 2 b 3 c 4 d 5 e '.*' 'Error'`

       Note the use of the pattern '.*' to mean "anything else".

       Example 4 Using backquoted expressions

       In the example below, all three lines constitute a single backquoted expression. This expression, by itself, could be put in a menu definition file. Since backquoted expressions are expanded as they are parsed, and output from a backquoted expression (the cat command, in this example) becomes part of the definition file being parsed, this expression would read /etc/passwd and make a dynamic menu of all the login ids on the system.

         `cat /etc/passwd | regex '^([^:]*)$0.*$' '
         name=$m0
         action=`message "$m0 is a user"`'`

DIAGNOSTICS
       If none of the patterns match, regex returns FALSE, otherwise TRUE.

NOTES
       Patterns and templates must often be enclosed in single quotes to turn off the special meanings of characters. Especially if you use the $m0 through $m9 variables in the template, since FMLI will expand the variables (usually to "") before regex even sees them.

       Single characters in character classes (inside []) must be listed before character ranges, otherwise they will not be recognized. For example, [a-zA-Z_/] will not find underscores (_) or slashes (/), but [_/a-zA-Z] will.

       The regular expressions accepted by regcmp differ slightly from other utilities (that is, sed, grep, awk, ed, and so forth).

       regex with the -e option forces subsequent commands to be ignored. In other words, if a backquoted statement appears as follows:

         `regex -e ...; command1; command2`

       command1 and command2 would never be executed. However, dividing the expression into two:

         `regex -e ...``command1; command2`

       would yield the desired result.

ATTRIBUTES
       See attributes(5) for descriptions of the following attributes:

       +-----------------------------+-----------------------------+
       |       ATTRIBUTE TYPE        |       ATTRIBUTE VALUE       |
       +-----------------------------+-----------------------------+
       |Availability                 |SUNWcsu                      |
       +-----------------------------+-----------------------------+

SEE ALSO
       awk(1), cut(1), grep(1), paste(1), sed(1), regcmp(3C), attributes(5)

SunOS 5.11                         12 Jul 1999                          regex(1F)