regular expression grouping across multiple lines


 
# 1  
Old 09-26-2012

Code:
cat book.txt

book1 price 23
      sku   1234
      auth  Bill
book2 sku   1233
      price 22
      auth  John
book3 auth  Frank
      price 24
book4 price 25
      sku   129
      auth  Tod

import re
f = open('book.txt', 'r')
text = f.read()
f.close()
m = re.findall(r'(\w{5})\sprice\s(\d+)', text)
m


[('book1', '23'), ('book4', '25')]

desired output:
[('book1', '23'), ('book2', '22'), ('book3', '24'), ('book4', '25')]

I just started learning regular expressions (RE). Is RE the proper tool for this type of extraction?

1. Each index (book) has a fixed length, so there is no need to worry about names like book1022.
2. Each book could have just one attribute or several.

Thanks!!

---------- Post updated at 07:01 PM ---------- Previous update was at 06:47 PM ----------

Getting closer.

Code:
>>> m = re.findall(r'(book\d)\sprice\s(\d+)|(book\d).+\n+.+price\s(\d+)', text)
>>> m
[('book1', '23', '', ''), ('', '', 'book2', '22'), ('', '', 'book3', '24'), ('book4', '25', '', '')]

---------- Post updated at 07:50 PM ---------- Previous update was at 07:01 PM ----------

Never mind...
It only semi-worked: it finds the price only when it is on the 1st or 2nd line of a book.

It fails in this case:

Code:
cat book.txt

book1 price 23
      sku   1234
      auth  Bill
book2 sku   1233
      price 22
      auth  John
book3 auth  Frank
      price 24
book4 price 25
      sku   129
      auth  Tod
book5 auth Joe
      sku   129
      price 13

---------- Post updated at 07:52 PM ---------- Previous update was at 07:50 PM ----------

book5 is missing from the result:
Code:

[('book1', '23', '', ''), ('', '', 'book2', '22'), ('', '', 'book3', '24'), ('book4', '25', '', '')]
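
The reason book5 drops out can be checked in isolation. This is only a sketch, assuming the five-book book.txt shown above is pasted in as a string: without re.DOTALL, `.` never matches a newline, so the `.+\n+.+price` alternative can only reach a price on the line directly after the book line.

```python
import re

# Sample data copied from the five-book book.txt in the post above.
text = """book1 price 23
      sku   1234
      auth  Bill
book2 sku   1233
      price 22
      auth  John
book3 auth  Frank
      price 24
book4 price 25
      sku   129
      auth  Tod
book5 auth Joe
      sku   129
      price 13
"""

# Only the multi-line alternative from the earlier attempt.
# Each '.+' stays on a single line, so the match spans exactly two lines.
two_line = re.findall(r'(book\d).+\n+.+price\s(\d+)', text)
print(two_line)  # [('book2', '22'), ('book3', '24')]
```

book5 has its price on the third line, which neither alternative of the combined pattern can reach.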

---------- Post updated at 08:03 PM ---------- Previous update was at 07:52 PM ----------

Yea!! DOTALL and non-greedy matching seem to be working OK.

Code:
>>> m = re.findall(r'(book\d)\sprice\s(\d+)|(book\d).+?price\s(\d+)', text, re.DOTALL)
>>> m
[('book1', '23', '', ''), ('', '', 'book2', '22'), ('', '', 'book3', '24'), ('book4', '25', '', ''), ('', '', 'book5', '13')]

# 2  
Old 09-26-2012
This is a simple script that I believe will do what you want. Try it out:
Code:
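$ cat book.sh
#!/bin/bash
# Build a Python-style list of (book, price) pairs from input_file.
line_out="["
while read -r p1 p2 p3
do
  if [[ $p1 == book* && $p2 == price ]]; then
    # book and price on the same line: emit the whole tuple
    line_out="$line_out('$p1', '$p3'), "
  elif [[ $p1 == book* ]]; then
    # book line without a price: open the tuple
    line_out="$line_out('$p1', "
  elif [[ $p1 == price ]]; then
    # continuation line carrying the price: close the tuple
    line_out="$line_out'$p2'), "
  fi
done <input_file
# Strip the trailing ", " and close the list.
line_out="${line_out%, }]"
echo "$line_out"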

$ cat input_file
book1 price 23
      sku   1234
      auth  Bill
book2 sku   1233
      price 22
      auth  John
book3 auth  Frank
      price 24
book4 price 25
      sku   129
      auth  Tod

$ book.sh
[('book1', '23'), ('book2', '22'), ('book3', '24'), ('book4', '25')]
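
For comparison, the same line-by-line idea could be sketched in Python without any regular expression. The function name `parse_prices` and the inline sample are assumptions for illustration, not something from the posts above.

```python
def parse_prices(lines):
    """Return (book, price) pairs, remembering the current book across lines."""
    pairs = []
    current = None
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if fields[0].startswith('book'):
            # A new record starts; the rest of the line is its first attribute.
            current = fields[0]
            fields = fields[1:]
        if len(fields) >= 2 and fields[0] == 'price' and current is not None:
            pairs.append((current, fields[1]))
    return pairs


# Sample data copied from input_file above.
sample = """book1 price 23
      sku   1234
      auth  Bill
book2 sku   1233
      price 22
      auth  John
book3 auth  Frank
      price 24
book4 price 25
      sku   129
      auth  Tod
"""

result = parse_prices(sample.splitlines())
print(result)
```

A book with no price line simply produces no pair here, which may or may not be what you want.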

# 3  
Old 09-26-2012
duh..

Code:
>>> f = open('book.txt', 'r')
>>> text = f.read()
>>> f.close()
>>> m = re.findall(r'(book\d).+?price\s(\d+)', text, re.DOTALL)
>>> m
[('book1', '23'), ('book2', '22'), ('book3', '24'), ('book4', '25'), ('book5', '13')]
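
One caveat with this pattern: with DOTALL, `.+?` will happily run past the start of the next book, so a book that has no price line gets paired with the next book's price. Below is a sketch of a stricter variant. The sample data is a hypothetical variation where book2 has no price, and the lookahead-guarded pattern is one possible fix, not something proposed earlier in this thread.

```python
import re

# Hypothetical variation of the sample data: book2 has no price line.
text = """book1 price 23
      sku   1234
book2 sku   1233
      auth  John
book3 auth  Frank
      price 24
"""

# The pattern from the post above: book2 is paired with book3's price,
# and book3 itself is consumed by that match.
naive = re.findall(r'(book\d).+?price\s(\d+)', text, re.DOTALL)

# Stricter sketch: (?:(?!book\d).) consumes one character at a time,
# but refuses to step onto the start of the next book record.
strict = re.findall(r'(book\d)(?:(?!book\d).)*?price\s(\d+)', text, re.DOTALL)

print(naive)   # [('book1', '23'), ('book2', '24')]
print(strict)  # [('book1', '23'), ('book3', '24')]
```

If every book is guaranteed to have a price, as in the sample file, the simpler pattern is enough.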
