Parsing movie showtimes with sed


 
# 1  
Old 05-13-2011

I am trying to get sed to strip all the HTML and make the output readable by a TTS engine, but the output so far still sounds bad when the TTS engine reads it.

The URL for the raw data is https://www.muvico.com/asp/performances_ts.asp?theater_id=220&film_ID=0&querytype=1&seed=0000000&forprint=1


Code:
  wget --no-check-certificate "https://www.muvico.com/asp/performances_ts.asp?theater_id=220&film_ID=0&querytype=1&seed=0000000&forprint=1" -O c:\movietemp.txt
  cd c:\sed\bin
  more c:\movietemp.txt | sed "s/<[^>]*>//g" > c:\movietemp2.txt
  more c:\movietemp2.txt | sed "s/&nbsp//g" > c:\movietemp3.txt
  more c:\movietemp3.txt | sed "s/;//g" > c:\movietemp4.txt
  more c:\movietemp4.txt | sed "s/()//g" > c:\movies.txt
  cd c:\jampal
  ptts -voice "Microsoft Mary" -u c:\movies.txt

Here is a sample of the output.


Quote:
All Movies for Parisian 20 and IMAXFriday - 5/13/2011PARISIAN 20 AND IMAX3D - PRIEST(PG13)1h 43m
11:05AM1:20PM3:40PM5:55PM8:20PM10:40PM12:50AM3D - THOR(PG13)2h 10m
I would like to know how to add spaces between the times and AM and PM, so that instead of 11:05AM1:20PM it reads 11:05 AM 1:20 PM.

I am pretty new to sed, sorry about the messy bat file.
# 2  
Old 05-13-2011
Code:
sed '/AM/ AM /g; 's/PM/ PM /g' file

# 3  
Old 05-13-2011
You got a typo there, ahamed. Correct:
Code:
sed 's/AM/ AM /g;s/PM/ PM /g' file

But that is very crude: it will turn 'LAMA' into 'L AM A', for example.

You can do better by requiring a digit before the A or P and another digit after the M.
Code:
sed 's/\([0-9]\)\([AP]\)M\([0-9]\)/\1 \2M \3/g'

The second capture is the A or P, which lets one command handle both AM and PM instead of two.
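To see the difference between the two versions, here is a small sketch. It assumes a POSIX shell (hence the single quotes), and the variable names are made up for illustration.

```shell
#!/bin/sh
# Sample strings: a run of showtimes and a word that merely contains "AM".
times='11:05AM1:20PM3:40PM'
word='LAMA'

# Crude version: pads every AM/PM, even inside ordinary words.
crude_word=$(printf '%s\n' "$word" | sed 's/AM/ AM /g;s/PM/ PM /g')

# Anchored version: only touches AM/PM that sits between two digits.
anchored='s/\([0-9]\)\([AP]\)M\([0-9]\)/\1 \2M \3/g'
safe_word=$(printf '%s\n' "$word" | sed "$anchored")
safe_times=$(printf '%s\n' "$times" | sed "$anchored")

# Brackets make leading/trailing spaces visible.
printf '[%s]\n' "$crude_word" "$safe_word" "$safe_times"
```

The anchored version leaves LAMA alone. Note that the last time in the run (3:40PM) is also left alone, because no digit follows it.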

Once you're sure what all those sed commands do, you can chain them together with pipes rather than saving to a file and then reading that file back in. Much more efficient would be:

Code:
more c:\movietemp.txt | sed "s/<[^>]*>//g" | sed "s/&nbsp//g"  | sed "s/;//g" | etc...

more is actually not needed either; sed can read an input file given as an argument. Also, all those commands can be part of a single sed call, so each line is read just once. Like this:
Code:
sed 's/<[^>]*>//g; s/&nbsp//g; s/;//g; s/()//g; s/\([0-9]\)\([AP]\)M\([0-9]\)/\1 \2M \3/g' c:\movietemp.txt


Last edited by mirni; 05-13-2011 at 04:37 AM..
# 4  
Old 05-13-2011
Quote:
Originally Posted by ahamed101
Code:
sed '/AM/ AM /g; 's/PM/ PM /g' file

I could not get it to work the way you posted it, but with a few small modifications it now works. For some reason the single quotes don't work in Windows. Thanks for the help.



Revised code
Code:
 sed "s/AM/ AM /g; s/PM/ PM /g"

---------- Post updated at 04:47 AM ---------- Previous update was at 03:26 AM ----------

Wow, thanks for all the tips. I had no idea sed could do so much in one line, albeit a really long one. ;-)
So far I have:

Code:
 
cd c:\wget\bin
wget --no-check-certificate "https://www.muvico.com/asp/performances_ts.asp?theater_id=220&film_ID=0&querytype=1&seed=8675309&forprint=1" -O c:\movietemp.txt
cd c:\sed\bin
sed "s/<[^>]*>//g; s/&nbsp//g; s/;//g; s/()//g; s/\([0-9]\)\([AP]\)M\([0-9]\)/\1 \2M \3/g; s/[()]//g" c:\movietemp.txt > c:\movies.txt
cd c:\jampal
ptts -voice "Microsoft Mary" -u c:\movies.txt

I'm also trying to add a space inside PM, so it becomes P M, because of the way the TTS pronounces it. I'd also like to prefix the rating with the word "Rated" while removing the parentheses, so (PG13) would become Rated PG13 with spaces around it. And last but not least, I'd like to put the phrase "running time" before the running times and change, for example, 2h 45m to 2 hours and 45 minutes.

---------- Post updated at 05:15 AM ---------- Previous update was at 04:48 AM ----------

I have also noticed that some of the words still blend into each other, for example "10:20PMPRIEST" and "12:00AMBRIDESMAIDS" (the titles were in caps). These should read 10:20 PM PRIEST, or better yet Priest, but that's even more complex.

Last edited by Shellshock; 05-13-2011 at 04:36 AM..
# 5  
Old 05-13-2011
Quote:
I have also noticed that some of the words still blend into each other for example "10:20PMPRIEST" and "12:00AMBRIDESMAIDS" they were in caps
They sure should; the command:
Code:
s/\([0-9]\)\([AP]\)M\([0-9]\)/\1 \2M \3/g

Is inserting spaces only when it comes across <digit>AM<digit> or <digit>PM<digit> (as your sample input happens to be of that form). There is no trailing digit in '10:20PMPRIEST', so it doesn't match.

Let's try to do better:
Code:
sed 's/\([0-9][0-9]*:[0-9][0-9]*\)\([PA]\)M/\1 \2M /g'

which captures at least one digit ('[0-9][0-9]*'), followed by a colon and again at least one digit, followed by AM or PM. No trailing digit is required any more.
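A quick check of that pattern against both kinds of input, as a sketch only. It assumes a POSIX shell, and the variable names are just for illustration.

```shell
#!/bin/sh
# Pattern from the post: <digits>:<digits> followed by AM or PM.
fix_times='s/\([0-9][0-9]*:[0-9][0-9]*\)\([PA]\)M/\1 \2M /g'

# A time glued to a title, and a time glued to another time.
glued=$(printf '%s\n' '10:20PMPRIEST' | sed "$fix_times")
run=$(printf '%s\n' '11:05AM1:20PM' | sed "$fix_times")

# Brackets make the trailing space visible.
printf '[%s]\n' "$glued" "$run"
```

Both cases now get their spaces. The replacement always appends a space after AM/PM, so a line that ends in a time will end in a space, which should not matter for a TTS engine.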

Quote:
Wow thanks for all the tips, I had no idea sed could do so much in one line, albeit a really long one. ;-)
It sure can, but it doesn't have to be on one line. It's actually better to put one command per line, so that it's more readable and comments can be inserted:
Code:
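sed '
  s/<[^>]*>//g; #remove tags
  s/&nbsp//g;   #remove non-breaking space entities (leaves the ";" behind)
  s/;//g;       #remove the leftover semicolons
  s/()//g;      #remove empty parentheses
  s/\([0-9][0-9]*:[0-9][0-9]*\)\([PA]\)M/\1 \2M /g #add spaces before and after AM,PM
  s/\([0-9]*\)h \([0-9][0-9]*\)m/ running time \1 hours and \2 minutes/g
' c:\movietemp.txt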
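If the shell makes a multi-line quoted script awkward, another option is to keep the commands in a file and pass it with sed's -f option. The sketch below assumes a POSIX shell with mktemp available; the temporary script file and the sample line are made up for illustration.

```shell
#!/bin/sh
# Write the sed commands to a script file, one per line, with comments.
script=$(mktemp)
cat > "$script" <<'EOF'
# remove tags
s/<[^>]*>//g
# remove non-breaking space entities, then the leftover semicolons
s/&nbsp//g
s/;//g
# remove empty parentheses
s/()//g
# add spaces before and after AM,PM
s/\([0-9][0-9]*:[0-9][0-9]*\)\([PA]\)M/\1 \2M /g
# spell out the running time
s/\([0-9]*\)h \([0-9][0-9]*\)m/ running time \1 hours and \2 minutes/g
EOF

# Hypothetical input line, standing in for a line of the downloaded page.
sample='<td>THOR</td>&nbsp;2h 10m'
out=$(printf '%s\n' "$sample" | sed -f "$script")
printf '[%s]\n' "$out"
rm -f "$script"
```

With the commands in a file there is nothing left to quote on the command line, so the call reduces to sed -f followed by the script file and the input file.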

# 6  
Old 05-13-2011
Thanks for all the help :-). For some reason the Windows CLI does not like single quotes or multiple lines, so I ended up with a one-liner. The "final" code is below.

Code:
cd c:\wget\bin
wget --no-check-certificate "https://www.muvico.com/asp/performances_ts.asp?theater_id=220&film_ID=0&querytype=1&seed=8675309&forprint=1" -O c:\movietemp.txt
cd c:\sed\bin
sed "s/<[^>]*>//g; s/&nbsp//g; s/;//g; s/\([0-9][0-9]*:[0-9][0-9]*\)\([PA]\)M/\1 \2M /g; s/\([0-9]*\)h \([0-9][0-9]*\)m/ running time \1 hours and \2 minutes playing at /g; s/[()]/ /g; " c:\movietemp.txt > c:\movies.txt
cd c:\jampal
ptts -voice "Microsoft Mary" -u c:\movies.txt

I also added semicolons after lines 6 and 7.

---------- Post updated at 03:56 PM ---------- Previous update was at 03:28 PM ----------

I added some spaces and periods so the TTS would not read it as one really long run-on sentence.

Code:
sed "s/<[^>]*>//g; s/&nbsp//g; s/;//g; s/\([0-9][0-9]*:[0-9][0-9]*\)\([PA]\)M/\1 \2 M. /g; s/\([0-9]*\)h \([0-9][0-9]*\)m/. running time. \1 hours and \2 minutes. playing at. /g; s/[)]/ /g; s/[(]/. Rated. /g; s/PG/P G /g; " c:\movietemp.txt > c:\movies.txt
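A quick way to check what the TTS engine will actually receive is to run the same expression over a couple of lines modeled on the sample output in the first post. This is only a sketch: it assumes a POSIX shell, so the expression is single-quoted here (on Windows it would stay double-quoted as above), and the variable names are made up for illustration.

```shell
#!/bin/sh
# The final expression from the post, unchanged apart from the quoting.
final='s/<[^>]*>//g; s/&nbsp//g; s/;//g; s/\([0-9][0-9]*:[0-9][0-9]*\)\([PA]\)M/\1 \2 M. /g; s/\([0-9]*\)h \([0-9][0-9]*\)m/. running time. \1 hours and \2 minutes. playing at. /g; s/[)]/ /g; s/[(]/. Rated. /g; s/PG/P G /g; '

# One title line and one line of showtimes, with the HTML already gone.
title_out=$(printf '%s\n' '3D - THOR(PG13)2h 10m' | sed "$final")
times_out=$(printf '%s\n' '11:05AM1:20PM' | sed "$final")

# Brackets make the trailing spaces visible.
printf '[%s]\n' "$title_out" "$times_out"
```

On these inputs the rating comes out as "Rated. P G 13" and the times as "11:05 A M. 1:20 P M.", so each piece ends in a period and the TTS has somewhere to pause.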
