The UNIX and Linux Forums  

Go Back   The UNIX and Linux Forums > Top Forums > Shell Programming and Scripting
.
google unix.com



Shell Programming and Scripting Post questions about KSH, CSH, SH, BASH, PERL, PHP, SED, AWK and OTHER shell scripts and shell scripting languages here.

More UNIX and Linux Forum Topics You Might Find Helpful
Thread Thread Starter Forum Replies Last Post
file name transformation vrms Shell Programming and Scripting 16 05-27-2008 09:49 AM
Color Transformation Language 1.4.1 (Default branch) iBot Software Releases - RSS News 0 03-18-2008 08:10 AM
Event Transformation Services iBot Complex Event Processing RSS News 0 08-24-2007 04:30 PM
Apply transformation logic in 2 different files HAA Shell Programming and Scripting 1 07-10-2007 05:33 AM
Transformation capital letter Dark Angel UNIX for Dummies Questions & Answers 1 01-24-2002 04:17 PM

Reply
English Japanese Spanish French German Portuguese Italian Dutch Swedish Russian Norwegian Hungarian Hebrew Danish Bulgarian Greek Powered by Powered by Google
 
LinkBack Thread Tools Search this Thread Rate Thread Display Modes
  #1 (permalink)  
Old 05-26-2009
chebarbudo's Avatar
chebarbudo chebarbudo is offline
Registered User
  
 

Join Date: Nov 2008
Location: various
Posts: 188
Question text transformation with sed or awk

Hi there,
I'm trying to extract automatically opening hours from a website.
The page displaying the schedules is
http://www.natureetdecouvertes.com/p...sp?mag_cod=xxx
with xxx going from 101 to 174
I managed to get the following output :

Code:
      le lundi de 10.30 à 19.30
      le mardi de 9.30 à 19.30
blank
      le jeudi de 9.30 à 19.30
      le vendredi de 9.30 à 19.30
      le samedi de 10.30 à 21.30
blank

There is one line per weekday (from monday to sunday)
blank is an actual blank line (not displaying anything)
How can I now get the final output:

Code:
"10:30 19:30|09:30 19:30|           |09:30 19:30|09:30 19:30|10:30 21:30|           "

Thanks for your help.
Santiago
  #2 (permalink)  
Old 05-26-2009
panyam panyam is offline Forum Advisor  
Registered User
  
 

Join Date: Sep 2008
Posts: 474

Code:
awk '{ printf("%s\t%s%s",$4,$6,"|")}' input_file.txt

  #3 (permalink)  
Old 05-26-2009
colemar colemar is offline
Registered User
  
 

Join Date: Apr 2009
Location: Trento, Italy
Posts: 116

Code:
awk -F'[^0-9]+' '{b=b (NR>1?"|":"")($2?sprintf("%02d:%02d %02d:%02d",$2,$3,$4,$5):"")}END{print b}' yourfile.txt

If you really want 11 spaces instead of a blank field, then:


Code:
awk -F'[^0-9]+' '{b=b (NR>1?"|":"")($2?sprintf("%02d:%02d %02d:%02d",$2,$3,$4,$5):sprintf("%11s",""))}END{print b}' yourfile.txt


Last edited by colemar; 05-26-2009 at 06:32 AM..
  #4 (permalink)  
Old 05-26-2009
ghostdog74 ghostdog74 is offline Forum Advisor  
Registered User
  
 

Join Date: Sep 2006
Posts: 2,557
if you have Python, an almost full alternative solution

Code:
 
#!/usr/bin/env python
import urllib2,re
pat=re.compile(""".*<span class="tdBlancBold">(.*)<div align="center">.*""",re.M|re.DOTALL)
days=['lundi','mardi','mercredi','jeudi','vendredi','samedi','dimanche']
url="http://www.natureetdecouvertes.com/pages/gener/view_FO_STORE_corgen.asp?mag_cod=%s"
for num in range(101,174):
    page=urllib2.urlopen(url % str(num))    
    data=page.read()
    if not "Impossible" in data:
        result = pat.findall(data)       
        store={}
        for i in result:
            for j in i.split("<br>"):
                j=j.strip()
                if j.startswith("le"):
                    j=j.split()
                    if j[1] in days:
                        t1,t2=j[-3],j[-1]
                        store.setdefault(j[1],[])
                        store[j[1]].extend([t1,t2])
        for DAY in days:
            try:
                print "%s |" %( ' '.join(store[DAY])),
            except: 
                print "\t\t|",
        print ""    
    else:
        print "Page not found ",url % str(num)

extract of output :

Code:
# python test.py
10.00 20.00 | 10.00 20.00 | 10.00 20.00 | 10.00 20.00 | 10.00 20.00 | 10.00 20.00 | 10.00 19.00 |
10.00 20.00 | 10.00 20.00 | 10.00 20.00 | 10.00 20.00 | 10.00 20.00 | 10.00 20.00 |             |
10.00 20.00 | 10.00 20.00 | 10.00 20.00 |               | 10.00 21.00 | 10.00 20.00 |           |
10.00 20.00 | 10.00 20.00 | 10.00 20.00 | 10.00 20.00 | 10.00 20.00 | 10.00 20.00 |             |
10.00 19.30 | 10.00 19.30 | 10.00 19.30 | 10.00 19.30 | 10.00 19.30 | 10.00 19.30 |             |
10.00 21.00 | 10.00 21.00 | 10.00 21.00 | 10.00 21.00 | 10.00 21.00 | 10.00 20.00 |             |
10.00 19.30 | 10.00 19.30 | 10.00 19.30 | 10.00 19.30 | 10.00 19.30 | 10.00 19.30 |             |
10.00 20.00 | 10.00 21.00 | 10.00 21.00 | 10.00 21.00 | 10.00 21.00 | 10.00 20.00 |             |
9.30 19.30 | 9.30 19.30 | 9.30 19.30 | 9.30 19.30 | 9.30 19.30 | 9.30 19.30 |           |
Page not found  http://www.natureetdecouvertes.com/pages/gener/view_FO_STORE_corgen.asp?mag_cod=110
10.00 20.00 | 10.00 20.00 | 10.00 20.00 | 10.00 20.00 | 10.00 20.00 | 10.00 20.00 |             |
10.00 19.30 | 10.00 19.30 | 10.00 19.30 | 10.00 19.30 | 10.00 19.30 | 10.00 19.30 |             |
10.00 20.00 | 10.00 20.00 | 10.00 20.00 | 10.00 20.00 | 10.00 20.00 | 10.00 20.00 |             |
10.00 20.00 | 10.00 20.00 | 10.00 20.00 | 10.00 20.00 | 10.00 20.00 | 10.00 20.00 |             |
9.30 19.30 | 9.30 19.30 | 9.30 19.30 | 9.30 19.30 | 9.30 19.30 | 9.30 19.30 |           |

  #5 (permalink)  
Old 07-04-2009
chebarbudo's Avatar
chebarbudo chebarbudo is offline
Registered User
  
 

Join Date: Nov 2008
Location: various
Posts: 188
Thanks everyone for your help,
It looks that all stores display their timetable using different format but I came up with the following solution that matches perfectly all needs :


Code:
wget -qO- 'http://www.natureetdecouvertes.com/pages/gener/view_FO_STORE_corgen.asp?mag_cod=118' |
sed -rn 's/\r//g; s/<br>//; /^[[:space:]]*([Ll]e )?[Ll]undi/,/([Ll]e )?[Dd]imanche/ { ; /^[[:space:]]*$/!p }' |
sed -r 's/^[^0-9]*//; s/.*clipse totale.*/           /; s/ h /h/g; s/(à|de|-) //; s/\.|h/:/g; s/^9:/09:/; s/:( |$)/:00\1/g' |
sed ':a; N; $!b a; s/\n/|/g')'

Reply

Bookmarks

Tags
awk, sed

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes Rate This Thread
Rate This Thread:

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On




All times are GMT -4. The time now is 04:19 AM.


Powered by: vBulletin, Copyright ©2000 - 2006, Jelsoft Enterprises Limited. Language Translations Powered by .
vBCredits v1.4 Copyright ©2007 - 2008, PixelFX Studios
The UNIX and Linux Forums Content Copyright ©1993-2009. All Rights Reserved.Ad Management by RedTyger

Content Relevant URLs by vBSEO 3.2.0