The UNIX and Linux Forums  
#1 chebarbudo, 05-26-2009
text transformation with sed or awk

Hi there,
I'm trying to automatically extract opening hours from a website.
The page displaying the schedules is
http://www.natureetdecouvertes.com/p...sp?mag_cod=xxx
with xxx going from 101 to 174.
I managed to get the following output:
Code:
      le lundi de 10.30 à 19.30
      le mardi de 9.30 à 19.30
blank
      le jeudi de 9.30 à 19.30
      le vendredi de 9.30 à 19.30
      le samedi de 10.30 à 21.30
blank
There is one line per weekday (from Monday to Sunday).
"blank" stands for an actual blank line (nothing is displayed).
How can I now get the following final output?
Code:
"10:30 19:30|09:30 19:30|           |09:30 19:30|09:30 19:30|10:30 21:30|           "
Thanks for your help.
Santiago
#2 panyam, 05-26-2009
Code:
awk '{ printf("%s\t%s%s",$4,$6,"|")}' input_file.txt
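To see what this command prints, it helps to spell out awk's default whitespace splitting. The Python sketch below imitates it; the helper name awk_fields_4_and_6 is invented for illustration. It suggests that the times keep their dots and are not zero-padded, and that a blank line contributes only a tab and a pipe, so further formatting is still needed to reach the requested output.

```python
def awk_fields_4_and_6(line):
    """Imitate awk '{ printf("%s\t%s%s", $4, $6, "|") }' for one input line."""
    fields = line.split()  # awk's default field splitting: runs of whitespace

    def field(n):
        # awk fields are 1-based; a missing field is the empty string
        return fields[n - 1] if n <= len(fields) else ""

    return "%s\t%s|" % (field(4), field(6))


print(repr(awk_fields_4_and_6("      le lundi de 10.30 à 19.30")))
print(repr(awk_fields_4_and_6("")))
```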
#3 colemar, 05-26-2009
Code:
awk -F'[^0-9]+' '{b=b (NR>1?"|":"")($2?sprintf("%02d:%02d %02d:%02d",$2,$3,$4,$5):"")}END{print b}' yourfile.txt
If you really want 11 spaces instead of a blank field, then:

Code:
awk -F'[^0-9]+' '{b=b (NR>1?"|":"")($2?sprintf("%02d:%02d %02d:%02d",$2,$3,$4,$5):sprintf("%11s",""))}END{print b}' yourfile.txt

Last edited by colemar; 05-26-2009 at 05:32 AM..
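For anyone who finds the one-liner dense, here is a rough Python rendering of the same idea: pull the digit runs out of each line, zero-pad the four numbers, and join the per-day fields with "|". The function name format_hours and its blank parameter are made up for this sketch. Unlike the awk version, it requires four numbers on a line rather than testing only the first one.

```python
import re


def format_hours(lines, blank=""):
    """Return one 'HH:MM HH:MM' field per input line, joined with '|'."""
    fields = []
    for line in lines:
        nums = re.findall(r"\d+", line)  # every run of digits on the line
        if len(nums) >= 4:
            h1, m1, h2, m2 = (int(n) for n in nums[:4])
            fields.append("%02d:%02d %02d:%02d" % (h1, m1, h2, m2))
        else:
            fields.append(blank)  # a day without opening hours
    return "|".join(fields)


# The sample from post #1; the empty strings are the blank lines.
sample = [
    "      le lundi de 10.30 à 19.30",
    "      le mardi de 9.30 à 19.30",
    "",
    "      le jeudi de 9.30 à 19.30",
    "      le vendredi de 9.30 à 19.30",
    "      le samedi de 10.30 à 21.30",
    "",
]
print('"%s"' % format_hours(sample, blank=" " * 11))
```

Calling it with blank="" should give the shorter form produced by the first awk command.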
#4 ghostdog74, 05-26-2009
If you have Python, here is an almost complete alternative solution:
Code:
 
#!/usr/bin/env python3
import re
import urllib.request

# Capture everything between the opening-hours span and the next centered div.
pat = re.compile(r'.*<span class="tdBlancBold">(.*)<div align="center">.*', re.M | re.DOTALL)
days = ['lundi', 'mardi', 'mercredi', 'jeudi', 'vendredi', 'samedi', 'dimanche']
url = "http://www.natureetdecouvertes.com/pages/gener/view_FO_STORE_corgen.asp?mag_cod=%s"
for num in range(101, 175):  # store codes 101 to 174 inclusive
    page = urllib.request.urlopen(url % str(num))
    # Assumption: the pages are Latin-1 encoded; change the codec if they are not.
    data = page.read().decode("latin-1")
    if "Impossible" not in data:
        result = pat.findall(data)
        store = {}  # weekday -> [opening, closing]
        for i in result:
            for j in i.split("<br>"):
                j = j.strip()
                if j.startswith("le"):
                    j = j.split()
                    if j[1] in days:
                        t1, t2 = j[-3], j[-1]
                        store.setdefault(j[1], [])
                        store[j[1]].extend([t1, t2])
        for day in days:
            if day in store:
                print("%s |" % ' '.join(store[day]), end=" ")
            else:
                print("\t\t|", end=" ")  # no hours found for this day
        print()
    else:
        print("Page not found ", url % str(num))
Extract of the output:
Code:
# python test.py
10.00 20.00 | 10.00 20.00 | 10.00 20.00 | 10.00 20.00 | 10.00 20.00 | 10.00 20.00 | 10.00 19.00 |
10.00 20.00 | 10.00 20.00 | 10.00 20.00 | 10.00 20.00 | 10.00 20.00 | 10.00 20.00 |             |
10.00 20.00 | 10.00 20.00 | 10.00 20.00 |               | 10.00 21.00 | 10.00 20.00 |           |
10.00 20.00 | 10.00 20.00 | 10.00 20.00 | 10.00 20.00 | 10.00 20.00 | 10.00 20.00 |             |
10.00 19.30 | 10.00 19.30 | 10.00 19.30 | 10.00 19.30 | 10.00 19.30 | 10.00 19.30 |             |
10.00 21.00 | 10.00 21.00 | 10.00 21.00 | 10.00 21.00 | 10.00 21.00 | 10.00 20.00 |             |
10.00 19.30 | 10.00 19.30 | 10.00 19.30 | 10.00 19.30 | 10.00 19.30 | 10.00 19.30 |             |
10.00 20.00 | 10.00 21.00 | 10.00 21.00 | 10.00 21.00 | 10.00 21.00 | 10.00 20.00 |             |
9.30 19.30 | 9.30 19.30 | 9.30 19.30 | 9.30 19.30 | 9.30 19.30 | 9.30 19.30 |           |
Page not found  http://www.natureetdecouvertes.com/pages/gener/view_FO_STORE_corgen.asp?mag_cod=110
10.00 20.00 | 10.00 20.00 | 10.00 20.00 | 10.00 20.00 | 10.00 20.00 | 10.00 20.00 |             |
10.00 19.30 | 10.00 19.30 | 10.00 19.30 | 10.00 19.30 | 10.00 19.30 | 10.00 19.30 |             |
10.00 20.00 | 10.00 20.00 | 10.00 20.00 | 10.00 20.00 | 10.00 20.00 | 10.00 20.00 |             |
10.00 20.00 | 10.00 20.00 | 10.00 20.00 | 10.00 20.00 | 10.00 20.00 | 10.00 20.00 |             |
9.30 19.30 | 9.30 19.30 | 9.30 19.30 | 9.30 19.30 | 9.30 19.30 | 9.30 19.30 |           |
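The parsing step of the script can be tried without fetching anything. Below is a minimal sketch that isolates it in a function; parse_hours and the fragment string are invented for illustration (the fragment is modelled on the sample lines in post #1, not copied from the site), and the check on the first word is slightly stricter than the startswith test in the script.

```python
DAYS = ['lundi', 'mardi', 'mercredi', 'jeudi', 'vendredi', 'samedi', 'dimanche']


def parse_hours(fragment):
    """Map each weekday found in an HTML fragment to [opening, closing]."""
    store = {}
    for piece in fragment.split("<br>"):
        words = piece.strip().split()
        # Expect something like: le <day> de <opening> à <closing>
        if len(words) >= 4 and words[0] == "le" and words[1] in DAYS:
            store.setdefault(words[1], []).extend([words[-3], words[-1]])
    return store


# Hypothetical fragment; the empty piece stands for a day with no hours.
fragment = ("le lundi de 10.30 à 19.30<br>"
            "le mardi de 9.30 à 19.30<br>"
            "<br>"
            "le jeudi de 9.30 à 19.30")
hours = parse_hours(fragment)
print(" | ".join(" ".join(hours.get(day, [])) for day in DAYS))
```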
#5 chebarbudo, 07-04-2009
Thanks everyone for your help.
It looks like the stores display their timetables in different formats, but I came up with the following solution, which covers all the cases I need:

Code:
wget -qO- 'http://www.natureetdecouvertes.com/pages/gener/view_FO_STORE_corgen.asp?mag_cod=118' |
sed -rn 's/\r//g; s/<br>//; /^[[:space:]]*([Ll]e )?[Ll]undi/,/([Ll]e )?[Dd]imanche/ { ; /^[[:space:]]*$/!p }' |
sed -r 's/^[^0-9]*//; s/.*clipse totale.*/           /; s/ h /h/g; s/(à|de|-) //; s/\.|h/:/g; s/^9:/09:/; s/:( |$)/:00\1/g' |
sed ':a; N; $!b a; s/\n/|/g'
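For readers less used to sed, the second stage can be read as a fixed sequence of substitutions, and the third stage as a join on "|". What follows is a rough Python transliteration, not a replacement for the pipeline. The names normalize and join_days are invented, and the sample strings are made up to show the kinds of formats the rules appear to target.

```python
import re


def normalize(line):
    """Apply the substitutions of the second sed command, in the same order."""
    line = re.sub(r"^[^0-9]*", "", line)                 # drop text before the first digit
    line = re.sub(r".*clipse totale.*", " " * 11, line)  # 11 spaces for such lines
    line = line.replace(" h ", "h")                      # "9 h 30" -> "9h30"
    line = re.sub(r"(à|de|-) ", "", line, count=1)       # no g flag in sed: first match only
    line = re.sub(r"\.|h", ":", line)                    # "9h30" or "9.30" -> "9:30"
    line = re.sub(r"^9:", "09:", line)                   # pad a leading 9 o'clock
    line = re.sub(r":( |$)", r":00\1", line)             # "9:" with no minutes -> "9:00"
    return line


def join_days(lines):
    """Equivalent of the last sed command: join all lines with '|'."""
    return "|".join(lines)


samples = [
    "      le lundi de 10.30 à 19.30",
    "le mardi de 9 h 30 à 19 h 30",
    "Mercredi 9h - 19h",
]
print(join_days(normalize(s) for s in samples))
```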