text transformation with sed or awk


 
# 1  
Old 05-26-2009

Hi there,
I'm trying to automatically extract opening hours from a website.
The page displaying the schedules is
http://www.natureetdecouvertes.com/p...sp?mag_cod=xxx
with xxx going from 101 to 174.
I managed to get the following output:
Code:
      le lundi de 10.30 à 19.30
      le mardi de 9.30 à 19.30
blank
      le jeudi de 9.30 à 19.30
      le vendredi de 9.30 à 19.30
      le samedi de 10.30 à 21.30
blank

There is one line per weekday (from Monday to Sunday).
"blank" stands for an actual empty line (nothing is displayed).
How can I now get the following final output?
Code:
"10:30 19:30|09:30 19:30|           |09:30 19:30|09:30 19:30|10:30 21:30|           "

Thanks for your help.
Santiago
# 2  
Old 05-26-2009
Code:
awk '{ printf("%s\t%s%s",$4,$6,"|")}' input_file.txt

# 3  
Old 05-26-2009
Code:
awk -F'[^0-9]+' '{b=b (NR>1?"|":"")($2?sprintf("%02d:%02d %02d:%02d",$2,$3,$4,$5):"")}END{print b}' yourfile.txt

If you really want 11 spaces instead of a blank field, then:

Code:
awk -F'[^0-9]+' '{b=b (NR>1?"|":"")($2?sprintf("%02d:%02d %02d:%02d",$2,$3,$4,$5):sprintf("%11s",""))}END{print b}' yourfile.txt
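
For readers who find the one-liner dense, here is a rough Python sketch of the same idea: treat every run of non-digits as a separator, zero-pad the four numbers, and join the days with "|". The function name `format_hours` and the `pad_blank` switch are made up for this illustration, and it tests for "at least four numbers on the line" where the awk version tests whether `$2` is non-empty.

```python
import re

def format_hours(lines, pad_blank=True):
    """Join one line per weekday into a single '|'-separated record."""
    fields = []
    for line in lines:
        # Same effect as awk -F'[^0-9]+': only the digit runs matter.
        nums = re.findall(r"\d+", line)
        if len(nums) >= 4:
            h1, m1, h2, m2 = (int(n) for n in nums[:4])
            fields.append("%02d:%02d %02d:%02d" % (h1, m1, h2, m2))
        else:
            # Closed day: 11 spaces (the width of "HH:MM HH:MM") or an empty field.
            fields.append(" " * 11 if pad_blank else "")
    return "|".join(fields)

# Sample input copied from the first post; "" stands for a blank line.
sample = [
    "      le lundi de 10.30 à 19.30",
    "      le mardi de 9.30 à 19.30",
    "",
    "      le jeudi de 9.30 à 19.30",
    "      le vendredi de 9.30 à 19.30",
    "      le samedi de 10.30 à 21.30",
    "",
]
print('"%s"' % format_hours(sample))
```

With `pad_blank=False` the closed days come out as empty fields, like the first awk command above.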


Last edited by colemar; 05-26-2009 at 06:32 AM.
# 4  
Old 05-26-2009
If you have Python, here is an almost complete alternative solution:
Code:
 
#!/usr/bin/env python
# Python 2 script (urllib2, print statement)
import urllib2,re
# capture the block of the page that holds the opening hours
pat=re.compile(""".*<span class="tdBlancBold">(.*)<div align="center">.*""",re.M|re.DOTALL)
days=['lundi','mardi','mercredi','jeudi','vendredi','samedi','dimanche']
url="http://www.natureetdecouvertes.com/pages/gener/view_FO_STORE_corgen.asp?mag_cod=%s"
for num in range(101,175):   # store codes 101 to 174 inclusive
    page=urllib2.urlopen(url % str(num))
    data=page.read()
    if "Impossible" not in data:
        result = pat.findall(data)
        store={}   # day name -> [opening time, closing time]
        for i in result:
            for j in i.split("<br>"):
                j=j.strip()
                if j.startswith("le"):
                    j=j.split()
                    if j[1] in days:
                        # e.g. "le lundi de 10.00 à 20.00": times are 3rd-from-last and last words
                        t1,t2=j[-3],j[-1]
                        store.setdefault(j[1],[])
                        store[j[1]].extend([t1,t2])
        for DAY in days:
            try:
                print "%s |" %( ' '.join(store[DAY])),
            except KeyError:   # no hours listed for that day
                print "\t\t|",
        print ""
    else:
        print "Page not found ",url % str(num)

Extract of the output:
Code:
# python test.py
10.00 20.00 | 10.00 20.00 | 10.00 20.00 | 10.00 20.00 | 10.00 20.00 | 10.00 20.00 | 10.00 19.00 |
10.00 20.00 | 10.00 20.00 | 10.00 20.00 | 10.00 20.00 | 10.00 20.00 | 10.00 20.00 |             |
10.00 20.00 | 10.00 20.00 | 10.00 20.00 |               | 10.00 21.00 | 10.00 20.00 |           |
10.00 20.00 | 10.00 20.00 | 10.00 20.00 | 10.00 20.00 | 10.00 20.00 | 10.00 20.00 |             |
10.00 19.30 | 10.00 19.30 | 10.00 19.30 | 10.00 19.30 | 10.00 19.30 | 10.00 19.30 |             |
10.00 21.00 | 10.00 21.00 | 10.00 21.00 | 10.00 21.00 | 10.00 21.00 | 10.00 20.00 |             |
10.00 19.30 | 10.00 19.30 | 10.00 19.30 | 10.00 19.30 | 10.00 19.30 | 10.00 19.30 |             |
10.00 20.00 | 10.00 21.00 | 10.00 21.00 | 10.00 21.00 | 10.00 21.00 | 10.00 20.00 |             |
9.30 19.30 | 9.30 19.30 | 9.30 19.30 | 9.30 19.30 | 9.30 19.30 | 9.30 19.30 |           |
Page not found  http://www.natureetdecouvertes.com/pages/gener/view_FO_STORE_corgen.asp?mag_cod=110
10.00 20.00 | 10.00 20.00 | 10.00 20.00 | 10.00 20.00 | 10.00 20.00 | 10.00 20.00 |             |
10.00 19.30 | 10.00 19.30 | 10.00 19.30 | 10.00 19.30 | 10.00 19.30 | 10.00 19.30 |             |
10.00 20.00 | 10.00 20.00 | 10.00 20.00 | 10.00 20.00 | 10.00 20.00 | 10.00 20.00 |             |
10.00 20.00 | 10.00 20.00 | 10.00 20.00 | 10.00 20.00 | 10.00 20.00 | 10.00 20.00 |             |
9.30 19.30 | 9.30 19.30 | 9.30 19.30 | 9.30 19.30 | 9.30 19.30 | 9.30 19.30 |           |
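
The parsing step of that script can be tried without any network access. The following is only a sketch in Python 3 syntax; the `fragment` string is a made-up stand-in for the text the regular expression would capture, and the helper names `parse_fragment` and `format_row` are hypothetical.

```python
DAYS = ['lundi', 'mardi', 'mercredi', 'jeudi', 'vendredi', 'samedi', 'dimanche']

def parse_fragment(fragment):
    """Map each day name found in the fragment to [opening, closing]."""
    store = {}
    for piece in fragment.split("<br>"):
        words = piece.strip().split()
        # Expected shape: ['le', 'lundi', 'de', '10.00', 'à', '20.00']
        if len(words) >= 4 and words[0] == "le" and words[1] in DAYS:
            store.setdefault(words[1], []).extend([words[-3], words[-1]])
    return store

def format_row(store):
    """One field per weekday, empty when the store lists no hours that day."""
    return " | ".join(" ".join(store.get(day, [])) for day in DAYS)

# Hypothetical fragment, invented for this example (not fetched from the site).
fragment = (
    "le lundi de 10.00 à 20.00<br>\n"
    " le mardi de 10.00 à 20.00<br>\n"
    "le samedi de 9.30 à 19.30<br>"
)
store = parse_fragment(fragment)
print(format_row(store))
```

Using `store.get(day, [])` replaces the try/except of the original: a day without an entry simply yields an empty field.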

# 5  
Old 07-04-2009
Thanks everyone for your help.
It looks like the stores all display their timetables in different formats, but I came up with the following solution, which handles every case I need:

Code:
wget -qO- 'http://www.natureetdecouvertes.com/pages/gener/view_FO_STORE_corgen.asp?mag_cod=118' |
sed -rn 's/\r//g; s/<br>//; /^[[:space:]]*([Ll]e )?[Ll]undi/,/([Ll]e )?[Dd]imanche/ { ; /^[[:space:]]*$/!p }' |
sed -r 's/^[^0-9]*//; s/.*clipse totale.*/           /; s/ h /h/g; s/(à|de|-) //; s/\.|h/:/g; s/^9:/09:/; s/:( |$)/:00\1/g' |
sed ':a; N; $!b a; s/\n/|/g'
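
For anyone who wants to follow the second and third sed commands step by step, here is an approximate Python rendering of the same substitutions. It is a sketch only: the function name `normalize` is invented, the "clipse totale" rule is left out, and the sample lines are guesses at the kinds of formats the stores use, not text copied from the site.

```python
import re

def normalize(line):
    """Turn one schedule line into 'HH:MM HH:MM', mirroring the sed rules."""
    line = re.sub(r"^[^0-9]*", "", line)              # drop everything before the first digit
    line = re.sub(r" h ", "h", line)                  # "10 h 30" -> "10h30"
    line = re.sub(r"(à|de|-) ", "", line, count=1)    # drop the word between the two times
    line = re.sub(r"[.h]", ":", line)                 # "10.30" or "10h30" -> "10:30"
    line = re.sub(r"^9:", "09:", line)                # pad an opening hour of 9
    line = re.sub(r":( |$)", r":00\1", line)          # "19:" -> "19:00"
    return line

# Hypothetical input lines in three different styles.
lines = [
    "le lundi de 10.30 à 19.30",
    "Le mardi : 9h30 - 19h",
    "mercredi 10 h 30 à 19 h 30",
]
# Same job as the last sed: join all lines with "|".
print("|".join(normalize(l) for l in lines))
```

Like the sed version, this only zero-pads an opening hour of 9; a store opening at 8 would need a more general rule.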
