

Using Python to grab data from a website


 
# 1  
Old 05-10-2010
Hello Everyone,
I'm trying to write a Python script that will go to the following website and grab all the data on the page. The page refreshes regularly, so the number of flights changes from one visit to the next.

Untitled Document

What I want to do is grab all the data (except for the top three rows, which contain headers) and save it in a text file.

Any help would be greatly appreciated.
# 2  
Old 05-10-2010
Not Python, but:
Code:
curl http://www.phl.org/cgi-bin/fidsarrival.pl -o "arrival.txt"

# 3  
Old 05-10-2010
jgt, although that isn't Python code, it's still pretty cool and good to know.
# 4  
Old 05-11-2010
This code stores the source text of a webpage and should work pretty nicely in this case:
Code:
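#!/usr/local/bin/python
# Python 2 script: urllib.urlopen() does not exist in Python 3.

import urllib
import os

# get data
f = urllib.urlopen("http://www.phl.org/cgi-bin/fidsarrival.pl")
s = f.read()
f.close()

# write the raw page source
ff = open("raw.txt", "w")
ff.write(s)
ff.close()

# run shell command: sed reads raw.txt and writes output.txt
# (redirecting into the same file that is being read would truncate it first)
command = "sed 's/^<.*//;s/.*DATE.*//;s/^Airline.*//;/^$/d' raw.txt > output.txt"
os.system(command)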

Explanation of the sed command:
Code:
sed 's/^<.*//;s/.*DATE.*//;s/^Airline.*//;/^$/d'

Replaces all lines which start with "<" with an empty line
Replaces all lines which contain "DATE" with an empty line
Replaces all lines which start with "Airline" with an empty line
Deletes all empty lines
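
For anyone who would rather not shell out to sed, the four rules above can be sketched in plain Python. This is only a sketch: filter_lines is a made-up helper name, and the sample text is invented for illustration rather than taken from the real page.

```python
def filter_lines(text):
    """Return the lines that the four sed rules would leave in the output."""
    kept = []
    for line in text.split("\n"):
        if line.startswith("<"):        # s/^<.*//
            continue
        if "DATE" in line:              # s/.*DATE.*//
            continue
        if line.startswith("Airline"):  # s/^Airline.*//
            continue
        if line == "":                  # /^$/d
            continue
        kept.append(line)
    return kept


# Invented sample input (an assumption, not real page content)
sample = "<html>\nDATE: 05-10-2010\nAirline Flight City\n\nUS 123 Boston\nDL 456 Atlanta\n"
print("\n".join(filter_lines(sample)))
```

Skipping a line here has the same effect as sed blanking it and then deleting it, so the result should match as long as the input is split on "\n" the way sed sees it.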

Last edited by pseudocoder; 05-11-2010 at 01:25 AM..
# 5  
Old 05-11-2010
My solution (I added a for loop that prints the output text every time the script is run; you can remove it if you don't need it):

Code:
#!/usr/bin/python

import urllib.error, urllib.parse, urllib.request
import re

# get the page and decode the bytes, so the text holds real newlines
# (str() on bytes would give the "b'...'" repr with literal "\n" sequences)
f = urllib.request.urlopen("http://www.phl.org/cgi-bin/fidsarrival.pl")
s = f.read().decode("utf-8", "replace")  # UTF-8 is an assumption about the page
f.close()

# regular expression pattern matching everything inside < > tags
pattern = r'<.*?>'

# replace every match of the pattern with a newline, then write the result to 'refined.txt'
ff = open('refined.txt', 'w')
ff.write(re.sub(pattern, '\n', s))
ff.close()

# print the file line by line
with open('refined.txt') as of:
    for line in of:
        print(line, end='')

This is actually built around pseudocoder's solution; I just modified it to use Python's built-in regular expressions instead of calling a shell command to edit the text.

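If you're using Python 2.x, replace the import line with import urllib (or import urllib2) and change urllib.request.urlopen to urllib.urlopen (or urllib2.urlopen). The print(line, end='') call additionally needs from __future__ import print_function at the top of the script.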
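
Since the original question asked for everything except the top three header rows, here is a rough Python 3 sketch that goes one step further than stripping tags. It assumes the flight data sits in an HTML table (tr rows with td/th cells), which nobody in this thread has confirmed. TableRowParser, extract_rows and the sample markup are all made up for illustration.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, None outside a row
        self._cell = None  # text pieces of the cell being read, None outside a cell

    def _end_cell(self):
        if self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None

    def _end_row(self):
        self._end_cell()
        if self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._end_row()  # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._end_cell()  # tolerate a missing </td>
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._end_cell()
        elif tag in ("tr", "table"):
            self._end_row()


def extract_rows(html, skip=3):
    """Return table rows as lists of cell text, dropping the first `skip` rows."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows[skip:]


# Invented sample markup (an assumption about the page layout)
sample = """
<table>
<tr><th colspan="3">ARRIVALS</th></tr>
<tr><td colspan="3">DATE: 05-10-2010</td></tr>
<tr><td>Airline</td><td>Flight</td><td>City</td></tr>
<tr><td>US</td><td>123</td><td>Boston</td></tr>
<tr><td>DL</td><td>456</td><td>Atlanta</td></tr>
</table>
"""

for row in extract_rows(sample):
    print("\t".join(row))
```

To use it on the live page, pass the decoded page text to extract_rows and write the joined rows to a file instead of printing them. If the real page has a different number of header rows, adjust skip.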
# 6  
Old 05-11-2010
thanks everyone!
