Shell Programming and Scripting: Parse Page Source and Extract Links
Post 302768509 by DGPickett on Friday 8th of February 2013 04:57:37 PM
Well, are all the URLs in HREF= clauses?
  1. A tool reads one URL per line and fetches each page's source to stdout with 'while read u ; do wget -q -O - "$u" ; done'
  2. A tool reads the HTML and spits out the embedded URLs, maybe stripped of any #fragment (internal anchor) suffix (sed, awk, perl).
  3. A tool (grep?) filters for PDFs.
  4. A tool sorts them unique (sort -u).
  5. A tool 'while read u ; do wget -q -O "dest_dir/${u##*/}" "$u" ; done' gets the files.
Connect the five tools with pipes.
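The five steps can be strung together as a small POSIX shell sketch, one stdin-to-stdout function per step. It assumes wget is installed, that links sit in quoted href attributes, and that the PDF links are absolute URLs (a relative href such as docs/a.pdf would first need resolving against the page's URL). The function names, and the pages.txt and pdfs names in the usage line, are hypothetical and not from the post; the grep -o option used in step 2 is a GNU/BSD extension rather than strict POSIX.

```shell
#!/bin/sh
# Sketch of the five-step pipeline, one function per step.
# Assumptions (not from the post): wget is installed, href values are quoted,
# and the PDF links are absolute URLs. Function names are hypothetical.

# Step 1: read one URL per line and print each page's HTML to stdout.
fetch_pages() {
  while IFS= read -r u; do
    wget -q -O - "$u"
  done
}

# Step 2: print the value of every href="..." or href='...' attribute,
# one per line, with any "#fragment" removed; pure in-page anchors are dropped.
extract_urls() {
  tr "'" '"' |
    grep -oiE 'href[[:space:]]*=[[:space:]]*"[^"]*"' |
    sed -e 's/^[^"]*"//' -e 's/"$//' -e 's/#.*//' -e '/^$/d'
}

# Step 3: keep only links whose path ends in .pdf (any letter case).
filter_pdfs() {
  grep -iE '\.pdf$'
}

# Step 5: download each URL into the given directory, named after the last
# path component (the ${u##*/} trick); same-named files overwrite each other.
download_all() {
  dest=$1
  mkdir -p "$dest"
  while IFS= read -r u; do
    wget -q -O "$dest/${u##*/}" "$u"
  done
}

# Usage, with sort -u as step 4 (pages.txt and pdfs are hypothetical names):
#   fetch_pages < pages.txt | extract_urls | filter_pdfs | sort -u | download_all pdfs
```

Keeping each step a separate filter means each one can be checked on its own, for example by piping a saved HTML file into extract_urls | filter_pdfs before letting anything download. For simple cases wget can also do much of this by itself with recursive retrieval limited to one level and an accept list (wget -r -l 1 -nd -A pdf URL), though by default it only follows links on the starting host.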

Last edited by DGPickett; 02-11-2013 at 11:05 AM.

8 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

Which command shows the source code of a web page?

Hi folks! I am using Mac OS X, which is BSD-based. Could you tell me what command to type in the Terminal to display the HTML source code of a certain web page? I think something like #<command> http://www.apple.com will display the html source code in the terminal window... (11 Replies)
Discussion started by: fundidor

2. UNIX for Dummies Questions & Answers

Reading web page source in Unix

Is there a command that takes a URL, grabs the source code of the page, and outputs it to stdout? I want to know because I want to grab a page and pass it through another program to analyze it. Any help would be appreciated, thanks. (3 Replies)
Discussion started by: jaymzlee

3. Shell Programming and Scripting

Write page source to standard output

I'm new to Perl, but I want to take the page source and write it to a file or standard output. I used perl.org as a test website. Here is the script: use strict; use warnings; use LWP::Simple; getprint('http://www.perl.org') or die 'Unable to get page'; exit 0; ... (1 Reply)
Discussion started by: wxornot

4. Shell Programming and Scripting

Getting source code of a page

I want to download a particular page from the internet and get its source code in HTML format. I want to parse the source code to find specific parameters using the grep command. Could someone tell me the Linux command to download a specific page and parse its source code? ... (1 Reply)
Discussion started by: ahamed

5. Shell Programming and Scripting

Web page source cleanup

Is it possible to process web pages to remove all tag style information but leave the tags? Say I have <h1 style='font-size: xxx; color: xxxxxx'>headline 1</h1>; I want to get <h1>headline 1</h1>. BTW, I got a one-liner here to remove all tags: sed -n '/^$/!{s/<*>//g;p; Thanks a... (4 Replies)
Discussion started by: dtdt

6. Shell Programming and Scripting

Performing extractions on web source page

I have downloaded a web page's source to a file. I then egrep for a single word to extract the line containing it into another file. I then cat the second file and remove everything before one word and after a second word to capture the desired phrase. This did not work. I used vi to validate that the 2... (1 Reply)
Discussion started by: slak0

7. Shell Programming and Scripting

Save page source, including JavaScript

I need to get the source code of a web page. I have tried wget and curl, but they don't show the JavaScript part of the source that I need. I don't have to execute it, only view the source. How do I do that? (1 Reply)
Discussion started by: locoroco

8. Shell Programming and Scripting

Dump web page source as rendered by browser

Hi guys! I need to retrieve a specific .m3u8 link from a web page that uses iframes and JavaScript. I tried to get the full source with "wget", "lynx", "w3m" and "phantomjs", but they can't dump all of the source; the part containing the link I need seems to be inside... (0 Replies)
Discussion started by: Marmz
All times are GMT -4.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.