"links -dump" output format issue


 
# 1  
Old 07-07-2011

Hi All,

I searched extensively for this, but to no avail. I have an HTML file, which I converted with:

Code:
links -dump file_page.html > text_html.txt

The command above gave me the text of the HTML file with the tags removed. The output looked something like this:

Code:
This is a [1]test HTML [2]file.
References

         Visible Links
         1. file:///feed/rss.cgi?ChanKey=PubMedNews
         2. file:///corehtml/query/static/pubmedsearch.xml

The above output means that "test" is hyperlinked to file:///feed/rss.cgi?ChanKey=PubMedNews and "file" is hyperlinked to file:///corehtml/query/static/pubmedsearch.xml.


My question is: can "links" produce output in which [1] and [2] are replaced by the actual link URLs? That is, I would like the output to look like this:

Code:
This is a [file:///feed/rss.cgi?ChanKey=PubMedNews]test HTML [file:///corehtml/query/static/pubmedsearch.xml]file.

Or is there any other way of accomplishing my task?

I am using Linux with Bash.
# 2  
Old 07-07-2011
The following should work. However, because the URLs are longer than the [n] markers they replace, it will break the fixed-width line formatting of links -dump.
Code:
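links -dump file_page.html | perl -e '
# Pass 1: keep the page text, then collect the numbered URLs listed after "References".
my (@output, %links);
my ($on_page, $in_links) = (1, 0);
while (<STDIN>) {
   $on_page = 0 if $on_page && /^\s*References\s*$/;
   push @output, $_ if $on_page;
   $links{$1} = $2 if $in_links && /^\s+(\d+)\.\s+(\S+)/;
   # Match case-insensitively: the heading may read "Visible Links" or "Visible links".
   $in_links = 1 if !$on_page && /^\s+Visible links\s*$/i;
}
# Pass 2: replace each [n] marker with its URL; leave unknown markers as they are.
for (@output) {
   s/\[(\d+)\]/exists $links{$1} ? "[$links{$1}]" : $&/ge;
   print;
}'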
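
If Perl is not available, the same two-pass idea can be sketched in plain awk. This is only a sketch, not a feature of links itself: the function name embed_links is made up for illustration, and it assumes the dump has the layout shown in the first post (page text, then a line reading "References", then numbered "n. URL" lines). A page whose own text contains a line consisting only of "References" would be cut short at that point.

```shell
# embed_links: hypothetical helper name, not part of links.
# Reads "links -dump" style text on stdin and prints the page text with
# each [n] marker replaced by [URL], using the numbered list that follows
# the "References" line.
embed_links() {
    awk '
        # The first "References" line ends the page text.
        /^[ \t]*References[ \t]*$/ { in_refs = 1; next }
        # Buffer page lines until the reference section starts.
        !in_refs { page[++n] = $0; next }
        # Reference lines look like "   1. URL": field 1 is "1.", field 2 is the URL.
        /^[ \t]*[0-9]+\.[ \t]+/ {
            num = $1
            sub(/\.$/, "", num)
            url[num] = $2
        }
        END {
            for (i = 1; i <= n; i++) {
                line = page[i]
                out = ""
                # Consume the line marker by marker so inserted URLs are never rescanned.
                while (match(line, /\[[0-9]+\]/)) {
                    id = substr(line, RSTART + 1, RLENGTH - 2)
                    if (id in url)
                        rep = "[" url[id] "]"
                    else
                        rep = substr(line, RSTART, RLENGTH)   # unknown marker: keep as is
                    out = out substr(line, 1, RSTART - 1) rep
                    line = substr(line, RSTART + RLENGTH)
                }
                print out line
            }
        }
    '
}
```

With the function defined in the current shell, it would be used as: links -dump file_page.html | embed_links > text_html.txt. As with the Perl version, lines that contain links become longer than the original dump width.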


Last edited by Skrynesaver; 07-07-2011 at 11:26 AM.. Reason: tidied code