Shell Programming and Scripting, post 302911623 by csim_mohan on Friday 1st of August 2014 05:19:56 PM
URL partial matching

I have two files. File 1:
Code:
http://www.hello.com        http://neo.com/peace/development.html, www.japan.com,  http://example.com/abc/abc.html
http://news.net             http://lolz.com/country/list.html,www.telecom.net, www.highlands.net, www.software.com
http://example2.com         http://earth.net, http://abc.gov.cn/department/1.html

file 2:
Code:
www.neo.com/1/2/3/names.html
http://abc.gov.cn/script.aspx
http://example.com/abc/abc.html

File 2 contains the search URLs, which should be partially matched against column 2 of file 1. Where there is a partial match, the output should show the column 1 URL followed by the partially matching URL(s) from column 2 of file 1, like this:

Desired output:
Code:
http://www.hello.com    http://neo.com/peace/development.html, http://example.com/abc/abc.html
http://news.net
http://example2.com     http://abc.gov.cn/department/1.html

I am using the following script, which finds exact URL matches in column 2 but does not handle partial matches:

Code:
awk -F '[ \t,]' '
FNR == NR {
    a[$1]
    next
}
{    o = $1
    c = 0
    for(i = 2; i <= NF; i++)
        if($i in a)
            o = o (c++ ? ", " : "\t") $i
    print o
}' file2 file1

The output is:
Code:
http://www.hello.com    http://example.com/abc/abc.html
http://news.net
http://example2.com

Any suggestions on how to fix this?
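
One possible direction, as a minimal sketch rather than a definitive fix: judging only from the desired output above, "partial match" seems to mean "same host name once the scheme and a leading www. are ignored". Under that assumption, both the search URLs and the column 2 URLs can be reduced to a host before the lookup. The function name match_by_host and the awk helper host() are made up for this illustration, and the field separator is changed to '[ \t,]+' so that runs of spaces and commas do not produce empty fields.

```sh
#!/bin/sh
# Sketch only: assumes a "partial match" means the host names are equal
# after dropping the scheme, a leading "www." and the path.
# Usage: match_by_host SEARCH_FILE DATA_FILE
match_by_host() {
    awk -F '[ \t,]+' '
    # Reduce a URL to its bare, lower-cased host name.
    function host(u) {
        sub(/^[a-zA-Z]+:\/\//, "", u)   # drop the scheme, e.g. http://
        sub(/^www\./, "", u)            # drop a leading www.
        sub(/\/.*$/, "", u)             # drop the path
        return tolower(u)
    }
    FNR == NR {                         # first file: the search URLs
        if ($1 != "") a[host($1)]
        next
    }
    {                                   # second file: column 1 plus candidates
        o = $1
        c = 0
        for (i = 2; i <= NF; i++)
            if ($i != "" && (host($i) in a))
                o = o (c++ ? ", " : "\t") $i
        print o
    }' "$1" "$2"
}
```

Called as match_by_host file2 file1, this prints each column 1 URL followed by a tab and the column 2 URLs whose host appears among the search URLs. If a stricter rule is wanted (for example, the path must share a prefix as well), host() is the place to change.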
 
