Shell Programming and Scripting: extracting domain names out of a text file
Post 302286415 by h.a.l on Wednesday 11th of February 2009 05:25:20 AM
Please help modify solution

I am trying to extract .co.uk domains from HTML, using the command:
cat $DIR/oldfile.txt | tr " " "\n" | grep [A-Za-z0-9_\.-].co.uk > $DIR/newfile.txt

The problem is that this command matches:
/>domain.co.uk<br
/>domain.co.uk<br
/>domain.co.uk<br
etc

How do I modify my regexp to match alphanumeric chars only (apart from the dots and possible hyphens)?

Many Thanks,

Hal
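
One possible way to do this, offered as a sketch rather than a definitive answer: grep prints the whole matching line by default, so after tr splits on spaces the surrounding markup (/> and <br) comes along with the domain. The -o option prints only the matched text instead. The unquoted pattern also loses its backslash to the shell, and the bare dots in .co.uk match any character, so the version below quotes the pattern and escapes the dots. The function name extract_co_uk is made up for illustration, and -o is assumed to be available (GNU and BSD grep have it, but POSIX does not require it).

```shell
# Hypothetical helper: read text on standard input and print each
# .co.uk domain found, one per line.
# Assumption: grep supports -o (GNU and BSD grep do; POSIX does not require it).
extract_co_uk() {
    # -o  print only the part of the line that matched
    # -E  use extended regular expressions
    # The name must start with a letter or digit, may continue with
    # letters, digits, dots and hyphens, and must end in a literal ".co.uk".
    grep -oE '[A-Za-z0-9][A-Za-z0-9.-]*\.co\.uk'
}

# Usage with the file names from the question:
#   extract_co_uk < "$DIR/oldfile.txt" > "$DIR/newfile.txt"
```

Because -o finds matches anywhere in a line, the tr step is no longer needed. The pattern does not check what follows .co.uk, so a string such as domain.co.ukx would still yield domain.co.uk.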
 
