02-11-2009
Please help me modify my solution
I am trying to extract .co.uk domains from HTML using this command:
cat $DIR/oldfile.txt | tr " " "\n" | grep [A-Za-z0-9_\.-].co.uk > $DIR/newfile.txt
The problem is that the matched lines still include the surrounding HTML markup:
/>domain.co.uk<br
/>domain.co.uk<br
/>domain.co.uk<br
etc.
How do I modify my regexp so that it matches only alphanumeric characters (apart from the dots and possible hyphens)?
Many Thanks,
Hal
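A minimal sketch of one possible fix (not a reply from the original thread). In the command above, the bracket expression [A-Za-z0-9_\.-] matches only a single character, the unescaped "." before "co.uk" matches any character, and grep without -o prints the whole line, which after the tr step is the whole space-separated token including "/>" and "<br". The pattern is also unquoted, so the shell strips the backslash and may glob-expand it against filenames in the current directory. The sketch assumes a grep that supports -o (GNU and BSD grep do; POSIX grep does not), and extract_co_uk is a hypothetical helper name used only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print only the .co.uk hostnames found in a file,
# one per line, without the surrounding HTML.
# Assumes grep supports -o (GNU/BSD extension, not POSIX).
extract_co_uk() {
    # -o : print each match on its own line instead of the whole line
    # -E : extended regexps, so '+' means "one or more"
    # [A-Za-z0-9.-]+ : one or more letters, digits, dots or hyphens
    #                  (inside brackets '.' is literal; a trailing '-' is literal)
    # \.co\.uk       : a literal ".co.uk"; an unescaped '.' would match any char
    # Single quotes stop the shell from globbing or stripping backslashes.
    grep -oE '[A-Za-z0-9.-]+\.co\.uk' "$1"
}
```

Usage, keeping the original $DIR layout: extract_co_uk "$DIR/oldfile.txt" > "$DIR/newfile.txt". The cat and tr steps become unnecessary because -o already writes each match on its own line. Note that this pattern does not check for a word boundary after "uk", so a string such as "example.co.ukfoo" would still yield "example.co.uk".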