11-03-2010
Selectively Find/Replace in a file?
I have a file that is HTML-encoded. Each line looks something like this:
<a href=http://link.com/username.aspx>username </a> more info.. <a href=http://link.com/info1.aspx>info1</a> more code... <a href=http://link.com/info2.aspx>info2</a>
I really have only one goal: to clean up the file so that I can more easily parse this info into a PHP application. I'm more familiar with PHP programming than with grep, sed and the like, but I thought I would try to clean it up using a bash script.
So I would like to get rid of the HTML tags and replace them with more meaningful, cleaner info. Basically, I want each line to look like this:
USERNAME-username INFO-info1, info2
This would make it easy for me to import those values into PHP variables and arrays. I've tried messing around with grep and sed, but I can't come up with anything. Any ideas?
Thanks a lot for your help!
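A minimal sketch of one way to do this with awk from a shell script. It assumes, as in the sample line, that every value you want is the text of a link ending in </a>, and that the first link on each line holds the username. The pattern only requires href= somewhere inside the opening tag, so it does not depend on the tag name. The function name clean_links is hypothetical.

```shell
# clean_links (hypothetical name): for every input line containing at least one
# link, print "USERNAME-<first link text> INFO-<remaining link texts, comma-separated>".
# Assumes each value is link text of the form <...href=...>text</a>.
clean_links() {
  awk '
  {
    line = $0; user = ""; info = ""; n = 0
    # Walk the line left to right, one "<...href=...>text</a>" at a time.
    while (match(line, /<[^>]*href=[^>]*>[^<]*<\/a>/)) {
      text = substr(line, RSTART, RLENGTH)
      line = substr(line, RSTART + RLENGTH)
      sub(/^<[^>]*>/, "", text)           # drop the opening tag
      sub(/<\/a>$/, "", text)             # drop the closing </a>
      gsub(/^[ \t]+|[ \t]+$/, "", text)   # trim surrounding blanks
      n++
      if (n == 1) user = text
      else if (n == 2) info = text
      else info = info ", " text
    }
    if (n > 0) print "USERNAME-" user " INFO-" info
  }' "$@"
}
```

Usage: clean_links yourfile.html > cleaned.txt. On the sample line above this prints USERNAME-username INFO-info1, info2. Regex-based HTML handling like this is fragile; it works only while the markup stays as regular as the sample.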
10 More Discussions You Might Find Interesting
1. UNIX for Dummies Questions & Answers
I build several files by using the cut command to grab selected fields (columns) from a really big CSV file. Each file is one column of data. I then put them together using the paste command. Here is the code, built in tcsh:
cut -d , -f 1 some.csv > 1.csv
cut -d , -f 10 some.csv > 10.csv
paste 1.csv... (2 Replies)
Discussion started by: yankee428
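If the goal is simply to combine a few selected columns, cut accepts a list of fields and can do it in one pass with no intermediate files. A sketch using the field numbers and file name from the question; select_cols is a hypothetical name. Note that cut always emits fields in their original file order and knows nothing about quoted CSV fields.

```shell
# select_cols (hypothetical): print columns 1 and 10 of a comma-separated file,
# joined by commas, in a single pass.
# cut does not understand CSV quoting: a comma inside "..." shifts the columns.
select_cols() {
  cut -d , -f 1,10 "$1"
}

# A two-step equivalent in the cut-then-paste style:
#   cut -d , -f 1 some.csv > 1.csv
#   cut -d , -f 10 some.csv > 10.csv
#   paste -d , 1.csv 10.csv
```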
2. Shell Programming and Scripting
I have a rather long csh script that works, but it's terribly ungraceful and takes a while from various loops. I only know enough code to get myself into trouble, so I'm looking for some guidance.
I have a large file that is separated at intervals by the same line, like this:
... (2 Replies)
Discussion started by: fusi0n
3. Shell Programming and Scripting
Is there a way to do a find and replace in a .gz file in a single script ?
I can always unzip, find and replace, and then zip it again, but I would hate to do this every time.
Thanks !
Vivek (1 Reply)
Discussion started by: vashah
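The unzip, edit and rezip steps can at least be collapsed into one pipeline, so no uncompressed copy is written to disk (the data is still decompressed and recompressed on the fly). gz_replace and its arguments are hypothetical. The output must be a different file: redirecting onto the input truncates it and can destroy the data.

```shell
# gz_replace (hypothetical): apply a sed expression to the contents of a gzip file.
#   $1 = input .gz, $2 = output .gz (must differ from $1), $3 = sed expression
gz_replace() {
  gzip -dc "$1" | sed "$3" | gzip -c > "$2"
}
# Usage: gz_replace data.gz data.new.gz 's/old/new/g' && mv data.new.gz data.gz
```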
4. UNIX for Advanced & Expert Users
Dear users,
I am new to AWK and have been battling with this one for close to a week now. Some of you did offer some help last week but I think I may not have explained myself very well. So I am trying again.
I have a dataset that has the following format where the datasets repeat every... (5 Replies)
Discussion started by: sda_rr
5. UNIX for Dummies Questions & Answers
Dear Members,
The problem: suppose I have 50 lines in a file. In 40 of them the last character is "\", and the remaining 10 lines are fine (i.e., these 10 lines do not have the "\" character).
How can I remove this character from the file?
Thanks (1 Reply)
Discussion started by: sandeep_1105
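A sed one-liner handles this. In the pattern \\$, the \\ is a literal backslash and $ anchors it to the end of the line, so lines that do not end in a backslash pass through unchanged. strip_trailing_backslash is a hypothetical wrapper name.

```shell
# strip_trailing_backslash (hypothetical): remove one "\" from the end of each line.
# In the sed pattern, \\ matches a literal backslash and $ anchors it to end of line.
strip_trailing_backslash() {
  sed 's/\\$//' "$@"
}
# Usage: strip_trailing_backslash file.txt > file.clean.txt
```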
6. Shell Programming and Scripting
I'm trying to write a script that will do an ls of a location, echo it into a file, and then read that file and selectively delete files/folders, so it would go something like this:
cd $CLEAN_LOCN
ls >>$TMP_FILE
while read LINE
do
if LINE = $DONTDELETE
skip
elseif LINE =... (2 Replies)
Discussion started by: MaureenT
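A sketch of the same idea in sh, with two changes worth considering. It loops over a glob instead of saving and re-reading ls output, so filenames with spaces survive intact, and the names to keep are passed as arguments. Like plain ls, the glob skips hidden (dot) files. clean_dir is hypothetical, and because it runs rm -rf, try it with echo in front of rm first.

```shell
# clean_dir (hypothetical): delete everything in directory $1 except the
# names given as the remaining arguments. Hidden (dot) files are not touched.
clean_dir() {
  dir=$1
  shift
  for path in "$dir"/*; do
    # An empty directory leaves the pattern unexpanded; skip it.
    [ -e "$path" ] || [ -L "$path" ] || continue
    name=${path##*/}
    keep=no
    for k in "$@"; do
      if [ "$name" = "$k" ]; then keep=yes; fi
    done
    if [ "$keep" = no ]; then
      rm -rf -- "$path"
    fi
  done
}
# Usage: clean_dir "$CLEAN_LOCN" keep_me.txt important_dir
```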
7. Shell Programming and Scripting
Legends,
I have a file /tmp/list.txt
I want to find "/bin/" and replace it with "/log/"
I tried the following, but no luck:
Sandy: /tmp> perl -pi -e 's/\/bin\/\/log\/' /tmp/list.txt >> /tmp/try
Substitution pattern not terminated at -e line 1.
AND,
Sandy: /tmp> perl -pi -e... (2 Replies)
Discussion started by: sdosanjh
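In the failing command every slash after s/ is escaped, so Perl never sees the delimiter that ends the pattern, hence "Substitution pattern not terminated". Picking a delimiter that does not occur in the text avoids the escaping altogether. Also, with -i Perl writes its output back into /tmp/list.txt, so nothing reaches the >> /tmp/try redirection; drop -i if the result should go to a new file. A sketch with sed using # as the delimiter; bin_to_log is a hypothetical name.

```shell
# bin_to_log (hypothetical): replace every "/bin/" with "/log/" and print the result.
# Using "#" as the s command delimiter means the slashes need no escaping.
bin_to_log() {
  sed 's#/bin/#/log/#g' "$1"
}
# Usage: bin_to_log /tmp/list.txt >> /tmp/try
# The Perl equivalent, without -i so the output goes to stdout:
#   perl -pe 's#/bin/#/log/#g' /tmp/list.txt >> /tmp/try
```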
8. Shell Programming and Scripting
Hi i have a file in which i am doing some processing.
The code is as follows:
#!/bin/ksh
grep DATA File1.txt >> File2.txt
sed 's/DATA//' File2.txt | tr -d ' ' >> File4.xls
As you can see, my output is going into an .xls file. The output consists of four columns/fields, of which the first... (20 Replies)
Discussion started by: Sharma331
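The grep and sed steps can be merged: sed -n 's/DATA//p' prints only the lines where the substitution succeeded, i.e. the lines containing DATA with the first DATA removed, which is what grep followed by sed does. This also drops the intermediate File2.txt; because the original appends to it with >>, rerunning the script would process earlier output again. extract_data is a hypothetical name.

```shell
# extract_data (hypothetical): keep lines containing "DATA", delete the first
# "DATA" on each, then remove all spaces, as the grep | sed | tr steps do.
extract_data() {
  sed -n 's/DATA//p' "$1" | tr -d ' '
}
# Usage: extract_data File1.txt >> File4.xls
```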
9. Shell Programming and Scripting
Hello Forum.
I have a file called abc.sed with the following commands;
s/1/one/g
s/2/two/g
...
I also have a second file called abc.dat and would like to substitute all occurrences of "1 with one", "2 with two", etc and create a new file called abc_new.dat
sed -f abc.sed abc.dat >... (10 Replies)
Discussion started by: pchang
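A sketch of the sed -f approach with the file names from the question. sed runs every command in the script file, top to bottom, on each input line, so order matters: if s/1/one/g ran before a hypothetical s/10/ten/g rule, "10" would already have become "one0" and the second rule would never match. apply_sed_script, write_example_script and the s/10/ten/g rule are all hypothetical.

```shell
# apply_sed_script (hypothetical): $1 = sed script file (e.g. abc.sed),
# $2 = input file (e.g. abc.dat); redirect stdout to create the new file.
apply_sed_script() {
  sed -f "$1" "$2"
}

# write_example_script (hypothetical): one s/// command per line, as in abc.sed,
# with the longer pattern placed first so it is not pre-empted by s/1/one/g.
write_example_script() {
  cat > "$1" <<'EOF'
s/10/ten/g
s/1/one/g
s/2/two/g
EOF
}
# Usage: apply_sed_script abc.sed abc.dat > abc_new.dat
```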
10. UNIX for Dummies Questions & Answers
I would like to extract all entries containing the following patterns: ccccta & ccccccccc from the following infile:
>P39PT-1224_Freq_900
cccctacgacggcattggtaatggctcccgcaagccatctctcttcagccaagg
>P39PT-784_Freq_2
cccctacgacggcattggtaatggcacccgcaagccatctctcttccccccccc
>P39PT-678_Freq_5... (4 Replies)
Discussion started by: Xterra
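A minimal awk sketch that prints each record (header plus sequence) whose sequence contains ccccta or ccccccccc. It assumes, as in the sample, that every header line starting with > is followed by its sequence on a single line; multi-line sequences would need the lines joined first. extract_matches is a hypothetical name.

```shell
# extract_matches (hypothetical): print FASTA records whose sequence line
# contains "ccccta" or "ccccccccc". Assumes one sequence line per header.
extract_matches() {
  awk '
    /^>/ { header = $0; next }              # remember the latest header
    /ccccta|ccccccccc/ { print header; print }
  ' "$1"
}
# Usage: extract_matches infile > matches.txt
```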
LEARN ABOUT DEBIAN
Feed::Find(3pm) User Contributed Perl Documentation Feed::Find(3pm)
NAME
Feed::Find - Syndication feed auto-discovery
SYNOPSIS
use Feed::Find;
my @feeds = Feed::Find->find('http://example.com/');
DESCRIPTION
Feed::Find implements feed auto-discovery for finding syndication feeds, given a URI. It (currently) passes all of the auto-discovery tests
at http://diveintomark.org/tests/client/autodiscovery/.
Feed::Find will discover the following feed formats:
o RSS 0.91
o RSS 1.0
o RSS 2.0
o Atom
USAGE
Feed::Find->find($uri)
Given a URI $uri, use a variety of techniques to find the feeds associated with that page. If $uri itself points to a feed (i.e., if the
Content-Type of the response is a recognized feed type), returns $uri.
Returns a list of feed URIs.
The following techniques are used:
1. <link> tag auto-discovery
If the page contains any <link> tags in the <head> section, these tags are examined for recognized feed content types. The following
content types are treated as feeds: application/x.atom+xml, application/atom+xml, application/xml, text/xml, application/rss+xml, and
application/rdf+xml.
2. Scanning <a> tags
If the page does not contain any known <link> tags, the page is then scanned for <a> tags for links to URIs with certain file
extensions. The following extensions are treated as feeds: .rss, .xml, and .rdf.
Note that this technique is employed only if the first technique returns no results.
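For readers who want to see the two-step logic outside Perl, here is a rough shell imitation. It is not Feed::Find's code, and find_feeds is a hypothetical name. It reads HTML on stdin and assumes lowercase, double-quoted attributes and no tag split across lines. Unlike the description above, it scans the whole page rather than only <head>, and it prints hrefs as-is without resolving relative links.

```shell
# find_feeds (hypothetical): crude imitation of the two discovery steps above.
find_feeds() {
  html=$(cat)
  # 1. <link> tags whose type is one of the feed content types listed above.
  links=$(printf '%s\n' "$html" |
    grep -oE '<link[[:space:]][^>]*>' |
    grep -E 'type="(application/x\.atom\+xml|application/atom\+xml|application/xml|text/xml|application/rss\+xml|application/rdf\+xml)"' |
    sed -nE 's/.*href="([^"]*)".*/\1/p')
  if [ -n "$links" ]; then
    printf '%s\n' "$links"
    return
  fi
  # 2. Only when step 1 found nothing: <a> links ending in .rss, .xml or .rdf.
  printf '%s\n' "$html" |
    grep -oE '<a[[:space:]][^>]*>' |
    sed -nE 's/.*href="([^"]*\.(rss|xml|rdf))".*/\1/p'
}
# Usage: find_feeds < page.html
```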
Feed::Find->find_in_html($html [, $base_uri ])
Given a reference to a string $html containing an HTML page, uses the same techniques as described above in find to find the feeds
associated with that page.
If you know the URI of the page, you should provide it in $base_uri, so that relative links can be properly made absolute. Feed::Find will
attempt to determine the correct base URI, but unless that URI is specified in the HTML itself (in a "<meta>" tag), you'll need to supply
it yourself.
Returns a list of feed URIs.
LICENSE
Feed::Find is free software; you may redistribute it and/or modify it under the same terms as Perl itself.
AUTHOR & COPYRIGHT
Except where otherwise noted, Feed::Find is Copyright 2004 Benjamin Trott, ben+cpan@stupidfool.org. All rights reserved.
perl v5.10.1 2011-01-28 Feed::Find(3pm)