Extract title


 
# 1  
03-25-2016

I need a bit of help, please.

I have an HTML file.

In the file there is a long list of text;
each line has a different title.

I want to extract all the titles with egrep, sed or awk, and save them to a .txt file.


For example, extract the bits in red (the title and the episode text)
from this:

Code:
        <h3 class="single-item__title typo typo--skylark"><strong>Follow the Money</strong></h3>
        <p class="single-item__subtitle typo typo--canary">Episode 1</p>

and save it like this:
Code:
Follow the Money Episode 1

Thanks.

Last edited by bob123; 03-25-2016 at 03:09 PM.
# 2  
03-25-2016
What are the criteria for picking out the lines that have titles in them?
# 3  
03-25-2016
Quote:
What are the criteria for picking out the lines that have titles in them?
????


I have already said in post #1:
I want to save them to a .txt file.

That's all.

Can anyone help with my question?

Thanks.

Last edited by bob123; 03-25-2016 at 01:50 PM.
# 4  
03-26-2016
With that sparse info given by you, this will do EXACTLY what you requested for EXACTLY the samples in post#1:
Code:
sed -n '1h;1!H;${x;s/<[^>]*>\|\n\|^ *//gp}' file
Follow the Money        Episode 1
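
For anyone following along, here is the same script spread over several lines with one comment per command. It is a sketch: the function name join_all is hypothetical, and it needs GNU sed because the \| alternation is a GNU extension. Because the whole file is collected into one buffer before the substitution, ^ * only removes blanks at the very start of the file; whatever indentation begins the second line is what ends up between the title and the episode. Presumably that is why input without indentation comes out as "Follow the MoneyEpisode 1", and deleting every newline is why all records land on a single line.

```shell
#!/bin/sh
# join_all: hypothetical wrapper around the one-liner above.
# Requires GNU sed (\| alternation is a GNU extension). Reads stdin.
join_all() {
  sed -n '
    # Line 1: copy it into the hold space.
    1h
    # Every later line: append it to the hold space, after a newline.
    1!H
    # Last line: swap the collected text into the pattern space, delete
    # tags, embedded newlines and leading spaces, then print the result.
    ${
      x
      s/<[^>]*>\|\n\|^ *//gp
    }'
}

# Demo on the two sample lines from post #1 (indented by 8 spaces).
printf '%s\n' \
  '        <h3 class="single-item__title typo typo--skylark"><strong>Follow the Money</strong></h3>' \
  '        <p class="single-item__subtitle typo typo--canary">Episode 1</p>' |
  join_all
```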

# 5  
03-26-2016
Hi.

With augmented data:
Code:
#!/usr/bin/env bash

# @(#) s1       Demonstrate extraction from HTML, lynx.

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
LC_ALL=C ; LANG=C ; export LC_ALL LANG
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C lynx sed paste

ORIG=${1-data1}
FILE=${ORIG}.html
cp $ORIG $FILE

pl " Input data file $FILE:"
cat $FILE

pl " Results:"
lynx -dump $FILE |
tee f1 |
sed '/^[[:blank:]]*$/d' |
tee f2 |
sed 's/^[[:blank:]]*//' |
tee f3 |
paste -d" " - -

exit 0

producing:
Code:
$ ./s1 data2

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 3.16.0-4-amd64, x86_64
Distribution        : Debian 8.3 (jessie) 
bash GNU bash 4.3.30
Lynx Version 2.8.9dev.1 (12 Mar 2014)
sed (GNU sed) 4.2.2
paste (GNU coreutils) 8.23

-----
 Input data file data2.html:
<h3 class="single-item__title typo typo--skylark"><strong>Follow the Money</strong></h3>
<p class="single-item__subtitle typo typo--canary">Episode 1</p>
<h3 class="single-item__title typo typo--skylark"><strong>Follow Your Heart</strong></h3>
<p class="single-item__subtitle typo typo--canary">Episode 0</p>

-----
 Results:
Follow the Money Episode 1
Follow Your Heart Episode 0
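
The paste -d" " - - step is what pairs the lines: each "-" reads the next line from standard input, so lines 1 and 2, 3 and 4, and so on are joined with a single space. As a rough, lynx-free sketch of the same idea, assuming each title and each subtitle sits on its own line (lynx -dump copes with real HTML layout far better), the tag removal can be done with sed instead. The function name strip_and_pair is hypothetical.

```shell
#!/bin/sh
# strip_and_pair: hypothetical name for a lynx-free approximation of the
# pipeline above. Assumes every title and subtitle is on its own line.
strip_and_pair() {
  # 1) delete every tag, 2) drop lines left empty or blank,
  # 3) trim leading spaces and tabs; POSIX sed, no GNU extensions.
  sed -e 's/<[^>]*>//g' \
      -e '/^[[:blank:]]*$/d' \
      -e 's/^[[:blank:]]*//' |
  # Each "-" reads the next stdin line, so lines 1+2, 3+4, ... are joined.
  paste -d' ' - -
}

printf '%s\n' \
  '<h3 class="single-item__title typo typo--skylark"><strong>Follow the Money</strong></h3>' \
  '<p class="single-item__subtitle typo typo--canary">Episode 1</p>' \
  '<h3 class="single-item__title typo typo--skylark"><strong>Follow Your Heart</strong></h3>' \
  '<p class="single-item__subtitle typo typo--canary">Episode 0</p>' |
  strip_and_pair
```

On this input it should print the same two lines as the Results above.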

Best wishes ... cheers, drl

PS:
Advice for forum posts, general:

To obtain the best answers quickly for processing datasets (extracting, transforming, filtering), you should, after having searched for answers (man pages, Google, etc.):

1. Post representative samples of your data (i.e. data that should "succeed" and data that should "fail")

2. Post what you expect the results to be, in addition to describing them. Be clear about how the results are to be obtained, e.g. "add field 2 from file1 to field 3 from file2", "delete all lines that contain 'possum'", etc.

3. Post what you have attempted to do so far. Post scripts, programs, etc. within CODE tags. If you have a specific question about an error, please post the shortest example of the code, script, etc. that exhibits the problem.

4. Place the data and expected output within CODE tags, so that they are more easily readable.

5. If you require the use of a specific shell or command, explain why that is the case: if you cannot solve a problem, it may be because you do not know about or enough about a software tool, in which case the responders are probably better judges of a solution than you are.

If you don't show us a representative sample of your input when you start, it should not be a surprise if the solutions we suggest work on responder-created input, possibly in a different format from yours, but not on your real data.

Special cases, exceptions, etc., are very important to include in the samples.
# 6  
03-26-2016
Quote:
Originally Posted by RudiC
With that sparse info given by you, this will do EXACTLY what you requested for EXACTLY the samples in post#1:
Code:
sed -n '1h;1!H;${x;s/<[^>]*>\|\n\|^ *//gp}' file
Follow the Money        Episode 1

Thank you for your time.

I tried the sed code above and it works,
but it outputs with no space between "Money" and "Episode",
like this:
Follow the MoneyEpisode 1

How do I get the space,
and save it like this?
Follow the Money Episode 1

Also, because there are many lines,
with your code above they all end up on the same line, side by side.
I want it to save like this:

Follow the Money Episode 1
Follow the Money Episode 2
Follow the Money Episode 3

Thank you again,
and sorry about being a bit thick, lol,
as this is all new to me.

Last edited by bob123; 03-26-2016 at 01:11 PM.
# 7  
03-26-2016
Try
Code:
sed 'N; s/<[^>]*>\|  \+//g; s/\n/ /g' file
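
Spelled out with one comment per command, the suggestion above works one pair of lines at a time, which is what puts each record on its own line. This is a sketch with a hypothetical function name, pair_titles; it needs GNU sed (\| and \+ are GNU extensions) and assumes the input strictly alternates title line, subtitle line. Note that the "  \+" part deletes any run of two or more spaces, including one inside a title.

```shell
#!/bin/sh
# pair_titles: hypothetical wrapper around the one-liner above.
# Requires GNU sed (\| and \+ are GNU extensions).
# Assumes input alternates: title line, subtitle line, title line, ...
pair_titles() {
  sed '
    # Append the next input line, so the pattern space holds one pair.
    N
    # Delete all tags and every run of two or more spaces (indentation).
    s/<[^>]*>\|  \+//g
    # Replace the newline between the two lines with a single space.
    s/\n/ /g'
}

printf '%s\n' \
  '        <h3 class="single-item__title typo typo--skylark"><strong>Follow the Money</strong></h3>' \
  '        <p class="single-item__subtitle typo typo--canary">Episode 1</p>' \
  '        <h3 class="single-item__title typo typo--skylark"><strong>Follow the Money</strong></h3>' \
  '        <p class="single-item__subtitle typo typo--canary">Episode 2</p>' |
  pair_titles
```

To save the result as post #1 asked, redirect standard output, e.g. pair_titles < file > titles.txt (both file names are placeholders). With an odd number of lines, GNU sed prints the last line unchanged, tags included, because N on the last line exits before the substitutions run.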

