Search for the word and exporting 35 characters after that word using shell script?


 
# 1  
Old 08-16-2012

I have a file, input.txt, which contains loads of weird characters, HTML tags and useful material. I want to write the 35 characters that follow the word description, excluding weird characters like $$#$#@$#@***$# and HTML tags, to a new file, output.txt. Please help me. Thanks in advance.

My final goal is to find each occurrence of the word description and print the 35 characters after it, not counting the HTML tags and weird characters. Is that possible? For example:
Code:
 description><p><img class="float_right"
 src="http://static3.businessinsider.com/image/502ab0036bb3f7147b00000f-400-300/dnu.jpg"
 border="0" alt="dnu" width="400" height="300" /></p><p>The lawn
 was filled with <a class="hidden_link"
 href="http://www.businessinsider.com/blackboard/goldman-sachs">Goldman
 Sachs</a> Group Inc. partners dressed in pink looking out

I want to start from "The lawn was filled with", skip the tags again, continue with the text that follows (Group Inc. partners ...) until 35 characters are reached, then stop and search for the next description!

Last edited by Franklin52; 08-16-2012 at 03:42 AM.. Reason: fixed code tags
# 2  
Old 08-16-2012
Please provide the desired output.
# 3  
Old 08-16-2012
To me, it looks like the &lt; (<) and &gt; (>) pairs don't match in your sample, so it's difficult to eliminate the HTML stuff consistently. Please confirm or revise your sample.
# 4  
Old 08-16-2012
Yes, the characters are not uniform. That's the sample. My sample output is:
"The lawn was filled with", then skip the tags again and continue with the text that follows (Group Inc. partners ...) until 35 characters are reached, then stop and search for the next description!

The script or command doesn't have to remove 100% of the HTML tags and weird characters, as they vary.
Thank you! This is what I thought of:
1) Search for the word description using grep.
2) Grab the 35 characters after description using sed, excluding weird characters and HTML tags.
3) Write the output to the file output.txt.

Is it possible? Please help me!
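
The three steps listed above can be sketched as one pipeline. This is only a sketch under assumptions that are not confirmed by the thread: each description sits on a single line, tags show up either as <...> or as &lt;...&gt;, and the "weird" characters are exactly $, #, @ and *. The function name extract_after_description is made up for illustration, and -E is used as the portable spelling of GNU sed's -r.

```shell
#!/bin/sh
# Sketch only. Assumptions (not from the thread): one description per line,
# tags written as <...> or &lt;...&gt;, weird characters limited to $ # @ *.
# extract_after_description is a hypothetical helper name.
#
# Step 1 (grep): keep only the lines that contain the marker.
# Step 2 (sed):  drop everything up to the marker, strip tags and weird
#                characters, then keep the first 35 characters of the rest.
extract_after_description() {
    grep 'description>' "$1" |
    sed -E \
        -e 's/.*description>//' \
        -e 's/&lt;[^&]*&gt;//g' \
        -e 's/<[^>]*>//g' \
        -e 's/[$#@*]//g' \
        -e 's/^(.{35}).*/\1/'
}

# Step 3: write the result to a new file, for example:
#   extract_after_description input.txt > output.txt
```

Lines shorter than 35 characters after cleaning are passed through unchanged, and stripping $ # @ * will also remove those characters from legitimate text, so the character list probably needs adjusting for real input.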
# 5  
Old 08-16-2012
This will work on exactly your sample. It cannot resolve crossing <'s and >'s, and it depends on your sed accepting the -r option (extended regex):
Code:
 sed  -r -n ':rep;N; $ !T rep; s/\n//g;s/ description>//; s/&lt;[^&]*&gt;//g; s/(.{35}).*/\1/ p'

yielding
Code:
The lawn was filled with Goldman Sa
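
For readers who find the one-liner above hard to follow, here is the same idea spelled out over several lines with comments. It is a sketch, not the poster's exact command: it assumes the tags are entity-encoded as &lt;...&gt; (as in the substitution above), it treats the whole input as one description, it uses a plain branch (b) in place of T, and the function name first35_joined is hypothetical.

```shell
#!/bin/sh
# Sketch only: join all input lines, then cut 35 characters after the marker.
# Assumes entity-encoded tags (&lt;...&gt;) and a single description in the input.
# first35_joined is a hypothetical helper name; it reads standard input.
first35_joined() {
    sed -E -n '
        # Collect the whole input in the pattern space.
        :rep
        $!{
            N
            b rep
        }
        # Join the lines by deleting the embedded newlines.
        s/\n//g
        # Drop everything up to and including the marker.
        s/.*description>//
        # Remove each &lt;...&gt; pair that has no "&" in between.
        s/&lt;[^&]*&gt;//g
        # Keep the first 35 characters and print them.
        s/^(.{35}).*/\1/p
    '
}

# Example: first35_joined < input.txt
```

Because everything is joined into one line first, this form only ever yields one result, which is why it fits the posted sample but not a file with many descriptions.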

# 6  
Old 08-16-2012
@RudiC: Hello, I ran the command, but I get only 15 characters after title. I have attached my source file. I want 35 characters after the word description, without HTML tags and weird characters. The word description occurs many times, so the output should be the 35 characters after every description. Should we use a loop for that? I think it's better to put the code in a script file.
# 7  
Old 08-16-2012
I knew your sample was NOT representing your input exactly! Anyway, try this:
Code:
sed -r 's/.*description>//g; s/&lt;[^&]*&gt;//g; s/(.{35}).*/\1/ ' input.txt

printing
Code:
The lawn was filled with Goldman Sa
The recall would cover almost all t
Fran&amp;ccedil;ois Hollande, still
More people in the world are overwe
Bloomberg TV just hosted a debate o
 Having failed to graduate from hig
When it comes to big data, "size do
A very successful entrepreneur who 
Unfortunately, there is no World Ba
 Deutsch LA released its first Targ
What if the generation that once ro
Official Chinese economic data have
Andy Grignon is always looking for 
Author Bob Sutton has posted on his
VIENNA (Reuters) - Scientists have 
Whether he's spending time with his
The Harvard Business Review has a f
Today a court in Miami refused a bo
Hedge fund Soros Funds has filed hi
If you want to understand the bigge
Most science journals put up multip
Legendary hedge fund manager John P
There has been a lot of noise about
Barcelona is a partygoer&amp;rsquo;
Hedge fund titan Bill Ackman, the f

from your input.txt file.
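
On the earlier question about a loop and a script file: sed already applies its commands to every input line, so no explicit loop should be needed. A minimal script-style sketch follows, assuming one description per line and entity-encoded tags as in the command above. The function name first_chars_after_description and the optional width argument are additions for illustration only.

```shell
#!/bin/sh
# Sketch only. Assumes one description per line and tags encoded as &lt;...&gt;.
# first_chars_after_description is a hypothetical helper name.
#   $1 = input file
#   $2 = number of characters to keep (defaults to 35)
first_chars_after_description() {
    width=${2:-35}
    # -n plus the /description>/ address: lines without the marker are skipped.
    sed -E -n "/description>/ {
        s/.*description>//
        s/&lt;[^&]*&gt;//g
        s/^(.{$width}).*/\\1/
        p
    }" "$1"
}

# Example: first_chars_after_description input.txt > output.txt
```

Entities left in the text, such as the &amp;ccedil; visible in the output above, still count toward the 35 characters in this sketch.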

Last edited by RudiC; 08-16-2012 at 08:57 AM.. Reason: omitted the NOT as first sample was quite misleading