Extracting Words from Text


 
# 1  
Old 05-19-2012

Hi there, Unix Gurus

Back in September last year you helped me find a way to extract the bracketed words from a text file into a new one.

In that case my text file was made up of sentences containing only one bracketed word per sentence/line:

1. If the boss's son had been [kidnapped], someone would have asked for money by now.
2. Look, I haven't [committed] a crime, so why can't you let me go?
....

But I am trying in vain to do the same, this time on a file full of different texts rather than single sentences.

...Many astronauts [have] travelled [in] space, but now, ordinary people [are] travelling [in] space too. Dennis Tito [is] over 60 years old, [but] he [hasn't] stopped working yet. In fact, [he] is very active, and [in] 2001, he [did] something amazing. He [became] the world's first [space] tourist. So ... [who] is Dennis Tito? Where [does] he come [from] ? How [did] he become [a] space [tourist] ? Tito [comes] from [the] United States. He was [born] in New York, but [he] has [been] [living] in California [for] many years. He [is] a very rich [and] successful [businessman]...

The following code only extracts the last bracketed word.

sed 's/\(.*\[\).*\(\].*\)/\1\2/g' inputfile > outputfile

As I asked back then, adding the blanked out bracketed words to a new file would be a bonus.

Any help infinitely appreciated.
# 2  
Old 05-19-2012
Code:
sed 's/][^]]*\[/ /g' infile | sed 's/.*\[\([^]]*\).*/\1/'
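
A minimal sketch of how this pipeline works, using a hypothetical sample line shortened from the text in post #1. The attempt in post #1 only touches the last bracket on a line because its leading `.*` is greedy and runs up to the final `[`. Here the first sed instead collapses everything between each `]` and the next `[` into a space, so only one bracket pair is left per line, and the second sed keeps what is inside it. The variable names `line`, `step1` and `words` are only for illustration.

```shell
# Hypothetical sample line, shortened from the text in post #1
line='Many astronauts [have] travelled [in] space, but now, ordinary people [are] travelling.'

# Step 1: replace everything from each "]" up to the next "[" with a single space
step1=$(printf '%s\n' "$line" | sed 's/][^]]*\[/ /g')

# Step 2: keep only what sits between the one remaining "[" and the first "]"
words=$(printf '%s\n' "$step1" | sed 's/.*\[\([^]]*\).*/\1/')

printf '%s\n' "$step1"   # Many astronauts [have in are] travelling.
printf '%s\n' "$words"   # have in are
```

Note that both steps work line by line, so the words from one input line end up space-separated on one output line.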

# 3  
Old 05-19-2012
An awk attempt:
Code:
$ awk -v RS='[' -v FS=']' '$2 {print $1}' file
have
in
are
in
is
but
hasn't
he
in
did
became
space
who
does
from
did
a
tourist
comes
the
born
he
been
living
for
is
and
businessman
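
A short sketch of why this works, on a hypothetical sample line. With `[` as the record separator, every record after the first starts with a bracketed word, and with `]` as the field separator that word is `$1`. The `$2` condition skips records where nothing follows the `]`, which also skips a word whose `]` is immediately followed by another `[`. The `NF > 1` variant is an assumption of mine, not something from this thread; it prints `$1` whenever the record contains a `]` at all.

```shell
# Hypothetical sample, shortened from the text in post #1, with two adjacent brackets added
text='He [became] the first [space] tourist. He has [been][living] in California.'

# Condition from the post: print $1 only when something non-empty follows the "]"
with_field2=$(printf '%s\n' "$text" | awk -v RS='[' -v FS=']' '$2 { print $1 }')

# Variant (an assumption, not from the thread): print whenever the record contains a "]"
with_nf=$(printf '%s\n' "$text" | awk -v RS='[' -v FS=']' 'NF > 1 { print $1 }')

printf '%s\n' "$with_field2"   # became, space, living (one per line; "been" is skipped)
printf '%s\n' "$with_nf"       # became, space, been, living (one per line)
```

On the text in post #1 both conditions should give the same list, since every `]` there is followed by at least a space.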

# 4  
Old 05-19-2012
close but...

Quite near,

yeah, both pieces of code list the bracketed words in the text, each in its own way, but I would also need the text with the brackets emptied, such as this:

... I met an old [ ] friend last week that I hadn't [ ] [ ] twenty [ ] . He [ ] me about what I [ ] doing and I [ ] him I was back [ ] England for [my] nephew's [ ] , but that I [ ] [ ] ...

Thanks guys!!

Last edited by eldeingles; 05-19-2012 at 02:27 PM..
# 5  
Old 05-19-2012
Sorry, did I misread your question?

Are you saying you want to blank out the [words] with _, while also storing those words in a new file?

First, run the awk from my earlier post and save its output:
Code:
awk -v RS='[' -v FS=']' '$2 {print $1}' file > newfile

Then do the sed:
Code:
sed -i "s/\[[^]]*/[ /g" file

cat file
...Many astronauts [ ] travelled [ ] space, but now, ordinary people [ ] travelling [ ] space too. Dennis Tito [ ] over 60 years old, [ ] he [ ] stopped working yet. In fact, [ ] is very active, and [ ] 2001, he [ ] something amazing. He [ ] the world's first [ ] tourist. So ... [ ] is Dennis Tito? Where [ ] he come [ ] ? How [ ] he become [ ] space [ ] ? Tito [ ] from [ ] United States. He was [ ] in New York, but [ ] has [ ] [ ] in California [ ] many years. He [ ] a very rich [ ] successful [ ]
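
The two steps can be sketched end to end as below, on a hypothetical one-line sample in a temporary directory. The order matters: the words have to be saved before the brackets are emptied. This sketch writes the blanked text to a separate file instead of using `sed -i`, which is a choice of mine so that the original stays untouched; the names `workdir`, `newfile` and `blanked` are only for illustration.

```shell
# Work in a throwaway directory with a hypothetical sample file
workdir=$(mktemp -d)
printf '%s\n' 'Tito [comes] from [the] United States. He was [born] in New York.' > "$workdir/file"

# 1. Save the bracketed words, one per line, before touching the text
awk -v RS='[' -v FS=']' '$2 { print $1 }' "$workdir/file" > "$workdir/newfile"

# 2. Replace "[word" with "[ "; the closing "]" is left in place, giving "[ ]"
sed 's/\[[^]]*/[ /g' "$workdir/file" > "$workdir/blanked"

cat "$workdir/newfile"   # comes, the, born (one per line)
cat "$workdir/blanked"   # Tito [ ] from [ ] United States. He was [ ] in New York.
```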

# 6  
Old 05-19-2012
Yes, Scott!
# 7  
Old 05-19-2012
We're posting across each other!

I think my previous post fits what you describe.