Extracting only words from a log file


 
# 1  
Old 08-14-2010

Hello,

I have a file and I am trying to extract only the unique words from it.

I used the command: cat messages.1 | tr " " "\n" | sort | uniq -c

But this command outputs every unique token in the file, whether words, numbers, or any other characters. I need a command which will output only words, i.e. tokens starting with a-z or A-Z.

Can you please help me with this? Thanks.
# 2  
Old 08-15-2010
Sed seems the best choice, then pipe output to sort -u.

This is split onto multiple lines so that comments can be added. There also might be an easier and/or cleaner way, but this does work.

Code:
sed -r '
s/<//g;                         # ditch existing < and > from input
s/>//g;
s/[ \t]([a-zA-Z])/ <\1/g;       # mark start of word with <
s/^[a-zA-Z]/<&/;              # special case to mark first word on line
s/<[^ \t]*/&> /g;             # mark end of words with >
s/^[^<]*</</;                   # delete all up to the first <
s/ //g;                         # delete blanks
s/>[^<]*/>/g;                   # delete non-words (things between > and <)
s/>//g;                         # delete all >
s/^<//;                        # ditch leading <
s/</\n/g;                       # split words with newlines
' |sort -u

If you're running on a BSD-based system, then you might need to change the -r option to -E. Certainly -E is needed if you are using sed from AT&T's AST distribution.
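
If you are not sure which option your sed wants, one way to find out is to probe it. This is only a rough sketch: the variable names ere and result are made up for illustration, and it assumes that a sed which rejects -E will accept -r.

```shell
# Probe for extended-regex support: try -E first, fall back to -r.
# "ere" is a hypothetical variable name used only in this sketch.
if echo x | sed -E 's/(x)/\1/' >/dev/null 2>&1; then
    ere=-E
else
    ere=-r   # assumption: a sed without -E understands -r
fi

# Quick check that the chosen option really enables + as a quantifier
result=$(printf 'ab12\n' | sed "$ere" 's/[0-9]+//')
echo "using sed $ere, test output: $result"
```

The chosen option can then be substituted for -r in the command above.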

This treats all tokens delimited by spaces/tabs that begin with an alpha character as a word. So something like Jul09 is also printed.
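
If only purely alphabetic words are wanted, a shorter pipeline might be enough. A minimal sketch, assuming a grep that supports -o (GNU and BSD grep do, but it is not in POSIX) and using a made-up sample file in place of messages.1. Note that it behaves differently from the sed above: it pulls runs of letters out of any token, so Jul09 gives Jul and eth0 gives eth.

```shell
# Hypothetical sample input standing in for messages.1
sample=$(mktemp)
printf '%s\n' 'Jul09 10:15:01 host sshd[123]: Accepted password' \
              'Jul09 10:15:02 host kernel: eth0 link up' > "$sample"

# -o prints every match on its own line, -E allows + as a quantifier;
# sort -u then drops duplicates (LC_ALL=C makes the order predictable)
words=$(grep -oE '[A-Za-z]+' "$sample" | LC_ALL=C sort -u)
printf '%s\n' "$words"

rm -f "$sample"
```

To count occurrences as in the original attempt, sort | uniq -c can be used in place of sort -u.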
# 3  
Old 08-15-2010
Thanks for the reply, but I cannot get it to work. I am using Ubuntu.
# 4  
Old 08-15-2010
I failed to mention that I was running the sed as a part of a script and redirecting the file to be searched to the script. Hence, no filename was given on the command. I hope that didn't trip you up. Add the input filename between the closing single quote and the pipe to sort if you haven't.
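
To illustrate the two ways of feeding sed its input, here is a small sketch with a made-up sample file and a trivial programme (the names f, prog, out_arg and out_stdin are just for this example). Both forms should give the same result:

```shell
# Hypothetical sample file standing in for messages.1
f=$(mktemp)
printf '%s\n' 'abc123' 'def456' > "$f"

prog='s/[0-9]//g'   # trivial sed programme: delete all digits

# Form 1: file name given as an argument, after the programme
out_arg=$(sed "$prog" "$f")

# Form 2: no file name; sed reads the redirected standard input
out_stdin=$(sed "$prog" < "$f")

echo "$out_arg"
echo "$out_stdin"
rm -f "$f"
```

Either way, the file name has to be outside the single quotes that enclose the programme.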

If you've done that and you are still having issues, can you post the sed command you used (to check for typos etc.), the first few lines of the input file, and the first few lines of output? Then someone might be able to help. I really cannot offer anything more without a bit more detail about what is happening.

Last edited by agama; 08-15-2010 at 04:33 PM.. Reason: fix typo
# 5  
Old 08-15-2010
Thanks for the response. Below is the command I'm trying to use.

Code:
sed -r 'messages.1 
s/<//g;
s/>//g;
s/[ \t]([a-zA-Z])/ <\1/g;
s/^[a-zA-Z]/<&/;
s/<[^ \t]*/&> /g;
s/^[^<]*</</;
s/ //g;
s/>[^<]*/>/g;
s/>//g;
s/^<//;
s/</\n/g;
' |sort -u


Last edited by Scott; 08-16-2010 at 02:25 AM.. Reason: Code tags, please...
# 6  
Old 08-15-2010
I think I see what's happening. Your input file name is inside the sed "programme", so sed tries to run it as a command. Try this small change (move the file name after the programme):

Code:
sed -r ' 
s/<//g;
s/>//g;
s/[ \t]([a-zA-Z])/ <\1/g;
s/^[a-zA-Z]/<&/;
s/<[^ \t]*/&> /g;
s/^[^<]*</</;
s/ //g;
s/>[^<]*/>/g;
s/>//g;
s/^<//;
s/</\n/g;
' messages.1 |sort -u

# 7  
Old 08-15-2010
Agama, thanks a lot. You are the man!

Yes, this code is working now. You are totally cool. Thanks for your help.
 