Text analysis


 
# 22  
04-05-2011
Hey,

Thanks for your post,

I took your example code and tried it, but I am getting 'unrecognized history modifier'. I am running the command on the previously attached file 'walle.txt'.
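A sketch of one way around the error, assuming (the example code is not shown in this excerpt) that it comes from history expansion: interactive csh and tcsh, and bash by default, treat `!` as a history reference even inside double quotes. Keeping the sed commands in a separate script file and running them with `sed -f` means the shell never parses them at all. The file name `punct.sed` and the function name `strip_punct` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: write the sed commands to punct.sed with a quoted
# here-document ('EOF'), so the shell does no expansion on the "!" or the quotes.
cat >punct.sed <<'EOF'
s/["',.;?!:]/ /g
s/  */ /g
EOF

# strip_punct [FILE...]: replace the listed punctuation with spaces, then
# squeeze each run of spaces down to one. Reads stdin when no file is given.
strip_punct() {
    sed -f punct.sed "$@"
}
```

For example: `strip_punct walle.txt | xargs -n1 | sort | uniq -c`.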

---------- Post updated at 02:47 PM ---------- Previous update was at 02:16 PM ----------

Hey guys,

So, I have the following code:

Code:
sed "s/'/ /g" walle.txt | xargs -n1 | sort | uniq -c

This counts how often each word occurs in the text file 'walle.txt', but some of the output still contains punctuation or several words joined together, e.g. A.?/ or A/B.
How would I combine it with

Code:
sed "s/[\"',.;?!:]/ /g;s/  */ /g" walle.txt

which removes the punctuation, so that I still get the word frequencies?
Is this possible?
Can somebody help me please?

Kind regards
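A minimal sketch of the combination asked about here: strip the punctuation with sed first, then put one word per line and count. It swaps `xargs -n1` for `tr -s ' ' '\n'`, which avoids xargs' special handling of quotes and backslashes. It also adds `/` to the character class so that "A/B" splits into two words (an assumption about what is wanted), which is why `|` is used as the `s` delimiter. The function name `word_freq` is hypothetical.

```shell
#!/bin/sh
# word_freq [FILE...]: count how often each word occurs, ignoring punctuation.
# Hypothetical helper; the punctuation list is the one from the question plus '/'.
word_freq() {
    # '\'' inside the single-quoted script inserts a literal single quote.
    sed 's|["'\''/,.;?!:]| |g' "$@" |
        tr -s ' ' '\n' |   # spaces -> newlines, squeezing repeats: one word per line
        grep -v '^$' |     # drop the empty line left by a leading space
        sort |
        uniq -c            # prefix each distinct word with its count
}
```

Usage: `word_freq walle.txt`. Note that `uniq -c` is case-sensitive, so "The" and "the" are still counted separately.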
# 23  
04-05-2011
Why don't you just paste the text into Word Frequency Counter?
Anyway, here are the results:

Last edited by AlphaLexman; 04-05-2011 at 07:18 PM.
# 24  
04-05-2011
Quote:
Originally Posted by AlphaLexman
Why don't you just paste the text into: Word Frequency Counter
Anyway, here is the results:
Hmmm... a very interesting, unorthodox approach :)
# 25  
04-06-2011
Code:
strings walle.txt | head -n 5521 | tail -n +91 | sed "s/[][^\"'(\`),=\.;?!:#%+-]/ /g;s/  */ /g" | tr ' [:upper:]' '\n[:lower:]' | grep -v '^[0-9]*$' | sort | uniq -c >wordcount.txt
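For comparison, a shorter sketch that swaps the explicit punctuation list for the classic `tr -cs` idiom: every run of characters that are not letters becomes a single newline. This is a different technique from the sed above, and it is an assumption that it suits this text: digits are dropped and apostrophes split words, so "it's" counts as "it" and "s". The `strings | head | tail` step is left out because its line numbers depend on walle.txt; `word_count` is a hypothetical name.

```shell
#!/bin/sh
# word_count [FILE...]: print "count word" lines, most frequent word first.
word_count() {
    cat -- "$@" |
        tr '[:upper:]' '[:lower:]' |   # fold case so "The" and "the" are one word
        tr -cs '[:alpha:]' '[\n*]' |   # each run of non-letters -> one newline
        grep -v '^$' |                 # drop the empty line a leading non-letter leaves
        sort |
        uniq -c |
        sort -k1,1nr -k2,2             # highest count first, ties alphabetically
}
```

Usage: `word_count walle.txt | head -n 20` shows the twenty most frequent words.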

---------- Post updated at 11:23 AM ---------- Previous update was at 09:23 AM ----------

Check the intermediate file produced at each step and see whether the result fits your needs.

Code:
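strings walle.txt | head -n 5521 | tail -n +91 >f1       # keep lines 91-5521 of the extracted text
sed "s/[][^\"'(\`),=\.;?!:#%+-]/ /g;s/  */ /g" f1 >f2   # punctuation -> space, squeeze spaces
tr ' [:upper:]' '\n[:lower:]' <f2 >f3                    # one word per line, lowercase
grep -v '^[0-9]*$' f3 | sort >f4                         # drop empty and all-digit lines, sort
uniq -c f4 >f5                                           # count each distinct word
sort -k 1n f5 >f6                                        # order by count, lowest first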
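If you would rather keep a single pipeline but still inspect each stage, a sketch using `tee` can save the intermediate files as the data flows through. It assumes the input has already been cut down to the text lines (the `strings | head | tail` step, whose line numbers depend on walle.txt); the function name `count_with_snapshots` and its directory argument are hypothetical. The sed script is single-quoted so that an interactive bash does not attempt history expansion on `!`, and the `tr` step is split into two simpler calls.

```shell
#!/bin/sh
# count_with_snapshots DIR [FILE...]: same cleanup and counting steps as above,
# but in one pipeline; tee writes each stage to DIR/f2 ... DIR/f6.
count_with_snapshots() {
    dir=$1
    shift
    # Punctuation -> space, then squeeze runs of spaces ('\'' is a literal ').
    sed 's/[][^"'\''(`),=.;?!:#%+-]/ /g;s/  */ /g' "$@" | tee "$dir/f2" |
        tr ' ' '\n' | tr '[:upper:]' '[:lower:]' | tee "$dir/f3" |
        grep -v '^[0-9]*$' | sort | tee "$dir/f4" |
        uniq -c | tee "$dir/f5" |
        sort -k 1n >"$dir/f6"
}
```

Usage: `mkdir steps && count_with_snapshots steps f1`, then look at `steps/f2` through `steps/f6`.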

Or, reusing only two temporary files:

Code:
strings walle.txt | head -n 5521 | tail -n +91 >f1
sed "s/[][^\"'(\`),=\.;?!:#%+-]/ /g;s/  */ /g" f1 >f2
tr ' [:upper:]' '\n[:lower:]' <f2 >f1
grep -v '^[0-9]*$' f1 | sort >f2
uniq -c f2 >f1
sort -k 1n f1 >f2
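The two-file variant writes f1 and f2 in the current directory and leaves them behind. A sketch of the same idea with private temporary files, assuming `mktemp` is available on your system, removes them when it finishes and prints the result on standard output. The function name `wordfreq_tmp` is hypothetical, and the `strings | head | tail` step is left to the caller because its line numbers are specific to walle.txt.

```shell
#!/bin/sh
# wordfreq_tmp [FILE...]: print the sorted counts on stdout, using two
# temporary files that are created with mktemp and removed afterwards.
wordfreq_tmp() {
    t1=$(mktemp) || return 1
    t2=$(mktemp) || { rm -f "$t1"; return 1; }
    # Same steps as the two-file version, alternating between t1 and t2.
    sed 's/[][^"'\''(`),=.;?!:#%+-]/ /g;s/  */ /g' "$@" >"$t1"
    tr ' ' '\n' <"$t1" | tr '[:upper:]' '[:lower:]' >"$t2"
    grep -v '^[0-9]*$' "$t2" | sort >"$t1"
    uniq -c "$t1" >"$t2"
    sort -k 1n "$t2"
    status=$?
    rm -f "$t1" "$t2"
    return "$status"
}
```

Usage: `strings walle.txt | head -n 5521 | tail -n +91 | wordfreq_tmp >wordcount.txt`.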


Last edited by ctsgnb; 04-06-2011 at 06:31 AM.
 
