Find redundant text in a file


 
# 1  
Old 12-03-2011

I want to find which words or strings occur more than once in a file so that I can remove the redundancy.

For example:

If I have the sentence:

A quick brown brown fox jumps jumps jumps over the lazy dog

in a file, then I want to know that

1. the word "brown" occurs 2 times.
2. the word "jumps" occurs 3 times.

in the above mentioned sentence.

Note that I don't know in advance which words are repeated, so I cannot search for a fixed pattern.

I just need to know which words or strings are redundant in a file. Is that possible?

Thanks.
# 2  
Old 12-03-2011
Try:
Code:
perl -0ne 'while (/\b(\w+)(?:\s+\1\b)+/g){@x=split /\s+/,$&;print "$x[0]: " . ($#x+1) . " times\n"}' file

# 3  
Old 12-03-2011
Sorry, I didn't get any output!

Suppose I have a file called test.sh

cat test.sh

gives

abc dfg
ecd xkl mno
abc
dfg asj kllll
jkl p
dfg
o

As you can see, 'abc' is repeated on the 1st and 3rd lines.

'dfg' is repeated on the 1st, 4th, and 6th lines.

I would expect 'abc' and 'dfg' to be printed on the screen, with the corresponding lines highlighted or something similar.

I have attached the sample file.

Thanks.

Code:
abc   dfg
ecd  xkl mno
abc  
dfg  asj kllll 
jkl  p
dfg
o


Last edited by Scott; 12-03-2011 at 11:40 AM.. Reason: Please don't attach 59-byte files. No-one wants that in their Downloads folder.
# 4  
Old 12-03-2011
I thought you needed only consecutive repetitions. Try this:
Code:
perl -ne 'while (/\w+/g){$c{$&}++};END{for $i (keys %c){print "$i: $c{$i}\n" if $c{$i}>1}}' file
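
The one-liner above reports counts only. Since the question also asks which lines the repeated words are on, here is a minimal sketch that records line numbers as well. The function name `repeated_words_with_lines` is hypothetical, and it splits on whitespace (awk fields) rather than on `\w+` as the Perl version does, so treat it as an illustration rather than a drop-in replacement.

```shell
# Hypothetical helper: list every whitespace-separated word that appears
# more than once, with its count and the line numbers it appears on.
repeated_words_with_lines() {
    awk '
    {
        for (i = 1; i <= NF; i++) {
            count[$i]++
            # Append the current line number to this word'"'"'s list.
            lines[$i] = (lines[$i] == "" ? NR : lines[$i] "," NR)
        }
    }
    END {
        for (w in count)
            if (count[w] > 1)
                printf "%s: %d times (lines %s)\n", w, count[w], lines[w]
    }' "$@" | sort
}
```

Called as `repeated_words_with_lines test.sh` on the sample file from post #3, this should print `abc: 2 times (lines 1,3)` and `dfg: 3 times (lines 1,4,6)`. A word that occurs twice on the same line gets that line number listed twice.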

# 5  
Old 12-03-2011
Thanks. What if the file contains names like this:

Bat:Ball

Bat:Wicket

Bat:Ball

Bat:Bat

Wicket:Bat

I want "Bat:Ball" to be printed, not "Bat" or "Ball" individually.

Thanks.
# 6  
Old 12-04-2011
Could someone please reply? This is quite important to me. Thanks.
# 7  
Old 12-04-2011
Try this...
Code:
awk '{for(i=1;i<=NF;i++){a[$i]++}}END{for(i in a){if(a[i]>1){print i,a[i]}}}' input_file

--ahamed
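
The Perl `\w+` pattern stops at the colon, which is why it counts "Bat" and "Ball" separately, whereas awk splits fields on whitespace and so keeps "Bat:Ball" together. If each entry sits on its own line, another option is to compare whole lines. The sketch below assumes that layout; the function name `repeated_lines` is hypothetical.

```shell
# Hypothetical helper: print each non-blank line that occurs more than once,
# preceded by its count. Whole lines are compared, so "Bat:Ball" is never
# split into "Bat" and "Ball".
repeated_lines() {
    awk '
    NF {
        $1 = $1        # rebuild $0 so stray leading/trailing spaces are dropped
        seen[$0]++
    }
    END {
        for (l in seen)
            if (seen[l] > 1)
                print seen[l], l
    }' "$@" | sort
}
```

For the five names in post #5, `repeated_lines input_file` should print `2 Bat:Ball` and nothing else, because every other entry occurs only once.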