Finding consecutive same words in a file


 
# 1
07-20-2011

Hi All,

I tried this but I am having trouble formulating this:
I have a file that looks like this (this is a sample file words can be different):

Code:
network
router
frame
network
router
computer
card
host
computer
card

In this file, "network" followed by "router" occurs twice, and so does "computer" followed by "card". Taking two adjacent words at a time, I want to print, for each pair, how many times that pair has occurred so far in the file. I expect my output to look like this:
Code:
1
1
1
2
1
1
1
1
2

The output above means that the first pair, "network" and "router", has occurred 1 time so far. Next, "router" and "frame" occur 1 time, and then "frame" and "network" occur 1 time. Then we encounter "network" and "router" again, so their count becomes 2. The same continues for the rest of the file.
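The last running count for each pair is its total, which is why "network"/"router" and "computer"/"card" both finish at 2. If only those totals are needed, one approach (a sketch, not something posted in the thread) is to have awk print each adjacent pair and let sort and uniq -c tally them. pair_totals is a hypothetical helper name, and one word per line is assumed:

```bash
# pair_totals FILE: print each adjacent pair of lines once, prefixed by the
# total number of times it occurs in FILE (hypothetical helper).
pair_totals() {
  # Emit "previous current" for every line after the first, then count duplicates.
  awk 'NR > 1 { print prev, $0 } { prev = $0 }' "$1" | sort | uniq -c
}
```

For the sample file this should list "network router" and "computer card" with a count of 2 and the other five pairs with 1; the exact padding in front of each count depends on the uniq implementation.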

This is what I tried, but the problem is that I have to supply the words manually. Moreover, I have more than one file, all with a .dat extension; as you can see, I read the .dat files and store the results in .txt files:

Code:
for page in *.dat
do
  # The two words are supplied by hand in $network and $router.
  grep "${network}[[:blank:]]*${router}" "$page" > "$page.txt"
done


I am using Bash on Linux.
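grep matches one line at a time, and each word sits on its own line, so a pattern like "$network[[:blank:]]*$router" never sees both words of a pair together. If you still want to supply a specific pair by hand, one workaround (a sketch, not something posted in the thread) is to pair every line with the line after it before grepping. count_pair is a hypothetical helper name; it relies only on tail, paste and grep:

```bash
# count_pair FILE FIRST SECOND: print how many times a line equal to FIRST is
# immediately followed by a line equal to SECOND in FILE (hypothetical helper).
count_pair() {
  tab=$(printf '\t')
  # tail drops the first line; paste joins line i of FILE with line i of that
  # stream (i.e. line i+1 of FILE), separated by a tab.
  # grep -F (fixed string) -x (whole line) -c (count) tallies the exact pair.
  tail -n +2 "$1" | paste "$1" - | grep -Fxc -- "$2$tab$3"
}
```

For example, `for page in *.dat; do count_pair "$page" network router > "$page.txt"; done` would write one total per file. This still requires naming the pair, so it only addresses the grep part of the problem.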
# 2  
07-20-2011
Something like this?
Code:
for page in *.dat
do
  # Print the running count of each line (word) in the file.
  awk '{print ++a[$0]}' "$page" > "$page.txt"
done
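This counts every word on its own (++a[$0] is the running count of the current line), so on the sample file it prints ten numbers rather than the nine expected. A quick check using the sample words from the question:

```bash
# Feed the sample words to the same awk program; each output line is the
# running count of the word on the corresponding input line.
printf '%s\n' network router frame network router computer card host computer card |
  awk '{ print ++a[$0] }'
```

This should print 1, 1, 1, 2, 2, 1, 1, 1, 2, 2 (one per line). The fifth number is 2 because "router" was seen before, whereas the expected output counts the pair "router"/"computer" at that point and shows 1.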

A user thanked Franklin52 for this post.
# 3  
07-20-2011
Thanks. But I am taking two words at a time rather than one. That's why the original file has 10 words but the result file has 9 numbers. This is how it should actually work:
1. Take "network" and "router" together. The count of "network" and "router" is now 1.
2. Move to "router" and "frame". The count is again 1, because this pair has not been seen before (the only earlier pair was "network" and "router").
3. Take "frame" and "network". Again the count is 1.
4. Take "network" and "router". The count of "network" and "router" is now 2, because I have already seen this pair.
5. Continue in this way until the end of the file.
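The steps above can be traced by printing each pair next to its running count. A minimal sketch, assuming one word per line; pair_trace is a hypothetical helper name, and joining the two words with a space assumes the words themselves contain no spaces:

```bash
# pair_trace FILE: for every line after the first, print the previous line,
# the current line, and how many times that pair has been seen so far.
pair_trace() {
  awk 'NR > 1 { key = prev " " $0; print key, ++seen[key] } { prev = $0 }' "$1"
}
```

On the sample file, the fourth line of output should read "network router 2" (step 4) and the last should read "computer card 2". Printing only the count instead of the key and the count gives the nine-number output from the question.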
# 4  
07-20-2011
Code:
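for page in *.dat
do
  # For each line after the first, key on the pair (previous line, current line)
  # and print how many times that pair has been seen so far in this file.
  awk 'NR > 1 { print ++count[prev, $0] } { prev = $0 }' "$page" > "$page.txt"
done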

A user thanked Franklin52 for this post.