Grepping frequency for a list of phrases in a separate file


 
# 1  
Old 02-19-2014

Dear all, I have a large set of data that I would like to summarize to get a better sense of it.
The problem is this:
I have more than 200 files in a directory, each related to one person and containing sentences with specific elements.
e.g. mark (one file in the directory)
Code:
I like reading books
I have five books
I love flowers
I am not allergic to flowers (list.txt)
...

Then I have a file with 200 or more phrases, such as:
Code:
books
flowers
house pet
cooking skills
...

Now I want to create a file that looks like this:
Code:
.......|books|flowers|house pets| ...
mark |   2   |   2     | 0      |....
john  |   5   |   0     |  2     |...

Can someone help me, please?

I have tried this:
Code:
mkdir result
FFILES="People/*"
for U in $FFILES
do
	docU=$(basename $U)
	docpathU=$(dirname $U)
	grep  -o -f accept-list.txt $U | sort | uniq -c | awk 'BEGIN{FS=" ";}{print $2,","$1}' > result/${docU}
done

But this has a few problems I don't know how to address:
1. It does not work for phrases such as "house pet", and many lines in my list are phrases, so I need the frequency count per line of the list, not per word.
2. I don't know how to summarize everything into the desired structure, which is more combined and horizontal.

Many thanks for the help
A-V
# 2  
Old 02-19-2014
Hi, I made some modifications to your code.
It may now work as you want, though not all of the columns may line up neatly when printed.
And I must go to sleep now~
Good night.
Code:
mkdir -p result
FFILES="People/*"
# Header row: "PeopleName" followed by every phrase, separated by "|"
awk 'BEGIN{printf "PeopleName"}{printf "|%s", $0}END{printf "\n"}' accept-list.txt >result/total_result #added
for U in $FFILES
do
 docU=$(basename "$U")
 docpathU=$(dirname "$U")
# grep  -o -f accept-list.txt  $U | sort | uniq -c | awk 'BEGIN{FS=" ";}{print $2,","$1}' > result/${docU} #comment
 # Move the count to the end so multi-word phrases stay intact: "phrase,count"
 grep -o -f accept-list.txt "$U"|sort|uniq -c|awk '{s=$1;$1="";print $0","s}'|sed 's/^[[:blank:]]*//' > "result/${docU}" #modified
 # One row per person: look up each phrase in list order, print a tab where there is no match
 awk -F',' -v n="${docU}" 'NR==FNR{a[c++]=$0;next}{b[$1]=$2}END{printf "%s", n;for(i=0;i<c;i++){if(a[i] in b){printf "|%s", b[a[i]]}else{printf "|\t"}}printf "\n"}' accept-list.txt "result/${docU}">>result/total_result #added
done
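
The step marked `#modified` is what makes multi-word phrases such as "house pet" survive. A minimal sketch of the difference, using one made-up line in the shape that `uniq -c` prints (count first, then the matched text):

```shell
#!/bin/sh
# Hypothetical sample line as produced by "sort | uniq -c".
sample='      2 house pet'

# Original awk step: $2 is only the first word of the phrase.
old=$(printf '%s\n' "$sample" | awk 'BEGIN{FS=" ";}{print $2,","$1}')

# Modified step: blank out the count field, keep the rest of the line,
# append the count, then strip the leading blank left behind.
new=$(printf '%s\n' "$sample" | awk '{s=$1;$1="";print $0","s}' | sed 's/^[[:blank:]]*//')

printf 'old: %s\nnew: %s\n' "$old" "$new"
```

The original step keeps only "house", while the modified step keeps the whole phrase followed by its count.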

# 3  
Old 02-19-2014
An awk approach:
Code:
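awk '
        # First file (the phrase list): remember every phrase as a key of P
        NR == FNR {
                P[$0]
                next
        }
        # Every other file: one file per person
        {
                # Remember the file name so it gets a row even with no matches
                if ( ! ( FILENAME in F ) )
                        F[FILENAME]

                # Count this line once for each phrase it matches
                # (k is used as a regular expression here)
                for ( k in P )
                {
                        if ( $0 ~ k )
                        {
                                R[FILENAME FS k]++
                        }
                }
        }
        END {
                # Header row: one column per phrase
                printf "\t"
                for ( k in P )
                        printf "%s\t", k
                printf "\n"

                # One row per file: the count for each phrase, 0 if none
                for ( j in F )
                {
                        printf "%s\t", j
                        for ( k in P )
                                printf "%s\t\t", (R[j FS k] ? R[j FS k] : 0)
                        printf "\n"
                }
        }
' phrases mark john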

Input:
Code:
$ cat phrases
books
flowers
house pet
cooking skills

$ cat mark
I like reading books
I have five books
I love flowers
I am not allergic to flowers (list.txt)

$ cat john
I love house pets
I collect books

Output:
Code:
        books   cooking skills  house pet       flowers
john    1               0               1               0
mark    2               0               0               2
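
Two details of the approach above may matter on real data: `$0 ~ k` counts a line once even if the phrase occurs in it twice, and it treats each phrase as a regular expression. The order of `for ( k in P )` is also unspecified in awk, so the columns do not follow the phrase file. The following is only a sketch of one alternative, assuming literal substring matching with `index()` is what is wanted; the temporary directory, the file names and john's second sentence are made up for the demo:

```shell
#!/bin/sh
# Hypothetical demo data; the real inputs are a phrase list and People/*.
tmp=$(mktemp -d)
mkdir "$tmp/People"
printf '%s\n' 'books' 'flowers' 'house pet' 'cooking skills' > "$tmp/phrases"
printf '%s\n' 'I like reading books' 'I have five books' 'I love flowers' \
    'I am not allergic to flowers' > "$tmp/People/mark"
printf '%s\n' 'I love house pets' 'I collect books and more books' > "$tmp/People/john"

table=$(awk '
    # First file: store non-empty phrases in the order they are listed.
    NR == FNR { if ($0 != "") phrase[++n] = $0; next }
    # First line of each later file: record the person (file name without path).
    FNR == 1 { name = FILENAME; sub(/.*\//, "", name); person[++m] = name }
    {
        for (i = 1; i <= n; i++) {
            rest = $0
            # index() searches for a literal substring; step past each hit.
            while ((pos = index(rest, phrase[i])) > 0) {
                count[m, i]++
                rest = substr(rest, pos + length(phrase[i]))
            }
        }
    }
    END {
        # Header once, then one row per person, columns in phrase-file order.
        printf "name"
        for (i = 1; i <= n; i++) printf ",%s", phrase[i]
        printf "\n"
        for (p = 1; p <= m; p++) {
            printf "%s", person[p]
            for (i = 1; i <= n; i++) printf ",%d", count[p, i]
            printf "\n"
        }
    }
' "$tmp/phrases" "$tmp/People"/*)
printf '%s\n' "$table"
rm -rf "$tmp"
```

With this demo data john's row reads `john,2,0,1,0`, because "books" occurs twice in one line. Empty per-person files get no row in this sketch, and a phrase that itself contains a comma would need quoting before the output is valid CSV.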

# 4  
Old 02-19-2014
Lucas, thanks a lot for the help ... I managed to make it work even without the first line, and it seems to be OK.
I replaced "|" and "|\t" with "," and "0" and saved the result as a CSV file.
Now I am only wondering whether I can have the phrases as the header of the file?
Cheers
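
In the code from post #2, the line that starts with `awk 'BEGIN{printf "PeopleName"}` appears to be the one that writes the header row, so leaving it out would also leave out the header. As a rough sketch of a comma-separated version, assuming the phrase list has one phrase per line and no phrase contains a comma (the temporary file is only for the demo):

```shell
#!/bin/sh
# Hypothetical demo data; in the thread the phrase list is accept-list.txt.
tmp=$(mktemp -d)
printf '%s\n' 'books' 'flowers' 'house pet' 'cooking skills' > "$tmp/accept-list.txt"

# Join all phrases with commas and prefix a label for the name column.
header=$(printf 'PeopleName,'; paste -s -d ',' "$tmp/accept-list.txt")
printf '%s\n' "$header"
rm -rf "$tmp"
```

Writing this line to the CSV once, before the loop appends the per-person rows, keeps the columns in the same order as the phrase list.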

---------- Post updated at 01:36 PM ---------- Previous update was at 12:43 PM ----------

Dear Yoda,
there are a few things I am not sure how to handle; the code is a bit complicated for me to understand

1. Can I remove the count of words that comes right after the person's name, and instead have the count of the words in that row at the end of it?
2. I can feed the results into one file, but then the headers keep reappearing:
Code:
' accept-all.txt $U 
done > result-here.csv

3. Otherwise I can create a loop around it and write one file per person:
Code:
' accept-all.txt $U > result/${docU}
done

I have replaced the tabs with ", " and made a CSV out of it.

Last edited by A-V; 02-19-2014 at 03:40 PM..
# 5  
Old 02-19-2014
The program will not work if you feed one input file at a time using a for loop.

You could pass them all at once:
Code:
awk '
        -- code --
' accept-all.txt People/*
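
The reappearing headers most likely come from the `END` block, which awk runs once per invocation: inside a `for` loop there is one awk process, and therefore one header, per file. A small sketch of that behaviour, with two made-up files and a placeholder header:

```shell
#!/bin/sh
# Hypothetical demo files standing in for two per-person files.
tmp=$(mktemp -d)
printf 'x\n' > "$tmp/a"
printf 'y\n' > "$tmp/b"

# One awk process per file: the END block (and its header) runs twice.
looped=$(for f in "$tmp/a" "$tmp/b"; do awk 'END { print "HEADER" }' "$f"; done)

# One awk process for all files: the END block runs once.
single=$(awk 'END { print "HEADER" }' "$tmp/a" "$tmp/b")

printf 'looped:\n%s\nsingle:\n%s\n' "$looped" "$single"
rm -rf "$tmp"
```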

# 6  
Old 02-19-2014
That's what I did in the end,
but I still have not managed to figure out how to delete the count number next to the name, or how to add the overall frequency count of the matched words in the row at the end.
# 7  
Old 02-19-2014
I'm sorry, I didn't get what you are asking. Post what you got and what is expected.
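
The request was never clarified in the thread. One possible reading of the second part is a total column at the end of each row. A sketch under that assumption, run on a made-up CSV in the shape discussed above (name first, then one count per phrase):

```shell
#!/bin/sh
# Hypothetical CSV: a header row, then name followed by one count per phrase.
counts='name,books,flowers,house pet,cooking skills
mark,2,2,0,0
john,1,0,1,0'

with_total=$(printf '%s\n' "$counts" | awk -F, -v OFS=, '
    NR == 1 { print $0, "total"; next }       # extend the header row
    {
        t = 0
        for (i = 2; i <= NF; i++) t += $i     # sum every count column, skipping the name
        print $0, t
    }
')
printf '%s\n' "$with_total"
```

Here mark's row becomes `mark,2,2,0,0,4`. The column name `total` is only a placeholder.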