Indexing or Filtering code- Pattern Search by comparing two files


 
# 1  
Old 08-01-2008
True Coding Challeng...Indexing or Filter code- Pattern Search by comparing two files

So here it goes, to the gurus of shell programming. I have tried a lot of different approaches; it is a very challenging script to write, but I am enjoying the troubleshooting and hope someone can suggest a better option. Thank you in advance for your time and support. Much appreciated.

I have two files:
Log file
Keyword file

Log files vary in length (number of pages) and contain many log entries. At the bottom of each page, the log file has a page number field such as (page 10).

I need to write a script (filter) that reads both files and produces an output file.

I would like to be able to update the keyword file at any time to change the search criteria for the log file.

The log file contains thousands of words. I want to extract the words that are listed in my keyword file and place them in the output file in alphabetical order, along with the page numbers where they occur in the log file.

I would also like the first line of the output file to be "Filtered Output Results", as a heading (so, something similar to the index at the end of a book).

I would like to see sed, awk and grep script examples, as I am very interested in how to make this happen in all three. It is really cool and exciting when you get your desired result.

I have searched several examples in this forum and tried to write a script, but I have not been successful yet. I tried grep, sed and awk as follows.

Some of my tries:

#!/bin/bash
Keyword='/home/aavam/keyword'
Data='/home/aavam/data'
Output='C:/cygwin/home/aavam/output'
grep -f $Keyword $Data > $Output

#/usr/xpg4/bin/grep -f $Keyword $Data > $Output
#page=`echo $LINE | awk -F= '{print $NF}'`
#/usr/xpg4/bin/grep -Ff $Keyword $Data > $Output | page=`echo $LINE | awk -F= '{print $NF}'`
#/usr/xpg4/bin/grep -f $Keyword $Data > $Output
#/usr/xpg4/bin/grep -Ff $Keyword $Data > $Output | page=`echo $LINE | awk -F= '{print $NF}'`

#sed -f keyword.sed /export/home/aavam/shell-prog/data ---->gives error that keyword.sed not known

/export/home/aavam/outputv3

#for name in 'cat keyword'
#do
#grep $name data
#done > output
#awk -f compare.awk keyword data ---->gives error compare.awk not known etc
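
A note on the commented-out loop above: `'cat keyword'` in single quotes is a literal string, so the loop runs once with `name` set to the words `cat keyword`, and grep then searches for "cat" in the files keyword and data. Backquotes or `$(...)` would be needed to read the file. A minimal grep-only sketch follows, reading the keyword file line by line. The sample files and their contents are made up for illustration, it assumes a grep that supports `-w` (whole-word match), and it reports line numbers only, not page numbers.

```shell
#!/bin/sh
# Sketch with made-up sample files; real paths would replace $tmp/keyword and $tmp/data.
tmp=$(mktemp -d)
printf '%s\n' bugs output > "$tmp/keyword"
printf '%s\n' 'no known bugs here' 'debugging is unrelated' 'write output to file' > "$tmp/data"

# Read one keyword per line; print a heading, then the matching log lines
# prefixed with their line numbers (-n). -F treats the keyword as a fixed
# string and -w keeps "bugs" from matching inside "debugging".
matches=$(
    while IFS= read -r name; do
        [ -n "$name" ] || continue      # skip blank lines in the keyword file
        printf '== %s\n' "$name"
        grep -n -w -F -- "$name" "$tmp/data"
    done < "$tmp/keyword"
)
printf '%s\n' "$matches"
rm -rf "$tmp"
```

Reading the keyword file with `while read` instead of `for name in $(cat keyword)` avoids word splitting and globbing on the keywords.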

Thank you again

Last edited by aavam; 08-01-2008 at 12:09 PM..
# 2  
Old 08-01-2008
A possible solution using awk.
Script (aavam.sh)
Code:
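# Sort the keywords so the index comes out alphabetically, then feed them
# to awk on standard input ("-") ahead of the log file.
sort aavam.key |
awk '
BEGIN {
   print "Filtered Output Results";
}
# First input (the keyword list): remember each keyword and the longest length.
NR==FNR {
   wlist[++wcount] = $1;   # keywords in sorted order
   words[$1] = 0;          # lookup table of keywords
   pages[$1] = "";         # page list collected for each keyword
   last[$1] = 0;           # last page recorded, to avoid duplicates
   if (length($1) > wlen)
      wlen = length($1);
   next;
}
# Log file: a line whose last two fields are "Page N" starts page N.
NF>=2 && $(NF-1)=="Page" {
   Page = $NF
}
# Every log line: record the current page for each field that is a keyword.
{
   for (i=1; i<=NF; i++) {
      if ($i in words) {
         if (last[$i] != Page) {
            pages[$i] = pages[$i] Page " ";
            last[$i] = Page;
         }
      }
   }
}
# Print the index, left-aligning keywords to the longest keyword width.
END {
   for (i=1; i<=wcount; i++) {
      w = wlist[i];
      printf("%-" wlen "s : %s\n", w, pages[w])
   }
}
' - aavam.log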

Keyword file (aavam.key)
Code:
bugs
output
for
full
and

Input log file (aavam.log)
Code:
----------- Page 1
LS(1)                            User Commands                           LS(1)



NAME
       ls - list directory contents

SYNOPSIS
       ls [OPTION]... [FILE]...

DESCRIPTION
       List  information  about  the FILEs (the current directory by default).
       Sort entries alphabetically if none of -cftuvSUX nor --sort.

       Mandatory arguments to long options are  mandatory  for  short  options
       too.

       -a, --all
              do not ignore entries starting with .
----------- Page 2

       -A, --almost-all
              do not list implied . and ..

       --author
              with -l, print the author of each file

       -b, --escape
              print octal escapes for nongraphic characters

       --block-size=SIZE
              use SIZE-byte blocks

       -B, --ignore-backups
              do not list implied entries ending with ~

       -c     with -lt: sort by, and show, ctime (time of last modification of
              file status information) with -l: show ctime and  sort  by  name
              otherwise: sort by ctime
----------- Page 3
 . . . . .
----------- Page 12

COPYRIGHT
       Copyright (C) 2007 Free Software Foundation, Inc.
       This is free software.  You may redistribute copies  of  it  under  the
       terms       of       the      GNU      General      Public      License
       <http://www.gnu.org/licenses/gpl.html>.  There is NO WARRANTY,  to  the
       extent permitted by law.

SEE ALSO
       The  full  documentation  for ls is maintained as a Texinfo manual.  If
       the info and ls programs are properly installed at your site, the  com-
       mand

              info ls

       should give you access to the complete manual.



----------- Page 13
GNU coreutils 6.9                 March 2007                             LS(1)

Output:
Code:
Filtered Output Results
and    : 2 3 6 7 9 10 11 12
bugs   : 11
for    : 1 2 3 6 8 11 12
full   : 12
output : 3 7 10 11
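
The script relies on the `NR==FNR` idiom to tell its two inputs apart: `NR` counts lines across all inputs while `FNR` restarts for each one, so the two are equal only while the first input is being read. A stripped-down sketch of just that idiom, on made-up sample data (the file names and the `hits` variable are illustrative only):

```shell
#!/bin/sh
# Sketch of the NR==FNR idiom: the first file loads a lookup table,
# the second file is checked against it. Sample data is made up.
tmp=$(mktemp -d)
printf '%s\n' for full > "$tmp/keys"
printf '%s\n' 'sort by name' 'mandatory for short options' > "$tmp/log"

hits=$(awk '
    NR == FNR { words[$1]; next }      # true only while reading the first file
    {                                  # every later line comes from the second file
        for (i = 1; i <= NF; i++)
            if ($i in words) print FNR ": " $i
    }' "$tmp/keys" "$tmp/log")
printf '%s\n' "$hits"
rm -rf "$tmp"
```

Here only "for" on the second log line is a keyword, so the sketch prints `2: for`.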

Jean-Pierre.
# 3  
Old 08-01-2008
Thank you Jean-Pierre....

Jean-Pierre, I have got to bow down. Thank you very much. I haven't tried it yet, but it looks like you have got it. This could serve as exemplary code for many different situations. Thank you.

Cristian Smith

PS: If it is not too much to ask, and at your convenience, could you also email me an explanation of this code? Thanks.
# 4  
Old 08-01-2008
Anyone for a Sed solution?
# 5  
Old 08-02-2008
Off by one page number

My page numbers are at the bottom of each page, so when I run the code I am off by one page number. For example, a keyword is on page 1 but the output shows page 2. Any suggestions? I am working on it at my end as well. Cheers

C.Smith
# 6  
Old 08-02-2008
Please, show us your code, input data (extract), script output and required output.

Jean-Pierre.
# 7  
Old 08-02-2008
Everything is the same for now, except:

"Page 1" is not the first line of the page in my log but the last line of the page, and so on.

For example, your log file looks as follows:

*********************************
----------- Page 1
LS(1) User Commands LS(1)

NAME
ls - list directory contents

SYNOPSIS
-c with -lt: sort by, and show, ctime (time of last modification of
file status information) with -l: show ctime and sort by name
otherwise: sort by ctime
----------- Page 3
. . . . .
----------- Page 12


Mine looks as follows:

*******************************************
LS(1) User Commands LS(1)

NAME
ls - list directory contents

SYNOPSIS
-c with -lt: sort by, and show, ctime (time of last modification of
file status information) with -l: show ctime and sort by name
otherwise: sort by ctime
----------- Page 1 ---------------------->page number at the end of page
...........
----------- Page 3
. . . . .
----------- Page 12



Any help would be appreciated. Thank you
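
One way to handle page numbers that close a page rather than open it, sketched under the assumption that every page ends with a line whose last two fields are `Page N`: collect the keywords seen on the current page and credit them to the page number only when the footer line arrives. The function name `index_footer_pages`, the array `seen` and the sample data are hypothetical; the rest follows the script from post #2.

```shell
#!/bin/sh
# Sketch: index keywords when the "Page N" marker is the LAST line of each page.
# File names and sample data are made up for illustration.
tmp=$(mktemp -d)

cat > "$tmp/keys" <<'EOF'
for
and
full
EOF

cat > "$tmp/log" <<'EOF'
ls - list directory contents
mandatory for short options
----------- Page 1
sort by, and show, ctime
----------- Page 2
The full documentation for ls
----------- Page 3
EOF

index_footer_pages() {   # usage: index_footer_pages KEYFILE LOGFILE
    sort "$1" | awk '
    BEGIN { print "Filtered Output Results" }
    NR == FNR {                       # first input: sorted keyword list
        wlist[++wcount] = $1
        words[$1]
        if (length($1) > wlen) wlen = length($1)
        next
    }
    NF >= 2 && $(NF-1) == "Page" {    # footer: the page just read ends here
        for (w in seen)
            pages[w] = (pages[w] == "" ? "" : pages[w] " ") $NF
        split("", seen)               # portable way to empty the array
        next
    }
    {                                 # body line: remember keywords until the footer
        for (i = 1; i <= NF; i++)
            if ($i in words) seen[$i]
    }
    END {
        for (i = 1; i <= wcount; i++)
            printf("%-" wlen "s : %s\n", wlist[i], pages[wlist[i]])
    }' - "$2"
}

result=$(index_footer_pages "$tmp/keys" "$tmp/log")
printf '%s\n' "$result"
rm -rf "$tmp"
```

Because `seen` is an array indexed by keyword, a keyword that occurs several times on one page is still listed once for that page. Text after the final footer line would not be credited to any page in this sketch.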