Hsearch() problem when searching for strings.
# 1  
03-11-2016

I have written a C program whose objective is to search for strings.

There are two files:
1. A database file with more than one billion entries. Its name is passed as argv[1] to the C code below. The file format is:
Code:
a.txt apple
b.txt candle
c.txt glue

2. A second file with one string per line. Each string from this file has to be searched for in the database file. Its name is passed as argv[2] to the C code below. The file format is:
Code:
apple
candle
glue
computer
database

The objective is to read each string present in the second file and search for that string in the database file.
This is what I have tried:
Code:
//This program will read the bigram, trigram and quadgram file generated from the Wikipedia and search for the entities from it.
//replace space with - before running this code in both files.

#define _GNU_SOURCE
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#include<search.h>

int main ( int argc , char **argv )
{
  FILE *wikipedia_ngram = NULL; //This is the file on which searching has to be done. We will call this the "database file".
  FILE *entities = NULL; //This file contains the list of strings to be searched.
  
  char *line = NULL;
  char *file_name = NULL;
  char *word = NULL;
  
  size_t len = 0;
  ssize_t read;
  //Defining the hash variables
  ENTRY e, *ep;
  unsigned long int lines_ngram = 0; //number of lines in the n-gram file
  unsigned long iterator = 0;
  
  wikipedia_ngram = fopen ( argv [ 1 ] , "r" );
  if ( wikipedia_ngram == NULL )
  {
    fprintf ( stderr , "Wikipedia n-gram file open error\n" );
    return ( EXIT_FAILURE );
  }
  
  entities = fopen ( argv [ 2 ] , "r" );
  if ( entities == NULL )
  {
    fprintf ( stderr , "Entities file open error\n" );
    return ( EXIT_FAILURE );
  }
  
  file_name = ( char * ) malloc ( 5000 * sizeof ( char ) );
  if ( file_name == NULL )
  {
    fprintf ( stderr , "malloc() memory allocation failure in file_name\n" );
    return ( EXIT_FAILURE );
  }
  
  word = ( char * ) malloc ( 1000 * sizeof ( char ) );
  if ( word == NULL )
  {
    fprintf ( stderr , "malloc() memory allocation failure in word\n" );
    return ( EXIT_FAILURE );
  }
    
  while ( ( read = getline ( &line , &len , wikipedia_ngram ) ) != -1 )
  {
    lines_ngram ++; //finding the number of lines in the database file
  }
  rewind ( wikipedia_ngram );
  //got the number of lines above
  //create the hash table now.
  
  hcreate ( lines_ngram ); //the code below is an adaptation of an example in the hsearch() man page on Linux systems
  
  for ( iterator = 0; iterator < lines_ngram; iterator++ )
  {
    fscanf ( wikipedia_ngram , "%s %s\n" , file_name , word ); //read data line by line from the database file
    e.key = word;
    e.data = (char *) file_name;
    ep = hsearch ( e , ENTER ); //add this entry to the hash table
    /* there should be no failures */
    if (ep == NULL)
    {
      fprintf(stderr, "Entry failed\n");
      exit ( EXIT_FAILURE );
    }
  }
  
  memset ( word , 0 , 1000 );
  //find the entities in the hash table.
  while ( !feof ( entities ) )
  {
    fscanf ( entities , "%s\n" , word ); //read the strings to be searched line by line
    e.key = word;
    ep = hsearch (e, FIND);
    if ( ep == NULL )
    {
      fprintf ( stderr , "ep search error\n" );
      exit ( EXIT_FAILURE );
    }
    printf ("%s %s\n" , ep->key, ( char * ) ( ep->data ) );
  }
  
  if ( line )
  {
    free ( line );
  }
  fclose ( wikipedia_ngram );
  fclose ( entities );
  free ( file_name );
  free ( word );
  hdestroy();
  return ( EXIT_SUCCESS );
}

The code above compiles on Linux with gcc 4.8.2, but the output is:
Code:
apple c.txt
candle c.txt
glue c.txt
ep search error

I am not able to figure out where the problem lies. I even used GDB to debug the code, but I could not locate the problem.
# 2  
03-12-2016
Note that the code:
Code:
    e.key = word;
    e.data = (char *) file_name;
    ep = hsearch ( e , ENTER ); //create a hash table

uses the same word and file_name addresses for every entry you add to the hash table, and hsearch(e,ENTER) copies only the pointers into the hash table, not the data they point to. So the data values from earlier lines you load into the hash table are overwritten by the data values from the last line you load.

Instead of reading the file one line at a time into the same two buffers (which overwrites each line's values with the next line's values after they have been added to the hash table), you need to read the entire file into memory and add hash-table entries that point to where the individual fields of each line are stored.
2 users thanked Don Cragun for this post.