Faster search needed


 
# 1  
Old 07-13-2012

Hope you guys out there can help.

I have 2 files as below:

file 1:

Code:
111,222,333,444,555,666
777,888,999,000,111,222
111,222,333,444,555,888

file 2:
Code:
666,AAA
222,BBB
888,CCC

For each line of file 1, I want to take the 6th column (for example, 666), look it up in the 1st column of file 2, and append the matching 2nd-column value from file 2 (AAA) to the end of the file 1 line. The results should be as below:

result:
Code:
111,222,333,444,555,666,AAA
777,888,999,000,111,222,BBB
111,222,333,444,555,888,CCC

I already have code for this, but it is slow (file1 has about a million lines and file2 has about 20,000 lines). I think there should be a faster way of doing this. Here is the code I wrote:

Code:
# Slow: starts two awk processes and rescans file2 for every line of file1.
for line in `cat file1`
do
  cellid=`echo "$line" | awk -F"," '{print $6}'`
  area=`nawk -F"," -v cellid="$cellid" '{if($1==cellid) print $2}' file2`
  echo "$line,$area" >> result.txt
done

Hope you can help.

Thanks in advance!

Last edited by Scrutinizer; 07-13-2012 at 10:25 AM.. Reason: code tags for data files
# 2  
Old 07-13-2012
Code:
nawk -F, 'FNR==NR{a[$1]=$2;next}{$(NF+1)=a[$NF]}1' OFS="," file2 file1

I haven't tested the performance though...
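For readers less used to the two-file awk idiom, here is a spelled-out sketch of the same idea. It assumes plain `awk` is available instead of `nawk`; the scratch directory and the array name `area` are only for illustration. file2 is read first into an array keyed on its 1st column, then each file1 line is printed with the value found for its last field.

```shell
# Sample data from the first post, written to a scratch directory.
dir=$(mktemp -d)
printf '%s\n' '111,222,333,444,555,666' \
              '777,888,999,000,111,222' \
              '111,222,333,444,555,888' > "$dir/file1"
printf '%s\n' '666,AAA' '222,BBB' '888,CCC' > "$dir/file2"

awk -F, -v OFS=, '
    # First file on the command line (file2): remember key -> area.
    FNR == NR { area[$1] = $2; next }
    # Second file (file1): append the area stored for the last field.
    { print $0, area[$NF] }
' "$dir/file2" "$dir/file1" > "$dir/result.txt"

cat "$dir/result.txt"
```

A file1 line whose key is not in file2 still gets printed, just with an empty last field, which matches what the one-liner does.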
# 3  
Old 07-13-2012
Quote:
Originally Posted by elixir_sinari
Code:
nawk -F, 'FNR==NR{a[$1]=$2;next}{$(NF+1)=a[$NF]}1' OFS="," file2 file1

I haven't tested the performance though...
Thanks for the quick reply! By the way, I think the code could be faster if, once a match has been found, it stopped searching for that value and moved straight to the next line. Would you know how to add this?
# 4  
Old 07-13-2012
This should be pretty fast:
Code:
sort  -t, -k 6,6  -o file1 file1

Code:
sort  -t, -k 1,1  -o file2 file2

Code:
join -t, -o 1.1,1.2,1.3,1.4,1.5,1.6,2.2 -j1 6 -j2 1 file1 file2 |sort -t, -k 7,7

# 5  
Old 07-13-2012
Quote:
Originally Posted by Klashxx
This should be pretty fast:
Code:
sort  -t, -k 6,6  -o file1 file1

Code:
sort  -t, -k 1,1  -o file2 file2

Code:
join -t, -o 1.1,1.2,1.3,1.4,1.5,1.6,2.2 -j1 6 -j2 1 file1 file2 |sort -t, -k 7,7

The join command produced no output.
# 6  
Old 07-13-2012
Quote:
Originally Posted by daytripper1021
Thanks for the quick reply! By the way, I think the code could be faster if, once a match has been found, it stopped searching for that value and moved straight to the next line. Would you know how to add this?
Have you tried it?
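On the "stop at the first match" idea: in the awk approach there is no per-line scan of file2 to cut short, because `a[$NF]` is a single array lookup rather than a search. A rough way to try it yourself is sketched below. The data is synthetic, and the row counts and the file names `big1` and `map2` are assumptions, so raise `n1` to match your real file.

```shell
# Synthetic input; sizes are assumptions, scale n1 up to match real data.
n1=100000   # rows in big1 (stand-in for file1)
n2=20000    # rows in map2 (stand-in for file2)
tmp=$(mktemp -d)

# map2: key,area pairs with keys 1..n2
awk -v n="$n2" 'BEGIN { for (i = 1; i <= n; i++) print i ",A" i }' > "$tmp/map2"
# big1: six columns, with the key in column 6 cycling through 1..n2
awk -v n="$n1" -v k="$n2" \
    'BEGIN { for (i = 1; i <= n; i++) print "a,b,c,d,e," ((i % k) + 1) }' > "$tmp/big1"

start=$(date +%s)
# One pass over each file; every lookup is a single array access.
awk -F, -v OFS=, 'FNR == NR { a[$1] = $2; next } { print $0, a[$NF] }' \
    "$tmp/map2" "$tmp/big1" > "$tmp/out"
end=$(date +%s)

echo "rows written: $(wc -l < "$tmp/out")"
echo "elapsed seconds: $((end - start))"
```

The timing has one-second resolution, which should be enough to compare against the shell loop from the first post.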
# 7  
Old 07-13-2012
Works fine on an HP-UX box:
Code:
# cat file1
111,222,333,444,555,666
777,888,999,000,111,222
111,222,333,444,555,888
# cat file2
666,AAA
222,BBB
888,CCC
# sort  -t, -k 6,6  -o file1 file1
# sort  -t, -k 1,1  -o file2 file2
# cat file1 file2
777,888,999,000,111,222
111,222,333,444,555,666
111,222,333,444,555,888
222,BBB
666,AAA
888,CCC
# join -t, -o 1.1,1.2,1.3,1.4,1.5,1.6,2.2 -j1 6 -j2 1 file1 file2
777,888,999,000,111,222,BBB
111,222,333,444,555,666,AAA
111,222,333,444,555,888,CCC
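If the join still prints nothing on your system, a variant worth trying is sketched below. It is only a guess at what might differ between the two machines, not a diagnosis. It uses the POSIX `-1`/`-2` field options instead of `-j1`/`-j2`, forces the same collation for `sort` and `join` with `LC_ALL=C`, and sorts into copies so the original files are left alone. The scratch directory and the `.sorted` file names are made up for the example.

```shell
# Use one collation for both sort and join so they agree on key order.
export LC_ALL=C
work=$(mktemp -d)

# Sample data from the first post.
printf '%s\n' '111,222,333,444,555,666' \
              '777,888,999,000,111,222' \
              '111,222,333,444,555,888' > "$work/file1"
printf '%s\n' '666,AAA' '222,BBB' '888,CCC' > "$work/file2"

# Sort copies on the join fields instead of overwriting the originals.
sort -t, -k6,6 "$work/file1" > "$work/file1.sorted"
sort -t, -k1,1 "$work/file2" > "$work/file2.sorted"

# Join column 6 of file1 with column 1 of file2, appending file2's column 2.
join -t, -1 6 -2 1 -o 1.1,1.2,1.3,1.4,1.5,1.6,2.2 \
    "$work/file1.sorted" "$work/file2.sorted" > "$work/joined.txt"

cat "$work/joined.txt"
```

As in the HP-UX run above, the output comes back ordered by the join key rather than in file1's original order.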
