Sponsored Content
Full Discussion: Matching by key fields
Top Forums Shell Programming and Scripting Matching by key fields Post 302282706 by drl on Sunday 1st of February 2009 12:52:03 PM
Old 02-01-2009
Hi, CB.

You might benchmark these approaches to see what is best for your situation,

I would start with the smallest practical set to see what the overhead might be, then perhaps 1, 2, 5, and 10% of the data to see how it is progressing. Many complex solutions tend not to be linear. I think both will be somewhat memory intensive if you have 300K lines in your key.dat file.

Keep us posted ... cheers, drl

PS I used fgrep to avoid any interpretation of special characters in your patterns. The sample you provided used only "|" which is recognized by egrep, but I don't know how representative your sample was.

Last edited by drl; 02-01-2009 at 01:57 PM..
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Fill the Key fields : Please help us

Hi .... we are having the below file .Column 1, Column 2 ,column 3 are key fields... In the below ...for 2 nd , 3 rd row the repeated key column is missing .... i want the new file to be populated with all missing key columns. ... (11 Replies)
Discussion started by: charandevu
11 Replies

2. Shell Programming and Scripting

matching 2 exact fields

Dear experts, I have a file1 that looks like 60127930928 2091 60129382039 2092 60126382937 2091 60128937928 2061 60127329389 2062 60123748730 2061 60128730293 2061 and file 2 that looks like 60127930928 2091 60129382039 2092 60126382937 2093 60128937928 2061 60127329389... (2 Replies)
Discussion started by: aismann
2 Replies

3. Shell Programming and Scripting

Perl function to sort a file based on key fields

Hi, I am new to PERL.I want to sort all the lines in a file based on 1,2 and 4th filelds. Can U suggest me a command/function in perl for this operation.. (5 Replies)
Discussion started by: karthikd214
5 Replies

4. Shell Programming and Scripting

awk should output if one input file doesnt have matching key

nawk -F, 'FNR==NR{a= $3 ;next} $2 in a{print $1, 'Person',$2, a}' OFS=, filea fileb Input filea Input fileb output i am getting : (2 Replies)
Discussion started by: pinnacle
2 Replies

5. Linux

matching two fields

Hi I am having 2 fields and if f1=f2 i wanna print that line eg 1 2 1 3 1 9 2 2 3 5 9 9 In the abov eg. the highlighted lines shud be printed 2 2 9 9 Thanking u (3 Replies)
Discussion started by: binnybio
3 Replies

6. Shell Programming and Scripting

Compare Fields from two text files using key columns

Hi All, I have two files to compare. Each has 10 columns with first 4 columns being key index together. The rest of the columns have monetary values. Using Perl, I want to read one file into hash; check for the key value availability in file 2; then compare the values in the rest of 6... (2 Replies)
Discussion started by: Sangtha
2 Replies

7. Shell Programming and Scripting

Rsa public private key matching

Hi All, I have a requirement where i need to check if an rsa public key corresponds to a private key and hence return success or failure. Currently i am using the command diff <( ssh-keygen -y -e -f "$PRIVKEY" ) <( ssh-keygen -y -e -f "$PUBLICKEY" ) and its solving my purpose. This is in... (1 Reply)
Discussion started by: mritusmoi
1 Replies

8. UNIX for Dummies Questions & Answers

awk - Print lines if only matching key is found

I am looking to move matching lines (01 - 07) from File1 and 77 tab the matching string from File2, to File3.txt. I am almost done but - Currently, script is not printing lines to File3.txt in order. Thanks a lot. Any help is appreciated. Script I am using: awk 'FNR == NR && ! /^]*$/ {... (9 Replies)
Discussion started by: High-T
9 Replies

9. UNIX for Dummies Questions & Answers

File updation on matching key

I have input file like Input.dat with below content RRD 0Z91YUn000000Lk 9000100001 103020151117 STMT151117155527001 0000 2 000000 000004 RRD 0Z91YUn00000ysj 9000100001 103020151117 STMT151117155527001 0000 3 000000 000003 RRD 0Z91YUn00001vGh 9000100002... (12 Replies)
Discussion started by: PRAMOD 96
12 Replies

10. UNIX for Beginners Questions & Answers

Matching 2 files based on key

Hi all I have two files I need to match record from first file and second file on column 1,8 and and output only match records on file1 File1: 020059801803180116130926800002090000800231000245204003160000000002000461OUNCE000000350000100152500BM01007W0000 ... (5 Replies)
Discussion started by: arunkumar_mca
5 Replies
GREP(1) 						      General Commands Manual							   GREP(1)

NAME
grep, egrep, fgrep - search a file for a pattern SYNOPSIS
grep [ option ] ... expression [ file ] ... egrep [ option ] ... [ expression ] [ file ] ... fgrep [ option ] ... [ strings ] [ file ] DESCRIPTION
Commands of the grep family search the input files (standard input default) for lines matching a pattern. Normally, each line found is copied to the standard output. Grep patterns are limited regular expressions in the style of ex(1); it uses a compact nondeterministic algorithm. Egrep patterns are full regular expressions; it uses a fast deterministic algorithm that sometimes needs exponential space. Fgrep patterns are fixed strings; it is fast and compact. The following options are recognized. -v All lines but those matching are printed. -x (Exact) only lines matched in their entirety are printed (fgrep only). -c Only a count of matching lines is printed. -l The names of files with matching lines are listed (once) separated by newlines. -n Each line is preceded by its relative line number in the file. -b Each line is preceded by the block number on which it was found. This is sometimes useful in locating disk block numbers by con- text. -i The case of letters is ignored in making comparisons -- that is, upper and lower case are considered identical. This applies to grep and fgrep only. -s Silent mode. Nothing is printed (except error messages). This is useful for checking the error status. -w The expression is searched for as a word (as if surrounded by `<' and `>', see ex(1).) (grep only) -e expression Same as a simple expression argument, but useful when the expression begins with a -. -f file The regular expression (egrep) or string list (fgrep) is taken from the file. In all cases the file name is shown if there is more than one input file. Care should be taken when using the characters $ * [ ^ | ( ) and in the expression as they are also meaningful to the Shell. It is safest to enclose the entire expression argument in single quotes ' '. Fgrep searches for lines that contain one of the (newline-separated) strings. Egrep accepts extended regular expressions. In the following description `character' excludes newline: A followed by a single character other than newline matches that character. The character ^ matches the beginning of a line. The character $ matches the end of a line. A . (period) matches any character. A single character not otherwise endowed with special meaning matches that character. A string enclosed in brackets [] matches any single character from the string. Ranges of ASCII character codes may be abbreviated as in `a-z0-9'. A ] may occur only as the first character of the string. A literal - must be placed where it can't be mistaken as a range indicator. A regular expression followed by an * (asterisk) matches a sequence of 0 or more matches of the regular expression. A regular expression followed by a + (plus) matches a sequence of 1 or more matches of the regular expression. A regular expression followed by a ? (question mark) matches a sequence of 0 or 1 matches of the regular expression. Two regular expressions concatenated match a match of the first followed by a match of the second. Two regular expressions separated by | or newline match either a match for the first or a match for the second. A regular expression enclosed in parentheses matches a match for the regular expression. The order of precedence of operators at the same parenthesis level is [] then *+? then concatenation then | and newline. Ideally there should be only one grep, but we don't know a single algorithm that spans a wide enough range of space-time tradeoffs. SEE ALSO
ex(1), sed(1), sh(1) DIAGNOSTICS
Exit status is 0 if any matches are found, 1 if none, 2 for syntax errors or inaccessible files. BUGS
Lines are limited to 256 characters; longer lines are truncated. 4th Berkeley Distribution April 29, 1985 GREP(1)
All times are GMT -4. The time now is 05:58 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy