Shell script for search and replace by field
Post 302710877 by Don Cragun on Friday 5th of October 2012 04:39:54 AM
Quote:
Originally Posted by chandrath
Don,
- The field separator is pipe (|). My apologies for not putting that in the initial post.
- Yes, The reject file is supposed to contain the original contents of every line that was changed by one or more rules in the rules file while producing the output file.
- My apologies for typos, It will have 'Contains' or 'Equals' type condition no other conditions.
-Yes, "contauns" was a typo.
-As I mentioned earlier, it is safe to assume only two conditions 'Contains' or Equals'. The example input given in my earlier post has typos as you pointed out.
-The distinction between two conditions:
1. 'Equals' is used for exact string match. (example, if field1 value equals XX, then replace it with rc).
2. 'Contains' is used for pattern match (example, if field1 value contains $, replace it with <blank>, example, field1 value of rc$21 will become rc21 as '$' gets replaced with '' as field one contained '$'.
Appreciate all your help with any unix script solution to this.

---------- Post updated at 11:15 PM ---------- Previous update was at 11:02 PM ----------

Also I updated the files with field/line delimiters:
Input:
field1|field2|
rc11|rc12|
rc$21|rc#21|
XX31|yy32|
rc41|r!42|

Rules:
field|search|condition|replace|
field1|XX|Equals|rc|
field1|$|contains||
field2|#|contains||
field2|!|contains|c|
field2|yy|Equals|rc|

Output:
field1|field2|
rc11|rc12|
rc21|rc22|
rc31|rc32|
rc41|rc42|

Rejects:
field1|field2|
rc$21|rc#22|
XX31|yy32|
rc41|r!42|

Thank you!
OK. Let me try again. (And, PLEASE use code tags surrounding the contents of your input and output files.)

I do not see any difference in your sample output between the Equals and Contains conditions. If the string in the 2nd field of a Rules file line appears anywhere in the Input file field whose heading is named by the 1st field of that Rules file line, that string is replaced by the string in the 4th field of the Rules file line. You said:
Quote:
-The distinction between two conditions:
1. 'Equals' is used for exact string match. (example, if field1 value equals XX, then replace it with rc).
2. 'Contains' is used for pattern match (example, if field1 value contains $, replace it with <blank>, example, field1 value of rc$21 will become rc21 as '$' gets replaced with '' as field one contained '$'.
but when field1 is XX31 (which is not equal to XX), your desired output changed the XX to rc anyway??? And you say that 'Contains' is a "pattern match", but don't define what pattern matching rules are to be used. (Is it shell pattern matching, filename pattern matching, basic regular expression matching, extended regular expression matching, or something else?) In the possible solution below, I assume that anytime the string in the 2nd field in the Rules file is found in the specified field in the Input file it will be replaced by the string in the 4th field in the Rules file. This matches the behavior shown given your Input file, Rules file, and Output file even though it doesn't match your description. Since your examples do not show any difference in the expected output between Equals and Contains, the possible solution below ignores the 3rd field in the Rules file.

You say that only Equals and Contains appear in the 3rd field in the Rules file. But in your sample Rules file, the 3rd field is "contains" (with a lower case c) on three lines and is never "Contains" (with an upper case C). Since the possible solution below ignores the 3rd field in the Rules file, it doesn't make any difference.

The following produces the Output file you specify when given the Input and Rules files you specified, except for two issues:
  1. you have a <space> character at the end of the line:
    Code:
    rc11|rc12|

    in Input, but there is no space at the end of the corresponding line in the Output file, and
  2. the line in your Input file:
    Code:
    rc$21|rc#21|

    is transformed into:
    Code:
    rc21|rc#21|

    and then into:
    Code:
    rc21|rc21|

    by the rules:
    Code:
    field1|$|contains||   
        and
    field2|#|contains||

    but your Output file shows:
    Code:
    rc21|rc22|

    instead.
There are corresponding differences in what this script produces in the Reject file compared to what you said should appear in the Reject file.
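
If you want to check the two-step transformation from item 2 by hand, something like the following should do it. This is just a sketch that hard-codes the two "contains" rules for the one sample line; step_by_step is a made-up function name, and the bracket expression [$] is used here only to make the dollar sign literal.

```shell
# Apply the two "contains" rules by hand to the sample line rc$21|rc#21|
step_by_step() {
    printf '%s\n' 'rc$21|rc#21|' |
    awk -F '|' 'BEGIN { OFS = "|" }
    {
        sub(/[$]/, "", $1); print   # rule field1|$|contains||
        sub(/#/, "", $2);   print   # rule field2|#|contains||
    }'
}
step_by_step
```

The first line printed is the record after the field1 rule, the second is the record after the field2 rule. Neither rule can turn the trailing 1 in field2 into a 2.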

Anyway, play around with the following to see how it works:
Code:
#!/bin/ksh
rejectfile="Reject"
awk -F "|" -v rejf="$rejectfile" 'BEGIN {OFS = "|"}
FNR==NR{
if(debug)printf("# rules record read: %s\n", $0)
        if(FNR == 1) next #skip Rules file header.
        ruleF[++rc] = $1
        ruleS[rc] = $2
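        # Wrap every character of the search string in a collating symbol
        # bracket expression so ERE special characters ($, ., *) match literally.
        cnt = gsub(/./, "[[.&.]]", ruleS[rc])
        ruleC[rc] = $3 # condition (Equals or contains); stored but not used
        ruleR[rc] = $4
        # Escape \ and & so sub() inserts them literally (assumes POSIX escape rules).
        gsub(/\\/, "\\\\\\\\", ruleR[rc])
        gsub(/&/, "\\\\&", ruleR[rc])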
if(debug)printf("ruleF[%d]=%s, ruleS[%d]=%s (%d elements), ruleC[%d]=%s, ruleR[%d]=\"%s\"\n",
rc,ruleF[rc],rc,ruleS[rc],cnt,rc,ruleC[rc],rc,ruleR[rc])
        next
}
FNR==1{ # Process input file header
if(debug)printf("@ input header read:\n")
        for(i = 1; i <= NF; i++) {
                mF[$i] = i
if(debug)printf("@ mF[%s]=%d\n",$i,i)
        }
        fc = NF
        print
        print > rejf
        next
}
{       cc = 0 # of changes made to this line
        o0 = $0
if(debug)printf("@ input record read: %s\n", $0);
        for(i = 1; i <= rc; i++) {
if(debug)printf("@ f:%s(%d): s/%s/%s/\n", ruleF[i], mF[ruleF[i]], ruleS[i], ruleR[i])
                if((cnt = sub(ruleS[i], ruleR[i], $mF[ruleF[i]]))) {
                        cc += cnt
if(debug)printf("@ %s changed to \"%s\"\n", ruleF[i], $mF[ruleF[i]])
                }
        }
        if(cc) print o0 > rejf
        print
}' Rules Input > Output
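
Some awk implementations reportedly do not accept collating symbols such as [[.$.]] in bracket expressions, so check yours before relying on the script above. If that turns out to be a problem, one regex-free alternative is to find the search string with index() and splice in the replacement with substr(). This is only a sketch and not what the script above does; literal_replace is a hypothetical helper name, and like sub() it changes only the first occurrence.

```shell
# literal_replace TEXT SEARCH REPLACEMENT
# Replace the first literal occurrence of SEARCH in TEXT; no regex involved.
literal_replace() {
    awk 'BEGIN {
        # Read the operands from ARGV so backslashes are not interpreted
        # the way they would be in -v assignments.
        text = ARGV[1]; search = ARGV[2]; repl = ARGV[3]
        pos = (search == "") ? 0 : index(text, search)
        if (pos > 0)
            text = substr(text, 1, pos - 1) repl substr(text, pos + length(search))
        print text
    }' "$1" "$2" "$3"
}

literal_replace 'rc$21' '$' ''    # field1|$|contains||
literal_replace 'r!42' '!' 'c'    # field2|!|contains|c|
```

Because nothing is treated as a pattern or as a sub() replacement string, characters like $, ., & and \ need no escaping at all.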

Note that if you change the line:
Code:
awk -F "|" -v rejf="$rejectfile" 'BEGIN {OFS = "|"}

to:
Code:
awk -F "|" -v rejf="$rejectfile" 'BEGIN {OFS = "|"; debug = 1}

you'll get lots of debugging data in the Output file (the debug lines are written to standard output along with the data) showing how it evaluates input lines, how it transforms search and replace patterns into extended regular expressions and replacement patterns, respectively, and which rules cause transformations of input fields.
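
If you would rather not edit the script to switch debugging on, and you don't want the debug lines mixed into the data, a possible variation is to set debug with -v on the command line and pipe the debug output to standard error. The following is a cut-down sketch of that idea, not the full script; debug_demo is a made-up name and it only applies one hard-coded rule to one sample record.

```shell
# debug_demo [0|1]: process one sample record; with 1, trace it on stderr.
debug_demo() {
    printf '%s\n' 'rc41|r!42|' |
    awk -F '|' -v debug="${1:-0}" 'BEGIN { OFS = "|" }
    {
        # Send the trace to standard error so it stays out of the data.
        if (debug) printf("@ input record read: %s\n", $0) | "cat 1>&2"
        sub(/!/, "c", $2)     # rule field2|!|contains|c|
        print
    }'
}

debug_demo 1
```

With this arrangement, redirecting standard output to a file captures only the transformed records, while the lines starting with @ still show up on the terminal.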

If you want to make "Equals" behave as you described it (instead of as your expected Output file contents demonstrate), you just need to add ERE anchoring characters to the start and end of "ruleS[x]" after the gsub() call converts each character to be matched into its corresponding collating symbol matching expression (which is used to avoid having "special" characters in EREs being treated specially).
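
A rough sketch of that anchoring step follows. It assumes the search string has already been made safe to use as an ERE (the sample value XX has no special characters, so that step is left out here), and it compares the condition without regard to case since your Rules file mixes "Equals" and "contains". apply_rule is a hypothetical helper, not part of the script above.

```shell
# apply_rule VALUE SEARCH CONDITION REPLACEMENT
apply_rule() {
    awk 'BEGIN {
        value = ARGV[1]; pat = ARGV[2]; cond = ARGV[3]; repl = ARGV[4]
        # Equals: anchor the pattern so it has to match the whole field.
        if (tolower(cond) == "equals") pat = "^" pat "$"
        sub(pat, repl, value)
        print value
    }' "$1" "$2" "$3" "$4"
}

apply_rule XX31 XX Equals rc      # anchored: XX31 is not equal to XX
apply_rule XX31 XX contains rc    # unanchored: XX is found inside XX31
```

Note that with the anchors in place, XX31 is left alone, so you would no longer get the rc31 shown in your expected Output file.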
 
