Advanced search using sed/awk/perl


 
# 1  
Old 06-22-2011

Hi,

I have a file with more than 50,000 lines of records and each record is 50 bytes in length.

I need to check two fields in every record of this file: positions 11-19 (9 bytes) and 32-40 (9 bytes). If either field is alphanumeric, I need to replace all 9 bytes of that field with a default numeric value (say, 888888888).

For example, say the input file looks like:
Code:
D00000000236778767878     745454545456785.7USA8762
D000000001SMF46567878     458795477876785.7RSA9763
D00000000223684589878     11254DUT7876785.7IND8762
D00000000898454667878     788987984876785.7FRA4765
D00000000214569787836     325455454576785.7ENG4568
D00000000236778767878     564878754876785.7ZIM8766
D000000009785DOT67878     66589455DOS6785.7KEN8963
D00000000236548767878     225456687876785.7PAK8761
D00000000998651233878     878965454876785.7BRA8764
D000000005423SOW67878     9685658TSB76785.7AUS8765
D000000008TUR59767878     55425TUR5976785.7ARG4669
D00000000412563767878     225635445256785.7EGP8766
D00000000779654267878     332256367876785.7GER8965

After the search and replace, the output file should look like:
Code:
D00000000236778767878     745454545456785.7USA8762
D00000000188888888878     458795477876785.7RSA9763
D00000000223684589878     112548888888885.7IND8762
D00000000898454667878     788987984876785.7FRA4765
D00000000214569787836     325455454576785.7ENG4568
D00000000236778767878     564878754876785.7ZIM8766
D00000000988888888878     665898888888885.7KEN8963
D00000000236548767878     225456687876785.7PAK8761
D00000000998651233878     878965454876785.7BRA8764
D00000000588888888878     968568888888885.7AUS8765
D00000000888888888878     554258888888885.7ARG4669
D00000000412563767878     225635445256785.7EGP8766
D00000000779654267878     332256367876785.7GER8965

Also, I need a log of all the values that the script replaces.

I tried using awk, sed and perl scripts but could not get the desired output.

One of the examples I tried used sed, but it did not work:
Code:
sed "s/^\(.\{10\}\}\)\([a-zA-Z]*\).*/\1888888888/"

Any help regarding this will be much appreciated. Thanks in advance

Last edited by kikionline; 06-22-2011 at 11:46 PM..
# 2  
Old 06-23-2011
I do not know sed, and my awk may be more than you need, but this works. Put your 8s in the variable 'd'; I tested with underscores.

Code:
mute@geek:~/test$ awk -v FS='' -v d=_________ 'substr($0,11,9) ~ /[A-Za-z]/ { $0=substr($0,1,10) d substr($0,20) } substr($0,32,9) ~ /[A-Za-z]/ { $0=substr($0,1,31) d substr($0,41) }1' file.txt
D00000000236778767878     745454545456785.7USA8762
D000000001_________78     458795477876785.7RSA9763
D00000000223684589878     11254_________5.7IND8762
D00000000898454667878     788987984876785.7FRA4765
D00000000214569787836     325455454576785.7ENG4568
D00000000236778767878     564878754876785.7ZIM8766
D000000009_________78     66589_________5.7KEN8963
D00000000236548767878     225456687876785.7PAK8761
D00000000998651233878     878965454876785.7BRA8764
D000000005_________78     96856_________5.7AUS8765
D000000008_________78     55425_________5.7ARG4669
D00000000412563767878     225635445256785.7EGP8766
D00000000779654267878     332256367876785.7GER8965
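
For the log of replaced values that the original post asks for, the same substr() test can be wrapped in a small function. This is only a sketch: the file names file.txt, file.new and replaced.log, the function name fix and the log layout (line number, start position, old value) are assumptions made for illustration, not something from the thread.

```shell
# Sketch only: file.txt, file.new and replaced.log are assumed names.
# Recreate three of the sample records so the example is self-contained.
cat > file.txt <<'EOF'
D00000000236778767878     745454545456785.7USA8762
D000000001SMF46567878     458795477876785.7RSA9763
D000000009785DOT67878     66589455DOS6785.7KEN8963
EOF

awk -v d=888888888 -v log_file=replaced.log '
# Replace the 9 bytes starting at position p when they contain a letter,
# and write the line number, start position and old value to the log.
function fix(p,    old) {
    old = substr($0, p, 9)
    if (old ~ /[A-Za-z]/) {
        print NR, p, old > log_file
        $0 = substr($0, 1, p - 1) d substr($0, p + 9)
    }
}
{ fix(11); fix(32); print }
' file.txt > file.new
```

Each field is tested and replaced on its own, so a record with one bad field keeps its other field untouched, and the record length stays at 50 bytes.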

# 3  
Old 06-23-2011
Code:
% RE='[A-Z]' DEFAULT=888888888
% perl -wlpe \
'@F = /(.{10})(.{9})(.{12})(.{9})(.*)/;
$F[1] =~ /'$RE'/ and $F[1] = '$DEFAULT';
$F[3] =~ /'$RE'/ and $F[3] = '$DEFAULT';
$_ = join "", @F;
' testfile
D00000000236778767878     745454545456785.7USA8762
D00000000188888888878     458795477876785.7RSA9763
D00000000223684589878     112548888888885.7IND8762
D00000000898454667878     788987984876785.7FRA4765
D00000000214569787836     325455454576785.7ENG4568
D00000000236778767878     564878754876785.7ZIM8766
D00000000988888888878     665898888888885.7KEN8963
D00000000236548767878     225456687876785.7PAK8761
D00000000998651233878     878965454876785.7BRA8764
D00000000588888888878     968568888888885.7AUS8765
D00000000888888888878     554258888888885.7ARG4669
D00000000412563767878     225635445256785.7EGP8766
D00000000779654267878     332256367876785.7GER8965


Last edited by yazu; 06-23-2011 at 12:45 AM..
# 4  
Old 06-23-2011
Thanks Neutronscott and Yazu for the quick response!

Really appreciate your help.
# 5  
Old 06-23-2011
Code:
# sed 's/\(.\{10\}\)[A-Z]\{3\}[0-9]\{6\}\(.\{12\}\(.\{9\}\).\{10\}\)/\1888888888\2/
s/\(.\{10\}\)[0-9]\{3\}[A-Z]\{3\}[0-9]\{3\}\(.\{12\}\(.\{9\}\).\{10\}\)/\1888888888\2/
s/\(.\{31\}\)[A-Z]\{3\}[0-9]\{6\}\(.\{10\}\)/\1888888888\2/
s/\(.\{31\}\)[0-9]\{2,3\}[A-Z]\{3\}[0-9]\{3,4\}\(.\{10\}\)/\1888888888\2/' file >newfile ; more newfile
D00000000236778767878     745454545456785.7USA8762
D00000000188888888878     458795477876785.7RSA9763
D00000000223684589878     112548888888885.7IND8762
D00000000898454667878     788987984876785.7FRA4765
D00000000214569787836     325455454576785.7ENG4568
D00000000236778767878     564878754876785.7ZIM8766
D00000000988888888878     665898888888885.7KEN8963
D00000000236548767878     225456687876785.7PAK8761
D00000000998651233878     878965454876785.7BRA8764
D00000000588888888878     968568888888885.7AUS8765
D00000000888888888878     554258888888885.7ARG4669
D00000000412563767878     225635445256785.7EGP8766
D00000000779654267878     332256367876785.7GER8965

regards
ygemici
# 6  
Old 06-23-2011
An alternative sed:
Code:
sed '/^[A-Z][0-9]*  *.*/!s/\(.\{10\}\)\(.\{9\}\)\(..\)  *\(.*\)$/\1*********\3     \4/ ; /^[^ ]*  *.....[0-9]\{9\}.*$/!s/\([^ ]*\)  *\(.....\)\(.\{9\}\)\(.*\)$/\1     \2*********\4/' inputfile
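
A purely position-based variant is also possible in sed. The sketch below assumes that "alpha-numeric" can be read as "not all digits", and the names inputfile and outputfile are placeholders. Each command skips the record when the 9 bytes at the given offset are all digits and otherwise overwrites them.

```shell
# Sketch only: inputfile and outputfile are assumed names.
# Recreate two of the sample records so the example is self-contained.
cat > inputfile <<'EOF'
D00000000223684589878     11254DUT7876785.7IND8762
D000000008TUR59767878     55425TUR5976785.7ARG4669
EOF

# First command: positions 11-19 (10 bytes skipped).
# Second command: positions 32-40 (31 bytes skipped).
# The address matches records whose field is all digits; "!" negates it,
# so the substitution only runs when the field holds a non-digit.
sed -e '/^.\{10\}[0-9]\{9\}/!s/^\(.\{10\}\).\{9\}/\1888888888/' \
    -e '/^.\{31\}[0-9]\{9\}/!s/^\(.\{31\}\).\{9\}/\1888888888/' \
    inputfile > outputfile
```

Because the replacement is always 9 bytes, the first substitution does not shift the offsets used by the second one.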

# 7  
Old 06-23-2011
And an alternative in Perl:

Code:
$
$ # display the contents of the data file
$ cat data
D00000000236778767878     745454545456785.7USA8762
D000000001SMF46567878     458795477876785.7RSA9763
D00000000223684589878     11254DUT7876785.7IND8762
D00000000898454667878     788987984876785.7FRA4765
D00000000214569787836     325455454576785.7ENG4568
D00000000236778767878     564878754876785.7ZIM8766
D000000009785DOT67878     66589455DOS6785.7KEN8963
D00000000236548767878     225456687876785.7PAK8761
D00000000998651233878     878965454876785.7BRA8764
D000000005423SOW67878     9685658TSB76785.7AUS8765
D000000008TUR59767878     55425TUR5976785.7ARG4669
D00000000412563767878     225635445256785.7EGP8766
D00000000779654267878     332256367876785.7GER8965
$
$ # display the contents of the Perl program
$ cat process.pl
#!perl -w
$old = "data";     # the data file that we begin with
$new = "data.new"; # temporary data file
$log = "log";      # log file to record only those records that will be modified

$DEFAULT = "888888888";  # default replacement value

open (OLD, "<", $old) or die "Can't open $old for reading: $!";
open (NEW, ">", $new) or die "Can't open $new for writing: $!";
open (LOG, ">", $log) or die "Can't open $log for writing: $!";
while (<OLD>) {
  $orig = $_;                      # keep the unmodified record for the log
  for $pos (10, 31) {              # 0-based offsets of positions 11-19 and 32-40
    # replace a field only if that field itself contains a non-digit
    substr($_,$pos,9) = $DEFAULT if substr($_,$pos,9) =~ /\D/;
  }
  print LOG "$.\t$orig" if $_ ne $orig;   # log only the records that changed
  print NEW $_;                           # write every record to the temp file
}
close (OLD) or die "Can't close $old: $!";
close (NEW) or die "Can't close $new: $!";
close (LOG) or die "Can't close $log: $!";

# rename NEW file to OLD file, effectively overwriting the old file
rename($new, $old) or die "can't rename $new to $old: $!";
$
$ # Run the program
$ perl process.pl
$
$ # Check the modified data file
$ cat data
D00000000236778767878     745454545456785.7USA8762
D00000000188888888878     458795477876785.7RSA9763
D00000000223684589878     112548888888885.7IND8762
D00000000898454667878     788987984876785.7FRA4765
D00000000214569787836     325455454576785.7ENG4568
D00000000236778767878     564878754876785.7ZIM8766
D00000000988888888878     665898888888885.7KEN8963
D00000000236548767878     225456687876785.7PAK8761
D00000000998651233878     878965454876785.7BRA8764
D00000000588888888878     968568888888885.7AUS8765
D00000000888888888878     554258888888885.7ARG4669
D00000000412563767878     225635445256785.7EGP8766
D00000000779654267878     332256367876785.7GER8965
$
$ # Check the log file
$ cat log
2       D000000001SMF46567878     458795477876785.7RSA9763
3       D00000000223684589878     11254DUT7876785.7IND8762
7       D000000009785DOT67878     66589455DOS6785.7KEN8963
10      D000000005423SOW67878     9685658TSB76785.7AUS8765
11      D000000008TUR59767878     55425TUR5976785.7ARG4669
$
$

tyler_durden
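
Whichever approach is used, the result can be checked afterwards. A minimal sketch, assuming the requirements stated in the first post (50-byte records, digits only in positions 11-19 and 32-40 after the replacement); the function name check_records and the file name data are placeholders.

```shell
# Sketch only: check_records and the file name "data" are assumed names.
# Two already-processed sample records, recreated so the example runs as is.
cat > data <<'EOF'
D00000000188888888878     458795477876785.7RSA9763
D00000000223684589878     112548888888885.7IND8762
EOF

# Print every record that is not 50 bytes long or that still has a
# non-digit in positions 11-19 or 32-40. No output and exit status 0
# mean the file passed; a non-zero status means a record failed.
check_records() {
    awk 'length($0) != 50 ||
         substr($0, 11, 9) ~ /[^0-9]/ ||
         substr($0, 32, 9) ~ /[^0-9]/ {
             print NR ": " $0
             bad = 1
         }
         END { exit bad + 0 }' "$1"
}

check_records data
```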