Help with replace character based on specific location


 
# 1  
Old 03-11-2011

Hi,

I have a long reference file (column 1 is the header in the input file; column 2 is the start position in the input file; column 3 is the end position in the input file), shown below:
Code:
read_2 10 15
read_3 5 8
read_1 4 10
.
.
.

Input file (huge file with total file size more than 1GB):
Code:
++read_1
AFSFAFHAKSJFHAKSJFHAKJSFHAFHASJKFHAS
++read_2
ASDFASJFOASUFRIEUIEWOUIEOWUIOEWUIEWFSFSJDHGJDSKHGSKDJHG
++read_3
UTWETEWTWETEWTEWTEF
.
.

Desired output:
Code:
++read_1
AFSXXXXXXXJFHAKSJFHAKJSFHAFHASJKFHAS
++read_2
ASDFASJFOXXXXXXEUIEWOUIEOWUIOEWUIEWFSFSJDHGJDSKHGSKDJHG
++read_3
UTWEXXXXWETEWTEWTEF
.
.

I would like to replace characters in the input file with "X", based on the header (column 1), the start position (column 2) and the end position (column 3) given in the reference file. Positions are 1-based and inclusive, as in the example above.
My input file may be quite large (>1GB).
Thanks for any advice.
# 2  
Old 03-11-2011
Code:
awk 'NR==FNR{a["++"$1]=$2;b["++"$1]=$3}
NR>FNR&&a[RS$1]{print RS$1"\n"substr($2,1,a[RS$1]-1)gensub(/./,"X","g",substr($2,a[RS$1],b[RS$1]-a[RS$1]+1))substr($2,b[RS$1]+1)}' ref RS="++read" file

Output:
Code:
++read_1
AFSXXXXXXXJFHAKSJFHAKJSFHAFHASJKFHAS
++read_2
ASDFASJFOXXXXXXEUIEWOUIEOWUIOEWUIEWFSFSJDHGJDSKHGSKDJHG
++read_3
UTWEXXXXWETEWTEWTEF
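
The one-liner above uses gensub, which is a GNU awk extension, and a multi-character RS. Below is a sketch of the same idea in portable awk, reading the input file line by line so that only the reference file is held in memory. It assumes every header line starts with "++" and is followed by exactly one sequence line, and that the positions fall inside the sequence. The function name mask_reads is a placeholder, not something from this thread. Unlike the one-liner, reads that are not listed in the reference file are printed unchanged.

```shell
#!/bin/sh
# mask_reads REF INPUT: hypothetical helper name, not from the thread.
mask_reads() {
    awk '
    # First file (reference): remember start and end for each header.
    NR == FNR {
        if (NF >= 3) {
            start["++" $1] = $2 + 0
            stop["++" $1] = $3 + 0
        }
        next
    }

    # Header line in the input: print it and remember the current read.
    /^\+\+/ { hdr = $0; print; next }

    # Sequence line: mask positions start..stop (1-based, inclusive) if listed.
    {
        if (hdr in start) {
            s = start[hdr]
            e = stop[hdr]
            mask = ""
            for (i = s; i <= e; i++) mask = mask "X"
            print substr($0, 1, s - 1) mask substr($0, e + 1)
        } else {
            print
        }
    }' "$1" "$2"
}
```

It could be called as `mask_reads ref file > masked_file`, with ref and file being the same two files as in the command above.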

This User Gave Thanks to yinyuemi For This Post:
# 3  
Old 03-12-2011
Code:
#!/usr/bin/perl
use strict;
use warnings;

my %ref_h;                          # header => [start, end]
my ($start, $end) = (0, 0);

open(my $rf, "<", "reffile") or die "Fail-$!\n";
while (<$rf>) {
    chomp;
    my @refdata = split;
    next unless @refdata >= 3;      # skip blank or malformed lines
    $ref_h{$refdata[0]} = [ $refdata[1], $refdata[2] ];
}
close($rf);

open(my $df, "<", "datafile") or die "Fail-$!\n";
while (<$df>) {
    chomp;
    if (/^\+\+/) {
        (my $col = $_) =~ s/\++//g; # strip the "++" prefix
        next unless exists $ref_h{$col};
        print "$_\n";
        ($start, $end) = @{ $ref_h{$col} };
        next;
    }
    next unless $start > 0;         # sequence of a header not in reffile
    # keep the first $start-1 chars, skip the masked span, keep the rest
    my $format = "A" . ($start - 1) . " x" . ($end - $start + 1) . " A*";
    my @ar = unpack($format, $_);
    print $ar[0], "X" x ($end - $start + 1), $ar[1], "\n";
    ($start, $end) = (0, 0);
}
close($df);


Last edited by pravin27; 03-12-2011 at 05:47 AM..
This User Gave Thanks to pravin27 For This Post:
# 4  
Old 03-14-2011
Hi yinyuemi,

I have just sent a message to your mailbox.
I need your advice about it.
Thanks.
# 5  
Old 03-14-2011
Try this,
Code:
#!/usr/bin/perl
use strict;
use warnings;

my %ref_h;    # header => [start1, end1, start2, end2, ...]
my ($col, $wanted);

open(my $rf, "<", "reffile") or die "Fail-$!\n";
while (<$rf>) {
    chomp;
    my @refdata = split;
    next unless @refdata >= 3;    # skip blank or malformed lines
    # a header may be listed more than once, so collect every range
    push @{ $ref_h{$refdata[0]} }, $refdata[1], $refdata[2];
}
close($rf);

open(my $df, "<", "input_file.txt") or die "Fail-$!\n";
while (<$df>) {
    chomp;
    if (/^>/) {
        ($col = $_) =~ s/>//g;
        $wanted = exists $ref_h{$col};
        print "$_\n" if $wanted;
        next;
    }
    next unless $wanted;          # sequence of a header not in reffile
    my @ranges = @{ $ref_h{$col} };
    for (my $k = 0; $k < $#ranges; $k += 2) {
        my ($start, $end) = @ranges[$k, $k + 1];
        # keep the first $start-1 chars, skip the span, keep the rest
        my $format = "A" . ($start - 1) . " x" . ($end - $start + 1) . " A*";
        my @ar = unpack($format, $_);
        $_ = $ar[0] . ("X" x ($end - $start + 1)) . $ar[1];
    }
    print $_, "\n";
}
close($df);

Instead of the lines inside the for loop that build $format, call unpack and reassemble $_, you can use the code below:
Code:
substr($_,($start-1),($end-$start+1))="X" x ($end-$start+1);


Last edited by pravin27; 03-14-2011 at 09:39 AM..
This User Gave Thanks to pravin27 For This Post:
# 6  
Old 03-18-2011
Hi pravin27,

I sent you a message yesterday.
I need your advice.
Thanks.

---------- Post updated 03-18-11 at 02:17 AM ---------- Previous update was 03-17-11 at 04:15 AM ----------

Hi pravin27,

I have just re-sent you the input and reference files.
Thanks.
# 7  
Old 03-18-2011
Try this.
The header column in your reffile should match the headers in input_file.txt. I have added lowercasing (lc) so that the comparison ignores case. Hope this resolves your problem.

Code:
#!/usr/bin/perl
use strict;
use warnings;

my %ref_h;    # lowercased header => [start1, end1, start2, end2, ...]
my ($col, $wanted);

open(my $rf, "<", "reffile") or die "Fail-$!\n";
while (<$rf>) {
    chomp;
    my @refdata = split;
    next unless @refdata >= 3;    # skip blank or malformed lines
    push @{ $ref_h{lc $refdata[0]} }, $refdata[1], $refdata[2];
}
close($rf);

open(my $df, "<", "/tmp/input_file.txt") or die "Fail-$!\n";
while (<$df>) {
    chomp;
    if (/^>/) {
        ($col = $_) =~ s/>//g;
        $wanted = exists $ref_h{lc $col};
        print "$_\n" if $wanted;
        next;
    }
    next unless $wanted;          # sequence of a header not in reffile
    my @ranges = @{ $ref_h{lc $col} };
    for (my $k = 0; $k < $#ranges; $k += 2) {
        my ($start, $end) = @ranges[$k, $k + 1];
        # overwrite positions $start..$end (1-based, inclusive) with X
        substr($_, $start - 1, $end - $start + 1) = "X" x ($end - $start + 1);
    }
    print $_, "\n";
}
close($df);
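
For comparison, the two ideas in the script above (several ranges per header, and a header match that ignores case) can be sketched in portable awk. This is only an illustration under assumptions: headers start with ">", each header is followed by one sequence line, and positions fall inside the sequence. The name mask_multi is made up for this example. Reads without an entry in the reference file are printed unchanged in this sketch.

```shell
#!/bin/sh
# mask_multi REF INPUT: hypothetical helper name, not from the thread.
mask_multi() {
    awk '
    # Reference file: append "start end" pairs under the lowercased header.
    NR == FNR {
        if (NF >= 3) {
            key = tolower($1)
            ranges[key] = ranges[key] " " $2 " " $3
        }
        next
    }

    # Header line: strip ">" and lowercase the rest for the lookup.
    /^>/ { key = tolower(substr($0, 2)); print; next }

    # Sequence line: apply every range recorded for the current header.
    {
        line = $0
        n = split(ranges[key], r, " ")
        for (k = 1; k < n; k += 2) {
            s = r[k] + 0
            e = r[k + 1] + 0
            mask = ""
            for (i = s; i <= e; i++) mask = mask "X"
            line = substr(line, 1, s - 1) mask substr(line, e + 1)
        }
        print line
    }' "$1" "$2"
}
```

Usage would be along the lines of `mask_multi reffile input_file.txt > masked.txt`, where masked.txt is an arbitrary output name.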
