URGENT:- Data Scrubbing


 
# 1  
Old 05-26-2009

Hi All,

I have a flat file (with any delimiter) containing millions of lines of data. I need to scrub each line, starting at the position given as the first input parameter ($1) and continuing for the length given as the second input parameter ($2). I have tried awk and sed, but I am unable to do it.

Scrub key: the digits 12345 should be replaced by 67890.
e.g. 01289 - before scrubbing
     06789 - after scrubbing
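
The scrub key is a character-for-character translation table. As a minimal sketch, assuming Python 3 is available, it could be expressed as below; `SCRUB_KEY` and `scrub` are illustrative names and not part of any existing script.

```python
# Illustrative sketch of the scrub key: each digit in "12345" maps to
# the digit at the same index in "67890"; other characters are kept.
SCRUB_KEY = str.maketrans("12345", "67890")


def scrub(text):
    """Return text with the scrub key applied to every character."""
    return text.translate(SCRUB_KEY)


print(scrub("01289"))  # prints 06789, the example from the post
```

Digits outside the key (0, 6, 7, 8, 9) pass through unchanged, which is why only the 1 and the 2 differ in the example.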


Example: sample.ksh 6 4



Input - Flatfile:
-------
"1234,5678,0987,12345667,000000976655,+1234,013994878356"
"0987,23467,11243554,0000887651,1234567,09876,1234455"
"0987675,1223443,797784784784,09866545,+232322,097865"

I want the scrubbed output file to look like this:

"1234,0678,0987,12345667,000000976655,+1234,013994878356"
"0987,78967,11243554,0000887651,1234567,09876,1234455"
"0987675,6778443,797784784784,09866545,+232322,097865"
# 2  
Old 05-26-2009
if you have Python
Code:
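#!/usr/bin/env python3
# Translate digits 1-5 to 6-0 in the first four characters
# of the second comma-separated field of every line.
FROM = "12345"
TO = "67890"
table = str.maketrans(FROM, TO)
with open("file") as f:
    for line in f:
        fields = line.strip().split(",")
        fields[1] = fields[1][:4].translate(table) + fields[1][4:]
        print(",".join(fields))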

output
Code:
# python test.py
"1234,0678,0987,12345667,000000976655,+1234,013994878356"
"0987,78967,11243554,0000887651,1234567,09876,1234455"
"0987675,6778443,797784784784,09866545,+232322,097865"

# 3  
Old 05-26-2009
Code:
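# Translate digits 1-5 to 6-0 in a fixed window of each input line.
sub scrub{
	my($pos,$len)=(@_);	# 1-based start position and window length
	while(<DATA>){
		# substr() is used as an lvalue, so y/// edits the line in place
		substr($_,$pos-1,$len) =~ y/12345/67890/;
		print $_;
	}
}
scrub(6,3);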
__DATA__
1234,5678,0987,12345667,000000976655,+1234,013994878356
0987,23467,11243554,0000887651,1234567,09876,1234455
0987,1223443,797784784784,09866545,+232322,097865

# 4  
Old 05-26-2009
Hi All, thanks for the replies.
I don't have Python or Perl.
I only have ksh, bash, sh and csh.

Please help me.
# 5  
Old 05-26-2009
Quote:
Originally Posted by padhu.47
Hi All, thanks for the replies.
I don't have Python or Perl.
I only have ksh, bash, sh and csh.

please help me
It would be better to install Perl or Python. Both are free and easy to install.
# 6  
Old 05-26-2009
Quote:
Originally Posted by padhu.47
Hi All, thanks for the replies.
I don't have Python or Perl.
I only have ksh, bash, sh and csh.

please help me
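Then use awk:
Code:
awk -F"," 'BEGIN{
 # scrub key: 12345 -> 67890
 t["1"]="6"
 t["2"]="7"
 t["3"]="8"
 t["4"]="9"
 t["5"]="0"
}
{
 s=""
 # translate the first four characters of the second field
 for(i=1;i<=4;i++){
  if( substr($2,i,1) in t ){
     s=s t[substr($2,i,1)]
  }else{
     s=s substr($2,i,1)
  }
 }
 # re-attach the untouched remainder of the field
 $2=s substr($2,5)
}
1
' OFS="," file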

# 7  
Old 05-27-2009
The above code is not working.
Please help me write this in ksh, csh or sh.

-----Post Update-----

Hello guys,

I have written an awk program, shown below, to do it, but it translates every digit in the flat file rather than only the ones I want.

Code:
#!/usr/bin/awk -f
BEGIN {
    CnvFrom = "0123456789";
    CnvTo = "4590382617";

    Field = 1;
}
{
    newField = ""
    for (i=1; i<=length($Field); i++) {
        char = substr($Field, i, 1);
        if (pos=index(CnvFrom, char))
            char = substr(CnvTo, pos, 1)
        newField = newField char
    }
    $Field = newField
    print
}


But my requirement is to translate only the characters that start at a given position (input parameter $2) and run for a given length (input parameter $3), in the flat file named by input parameter $1. Please help me.

e.g.: scrub.ksh file1 68 9 ($1 - filename, $2 - position (68), $3 - length from that position (9))

Before scrub- file1:
---------------------
"37713000000","12000000202","0000000000000000000007102","0000377310013683931",20090114,20080301,20080331,20060304,+000000000005897."
"37713000000","12000000202","0000000000000000000007102","0000377310013683931",20090114,20080301,20080331,20060304,+000000000005897."
"37713000000","12000000202","0000000000000000000010739","0000377310044493243",20090114,20080501,20080531,20070224,+000000000000000."
"37713000000","12000000202","0000000000000000000010739","0000377311018365607",20090114,20080401,20080430,20070224"


After scrub -file1:
-----------------

"37713000000","12000000202","0000000000000000000007102","0000377310450210705",20090114,20080301,20080331,20060304,+000000000005897."
"37713000000","12000000202","0000000000000000000007102","0000377310450210705",20090114,20080301,20080331,20060304,+000000000005897."
"37713000000","12000000202","0000000000000000000010739","0000377310433370930",20090114,20080501,20080531,20070224,+000000000000000."
"37713000000","12000000202","0000000000000000000010739","0000377311451028246",20090114,20080401,20080430,20070224"

Please help me; I want the scrub to be driven by the input parameters.

Last edited by padhu.47; 05-27-2009 at 05:43 AM..
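
For readers who do have Python 3 available, the positional requirement in the last post could be sketched as follows. This is only an illustration under stated assumptions: the script name `scrub.py` and the function `scrub_line` are hypothetical, the digit mapping is copied from the awk program above, and the position is taken to be 1-based and to count every character on the line, including quotes and commas. The same window logic (prefix, translated slice, suffix) carries over to awk with `substr` and `index`.

```python
#!/usr/bin/env python3
"""Sketch: translate digits inside a fixed character window of each line."""
import sys

# Digit mapping copied from the awk program in the post above.
CNV_FROM = "0123456789"
CNV_TO = "4590382617"
TABLE = str.maketrans(CNV_FROM, CNV_TO)


def scrub_line(line, pos, length, table=TABLE):
    """Translate `length` characters of `line` starting at 1-based `pos`.

    Characters outside the window, and non-digits inside it, are kept.
    """
    start = pos - 1
    end = start + length
    return line[:start] + line[start:end].translate(table) + line[end:]


def main(argv):
    # Hypothetical usage: scrub.py file1 68 9
    path, pos, length = argv[1], int(argv[2]), int(argv[3])
    with open(path) as handle:
        # Iterating over the file reads one line at a time, so a file
        # with millions of lines is never held in memory as a whole.
        for line in handle:
            sys.stdout.write(scrub_line(line, pos, length))


if __name__ == "__main__":
    if len(sys.argv) == 4:
        main(sys.argv)
    else:
        print("usage: scrub.py FILE POSITION LENGTH", file=sys.stderr)
```

Under this mapping `013683931` becomes `450210705`, which is the change shown in the first line of the "After scrub" sample. The script writes to standard output, so the result would be redirected to a new file rather than overwriting the input.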