Data Cleaning in a file

# 1  
Old 06-26-2011

Hi,

I have the source data below, and I need to clean the data in the 2nd, 4th and 5th columns.

Code:
Source Data
RECORD,CASH_TRANS,BEJING,AUG2011/CASH_TRANS,Y/N150/CASH_TRANS,N/201108
RECORD,CASH_TRANS,INDIA,AUG2011/CASH_TRANS,Y/NC110/CASH_TRANS,N/201108
RECORD,LOAN_GRANTED_AMOUNT,RUSSIA,AUG2011/LOAN_GRANTED_AMOUNT,Y/NP874/LOAN_GRANTED_AMOUNT,N/201108
RECORD,LOAN_GRANTED_AMOUNT,PARIS,AUG2011/LOAN_GRANTED_AMOUNT,Y/NB6543/LOAN_GRANTED_AMOUNT,N/201108


Code:
Target Data
RECORD,TRANS,BEJING,AUG2011/TRANS,Y/N150/TRANS,N/201108
RECORD,TRANS,INDIA,AUG2011/TRANS,Y/NC110/TRANS,N/201108
RECORD,GRANTED_AMOUNT,RUSSIA,AUG2011/GRANTED_AMOUNT,Y/NP874/GRANTED_AMOUNT,N/201108
RECORD,GRANTED_AMOUNT,PARIS,AUG2011/GRANTED_AMOUNT,Y/NB6543/GRANTED_AMOUNT,N/201108

Please give me some hints on how I can achieve this.

-Mora
# 2  
Old 06-26-2011
Maybe a simpler solution will be enough?
Code:
sed  's/CASH_//g; s/LOAN_//g' INPUT.FILE > OUTPUT.FILE

This User Gave Thanks to yazu For This Post:
# 3  
Old 06-26-2011
Hi Yazu,

Thanks, a simple and great solution.

Can you please explain the code? I don't know sed, and I would like to understand how it strips off the extra characters.

Also, what if I want to limit the substitution to specific records? In awk we would write something like
awk '$1 == "RECORD" { apply the logic }' and leave every other line unchanged.

How can we do this in sed?

--Mora
# 4  
Old 06-26-2011
Hey,

What if the names in the 2nd, 4th and 5th columns change dynamically? Do we need to write a substitution for each of those names, or is there an automated way to do it? I'm just interested to know whether this is possible.

Regards,
Wang
# 5  
Old 06-26-2011
@mora It's very simple.
sed applies its commands to each line of input. In this case the command is
s/SOMETHING/ANOTHER/g
which means: substitute (s) SOMETHING with ANOTHER everywhere on the line (the g flag stands for "globally"). Without g, only the first match on each line is replaced. There can be several commands, separated by ";".

In our case we apply two commands to each line, and ANOTHER is empty: 's/CASH_//g; s/LOAN_//g'

It may fail if the data contains other occurrences of CASH_ or LOAN_ that shouldn't be deleted.
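
To make the explanation concrete, here is a small sketch that runs the substitution on one line of the source data, with and without the g flag. The last example uses a made-up line (the CASH_CITY value is an assumption, not from the original data) to show the failure case mentioned above.

```shell
# One line taken from the source data in the first post.
line='RECORD,CASH_TRANS,BEJING,AUG2011/CASH_TRANS,Y/N150/CASH_TRANS,N/201108'

# Without the g flag, only the first CASH_ on the line is removed.
first_only=$(printf '%s\n' "$line" | sed 's/CASH_//')

# With the g flag, every CASH_ on the line is removed.
all=$(printf '%s\n' "$line" | sed 's/CASH_//g; s/LOAN_//g')

# Hypothetical line: the third column also starts with CASH_ and gets
# stripped too, which may not be what you want.
risky='RECORD,CASH_TRANS,CASH_CITY,AUG2011/CASH_TRANS,Y/N150/CASH_TRANS,N/201108'
over=$(printf '%s\n' "$risky" | sed 's/CASH_//g; s/LOAN_//g')

printf '%s\n' "$first_only" "$all" "$over"
```

The second result matches the target data; the third shows CASH_CITY turned into CITY.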

@wangkc I really didn't understand your question. Sorry, English is not my native language (as you can see, of course).
# 6  
Old 06-26-2011
Hey yazu,

Sorry if I confused you. I was asking: if I have the source data below, do I need to repeat the sed command for each different prefix, or is there an alternative way?
Code:
RECORD,CASH_TRANS,BEJING,AUG2011/CASH_TRANS,Y/N150/CASH_TRANS,N/201108
RECORD,ABC_TRANS,INDIA,AUG2011/ABC_TRANS,Y/NC110/ABC_TRANS,N/201108
RECORD,LOAN_GRANTED_AMOUNT,RUSSIA,AUG2011/LOAN_GRANTED_AMOUNT,Y/NP874/LOAN_GRANTED_AMOUNT,N/201108
RECORD,XYZ_GRANTED_AMOUNT,PARIS,AUG2011/XYZ_GRANTED_AMOUNT,Y/NB6543/XYZ_GRANTED_AMOUNT,N/201108
RECORD,123_TRANS,BEJING,AUG2011/123_TRANS,Y/N150/123_TRANS,N/201108
RECORD,ZSX_TRANS,INDIA,AUG2011/ZSX_TRANS,Y/NC110/ZSX_TRANS,N/201108




Code:
sed 's/CASH_//g; s/LOAN_//g; s/ABC_//g; s/XYZ_//g; s/123_//g; s/ZSX_//g' INPUT.FILE > OUTPUT.FILE

Regards,
Wang
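
One possible way to avoid listing every prefix is to let awk work it out from the data. The sketch below is not from the thread: the function names strip_prefix and swap_tail are made up for illustration. It assumes the prefix is everything up to the first "_" in column 2, and that columns 4 and 5 end with "/" followed by the same name as column 2.

```shell
# Strip the leading "PREFIX_" from the name in column 2, and from the same
# name where it ends columns 4 and 5, whatever the prefix happens to be.
strip_prefix() {
    awk '
    BEGIN { FS = OFS = "," }

    # If s ends with "/" followed by old, replace that old with new.
    # The comparison is literal, so regex metacharacters in old are harmless.
    function swap_tail(s, old, new,    n) {
        n = length(s) - length(old)
        if (n > 0 && substr(s, n) == ("/" old))
            return substr(s, 1, n) new
        return s
    }

    # Only touch RECORD lines whose second column contains an underscore.
    $1 == "RECORD" && index($2, "_") > 0 {
        old = $2
        new = substr(old, index(old, "_") + 1)   # text after the first "_"
        $2 = new
        $4 = swap_tail($4, old, new)
        $5 = swap_tail($5, old, new)
    }

    { print }
    ' "$@"
}
```

It can be called as strip_prefix INPUT.FILE > OUTPUT.FILE, or used at the end of a pipeline. Lines whose first column is not RECORD are printed unchanged.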
# 7  
Old 06-26-2011
Hi Yazu,

I want to apply the logic only if the first column is equal to RECORD, not to every record in the file. How can I do this in sed?

--Mora
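
A minimal sketch of one way to do this: in sed, a command can be prefixed with an address such as /^RECORD,/, so that it runs only on lines matching that pattern. The function names clean_records and clean_records_awk, and the HEADER line in the usage example, are made up for illustration; the awk variant is included only for comparison with the awk idea mentioned earlier.

```shell
# Apply the substitutions only to lines that start with "RECORD,".
# The braces group both commands under the one address.
clean_records() {
    sed '/^RECORD,/ { s/CASH_//g; s/LOAN_//g; }' "$@"
}

# Roughly equivalent awk: test the first comma-separated field, then
# substitute across the whole line; the trailing 1 prints every line.
clean_records_awk() {
    awk -F, '$1 == "RECORD" { gsub(/CASH_|LOAN_/, "") } 1' "$@"
}

# Usage with one real line and one hypothetical non-RECORD line.
printf '%s\n' \
    'RECORD,CASH_TRANS,BEJING,AUG2011/CASH_TRANS,Y/N150/CASH_TRANS,N/201108' \
    'HEADER,CASH_TRANS,LOAN_GRANTED_AMOUNT' | clean_records
```

Only the RECORD line is changed; the HEADER line passes through as it is.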