Need to strip control-A characters from a column in a file


 
# 1  
Old 05-19-2015

Hi All,

I currently have a flat file with 32 columns. The field delimiter is control-A (\x01). The file was extracted from an Oracle table by a DataStage job. However, the data in the 6th field contains additional control-A characters that came from the table data itself.

I need some help removing these control-A characters from the 6th field only.

I tried using sed to replace the first 5 delimiters and the last 26 delimiters with another delimiter, such as |, and then using tr to strip the remaining control-A characters, but it is taking too long. Any help is appreciated.
# 2  
Old 05-19-2015
Can we see a sample of the file?
What OS are you using?
Which shell is preferred?

If it is taking too long then how big is this _flat_file_?
# 3  
Old 05-19-2015
Here is a sample. I am using ',' as the field delimiter instead of control-A.
Code:
1,A,USA,0
2,B,GERMANY,0
3,C,IND,IA,0
4,D,CH,INA,0

In the above example, the values "IND,IA" and "CH,INA" come from the table data, so the comma inside them is not a field delimiter.

The files are in .gz format and the sizes are around 12 GB each.

Moderator's Comments:
Please wrap all code, files, input & output/errors in CODE tags.
It makes them easier to read and preserves multiple spaces for indenting and fixed-width data.

Last edited by rbatte1; 05-20-2015 at 05:55 AM.. Reason: Added CODE tags for the file
# 4  
Old 05-19-2015
You could produce your own test file like this:

Code:
$ printf "%s\x01" {1..31} > infile
$ printf "32\n" >> infile
$ printf "%s\x01" {1..5} 6{A..E} {7..31} >> infile
$ printf "32\n" >> infile
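
To confirm which lines of a test file like this carry extra delimiters, a small check along these lines may help. It is only a sketch: `count_fields` is a made-up helper name, and the delimiter is written in octal as `\001` so that a plain POSIX `printf` can produce it.

```shell
# count_fields FILE
# Print "line-number: field-count" for every line of a control-A delimited file.
# (Hypothetical helper, not part of the thread.)
count_fields() {
    awk -F "$(printf '\001')" '{ print NR ": " NF }' "$1"
}

# Usage on the test file built above:
#   count_fields infile
```

For the two-line test file above, this should report 32 fields for line 1 and 36 for line 2, since the second line has five pieces (6A to 6E) where field 6 belongs.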

# 5  
Old 05-19-2015
Quote:
Originally Posted by harsha1238
Here is a sample. I am using ',' as field delimiter instead of cntl-a.

1,A,USA,0
2,B,GERMANY,0
3,C,IND,IA,0
4,D,CH,INA,0

In the above example, the values "IND,IA" and "CH,INA" are coming from the table.

The files are in .gz format and the sizes are around 12 GB each.
Setting aside the 6th column from your original post: you never mentioned that, in this sample, the third column may or may not contain the delimiter.
Is this a random event or is it every pair as in your example?
Can we see your attempt please?
# 6  
Old 05-19-2015
These rows are random. I used sed to convert the initial delimiters to another value and then tried to strip the additional characters from the required column.
Code:
 
echo "1,A,USA,0" > test_input.dat
echo "2,B,GERMANY,0" >> test_input.dat
echo "3,C,IND,IA,0" >> test_input.dat
echo "4,D,CH,INA,0" >> test_input.dat
 
sed -i 's/,/|/' test_input.dat            ## protect the first delimiter ##
sed -i 's/,/|/' test_input.dat            ## protect the 2nd delimiter ##
rev test_input.dat > test_input.dat_rev   ## reverse each record, since the problem column may hold several extra delimiters ##
sed -i 's/,/|/' test_input.dat_rev        ## protect the last delimiter of the original record ##
sed -i 's/,//g' test_input.dat_rev        ## remove the extra delimiters left in the problem column ##
rev test_input.dat_rev > test_input.dat   ## reverse the records back to their original order ##
sed -i 's/|/,/g' test_input.dat           ## restore the original delimiter ##
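
Part of the slowness is probably that every `sed -i` and `rev` step rereads and rewrites the whole file. As a rough sketch, the same idea can be done in one pass over a stream, for the 4-column comma sample only. The function name `fix_third_column` is made up for illustration, and like the attempt above it assumes `|` never appears in the data.

```shell
# fix_third_column: read the comma sample on stdin, write the repaired rows
# to stdout. One sed process, no temporary files, no rev.
# Steps, in order:
#   1-2. turn the first two commas into "|" (the real delimiters before column 3)
#   3.   turn the last comma into "|" (the real delimiter after column 3)
#   4.   delete any commas that remain (they are data inside column 3)
#   5.   turn every "|" back into ","
fix_third_column() {
    sed -e 's/,/|/' -e 's/,/|/' \
        -e 's/,\([^,]*\)$/|\1/' \
        -e 's/,//g' \
        -e 'y/|/,/'
}

# Usage:
#   fix_third_column < test_input.dat > test_output.dat
```

On the sample, rows 3 and 4 come out as `3,C,INDIA,0` and `4,D,CHINA,0`, and the other rows are unchanged. This does not scale well to the real 32-column layout, where 5 leading and 26 trailing delimiters would each need protecting, so a field-based tool such as awk is likely the better fit there.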

# 7  
Old 05-19-2015
Try something like this:

Code:
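awk -F "$(printf '\001')" '
# Lines with more than 32 fields have stray control-A characters in field 6
NF>32{
   E=NF-32                              # number of extra fields
   for(i=7;i<7+E;i++) $6=$6$i           # glue the extra pieces back onto field 6
   for(i=7;i<=32;i++) $i=$(i+E)         # shift the remaining fields left
   NF=32                                # drop the leftover trailing fields
} 1' OFS="$(printf '\001')" infile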


Last edited by Chubler_XL; 05-19-2015 at 08:32 PM..
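
Since the real files are gzipped and around 12 GB each, it may be worth running the awk logic as a filter between `gzip -dc` and `gzip -c`, so the uncompressed data never touches disk. The sketch below wraps the same merge-and-shift idea in a function that takes the delimiter, the expected column count and the problem column as arguments. `merge_extra_fields` and the file names in the usage comment are made up for illustration. It assumes only one column can contain stray delimiters and that the awk in use allows assigning to NF (gawk and mawk do).

```shell
# merge_extra_fields DELIM NFIELDS TARGET
# Read records on stdin. Any record with more than NFIELDS fields has its
# surplus pieces glued back onto field TARGET, then is printed with exactly
# NFIELDS fields. Other records pass through unchanged.
merge_extra_fields() {
    awk -v d="$1" -v n="$2" -v k="$3" '
    BEGIN { FS = OFS = d }
    NF > n {
        extra = NF - n                                      # surplus field count
        for (i = k + 1; i <= k + extra; i++) $k = $k $i     # rejoin the pieces
        for (i = k + 1; i <= n; i++) $i = $(i + extra)      # shift the rest left
        NF = n                                              # drop the leftovers
    }
    { print }'
}

# Usage on the comma sample (4 columns, problem column 3):
#   merge_extra_fields , 4 3 < test_input.dat
#
# Usage on a gzipped control-A extract (32 columns, problem column 6):
#   gzip -dc extract.dat.gz |
#       merge_extra_fields "$(printf '\001')" 32 6 |
#       gzip -c > extract.fixed.dat.gz
```

Each record is handled once in a single awk process, which should be noticeably cheaper than several sed and rev passes over the whole file.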