How to remove alphabets/special characters/space in the 5th field of a tab delimited file?


 
# 1  
Old 05-07-2014

Thank you for looking at this post.

We have a tab-delimited file that contains a lot of strange characters. I tried using awk, but could not get it to work.
The 5th field of the file, ID, is supposed to contain only an integer, but we are getting corrupted data as shown below.
I want to remove the entire corrupted value in the affected rows and replace it with an empty value.

I am not sure what these symbols are or which command can replace these junk characters.

Your suggestions are appreciated.

Example:
Code:
record 1 - 
"14"    "50603" "1012"  "123"      "12ռ4�Z{>�}ŪiY2���3�'�ݐ؍>N�C�7>S"      "19-Mar-2014 14:58:26" 
record 2 - 
"14"    "50603" "1012"  "37164455"      "ռ4�Z>S"     "19-Mar-2014 14:58:26"

The output should look like:

Code:
"14"    "50603" "1012"  "123"      ""     "19-Mar-2014 14:58:26"  
"14"    "50603" "1012"  "37164455"      ""      "19-Mar-2014 14:58:26"


Last edited by Don Cragun; 05-07-2014 at 06:09 PM.. Reason: Add CODE tags.
# 2  
Old 05-07-2014
You didn't say anything about removing numeric characters. Why isn't the output for the 5th field in record 1 "124237"? Why isn't the output for the 5th field in record 2 "4"?
# 3  
Old 05-07-2014
try:
Code:
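awk -F"\t" -v OFS="\t" 'NF>4{
  f=$5;
  sub("^\"", "", f);                      # strip the leading quote
  sub("\"$", "", f);                      # strip the trailing quote
  f = (f ~ /^[0-9]*$/) ? f : "";          # keep the value only if it is all digits
  $5="\"" f "\"";                         # assigning $5 rebuilds the record with OFS
  print;
}' infile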

# 4  
Old 05-08-2014
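For your request to empty field 5, this might suffice (FS is set to a tab as well, so the space inside the date field is not treated as a separator):
Code:
awk 'BEGIN{FS=OFS="\t"} {$5="\"\""} 1' file
"14"    "50603"    "1012"    "123"    ""    "19-Mar-2014 14:58:26"
"14"    "50603"    "1012"    "37164455"    ""    "19-Mar-2014 14:58:26"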

But, as Don Cragun says, it might be worthwhile to consider repair instead of remove, and to try to track down the cause of the unwanted behaviour.
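The "repair instead of remove" idea could be sketched roughly as follows. This is only an illustration, not something posted in the thread: the function name `digits_only_field5` is made up, and `LC_ALL=C` is an assumption so that awk treats the junk as plain bytes rather than as invalid multibyte characters. On the sample records it should leave the digits asked about in post #2.

```shell
#!/bin/sh
# Hypothetical helper: keep only the digits of field 5 instead of blanking it.
digits_only_field5() {
    LC_ALL=C awk '
    BEGIN { FS = OFS = "\t" }
    NF >= 5 {
        gsub(/[^0-9]/, "", $5)      # drop the quotes and every non-digit byte
        $5 = "\"" $5 "\""           # put the surrounding quotes back
    }
    { print }' "$@"
}

# Usage (file names are placeholders):
#   digits_only_field5 file > file.repaired
```

Whether the surviving digits are still a meaningful ID is a separate question; if they are not, blanking the field is the safer choice.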
# 5  
Old 05-08-2014
Hi DON,

We need to empty the entire field if it contains any junk characters. We don't need to keep the numeric characters in that field when junk is present. So the 5th field in the output should be empty ("") in both cases.


Thanks !
# 6  
Old 05-08-2014
You could try something like:
Code:
awk '
BEGIN {	FS = OFS = "\t"
}
/^"/ {	if($5 !~ /^"[0-9]*"$/) $5 = "\"\""
	print
}' file

If you want to try this on a Solaris/SunOS system change awk to /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or nawk.
If file contains:
Code:
record 1 - 
"14"	"50603"	"1012"	"123"	"12ռ4�Z{>�}ŪiY2���3�'�ݐ؍>N�C�7>S"	"19-Mar-2014 14:58:26"	
record 2 - 
"14"	"50603"	"1012"	"37164455"	"ռ4�Z>S"	"19-Mar-2014 14:58:26"
record 3 - 
"1"	"2"	"3"	"4"	"5"	"08-May-2014 11:14:59"

this will produce:
Code:
"14"	"50603"	"1012"	"123"	""	"19-Mar-2014 14:58:26"	
"14"	"50603"	"1012"	"37164455"	""	"19-Mar-2014 14:58:26"
"1"	"2"	"3"	"4"	"5"	"08-May-2014 11:14:59"

# 7  
Old 05-08-2014
Hi DON,

Thanks for your reply. It helped me a lot, but it does not work in one scenario. If a corrupted record is split across 3 lines (as in the example below), the command empties the 5th field but also drops the fields that follow it. For example:

Input:

Code:
Record 1 :
"14"	"50603"	"1012"	"2131609"	"18��#��nz�S^l�����
��`��Z�/*�������ˮ7d_�gˉ�RB�nx����R�
9gd,�P�X�O"	"02-May-2014 04:11:54"

Expected Output:
Code:
Record 1 :
"14"	"50603"	"1012"	"2131609"	""	"02-May-2014 04:11:54"

Actual Output: (the 6th field is cut from the row)
Code:
Record 1 :
"14"	"50603"	"1012"	"2131609"	""
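One way the multi-line case might be handled, offered as an untested-on-real-data sketch rather than an answer from the thread: glue physical lines back together until the record has the expected number of tab-separated fields, then apply the same check as in post #6. The function name `clean_field5` and the variable `want` are made up for illustration. The sketch assumes every complete record has 6 fields, that the junk never contains a tab, and that `LC_ALL=C` is acceptable so the junk is handled byte by byte.

```shell
#!/bin/sh
# Hypothetical helper: blank field 5 unless it is a quoted integer,
# first joining physical lines until a record has at least 6 fields.
clean_field5() {
    LC_ALL=C awk '
    BEGIN { FS = OFS = "\t"; want = 6 }   # want: assumed fields per record
    buf == "" && !/^"/ { next }           # skip label lines outside a record
    {
        buf = buf $0                      # glue continuation lines together
        if (split(buf, f, FS) < want)     # record is still incomplete
            next
        $0 = buf                          # re-split the completed record
        buf = ""
        if ($5 !~ /^"[0-9]*"$/)
            $5 = "\"\""
        print
    }' "$@"
}

# Usage (file names are placeholders):
#   clean_field5 file > file.clean
```

As in post #6, lines that do not start with a double quote (such as the "Record 1 :" labels) are dropped unless they fall inside an unfinished record. A record that is still incomplete at end of file is silently discarded, and junk containing tabs or NUL bytes would need something sturdier than this.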
