How to remove alphabets/special characters/space in the 5th field of a tab delimited file?


 
# 8  
Old 05-08-2014
Quote:
Originally Posted by Srithar
Hi DON,

Thanks for your reply. It helped me a lot, but it does not work for one scenario. If a corrupted record is split across 3 lines (as in the example below), the script removes the data in the 5th field and also in all the fields that follow it. Here is an example:

Input:

Code:
Record 1 :
"14"	"50603"	"1012"	"2131609"	"18��#��nz�S^l�����
��`��Z�/á*”�������ˮ7d_�gˉ�RB�nx����R�
9gd,�P�X�O"	"02-May-2014 04:11:54"

Expected Output:
Code:
Record 1 :
"14"	"50603"	"1012"	"2131609"	""	"02-May-2014 04:11:54"

Actual Output: (the 6th field is cut from the row)
Code:
Record 1 :
"14"	"50603"	"1012"	"2131609"	""

Your original problem did not say anything about multi-line records.
Your original problem clearly showed that the lines starting with Record in your input file were not supposed to be copied to your output file, but now you say that those lines should be copied to the output.
The script I gave you works perfectly for any input that you described in your original problem statement.

Before we make another attempt to help you, you need to clearly describe your input file format and what you want to appear in the output. Start by answering the following questions and then add any other information we need to know to help you get code that will do what you want:
  1. Are input lines starting with Record supposed to be copied to the output?
  2. Can <newline> characters appear in any field other than field 5?
  3. Should the output ever have more than one line of output per input (multi-line) record?
  4. Can <tab> characters ever appear in any field?
  5. Can double-quote characters (") ever appear in any field other than as the 1st and last characters in the field?
  6. Do all input fields have double-quote characters as the 1st and last character in every field?
  7. Are the <newline> characters in your input file data in that field, or is there a fixed input line length that adds <newline> characters to enforce the input line length maximum?
  8. Can <newline> characters ever appear in an input record other than as the last character in a record and in character positions that are integral multiples of 80?
  9. Is there a maximum number of characters in the output file format?
# 9  
Old 05-12-2014
Hi Don,

Please find the answers to your questions below:
  1. Are input lines starting with Record supposed to be copied to the output?
  2. Can <newline> characters appear in any field other than field 5?
    I included those lines only for illustration. The first line of the file is always the header (field names). The expected output should look like this:
    Code:
    "ID"	"ID2"	"NUMBER"	"ID4"	"ID5"	"DATE1"
    "14"	"503"	"1012"	"314580"	"173124"	"02-May-2014 06:16:53"
    "14"	"503"	"1032"	"247100"	"143773"	"02-May-2014 06:17:17"
    "15"	"503"	"1012"	"247210"	"142773"	"02-May-2014 06:17:34"
    "14"	"503"	"1062"	"122430"	"17828"	"02-May-2014 06:18:11"
    "14"	"503"	"1012"	"-1"	""	"02-May-2014 06:18:11"
    "15"	"503"	"1027"	"-1"	""	"02-May-2014 06:18:52"

  3. Should the output ever have more than one line of output per input (multi-line) record?
    No
  4. Can <tab> characters ever appear in any field?
    No
  5. Can double-quote characters ( " ) ever appear in any field other than as the 1st and last characters in the field?
    No
  6. Do all input fields have double-quote characters as the 1st and last character in every field?
    Yes
  7. Are the <newline> characters in your input file data in that field, or is there a fixed input line length that adds <newline> characters to enforce the input line length maximum?
    No
  8. Can <newline> characters ever appear in an input record other than as the last character in a record and in character positions that are integral multiples of 80?
    No
  9. Is there a maximum number of characters in the output file format?
    No
# 10  
Old 05-12-2014
Please show us a sample input file that should be transformed into that expected output. Nothing you have shown us so far matches the data in your expected output in message #9 in this thread.
# 11  
Old 05-13-2014
In the file below, the 5th field (shown in red in the original post) contains the corrupted characters and is split across multiple lines.

Input File:
Code:
"ID1"	"ID2"	"ID3"	"RD"	"NUM"	"DATE"	
"14"	"50603"	"1012"	"213093"	"18��#��nz�S^l�����
��`��Z�/á*”�������ˮ7d_�gˉ�RB�nx����R�
9gd,�P�X�O"	"02-May-2014 04:11:54"
"15"	"50603"	"1012"	"213093"	"180778699"	"02-May-2014 04:12:48"
"14"	"50603"	"1012"	"139793"	"16M�E��~�,
J:/E��	I��VԽ�ɬ����[��?�]GޱCM�7d_�B��t�a"	"02-May-2014 04:13:07"
"14"	"50603"	"1012"	"372886"	""	"02-May-2014 04:13:11"
"14"	"50603"	"1012"	"480831"	"235345"	"02-May-2014 03:04:03"
"14"	"50603"	"1012"	"183007"	"15RM�N���>w����"	"02-May-2014 03:03:53"

Expected Output File:

Code:
"ID1"	"ID2"	"ID3"	"RD"	"NUM"	"DATE"	
"14"	"50603"	"1012"	"213093"	""	"02-May-2014 04:11:54"
"15"	"50603"	"1012"	"213093"	"180778699"	"02-May-2014 04:12:48"
"14"	"50603"	"1012"	"139793"	""	"02-May-2014 04:13:07"
"14"	"50603"	"1012"	"372886"	""	"02-May-2014 04:13:11"
"14"	"50603"	"1012"	"480831"	"235345"	"02-May-2014 03:04:03"
"14"	"50603"	"1012"	"183007"	""	"02-May-2014 03:03:53"

# 12  
Old 05-14-2014
You said that there would never be embedded tab characters in a field in your input file, but there is a tab in the middle of field 5 in the two line record on lines 6 and 7 in your latest sample input file.
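
A quick way to look for such tabs is sketched below. It treats a tab as embedded when it is not both preceded and followed by a double-quote character, which assumes every field is quoted as described earlier in the thread. The function name `find_embedded_tabs` and the sample data are made up for illustration; they are not part of the original file.

```shell
# Hypothetical helper: print the number of each physical line that contains
# a tab with a non-quote character immediately before or after it.
find_embedded_tabs() {
	awk '/[^"]\t|\t[^"]/ { printf("%d\n", NR) }' "$1"
}

# Simplified stand-in for a record split over two lines, with a tab
# inside the broken field on the second line.
sample=$(mktemp)
printf '"14"\t"139793"\t"16M,\n' > "$sample"
printf 'J:/E\tI"\t"02-May-2014 04:13:07"\n' >> "$sample"

find_embedded_tabs "$sample"	# prints: 2
rm -f "$sample"
```

A trailing tab at the end of a line is not reported, because it follows a double quote and has no character after it.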

As long as there aren't any embedded tab characters immediately before or after a double-quote character, the following seems to do what you want. (It is strange, however, that the first line of your sample input file has a trailing tab character.)
Code:
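awk '
BEGIN {	FS = OFS = "\t"
}
{	# A complete record contains five "<tab>" field separators (six fields).
	# gsub() with "&" replaces each match with itself, so it only counts them.
	# Keep appending physical lines until the record is complete.
#	printf("Line %d, NF %d: %s\n", NR, NF, $0)
	while(gsub(/"\t"/, "&") < 5) {
		rc = (getline nl)
		if(rc != 1) {
			printf("Unexpected EOF: line %d, NF %d: %s\n", NR, NF, $0)
			exit 1
		}
		$0 = $0 nl
#		printf("Line %d added, NF %d, %s\n", NR, NF, $0)
	}
	# Replace any tab that is not between two double quotes (an embedded
	# tab, together with the adjacent character) so that it is not treated
	# as a field separator.
	if(gsub(/[^"]\t|\t[^"]/, "<tab>")) {
#		printf("embedded tabs replaced: %s\n", $0)
	}
	# Except on the header line, empty field 5 unless it is a quoted
	# string of digits (possibly empty).
	if(NR > 1 && $5 !~ /^"[0-9]*"$/) $5 = "\"\""
	print
}' file2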
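
The loop in the script above decides whether a record is complete by counting quote-tab-quote separators instead of relying on NF. The sketch below shows that count for each physical line of a small sample. The function name `count_separators` and the sample data are assumptions made for illustration, and the corrupted bytes are replaced by plain letters.

```shell
# Hypothetical helper: for each physical line print "line:count", where
# count is the number of quote-tab-quote separators on that line.
# A complete six-field record has 5; fewer means the line is a fragment.
count_separators() {
	awk '{ n = gsub(/"\t"/, "&"); printf("%d:%d\n", NR, n) }' "$1"
}

# Simplified sample: header, one record split over two lines, one intact record.
sample=$(mktemp)
printf '"ID1"\t"ID2"\t"ID3"\t"RD"\t"NUM"\t"DATE"\n' > "$sample"
printf '"14"\t"50603"\t"1012"\t"213093"\t"18xx\n' >> "$sample"
printf 'yy"\t"02-May-2014 04:11:54"\n' >> "$sample"
printf '"15"\t"50603"\t"1012"\t"213093"\t"180778699"\t"02-May-2014 04:12:48"\n' >> "$sample"

count_separators "$sample"
rm -f "$sample"
```

For this sample, lines 2 and 3 report 4 and 1 separators respectively, so they only form a complete record (5 separators) once they are joined.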

# 13  
Old 05-14-2014
Thanks a lot, Don! The code works perfectly and gives the expected result.