Removal of multiple characters with in double quotes

    #8  
Old Unix and Linux 1 Week Ago   -   Original Discussion by Jag_1981
Don Cragun (Forum Staff, Administrator)
Join Date: Jul 2012
Last Activity: 21 January 2018, 11:05 PM EST
Location: San Jose, CA, USA
Posts: 10,929
Thanks: 611
Thanked 3,819 Times in 3,263 Posts
Hi Jag_1981,
No. You changed the format of the input data in between post #4 and post #6!

In your sample input data in post #4 in this thread, records are separated by a blank line. But, the sample data that you say is not working has no separator between records and, therefore, there is no way to determine where one record ends and the next begins.
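To illustrate why the blank line matters: awk's paragraph mode (RS set to the empty string) treats blank lines as the record separator, so a multi-line record can be put back together without guessing where it ends. The following is only a minimal sketch on made-up data, and join_paragraph_records is a hypothetical name; it is not the code RudiC posted.

```shell
# Hypothetical helper: each blank-line-separated block becomes one output line.
join_paragraph_records() {
	awk 'BEGIN { RS = "" }	# paragraph mode: blank lines separate records
	{	gsub(/\n/, "")	# drop the <newline>s inside the record
		print
	}'
}

# Made-up input: two records separated by a blank line.
printf '%s\n' 'a|"b' 'c"|d' '' 'e|f' | join_paragraph_records
```

Without the blank line in the input, awk would see a single record and both records would come out joined on one line.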

Your sample output also differs by removing the blank lines between output records.

Please do not blame RudiC for providing you with bad code when the code he provided works perfectly with the description of what was to be done on the sample input you originally provided.

Note also that your output samples are not consistent with your input samples in either post #4 or post #6. You say you want double-quoted <newline>s and <vertical-bar>s to be removed, but that is only true part of the time. In the sample outputs that you have shown us, some of those characters are replaced by <space> characters instead of being removed. For an example, look at Tour World versus TourWorld in both of those posts. RudiC's code removes all of them, as requested in your problem statements, so it doesn't match (and can't match) the sample output you provided in either case. If some <newline> characters are to be replaced by <space> instead of being removed, you need to CLEARLY specify the logic that can be used to determine which action is to be taken. In one place in post #4, you also replace a single <space> with two adjacent <space> characters.
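The difference between the two treatments can be seen with a small sketch. The helper names remove_newlines and newlines_to_spaces are hypothetical, and the two-line input is just the Tour/World fragment from the samples; neither helper looks at double quotes, so this only shows the two behaviors, not a rule for choosing between them.

```shell
# Hypothetical helpers showing the two possible treatments of a <newline>.
remove_newlines() {
	awk '{ printf("%s", $0) } END { print "" }'
}
newlines_to_spaces() {
	awk '{ printf("%s%s", (NR > 1) ? " " : "", $0) } END { print "" }'
}

printf '%s\n' 'Tour' 'World' | remove_newlines      # prints TourWorld
printf '%s\n' 'Tour' 'World' | newlines_to_spaces   # prints Tour World
```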

Similarly, if there aren't any blank lines between records in your input file, you need to clearly specify how the end of a record is supposed to be identified. Please help us help you by clearly specifying what is supposed to happen and by providing sample inputs and outputs that match the behavior that you describe.
    #9  
Old Unix and Linux 1 Week Ago   -   Original Discussion by Jag_1981
Jag_1981 (Registered User)
Join Date: Jan 2018
Last Activity: 13 January 2018, 2:05 PM EST
Posts: 4
Thanks: 1
Thanked 0 Times in 0 Posts
Dear Don/RudiC,

My sincere thanks for being patient with me and for helping me with my need.

I now fully understand that by sharing incorrect or partial input/output files without paying full attention to them, I am wasting your valuable time.

I am attempting to summarize my need again with the details below.

1. My input file is a pipe (|) delimited CSV file.
2. It has multiple records, and the end of each record is identified by a newline character.
3. There are no blank lines between records (in either the input or the output file).
4. I want only double-quoted <newline>s and <vertical-bar>s to be removed (replaced by Null).
5. The double quotes themselves should also be removed (replaced by Null).

Sample Input File:



Code:
111|"IKJA - SPORTS"|00IIQ|Normal|100 Hall Road|
123|"ABCD RENT-A-
CAR XYZ LTD"|00N0H|Enterprise Lake|"
100 View Way"|
244|"DEFG Travel | Tour
World LTD"|"AK|0Q"|Praire Lake|"
105 NE Main St"|

Expected Output file:



Code:
111|IKJA - SPORTS|00IIQ|Normal|100 Hall Road|
123|ABCD RENT-A-CAR XYZ LTD|00N0H|Enterprise Lake|100 View Way|
244|DEFG Travel  TourWorld LTD|AK0Q|Praire Lake|105 NE Main St|

    #10  
Old Unix and Linux 1 Week Ago   -   Original Discussion by Jag_1981
Don Cragun (Forum Staff, Administrator)
With the sample input shown in post #9 stored in a file named file.csv, the following code:


Code:
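awk -F'\n' -v dq='"' '
# Append each input line to the pending record; the <newline> between lines is dropped.
{	record = record $0
	#printf("record in:\n%s\n", record)
}
# An odd field count means an even number of double quotes: the record is complete.
(n = split(record, f, dq)) % 2 {
	#printf("split into %d fields\n", n)
	# Even-numbered fields were inside double quotes; remove vertical bars from them.
	for(i = 2; i <= n; i += 2) {
		gsub(/[|]/, "", f[i])
		#printf("f[%d] updated to: \"%s\"\n", i, f[i])
	}
	# Print the fields without the double quotes that separated them.
	for(i = 1; i <= n; i++)
		printf("%s%s", f[i], (i == n) ? ORS : "")
	record = ""
}' file.csv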

produces the output requested in post #9:


Code:
111|IKJA - SPORTS|00IIQ|Normal|100 Hall Road|
123|ABCD RENT-A-CAR XYZ LTD|00N0H|Enterprise Lake|100 View Way|
244|DEFG Travel  TourWorld LTD|AK0Q|Praire Lake|105 NE Main St|

But note that it removes <newline> and <vertical-bar> characters found between pairs of <double-quote> characters; it does NOT replace them with <NUL> characters. Replacing those characters with <NUL> characters would give you a binary file instead of a text file.
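A quick way to see the difference, as a rough sketch using tr rather than the awk script above (the variable names removed and nul_replaced are made up for this illustration): deleting the <vertical-bar> shortens the data, while mapping it to a <NUL> byte keeps the byte count and puts a non-text byte into the output.

```shell
# Deleting a character shortens the data; mapping it to NUL keeps every byte.
removed=$(printf '%s' 'AK|0Q' | tr -d '|' | wc -c)
nul_replaced=$(printf '%s' 'AK|0Q' | tr '|' '\000' | wc -c)
echo "removed: $((removed)) bytes, NUL-replaced: $((nul_replaced)) bytes"
```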

If someone else wants to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk.
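If the same script has to run on several systems, one possible approach is to choose the awk by operating system name. This is only a sketch: the variable name AWK is an assumption, and it presumes that uname -s reports SunOS on the Solaris systems in question.

```shell
# Pick an awk by operating system name, following the advice above.
case "$(uname -s)" in
SunOS)	AWK=/usr/xpg4/bin/awk ;;
*)	AWK=awk ;;
esac

# Quick check that the chosen awk runs at all.
"$AWK" 'BEGIN { print "awk is usable" }'
```

The script in this post could then be started with "$AWK" in place of awk.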

If you uncomment the commented out printf() statements, you can get an inside view of how it accumulates records and removes unwanted <vertical-bar>s and <newline>s.
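The test that decides when a record is complete can also be watched on its own. The sketch below (count_quote_fields is a hypothetical name) accumulates lines the same way and reports, after each line, how many fields split() finds when <double-quote> is the separator. An odd count means the double quotes are balanced, so the record can be processed.

```shell
# Hypothetical helper: report the quote-delimited field count after each input line.
count_quote_fields() {
	awk -v dq='"' '
	{	record = record $0
		n = split(record, f, dq)
		printf("%d fields: %s\n", n, (n % 2) ? "complete" : "incomplete")
		if (n % 2) record = ""	# balanced quotes: start a new record
	}'
}

# The third record from post #9, which is spread over three lines.
printf '%s\n' '244|"DEFG Travel | Tour' 'World LTD"|"AK|0Q"|Praire Lake|"' \
	'105 NE Main St"|' | count_quote_fields
```

For this record it reports 2 and 6 fields (incomplete) for the first two lines and 7 fields (complete) once the third line has been appended.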