Post 302269525 by bbetteridge on Thursday 18th of December 2008 12:08:21 AM
Removing Embedded Newline from Delimited File

Hey there - a bit of background on what I'm trying to accomplish, first off. I am trying to load the data from a pipe-delimited file into a database. The loading tool that I use cannot handle embedded newline characters within a field, so I need to scrub them out.

Solutions that I have tried so far:

1) From a thread here in 2005:
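{
    record = record $0
    # gsub returns the number of double quotes in the accumulated record;
    # an odd count means a quoted field is still open.
    if (gsub(/"/, "&", record) % 2) {
        record = record " "    # replace the embedded newline with a space
        next                   # and keep reading
    }
}
{
    print record
    record = ""
}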

Problems - This worked beautifully on the test data. Then it was working just fine on the main data... until I received the following error:
"The result [...] of the gsub function cannot be longer than 3000 bytes."

Did I mention that the field with embedded newline characters is going to be loaded as a character large object into the database? Granted, it's only going to be about 6k at max, but that's still more than gsub can handle.

Another note - the test data didn't have any embedded double-quotes. I doubt that this would cause a problem, but in the interest of full disclosure, I should state it.
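
One possible way around the 3000-byte limit, as a sketch under assumptions rather than something tested against the real data: run gsub on each physical line only and carry the open/closed quote state from line to line, so gsub never sees the full accumulated record. The function name join_quoted_records and the variables line, n and inq are made up for illustration. The sketch assumes that every field containing a newline is wrapped in double quotes, that any embedded double quotes are doubled, and that no single physical line is itself longer than the awk limit.

```shell
#!/bin/sh
# join_quoted_records: hypothetical helper, not from the 2005 thread.
# Reads delimited text on stdin and joins physical lines that fall inside
# a double-quoted field, replacing each embedded newline with a space.
join_quoted_records() {
    awk '
    {
        line = $0
        # gsub runs on the current physical line only, so its result stays
        # short even when the assembled record grows to several kilobytes.
        n = gsub(/"/, "&", line)
        if (n % 2) inq = !inq     # odd count: the quote state flips here
        if (inq) {
            printf "%s ", $0      # still inside a quoted field: keep joining
        } else {
            print $0              # quotes balanced: the record ends here
        }
    }
    END { if (inq) print "" }     # terminate an unfinished final record
    '
}
```

It would be called as join_quoted_records < input.dat > output.dat, where both file names are placeholders. Because each line is written out as soon as it is read, the full record is never held in an awk variable.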

2) Monster regex:

sed -n '
H
g
s/\n//g
h
/^"\(\(""\)*[^"]*\)*"\(;"\(\(""\)*[^"]*\)*"\)*$/{p;s/.*//g;h;d;}
$p
' filename

Problem - this removes EVERY newline from the file, not just the in-line ones. Definitely can't use this to load the data. Plus, it takes a pretty substantial chunk of CPU to run through it.

Other issues with the data:
The "good" newlines at the end of each record are in the same format as the embedded newlines. The FTP client that they use must auto-apply dos2unix when it detects a "text" filetype.
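
Since the two kinds of newline cannot be told apart by their bytes, another direction (again only a sketch, and an assumption on my part) would be to count pipe delimiters and treat a record as complete once it has the expected number of fields. The function name join_by_field_count and the variables want, seps and pending are hypothetical. This assumes every record has the same known number of fields and that no field contains a literal pipe. It also cannot detect a newline embedded in the last field, because the delimiter count is already complete by then.

```shell
#!/bin/sh
# join_by_field_count: hypothetical helper; $1 is the number of fields
# expected in each complete record.
join_by_field_count() {
    awk -F'|' -v want="$1" '
    {
        # Delimiters on this physical line (an empty line has none).
        seps += (NF > 0 ? NF - 1 : 0)
        if (seps < want - 1) {
            printf "%s ", $0      # record incomplete: join with a space
            pending = 1
        } else {
            print $0              # enough delimiters seen: record ends here
            seps = 0
            pending = 0
        }
    }
    END { if (pending) print "" } # terminate an unfinished final record
    '
}
```

For a file with three fields per record this would be run as join_by_field_count 3 < input.dat > output.dat, with placeholder file names.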

Any help with this would be appreciated. If you need me to clear anything up, let me know.

-Brandon
 
