Shell Programming and Scripting: post 302507778 by ericdp63 on Thursday 24th of March 2011 06:38:33 PM
Delimited data contains line feeds where they shouldn't be

I have some data in which each record (line) ends with a line feed (\n) and each field is delimited by a pipe (|).
Code:
  1|short desc|long text|2001-01-01 01:01
  2|short desc| long
  text |2002-02-02 02:02
  3|short desc|  long  text  | 2003-03-03 03:03
  4|short desc
  |  long  text    | 2004-04-04 04:04

Note that records #2 and #4 contain an extra line feed inside the record, between the field delimiters. I know that awk can read multi-line records, but the examples I found are for very strictly structured multi-line data, such as addresses. In this case only a few rows out of a hundred thousand are bad. The data source somehow allows line feeds in some of the text columns, but for my purposes I don't want or need them.
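A quick way to see how many rows are affected is to count the fields on every physical line. The sketch below is a minimal Python illustration, not part of the original post; `find_short_lines` is a hypothetical helper, and it assumes exactly four pipe-delimited fields per record, as in the sample above.

```python
# Hypothetical helper: report physical lines whose field count is wrong.
# Assumes 4 pipe-delimited fields per record, as in the sample data.
EXPECTED_FIELDS = 4


def find_short_lines(lines, expected=EXPECTED_FIELDS):
    """Return (line_number, field_count) for each line with the wrong count."""
    bad = []
    for number, line in enumerate(lines, start=1):
        # n delimiters on a line means n + 1 fields
        count = line.rstrip("\n").count("|") + 1
        if count != expected:
            bad.append((number, count))
    return bad


sample = [
    "1|short desc|long text|2001-01-01 01:01\n",
    "2|short desc| long\n",
    "text |2002-02-02 02:02\n",
    "3|short desc|  long  text  | 2003-03-03 03:03\n",
    "4|short desc\n",
    "|  long  text    | 2004-04-04 04:04\n",
]

print(find_short_lines(sample))
# [(2, 3), (3, 2), (5, 2), (6, 3)]
```

Every fragment of a broken record shows up with fewer than four fields, so on a hundred thousand rows this yields a short list to inspect. In awk the same check is a comparison of NF against 4 with the field separator set to the pipe.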

I need to clean this up before I can load it into a database. The process I use for the load trims any leading and trailing spaces, so those are not an issue for this cleanup. Unfortunately, I can't get that process to recognize that some of the text columns might also contain a \n.
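Because the loader is said to trim leading and trailing spaces field by field, replacing a stray line feed with a space is harmless when the break falls at a field edge. A minimal sketch of that trimming, assuming the loader strips only spaces; the loader itself is not shown in the post, and `trim_fields` is a hypothetical stand-in:

```python
def trim_fields(record, delimiter="|"):
    """Split one record on the delimiter and strip leading/trailing spaces
    from each field, imitating (by assumption) what the loader does."""
    return [field.strip(" ") for field in record.rstrip("\n").split(delimiter)]


# Record #4 after its stray line feed has been replaced by a space:
print(trim_fields("4|short desc |  long  text    | 2004-04-04 04:04"))
# ['4', 'short desc', 'long  text', '2004-04-04 04:04']
```

Spaces inside a field (the double space in `long  text`) survive trimming, so a break in the middle of a field still leaves a visible space there rather than fusing two words.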

Any ideas? Is there a way to tell awk that each record has a fixed number of fields (x) and that it should keep reading until it has that many, ignoring any line feeds before the actual end of the record?
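The idea in the question, reading until a record has the expected number of fields, can be sketched as follows. This is a minimal Python version, not the awk answer from the thread; `rejoin_records` is a hypothetical name, and it assumes the pipe never appears inside a field.

```python
def rejoin_records(lines, n_fields=4, delimiter="|", joiner=" "):
    """Yield one string per logical record.

    Physical lines are appended to a buffer until it holds n_fields - 1
    delimiters; each stray line feed inside a record becomes `joiner`.
    Assumes the delimiter never occurs inside a field.
    """
    buffer = None
    for line in lines:
        line = line.rstrip("\n")
        # Start a new record, or continue the one still being assembled.
        buffer = line if buffer is None else buffer + joiner + line
        if buffer.count(delimiter) >= n_fields - 1:
            yield buffer
            buffer = None
    if buffer is not None:
        # A trailing fragment that never reached n_fields is still emitted,
        # so bad input stays visible instead of being silently dropped.
        yield buffer


sample = [
    "1|short desc|long text|2001-01-01 01:01\n",
    "2|short desc| long\n",
    "text |2002-02-02 02:02\n",
    "3|short desc|  long  text  | 2003-03-03 03:03\n",
    "4|short desc\n",
    "|  long  text    | 2004-04-04 04:04\n",
]

for record in rejoin_records(sample):
    print(record)
# 1|short desc|long text|2001-01-01 01:01
# 2|short desc| long text |2002-02-02 02:02
# 3|short desc|  long  text  | 2003-03-03 03:03
# 4|short desc |  long  text    | 2004-04-04 04:04
```

Counting delimiters across the accumulated buffer, rather than judging each line on its own, is what lets a record be reassembled wherever it was broken (#2 breaks inside the third field, #4 right before a delimiter). Joining with a space keeps words such as `long` and `text` apart, and any extra space at a field edge is removed by the loader's trimming. To clean a real file, pass the open file object as `lines` and write each yielded record followed by a newline. The same logic fits awk: keep appending $0 to a buffer and print the buffer once it splits into four fields.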

Thanks
Eric
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

carriage return/line feeds

Hello, I have a file that has carriage returns in it and I want to take them out. Does anyone know how I can do this in ksh? Thanks (4 Replies)
Discussion started by: pitstop

2. Shell Programming and Scripting

Remove line feeds

Hi, I have a fixed-width flat file which has 1 as the first character and E as the last character. Some of the records contain carriage returns/line feeds. How do I remove them? Let me know. Thanks, VSK (8 Replies)
Discussion started by: vsk

3. Shell Programming and Scripting

line feeds in csv

Hi all, I have a CSV file with three comma-separated columns. Input file: First_Name, Address, Last_Name XXX, "456 New albany \n newyork, Unitedstates \n 45322-33", YYY\n ZZZ, "654 rifle park \n toronto, canada \n 43L-w3b", RRR\n Is there any way I can remove \n (newline) from... (10 Replies)
Discussion started by: gowrish

4. Shell Programming and Scripting

Spurious line feeds

Hi all, I know this is **awfully** general but..... I have a script which does, basically... for file in `find command`; do some stuff more stuff echo '.\c' done I want to output the '.' char just to give an idea of progress. However, it works fine for a while and then I... (2 Replies)
Discussion started by: ajcannon

5. Shell Programming and Scripting

Suppressing carriage returns/line feeds

Hi gurus, I am stripping lots of email addresses from a file with this: grep "^To" file.log |awk '{print "1,"$2}' > recipients.out. file.log looks something like this: oasndfoasnosf To: person@email.co.uk lsdfjosd sdlfnmsopdfwer dtlghodrgn To: person2@emailsss.com sldfnsdf I... (5 Replies)
Discussion started by: terry2009

6. UNIX for Dummies Questions & Answers

.properties file and new line feeds

Hi, I have a .properties file whose values I read in a .sh file, but every time I put it out on the server it fails. If I copy and paste the values from the .properties file on my local machine into the .properties file on the server, it works just fine. Someone mentioned to see if it has dos... (3 Replies)
Discussion started by: vsekvsek

7. Shell Programming and Scripting

remove line feeds followed by character

Hi everyone, I'm very new to using sed and have run through some tutorials, but I've hit a problem that I'm unable to solve by myself. I need to remove all line feeds that are followed by a particular character (in this case a semicolon). So basically, all lines starting with a semicolon... (5 Replies)
Discussion started by: fluffdasheep

8. Shell Programming and Scripting

useless line feeds in ldapsearch output. Howto remove with shell script?

Hi $ cat ad.sh ldapsearch -x -LLL -h sb1131z.testbadbigcorp.org -D "CN=ADMINZZ,OU=AdminRoles,DC=testbadbigcorp,DC=org" -w "UT3w4f57lll--4...4" -b "OU=Test,DC=testbadbigcorp,DC=org" "(&(&(&(&(objectCategory=person)(objectClass=user)(lockoutTime:1.2.840.113556.1.4.804:=4294967295)))))" dn$... (3 Replies)
Discussion started by: slashdotweenie

9. Shell Programming and Scripting

Removing carriage return/line feeds on multiple lines

I would like to remove carriage returns/line feeds in a text file, but in a specific cadence: Read first line (Header Line 1), remove cr/lf at the end (replace it with a space ideally); Read the next line (Line of Text 2), leave the cr/lf intact; Read the next line, remove the cr/lf; Read... (14 Replies)
Discussion started by: tomr2012

10. Shell Programming and Scripting

How to remove new line characters from data rows in a Pipe delimited file?

I have a file as below: Emp1|FirstName|MiddleName|LastName|Address|Pincode|PhoneNumber 1234|FirstName1|MiddleName2|LastName3| Add1 || ADD2|123|000000000 2345|FirstName2|MiddleName3|LastName4| Add1 || ADD2| 234|000000000 OUTPUT : ... (1 Reply)
Discussion started by: styris
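Several of the threads listed above ask for small variations on the same line-joining job. The sketch below collects minimal Python versions of a few of them. The function names are hypothetical, each comment names the thread by its number in the list, and the assumptions about each input format (stated in the comments) are guesses from the short excerpts, not details confirmed by those threads.

```python
import csv
import io


# Thread 1: remove every carriage return from the data.
def strip_carriage_returns(data: bytes) -> bytes:
    return data.replace(b"\r", b"")


# Thread 2: fixed-width records that end with "E". Glue physical lines until
# the accumulated record ends with the marker. Assumes embedded breaks are
# simply dropped and that no fragment happens to end in "E" by accident.
def rejoin_on_end_marker(lines, end_marker="E"):
    buffer = ""
    for line in lines:
        buffer += line.rstrip("\r\n")
        if buffer.endswith(end_marker):
            yield buffer
            buffer = ""
    if buffer:
        yield buffer  # incomplete trailing record, kept rather than lost


# Thread 3: CSV whose quoted fields span several lines. The csv module
# already reads such a field as one value; only the embedded breaks are
# replaced with a space before the rows are written back out.
def flatten_quoted_newlines(text: str) -> str:
    reader = csv.reader(io.StringIO(text, newline=""), skipinitialspace=True)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        writer.writerow([f.replace("\r\n", " ").replace("\n", " ") for f in row])
    return out.getvalue()


# Thread 7: remove each line feed that is immediately followed by a
# semicolon. Assumes LF (not CRLF) line endings.
def join_semicolon_lines(text: str) -> str:
    return text.replace("\n;", ";")


# Thread 8: assuming the unwanted breaks in ldapsearch output are LDIF line
# folding (RFC 2849), a line that starts with one space continues the
# previous line, so the break and that single space are removed.
def unfold_ldif(text: str) -> str:
    return text.replace("\n ", "")


# Thread 9: join line 1 with line 2, line 3 with line 4, and so on. The break
# after each odd-numbered line becomes `sep`; the one after each even line stays.
def join_pairs(lines, sep=" "):
    stripped = [line.rstrip("\r\n") for line in lines]
    return [sep.join(stripped[i:i + 2]) for i in range(0, len(stripped), 2)]


print(list(rejoin_on_end_marker(["1abcE\n", "1de\r\n", "fgE\n"])))
# ['1abcE', '1defgE']
print(join_pairs(["Header 1\r\n", "Text 2\r\n", "Header 3\r\n"]))
# ['Header 1 Text 2', 'Header 3']
```

Note that the CSV version normalizes the spaces after commas as a side effect, because `skipinitialspace=True` is needed for the quoted field after `, ` to be recognized as quoted. For thread 1, the standard tr utility with its -d option is the usual shell-level way to delete carriage returns.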
Text::Wrap(3pm) 					 Perl Programmers Reference Guide					   Text::Wrap(3pm)

NAME
Text::Wrap - line wrapping to form simple paragraphs
SYNOPSIS
Example 1
    use Text::Wrap;

    $initial_tab = "\t";    # Tab before first line
    $subsequent_tab = "";   # All other lines flush left

    print wrap($initial_tab, $subsequent_tab, @text);
    print fill($initial_tab, $subsequent_tab, @text);

    @lines = wrap($initial_tab, $subsequent_tab, @text);

    @paragraphs = fill($initial_tab, $subsequent_tab, @text);
Example 2
    use Text::Wrap qw(wrap $columns $huge);

    $columns = 132;         # Wrap at 132 characters
    $huge = 'die';
    $huge = 'wrap';
    $huge = 'overflow';
Example 3
    use Text::Wrap;

    $Text::Wrap::columns = 72;
    print wrap('', '', @text);
DESCRIPTION
"Text::Wrap::wrap()" is a very simple paragraph formatter. It formats a single paragraph at a time by breaking lines at word boundaries. Indentation is controlled for the first line ($initial_tab) and all subsequent lines ($subsequent_tab) independently. Please note: $initial_tab and $subsequent_tab are the literal strings that will be used: it is unlikely you would want to pass in a number.
Text::Wrap::fill() is a simple multi-paragraph formatter. It formats each paragraph separately and then joins them together when it's done. It will destroy any whitespace in the original text. It breaks text into paragraphs by looking for whitespace after a newline. In other respects it acts like wrap().
OVERRIDES
"Text::Wrap::wrap()" has a number of variables that control its behavior. Because other modules might be using "Text::Wrap::wrap()", it is suggested that you leave these variables alone! If you can't do that, then use "local($Text::Wrap::VARIABLE) = YOURVALUE" when you change the values so that the original value is restored. This "local()" trick will not work if you import the variable into your own namespace.
Lines are wrapped at $Text::Wrap::columns columns. $Text::Wrap::columns should be set to the full width of your output device. In fact, every resulting line will have length of no more than "$columns - 1".
It is possible to control which characters terminate words by modifying $Text::Wrap::break. Set this to a string such as '[\s:]' (to break before spaces or colons) or a pre-compiled regexp such as "qr/[\s']/" (to break before spaces or apostrophes). The default is simply '\s'; that is, words are terminated by spaces. (This means, among other things, that trailing punctuation such as full stops or commas stays with the word it is "attached" to.)
Beginner note: In example 2 above, $columns is imported into the local namespace and set locally. In example 3, $Text::Wrap::columns is set in its own namespace without importing it.
"Text::Wrap::wrap()" starts its work by expanding all the tabs in its input into spaces. The last thing it does is to turn spaces back into tabs. If you do not want tabs in your results, set $Text::Wrap::unexpand to a false value. Likewise, if you do not want to use 8-character tabstops, set $Text::Wrap::tabstop to the number of characters you do want for your tabstops.
If you want to separate your lines with something other than "\n", then set $Text::Wrap::separator to your preference.
When words that are longer than $columns are encountered, they are broken up. "wrap()" adds a "\n" at column $columns. This behavior can be overridden by setting $huge to 'die' or to 'overflow'. When set to 'die', large words will cause "die()" to be called. When set to 'overflow', large words will be left intact.
Historical notes: 'die' used to be the default value of $huge. Now, 'wrap' is the default value.
EXAMPLE
    print wrap("\t","","This is a bit of text that forms a normal book-style paragraph");
AUTHOR
David Muir Sharnoff <muir@idiom.com> with help from Tim Pierce and many, many others.
perl v5.8.0 2002-06-01 Text::Wrap(3pm)
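Readers doing the same job in Python can approximate the behavior described on this page with the standard `textwrap` module. The sketch below is an approximation, not a port of Text::Wrap: `fill_like` and `wrap_with_huge` are hypothetical names, the mapping of the three $huge policies onto textwrap's `break_long_words` option is our own, and joining paragraphs with a blank line is a choice the page does not specify.

```python
import re
import textwrap


def fill_like(initial_tab, subsequent_tab, text, columns=72):
    """Rough analogue of the fill() behaviour described above.

    Paragraphs start wherever a newline is followed by whitespace; all
    whitespace inside a paragraph is collapsed to single spaces; each
    paragraph is wrapped on its own. Lines are limited to columns - 1
    characters to mirror the documented "$columns - 1" maximum.
    """
    paragraphs = re.split(r"\n\s+", text)
    wrapped = [
        textwrap.fill(
            re.sub(r"\s+", " ", paragraph),
            width=columns - 1,
            initial_indent=initial_tab,
            subsequent_indent=subsequent_tab,
        )
        for paragraph in paragraphs
    ]
    return "\n\n".join(wrapped)  # separator chosen for this sketch


def wrap_with_huge(text, columns=72, huge="wrap"):
    """Mimic the three documented $huge policies with textwrap.

    'wrap'     -> break over-long words across lines
    'overflow' -> leave over-long words intact on their own line
    'die'      -> raise an error when a word does not fit
    """
    width = columns - 1
    if huge not in ("wrap", "overflow", "die"):
        raise ValueError(f"unknown huge policy: {huge!r}")
    if huge == "die" and any(len(word) > width for word in text.split()):
        raise ValueError("word longer than the line width")
    return textwrap.wrap(text, width=width, break_long_words=(huge == "wrap"))


print(fill_like("  ", "", "first paragraph has   some words\n  second one here", columns=20))
print(wrap_with_huge("abcdefghij", columns=5))
# ['abcd', 'efgh', 'ij']
```

One difference worth keeping in mind: textwrap counts a "\t" in `initial_indent` as a single character, whereas Text::Wrap expands tabs before measuring, so pass spaces as the indent when exact column counts matter.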
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.