Delimited records split across multiple lines


 
# 1  
Old 05-05-2016

Hi

I am using a delimited sequence file. The delimiter is the pipe character (|). For some records, the value in one of the columns is split across several lines, as shown below:

Code:
"113"|"0155"|"2016-04-27 07:59:04"|"1930"|"TEST@TEST"|"2016-04-27 11:04:04.357000000"|"BO"|"Hard BO"|"10"|"5.1.0 e This is a permanent error. Please verify the address(es) and try again.
<TEST>:
123.123.12 does not like recipient.
Remote host said: 123 Address rejected test@test
Giving up on 1200.
--- "|"I"|"191"|"212"|"DAM"|"PIl"

But I expect each record to be on a single line before the file is processed:

Code:
"113"|"0155"|"2016-04-27 07:59:04"|"1930"|"TEST@TEST"|"2016-04-27 11:04:04.357000000"|"BO"|"Hard BO"|"10"|"5.1.0 e This is a permanent error. Please verify the address(es) and try again.<TEST>:123.123.12 does not like recipient.Remote host said: 123 Address rejected test@testGiving up on 1200.--- "|"I"|"191"|"212"|"DAM"|"PIl"

Please help me with this.
# 2  
Old 05-05-2016
How are those files produced?
# 3  
Old 05-05-2016
It's a file that we receive from a different source, so I'm not sure how it is produced. The problem usually occurs in the data of the 10th column.
# 4  
Old 05-07-2016
Could someone please help with this?
# 5  
Old 05-07-2016
Not much to go on. You have spurious line feeds in the middle of the line. I am guessing.
Assumption: if FS = "|" and the first column is a quoted number, this is the correct start of the line. Anything else is bad.

Is this correct? And is the file from a Windows application, or was it changed by anything like Windows FTP? I am asking because the carriage control may be messed up as well.
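
If that assumption holds, the repair could be sketched roughly as follows. The function name join_by_record_start is made up for illustration. The sketch assumes that every real record begins with a quoted number followed by a pipe, and that the pieces are rejoined with nothing in between, as in the expected output in post #1. It would misfire if a continuation line itself happened to start with a quoted number and a pipe.

```shell
# Join physical lines into records, assuming (hypothetically) that every
# real record starts with a quoted number such as "113"|...
join_by_record_start() {
  awk '
    { sub(/\r$/, "") }            # drop a trailing CR left by DOS/Windows tools
    /^"[0-9]+"\|/ {               # this line starts a new record
      if (have) print rec         # flush the record collected so far
      rec = $0; have = 1
      next
    }
    { rec = rec $0; have = 1 }    # continuation line: append without a separator
    END { if (have) print rec }   # flush the last record
  ' "$@"
}
```

It reads the named files, or standard input when no file is given, for example: join_by_record_start file > file.fixed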
# 6  
Old 05-07-2016
If you provide a clear description of your problem (instead of just one sample of a problem with no description), you will be more likely to get a response.

What are the line delimiters in your input file? UNIX single newline character delimiters or DOS carriage-return/newline character pair delimiters? Are the delimiters on split lines the same as in a correctly formatted line? Or, is the delimiter on split lines different from the delimiter on correctly formatted lines? What delimiters do you need in your output file?

How are we supposed to know when a line is complete?

Is your verification program supposed to know that there should be a specific number of fields in each input line? Is that number of fields the same for every file your verification program will process?

Is an invalid input line ALWAYS split between a pair of double quotes? Can we assume that a line needs to be combined with the next line from the input file if and only if an input line does not end with a double quote character?

Are invalid records always split by converting a space to a newline? How do we know whether or not a character needs to be added when lines are joined? Is a space character ALWAYS supposed to be added when lines are joined?

What operating system and shell are you using?

What have you tried to solve this problem on your own?
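
Some of these questions can be answered by inspecting the file itself. The two helpers below are only a sketch, and their names (field_count_histogram, count_crlf_lines) are invented for this example. The first counts how many physical lines have each number of pipe-separated fields; the second counts lines that end in a carriage return. Note that the field count is naive: it also counts any pipe that appears inside a quoted value.

```shell
# How many physical lines have 1, 2, 3, ... pipe-separated fields?
# Output: one "fieldcount linecount" pair per line, sorted by field count.
field_count_histogram() {
  awk -F'|' '{ n[NF]++ } END { for (k in n) print k, n[k] }' "$@" | sort -n
}

# How many lines end in a carriage return (DOS-style CR/LF line endings)?
count_crlf_lines() {
  awk '/\r$/ { c++ } END { print c + 0 }' "$@"
}
```

If nearly all lines show the same field count and a few show smaller counts, the smaller ones are probably the fragments of split records.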
# 7  
Old 05-08-2016
The standard way of tackling this would be something like this:
Code:
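awk -F'|' '{while (NF<15 && (getline s)>0) $0=$0 s} 1' file

This should work with your sample. It keeps appending the next line until the record has 15 fields, so records that follow a split record are not swallowed.

However, as others have pointed out, unless you provide more information it will be difficult to tell whether this would be a solution to your problem.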


---
Note that this will not work if the newlines appear in the very last field.

Also note that the CSV format allows newlines within quoted fields, so the sample you posted seems to be within specification. By joining the lines you are effectively changing the content by removing the newlines.

You do not need to remove the newlines in order to process the file. For example, to print field 10 (without the enclosing double quotes), you could do something like this:
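
One way around the last-field limitation is to ignore the field count and watch the double quotes instead. This is only a sketch under stated assumptions: the function name join_by_quote_parity is made up, every field is assumed to be enclosed in double quotes, and any quote inside a value is assumed to be doubled, so that a complete record always contains an even number of quotes.

```shell
# Join physical lines until the accumulated record contains an even number
# of double quotes, i.e. until no quoted field is left open.
join_by_quote_parity() {
  awk '
    {
      line = $0
      rec = rec line                 # append this line without a separator
      q += gsub(/"/, "&", line)      # gsub returns the number of quotes found
      if (q % 2 == 0) {              # all quoted fields are closed
        print rec
        rec = ""; q = 0
      }
    }
    END { if (rec != "") print rec } # flush an unterminated last record
  ' "$@"
}
```

Because it does not depend on the number of fields, it also joins a record whose final field contains the newline.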
Code:
$> awk -F'|' '{while (NF<15 && (getline s)>0) $0=$0 RS s; gsub(/^"|"$/,"",$10); print $10}' file
5.1.0 e This is a permanent error. Please verify the address(es) and try again.
<TEST>:
123.123.12 does not like recipient.
Remote host said: 123 Address rejected test@test
Giving up on 1200.
--- 
$>


Last edited by Scrutinizer; 05-08-2016 at 08:35 AM..