Delimited records split into different lines


 
# 8  
05-08-2016
The above works well for the single-record sample in post #1, as indicated. If you want to apply it to multiple records, try a small adaptation (still far from bulletproof):
Code:
awk -F\| '{while (NF < 15 && getline s>0) $0=$0 RS s}1'  file

And, as stated before in this thread, a more precise and detailed specification would help to tailor a solution for you.
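To see what the multi-record loop does, it helps to feed it a small sample. The data below is made up and uses three fields per record instead of fifteen so the output stays short; the field count is passed in through a hypothetical -v n= variable, and the function name count_fields is illustrative only. Because RS (a newline by default) is put back between the joined pieces, a repaired record still prints across two lines, but awk now counts its fields correctly:

```shell
#!/bin/sh
# Illustrative helper (not from the thread): count_fields N
# Joins physical lines until each record has N |-separated fields,
# keeps the original line break (RS) inside the joined record, and
# prefixes each record with the field count awk sees.
count_fields() {
    awk -F'|' -v n="$1" '{
        while (NF < n && (getline s) > 0)
            $0 = $0 RS s
        print NF ": " $0
    }'
}

# Hypothetical sample: the second record is broken inside its second field.
printf '%s\n' '"a"|"b"|"c"' '"d"|"e1' 'e2"|"f"' '"g"|"h"|"i"' | count_fields 3
```

Every record reports 3 fields, and the second one is printed as "d"|"e1 followed by e2"|"f" on the next line.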
This User Gave Thanks to RudiC For This Post:
# 9  
05-08-2016
Yes indeed, that is just silly. Thanks, RudiC. That is, of course, how it should be (but then RS should not be there):

Code:
awk -F\| '{while (NF < 15 && getline s>0) $0=$0 s}1' file
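Wrapped in a small shell function, the corrected one-liner can be reused for any field count. The name join_records and its arguments are illustrative assumptions, and the sketch assumes every record has exactly N fields and that no field contains a literal |. Each repaired record comes out on a single line:

```shell
#!/bin/sh
# Illustrative helper (not from the thread): join_records N [FILE...]
# Glues physical lines together until each record has N |-separated
# fields, then prints the record on one line.
join_records() {
    n=$1
    shift
    awk -F'|' -v n="$n" '{
        # getline returns 1 on success, 0 at end of input, -1 on error.
        while (NF < n && (getline s) > 0)
            $0 = $0 s
        print
    }' "$@"
}

# Hypothetical sample: the second record is broken inside its second field.
printf '%s\n' '"a"|"b"|"c"' '"d"|"e1' 'e2"|"f"' '"g"|"h"|"i"' | join_records 3
```

Nothing is inserted at the join point, so e1 and e2 run together as "e1e2"; put a space in front of s if a separator is wanted.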

Or, for the second example, with RS kept:

Code:
awk -F\| '{while(NF<15 && getline s>0) $0=$0 RS s}{gsub(/^"|"$/,x,$10); print $10}' file
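Here is the same idea on a hypothetical three-field sample, extracting field 2 instead of field 10. The gsub() call strips the surrounding double quotes; in the one-liner above, the unset variable x simply acts as an empty string, which the sketch writes out as "". Because RS is kept, a value that was split across lines is printed across those same lines. The function name extract_field is illustrative:

```shell
#!/bin/sh
# Illustrative helper (not from the thread): extract_field N F
# Rebuilds records of N |-separated fields (keeping the line break),
# then prints field F without its surrounding double quotes.
extract_field() {
    awk -F'|' -v n="$1" -v f="$2" '{
        while (NF < n && (getline s) > 0)
            $0 = $0 RS s
        gsub(/^"|"$/, "", $f)   # drop the leading and trailing quote
        print $f
    }'
}

# Hypothetical sample: the second record is broken inside field 2.
printf '%s\n' '"a"|"b"|"c"' '"d"|"e1' 'e2"|"f"' | extract_field 3 2
```

This prints b, then e1 and e2 on two separate lines.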


Last edited by Scrutinizer; 05-08-2016 at 08:36 AM.
# 10  
05-08-2016
If the "records" are defined not only by a fixed number of fields but also by a fixed length (each record has the same number of characters), and, of course, if the line breaks are real UNIX linefeeds rather than, say, DOS CR/LF pairs, then this is the standard textbook application for the fmt utility, no?

Code:
fmt -<number of characters the record is supposed to have> /path/to/file

I hope this helps.

bakunin
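A hedged caveat on the fmt idea: fmt fills lines at word boundaries and joins lines with a space, so it only reproduces exact record lengths when whitespace happens to fall at the right spots. If the records really are fixed-length and contain no legitimate newlines, a character-exact variant swaps fmt for fold: delete every newline, then cut the stream back into fixed-width lines. The function name refold_fixed and the width of 5 in the demo are illustrative assumptions:

```shell
#!/bin/sh
# Illustrative helper (not from the thread): refold_fixed WIDTH [FILE...]
# Treats the input as a stream of fixed-length records, drops every
# line break, and cuts the stream back into WIDTH-character lines.
refold_fixed() {
    width=$1
    shift
    # tr removes all newlines (real and spurious); echo restores a final
    # newline so that fold emits a complete last line.
    { cat "$@" | tr -d '\n'; echo; } | fold -w "$width"
}

# Hypothetical sample: three 5-character records broken at odd places.
printf '%s\n' 'AAAAA' 'BB' 'BBBCC' 'CCC' | refold_fixed 5
```

The output is AAAAA, BBBBB and CCCCC on three lines. Note that fold counts columns; use fold -b if the length is meant in bytes.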
# 11  
05-08-2016
Hi All

Sorry for not providing the correct information. Please find the details of my input file below. I receive an encrypted file, which I decrypt and then convert with dos2unix. Once that is done, I load it using my job.

File Details
-------------
1. The line delimiter is \n.
2. The field (column) delimiter is |.
3. The line delimiter on the split lines is the same as on a correctly formatted line.
4. In my output file I need the same line delimiter as in the input file, which is \n.
5. The number of fields is the same for all records (15), and all column values are enclosed in double quotes.
6. Is an invalid input line ALWAYS split between a pair of double quotes? Can we assume that a line needs to be combined with the next line from the input file if and only if the input line does not end with a double quote character? -- We can't do it like that, as all the records come with double quotes.
7. Are invalid records always split by converting a space to a newline? How do we know whether or not a character needs to be added when lines are joined? Is a space character ALWAYS supposed to be added when lines are joined? -- Spaces do not matter; I just want the split lines appended into a single line.

I hope this information helps. I am pretty new to this process, so please let me know if you need more details.
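Given the specification above, a repaired file could be sanity-checked before it is loaded. This is an assumption-laden sketch rather than part of any answer in the thread: the name check_records is illustrative, and it assumes no | ever appears inside a quoted value. It reports every line that does not have exactly N fields or has a field not wrapped in double quotes, and exits with status 1 if it found any:

```shell
#!/bin/sh
# Illustrative helper (not from the thread): check_records N [FILE...]
# Reports lines without exactly N |-separated fields, or with a field
# not enclosed in double quotes. Exit status is 1 if any line is bad.
check_records() {
    n=$1
    shift
    awk -F'|' -v n="$n" '
    {
        if (NF != n) {
            printf "line %d: %d fields\n", NR, NF
            bad = 1
            next
        }
        for (i = 1; i <= NF; i++)
            if ($i !~ /^".*"$/) {
                printf "line %d: field %d not quoted\n", NR, i
                bad = 1
                break
            }
    }
    END { exit bad }' "$@"
}

# Hypothetical sample with 3 fields: line 2 is short, line 3 has a bare field.
printf '%s\n' '"a"|"b"|"c"' '"d"|"e1' '"g"|h|"i"' | check_records 3 || echo "problems found"
```

For the repaired file from this thread, it would be run as check_records 15 on the output file.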
# 12  
05-08-2016
If the broken records always break inside a double-quoted string, as in your sample input, an easy fix is:
Code:
awk '{printf("%s%s", $0, (substr($0, length, 1) == "\"") ? "\n" : "")}' file
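On a made-up three-field sample, the quote test works like this: any line that does not end in a double quote is printed without its newline, so it is glued to the line that follows. The function name join_on_quote is illustrative. The sketch assumes, as stated, that breaks only occur inside a quoted value and that the file ends with a complete record; otherwise the final newline is never printed, and an empty line would also be swallowed:

```shell
#!/bin/sh
# Illustrative helper (not from the thread): join_on_quote [FILE...]
# Prints each line, adding the newline back only when the line ends
# with a double quote (i.e. the record looks complete).
join_on_quote() {
    awk '{
        printf("%s%s", $0, (substr($0, length, 1) == "\"") ? "\n" : "")
    }' "$@"
}

# Hypothetical sample: the second record is broken inside its second field.
printf '%s\n' '"a"|"b"|"c"' '"d"|"e1' 'e2"|"f"' | join_on_quote
```

This prints "a"|"b"|"c" and "d"|"e1e2"|"f" on two lines.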

If the broken records sometimes break immediately before or after a pipe symbol (as long as it isn't after the 14th pipe symbol), you can use most of the other suggestions in this thread.

You didn't answer my question about your operating system and shell. So, if you're using a Solaris/SunOS operating system, you'll need to change the above suggestion to use /usr/xpg4/bin/awk or nawk instead of awk.
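The quote test and the field-count test fail in different places. A break right after a closing quote (just before a pipe) defeats the quote test, while a break right after the 14th pipe defeats the field-count test, because NF is already 15 there (the last field is simply empty). One hedged way to cover both, assuming every record has exactly N fields and every field is quoted, is to keep joining while either test says the record is incomplete. This combination is not from the thread; the name join_complete is illustrative, the demo uses 3 fields, and on Solaris the same awk substitution applies:

```shell
#!/bin/sh
# Illustrative helper (not from the thread): join_complete N [FILE...]
# Joins lines until the record has at least N |-separated fields AND
# ends with a double quote, then prints it on one line.
join_complete() {
    n=$1
    shift
    awk -F'|' -v n="$n" '{
        while ((NF < n || $0 !~ /"$/) && (getline s) > 0)
            $0 = $0 s
        print
    }' "$@"
}

# Hypothetical sample: record 2 breaks just before a pipe,
# record 3 breaks just after the last pipe.
printf '%s\n' '"a"|"b"|"c"' '"d"' '|"e"|"f"' '"g"|"h"|' '"i"' | join_complete 3
```

All three records come out whole: "a"|"b"|"c", "d"|"e"|"f" and "g"|"h"|"i".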
# 13  
05-08-2016
Save the script below as cleaner.pl and run it as:
perl cleaner.pl ginrkf.file > ginrkf.filtered
Code:
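#!/usr/bin/env perl

use strict;
use warnings;

my $len = 15;    # number of |-separated fields in every record
{
    # Read the input in chunks that each end with a '|' separator.
    local $/ = '|';
    my $field = 0;
    while (<>) {
        # Count the fields seen so far in the current record.
        /\|/ and ++$field;
        if (eof) {
            # Last chunk of the file: join any split lines, end with one newline.
            s/\n//g;
            $_ .= "\n";
        }
        elsif ($field != $len) {
            # Still inside a record: drop the newlines that split it.
            s/\n//g;
        }
        # (A chunk that completes a record keeps its newlines, so a split
        # inside the last field or the next record's first field is not fixed.)
        print;
        # That chunk also holds the first field of the next record.
        $field = 1 if $field == $len;
    }
}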


Last edited by Aia; 05-08-2016 at 11:32 PM.
# 14  
05-10-2016
Thanks a lot for all the help. The code below works for my issue:

Code:
awk '{printf("%s%s", $0, (substr($0, length, 1) == "\"") ? "\n" : "")}' file



10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

Need help on an old post - How to convert a comma delimited string to records or lines of text?

Hi, Apologies in advance to the moderator if I am posting this the wrong way. I've searched and found the solution to an old post but as it is a very old post, I don't see an option to update it with additional question. The question I have is in relation to the following post: How to... (6 Replies)
Discussion started by: newbie_01
6 Replies

2. UNIX for Dummies Questions & Answers

How to convert a comma delimited string to records or lines of text?

Hi, I am not sure if I've posted this question before. Anyway, I previously asked about converting lines of text into a comma delimited string. Now I am needing to do the other way around ... :( :o Can anyone advise how is this possible? Example as below: Converting records/lines to... (2 Replies)
Discussion started by: newbie_01
2 Replies

3. Shell Programming and Scripting

Script to match strings that sometimes are splitted in 2 lines

Hello to all, I have an hexdump -C format as below: 31 54 47 55 48 4c 52 31 5f 52 31 32 31 31 32 ff 44 00 00 0E 01 32 14 56 42 17 47 48 0f ff ff ff 44 00 00 01 32 14 56 00 23 83 95 2f 42 17 47 48 00 0f ff ff 00 15 00 0a 48 00 01 5a 00 02 17 00 00 2f 00 00 30 00 00 31 00 00 ff 34 ff 44 00... (23 Replies)
Discussion started by: Ophiuchus
23 Replies

4. UNIX for Advanced & Expert Users

Wanted best way to validate delimited file records

Actually, I posted about this issue before, but many folks misunderstood my question. We are checking the delimited file records for validation. The delimited file will have data like this: Aaaa|sdfhxfgh|sdgjhxfgjh|sdgjsdg|sgdjsg| Aaaa|sdfhxfgh|sdgjhxfgjh|sdgjsdg|sgdjsg|... (3 Replies)
Discussion started by: Seshendranath
3 Replies

5. UNIX for Dummies Questions & Answers

Removing empty lines at the end of a Tab-delimited file

I'm trying to remove all of the empty lines at the end of a tab-delimited file. They have no data, just tabs. I've tried many things; here are a couple: sed /^\t.\t/d File1 > File2 sed /^\t{44}/d File1 > File2 What am I missing? (9 Replies)
Discussion started by: SirHenry1
9 Replies

6. Shell Programming and Scripting

Create new lines using a delimited string.

Hi, I have a text file called 'fileA' which contains the following line examples: 01:rec1:25,50,75,100 02:rec2:30,60 03:rec3:20,40 I would like to create a new file where each of the comma-separated values appears on a new line but prefixed with the first two fields, e.g. 01:rec1:25... (3 Replies)
Discussion started by: mackmb
3 Replies

7. Shell Programming and Scripting

Print records which do not have expected number of fields in a comma delimited file

Hi, I have a comma (,) delimited file, in which a few fields are enclosed within double quotes " ". I have to print the records in the file which do not have the expected number of fields, with the line number. File1 ==== name,desgnation,doj,project #header#... (7 Replies)
Discussion started by: machomaddy
7 Replies

8. Shell Programming and Scripting

how to Insert values in multiple lines(records) within a pipe delimited text file in specific cols

This is Korn shell on UNIX. The scenario is: I have a pipe-delimited text file which needs to be customized. Say, for example, I have a pipe-delimited text file with 15 columns (| delimited) and 200 rows. Currently the 11th and 12th columns have null values for all the records (there are other null columns... (4 Replies)
Discussion started by: vasan2815
4 Replies

9. UNIX for Dummies Questions & Answers

Extract records by column value - file non-delimited

The data in my file has no delimiters. It looks like this: H52082320024740010PH333200612290000930 0.0020080131 D5208232002474000120070306200703060580T1502 TT 1.00 H52082320029180003PH333200702150001 30 100.0020080205 D5208232002918000120070726200707260580T1502 ... (3 Replies)
Discussion started by: jclanc8
3 Replies

10. Shell Programming and Scripting

Delete Duplicate records from a tilde delimited file

Hi All, I want to delete duplicate records from a tilde delimited file. Criteria is considering the first 2 fields, the combination of which has to be unique, below is a sample of records in the input file 1620000010338~2446694087~0~20061130220000~A00BCC1CT... (5 Replies)
Discussion started by: irshadm
5 Replies