Removing inserted newlines from a field of a fixed-width file

08-18-2009

Hi champs!

I have a fixed-width file in which the records look like this:

Code:

11111 <fixed spaces such as 6> description for 11111 <fixed spaces such as 6> some more field to the record of 11111
22222 <fixed spaces such as 6> description for 22222 <fixed spaces such as 6> some more field to the record of 22222
33333 <fixed spaces such as 6> description 
for 33333 <fixed spaces such as 6> some more field to the record of 33333
44444 <fixed spaces such as 6> description for 44444 <fixed spaces such as 6> some more field to the record of 44444

As you can see, the record for 33333 is split across two lines because a newline was inserted in its description field. I want these extraneous newlines removed from the description field wherever they appear in the file.
One clue: check positions 11-32 of each record, and if a newline is present there, strip it off.
Any other solution is welcome too.
I want the output to be:

Code:

11111 <fixed spaces such as 6> description for 11111 <fixed spaces such as 6> some more field to the record of 11111
22222 <fixed spaces such as 6> description for 22222 <fixed spaces such as 6> some more field to the record of 22222
33333 <fixed spaces such as 6> description for 33333 <fixed spaces such as 6> some more field to the record of 33333
44444 <fixed spaces such as 6> description for 44444 <fixed spaces such as 6> some more field to the record of 44444

- The line break will not necessarily appear right after 'description'; it can appear anywhere in the second field. If it appears at all, though, it will be in the second field only.

- This is just a sample record for illustration; the code should not depend on its content. The code can depend on positions if required.
It is a fixed-width file, which means each field is identified by its length within the record.

Please let me know if you need more clarification.

Last edited by enigma_1; 08-18-2009 at 06:55 PM.. Reason: code tags, PLEASE!

enigma_1

08-18-2009

To keep the forums high quality for all users, please take the time to format your posts correctly.

First of all, use Code Tags when you post any code or data samples so others can easily read your code. You can easily do this by highlighting your code and then clicking on the # in the editing menu. (You can also type code tags [code] and [/code] by hand.)

Second, avoid adding color or different fonts and font size to your posts. Selective use of color to highlight a single word or phrase can be useful at times, but using color, in general, makes the forums harder to read, especially bright colors like red.

Third, be careful when you cut and paste: edit out any odd characters and make sure all links are working properly.

Thank You.

The UNIX and Linux Forums

---------- Post updated at 05:54 PM ---------- Previous update was at 05:38 PM ----------

something to start with...

'len' is the known/expected length of ALL the records (assuming they are all the same length); it defaults to 73.

Assumption: there is only ONE extra newline per 'broken' record.
nawk -f enigma.awk myFile
OR
nawk -v len=63 -f enigma.awk myFile

enigma.awk:

Code:

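BEGIN {
  # Expected record length; override with -v len=N on the command line
  len=(!len)?73:len
}
# A line shorter than len is one half of a broken record
length($0) < len {
   # Second half: print it joined to the saved first half (OFS adds one space)
   if (length(s)) { print s OFS $0;s=""}
   # First half: save it until the next short line arrives
   else s=$0
   next
}
# Full-length lines pass through unchanged
1
# Print an unmatched first half rather than silently dropping it
END { if (length(s)) print s }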
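
Both invocations above need a value for 'len'. If the record length is not known in advance, one way to estimate it is to take the most frequent line length in the file. This is only a sketch: it assumes intact records outnumber broken ones, and the helper name most_common_length is hypothetical, not something from this thread.

```shell
# most_common_length: hypothetical helper (not from the thread).
# Reads lines on stdin and prints the line length that occurs most often.
# Assumption: most lines are intact records, so the winner is the record length.
most_common_length() {
  awk '
    { count[length($0)]++ }            # tally how many lines have each length
    END {
      best = 0; max = 0
      for (l in count)
        if (count[l] > max) { max = count[l]; best = l }
      print best                       # prints 0 for empty input
    }
  '
}

# Usage: nawk -v len="$(most_common_length < myFile)" -f enigma.awk myFile
```

If two lengths tie, the one reported depends on awk's array traversal order, so it is worth eyeballing the result before trusting it.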

vgersh99

08-19-2009

Thanks vgersh !!

The code you provided worked for the records broken into two.
But I have some more problems; I hope you can help.
As you mentioned in your assumption, the script expects a record to be split into two lines only.
Unfortunately, my file has one record that is split into three lines.

Sample:

Code:

33333 <fixed spaces such as 6> description 
for 
33333 <fixed spaces such as 6> some more field to the record of 33333

which needs to be :

Code:

33333 <fixed spaces such as 6> description for 33333 <fixed spaces such as 6> some more field to the record of 33333

Could the enigma.awk program be modified to handle a record broken into three lines? If I may ask for more, could the code handle a record broken into any number of lines?
I guess you need some way to identify the start of each record.

In my file, each new record starts at column (length) 16. Any line that starts before column 16 is a continuation of the previous record.

Thank you once again!

enigma_1

08-19-2009

enigma.awk:

Code:

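BEGIN {
  # Expected record length; override with -v len=N on the command line
  len=(!len)?73:len
}
# A short line is a fragment of a broken record
length($0) < len {
   # Append the fragment to what has been collected so far
   # (OFS adds one space per join, which counts toward the length)
   if (length(s)) { s=s OFS $0}
   else s=$0
   # Once the pieces add up to a full record, print it and start over
   if (length(s) == len) { print s; s=""}
   next
}
# Full-length lines pass through unchanged
1
# Print any leftover fragments rather than silently dropping them
END { if (length(s)) print s }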
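
The script above decides by total length. A variant that keys on the record-start column mentioned in the question can be sketched as below. It rests on two assumptions that the thread does not confirm: "starts at column 16" is read as "the first non-blank character of a real record is at column 16", and the stray newline was inserted into the data (so fragments are glued back with no separator, unlike the OFS join above). The helper name join_by_start_col is hypothetical.

```shell
# join_by_start_col COL: hypothetical helper (not from the thread).
# Assumption: a line begins a new record only when its first non-blank
# character is at column COL; every other line continues the current record.
join_by_start_col() {
  awk -v col="$1" '
    # Record start: flush the previous record, then begin collecting a new one
    match($0, /[^ ]/) && RSTART == col {
      if (have) print rec
      rec = $0; have = 1
      next
    }
    # Continuation: append with no separator (assumes the newline was inserted)
    { rec = rec $0; have = 1 }
    # Flush the last record at end of input
    END { if (have) print rec }
  '
}

# Usage: join_by_start_col 16 < myFile > myFile.fixed
```

This handles any number of breaks per record, but a continuation fragment that itself happens to begin with exactly COL-1 blanks would be mistaken for a new record.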

vgersh99

08-19-2009

What is the length of a record?

Regards

Franklin52

08-19-2009

I might be wrong, but isn't this the very type of problem the "fmt" simple optimal formatter tool was created for?

"fmt -w <your desired line length here>" should do the trick.

I hope this helps.

bakunin

08-19-2009

Quote:

Originally Posted by bakunin

I might be wrong, but isn't this the very type of problem the "fmt" simple optimal formatter tool was created for?

"fmt -w <your desired line length here>" should do the trick.

I hope this helps.

bakunin

good tip - forgot about fmt - thanks.

vgersh99
