How to fix line breaks format text for huge files?


 
# 8  
Old 01-11-2012
Here is a Perl script; I hope it works now.
Code:
$ cat infile
HEADER474687
D1356jkl ugbliuybikb 879870
898976098 9687680
D77656757 uhgliug liygoiygig
D98679hjh kjbgihguygfu ugliyh
kbygfluy9809
D8796870 kjlhuigiyig
TRAILER0008
$ cat script.pl
use warnings;
use strict;

die qq[Usage: perl $0 <input-file>\n] unless @ARGV == 1;

my @lines;

while ( <> ) {
        chomp;

        ## First line: Print and read next.
        if ( $. == 1 ) {
                printf qq[%s\n], $_;
                next;
        }

        ## Lines beginning with 'D': Remove previous saved lines from
        ## array to print them and save current line.
        if ( m/\AD/ ) {
                if ( @lines ) {
                        printf qq[%s\n], join qq[ ], splice @lines, 0;
                }
                push @lines, $_;
                next;
        }

        ## Last line: Print saved lines plus current one.
        if ( eof ) {
                printf qq[%s\n%s\n], join( qq[ ], splice( @lines, 0 ) ), $_;
                next;
        }

        ## Lines NOT beginning with 'D': Save them in array to print them
        ## later.
        push @lines, $_;
}
$ perl script.pl infile
HEADER474687
D1356jkl ugbliuybikb 879870 898976098 9687680
D77656757 uhgliug liygoiygig
D98679hjh kjbgihguygfu ugliyh kbygfluy9809
D8796870 kjlhuigiyig
TRAILER0008

Regards,
Birei
# 9  
Old 01-12-2012
Hi Birei,

It works like a charm... Thanks a ton for all your help
# 10  
Old 01-12-2012
Hi... a solution using awk
Code:
awk '{
    if (NR == 1)
        print $0;
    else if (NR != 1)
    {
        if ($0 !~ /^TRAILER/)
        {
            if ($0 ~ /^D/)
            {
                printf "%s\n", $0; next
            }
            else
            {
            }
        }
        else
        {
            last = $0;
        }
    }
}
END {
    print last;
}' f5

Regards,
A!

Last edited by Franklin52; 01-12-2012 at 04:51 AM.. Reason: Please use code tags for code and data samples and indent your code, thank you
# 11  
Old 01-12-2012
This code works great... thanks a lot Archimedes!

---------- Post updated at 09:18 AM ---------- Previous update was at 09:03 AM ----------

Hi Archimedes,

At first glance I did not spot the flaw in the output: the script removes the broken lines entirely. My requirement is that each broken line be appended to the end of the preceding line, separated by a space.

For example, if the input file is:
Code:
HEADER474687
D1356jkl ugbliuybikb 879870
898976098 9687680
D77656757 uhgliug liygoiygig
D98679hjh kjbgihguygfu ugliyh
kbygfluy9809
D8796870 kjlhuigiyig
TRAILER0008

The output should be:
Code:
HEADER474687
D1356jkl ugbliuybikb 879870 898976098 9687680
D77656757 uhgliug liygoiygig
D98679hjh kjbgihguygfu ugliyh kbygfluy9809
D8796870 kjlhuigiyig
TRAILER0008

However, the awk code gives me:
Code:
HEADER474687
D1356jkl ugbliuybikb 879870
D77656757 uhgliug liygoiygig
D98679hjh kjbgihguygfu ugliyh
D8796870 kjlhuigiyig
TRAILER0008

---------- Post updated at 11:52 AM ---------- Previous update was at 11:47 AM ----------

Hi Birei,
I was curious whether your sed script fails only under ksh, so I tried it in bash and boom - it works perfectly!
So I guess it is my ksh environment that throws the "sed: command garbled" error.

Is there any other way to write the same sed script so that it runs under ksh?
# 12  
Old 01-12-2012
Awk solution:
Code:
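awk '
NR==1      { print; next }                          # header: print as-is
/^TRAILER/ { if (a != "") print a; print; exit }    # flush pending record, then trailer
/^D/       { if (a != "") print a; a = $0; next }   # new record: flush the previous one
           { a = a " " $0 }                         # continuation: append to current record
' infile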

# 13  
Old 01-12-2012
@kikionline:

See if this version works in any shell. It's the same program, but run from a script file.
Code:
$ cat script.sed
1 { 
        p
        b
}

/^D/ {
        x
        s/^\n//
        s/\n/ /g
        /./ p
        b
}

/^D/! {
        $! {
                H
                b
        }
}

$ {
        H
        x
        s/^\n//
        p
}
$ sed -n -f script.sed infile
HEADER474687
D1356jkl ugbliuybikb 879870 898976098 9687680
D77656757 uhgliug liygoiygig
D98679hjh kjbgihguygfu ugliyh kbygfluy9809
D8796870 kjlhuigiyig
TRAILER0008

Regards,
Birei
# 14  
Old 01-13-2012
Hi Mirni,

The code gives the perfect output. Thanks a lot.

---------- Post updated at 08:09 AM ---------- Previous update was at 08:06 AM ----------

Hi Birei,

I am getting the following error when running the sed command from a file:

Code:
# sed -n -f script.sed temp.txt
Unrecognized command: /^D/! {

Thanks anyway, Birei. I now have working alternatives using awk and perl. Thanks a lot for all your help.
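
A possible explanation for the "Unrecognized command: /^D/! {" error, offered as an assumption rather than a confirmed diagnosis: POSIX leaves it unspecified whether blanks may follow the "!" in a sed command, so a strict sed can reject "/^D/! {" and "$! {". The sketch below is the script from post #13 with those blanks (and the trailing blank after "1 {") removed. It has not been tested on the poster's system; the temporary directory and the use of mktemp -d are illustrative choices, not part of the thread.

```shell
# Work in a scratch directory (mktemp -d is common but not required by POSIX).
workdir=$(mktemp -d) || exit 1

# Same logic as the script in post #13; only the blanks after "!" are removed.
cat > "$workdir/script.sed" <<'EOF'
1{
        p
        b
}

/^D/{
        x
        s/^\n//
        s/\n/ /g
        /./p
        b
}

/^D/!{
        $!{
                H
                b
        }
}

${
        H
        x
        s/^\n//
        p
}
EOF

# Sample input copied from the thread.
cat > "$workdir/infile" <<'EOF'
HEADER474687
D1356jkl ugbliuybikb 879870
898976098 9687680
D77656757 uhgliug liygoiygig
D98679hjh kjbgihguygfu ugliyh
kbygfluy9809
D8796870 kjlhuigiyig
TRAILER0008
EOF

# -n suppresses automatic printing; the script prints each joined record itself.
sed -n -f "$workdir/script.sed" "$workdir/infile"
```

With GNU sed this prints the same six lines shown in post #13. Remove "$workdir" once you are done with it.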