How to remove page breaks from a flat file???


 
# 1  
Old 04-16-2007
Hi All,

I get a flat file in which the data of the last field is split onto a new line. I got this program from Vgersh; when run, it concatenates the split data back onto the end of the previous record. However, the program fails when it encounters a page break between the split data and the previous record. If the page breaks are removed, the program works fine.
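One way to act on that observation is to strip the page breaks before the file reaches the program. The sketch below assumes the page break is an ASCII form feed (octal 014) and that a line left empty after stripping carries no data; `strip_page_breaks` is a hypothetical helper name, not something from the original program.

```shell
# Hypothetical helper: remove page breaks from stdin.
# Assumption: the page break is an ASCII form feed (octal 014).
strip_page_breaks() {
    # 1) delete every form feed character
    # 2) drop lines that are empty afterwards (assumed to hold no data)
    tr -d '\014' | sed '/^$/d'
}
```

Usage would look something like `strip_page_breaks < inputfile > cleanfile`, with `cleanfile` then fed to the program.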

Program

Code:
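#!/usr/bin/nawk -f

BEGIN {
  FS=OFS="|"

  # number of fields in a complete record
  FLD_max=11

  # diagnostics are piped to this command so that they end up on stderr
  stderr="cat 1>&2"
}
# the current line would overflow the pending record: a new record starts here
(fld + NF-1) > FLD_max {
       if (fld == FLD_max)
          print rec
       else
          printf("Incomplete record: [%d] :: [%s]\n", FNR, rec) | stderr
       rec=$0; fld=NF;next
}
# short line: append it to the pending record
NF < FLD_max {printf("Bad record: [%d] :: [%s]\n", FNR, $0) | stderr; rec=(rec != "") ? rec $0 : $0; fld+=(NF-1);next }
# complete line: it becomes the pending record
{rec=$0; fld=NF}
END {
  # flush the last pending record if it is complete
  if (rec != "" && split(rec, a, FS) >= FLD_max ) print rec
}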

Input...

000000|Apr 14 2007 7:59:58:376AM| |ASDFASFSDA |000000|0|0|0|3111|SDFSDF|æPP:?µß?
/* There is a page break here (shown as a kind of straight line in UltraEdit, but not visible here). This needs to be removed. */
ÚÐý?K
000004|Apr 14 2007 7:59:58:790AM| |ASFASFAS|000000|0|0|0|111|DSFSDF|?Í¢º²c?
ÄÜ?Îd
000000|Apr 14 2007 7:59:59:970AM| |ASFAFASA |00000|0|0|0|1111|SFDSFSD|?ÒÎקóR¢¢Ò RS?
00000|Apr 14 2007 8:00:01:693AM| |ASFSAFAS |000000|0|0|0|111SDFSDF|Âh>`= Û?èäN?´ÈH
000000|Apr 14 2007 8:00:02:350AM| |ASFAFA|00000|0|0|0111|SDFSD1|?®
???ø»à濦«?
000000|Apr 14 2007 8:00:02:700AM| |ASFSAFASSA |00000|0|0|0|9964|SDFSD|3`
á"Ô:`ÓÏI¤?9V?

Output:

000000|Apr 14 2007 7:59:58:376AM| |ASDFASFSDA |000000|0|0|0|3111|SDFSDF|æPP:?µß?ÚÐý?K
000004|Apr 14 2007 7:59:58:790AM| |ASFASFAS|000000|0|0|0|111|DSFSDF|?Í¢º²c?
ÄÜ?Îd000000|Apr 14 2007 7:59:59:970AM| |ASFAFASA |00000|0|0|0|1111|SFDSFSD|?ÒÎקóR¢¢ÒRS?
00000|Apr 14 2007 8:00:01:693AM| |ASFSAFAS |000000|0|0|0|111SDFSDF|Âh>`=Û?èäN?´ÈH
000000|Apr 14 2007 8:00:02:350AM| |ASFAFA|00000|0|0|0111|SDFSD1|?®???ø»à濦«?
000000|Apr 14 2007 8:00:02:700AM| |ASFSAFASSA |00000|0|0|0|9964|SDFSD|3`á"Ô:`ÓÏI¤?9V?

Thanks
Kumar

Last edited by vino; 04-16-2007 at 09:42 AM.. Reason: Please put your code within code tags.
# 2  
Old 04-16-2007
_If I understand the requirement correctly_, with GNU Awk (on Linux, for example) you could try something like this (assuming all the records start with 0):

Code:
awk '$1=$1' RS="\n0"  inputfile

# 3  
Old 04-16-2007
The records don't start with 0. To mask the actual data, I just put in some dummy values while maintaining the structure of the records. The records start with one of two numeric formats, like 100**** and 99****.

Regards,
Kumar
# 4  
Old 04-16-2007
So, what about (with GNU Awk):

Code:
awk '$1=$1{print $0 RT}' ORS= RS="\n(100|99)" inputfile
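For an awk without GNU extensions such as `RT` or a regex `RS`, the same idea can be sketched line by line: treat any line beginning with 100 or 99 as the start of a record and append everything else to the record in progress. This assumes that no continuation line happens to begin with 100 or 99, and that the page break is a form feed; `join_by_prefix` is a hypothetical name used only for illustration.

```shell
# Hypothetical helper: rejoin split records using the record-start prefix.
# Assumptions: records start with 100 or 99; continuation lines never do;
# page breaks are form feeds.
join_by_prefix() {
    awk '
        { gsub(/\f/, "") }            # drop page breaks (form feeds)
        /^(100|99)/ {                 # a new record starts on this line
            if (rec != "") print rec  # flush the previous record
            rec = $0
            next
        }
        { rec = rec $0 }              # continuation: append to current record
        END { if (rec != "") print rec }
    '
}

# Example invocation:
#   join_by_prefix < inputfile > outputfile
```

Unlike the field-counting program, this variant does not check that each record has 11 fields, so it is only as reliable as the prefix rule.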

# 5  
Old 04-16-2007
Code:
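#!/usr/bin/nawk -f

BEGIN {
  FS=OFS="|"

  # number of fields in a complete record
  FLD_max=11

  # the page break character (form feed)
  FF=sprintf("\f")

  # diagnostics are piped to this command so that they end up on stderr
  stderr="cat 1>&2"
}
# strip page breaks; $1=$1 rebuilds the record
# (and leaves NF at 1 if the line held only a form feed)
$0 ~ FF { gsub(FF, ""); $1=$1 }

# the current line would overflow the pending record: a new record starts here
(fld + NF-1) > FLD_max {
       if (fld == FLD_max)
          print rec
       else
          printf("Incomplete record: [%d] :: [%s]\n", FNR, rec) | stderr
       rec=$0; fld=NF;next
}
# short line: append it to the pending record
NF < FLD_max {printf("Bad record: [%d] :: [%s]\n", FNR, $0) | stderr; rec=(rec != "") ? rec $0 : $0; fld+=(NF-1);next }
# complete line: it becomes the pending record
{rec=$0; fld=NF}
END {
  # flush the last pending record if it is complete
  if (rec != "" && split(rec, a, FS) >= FLD_max ) print rec
}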
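
To check whether a file actually contains form feeds, before or after running the script, a small count can help. This is only a sketch: `count_page_breaks` is a hypothetical helper name, and it counts lines that contain at least one form feed, not individual form feed characters.

```shell
# Hypothetical helper: print the number of lines containing a form feed.
# Reads the named files, or stdin when no file is given.
count_page_breaks() {
    awk '/\f/ { n++ } END { print n + 0 }' "$@"
}

# Example invocation:
#   count_page_breaks inputfile
```

A result of 0 on the script's output would suggest that the page breaks are gone.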

# 6  
Old 04-17-2007
vgersh99

You are an absolute genius, I feel. It works really great. Thank you so much.

Regards,
Kumar