If it is a "fixed width file" as you said, then you can identify the broken lines by their deviating number of characters.
The regexp:
Code:
.
matches exactly a single character. So you just have to repeat this the number of times you expect the line to be wide to find all non-broken lines:
Code:
.\{n\}
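where "n" is the number of characters. Anchor the expression to the start ("^") and the end ("$") of the line so that it has to cover the whole line, then invert the search, e.g. with "grep -v", to list the broken lines: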
Code:
grep -v '^.\{n\}$' /path/to/inputfile
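As a quick check, here is a small sketch with made-up data, assuming a record width of 10 characters. The sample records and the function names make_sample and find_broken are only illustrative; "-n" is added so that grep also reports the line number of each broken line.

```shell
# Hypothetical sample: records should be exactly 10 characters wide.
# The second record is broken across two lines.
make_sample() {
    printf '%s\n' 'AAAAAAAAAA' 'BBBB' 'BBBBBB' 'CCCCCCCCCC'
}

# List every line that is NOT exactly n characters wide,
# prefixed with its line number.
find_broken() {
    n=$1
    grep -vn "^.\{$n\}\$"
}

make_sample | find_broken 10
```

With this sample the two fragments of the broken record are reported as lines 2 and 3.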
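But you probably want to repair the broken lines, not just find them, so "grep" is not the right tool - but "sed" is, and it works with the same syntax (replace "n" with the actual width):
Code:
sed ':start
s/\n//
/^.\{n\}$/! {
$! {
N
b start
}
}' /path/to/inputfile
First any line feed inside the pattern space is deleted (on a freshly read line there is none, so this only matters after a join). Then every line NOT consisting of exactly n characters (the "!") - that is: the broken ones - will cause the next line to be read in and appended to the line before. Then control branches to the beginning of the script again, where the line feed between the two pieces is removed. If the line is still too short, the next line is read in as well, and so on, until the correct line length is reached. Once it is, sed prints the line automatically at the end of the script, so no explicit "p" is needed (an additional "p" without the "-n" option would print every line twice). The "$!" keeps sed from trying to read past the last line, so a truncated final record is still printed as it is. Be aware that a line which is already longer than n characters never matches and will therefore absorb all following lines up to the end of the file.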
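The joining loop can be tried out on made-up data. The following is only a sketch, assuming a record width of 10 characters; the sample records and the function names make_sample and join_broken are illustrative. This variant relies on sed's automatic print at the end of each cycle, not on an explicit "p", and guards "N" with "$!" so that sed does not try to read past the last line.

```shell
# Hypothetical sample: records should be exactly 10 characters wide.
# Record B is split into two pieces, record C into three.
make_sample() {
    printf '%s\n' 'AAAAAAAAAA' 'BBBB' 'BBBBBB' 'CCC' 'CCC' 'CCCC' 'DDDDDDDDDD'
}

# Join continuation lines until each record is exactly n characters wide.
join_broken() {
    n=$1
    sed -e ':start' \
        -e 's/\n//' \
        -e "/^.\{$n\}\$/!{" \
        -e '$!{' \
        -e 'N' \
        -e 'b start' \
        -e '}' \
        -e '}'
}

make_sample | join_broken 10
```

With this sample the seven input lines come out as four records of 10 characters each.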