How to identify broken lines in a file?


06-18-2012


Hi,
I have a fixed-width file in which every record is 100 bytes long. Three rows in it are broken and continue on the next line. How can I identify the broken lines?

E.g.
ABCD1234MNRD4321

abcd1234mnrd
4321

As you can see in my example, the second row (the one in lowercase letters) is broken and continues on the next line. How can I identify such broken lines?

okkadu


06-18-2012


If it is a "fixed width file", as you said, you can identify the broken lines by their deviating length: they contain fewer characters than expected.

The regexp:

Code:

.

matches exactly a single character. So you just have to repeat it as many times as the line is expected to be wide to find all non-broken lines:

Code:

.\{n\}

where "n" is the number of characters. Anchored with "^" and "$", the pattern matches only lines that are exactly n characters long. Now you just reverse the search, e.g. by using "grep -v":

Code:

grep -v '^.\{n\}$' /path/to/inputfile
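
As a concrete illustration, here is a minimal sketch that runs this search on the sample rows from the question. The temporary demo file and the width of 16 (the length of the sample rows) are assumptions made for the demo; for the real file you would use 100 and your own path.

```shell
# Build a small demo file from the sample rows in the question.
sample=$(mktemp)
printf '%s\n' 'ABCD1234MNRD4321' 'abcd1234mnrd' '4321' > "$sample"

# Expected record width: 16 for the demo, 100 for the real file.
n=16

# -v inverts the match, -n prefixes each reported line with its line number.
broken=$(grep -vn "^.\{$n\}\$" "$sample")
printf '%s\n' "$broken"

rm -f "$sample"
```

For the sample this reports lines 2 and 3, the two halves of the broken row.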

But you probably want to repair the file as well, and for that "grep" is not the right tool. "sed" is, and it works with the same syntax:

Code:

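sed ': start
     # remove the newline left over from a previous N
     s/\n//
     /^.\{n\}$/! {
          # wrong length: append the next line, unless this is the last one
          $! {
               N
               b start
          }
     }' /path/to/inputfile

First, any newline embedded in the pattern space is deleted. Then every line NOT consisting of exactly n characters (the "!"), that is, every broken one, causes the next line to be read in and appended to it. Control then branches back to the beginning of the script. If the line is still too short, the following line is read in as well, and so on, until the correct line length is reached. The "$!" guard skips this on the last line of the file, where there is nothing left to append, so a short final line is still printed. The finished line is printed automatically when the end of the script is reached (an explicit "p" without "sed -n" would print it twice).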
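
To see the joining in action, here is a small sketch under the same assumptions as the question's example: 16-character records and a temporary demo file. The function name join_broken is hypothetical, and the script is a variant written with one "-e" per command and a "$!" guard for the last line, not necessarily the exact script from the post.

```shell
# join_broken WIDTH FILE: hypothetical helper that glues continuation lines
# back together until each record is WIDTH characters long.
join_broken() {
  sed -e ':start' -e 's/\n//' \
      -e "/^.\{$1\}\$/!{" -e '$!{' -e 'N' -e 'b start' -e '}' -e '}' "$2"
}

# Demo file built from the sample rows in the question.
sample=$(mktemp)
printf '%s\n' 'ABCD1234MNRD4321' 'abcd1234mnrd' '4321' > "$sample"

fixed=$(join_broken 16 "$sample")
printf '%s\n' "$fixed"

rm -f "$sample"
```

Note that a record which ends up longer than the expected width never matches, so it keeps absorbing the following lines until the end of the file. It is worth checking the input with "grep" first.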

I hope this helps.

bakunin


06-18-2012


This should list the lines that are less than 100 characters in length:

Code:

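# IFS= and -r keep leading/trailing blanks and backslashes intact
while IFS= read -r line
do
  # ${#line} is the length of the line in characters
  if [ "${#line}" -lt 100 ]; then
    printf '%s\n' "$line"
  fi
done < test.txt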

spacebar


06-19-2012


Display all lines with fewer than 100 characters, prefixed with the line number.

Code:

awk 'length($0)<100{print NR,$0}' foo.bar
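
A possible variation, sketched here on the 16-character sample rows from the question (the temporary demo file and the width passed via "-v n=16" are assumptions for the demo): it flags every line whose length differs from the expected width, too long as well as too short, and prints the actual length next to the line number.

```shell
# Demo file built from the sample rows in the question.
sample=$(mktemp)
printf '%s\n' 'ABCD1234MNRD4321' 'abcd1234mnrd' '4321' > "$sample"

# n is the expected record width: 16 for the demo, 100 for the real file.
report=$(awk -v n=16 'length($0) != n { print NR ": " length($0) " chars: " $0 }' "$sample")
printf '%s\n' "$report"

rm -f "$sample"
```

Passing the width with "-v" keeps the awk program itself unchanged when the record length changes.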

frank_rizzo


06-19-2012


One more ..

Code:

$ sed '/^.\{100\}/d' infile
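
One detail worth knowing: without a trailing "$" the pattern matches any line of at least 100 characters, so overlong lines are deleted as well and never show up. The sketch below contrasts the two forms on a 16-character demo; the temporary file and the overlong third row are made up for illustration.

```shell
# Demo file: one correct row, one short row, one overlong row.
sample=$(mktemp)
printf '%s\n' 'ABCD1234MNRD4321' 'abcd1234mnrd' 'ABCD1234MNRD4321EXTRA' > "$sample"

# Unanchored: deletes every line with 16 or more characters.
unanchored=$(sed '/^.\{16\}/d' "$sample")

# Anchored: deletes only lines with exactly 16 characters.
anchored=$(sed '/^.\{16\}$/d' "$sample")

printf 'unanchored:\n%s\nanchored:\n%s\n' "$unanchored" "$anchored"

rm -f "$sample"
```

If only short lines can occur in the file, both forms give the same result.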

jayan_jay


06-19-2012


Quote:

Originally Posted by frank_rizzo

Display all lines with fewer than 100 characters, prefixed with the line number.

Code:

awk 'length($0)<100{print NR,$0}' foo.bar

Simple and perfect
thank you.

okkadu
