The UNIX and Linux Forums > Top Forums > UNIX for Dummies Questions & Answers


    #1  
Old 06-18-2012
Registered User
 
Join Date: Feb 2012
Posts: 13
Thanks: 0
Thanked 0 Times in 0 Posts
How to identify broken lines in a file?

Hi,
I have a fixed-width file in which every record is 100 bytes long. Three of its rows are broken: each was split and continues on the next line. How can I identify the broken lines?

E.g.
ABCD1234MNRD4321

abcd1234mnrd
4321

As you can see in my example, the second row (the one in lowercase letters) is broken and continues on the next line. How can I identify such broken lines?
    #2  
Old 06-18-2012
bakunin (Forum Staff)
Bughunter Extraordinaire
 
Join Date: May 2005
Location: In the leftmost byte of /dev/kmem
Posts: 3,301
Thanks: 27
Thanked 456 Times in 355 Posts
If it is a "fixed width file" as you said, then you can identify the broken lines because their character count deviates from the fixed width.

The regexp:


Code:
.

matches exactly one character. So you just have to repeat it as many times as the line is supposed to be wide to match all the non-broken lines:


Code:
.\{n\}

where "n" is the number of characters. Now anchor the expression to the start and the end of the line and invert the search, e.g. with "grep -v":


Code:
grep -v '^.\{n\}$' /path/to/inputfile
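
As a small worked example of the command above, here is a sketch that uses the 16-character sample from the question in place of the real 100-byte file. The temporary file and the width of 16 are assumptions made for illustration only; "-n" is added so that grep also reports the line number of each broken line.

```shell
# Hypothetical sample: 16-character records, the second one split in two.
sample=$(mktemp)
printf '%s\n' 'ABCD1234MNRD4321' 'abcd1234mnrd' '4321' > "$sample"

# -v inverts the match, -n prefixes each reported line with its line number.
broken=$(grep -vn '^.\{16\}$' "$sample")
printf '%s\n' "$broken"

rm -f "$sample"
```

Both pieces of the split record are reported, because neither of them is exactly 16 characters long.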

But you probably want to repair the file and not just inspect it, so "grep" is not the right tool. "sed" is, and it uses the same regexp syntax:


Code:
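sed ':start
     s/\n//
     /^.\{n\}$/! {
          N
          b start
     }' /path/to/inputfile

First, any newline embedded in the pattern space is deleted (a freshly read line contains none, so this does nothing on the first pass). Then every line NOT consisting of exactly n characters (the "!"), that is, a broken one, causes the next line to be read in and appended to it. Control then branches back to the start of the script, where the newline that "N" put between the two pieces is deleted. If the line is still too short, the next line is read in as well, and so on, until the correct line length is reached. At that point the script ends and sed prints the repaired line by itself. An explicit "p" at the end would print every line twice unless sed is run with "-n".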
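
To see the joining loop at work, here is a minimal sketch run against the 16-character sample from the question. The temporary file and the width of 16 are assumptions for illustration; for the real file the width would be 100. It was reasoned through with GNU sed in mind. Note one limitation of the approach: a line that is already longer than the expected width never matches, so it would keep swallowing the lines that follow it.

```shell
# Hypothetical sample: 16-character records, the second one split in two.
sample=$(mktemp)
printf '%s\n' 'ABCD1234MNRD4321' 'abcd1234mnrd' '4321' > "$sample"

# Keep appending the next line until the record is exactly 16 characters long.
# The finished record is printed by sed's automatic print at the end of the cycle.
fixed=$(sed ':start
s/\n//
/^.\{16\}$/!{
N
b start
}' "$sample")
printf '%s\n' "$fixed"

rm -f "$sample"
```

The three input lines come out as two complete 16-character records.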


I hope this helps.

bakunin
    #3  
Old 06-18-2012
spacebar
Registered User
 
Join Date: Oct 2009
Location: spaceBAR Central
Posts: 303
Thanks: 0
Thanked 59 Times in 59 Posts
This should list the lines that are shorter than 100 characters:

Code:
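# IFS= and -r keep leading/trailing blanks and backslashes intact,
# which matters when the length of a fixed-width record is measured.
while IFS= read -r line
do
  if [ "${#line}" -lt 100 ]; then
    printf '%s\n' "$line"
  fi
done < test.txt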

    #4  
Old 06-18-2012
Resident BOFH
 
Join Date: Dec 2007
Posts: 1,129
Thanks: 2
Thanked 88 Times in 85 Posts
Display all lines with fewer than 100 characters, prefixed with the line number.


Code:
awk 'length($0)<100{print NR,$0}' foo.bar
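
For illustration, here is a sketch of the same awk test applied to the 16-character sample from the question. The temporary file, the "width" variable and the extra length column are assumptions added for the example; they are not part of the one-liner above.

```shell
# Hypothetical sample: 16-character records, the second one split in two.
sample=$(mktemp)
printf '%s\n' 'ABCD1234MNRD4321' 'abcd1234mnrd' '4321' > "$sample"

# Print line number, actual length and content of every line that is too short.
short=$(awk -v width=16 'length($0) < width { print NR, length($0), $0 }' "$sample")
printf '%s\n' "$short"

rm -f "$sample"
```

Printing the length next to the line number makes it easy to check that consecutive short lines add up to one full record.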

    #5  
Old 06-19-2012
jayan_jay
Forum Advisor
 
Join Date: Jul 2008
Posts: 831
Thanks: 9
Thanked 185 Times in 176 Posts
One more, which deletes every line that has at least 100 characters and prints the rest:

Code:
$ sed '/^.\{100\}/d' infile

    #6  
Old 06-19-2012
Registered User
 
Join Date: Feb 2012
Posts: 13
Thanks: 0
Thanked 0 Times in 0 Posts
Quote:
Originally Posted by frank_rizzo View Post
Display all lines with fewer than 100 characters, prefixed with the line number.


Code:
awk 'length($0)<100{print NR,$0}' foo.bar

Simple and perfect
thank you.