Breaking long lines into (characters, newline, space) groups


 
# 1  
Old 05-14-2009

Hello,

I am currently trying to edit an LDIF file. The LDIF specification states that a newline followed by a space marks the next line as a continuation of the current one. So, in order to search, replace, and edit the file properly, I open it in TextWrangler, search for "\r " and remove it, which joins all continued lines into single lines. That's the first step. I then make my changes to the LDIF file.
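
The unfolding step can also be scripted instead of done by hand in an editor. A minimal Python sketch, assuming the file uses LF or CRLF line endings and is small enough to read into memory; the function name `unfold` is made up for illustration:

```python
import re

def unfold(text):
    """Join LDIF continuation lines: drop each line break that is
    immediately followed by a space, together with that one space."""
    return re.sub(r"\r?\n ", "", text)

# Hypothetical sample: one attribute folded over three physical lines.
folded = "description: a long va\n lue split over\n  three lines\ncn: x\n"
print(unfold(folded))
```

Only the first space after the line break is removed, so a second space (as in the third physical line of the sample) is kept as part of the value.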

Now, after editing, I want to break any line longer than 79 characters (some are hundreds of characters long) into this: 79 characters, newline, space, next 79 characters, newline, space, next 79 characters, newline, space, etc.

using this simple sed command:

Code:
sed 's/./\
 /80' myfile > newfile

works for the first 79 characters of line x, breaks it properly, but then moves on to the next line in the ldif, leaving line x broken into: 79 characters, newline, space, remaining chunk of line x which is hundreds of characters, next line in ldif. Only partial success!

So here's the question: is there a way to make sed apply this command at every 79th character until the end of the line? If not, should I use a loop in the script with a conditional, along the lines of "if there are lines longer than 79 characters, rerun the sed command", so that each pass breaks the remaining hundreds of characters that the previous sed run left unbroken, looping until all lines are broken into (79 characters, newline, space) chunks? How could I set up that condition? I don't know how to search for lines longer than x characters.

Thanks a lot for any help on this!
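
Both parts of the question (finding lines longer than x characters, and repeating the break until nothing is too long) can be illustrated outside sed. This is a rough Python sketch of the idea only; `long_lines` and `split_repeatedly` are hypothetical helper names, and the continuation space is left out on purpose to keep the loop condition visible:

```python
def long_lines(lines, limit=79):
    """Return (line_number, length) for every line longer than `limit`.
    The trailing newline is not counted."""
    found = []
    for number, text in enumerate(lines, 1):
        length = len(text.rstrip("\n"))
        if length > limit:
            found.append((number, length))
    return found

def split_repeatedly(line, limit=79):
    """Cut `limit` characters off the front while the rest is still too long."""
    chunks = []
    while len(line) > limit:
        chunks.append(line[:limit])
        line = line[limit:]
    chunks.append(line)  # whatever is left is already short enough
    return chunks
```

The `while len(line) > limit` test plays the role of the "rerun if any line is still too long" condition, but it works on one line at a time instead of reprocessing the whole file on every pass.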
# 2  
Old 05-14-2009
I came up with this script. It seems hackish and very inefficient, but it works. I would love for someone to help me come up with a better way since this script takes almost 10 full minutes to parse a text file into less than 7000 lines.

Code:
#!/bin/ksh

echo "where is the ldif file located that you would like to parse?"
read response
ldiffile=$response

while read line
do

x=`echo $line | wc -c`

while [ $x -gt 79 ]
do

sed 's/./\
 /79' $ldiffile > /test.ldif
mv /test.ldif $ldiffile
x=$x-79

done

done < $ldiffile

I just realized this script replaces the 79th character with the newline and space. From what I've been reading, I can add an ampersand before the escaped newline in the sed replacement. However, when I put an ampersand there, it ruins the LDIF file, cutting lines and inserting groups of blank lines. I've searched a million forums, and most suggest using escaped parentheses to remember a pattern and then \1 to recall it, with the newline after that. It doesn't work for me. Whichever way I try to recall the 79th character in the replacement string and add to it, I get this crazy blank-line effect on my file. I am on OS X 10.4.11 Server. Frustrating! How do I make the newline come after the 79th character instead of replacing it?

Thanks again for any help you can offer!
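
For comparison, the "remember the match and put it back" idea looks like this with Python's `re` module. It is only a sketch of the technique and says nothing about why the sed attempts misbehave on OS X; `break_after` is a hypothetical helper name:

```python
import re

def break_after(line, width=79):
    """Insert newline + space after every `width` characters, keeping them all.
    The group captures the characters and \\1 writes them back, so nothing
    is replaced. The lookahead (?=.) skips the insert at the very end."""
    pattern = "(.{%d})(?=.)" % width
    return re.sub(pattern, r"\1\n ", line)
```

Note that this keeps `width` characters on every physical line and then adds the space, so continuation lines end up one column wider than the first line.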

Last edited by rowie718; 05-14-2009 at 07:53 PM..
# 3  
Old 05-14-2009
Quote:
Originally Posted by rowie718
I came up with this script. It seems hackish and very inefficient, but it works. I would love for someone to help me come up with a better way since this script takes almost 10 full minutes to parse a text file into less than 7000 lines.

For a file that size, you should use awk.
Quote:
Code:
#!/bin/ksh

echo "where is the ldif file located that you would like to parse?"
read response
ldiffile=$response


Why not simply:

Code:
read ldiffile

Quote:
Code:
while read line
do

x=`echo $line | wc -c`


You don't need an external command to get the length of a variable's contents:

Code:
x=${#line}

Quote:
Code:
while [ $x -gt 79 ]
do

sed 's/./\
 /79' $ldiffile > /test.ldif
mv /test.ldif $ldiffile
x=$x-79

done

done < $ldiffile


Code:
awk 'length > 79 { while ( length($0) > 79 ) {
    printf "%s\n ", substr($0,1,79)
    $0 = substr($0,80)
  }
  if (length) print
  next
}
{print}' "$FILE"

# 4  
Old 05-15-2009
Thank you so much for your help cfajohnson!

I put together your suggestions and tested them. It's almost there, but there were two problems. The first I fixed fairly easily. The 79th character is the newline in the original LDIF, so I should've asked for 78 characters. I subtracted 1 wherever I saw 79 or 80 in your awk command, and that seemed to do the trick. The second problem is trickier. Take a 240-character line as an example. When the awk command breaks it and adds the space at the start of the second chunk, it does not account for the fact that the last character of that second chunk should land at the same ending position as the first chunk. As currently written, every chunk after the first break ends 1 character further to the right because of the space.

Example:
Code:
123456789012345678901234567890.....(240 character long string repeating)

currently breaks into :
123456789012345678901234567890123456789012345678901234567890123456789012345678
 901234567890123456789012345678901234567890123456789012345678901234567890123456
 789012345678901234567890123456789012345678901234567890123456789012345678901234
 567890

but should actually end up more like this, so that every line has 78 characters, 
plus newline (including the space we've added):

123456789012345678901234567890123456789012345678901234567890123456789012345678
 90123456789012345678901234567890123456789012345678901234567890123456789012345
 67890123456789012345678901234567890123456789012345678901234567890123456789012
 34567890
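
The target layout in the example above (78 characters on the first line, then a space plus 77 characters on each continuation line) can be sketched in Python as follows. The name `fold` and the default width are assumptions taken from the numbers in this post, not part of any LDIF tool:

```python
def fold(line, width=78):
    """Fold one logical line so that no physical line exceeds `width`
    characters, counting the leading space of continuation lines."""
    if len(line) <= width:
        return line
    pieces = [line[:width]]
    step = width - 1  # continuation lines give up one column to the space
    for start in range(width, len(line), step):
        pieces.append(" " + line[start:start + step])
    return "\n".join(pieces)

# The 240-character test string from the example above.
sample = "1234567890" * 24
for physical_line in fold(sample).split("\n"):
    print(len(physical_line), physical_line)
```

The only difference from a plain "cut every 78 characters" loop is that the step size drops by one after the first piece.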

The script currently looks like this:

Code:
#!/bin/ksh

echo "where is the ldif file located that you would like to parse?"
read ldiffile

awk 'length > 78 { while ( length($0) > 78 ) {
    printf "%s\n ", substr($0,1,78)
    $0 = substr($0,79)
  }
  if (length) print
  next
}
{print}' $ldiffile > /out.txt

Thanks again for your help, I really appreciate it.
# 5  
Old 05-15-2009
If you have Python, here's an alternative solution:
Code:
import textwrap
t=textwrap.TextWrapper(subsequent_indent=" ",width=78)
for line in open("file"):
    for i in t.wrap(line):
        print(i)

output
Code:
# more file
1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
01234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890

# ./test.py
123456789012345678901234567890123456789012345678901234567890123456789012345678
 90123456789012345678901234567890123456789012345678901234567890123456789012345
 67890123456789012345678901234567890123456789012345678901234567890123456789012
 34567890

# 6  
Old 05-15-2009
Thank you ghostdog,

First let me state that I am totally unfamiliar with Python. However, if it solves this problem for me, I would be glad to learn a bit and use it. There are a few issues I noticed upon trying the code you provided, ranked in order of importance:

1) The code seems to eliminate blank lines from the source text. I need it to not do that. Example:

1111

2222

becomes

1111
2222

2) I don't know how to output to a file rather than the standard output. I apologize for the rookie question here.

3) Ideally I would like for there to be a way to interactively input the location of the file so it doesn't need to be hardcoded. If this is too much to ask though, I can live without it.

Generally I would prefer to use sed/awk since I have some familiarity with them and bash scripting, however I will use whatever solutions are presented that fully solve this problem. I really appreciate the assistance.

Cheers.
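
The three points above (keep blank lines, write to a file, no hardcoded path) could be handled roughly like this in Python. This is a sketch under assumptions, not a tested LDIF tool: it slices strings directly instead of using `textwrap`, and the names `fold`, `fold_lines`, and `fold_file` are made up for illustration:

```python
def fold(line, width=78):
    """Fold one line: first piece `width` chars, later pieces one less
    because each continuation line starts with a space."""
    if len(line) <= width:
        return line
    pieces = [line[:width]]
    step = width - 1
    for start in range(width, len(line), step):
        pieces.append(" " + line[start:start + step])
    return "\n".join(pieces)

def fold_lines(lines, width=78):
    """Yield each input line folded and newline-terminated.
    Blank lines come out as blank lines."""
    for raw in lines:
        yield fold(raw.rstrip("\r\n"), width) + "\n"

def fold_file(src_path, dst_path, width=78):
    """Read src_path, fold every line, and write the result to dst_path."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(fold_lines(src, width))
```

To ask for the location interactively, the call could be `fold_file(input("LDIF file: "), "out.ldif")` on Python 3 (`raw_input` instead of `input` on Python 2); the output name `out.ldif` is just a placeholder.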
# 7  
Old 05-15-2009

Code:
awk 'length > 79 {
    n=1
    while ( length($0) > 78 + n ) {
    printf "%s\n ", substr($0,1,78 + n)
    $0 = substr($0,79 + n)
    n=0
  }
  if (length) print
  next
}
{print}' "$FILE"
