Breaking long lines into (characters, newline, space) groups


 
# 8  
Old 05-15-2009
Awesome! That works perfectly, cfajohnson... thanks a bunch for your help with this. Awk is clearly the way to go; it parses the file in a split second.

For anyone who comes across this, and happens to be dealing with ldif files, here is the final script for parsing the file:

Code:
#!/bin/ksh

echo "Where is the ldif file located that you would like to parse?"
read source_ldif
echo "Where would you like to output the parsed version to?"
read out_ldif

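awk 'length > 78 {
    n = 1    # the first piece gets one extra character: it has no leading space
    while ( length($0) > 77 + n ) {
        printf "%s\n ", substr($0, 1, 77 + n)    # piece, newline, continuation space
        $0 = substr($0, 78 + n)                  # keep only the unprinted remainder
        n = 0
    }
    if (length) print    # print whatever is left of the line
    next
}
{print}' "$source_ldif" > "$out_ldif"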

# 9  
Old 05-15-2009

This version accepts an arbitrary line length:

Code:
## adjust length to taste
## or prompt for value
## or get value from environment
## or on the command line
## or wherever
length=66

awk -v x="${length:-79}" 'length > x {
    n = 1
    while ( length($0) > x - 1 + n ) {
        printf "%s\n ", substr($0, 1, x - 1 + n)
        $0 = substr($0, x + n)
        n = 0
    }
    if (length($0)) print
    next
}
{print}'
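
As a quick check of the variable-length version, here is a minimal sketch that wraps the same awk program in a shell function and feeds it one 26-character line. The function name fold_line is made up for this illustration; the awk body is the one posted above.

```shell
# fold_line is a hypothetical wrapper name; the awk body is the posted program.
# Usage: some_command | fold_line [max_length]
fold_line() {
  awk -v x="${1:-79}" 'length > x {
    n = 1
    while ( length($0) > x - 1 + n ) {
      printf "%s\n ", substr($0, 1, x - 1 + n)
      $0 = substr($0, x + n)
      n = 0
    }
    if (length($0)) print
    next
  }
  { print }'
}

# One 26-character line folded at 13 columns
printf '%s\n' abcdefghijklmnopqrstuvwxyz | fold_line 13
```

The first piece is 13 characters long; each continuation is one space plus 12 characters, so no output line is wider than 13 columns.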


Last edited by cfajohnson; 05-15-2009 at 11:54 PM..
# 10  
Old 05-16-2009
I studied up a bit on awk, and generally I understand the script. There is one part I don't get, though; maybe you could explain:

Code:
if (length ($0)) print
next

What does this do, and why is it necessary? I get what the words mean, but I don't understand its purpose in the script. I tested various source text files with lines of all lengths, but no matter what, if I remove this code, the script still seems to work perfectly.

In the variable-length script, I didn't understand this either:

Code:
x=${length:-79}



What does the colon minus 79 mean? If you want to allow the user to set the length, could you "read x" and then set the variable as x=x-2? If you're setting the length to something else, why does the number 79 enter this version of the script?

Thanks for any explanation; I'd like to understand awk better in general. I will be studying up on it.

# 11  
Old 05-16-2009
Quote:
Originally Posted by rowie718
I studied up a bit on awk, and generally I understand the script. There is one part I don't get, though; maybe you could explain:

Code:
if (length ($0)) print
next

What does this do, and why is it necessary? I get what the words mean, but I don't understand its purpose in the script. I tested various source text files with lines of all lengths, but no matter what, if I remove this code, the script still seems to work perfectly.

It doesn't work perfectly if the line is missing. Given this text file:

Code:
abcdefghijklmnopqrstuvwxyz
ABCDEFGHIJKLMNOPQRSTUVWXY
abcdefghijklmnopqrstuvwxyz

If you call the script with a length of 13, this is the result:

Code:
abcdefghijklm
 nopqrstuvwxy
 z
ABCDEFGHIJKLM
 NOPQRSTUVWXY
abcdefghijklm
 nopqrstuvwxy
 z

If you remove that line, the result is:

Code:
abcdefghijklm
 nopqrstuvwxy
 ABCDEFGHIJKLM
 abcdefghijklm
 nopqrstuvwxy

It prints whatever is left after all the full-length pieces have been removed.
Quote:
In the variable-length script, I didn't understand this either:

Code:
x=${length:-79}

What does the colon minus 79 mean?

That's shell parameter expansion, not part of awk. If length is unset or empty, the shell substitutes 79.
Quote:
If you want to allow the user to set the length, could you "read x" and then set the variable as x=x-2? If you're setting the length to something else, why does the number 79 enter this version of the script?

It's a default value.
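
To make the default-value behaviour concrete, here is a small sketch of the three cases for ${length:-79}. The helper name show_default is made up for the illustration; the expansion itself is standard shell syntax.

```shell
# show_default is a hypothetical helper that prints what ${length:-79} expands to
show_default() {
  printf '%s\n' "${length:-79}"
}

unset length
show_default    # length is unset            -> prints 79
length=
show_default    # length is set but empty    -> prints 79 (that is what the colon adds)
length=66
show_default    # length has a value         -> prints 66
```

The expansion does not assign anything to length; it only supplies 79 at the point of use. This is also why simply pressing Enter at a "read length" prompt still gives a usable width.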
# 12  
Old 05-17-2009
Thanks for the explanation. I'm learning, but strangely, it seems that our systems are treating this code differently.

This script (without the if and next statements towards the end):

Code:
#!/bin/ksh

echo "How many characters per line?"
read length
echo "Where is the ldif file located that you would like to parse?"
read source_ldif
echo "Where would you like to output the parsed version to?"
read out_ldif

awk -v x="${length:-79}" 'length > x {
    n = 1
    while ( length($0) > x - 1 + n ) {
        printf "%s\n ", substr($0, 1, x - 1 + n)
        $0 = substr($0, x + n)
        n = 0
    }
}
{print}' "$source_ldif" > "$out_ldif"

applied to your source text:
Code:
abcdefghijklmnopqrstuvwxyz
ABCDEFGHIJKLMNOPQRSTUVWXY
abcdefghijklmnopqrstuvwxyz

with a length of 13, results in properly parsed text:
Code:
abcdefghijklm
 nopqrstuvwxy
 z
ABCDEFGHIJKLM
 NOPQRSTUVWXY
abcdefghijklm
 nopqrstuvwxy
 z

I tried to figure out which version of awk I have, but apparently it is not easy to do so. I am running OS X client 10.4.11. On Apple's open source distribution page, they list "awk-7" as an available download. I wonder how I can find out which version I am using, and whether a different version of awk could account for the difference in output.
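
The difference probably does not depend on the awk version. Here is a sketch, assuming the truncated output in post #11 came from deleting only the "if (length($0)) print" line while leaving "next" in place: with "next" still there, the final {print} rule is skipped for long lines, so the leftover piece is never printed. Removing both lines, as in the script above, lets the leftover fall through to {print}. The two function names are made up for the comparison.

```shell
# Hypothetical names; both bodies are the posted awk program with lines removed.

# Variant A: "print" removed, "next" kept -> the remainder of a long line is lost
fold_keep_next() {
  awk -v x="${1:-79}" 'length > x {
    n = 1
    while ( length($0) > x - 1 + n ) {
      printf "%s\n ", substr($0, 1, x - 1 + n)
      $0 = substr($0, x + n)
      n = 0
    }
    next
  }
  { print }'
}

# Variant B: both lines removed -> the remainder falls through to { print }
fold_fall_through() {
  awk -v x="${1:-79}" 'length > x {
    n = 1
    while ( length($0) > x - 1 + n ) {
      printf "%s\n ", substr($0, 1, x - 1 + n)
      $0 = substr($0, x + n)
      n = 0
    }
  }
  { print }'
}

# Variant A leaves a dangling " " with no newline, so echo finishes that line
printf '%s\n' abcdefghijklmnopqrstuvwxyz | fold_keep_next 13; echo
printf '%s\n' abcdefghijklmnopqrstuvwxyz | fold_fall_through 13
```

Variant A drops the trailing "z", matching the shorter output quoted earlier; variant B prints it, matching the output above.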
# 13  
Old 05-17-2009

You're right; it is unnecessary; this works as is:

Code:
awk -v x="${1:-79}" 'length > x {
    n = 1
    while ( length($0) > x - 1 + n ) {
        printf "%s\n ", substr($0, 1, x - 1 + n)
        $0 = substr($0, x + n)
        n = 0
    }
}
{print}' "$file"

The next exercise is to modify it so that the amount of indent can be specified.
# 14  
Old 05-17-2009
Quote:
Originally Posted by cfajohnson

The next exercise is to modify it so that the amount of indent can be specified.
Code:
if [ $# -eq 0 ]
then
  echo "USAGE: ${0##*/} FILE [width [ indent ]]"
  exit 1
fi

file=$1

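awk -v width="${2:-79}" -v indent="${3:-1}" '
length > width {
    n = width
    while ( length($0) > n ) {
        printf "%s\n%" indent "s", substr($0, 1, n), " "
        $0 = substr($0, n + 1)    # start after the n characters just printed
        n = width - indent        # continuation pieces leave room for the indent
    }
}
{print}' "$file"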
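
The printf line in this script builds its format string by concatenation: with indent=4 the format becomes "%s\n%4s", and printing a single space right-justified in a 4-wide field yields four spaces. A small sketch of just that piece, for an indent of 1 or more (pad is a made-up helper name):

```shell
# pad is a hypothetical helper: prints N spaces followed by the text, using
# the same concatenated format ("%" indent "s") as the script above
pad() {
  awk -v indent="$1" -v text="$2" 'BEGIN { printf "%" indent "s%s\n", " ", text }'
}

pad 4 continuation    # prints "    continuation" (four leading spaces)
```

Because the width is spliced into the format string, no loop is needed to generate the indent.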


Last edited by cfajohnson; 05-17-2009 at 11:17 AM..