Breaking long lines into (characters, newline, space) groups
Hello,
I am currently trying to edit an LDIF file. The LDIF specification states that a newline followed by a space indicates that the next line is a continuation of the current one. So, in order to search, replace, and edit the file properly, I open it in TextWrangler, search for "\r " and remove it, which turns all continued lines into single lines. That's the first step. I make my changes to the LDIF file at that point.
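The unfolding step described above can also be scripted. The following is only a minimal sketch in Python 3, not something from the original post; the function name `unfold_ldif` and the sample record are made up for illustration.

```python
import re


def unfold_ldif(text):
    """Join continuation lines: drop every line break that is followed by one space."""
    # \r?\n also covers files saved with CRLF line endings.
    return re.sub(r"\r?\n ", "", text)


# Hypothetical sample record, folded across three physical lines.
folded = "dn: cn=Example,\n dc=example,\n dc=com\ncn: Example\n"
print(unfold_ldif(folded))
```

Only the break and the single space after it are removed, so any further leading spaces on a continuation line stay part of the value.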
Now, after editing, I want to break any line longer than 79 characters (some are hundreds of characters long) into this: 79 characters, newline, space, next 79 characters, newline, space, next 79 characters, newline, space, and so on.
The simple sed command I am using works for the first 79 characters of line x and breaks it properly, but then moves on to the next line in the LDIF, leaving line x broken into: 79 characters, newline, space, the remaining chunk of line x (still hundreds of characters long), then the next line in the LDIF. Only partial success!
So here's the question. Is there a way to make sed run this command at every 79th character until the end of the line? If not, should I use a loop in the script with some sort of conditional, such as: if there are lines longer than 79 characters, rerun the sed command (so that it breaks the remaining hundreds of characters that the first sed run left alone), and keep looping until all lines are broken into (79 characters, newline, space) chunks? How could I set up that condition? I don't know how to search for lines longer than x characters.
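For comparison, the two things asked about here (breaking a line at every 79th character, and finding lines longer than x characters) can be sketched in a few lines of Python 3. This is an illustration under assumptions, not the sed answer requested; the names `fold_line` and `long_lines` are hypothetical.

```python
def fold_line(line, width=79):
    """Split one line into width-character chunks joined by newline + space."""
    chunks = [line[i:i + width] for i in range(0, len(line), width)]
    # An empty line yields no chunks, so return it unchanged.
    return "\n ".join(chunks) if chunks else line


def long_lines(lines, width=79):
    """Return the 1-based numbers of the lines longer than width characters."""
    return [n for n, line in enumerate(lines, start=1) if len(line) > width]


print(fold_line("x" * 10, width=4))
print(long_lines(["a" * 80, "b" * 79, "c" * 100]))
```

Because the slicing walks the whole line in one pass, no rerun loop is needed: every chunk after the first is produced in the same call.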
I came up with this script. It seems hackish and very inefficient, but it works. I would love for someone to help me come up with a better way, since this script takes almost 10 full minutes to parse a text file into fewer than 7000 lines.
I just realized this script is substituting the 79th character with the newline and space. From what I've been reading, I can add an ampersand before the newline escape in the sed replacement pattern. However, when I put an ampersand there, it ruins the LDIF file, cutting lines and inserting groups of blank lines. I've searched all through a million forums, which mostly suggest using escaped parentheses to remember a pattern and then \1 to recall it, with the newline after that. It doesn't work for me. Whichever way I try to recall the 79th character in the replacement string and add to it, I get this crazy blank-line effect on my file. I am on OS X 10.4.11 Server. Frustrating! How do I make the newline come after the 79th character rather than replace it?
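The "remember the match and put it back" idea can be shown outside sed. Below is a hedged sketch using Python 3's `re` module rather than sed itself: the parenthesised group captures the 79 characters, and `\1` writes them back before the newline and space, so nothing is replaced. The function name `fold_with_regex` is made up for this example.

```python
import re


def fold_with_regex(line, width=79):
    """Insert newline + space after every width characters, keeping all of them."""
    # (.{N}) captures the chunk; (?=.) only matches when more text follows,
    # so no stray newline + space is appended at the very end of the line.
    pattern = r"(.{%d})(?=.)" % width
    return re.sub(pattern, r"\1\n ", line)


print(fold_with_regex("abcdefgh", width=3))
```

`re.sub` replaces every non-overlapping match on the line, which is the behaviour wanted from a global substitution.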
For a file that size, you should use awk.
Why not simply:
You don't need an external command to get the length of a variable's contents:
I put together your suggestions and tested them. It's almost there, but there were two problems. The first I fixed fairly easily. The 79th character is the newline in the original LDIF, so I should've expressed it as wanting 78 characters. I deducted 1 wherever I saw 79 or 80 in your awk command, and that seemed to do the trick. The second problem is trickier. Take a 240-character line as an example. When the awk command breaks it and adds the space at the start of the second chunk, it does not take into account that the last character of that second chunk should end at the same position as the first chunk. As currently written, every chunk after the first break ends 1 character further to the right because of the space.
Example:
Thanks again for your help, I really appreciate it.
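One way to get the alignment described above is to give the first chunk the full width and every later chunk one character less, so the leading space fits inside the same column limit. The following Python 3 sketch illustrates that arithmetic only; it is not the awk command from this thread, and `fold_aligned` is a hypothetical name.

```python
def fold_aligned(line, width=78):
    """Fold so every physical line, leading space included, is at most width columns."""
    if width < 2:
        raise ValueError("width must be at least 2")
    if len(line) <= width:
        return line
    parts = [line[:width]]
    step = width - 1  # continuation lines give one column to the leading space
    for i in range(width, len(line), step):
        parts.append(line[i:i + step])
    return "\n ".join(parts)


# The 240-character case: print the width of each resulting physical line.
print([len(p) for p in fold_aligned("a" * 240).split("\n")])
```

With a width of 78, the 240 characters split into chunks of 78, 77, 77 and 8, and each continuation line gains one column from its leading space.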
First let me state that I am totally unfamiliar with Python. However, if it solves this problem for me, I would be glad to learn a bit and use it. There are a few issues I noticed upon trying the code you provided, ranked in order of importance:
1) The code seems to eliminate blank lines from the source text. I need it to not do that. Example:
1111

2222
becomes
1111
2222
2) I don't know how to output to a file rather than to standard output. I apologize for the rookie question here.
3) Ideally, I would like a way to enter the location of the file interactively so it doesn't need to be hardcoded. If this is too much to ask, though, I can live without it.
Generally I would prefer to use sed/awk since I have some familiarity with them and bash scripting, however I will use whatever solutions are presented that fully solve this problem. I really appreciate the assistance.
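The three points above (keeping blank lines, writing to a file, asking for the path) could be handled along these lines. This is a sketch that assumes Python 3 and a UTF-8 file; it is not the code that was posted earlier in the thread, and the function names and the `.folded` output suffix are made up for illustration.

```python
def fold_line(line, width=78):
    """Fold one line; continuation chunks are one shorter to leave room for the space."""
    if len(line) <= width:
        return line
    parts = [line[:width]]
    for i in range(width, len(line), width - 1):
        parts.append(line[i:i + width - 1])
    return "\n ".join(parts)


def fold_text(text, width=78):
    # split("\n") keeps an empty string for each blank line, so blank lines survive.
    return "\n".join(fold_line(line, width) for line in text.split("\n"))


def fold_file(src_path, dst_path, width=78):
    """Read src_path, fold its long lines, and write the result to dst_path."""
    with open(src_path, "r", encoding="utf-8") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(fold_text(text, width))


def main():
    # Ask for the location instead of hardcoding it.
    src_path = input("Path of the LDIF file to fold: ").strip()
    fold_file(src_path, src_path + ".folded")
```

Calling `main()` at the bottom of the script runs it interactively. The result goes to a new file next to the original, so the source file is left untouched.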