sed random \n for "n" range of character occurrences


 
# 8  
Old 07-20-2016
I don't really know what I'm doing with that sed command; it's just showing my thought process as to what I want to achieve.

The input is a blob of text with periods (.) marking the ends of sentences. The output should have a paragraph break (\n\n) inserted after a random number of periods, say every 3 to 6, for example:
Code:
Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Nam nibh. Nunc  varius facilisis eros. Sed erat. 

In in velit quis arcu ornare laoreet.  Curabitur adipiscing luctus massa. Integer ut purus ac augue commodo  commodo. Nunc nec mi eu justo tempor consectetuer. Etiam vitae nisl. In  dignissim lacus ut ante. 

Cras elit lectus, bibendum a, adipiscing vitae,  commodo et, dui. Ut tincidunt tortor. Donec nonummy, enim in lacinia  pulvinar, velit tellus scelerisque augue, ac posuere libero urna eget  neque. 

Cras ipsum. Vestibulum pretium, lectus nec venenatis volutpat,  purus lectus ultrices risus, a condimentum risus mi et quam.  Pellentesque auctor fringilla neque. Duis eu massa ut lorem iaculis  vestibulum. Maecenas facilisis elit sed justo.

Moderator's Comments:
Please use CODE tags for all sample input, sample output, and code segments (as required by forum rules).

Last edited by Don Cragun; 07-21-2016 at 12:12 AM. Reason: Add CODE tags.
# 9  
Old 07-21-2016
Given that the input is one single line of text, try
Code:
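awk '
        {N = split ($0, T, ".")         # T[1..N] holds the text between periods
         CNT = 1                        # index of the first sentence of the next paragraph
         while (CNT <= N)       {RND = int(5*(1+rand()))        # random integer from 5 to 9
                                 for (i=CNT; i<CNT+RND && i<N; i++) printf "%s.", T[i]
                                 printf "\n\n"                  # blank line between paragraphs
                                 CNT += RND
                                }
        }
' file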
Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Nam nibh. Nunc  varius facilisis eros. Sed erat. In in velit quis arcu ornare laoreet.

  Curabitur adipiscing luctus massa. Integer ut purus ac augue commodo  commodo. Nunc nec mi eu justo tempor consectetuer. Etiam vitae nisl. In  dignissim lacus ut ante. Cras elit lectus, bibendum a, adipiscing vitae,  commodo et, dui.

 Ut tincidunt tortor. Donec nonummy, enim in lacinia  pulvinar, velit tellus scelerisque augue, ac posuere libero urna eget  neque. Cras ipsum. Vestibulum pretium, lectus nec venenatis volutpat,  purus lectus ultrices risus, a condimentum risus mi et quam.  Pellentesque auctor fringilla neque. Duis eu massa ut lorem iaculis  vestibulum. Maecenas facilisis elit sed justo.

# 10  
Old 07-21-2016
Thank you RudiC, that is awesome! I see what you did here
Code:
{RND = int(5*(1+rand()))

as rand() returns a value from 0 up to (but not including) 1, so RND comes out between 5 and 9 periods. I can adjust the values to create new ranges. T (or a) is the array filled in by split().
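To make the range adjustment explicit, here is a minimal sketch (not RudiC's original code) that takes the lower and upper bounds as parameters. The names para_break, LO and HI are made up for this illustration. It assumes the usual idiom LO + int((HI - LO + 1) * rand()) for an integer from LO to HI inclusive, and it calls srand() because many awk versions otherwise repeat the same random sequence on every run.

```shell
# Hypothetical helper: break single-line text after a random LO..HI sentences.
# para_break, LO and HI are illustrative names, not from the posts above.
para_break() {
    awk -v LO="${1:-3}" -v HI="${2:-6}" '
    BEGIN { srand() }                        # seed from the clock so runs differ
    {
        N = split($0, T, ".")                # T[1..N]: text between periods
        if (T[N] ~ /^[ \t]*$/) N--           # drop the empty piece after the final period
        CNT = 1
        while (CNT <= N) {
            RND = LO + int((HI - LO + 1) * rand())   # integer in LO..HI inclusive
            for (i = CNT; i < CNT + RND && i <= N; i++) printf "%s.", T[i]
            printf "\n\n"
            CNT += RND
        }
    }'
}

# Example: 12 short sentences, broken every 3 to 6 periods
printf '%s\n' 'A. B. C. D. E. F. G. H. I. J. K. L.' | para_break 3 6
```

The last paragraph may be shorter than LO, since it simply gets whatever sentences are left over.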
# 11  
Old 07-21-2016
Yes, split($0, T, ".") creates an array named T with each element of T[] containing the text between periods in the input line. If you'd like to get rid of leading whitespace characters at the start of each paragraph, you might want to consider this slight modification of RudiC's suggestion:
Code:
awk '
{	N = split ($0, T, ".")
	CNT = 1
	while (CNT <= N) {
		RND = int(5*(1+rand()))
		for (i=CNT; i<CNT+RND && i<N; i++) {
			if(i == CNT) sub(/^[[:space:]]*/, "", T[i])
			printf "%s.", T[i]
		}
		printf "\n\n"
		CNT += RND
	}
}' file

If you'd like to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk.

Note that on some versions of awk neither of these suggestions will work if the single-line input file is longer than 2048 bytes (or whatever the command:
Code:
getconf LINE_MAX

returns on your system if it isn't 2048).
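If the long input line is a problem, one possible workaround (a sketch only, assuming the limit on your awk applies to the length of each record) is to set RS to a period so that every sentence is read as its own short record. The names para_break_rs, LO, HI, LEFT and NOTFIRST are invented for this example, and it is not guaranteed to help on every awk.

```shell
# Sketch: read one sentence per record by using "." as the record separator.
# para_break_rs, LO, HI, LEFT and NOTFIRST are illustrative names only.
para_break_rs() {
    awk -v LO="${1:-5}" -v HI="${2:-9}" '
    BEGIN { RS = "."; srand(); LEFT = 0 }
    {
        gsub(/\n/, " ")                      # fold embedded newlines into spaces
        if ($0 ~ /^[ \t]*$/) next            # skip the empty tail after the last period
        if (LEFT == 0) {                     # start of a new paragraph
            if (NOTFIRST) printf "\n\n"
            LEFT = LO + int((HI - LO + 1) * rand())   # sentences in this paragraph
            sub(/^[ \t]+/, "")               # trim leading blanks at paragraph start
            NOTFIRST = 1
        }
        printf "%s.", $0                     # put back the period that RS consumed
        LEFT--
    }
    END { if (NOTFIRST) printf "\n" }'
}

printf '%s\n' 'A. B. C. D. E. F. G. H. I. J. K. L.' | para_break_rs 3 6
```

Because the text is never held in one array, this version also avoids building T[] for the whole input; the trade-off is that a period inside an abbreviation or number is still treated as a sentence end, just as with split().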
# 12  
Old 07-22-2016
Thanks very much Don. That's perfect; replaces this step I was doing:
Code:
awk '{$1=$1}1' in > left-trim
