Join Lines every paragraph in a file.txt


 
# 1  
Old 10-09-2016

Hi all,

Is there a way to automatically join each paragraph of a file into a single line? The files come from OCR'd documents, and the OCR output splits every paragraph across several lines. I need each paragraph joined into one line.

Code:
#cat file.txt

Code:
The Commission on Higher Education (CHED) was created through Republic Act 7722
otherwise known as the Higher Education Act of 1994. Consistent with the government’s
position of making education the central strategy for poverty reduction, the CHED shall
pursue the following objectives: (a) relevant and quality higher education within an
international milieu; (b) accessible and affordable higher education programs; (c) resolute
academic freedom to promote intellectual growth, learning, research and development to
produce high quality leaders and professionals; and (d) moral ascendancy for better
governance within its ranks and the entire higher education system.

Higher Education Budget.  The total 2015 proposed budget for Social Services will
amount to P938.8 billion of which P450.2 billion (48.0%) is for Education, Culture and
Manpower Development.  The proposed 2015 budget of CHED at P3.6 billion accounts for
0.4% of Social Services budget.  On the other hand, the 2015 proposed budget for State
Universities and Colleges (SUCs) amounting to P43.3 billion is roughly 4.6% of total Social
Services budget (Table 1).

On the aggregate, higher education budget for 2015 shall increase by 1.8% from the 2014
level although this increase is much lower from the 2013-2014 increase of 14.3%.  While
CHED experienced a substantial increase of 165.2% in 2013-2014, its proposed budget for
2015 implies a 55.6% decline from 2014.  Meanwhile, SUCs’ proposal for 2015 shall expand
higher education institutions’ (HEIs) budget by 13.8%, a significant increase from 2013-2014
expansion of 2.1% (Table 1).

Result should be this:

Code:
The Commission on Higher Education (CHED) was created through Republic Act 7722 otherwise known as the Higher Education Act of 1994. Consistent with the government’s position of making education the central strategy for poverty reduction, the CHED shall pursue the following objectives: (a) relevant and quality higher education within an international milieu; (b) accessible and affordable higher education programs; (c) resolute academic freedom to promote intellectual growth, learning, research and development to produce high quality leaders and professionals; and (d) moral ascendancy for better governance within its ranks and the entire higher education system.

Higher Education Budget.  The total 2015 proposed budget for Social Services will amount to P938.8 billion of which P450.2 billion (48.0%) is for Education, Culture and Manpower Development.  The proposed 2015 budget of CHED at P3.6 billion accounts for 0.4% of Social Services budget.  On the other hand, the 2015 proposed budget for State Universities and Colleges (SUCs) amounting to P43.3 billion is roughly 4.6% of total Social Services budget (Table 1).

On the aggregate, higher education budget for 2015 shall increase by 1.8% from the 2014 level although this increase is much lower from the 2013-2014 increase of 14.3%.  While CHED experienced a substantial increase of 165.2% in 2013-2014, its proposed budget for 2015 implies a 55.6% decline from 2014.  Meanwhile, SUCs’ proposal for 2015 shall expand higher education institutions’ (HEIs) budget by 13.8%, a significant increase from 2013-2014 expansion of 2.1% (Table 1).

Thanks
# 2  
Old 10-09-2016
Hi,

Can you please try the following one?
Code:
sed ':a;N;$!ba;s/\n\n/XZX/g;s/\n/ /g;s/XZX/\n\n/g' file

I tested it on the given input and it gives your desired output.
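For readers unfamiliar with the idiom, here is a minimal annotated sketch of how that command works. It assumes GNU sed (a `;` after a label and `\n` in the replacement text are GNU extensions), and `demo.txt` and `demo.out` are hypothetical file names used only for illustration.

```shell
# Hypothetical two-paragraph sample, separated by one truly empty line.
printf 'a1\na2\n\nb1\nb2\n' > demo.txt

# :a     define label "a"
# N      append the next input line to the pattern space
# $!ba   if this is not the last line, branch back to "a"
#        (after the loop the whole file sits in the pattern space)
# s/\n\n/XZX/g   protect paragraph breaks with a placeholder
# s/\n/ /g       join the remaining line breaks with a space
# s/XZX/\n\n/g   restore the paragraph breaks
sed ':a;N;$!ba;s/\n\n/XZX/g;s/\n/ /g;s/XZX/\n\n/g' demo.txt > demo.out
cat demo.out
```

Because the whole file is held in memory at once, this suits small to medium files.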

If you think XZX might occur in the file contents, try this one instead:
Code:
sed -e '/./{H;$!d;}' -e 'x;s/^\n//;s/\n/ /g;s/$/\n/' file

In awk,
Code:
awk 'BEGIN{ ORS=RS="\n\n";} {gsub(/\n/," ")}1'  file


Last edited by greet_sed; 10-09-2016 at 07:35 AM.. Reason: add other solutions
# 3  
Old 10-09-2016
Hi,

This line works perfectly on your sample file:
Code:
awk -v RS='\n\n' '{$1=$1}1' file.txt  > ocr.tmp

# 4  
Old 10-09-2016
Thanks for the reply. I have sample files in a public Google Drive share.

When I apply the scripts to the sample text files from that share, the whole file content ends up on a single line.
# 5  
Old 10-09-2016
Yes. In the post #1 input, the paragraphs were separated by truly empty lines. In the files shared on Google Drive (post #4), those separator lines contain a space, so they are not empty and the commands did not treat them as paragraph breaks.

Here is what I tried, using part of the contents of 1181.txt:
Code:
sed -e '/[a-zA-Z0-9]/{H;$!d;}' -e 'x;s/\n//g;s/$/\n/' ~/Downloads/1181.txt > OUT

I got the desired output. Modify the command as per your need.
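A related sketch, assuming GNU sed: instead of matching on alphanumeric characters, strip trailing whitespace first so that space-only separator lines become truly empty, then join on empty lines. `demo_ws.txt` and `demo_ws.out` are hypothetical names, and the sample data is invented to mimic the space-only separators described above.

```shell
# Hypothetical sample: the separator line holds a single space and the
# first text line carries a trailing space.
printf 'a1 \na2\n \nb1\nb2\n' > demo_ws.txt

# Count whitespace-only lines; a non-zero count explains why plain
# blank-line matching fails on such a file.
grep -c '^[[:space:]][[:space:]]*$' demo_ws.txt

# 1st -e: strip trailing whitespace, which also empties space-only lines.
# 2nd -e: collect every non-empty line in the hold space.
# 3rd -e: at a separator (or the last line) fetch the collected lines,
#         drop the leading newline, join with single spaces, and append
#         a blank line after the paragraph.
sed -e 's/[[:space:]]*$//' \
    -e '/./{H;$!d;}' \
    -e 'x;s/^\n//;s/\n/ /g;s/$/\n/' demo_ws.txt > demo_ws.out
cat demo_ws.out
```

Joining with a single space is safe here only because trailing whitespace was removed beforehand.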
# 6  
Old 10-09-2016
Quote:
Originally Posted by blastit.fr
Hi,

This line works perfectly on your sample file:
Code:
awk -v RS='\n\n' '{$1=$1}1' file.txt  > ocr.tmp

Note: only GNU awk and mawk can use a regular expression for the RS variable. Other awks use only the first (leftmost) character of RS.

An alternative that should work in any modern POSIX awk and in nawk:
Code:
awk -v RS="" '{$1=$1}1' file.txt
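With the default output record separator, the paragraph-mode command prints one paragraph per line but does not keep the blank line between paragraphs that the requested output in post #1 shows. A minimal sketch of the difference follows. It assumes that setting ORS is acceptable for the use case, and the `demo_para.txt`, `joined.txt` and `joined_blank.txt` names are hypothetical.

```shell
# Hypothetical two-paragraph sample, separated by one empty line.
printf 'a1\na2\n\nb1\nb2\n' > demo_para.txt

# RS="" selects paragraph mode: records are separated by empty lines.
# $1=$1 rebuilds each record with single spaces between fields.
# Default ORS ("\n"): paragraphs end up on consecutive lines.
awk -v RS="" '{$1=$1}1' demo_para.txt > joined.txt

# ORS="\n\n": same join, but every paragraph is followed by a blank line.
awk -v RS="" -v ORS='\n\n' '{$1=$1}1' demo_para.txt > joined_blank.txt

cat joined.txt
cat joined_blank.txt
```

Note that paragraph mode only recognises truly empty lines as separators, so a file whose separator lines contain a space would still come out as one line.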

# 7  
Old 10-09-2016
Thanks, I will try this tomorrow.