from one word for line to plain text


 
# 1  
Old 12-17-2010
Hello!
I've got a very big file (the output of tokenization) which has one word per line.
How can I rebuild the "original" text from it, knowing that <s> and </s> are the sentence delimiters?

My file looks like this:
Code:
<s>
&&
tanzania
na
Afrika
kwa
ujumla
ambiwa
na
taifa
kubwa
tajiri
zinduka
na
piga
mwendo
hima
saka
maendeleo
.
</s>
<s>
agizwa
na
fundishwa
fuata
wayo
za
nchi
endelea
wezesha
fikia
hapo
zili
;
</s>
<s>
nayo
itika
wito
huo
.
</s>
<s>
itika
kwa
sauti
kubwa
na
nidhamu
pya
kiasi
kwamba
wakati
moja
rais
wa
serikali
ya
awamu
ya
tatu
,
mheshimiwa
Benjamin
William
pa
,
tunukiwa
na
nchi
hizo
heshima
ya
wa
mwenyekiti
mwenza
wa
tume
ya
utandawazi
pamoja
na
waziri
kuu
wa
nchi
tajiri
ya
Finland
,
bibi
tarja
halonen
.
</s>
<s>
ingi
ona
kwamba
undwa
kwa
tume
hiyo
ni
moja
ya
mbinu
za
ingiza
nchi
(
maskini
)
za
dunia
ya
tatu
katika
mfumo
wa
ubepari
wa
taifa
,
kwa
kauli
mbiu
ya
*
ubia
katika
maendeleo
*
.
</s>
... ... ...

Thanks a lot for any help!
Mjomba

Moderator's Comments:
Please use code tags. You got a PM with a guide on how to do that, thanks.

Last edited by zaxxon; 12-17-2010 at 07:24 AM.. Reason: code tags
# 2  
Old 12-17-2010
Something like this?
Code:
awk '/<s>/{next} /<\/s>/{$0="\n"}1' ORS=" " file
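
To see what that one-liner emits, here is a minimal sketch run on a made-up two-sentence sample built from the tokens in post #1 (the sample and the variable name out are assumptions for illustration, and "-" stands in for the file name so awk reads standard input). Because ORS is a space, the newline that replaces each </s> line is itself followed by a space, so every sentence after the first should start with a blank:

```shell
# Made-up sample in the format of post #1; "-" makes awk read it from stdin.
out=$(printf '%s\n' '<s>' nayo itika . '</s>' '<s>' agizwa ';' '</s>' |
      awk '/<s>/{next} /<\/s>/{$0="\n"}1' ORS=" " -)

# Show each output line between brackets so the blanks are visible.
printf '%s\n' "$out" | sed 's/.*/[&]/'
```

On this sample the second line comes out as `[ agizwa ; ]`: a leading blank, plus a space before the punctuation mark.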

# 3  
Old 12-17-2010
Thank you very much, Franklin52!
It does exactly what I need.
Just one small modification, if it's possible:
is there a way to get rid of the blank character that seems to be inserted at the beginning of each line?
Thanks again!
mjomba
# 4  
Old 12-17-2010
Try this:
Code:
awk '/<s>/{next} /<\/s>/{print s;s="";next} {s=length==1?s $0:s?s FS $0:$0}' file

# 5  
Old 12-17-2010
Can you explain this line of code for me?
# 6  
Old 12-17-2010
Quote:
Originally Posted by codecaine
Can you explain this line of code for me?
Code:
awk '/<s>/{next} /<\/s>/{print s;s="";next} {s=length==1?s $0:s?s FS $0:$0}' file

Explanation:
Code:
/<s>/{next}                   Skip the line containing <s>
/<\/s>/{print s;s="";next}    If the line matches </s>, print the variable s, empty it, and go on to the next line
s=length==1?s $0              If the line is one character long (e.g. a dot), append it to s without a space
:s?s FS $0:$0                 Otherwise, if s isn't empty, append a space (FS) and the current line to s; if s is empty, set s to the current line
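
The same logic can be sketched with the nested ternary written out as if/else, which may be easier to follow. This is only an illustration under stated assumptions: the two-sentence sample is made up from the tokens in post #1, the variable names sample and result are hypothetical, and the test s != "" stands in for the original truthiness test on s (they behave the same on this sample).

```shell
# Made-up sample: two short sentences in the one-token-per-line format of post #1.
sample=$(printf '%s\n' '<s>' nayo itika wito huo . '</s>' \
                       '<s>' agizwa na fundishwa ';' '</s>')

# Same logic as the one-liner, with the nested ternary spelled out.
result=$(printf '%s\n' "$sample" | awk '
/<s>/    { next }                      # sentence start: nothing to collect
/<\/s>/  { print s; s = ""; next }     # sentence end: emit the line, reset s
{
    if (length($0) == 1) s = s $0      # one-character token: no space before it
    else if (s != "")    s = s FS $0   # later word: join with a space (FS)
    else                 s = $0        # first word of the sentence
}')

printf '%s\n' "$result"
```

Note that the length test treats every one-character token the same way, so an opening "(" or a "*" is also glued to the word before it, not just closing punctuation.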

# 7  
Old 12-17-2010
Thank you, I appreciate it!