How to combine awk commands?

# 1  
02-20-2013

I can accomplish two tasks with two different awk commands:
1)
Code:
awk -F";;WORD" '{print $2}' file | sed '/^$/d' #to find surface_word

2)
Code:
awk -F"bw:|gloss:" '// {print $2}'  file | sed '/\//!d; s:/[^+]*+*: + :g; s:^+::; s: *+ *$::;'  #to find segmentation of surface_word

Command 1) finds surface word number x; I then expect command 2) to find the multiple segmentations that come after surface word x and before surface word x+1.
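To make the division of labour concrete, here is a minimal sketch that runs only command 2 on a single analysis line copied from the example below. The function name segment is hypothetical, and the // pattern is dropped because an empty regex matches every line anyway.

```shell
#!/bin/sh
# Hypothetical helper wrapping command 2: split on "bw:" or "gloss:",
# keep the bw: token (field 2), then let sed turn "/TAG+" into " + ".
segment() {
    awk -F"bw:|gloss:" '{ print $2 }' |
    sed '/\//!d; s:/[^+]*+*: + :g; s:^+::; s: *+ *$::;'
}

# One analysis line copied from the example input.
line='*0.887822 diac:>a*AbatohA lex:>a*Ab_1 bw:+>a*Ab/PV+at/PVSUFF_SUBJ:3FS+hA/PVSUFF_DO:3FS gloss:dissolve;melt;exhaust;consume'

printf '%s\n' "$line" | segment
# prints: >a*Ab + at + hA
```

Lines without a bw: token (such as the ;;WORD lines) produce an empty field 2, which sed's /\//!d deletes, so command 2 on its own carries no record of which surface word each segmentation belongs to.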

Example input:
Code:
;;; SENTENCE A*AbthA
;;WORD A*AbthA
;;MADA: A*AbthA asp:p cas:na enc0:3fs_dobj gen:f mod:i num:s per:3 pos:verb prc0:0 prc1:0 prc2:0 prc3:0 stt:na vox:a
*0.887822 diac:>a*AbatohA lex:>a*Ab_1 bw:+>a*Ab/PV+at/PVSUFF_SUBJ:3FS+hA/PVSUFF_DO:3FS gloss:dissolve;melt;exhaust;consume 
_0.712209 diac:<i*AbatahA lex:<i*Abap_1 bw:+<i*Ab/NOUN+at/NSUFF_FEM_SG+a/CASE_DEF_ACC+hA/POSS_PRON_3FS gloss:dissolution 
_0.691945 diac:<i*AbatihA lex:<i*Abap_1 bw:+<i*Ab/NOUN+at/NSUFF_FEM_SG+i/CASE_DEF_GEN+hA/POSS_PRON_3FS gloss:dissolution 
_0.691778 diac:<i*AbatuhA lex:<i*Abap_1 bw:+<i*Ab/NOUN+at/NSUFF_FEM_SG+u/CASE_DEF_NOM+hA/POSS_PRON_3FS gloss:dissolution 
--------------
SENTENCE BREAK
--------------
;;; SENTENCE A$Abty
;;WORD A$Abty
;;MADA: A$Abty asp:na cas:u enc0:0 gen:f mod:na num:s per:na pos:noun prc0:0 prc1:0 prc2:0 prc3:0 stt:c vox:na
*0.862011 diac:>u$Abatayo lex:>u$Abap_1 bw:+>u$Ab/NOUN+atayo/NSUFF_FEM_DU_GEN_POSS gloss:alloy 
_0.862001 diac:>u$Abatayo lex:>u$Abap_1 bw:+>u$Ab/NOUN+atayo/NSUFF_FEM_DU_ACC_POSS gloss:alloy 
_0.855251 diac:>u$Abatiy lex:>u$Abap_1 bw:+>u$Ab/NOUN+at/NSUFF_FEM_SG+iy/POSS_PRON_1S gloss:alloy 
_0.776236 diac:>u$Abatay~a lex:>u$Abap_1 bw:+>u$Ab/NOUN+atayo/NSUFF_FEM_DU_GEN_POSS+ya/POSS_PRON_1S gloss:alloy 
_0.776235 diac:>u$Abatay~a lex:>u$Abap_1 bw:+>u$Ab/NOUN+atayo/NSUFF_FEM_DU_ACC_POSS+ya/POSS_PRON_1S gloss:alloy 
--------------

Sample desired output:
Code:
A*AbthA
>a*Ab + at + hA
<i*Ab + at + a + hA
<i*Ab + at + i + hA
<i*Ab + at + u + hA
A$Abty
>u$Ab + atayo
>u$Ab + atayo
>u$Ab + at + iy
>u$Ab + atayo + ya
>u$Ab + atayo + ya

It would also help if my code could be modified to put each word and its segmentations on one line (when relevant), using "_+" instead of " + " as the separator.
Better output:
Code:
A*AbthA >a*Ab_+at_+hA <i*Ab_+at_+a_+hA <i*Ab_+at_+i_+hA <i*Ab_+at_+u_+hA
A$Abty >u$Ab_+atayo >u$Ab_+atayo >u$Ab_+at_+iy >u$Ab_+atayo_+ya >u$Ab_+atayo_+ya


Last edited by Scrutinizer; 02-21-2013 at 03:11 AM. Reason: Additional code tags
# 2  
02-20-2013
For output 1:

Code:
awk '/;;WORD/ { print $2 }
/ lex:/ {
    sub(/.*:[+]/,"")
    gsub("/[^+]*[+]", " + ")
    sub("/[^+]*$","")
    print }' infile
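For anyone reading along, this sketch prints the record after each of the three substitutions in the output 1 script, using the second analysis line of the sample. The trace function and the "step N:" labels are only for illustration.

```shell
#!/bin/sh
# Print $0 after each substitution step of the "output 1" script.
trace() {
    awk '/ lex:/ {
        sub(/.*:[+]/, "")            # drop everything up to the last ":+" (here "bw:+")
        print "step 1:", $0
        gsub("/[^+]*[+]", " + ")     # each "/TAG+" becomes " + "
        print "step 2:", $0
        sub("/[^+]*$", "")           # drop the final "/TAG" and whatever follows it
        print "step 3:", $0
    }'
}

line='_0.712209 diac:<i*AbatahA lex:<i*Abap_1 bw:+<i*Ab/NOUN+at/NSUFF_FEM_SG+a/CASE_DEF_ACC+hA/POSS_PRON_3FS gloss:dissolution'
printf '%s\n' "$line" | trace
# prints:
# step 1: <i*Ab/NOUN+at/NSUFF_FEM_SG+a/CASE_DEF_ACC+hA/POSS_PRON_3FS gloss:dissolution
# step 2: <i*Ab + at + a + hA/POSS_PRON_3FS gloss:dissolution
# step 3: <i*Ab + at + a + hA
```

The second and third patterns are passed as strings, so awk treats them as dynamic regexes; inside a string the "/" needs no escaping.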

For output 2:

Code:
awk '/;;WORD/ {printf "%s%s", t++?"\n":"", $2 }
/ lex:/ {
    sub(/.*:[+]/,"")
    gsub("/[^+]*[+]", "_+")
    sub("/[^+]*$","")
    printf "%s", OFS $0 }
END { printf "\n" }' infile
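The output 2 script avoids a stray blank line at the top by printing the newline before each word except the first (t++ ? "\n" : ""), appending segmentations with printf so they stay on the same line, and closing the last line in END. A stripped-down sketch of just that pattern, on made-up input where "H" lines stand in for ;;WORD lines and "I" lines for analyses:

```shell
#!/bin/sh
# Made-up input: "H name" lines play the role of ";;WORD" lines,
# "I item" lines play the role of analysis lines.
join_records() {
    awk '/^H/ { printf "%s%s", t++ ? "\n" : "", $2 }   # newline before every header but the first
         /^I/ { printf "%s", OFS $2 }                  # append to the current line
         END  { printf "\n" }'                         # terminate the last line
}

printf 'H a\nI 1\nI 2\nH b\nI 3\n' | join_records
# prints:
# a 1 2
# b 3
```

With no header lines at all, END still prints one empty line; that is harmless here but worth knowing.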


Last edited by Chubler_XL; 02-20-2013 at 08:22 PM. Reason: Added code for output 1
# 3  
02-21-2013
What if my input file is much larger than this two-word sample?
It has about 2 million ";;WORD" entries.

Thanks!
# 4  
02-21-2013
I can't see that being a problem: output is generated as the input is read, so no limits should be exceeded.

It will take longer to run and the output file will be bigger.
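One way to check this without the real 2-million-entry file is to generate synthetic input and stream it through the output 2 script. The generator and its fake entries (word wN, segment sN) are hypothetical; only the awk program in output2 is Chubler_XL's.

```shell
#!/bin/sh
# Stream N synthetic entries through the "output 2" script and count the lines.
n=100000

gen() {    # hypothetical data: one ";;WORD" line and one analysis line per entry
    awk -v n="$n" 'BEGIN {
        for (i = 1; i <= n; i++) {
            print ";;WORD w" i
            print "*0.9 diac:x lex:x_1 bw:+s" i "/NOUN+at/NSUFF_FEM_SG gloss:g"
        }
    }'
}

output2() {    # Chubler_XL script from post #2, reading stdin instead of infile
    awk '/;;WORD/ {printf "%s%s", t++?"\n":"", $2 }
    / lex:/ {
        sub(/.*:[+]/,"")
        gsub("/[^+]*[+]", "_+")
        sub("/[^+]*$","")
        printf "%s", OFS $0 }
    END { printf "\n" }'
}

gen | output2 | awk 'END { print NR }'    # one output line per ;;WORD entry
```

Memory stays flat because the script only holds the current record and the counter t; run time grows roughly in proportion to the input size.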
# 5  
02-21-2013
If the pattern always appears at the left (start) of the line, anchor it with '^' to reduce scanning. You could also run grep up front and pipe its output into awk, so the work is divided between two processes.
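A sketch of both suggestions applied to the output 2 script. It assumes, as in the sample, that ";;WORD" always starts its line and that every analysis line starts with "*" or "_" followed by a digit; if the real file deviates from that layout, the filter silently drops lines. The function names are illustrative.

```shell
#!/bin/sh
# ASSUMED layout: ";;WORD" at the start of a line; analysis lines start
# with "*" or "_" and a digit (true for the sample in post #1).
prefilter() {      # grep runs as its own process and discards everything else
    grep -E '^(;;WORD|[*_][0-9])'
}

output2_anchored() {
    awk '/^;;WORD/ { printf "%s%s", t++ ? "\n" : "", $2; next }
         {   # only analysis lines reach this point after prefilter
             sub(/.*:[+]/, "")
             gsub("/[^+]*[+]", "_+")
             sub("/[^+]*$", "")
             printf "%s", OFS $0
         }
         END { printf "\n" }'
}

sample() {         # a few lines copied from the example in post #1
    cat <<'EOF'
;;; SENTENCE A*AbthA
;;WORD A*AbthA
;;MADA: A*AbthA asp:p cas:na enc0:3fs_dobj gen:f mod:i num:s per:3 pos:verb prc0:0 prc1:0 prc2:0 prc3:0 stt:na vox:a
*0.887822 diac:>a*AbatohA lex:>a*Ab_1 bw:+>a*Ab/PV+at/PVSUFF_SUBJ:3FS+hA/PVSUFF_DO:3FS gloss:dissolve;melt;exhaust;consume
_0.712209 diac:<i*AbatahA lex:<i*Abap_1 bw:+<i*Ab/NOUN+at/NSUFF_FEM_SG+a/CASE_DEF_ACC+hA/POSS_PRON_3FS gloss:dissolution
--------------
SENTENCE BREAK
--------------
;;WORD A$Abty
*0.862011 diac:>u$Abatayo lex:>u$Abap_1 bw:+>u$Ab/NOUN+atayo/NSUFF_FEM_DU_GEN_POSS gloss:alloy
EOF
}

sample | prefilter | output2_anchored
# prints:
# A*AbthA >a*Ab_+at_+hA <i*Ab_+at_+a_+hA
# A$Abty >u$Ab_+atayo
```

Whether this is actually faster depends on the file and the awk implementation; timing both variants on a real chunk of the data is the reliable way to tell.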
# 6  
02-23-2013
The only issue: when I ran a file with more than 10 ";;WORD" entries, I got the following output:
Code:
A*AbthA >a*Ab_+at_+hA <i*Ab_+at_+a_+hA <i*Ab_+at_+i_+hA <i*Ab_+at_+u_+hA
A$Abty >u$Ab_+atayo >u$Ab_+atayo >u$Ab_+at_+iy >u$Ab_+atayo_+ya >u$Ab_+atayo_+ya
A*AbwA >a*Ab_+uwA
$A$AbwyAs
A$Abyty
AAd
$A$Ad
A$Ad >a$Ad_+a _0.872887 diac:>u$Adu lex:>a$Ad_1 bw:>u_+$Ad_+u _0.851391 diac:>u$Ad~u lex:$Ad~_1 bw:>u_+$Ad~_+u _0.836867 diac:>u$Ada lex:>a$Ad_1 bw:>u_+$Ad_+a _0.815236 diac:>u$Ad~a lex:$Ad~_1 bw:>u_+$Ad~_+a _0.815182 diac:>u$Ad~a lex:$Ad~_1 bw:>u_+$Ad~_+a
A*Ad
A$AdA >a$Ad_+A
AADAfAt

As you can see, the fourth line from the end (the A$Ad line) still contains the rest of each analysis line (diac:, lex:, ...), while I only want the token after "bw:".
The first 3 lines of the output are precisely what I am looking for.
The lines with only one word are entries that have no "bw:" or "gloss:" analysis.
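The cause can be reproduced on a single line. The line below is hypothetical (TAG1 to TAG3 are placeholders, not real tags); it only imitates an analysis whose bw: token apparently does not begin with "+", as the "bw:>u_+$Ad_+u" output above suggests. The post #2 substitutions strip the prefix only up to a ":+" (in the sample lines the only ":+" is "bw:+"), so here nothing is removed; the post #7 substitutions anchor on "bw:" itself.

```shell
#!/bin/sh
# HYPOTHETICAL analysis line: its bw: token starts with ">u", not "+".
# TAG1..TAG3 are placeholders, not real tags from the file.
line='_0.9 diac:>u$Adu lex:>a$Ad_1 bw:>u/TAG1+$Ad/TAG2+u/TAG3 gloss:x'

post2_steps() {    # substitutions from post #2 (output 2)
    awk '{ sub(/.*:[+]/, ""); gsub("/[^+]*[+]", "_+"); sub("/[^+]*$", ""); print }'
}

post7_steps() {    # substitutions from post #7
    awk '{ gsub(/.*bw:| .*$/, ""); gsub(/\/[^+]*(\+|$)/, "_+"); gsub(/^\+|_\+ *$/, ""); print }'
}

# No ":+" in the line, so the prefix survives:
printf '%s\n' "$line" | post2_steps
# prints: _0.9 diac:>u$Adu lex:>a$Ad_1 bw:>u_+$Ad_+u

# Prefix and gloss are both removed:
printf '%s\n' "$line" | post7_steps
# prints: >u_+$Ad_+u
```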
# 7  
02-24-2013
If I understood your rather complex requirement correctly, this translation of your two awk commands from post #1 might do the job in one go, as requested:
Code:
awk     '/;;WORD/       {if (LINE) print LINE           # if LINE already filled (i.e. NOT the first occurrence)
                         LINE = $2}                     # on WORD occurrence start a new LINE
          /bw:/          {gsub (/.*bw:| .*$/, "")       # delete everything up to and including "bw:", and everything from the next space on
                         gsub (/\/[^+]*(\+|$)/, "_+")   # turn each "/TAG+" (or the final "/TAG") into "_+"
                         gsub (/^\+|_\+ *$/, "")        # eliminate the leading "+" and the trailing "_+"
                         LINE = LINE" "$0               # add to output LINE
                        }
         END            {print LINE}                    # print last line
        ' file
A*AbthA >a*Ab_+at_+hA <i*Ab_+at_+a_+hA <i*Ab_+at_+i_+hA <i*Ab_+at_+u_+hA
A$Abty >u$Ab_+atayo >u$Ab_+atayo >u$Ab_+at_+iy >u$Ab_+atayo_+ya >u$Ab_+atayo_+ya
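For anyone who wants to try this without the original file, here is a sketch with the same awk program wrapped in a shell function and fed the example input from post #1 through a here-document. The function names are illustrative and the comments are shortened.

```shell
#!/bin/sh
# Post #7 script as a function; reads stdin (or any files given as arguments).
one_line_per_word() {
    awk '/;;WORD/ { if (LINE) print LINE        # flush the previous word, if any
                    LINE = $2 }                 # start a new output line
         /bw:/    { gsub(/.*bw:| .*$/, "")      # keep only the bw: token
                    gsub(/\/[^+]*(\+|$)/, "_+") # "/TAG+" or a final "/TAG" becomes "_+"
                    gsub(/^\+|_\+ *$/, "")      # trim leading "+" and trailing "_+"
                    LINE = LINE " " $0 }
         END      { print LINE }' "$@"
}

sample() {    # example input from post #1
    cat <<'EOF'
;;; SENTENCE A*AbthA
;;WORD A*AbthA
;;MADA: A*AbthA asp:p cas:na enc0:3fs_dobj gen:f mod:i num:s per:3 pos:verb prc0:0 prc1:0 prc2:0 prc3:0 stt:na vox:a
*0.887822 diac:>a*AbatohA lex:>a*Ab_1 bw:+>a*Ab/PV+at/PVSUFF_SUBJ:3FS+hA/PVSUFF_DO:3FS gloss:dissolve;melt;exhaust;consume
_0.712209 diac:<i*AbatahA lex:<i*Abap_1 bw:+<i*Ab/NOUN+at/NSUFF_FEM_SG+a/CASE_DEF_ACC+hA/POSS_PRON_3FS gloss:dissolution
_0.691945 diac:<i*AbatihA lex:<i*Abap_1 bw:+<i*Ab/NOUN+at/NSUFF_FEM_SG+i/CASE_DEF_GEN+hA/POSS_PRON_3FS gloss:dissolution
_0.691778 diac:<i*AbatuhA lex:<i*Abap_1 bw:+<i*Ab/NOUN+at/NSUFF_FEM_SG+u/CASE_DEF_NOM+hA/POSS_PRON_3FS gloss:dissolution
--------------
SENTENCE BREAK
--------------
;;; SENTENCE A$Abty
;;WORD A$Abty
;;MADA: A$Abty asp:na cas:u enc0:0 gen:f mod:na num:s per:na pos:noun prc0:0 prc1:0 prc2:0 prc3:0 stt:c vox:na
*0.862011 diac:>u$Abatayo lex:>u$Abap_1 bw:+>u$Ab/NOUN+atayo/NSUFF_FEM_DU_GEN_POSS gloss:alloy
_0.862001 diac:>u$Abatayo lex:>u$Abap_1 bw:+>u$Ab/NOUN+atayo/NSUFF_FEM_DU_ACC_POSS gloss:alloy
_0.855251 diac:>u$Abatiy lex:>u$Abap_1 bw:+>u$Ab/NOUN+at/NSUFF_FEM_SG+iy/POSS_PRON_1S gloss:alloy
_0.776236 diac:>u$Abatay~a lex:>u$Abap_1 bw:+>u$Ab/NOUN+atayo/NSUFF_FEM_DU_GEN_POSS+ya/POSS_PRON_1S gloss:alloy
_0.776235 diac:>u$Abatay~a lex:>u$Abap_1 bw:+>u$Ab/NOUN+atayo/NSUFF_FEM_DU_ACC_POSS+ya/POSS_PRON_1S gloss:alloy
--------------
EOF
}

sample | one_line_per_word
```

Unlike the output 2 script, this version keeps the whole current output line in LINE until the next ;;WORD arrives, which is still only a per-word amount of memory.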
