Cleaning through perl or awk a Stemmer dictionary
Post 302813247 by Chubler_XL on Sunday 26th of May 2013 10:47:23 PM
Here is a version that sorts:

Code:
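awk '
BEGIN {RS=""}                      # paragraph mode: each blank-line-separated block is one record
{ root=$1;
  # reuse the root of any form already seen in an earlier block
  for(i=1;i<=NF;i++) if($i in LEM) root=LEM[$i]
  for(i=1;i<=NF;i++) if(!($i in LEM)) {
      LEM[$i]=root
      base[root]=base[root] OFS $i
  }
}
END {
  for(w in base) {
    forms=split(base[w], form);    # split() fills form[1]..form[forms]
    for(i=1;i<=forms;i++)
      if(length(form[i])) print w","form[i];
    # marker line: relies on "?" sorting after "," (true in the C locale);
    # it has no second field, so it becomes the blank line after each group
    print w"?";
  }
}' infile | sort | awk -F, '{ print $2}'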
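
To see what the pipeline does on a small input, here is a minimal sketch of the same merge-and-sort idea. The wrapper name merge_forms and the sample words are made up for illustration and are not from the thread. It assumes the input is blank-line-separated groups of word forms, indexes the split() result from 1 (as awk numbers it), and forces byte-order sorting with LC_ALL=C so that the "?" marker line lands after the "root,form" lines.

```shell
#!/bin/sh
# Hypothetical wrapper (not from the post): reads blank-line-separated
# groups of word forms on stdin, merges groups that share a form, and
# prints each merged group sorted, with a blank line after each group.
merge_forms() {
  awk '
  BEGIN { RS = "" }                       # paragraph mode: one group per record
  { root = $1
    # if any form was seen before, file this group under its existing root
    for (i = 1; i <= NF; i++) if ($i in LEM) root = LEM[$i]
    for (i = 1; i <= NF; i++) if (!($i in LEM)) {
      LEM[$i] = root
      base[root] = base[root] OFS $i
    }
  }
  END {
    for (w in base) {
      n = split(base[w], form)            # fills form[1]..form[n]
      for (i = 1; i <= n; i++) print w "," form[i]
      print w "?"                         # marker: sorts after the "w," lines
    }
  }' | LC_ALL=C sort | awk -F, '{ print $2 }'
}

# Sample data (made up): the third group shares "runs" with the first,
# so "ran" is filed under the root "run".
printf '%s\n' 'run running runs' '' 'walk walked' '' 'runs ran' | merge_forms
```

With this sample the output is ran, run, running, runs, a blank line, then walk, walked, and a final blank line. The marker line has no comma, so the last awk prints an empty second field for it, which is what separates the groups.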
