Call two files and merge all entities to create all combos


 
# 1  
Old 01-26-2015

Hello,
I am preparing an expanded verb morphology of Indian languages for the open-source community and have developed two files. The first file, called root, contains the verbal roots; the second, called prefix, contains all the syntactic elements that can be combined with the entries of the root file.
An example from English will make this clear: the root file contains all the root verbs of English (around 4000 in number). A sample of the -ing forms is given below:
Code:
wrapping
wreaking
wreathing
wrecking
wrenching
wresting
wrestling
wriggling
writhing
yammering
yanking
yapping

The prefix file contains all verbal elements which can be placed in front of the entries of the root file. There are roughly 75 to 100 such elements. A small sample is given below:
Code:
am
is
are
has been
have been
was
were
had been
will be
will have been
am not
is not
are not

The idea is to put every entry of the prefix file in front of each word form in the root file and to store the full set of prefix+root combinations in a new file. A small example is given below:
Code:
am yanking
is yanking
are yanking
has been yanking
have been yanking
was yanking
am writhing
is writhing
are writhing
has been writhing
have been writhing
was writhing

My experience with Perl and awk has been limited to working with a single file, and I do not know how to read two files and merge them to generate all forms. I, and the community too, would appreciate it if someone could show me how this can be done, if possible with a commented script to help me learn.
Many thanks
# 2  
Old 01-26-2015
I do not think you need awk or perl or any other external program - a few lines of shell code should suffice:

Code:
#! /bin/ksh93

IFS=$'\n'
set -A achPrefix $(</path/to/prefixfile)

typeset    fIn="/path/to/verbfile"
typeset    fOut="/path/to/output"
typeset    chVerb=""
typeset -i iPrefixCnt=0

exec 3>"$fOut"

grep -ve '^#' -ve '^$' |\
while read chVerb ; do
     (( iPrefixCnt = 0 ))
     while [ $iPrefixCnt -lt ${#achPrefix[@]} ] ; do
          print -u3 - "${achPrefix[$iPrefixCnt]} $chVerb"
          (( iPrefixCnt += 1 ))
     done
done

exec 3>&-

exit 0

I hope this helps.

bakunin

PS: thanks to hergp for that clever mechanism of reading a file into an array

PPS: here is the commented version:
Code:
#! /bin/ksh93

IFS=$'\n'                                                      # split words on newlines only
set -A achPrefix $(</path/to/prefixfile)                       # create array with prefixes from file

typeset    fIn="/path/to/verbfile"                             # the filename of the verbfile
typeset    fOut="/path/to/output"                              # filename of output file
typeset    chVerb=""                                           # buffer variable
typeset -i iPrefixCnt=0                                        # counter

exec 3>"$fOut"                                                 # open the output file for writing

grep -ve '^#' -ve '^$' |                                       # this removes commented and empty
                                                               # lines. Add more preprocessing if
                                                               # necessary
while read chVerb ; do                                         # read the output of grep line by line
                                                               # and put the line into variable chVerb
     (( iPrefixCnt = 0 ))                                      # reset the counter
     while [ $iPrefixCnt -lt ${#achPrefix[@]} ] ; do           # cycle through all the elements of the
                                                               # "achPrefix" array, ${#achPrefix[@]} is
                                                               # the number of elements in the array
          print -u3 - "${achPrefix[$iPrefixCnt]} $chVerb"      # print the current prefix plus the current
                                                               # verb to the output file (we opened FD3 for
                                                               # this above and now use it)
          (( iPrefixCnt += 1 ))                                # increment the prefix counter
     done
done

exec 3>&-                                                      # close the output file in the end

exit 0


Last edited by bakunin; 01-26-2015 at 11:30 AM..
# 3  
Old 01-26-2015
How about this

Code:
awk 'NR==FNR { a[$0]; next } { for (i in a) print i, $0 }' prefix root
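
One thing to be aware of with the one-liner: awk does not guarantee any particular order for a `for (i in a)` loop, so the prefixes may not come out in the order of the prefix file. A minimal commented sketch that keeps the file order is shown here. The function name `combos`, the array `p`, the counter `n` and the sample data are only illustrative assumptions, not part of the original post:

```shell
# combos PREFIXFILE ROOTFILE
# Print every "prefix root" pair, keeping the line order of both files.
combos() {
    awk '
        NR == FNR { p[++n] = $0; next }   # first file: store each prefix under index 1..n
        {
            for (i = 1; i <= n; i++)      # second file: for the current root ...
                print p[i], $0            # ... print each prefix in file order
        }
    ' "$1" "$2"
}

# Throwaway sample data, only so that the sketch runs on its own.
tmp=$(mktemp -d)
printf '%s\n' 'am' 'is' 'has been' > "$tmp/prefix"
printf '%s\n' 'yanking' 'writhing' > "$tmp/root"

combos "$tmp/prefix" "$tmp/root"
rm -rf "$tmp"
```

With the real files this would be called as `combos prefix root > outputfile`. Like the one-liner, it relies on the prefix file not being empty, because `NR == FNR` is how awk tells the first file from the second.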

A slower bash solution, perhaps easier to follow:

Code:
while read a
do
  while read b
  do
    echo "$b $a"
  done < prefix
done < root


Last edited by senhia83; 01-26-2015 at 11:31 AM..
# 4  
Old 01-26-2015
Hi,
For fun, an xargs + sed solution (the -a option is a GNU xargs extension):
Code:
xargs -IXX -a prefix sed -e 's/^/XX /' root

Regards.
# 5  
Old 01-26-2015
For mere mortals:
Code:
#!/bin/sh
while read root                                                               
do
  while read prefix
  do
    printf "%s %s\n" "$prefix" "$root"
  done < prefixfile
done < rootfile
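
Whichever variant is used, a quick sanity check is that the output holds exactly (number of prefixes) x (number of roots) lines. A minimal sketch of such a check is shown here, assuming every file ends with a newline and contains no empty or comment lines. The function name `check_combos` and the sample data are hypothetical and only for illustration:

```shell
# check_combos PREFIXFILE ROOTFILE OUTFILE
# Succeed if OUTFILE has exactly (lines in PREFIXFILE) x (lines in ROOTFILE) lines.
check_combos() {
    np=$(wc -l < "$1")                    # number of prefixes
    nr=$(wc -l < "$2")                    # number of roots
    no=$(wc -l < "$3")                    # number of generated combinations
    [ "$((np * nr))" -eq "$((no))" ]      # arithmetic expansion also drops wc padding
}

# Usage with throwaway sample data, so that the sketch runs on its own.
tmp=$(mktemp -d)
printf '%s\n' 'am' 'is' 'are' > "$tmp/prefixfile"
printf '%s\n' 'yanking' 'writhing' > "$tmp/rootfile"

while read -r root
do
  while read -r prefix
  do
    printf '%s %s\n' "$prefix" "$root"
  done < "$tmp/prefixfile"
done < "$tmp/rootfile" > "$tmp/output"

check_combos "$tmp/prefixfile" "$tmp/rootfile" "$tmp/output" && echo "line count OK"
rm -rf "$tmp"
```

A mismatch usually points at a final line without a newline, or at blank lines that one of the scripts filtered out.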

# 6  
Old 01-26-2015
Many thanks to all for their kind and prompt help. I tried out all the solutions and they all run very fast.
# 7  
Old 01-26-2015
I have to apologize: somehow the filename fell victim to the copying and pasting from the terminal to here.

Quote:
Originally Posted by bakunin
Code:
grep -ve '^#' -ve '^$' |\

This line should read

Code:
grep -ve '^#' -ve '^$' "$fIn" |\

of course.

bakunin