I have a large database with English words on the left-hand side and Indic words on the right-hand side.
It so happens that since the Indic words have been entered by hand, there are duplicates in the entries.
The structure is as under:
A small sample will explain
As can be seen, some of the Indic words are duplicated:
I wrote an awk script to remove such duplicates.
However when the script runs, it mangles the output file.
What has gone wrong?
Many thanks for your kind help.
Sorry, the English is on the left-hand side and the Indic on the right-hand side, separated by an equals sign (=).
Without showing us the output you hope to get from your sample input, without telling us whether or not the order of the Indic glosses on the right side of the equals sign matters, without telling us what operating system you're using, and without telling us how the output you are currently getting is "mangled", we can make lots of assumptions about what might be wrong that have absolutely nothing to do with your actual problem.
But one thing that is obvious is that with FS="=", the comma-separated string on the right side of the equals sign in each input line is a single field. One might guess that you either want to split $2 on commas, or you want to set FS="[=,]" and loop through fields 2 through NF instead of 1 through NF.
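To make the field-splitting point concrete, here is a minimal sketch that counts what awk sees under each choice. It assumes a POSIX awk; the entry `water=g1,g2,g1` (placeholder glosses, not the poster's data) and the three function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: assumes a POSIX awk and lines shaped like "English=gloss,gloss,...".

# With FS="=", everything after the "=" (the whole comma-separated list) is one field.
count_fields_eq() {
    awk 'BEGIN { FS = "=" } { print NF }'
}

# Keeping FS="=" but splitting $2 on commas gives one array element per gloss.
count_glosses() {
    awk 'BEGIN { FS = "=" } { print split($2, g, ",") }'
}

# With FS="[=,]", each gloss is its own field, $2 through $NF ($1 is the English word).
count_fields_eq_comma() {
    awk 'BEGIN { FS = "[=,]" } { print NF }'
}

line='water=g1,g2,g1'   # hypothetical entry with placeholder glosses
printf '%s\n' "$line" | count_fields_eq        # prints 2
printf '%s\n' "$line" | count_glosses          # prints 3
printf '%s\n' "$line" | count_fields_eq_comma  # prints 4
```

A loop over fields 1 through NF under FS="=" therefore only ever sees the English word and the entire gloss list, never the individual glosses.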
Assuming that the order of the Indic glosses has to be kept as they appear in the input (only removing duplicated Indic glosses), and assuming that you're using a version of awk that conforms to the requirements stated by the POSIX standards, you might try replacing your awk code with:
which, with your sample input, produces the output:
If the output order of indic glosses on the right hand side doesn't matter, this code could be simplified.
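A minimal sketch of the order-preserving approach described above follows. It is not the code from this reply. It assumes a POSIX awk and one `English=gloss1,gloss2,...` entry per line, and it wraps the awk program in a hypothetical shell function, `dedup_glosses`.

```shell
#!/bin/sh
# Sketch, not the code posted in the thread. Assumes a POSIX awk and one
# "English=gloss1,gloss2,..." entry per line; only the glosses in $2 are deduplicated.
# Keeps the first occurrence of each Indic gloss, in its original order.
dedup_glosses() {
    awk '
    BEGIN { FS = "="; OFS = "=" }
    NF < 2 { print; next }          # no "=": pass the line through unchanged
    {
        n = split($2, g, ",")       # g[1..n] are the glosses for this entry
        split("", seen)             # portable way to empty seen[] for each entry
        out = ""; k = 0
        for (i = 1; i <= n; i++) {
            if (!(g[i] in seen)) {  # first time this gloss appears on this line
                seen[g[i]] = 1
                out = (k++ ? out "," : "") g[i]
            }
        }
        $2 = out                    # rebuilds the record, with OFS restoring the "="
        print
    }' "$@"
}

# Usage (file names are hypothetical):
# dedup_glosses dictionary.txt > dictionary_dedup.txt
```

Clearing `seen` at the start of each record matters. Without it, a gloss seen in an earlier entry would also be dropped from every later entry. If output order doesn't matter, the unique glosses could be printed with `for (x in seen)`, but POSIX leaves that traversal order unspecified. Under Windows, cmd.exe does not treat single quotes as quoting characters, so it is easier to save the awk program to a file and run it with `awk -f`.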
Sorry, I should have been more clear.
I work under Windows and hence DOS.
Basically, as you can see, the dictionary has the structure shown in the sample below:
Since parts of the database were entered by hand, some words are repeated in the Indic glosses, as shown in the sample below:
What I needed was an awk script to identify such repeated entries and delete the duplicate entry.
Thus the sample above would be reduced as under:
I had written the following awk script to do the job:
However when I ran the script on the sample, it produced a mangled output:
I hope the above clarifies the situation. Identifying dupes visually is both time-consuming and prone to error.
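Because spotting duplicates by eye is slow, a report-only variant can list them for review before anything is deleted. Here is a minimal sketch, assuming a POSIX awk and the same `English=gloss1,gloss2,...` layout. The function name `report_dupes` and its output format are hypothetical.

```shell
#!/bin/sh
# Sketch: list repeated glosses for review instead of changing the file.
# Assumes a POSIX awk and "English=gloss1,gloss2,..." lines; prints
# "line-number: English word: repeated gloss", once per repeated gloss.
report_dupes() {
    awk '
    BEGIN { FS = "=" }
    NF >= 2 {
        n = split($2, g, ",")
        split("", count)              # reset the per-entry tally
        for (i = 1; i <= n; i++)
            if (count[g[i]]++ == 1)   # true only on the second occurrence
                print FNR ": " $1 ": " g[i]
    }' "$@"
}

# Usage (hypothetical file name):
# report_dupes dictionary.txt
```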
By the time I had posted the clarifications, you had already replied. Many thanks; it worked, sweeping through a dictionary of 70,000 words and removing all the dupes.
I will now study the script to see where I went wrong.
Many thanks. I tested the script and it worked beautifully.
The loop is an interesting feature.
Thanks to all who so very kindly give their time to help out.