Shell Programming and Scripting: Post 302990221 by gimley on Tuesday 24th of January 2017 12:07:27 AM
Modify script to remove dupes with two delimiters

Hello,
I have a script that removes duplicates in a database whose rows use a single delimiter:
Code:
=

The script is given below:
Code:
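# script to remove dupes from a row with structure word=word
BEGIN { FS = "=" }
{ for (i = 1; i <= NF; i++) a[$i]++          # count each field of the row
  for (i in a) b = b "=" i                   # rejoin the unique fields (order unspecified)
  sub("=", "", b); $0 = b; b = ""; delete a  # strip the leading "=", reset for the next row
} 1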
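
For reference, the script above works inside each row: it splits the row on "=", collects the distinct fields, and joins them back together. Because `for (i in a)` visits array keys in an unspecified order, the rejoined fields can come out reordered. Below is a minimal sketch of an order-preserving variant, assuming a POSIX awk; the function name `dedupe_fields` is hypothetical and not part of the original script.

```shell
# dedupe_fields (hypothetical name): drop repeated "="-separated fields
# within each row, keeping the first occurrence and the original order.
dedupe_fields() {
    awk -F'=' '{
        out = ""; sep = ""
        split("", seen)                  # portable way to empty the array
        for (i = 1; i <= NF; i++)
            if (!seen[$i]++) {           # true only the first time a field appears
                out = out sep $i
                sep = "="
            }
        print out
    }' "$@"
}

printf 'a=b=a=c\n' | dedupe_fields
```

Fed the row `a=b=a=c`, this prints `a=b=c`. It only removes repeats within a single row; it does not compare one row against another.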

How do I modify the script to remove duplicates in a database with two delimiters per row?
Code:
=

A small pseudo-sample is given below.
Code:
अ=m=Prefix signifying negation.
अ=m=Prefix signifying negation.
अँहँ=ind=Interjection expressing disapprobation.
अं=int=An interjection expressing contempt,unconcern,disbelief.
अंक=m=A figure;a mark.The thigh.An act of a play.
अंकगणित=n=Arithmetic.
अँहँ=ind=Interjection expressing disapprobation.
अं=int=An interjection expressing contempt,unconcern,disbelief.
अंक=m=A figure;a mark.The thigh.An act of a play.
अंकगणित=n=Arithmetic.

I tried to modify the delimiter part in the script using
Code:
{FS="=""*'"="}

But it resulted in totally garbled output.
Since the file is very large, normal editors cannot remove the dupes, hence this request.
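
In the pseudo-sample the duplicates are whole rows that repeat, not repeated fields inside one row, so one possible approach is to leave the field separator alone and use the entire row as the key. The sketch below assumes the duplicates are exact byte-for-byte copies and that the set of distinct rows fits in memory; the function name `dedupe_lines` is hypothetical.

```shell
# dedupe_lines (hypothetical name): print each distinct row once, in
# order of first appearance. The whole row ($0) is the key, so the
# number of "=" delimiters in it does not matter.
dedupe_lines() {
    awk '!seen[$0]++' "$@"
}

printf '%s\n' 'अ=m=Prefix signifying negation.' \
              'अ=m=Prefix signifying negation.' \
              'अंकगणित=n=Arithmetic.' | dedupe_lines
```

If the original row order does not need to be preserved, `sort -u` on the file is another option that avoids holding every distinct row in an awk array.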
 

Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.