Removing dupes within 2 delimited areas in a large dictionary file Post: 302740273

10 More Discussions You Might Find Interesting

1. UNIX for Advanced & Expert Users

Issue with Removing Carriage Return (^M) in delimited file

Hi - I tried to remove ^M in a delimited file using "tr -d '\r'" and "sed 's/^M//g'", but it does not work quite well. While the ^M is removed, the format of the record is still cut in half, like a,b, c c,d,e The delimited file is generated using sh script by outputting a SQL query result to...

2. Shell Programming and Scripting

Removing blanks in a text tab delimited file

Hi Experts, I am very new to Perl and need to make a script using Perl. I would like to remove blanks in a tab-delimited text file in a specific column range (column 21 to column 43). Sample input and output shown below: Input: 117 102 650 652 654 656 117 93 95...

3. Shell Programming and Scripting

Removing Embedded Newline from Delimited File

Hey there - a bit of background on what I'm trying to accomplish, first off. I am trying to load the data from a pipe delimited file into a database. The loading tool that I use cannot handle embedded newline characters within a field, so I need to scrub them out. Solutions that I have tried...

4. Shell Programming and Scripting

Large pipe delimited file that I need to add CR/LF every n fields

I have a large flat file with variable length fields that are pipe delimited. The file has no new line or CR/LF characters to indicate a new record. I need to parse the file and after some number of fields, I need to insert a CR/LF to start the next record. Input file ...

5. Shell Programming and Scripting

Extracting a portion of data from a very large tab delimited text file

Hi All, I wanted to know how to effectively delete some columns in a large tab-delimited file. I have a file that contains 5 columns and almost 100,000 rows: 3456 f g t t 3456 g h 456 f h 4567 f g h z 345 f g 567 h j k l This is a very large data file and tab delimited. I need...

6. Shell Programming and Scripting

Script Optimization - large delimited file, for loop with many greps

Since there are approximately 75K gsfiles and hundreds of stfiles per gsfile, this script can take hours. How can I rewrite this script, so that it's much faster? I'm not as familiar with perl but I'm open to all suggestions. ls file.list>$split for gsfile in `cat $split`; do csplit...

7. Shell Programming and Scripting

Removing Dupes from huge file- awk/perl/uniq

Hi, I have the following command in place: nawk -F, '!a++' file > file.uniq It has been working perfectly as per requirements, removing duplicates by taking into consideration only the first 3 fields. Recently it has started giving the error below: bash-3.2$ nawk -F, '!a++'...

8. Shell Programming and Scripting

Merging dupes on different lines in a dictionary

I am working on a homonym dictionary of names, i.e. names which are clustered together according to their "sound-alike" pronunciation. An example will make this clear: Since the dictionary is manually constructed it often happens that inadvertently two sets of "homonyms" which should be grouped...

9. UNIX for Advanced & Expert Users

Need optimized awk/perl/shell to give the statistics for the Large delimited file

I have a file of around 24 GB with 14 columns, delimited with "|". My requirement: can anyone provide the fastest and best way to get the results below? Number of records in the file; unique counts for the first and second columns. Thanks for your time, Karti ------ Post updated at...

10. Shell Programming and Scripting

Remove dupes in a large file

I have a large file (1.5 GB) and want to sort the file. I used the following awk script to do the job: !x++ The script works but it is very slow and takes over an hour to do the job. I suspect this is because the file is not sorted. Any solution to speed up the awk script or a Perl script would...
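
Several of the threads listed above revolve around the same idea: keep a hash of keys already seen and print a line only the first time its key appears. The following is a minimal Perl sketch of that idea, not the solution from any of those threads. The function name dedupe_by_fields, the comma delimiter, and the choice of the first three fields as the key are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines whose key (the first $nkeys
# comma-separated fields) has not been seen before, in input order.
sub dedupe_by_fields {
    my ($lines, $nkeys) = @_;
    my %seen;    # key => number of times encountered so far
    my @kept;
    for my $line (@$lines) {
        my @fields = split /,/, $line, -1;    # -1 keeps trailing empty fields
        my $last   = $#fields < $nkeys - 1 ? $#fields : $nkeys - 1;
        # Join with a control character that is unlikely to occur in the data.
        my $key = join "\x1F", @fields[0 .. $last];
        push @kept, $line unless $seen{$key}++;
    }
    return \@kept;
}

my @sample = ("a,b,c,1", "a,b,c,2", "a,b,d,3", "a,b,c,4");
print "$_\n" for @{ dedupe_by_fields(\@sample, 3) };
```

Memory use grows with the number of distinct keys, not with the file size, so for a large file the same loop can read line by line from a filehandle instead of an array.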

Locale::Codes::LangFam(3pm)				 Perl Programmers Reference Guide			       Locale::Codes::LangFam(3pm)

NAME

       Locale::Codes::LangFam - standard codes for language family identification

SYNOPSIS

	  use Locale::Codes::LangFam;

	  $lext = code2langfam('apa');		       # $lext gets 'Apache languages'
	  $code = langfam2code('Apache languages');    # $code gets 'apa'

	  @codes   = all_langfam_codes();
	  @names   = all_langfam_names();

DESCRIPTION

       The "Locale::Codes::LangFam" module provides access to standard codes used for identifying language families, such as those defined in
       ISO 639-5.

       Most of the routines take an optional additional argument which specifies the code set to use. If not specified, the default ISO 639-5
       language family codes will be used.

SUPPORTED CODE SETS

       There are several different code sets you can use for identifying language families. A code set may be specified using either a name, or a
       constant that is automatically exported by this module.

       For example, the following two calls are equivalent:

	  $lext = code2langfam('apa','alpha');
	  $lext = code2langfam('apa',LOCALE_LANGFAM_ALPHA);

       The code sets currently supported are:

       alpha
	   This is the set of three-letter (lowercase) codes from ISO 639-5 such as 'apa' for Apache languages.

	   This is the default code set.
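
The following short sketch exercises the lookups described above, assuming the Locale::Codes distribution is installed. It uses only the routines and the constant named in this page; the variable names are illustrative.

```perl
use strict;
use warnings;
use Locale::Codes::LangFam;

# Default code set (alpha): look up the family name for a code.
my $name = code2langfam('apa');    # 'Apache languages', per the SYNOPSIS

# Naming the code set explicitly, by string or by exported constant,
# is expected to return the same name as the default lookup.
my $by_string   = code2langfam('apa', 'alpha');
my $by_constant = code2langfam('apa', LOCALE_LANGFAM_ALPHA);

# Reverse lookup: from family name back to code.
my $code = langfam2code($name);

print "name:        $name\n";
print "by string:   $by_string\n";
print "by constant: $by_constant\n";
print "code:        $code\n";

# Count the codes available in the default code set.
my @codes = all_langfam_codes();
print "codes known: ", scalar(@codes), "\n";
```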

ROUTINES

       code2langfam ( CODE [,CODESET] )
       langfam2code ( NAME [,CODESET] )
       langfam_code2code ( CODE ,CODESET ,CODESET2 )
       all_langfam_codes ( [CODESET] )
       all_langfam_names ( [CODESET] )
       Locale::Codes::LangFam::rename_langfam  ( CODE ,NEW_NAME [,CODESET] )
       Locale::Codes::LangFam::add_langfam  ( CODE ,NAME [,CODESET] )
       Locale::Codes::LangFam::delete_langfam  ( CODE [,CODESET] )
       Locale::Codes::LangFam::add_langfam_alias  ( NAME ,NEW_NAME )
       Locale::Codes::LangFam::delete_langfam_alias  ( NAME )
       Locale::Codes::LangFam::rename_langfam_code  ( CODE ,NEW_CODE [,CODESET] )
       Locale::Codes::LangFam::add_langfam_code_alias  ( CODE ,NEW_CODE [,CODESET] )
       Locale::Codes::LangFam::delete_langfam_code_alias  ( CODE [,CODESET] )
	   These routines are all documented in the Locale::Codes::API man page.

SEE ALSO

       Locale::Codes
	   The Locale-Codes distribution.

       Locale::Codes::API
	   The list of functions supported by this module.

       http://www.loc.gov/standards/iso639-5/id.php
	   ISO 639-5.

AUTHOR

       See Locale::Codes for full author history.

       Currently maintained by Sullivan Beck (sbeck@cpan.org).

COPYRIGHT

	  Copyright (c) 2011-2013 Sullivan Beck

       This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

perl v5.18.2							    2013-11-04					       Locale::Codes::LangFam(3pm)