Removing dupes within 2 delimited areas in a large dictionary file Post: 302740925

10 More Discussions You Might Find Interesting

1. UNIX for Advanced & Expert Users

Issue with Removing Carriage Return (^M) in delimited file

Hi - I tried to remove ^M in a delimited file using tr -d '\r' and sed 's/^M//g', but it does not work quite well. While the ^M is removed, the format of the record is still cut in half, like a,b, c c,d,e The delimited file is generated using a sh script by outputting a SQL query result to...

2. Shell Programming and Scripting

Removing blanks in a text tab delimited file

Hi Experts, I am very new to perl and need to make a script using perl. I would like to remove blanks in a tab delimited text file in a specific column range (column 21 to column 43). Sample input and output shown below: Input: 117 102 650 652 654 656 117 93 95...

3. Shell Programming and Scripting

Removing Embedded Newline from Delimited File

Hey there - a bit of background on what I'm trying to accomplish, first off. I am trying to load the data from a pipe delimited file into a database. The loading tool that I use cannot handle embedded newline characters within a field, so I need to scrub them out. Solutions that I have tried...

4. Shell Programming and Scripting

Large pipe delimited file that I need to add CR/LF every n fields

I have a large flat file with variable length fields that are pipe delimited. The file has no new line or CR/LF characters to indicate a new record. I need to parse the file and after some number of fields, I need to insert a CR/LF to start the next record. Input file ...

5. Shell Programming and Scripting

Extracting a portion of data from a very large tab delimited text file

Hi All, I wanted to know how to effectively delete some columns in a large tab delimited file. I have a file that contains 5 columns and almost 100,000 rows 3456 f g t t 3456 g h 456 f h 4567 f g h z 345 f g 567 h j k l This is a very large data file and tab delimited. I need...

6. Shell Programming and Scripting

Script Optimization - large delimited file, for loop with many greps

Since there are approximately 75K gsfiles and hundreds of stfiles per gsfile, this script can take hours. How can I rewrite this script, so that it's much faster? I'm not as familiar with perl but I'm open to all suggestions. ls file.list>$split for gsfile in `cat $split`; do csplit...

7. Shell Programming and Scripting

Removing Dupes from huge file- awk/perl/uniq

Hi, I have the following command in place nawk -F, '!a[$1,$2,$3]++' file > file.uniq It has been working perfectly as per requirements, removing duplicates by taking into consideration only the first 3 fields. Recently it has started giving the error below: bash-3.2$ nawk -F, '!a[$1,$2,$3]++'...

8. Shell Programming and Scripting

Merging dupes on different lines in a dictionary

I am working on a homonym dictionary of names, i.e. names which are clustered together according to their "sound-alike" pronunciation. An example will make this clear: Since the dictionary is manually constructed it often happens that inadvertently two sets of "homonyms" which should be grouped...

9. UNIX for Advanced & Expert Users

Need optimized awk/perl/shell to give the statistics for the Large delimited file

I have a file of around 24 GB with 14 columns, delimited by "|". My requirement: can anyone provide the fastest and best way to get the results below? Number of records in the file; unique counts of the first and second columns. Thanks for your time Karti ------ Post updated at...

10. Shell Programming and Scripting

Remove dupes in a large file

I have a large file (1.5 GB) and want to sort the file. I used the following AWK script to do the job: !x[$0]++ The script works but it is very slow and takes over an hour to do the job. I suspect this is because the file is not sorted. Any solution to speed up the AWK script or a Perl script would...
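
Several of the threads listed above come back to the same awk idiom for dropping duplicate records. Here is a minimal sketch of it, assuming comma-separated input. The function names dedupe_lines and dedupe_by_first3 are hypothetical helpers made up for illustration and do not come from any of the threads.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; the names are not from the threads.

# Keep the first occurrence of each whole line, preserving input order.
# seen[$0]++ evaluates to 0 (false) the first time a line appears, so the
# negation is true and awk prints the line; repeats are skipped.
dedupe_lines() {
    awk '!seen[$0]++' "$@"
}

# Keep the first record for each distinct (field 1, field 2, field 3) key,
# assuming "," as the field separator.
dedupe_by_first3() {
    awk -F, '!seen[$1,$2,$3]++' "$@"
}

# Small demonstration on made-up data.
printf 'a,b,c,1\na,b,c,2\nx,y,z,3\na,b,c,1\n' | dedupe_lines
printf 'a,b,c,1\na,b,c,2\nx,y,z,3\na,b,c,1\n' | dedupe_by_first3
```

Both helpers hold every distinct key in memory, so very large inputs with mostly unique keys can become slow or run out of memory. When the original line order does not need to be preserved, sort -u is a common alternative.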

LEARN ABOUT CENTOS

perltw

PERLTW(1)						 Perl Programmers Reference Guide						 PERLTW(1)

NAME

       perltw - Traditional Chinese Perl Guide

DESCRIPTION

       Welcome to the world of Perl!

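       Starting with version 5.8.0, Perl has comprehensive Unicode support, and with it support for many encodings beyond the Latin family;
       CJK (Chinese, Japanese, Korean) is one part of that.  Unicode is an international standard that attempts to cover all of the world's
       characters: the Western world, the Eastern world, and everything in between (Greek, Syriac, Arabic, Hebrew, Indic scripts, Native
       American scripts, and so on).  It also accommodates multiple operating systems and platforms (such as PC and Macintosh).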

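       Perl itself operates on Unicode.  This means that string data inside Perl can be represented in Unicode; Perl's functions and operators
       (such as regular expression matching) can also operate on Unicode.  For input and output, to handle data stored in encodings that
       predate Unicode, Perl provides the Encode module, which lets you easily read and write data in legacy encodings.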

       The Encode extension module supports the following Traditional Chinese encodings ('big5' means 'big5-eten'):

	   big5-eten   Big5 encoding (with ETen extensions)
	   big5-hkscs  Big5 + Hong Kong Supplementary Character Set, 2001 edition
	   cp950       Code Page 950 (Big5 + characters added by Microsoft)

       For example, to convert a Big5-encoded file to Unicode, just type the following command:

	   perl -Mencoding=big5,STDOUT,utf8 -pe1 < file.big5 > file.utf8

       Perl also ships with "piconv", a character conversion utility written entirely in Perl, used as follows:

	   piconv -f big5 -t utf8 < file.big5 > file.utf8
	   piconv -f utf8 -t big5 < file.utf8 > file.big5

       In addition, with the encoding module, you can easily write code that works in units of characters, as shown below:

	   #!/usr/bin/env perl
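	   # Enable big5 string parsing; standard input/output and standard error are all set to big5 encoding
	   use encoding 'big5', STDIN => 'big5', STDOUT => 'big5';
	   print length("駱駝");	    #  2 (double quotes denote characters)
	   print length('駱駝');	    #  4 (single quotes denote bytes)
	   print index("諄諄教誨", "彖帢"); # -1 (does not contain this substring)
	   print index('諄諄教誨', '彖帢'); #  1 (starts at the second byte)

       In the last example, the second byte of "諄" combines with the first byte of the next "諄" to form the Big5 code for "彖"; the second byte of that "諄" in turn combines with the first byte of "教" to form "帢".  This solves a problem that used to be common when matching Big5-encoded text.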

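   Additional Chinese encodings
       If you need more Chinese encodings, you can download the Encode::HanExtra module from CPAN (<http://www.cpan.org/>). It currently provides the following encodings: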

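	   cccii       Chinese Character Code for Information Interchange (Council for Cultural Affairs, 1980)
	   euc-tw      Unix extended character set, covering CNS11643 planes 1-7
	   big5plus    Big5+ from the Chinese Foundation for Digitization Technology
	   big5ext     Big5e from the Chinese Foundation for Digitization Technology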

       In addition, the Encode::HanConvert module provides two encodings for converting between Simplified and Traditional Chinese:

	   big5-simp   Converts between Big5 Traditional Chinese and Unicode Simplified Chinese
	   gbk-trad    Converts between GBK Simplified Chinese and Unicode Traditional Chinese

       To convert between GBK and Big5, see the b2g.pl and g2b.pl programs included with that module, or use the following in your program:

	   use Encode::HanConvert;
	   $euc_cn = big5_to_gb($big5); # Convert from Big5 to GBK
	   $big5 = gb_to_big5($euc_cn); # Convert from GBK to Big5

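   Further information
       Please refer to the extensive documentation included with Perl (unfortunately all written in English) to learn more about Perl and how to use Unicode. Plenty of external resources are available as well: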

   Sites offering Perl resources
       <http://www.perl.com/>
	   The Perl home page (maintained by O'Reilly)

       <http://www.cpan.org/>
	   Comprehensive Perl Archive Network (CPAN)

       <http://lists.perl.org/>
	   List of Perl mailing lists

   Sites for learning Perl
       <http://www.oreilly.com.tw/product_perl.php?id=index_perl>
	   O'Reilly Perl books in Traditional Chinese

       <http://groups.google.com/groups?q=tw.bbs.comp.lang.perl>
	   Taiwan Perl newsgroup (that is, the Perl boards linked across the major BBSes)

   Perl user groups
       <http://www.pm.org/groups/taiwan.html>
	   List of Taiwan Perl Mongers groups

       <irc://irc.freenode.org/#perl.tw>
	   Perl.tw online chat room

   Unicode-related sites
       <http://www.unicode.org/>
	   Unicode Consortium (the makers of the Unicode standard)

       <http://www.cl.cam.ac.uk/%7Emgk25/unicode.html>
	   UTF-8 and Unicode FAQ for Unix/Linux

   Chinese localization information
       Chinese Software Localization Alliance
	   <http://www.cpatch.org/>

       Linux Software Chinese Localization Project
	   <http://www.linux.org.tw/CLDP/>

SEE ALSO

       Encode, Encode::TW, encoding, perluniintro, perlunicode

AUTHORS

       Jarkko Hietaniemi <jhi@iki.fi>

       Audrey Tang (唐鳳) <audreyt@audreyt.org>

perl v5.16.3							    2013-03-04								 PERLTW(1)