Removing Dupes from huge file- awk/perl/uniq Post: 302623155

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Removing duplicates [sort , uniq]

Hey guys, I have a file which looks like this: Contig201#numbPA Contig1452#nmdynD6PA dm022p15.r#CG6461PA dm005e16.f#SpatPA IGU001_0015_A06.f#CG17593PA I need to remove duplicates based on the characters matching up to '#'. For example, if we consider this.. Contig201#numbPA...

2. Shell Programming and Scripting

Using an awk script to identify dupes in two files

Hello, I have two files. File1 or the master file contains two columns separated by a delimiter: a=b b=d e=f g=h File 2 which is the file to be processed has only a single column a h c b What I need is an awk script to identify unique names from file 2 which are not found in the...

3. Shell Programming and Scripting

Help in modifying existing Perl Script to produce report of dupes

Hello, I have a large amount of data with the following structure: Word=Transliterated word I have written a Perl Script (reproduced below) which goes through the full file and identifies all dupes on the right hand side. It creates successfully a new file with two headers: Singletons and Dupes....

4. Shell Programming and Scripting

Awk to Count Multiple patterns in a huge file

5. Shell Programming and Scripting

Fetching record based on Uniq Key from huge file.

Hi i want to fetch 100k record from a file which is looking like as below. XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX ...

6. Shell Programming and Scripting

Help with removing duplicate entries with awk or Perl

Hi, I have a file which looks like this: chr1 11127067 11132181 89 chr1 11128023 11128311 chr1 11130990 11131025 chr1 11127067 11132181 89 chr1 11128023 11128311 chr1 11131583...

7. Shell Programming and Scripting

Removing dupes within 2 delimited areas in a large dictionary file

Hello, I have a very large dictionary file which is in text format and which contains a large number of sub-sections. Each sub-section starts with the following header : #DATA #VALID 1 and ends with a footer as shown below #END The data between the Header and the Footer consists of...

8. UNIX for Advanced & Expert Users

Performance problem with removing duplicates in a huge file (50+ GB)

I'm trying to remove duplicate data from an unsorted input file larger than 50 GB and write the unique records to a new file. I have already tried a variety of options posted in similar threads/forums, but no luck so far. Any suggestions, please? Thanks!

9. Shell Programming and Scripting

Help with Perl script for identifying dupes in column1

Dear all, I have a large dictionary database which has the following structure: source word=target word, e.g. book=livre. Since the database is very large, in spite of all the care taken it sometimes happens that the source word is repeated, e.g. book=livre book=tome. Since I want to...

10. Shell Programming and Scripting

Removing White spaces from a huge file

I am trying to remove whitespaces from a file containing sample data as: 457 <EOFD> Mar 1 2007 12:00:00:000AM <EOFD> Mar 31 2007 12:00:00:000AM <EOFD> system <EORD> 458 <EOFD> Mar 1 2007 12:00:00:000AM<EOFD>agf <EOFD> Apr 20 2007 9:10:56:036PM <EOFD> prodiws<EORD> . Basically these...
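
Most of the threads listed above come back to the same few idioms for dropping duplicate lines. The following is a minimal sketch of three of them, not the solution from any particular thread. The sample data, the file names, and the `workdir` variable are made up for illustration (`book=livre` and `book=tome` are borrowed from thread 9; `cat=chat` is invented).

```shell
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)

# Hypothetical sample input; a real "huge" file would be far larger.
printf '%s\n' 'book=livre' 'cat=chat' 'book=tome' 'book=livre' 'cat=chat' \
    > "$workdir/sample.txt"

# 1. Keep the first occurrence of each line and preserve input order.
#    awk stores every distinct line in memory, so this only suits files
#    whose unique lines fit in RAM.
awk '!seen[$0]++' "$workdir/sample.txt" > "$workdir/dedup_ordered.txt"

# 2. Sort and drop duplicates in one pass. Common sort implementations
#    (GNU sort, for one) spill to temporary files, so this copes with
#    inputs larger than memory, but the original order is lost.
LC_ALL=C sort -u "$workdir/sample.txt" > "$workdir/dedup_sorted.txt"

# 3. Deduplicate on a key only: here, the text before the first '='.
awk -F'=' '!seen[$1]++' "$workdir/sample.txt" > "$workdir/dedup_by_key.txt"
```

For a file in the tens of gigabytes, as in thread 8, the sort-based variant is usually the safer starting point because it does not need the whole set of unique lines in memory. The awk variants are worth trying first when order matters and the data is known to fit.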

LEARN ABOUT DEBIAN

wrap-and-sort

WRAP-AND-SORT(1)					      General Commands Manual						  WRAP-AND-SORT(1)

NAME

       wrap-and-sort - wrap long lines and sort items in Debian packaging files

SYNOPSIS

       wrap-and-sort [options]

DESCRIPTION

       wrap-and-sort  wraps the package lists in Debian control files. By default the lists will only be split into multiple lines if the entries
       are longer than 80 characters. wrap-and-sort sorts the package lists in Debian control files and all .install files. Besides that,
       wrap-and-sort removes trailing spaces in these files.

       This  script should be run in the root of a Debian package tree. It searches for control, control.in, copyright, copyright.in, install, and
       *.install in the debian directory.

OPTIONS

       -h, --help
	      Show this help message and exit.

       -a, --wrap-always
	      Wrap all package lists in the Debian control file even if the entries are shorter than 80 characters and could fit in one line.

       -s, --short-indent
	      Only indent wrapped lines by one space (default is in-line with the field name).

       -b, --sort-binary-packages
	      Sort binary package paragraphs by name.

       -k, --keep-first
	      When sorting binary package paragraphs, leave the first one at the top.  Unqualified debhelper(7) configuration files are applied to
	      the first package.

       -n, --no-cleanup
	      Do not remove trailing whitespace.

       -d path, --debian-directory=path
	      Location of the debian directory (default: ./debian).

       -f file, --file=file
	      Wrap  and sort only the specified file.  You can specify this parameter multiple times.  All supported files will be processed if no
	      files are specified.

       -v, --verbose
	      Print all files that are touched.
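
As a rough illustration of how the options above combine, here is a small wrapper sketch. The function name `tidy_debian_dir` and its return codes are hypothetical and not part of wrap-and-sort; only the `-a`, `-s`, `-v` and `-d` options come from the list above.

```shell
# Hypothetical helper: run wrap-and-sort only when it can actually work.
tidy_debian_dir() {
    dir="${1:-./debian}"    # same default as the -d option
    if [ ! -d "$dir" ]; then
        echo "no debian directory at $dir" >&2
        return 1
    fi
    if ! command -v wrap-and-sort >/dev/null 2>&1; then
        echo "wrap-and-sort is not installed" >&2
        return 2
    fi
    # -a: wrap lists even if they would fit on one line
    # -s: indent wrapped lines by a single space
    # -v: print every file that is touched
    wrap-and-sort -a -s -v -d "$dir"
}
```

Calling `tidy_debian_dir` with no argument from the root of a package tree corresponds to the documented default of `./debian`. Adding `-n` to the final command would leave trailing whitespace untouched.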

AUTHORS

       wrap-and-sort and this manpage have been written by Benjamin Drung <bdrung@debian.org>.

       Both are released under the ISC license.

DEBIAN
								 Debian Utilities						  WRAP-AND-SORT(1)
