Sponsored Content
Full Discussion: Remove dupes in a large file
Top Forums Shell Programming and Scripting Remove dupes in a large file Post 303024638 by MadeInGermany on Saturday 13th of October 2018 12:45:46 PM
Old 10-13-2018
In case there is a RAM shortage, the following variant helps (saves some bytes per line).
Code:
awk '!($0 in X) { print; X[$0] }' file > file.dedup

This User Gave Thanks to MadeInGermany For This Post:
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

remove a large number of user from oracle

Hi on solaris and oracle 10g2, I have number of users created in Oracle, I wonder if I have a list of the usernames will it be possible to remove the users quickly ? I want to keep the users access to system but oracle. some thing like shell script may be ?:confused: I am trying to... (4 Replies)
Discussion started by: upengan78
4 Replies

2. Shell Programming and Scripting

Sed or awk script to remove text / or perform calculations from large CSV files

I have a large CSV files (e.g. 2 million records) and am hoping to do one of two things. I have been trying to use awk and sed but am a newbie and can't figure out how to get it to work. Any help you could offer would be greatly appreciated - I'm stuck trying to remove the colon and wildcards in... (6 Replies)
Discussion started by: metronomadic
6 Replies

3. Shell Programming and Scripting

remove a specific line in a LARGE file

Hi guys, i have a really big file, and i want to remove a specific line. sed -i '5d' fileThis doesn't really work, it takes a lot of time... The whole script is supposed to remove every word containing less than 5 characters and currently looks like this: #!/bin/bash line="1"... (2 Replies)
Discussion started by: blubbiblubbkekz
2 Replies

4. Shell Programming and Scripting

Remove Duplicate Filenames in 2 very large directories

Hello Gurus, O/S RHEL4 I have a requirement to compare two linux based directories for duplicate filenames and remove them. These directories are close to 2 TB each. I have tried running a: Prompt>diff -r data1/ data2/ I have tried this as well: jason@jason-desktop:~$ cat script.sh ... (7 Replies)
Discussion started by: jaysunn
7 Replies

5. Shell Programming and Scripting

How to remove a subset of data from a large dataset based on values on one line

Hello. I was wondering if anyone could help. I have a file containing a large table in the format: marker1 marker2 marker3 marker4 position1 position2 position3 position4 genotype1 genotype2 genotype3 genotype4 with marker being a name, position a numeric... (2 Replies)
Discussion started by: davegen
2 Replies

6. UNIX for Dummies Questions & Answers

Filtering F-Dupes

Is there an easy way to tell FDupes what filetypes to look at or ignore? (0 Replies)
Discussion started by: furashgf
0 Replies

7. Shell Programming and Scripting

Removing Dupes from huge file- awk/perl/uniq

Hi, I have the following command in place nawk -F, '!a++' file > file.uniq It has been working perfectly as per requirements, by removing duplicates by taking into consideration only first 3 fields. Recently it has started giving below error: bash-3.2$ nawk -F, '!a++'... (17 Replies)
Discussion started by: makn
17 Replies

8. Shell Programming and Scripting

remove large portion of web page code between two tags

Hi everybody, I am trying to remove bunch of lines from web pages between two tags: one is <h1> and the other is <table it looks like <h1>Anniversary cards roses</h1> many lines here <table summary="Free anniversary greeting cards." cellspacing="8" cellpadding="8" width="70%">my goal... (5 Replies)
Discussion started by: georgi58
5 Replies

9. Shell Programming and Scripting

Removing dupes within 2 delimited areas in a large dictionary file

Hello, I have a very large dictionary file which is in text format and which contains a large number of sub-sections. Each sub-section starts with the following header : #DATA #VALID 1 and ends with a footer as shown below #END The data between the Header and the Footer consists of... (6 Replies)
Discussion started by: gimley
6 Replies

10. Shell Programming and Scripting

Modify script to remove dupes with two delimiters

Hello, I have a script which removes duplicates in a database with a single delimiter = The script is given below: # script to remove dupes from a row with structure word=word BEGIN{FS="="} {for(i=1;i<=NF;i++){a++;}for(i in a){b=b"="i}{sub("=","",b);$0=b;b="";delete a}}1 How do I modify... (6 Replies)
Discussion started by: gimley
6 Replies
Locale::Codes::LangFam(3pm)				 Perl Programmers Reference Guide			       Locale::Codes::LangFam(3pm)

NAME
Locale::Codes::LangFam - standard codes for language extension identification SYNOPSIS
use Locale::Codes::LangFam; $lext = code2langfam('apa'); # $lext gets 'Apache languages' $code = langfam2code('Apache languages'); # $code gets 'apa' @codes = all_langfam_codes(); @names = all_langfam_names(); DESCRIPTION
The "Locale::Codes::LangFam" module provides access to standard codes used for identifying language families, such as those as defined in ISO 639-5. Most of the routines take an optional additional argument which specifies the code set to use. If not specified, the default ISO 639-5 language family codes will be used. SUPPORTED CODE SETS
There are several different code sets you can use for identifying language families. A code set may be specified using either a name, or a constant that is automatically exported by this module. For example, the two are equivalent: $lext = code2langfam('apa','alpha'); $lext = code2langfam('apa',LOCALE_LANGFAM_ALPHA); The codesets currently supported are: alpha This is the set of three-letter (lowercase) codes from ISO 639-5 such as 'apa' for Apache languages. This is the default code set. ROUTINES
code2langfam ( CODE [,CODESET] ) langfam2code ( NAME [,CODESET] ) langfam_code2code ( CODE ,CODESET ,CODESET2 ) all_langfam_codes ( [CODESET] ) all_langfam_names ( [CODESET] ) Locale::Codes::LangFam::rename_langfam ( CODE ,NEW_NAME [,CODESET] ) Locale::Codes::LangFam::add_langfam ( CODE ,NAME [,CODESET] ) Locale::Codes::LangFam::delete_langfam ( CODE [,CODESET] ) Locale::Codes::LangFam::add_langfam_alias ( NAME ,NEW_NAME ) Locale::Codes::LangFam::delete_langfam_alias ( NAME ) Locale::Codes::LangFam::rename_langfam_code ( CODE ,NEW_CODE [,CODESET] ) Locale::Codes::LangFam::add_langfam_code_alias ( CODE ,NEW_CODE [,CODESET] ) Locale::Codes::LangFam::delete_langfam_code_alias ( CODE [,CODESET] ) These routines are all documented in the Locale::Codes::API man page. SEE ALSO
Locale::Codes The Locale-Codes distribution. Locale::Codes::API The list of functions supported by this module. http://www.loc.gov/standards/iso639-5/id.php ISO 639-5 . AUTHOR
See Locale::Codes for full author history. Currently maintained by Sullivan Beck (sbeck@cpan.org). COPYRIGHT
Copyright (c) 2011-2013 Sullivan Beck This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself. perl v5.18.2 2013-11-04 Locale::Codes::LangFam(3pm)
All times are GMT -4. The time now is 02:03 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy