find numeric duplicates from 300 million lines.... Post: 302669709

8 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Duplicate lines with one column different

Hi, I have the following lines in a file: SANDI108085FRANKLIN WRAP 7285 SANDI109514ZIPLOC STRETCH N SEAL 7285 SANDI110198CHOICE DM 0911 SANDI111144RANDOM WEIGHT BRAND 0704 SANDI111144RANDOM WEIGHT BRAND 0738...

2. Shell Programming and Scripting

How to delete lines in a file that have duplicates, or derive the lines that appear once

Input: a b b c d d. I need: a c. I know how to get the lines that have duplicates (b d) with sort file | uniq -d, but I need the opposite of this. I have searched the forum and other places as well, but have found solutions for everything except this variant of the problem.

3. Shell Programming and Scripting

Tail 86000 lines from 1.2 million line file?

I have a log file that is about 1.2 million lines long and about 300MB. We need a way to clean up this file and keep only the last few thousand lines. If I use the tail command, we run out of memory because the file is too big. I do have a keyword to match on. For example, we want to keep every line...

4. UNIX for Dummies Questions & Answers

Find and Replace random numeric value with non-numeric value

Can someone tell me how to change the first column in a very large 17k-line file from a random 10-digit numeric value to a non-numeric value? The format of lines in the file is: 1702938475,SNU022,201004. The 10-digit value always begins with 170.

5. UNIX for Dummies Questions & Answers

Only print lines with 3 numeric values

Hey guys & gals, I am hoping for some advice on a sed or awk command that will allow me to print only the lines from a file that contain 3 numeric values. From previous searches here, I saw that ygemici used the sed command to remove lines containing more than 3 numeric values; however, how...

6. UNIX for Dummies Questions & Answers

Help with changing the header of a TSV with 30 million lines

Hi, my 30-million-line file has a header: chr start end strand ref_context repeat_masked s1_smpl_context s1_c_count s1_ct_count s1_non_ct_count s1_m% s1_score s1_snp s1_indels s2_smpl_context s2_c_count s2_ct_count s2_non_ct_count s2_m% s2_score s2_snp s2_indels ...

7. Shell Programming and Scripting

Find duplicates in column 1 and merge their lines (awk?)

Hi, I have a file (sorted by sort) with 8 tab-delimited columns. The first column contains duplicated fields and I need to merge all these identical lines. My input file: comp100002 aaa bbb ccc ddd eee fff ggg comp100003 aba aba aba aba aba aba aba comp100003 fff fff fff fff fff fff fff...

8. Shell Programming and Scripting

Fast processing (mv command) of 1 million+ files using find, mv and xargs

Hi, I'd like to ask if anybody can help improve my code to move 1 million+ files from one directory to another: find /source/dir -name file* -type f | xargs -I '{}' mv {} /destination/dir I learned this line of code from this forum as well and it works fine. However, file movement is kinda...

LEARN ABOUT NETBSD

cap_mkdb

CAP_MKDB(1)						    BSD General Commands Manual 					       CAP_MKDB(1)

NAME

     cap_mkdb -- create capability database

SYNOPSIS

     cap_mkdb [-b | -l] [-v] [-f outfile] file1 [file2 ...]

DESCRIPTION

     cap_mkdb builds a hashed database out of the getcap(3) logical database constructed by the concatenation of the specified files.

     The database is named by the basename of the first file argument and the string ``.db''.  The getcap(3) routines can access the database in
     this form much more quickly than they can the original text file(s).

     The ``tc'' capabilities of the records are expanded before the record is stored into the database.

     The options are as follows:

	   -b	   Use big-endian byte order for database metadata.

	   -f outfile
		   Specify a different database basename.

	   -l	   Use little-endian byte order for database metadata.

	   -v	   Print out the number of capability records in the database.

     The -b and the -l flags are mutually exclusive.  The default byte ordering is the current host order.
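The ``tc'' expansion mentioned in DESCRIPTION can be sketched in Python as below. This is an illustrative model, not the getcap(3) implementation: it assumes each record has already been joined onto a single line, it splits fields on every colon (real getcap(3) also handles escaped colons), and MAX_DEPTH is a hypothetical limit chosen here only to stop tc loops. A reference that cannot be resolved is left in place as tc=NAME, which corresponds to the case the FORMAT section flags with special byte 1.

```python
from typing import Dict, List

MAX_DEPTH = 32  # hypothetical recursion limit for this sketch, not getcap's


def parse(records: List[str]) -> Dict[str, List[str]]:
    """Map every name in each record's name field to that record's fields."""
    table: Dict[str, List[str]] = {}
    for rec in records:
        fields = [f for f in rec.split(":") if f]  # drop empty fields from "::"
        for name in fields[0].split("|"):
            table.setdefault(name, fields)  # first definition of a name wins
    return table


def expand(fields: List[str], table: Dict[str, List[str]], depth: int = 0) -> List[str]:
    """Return fields with each tc=NAME replaced by NAME's capabilities."""
    out = [fields[0]]  # keep this record's own name field
    for cap in fields[1:]:
        if cap.startswith("tc="):
            target = table.get(cap[3:])
            if target is None or depth >= MAX_DEPTH:
                out.append(cap)  # unresolvable: keep tc= so it can be flagged
            else:
                # Splice in the referenced record's (expanded) capabilities,
                # without its name field.
                out.extend(expand(target, table, depth + 1)[1:])
        else:
            out.append(cap)
    return out


def expand_all(records: List[str]) -> List[str]:
    """Expand tc= in every record and rejoin it as name:cap:cap: text."""
    table = parse(records)
    result = []
    for rec in records:
        fields = [f for f in rec.split(":") if f]
        result.append(":".join(expand(fields, table)) + ":")
    return result
```

For example, with the records base:co#80:am: and vt100|dec:li#24:tc=base:, the second record expands to vt100|dec:li#24:co#80:am:, while bad:tc=nowhere: is returned unchanged because nowhere is not defined.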

FORMAT

     The following is a description of the hashed database created by cap_mkdb.  For a description of the format of the input files see
     termcap(5).

     Each record is stored in the database using two different types of keys.

     The first type is a key which consists of the first capability of the record (not including the trailing colon (``:'')) with a data field
     consisting of a special byte followed by the rest of the record.  The special byte is either a 0 or 1, where a 0 means that the record is
     okay, and a 1 means that there was a ``tc'' capability in the record that couldn't be expanded.

     The second type is a key which consists of one of the names from the first capability of the record with a data field consisting of a special
     byte followed by the first capability of the record.  The special byte is a 2.

     In normal operation, names are looked up in the database, resulting in a key/data pair of the second type.  The data field of this key/data
     pair is used to look up a key/data pair of the first type, which has the real data associated with the name.
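The two key types and the two-step lookup described above can be modelled with an in-memory Python dict standing in for the dbopen(3) hash file. This is a sketch, not cap_mkdb's own code: the byte values 0, 1 and 2 come from the text above, but the names RECOK, TCERR and SHADOW are hypothetical labels, the whole record text is stored after the flag byte (one reading of "the rest of the record"), a record still containing tc= is assumed to be one whose expansion failed, and name keys never overwrite an existing key, so a record whose name field holds a single name is found directly through its first-type key.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical names for the special bytes described in FORMAT.
RECOK = b"\x00"   # record stored, every tc= expanded
TCERR = b"\x01"   # record still has a tc= that could not be expanded
SHADOW = b"\x02"  # name key whose data is the record's first capability


def build_db(records: List[str]) -> Dict[bytes, bytes]:
    """Store each (already tc-expanded) record under both key types."""
    db: Dict[bytes, bytes] = {}
    # First pass: first-type keys, so later name keys can never shadow them.
    for rec in records:
        first = rec.split(":", 1)[0]  # first capability, no trailing colon
        flag = TCERR if ":tc=" in rec else RECOK  # assumption, see lead-in
        db[first.encode()] = flag + rec.encode()
    # Second pass: one second-type key per name in the first capability.
    for rec in records:
        first = rec.split(":", 1)[0]
        for name in first.split("|"):
            db.setdefault(name.encode(), SHADOW + first.encode())
    return db


def lookup(db: Dict[bytes, bytes], name: str) -> Optional[Tuple[int, str]]:
    """Resolve a name to (special byte, record text), or None if absent."""
    data = db.get(name.encode())
    if data is None:
        return None
    if data[:1] == SHADOW:
        # Second-type pair: follow the first capability to the real record.
        data = db[data[1:]]
    return data[0], data[1:].decode()
```

With the records vt100|dec-vt100|DEC VT100:co#80:li#24: and dumb:co#80:, looking up dec-vt100 takes one indirection through the second-type pair and returns special byte 0 with the full vt100 record, while dumb is resolved directly from its first-type key.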

EXIT STATUS

     The cap_mkdb utility exits 0 on success and >0 if an error occurs.

SEE ALSO

     dbopen(3), getcap(3), termcap(5)

BSD
								   June 6, 1993 							       BSD