Sponsored Content
Top Forums Shell Programming and Scripting find numeric duplicates from 300 million lines.... Post 302669683 by methyl on Wednesday 11th of July 2012 08:06:39 AM
Old 07-11-2012
Is the data in any particular order? Did it come from a database?

Do you have a database engine? Some processes are just not suitable for Shell tools.
 

8 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

duplicates lines with one column different

Hi I have the following lines in a file SANDI108085FRANKLIN WRAP 7285 SANDI109514ZIPLOC STRETCH N SEAL 7285 SANDI110198CHOICE DM 0911 SANDI111144RANDOM WEIGHT BRAND 0704 SANDI111144RANDOM WEIGHT BRAND 0738... (10 Replies)
Discussion started by: dhanamurthy
10 Replies

2. Shell Programming and Scripting

How to delete lines in a file that have duplicates or derive the lines that aper once

Input: a b b c d d I need: a c I know how to get this (the lines that have duplicates) : b d sort file | uniq -d But i need opossite of this. I have searched the forum and other places as well, but have found solution for everything except this variant of the problem. (3 Replies)
Discussion started by: necroman08
3 Replies

3. Shell Programming and Scripting

Tail 86000 lines from 1.2 million line file?

I have a log file that is about 1.2 million lines long and about 300MB. we need a way to clean up this file and only keep the last few thousand lines. if i use tail command we run our of memory as the file is too big. I do have a key word to match on. example, we want to keep every line... (8 Replies)
Discussion started by: robsonde
8 Replies

4. UNIX for Dummies Questions & Answers

Find and Replace random numeric value with non-numeric value

Can someone tell me how to change the first column in a very large 17k line file from a random 10 digit numeric value to a non numeric value. The format of lines in the file is: 1702938475,SNU022,201004 the first 10 numbers always begin with 170 (6 Replies)
Discussion started by: Bahf1s
6 Replies

5. UNIX for Dummies Questions & Answers

Only print lines with 3 numeric values

Hey guys & gals, I am hoping for some advice on a sed or awk command that will allow to only print lines from a file that contain 3 numeric values. From previous searches here I saw that ygemici used the sed command to remove lines containing more than 3 numeric values ; however how... (3 Replies)
Discussion started by: TAPE
3 Replies

6. UNIX for Dummies Questions & Answers

Help with changing header of tsv with 30 million lines

Hi My 30 million line file has a header chr start end strand ref_context repeat_masked s1_smpl_context s1_c_count s1_ct_count s1_non_ct_count s1_m% s1_score s1_snp s1_indels s2_smpl_context s2_c_count s2_ct_count s2_non_ct_count s2_m% s2_score s2_snp s2_indels ... (2 Replies)
Discussion started by: plumb_r
2 Replies

7. Shell Programming and Scripting

Find duplicates in column 1 and merge their lines (awk?)

Hi, I have a file (sorted by sort) with 8 tab delimited columns. The first column contains duplicated fields and I need to merge all these identical lines. My input file: comp100002 aaa bbb ccc ddd eee fff ggg comp100003 aba aba aba aba aba aba aba comp100003 fff fff fff fff fff fff fff... (5 Replies)
Discussion started by: falcox
5 Replies

8. Shell Programming and Scripting

Fast processing(mv command) of 1 million+ files using find, mv and xargs

Hi, I'd like to ask if anybody can help improve my code to move 1 million+ files from a directory to another: find /source/dir -name file* -type f | xargs -I '{}' mv {} /destination/dir I learned this line of code from this forum as well and it works fine. However, file movement is kinda... (6 Replies)
Discussion started by: agentgrecko
6 Replies
LOCATE(1)						    BSD General Commands Manual 						 LOCATE(1)

NAME
locate -- find files SYNOPSIS
locate [-d dbpath] pattern DESCRIPTION
locate searches a database for all pathnames which match the specified pattern. The database is recomputed periodically, and contains the pathnames of all files which are publicly accessible. Shell globbing and quoting characters (``*'', ``?'', ``'', ``['' and ``]'') may be used in pattern, although they will have to be escaped from the shell. Preceding any character with a backslash (``'') eliminates any special meaning which it may have. The matching differs in that no characters must be matched explicitly, including slashes (``/''). As a special case, a pattern containing no globbing characters (``foo'') is matched as though it were ``*foo*''. Options: -d dbpath Sets the list of databases to search to dbpath which can name one or more database files separated by ``:'', an empty component in the list represents the default database. The environment variable LOCATE_PATH has the same effect. EXIT STATUS
locate exits with a 0 if a match is found, and >0 if no match is found or if another problem (such as a missing or corrupted database file) is encountered. FILES
/var/db/locate.database Default database /usr/libexec/locate.updatedb Script to update database. SEE ALSO
find(1), fnmatch(3), weekly.conf(5) Woods, James A., "Finding Files Fast", ;login, 8:1, pp. 8-10, 1983. HISTORY
The locate command appeared in 4.4BSD. BSD
April 5, 2003 BSD
All times are GMT -4. The time now is 07:20 AM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy