"Find common numbers from two very large files using awk or the like"

Post #302799595 by hanson44 on Friday 26th of April 2013 05:51:50 PM

Code:
$ cat file1
111111111111111
123456000017214
123456000017255
123456000018300
123456000100123
123456000100253
223456000001212
223456000013212

Code:
$ cat file2
123456000017214
123456000017255
123456000018300
123456000100123
123456000100253
223456000001212
223456000013212
999999999999999

Code:
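$ cat scottie.sh
#!/bin/sh
# Tag every line with the number of the file it came from.
sed "s/.*/1 &/" file1 > file1.lbl
sed "s/.*/2 &/" file2 > file2.lbl
# Sort on the number so identical values from both files sit together.
cat file1.lbl file2.lbl | sort -n -k 2 > all.lbl
# Ignore the tag field and print each value that occurs more than once.
uniq -d -f 1 all.lbl | cut -f 2 -d " "
rm file1.lbl file2.lbl all.lbl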

Code:
$ ./scottie.sh
123456000017214
123456000017255
123456000018300
123456000100123
123456000100253
223456000001212
223456000013212
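
One caveat with the labelled-file approach: a number repeated inside a single file would also be reported by uniq -d, even if the other file does not contain it. Here is a rough sketch of two alternatives that avoid this, assuming one number per line. The function names common_lines and common_lines_awk are made up for illustration, and mktemp is assumed to be available (it is not required by POSIX).

```shell
#!/bin/sh
# Sketch only: two ways to print the lines that occur in both files.
# common_lines and common_lines_awk are hypothetical helper names.

# Variant 1: sort + comm. sort spills to disk, so memory use stays
# bounded even for very large inputs.
common_lines() {
    tmp1=$(mktemp) || return 1
    tmp2=$(mktemp) || { rm -f "$tmp1"; return 1; }
    # sort -u drops duplicates within each file, so a number repeated
    # inside one file is not mistaken for a match across files.
    # LC_ALL=C keeps sort and comm agreeing on the collation order.
    LC_ALL=C sort -u "$1" > "$tmp1"
    LC_ALL=C sort -u "$2" > "$tmp2"
    # comm -12 suppresses the "only in file 1" and "only in file 2"
    # columns, leaving the lines common to both.
    LC_ALL=C comm -12 "$tmp1" "$tmp2"
    rm -f "$tmp1" "$tmp2"
}

# Variant 2: awk. No sorting needed, but every distinct line of the
# first file is held in memory, so pass the smaller file first.
common_lines_awk() {
    awk 'NR == FNR { seen[$0] = 1; next }      # first file: remember lines
         ($0 in seen) && !printed[$0]++        # second file: print matches once
        ' "$1" "$2"
}
```

Either one would be called as common_lines file1 file2. The comm variant prints in byte-sorted order, which matches numeric order only when all numbers have the same width (as in the sample data); the awk variant prints in the order of the second file.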

 

