Quick way to select many records from a large file


 
# 1  
04-27-2015

I have a file named records.txt containing a large number of records (around 0.5 million) in the format below:
Code:
28433005 1 1 3 2 2 2 2 2 2 2 2 2 2 2
28433004 0 2 3 2 2 2 2 2 2 1 2 2 2 2
...

The other file is a key file, key.txt, which lists some of the numbers that appear in the first column of records.txt.
Code:
28433004
28815001
...

There are about 0.2 million numbers in key.txt. I am trying to pick out the records from records.txt whose first field appears in key.txt. I tried the script below:

pick_records.s
Code:
foreach line (`cat key.txt`)
awk -v key="$line" '$1==key {print $0;exit}' records.txt
end

I ran the script with: source pick_records.s > output.txt

The script did the job but ran slowly. I am wondering if there is a more efficient way to achieve this.

Thanks.

Last edited by vgersh99; 04-27-2015 at 08:08 PM. Reason: code tags, please!
# 2  
04-27-2015
Code:
awk 'FNR==NR {keys[$1];next} $1 in keys' key.txt records.txt
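For anyone wondering why this is so much faster: the original loop starts one awk per key (about 200,000 processes), and each one rescans records.txt until it finds its key, while the one-liner reads each file once and looks keys up in an awk associative array. Below is a commented sketch of the same idea wrapped in shell functions. The function names and demo rows are made up for illustration, and the key-order variant is an extra that is not part of the reply above.
Code:
```shell
#!/bin/sh
# Commented sketch of the single-pass lookup (function names are hypothetical).
# Assumes whitespace-separated records whose key is field 1.

# pick_by_keys KEYFILE RECORDFILE
#   While awk reads KEYFILE, FNR == NR, so each key is stored as an array
#   index and "next" skips the rest of the program. For RECORDFILE,
#   FNR != NR, so the bare pattern "$1 in keys" prints every record whose
#   first field was stored. Output follows RECORDFILE order.
#   Caveat: if KEYFILE is empty, FNR == NR also holds for RECORDFILE, so
#   nothing is printed.
pick_by_keys() {
    awk 'FNR == NR { keys[$1]; next }
         $1 in keys' "$1" "$2"
}

# pick_in_key_order KEYFILE RECORDFILE   (variant, not from the thread)
#   Loads RECORDFILE into memory first, then prints matches in KEYFILE
#   order, as the original per-key loop did. If several records share a
#   key, only the last one is kept.
pick_in_key_order() {
    awk 'FNR == NR { rec[$1] = $0; next }
         $1 in rec { print rec[$1] }' "$2" "$1"
}

# Demo on throwaway files with made-up rows.
tmp=$(mktemp -d)
printf '28433005 1 1 3\n28433004 0 2 3\n28815001 2 2 2\n' > "$tmp/records.txt"
printf '28815001\n28433004\n' > "$tmp/key.txt"
pick_by_keys "$tmp/key.txt" "$tmp/records.txt"       # 28433004 row, then 28815001 row
pick_in_key_order "$tmp/key.txt" "$tmp/records.txt"  # 28815001 row, then 28433004 row
rm -r "$tmp"
```

With the thread's file names, "pick_by_keys key.txt records.txt > output.txt" does the same as the one-liner.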

# 3  
04-27-2015
Hello and welcome to the forum.

Please use code tags for code block examples, as required by the forum rules.

I'm not sure about your script, as I don't know csh or ksh.
In bash I'd do:
Code:
# Runs one awk per key; each rescans records.txt until it finds that key
while read -r line; do
   awk -v key="$line" '$1 == key {print; exit}' records.txt
done < key.txt

Then execute it:
Code:
bash pick_records.s > output.txt

Note that I find the name "pick_" quite misleading, since you don't pick (as in: select a single entry) anything, but run through every line.

Half a million lines (or even just 200,000 keys) take their time to process, especially when every key starts a new awk that rescans records.txt.
You can measure the difference by putting time in front of the command:
Code:
time bash pick_records.s > output.txt

Hope this helps (hth)
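To make the timing comparison concrete, here is a sketch that times the per-key loop against the single awk pass from reply #2 on generated data. It assumes bash, whose time keyword can time shell functions; loop_pick and hash_pick are hypothetical names, and N_REC/N_KEY are arbitrary sizes kept small so the loop finishes quickly. Timings will depend on the machine and the awk implementation.
Code:
```shell
#!/usr/bin/env bash
# Sketch: time the per-key loop against the single awk pass on made-up data.
# Needs bash: its "time" keyword can time shell functions.
# N_REC and N_KEY are arbitrary sizes, small enough for the loop to finish.
N_REC=${N_REC:-2000}
N_KEY=${N_KEY:-500}
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# Records: an 8-digit key plus dummy fields. Keys: every 4th record's key.
awk -v n="$N_REC" 'BEGIN { for (i = 1; i <= n; i++) print 28000000 + i, 1, 2, 3 }' \
    > "$dir/records.txt"
awk -v n="$N_KEY" 'BEGIN { for (i = 1; i <= n; i++) print 28000000 + 4 * i }' \
    > "$dir/key.txt"

# Per-key loop: one awk process per key, each rescanning records.txt.
loop_pick() {
    while read -r k; do
        awk -v key="$k" '$1 == key { print; exit }' "$2"
    done < "$1"
}

# Single pass: keys are loaded into an awk array, each record is checked once.
hash_pick() {
    awk 'FNR == NR { keys[$1]; next } $1 in keys' "$1" "$2"
}

time loop_pick "$dir/key.txt" "$dir/records.txt" > "$dir/loop.out"
time hash_pick "$dir/key.txt" "$dir/records.txt" > "$dir/hash.out"

# The generated keys are in the same ascending order as the records, so the
# two outputs should be byte-for-byte identical.
cmp -s "$dir/loop.out" "$dir/hash.out" && echo "outputs match"
```

Raising the sizes (for example N_REC=20000 N_KEY=5000 bash compare.sh, where compare.sh is whatever name the sketch is saved under) should show the loop's cost growing roughly with the product of the two sizes, and the single pass with their sum.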
# 4  
04-28-2015
Hi vgersh99 and sea,

Thanks for your solutions.

I tried vgersh99's one-line awk command. It worked great and solved my problem in seconds. Amazing! Thanks again.
# 5  
04-28-2015
If you can edit key.txt to prefix each key with ^ and follow it with a space, then maybe you can use it with grep like this:
Code:
grep -f key.txt records.txt

Does this help? There might be problems with the size of key.txt, though.



Robin
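A sketch of that pattern-file idea, for anyone trying it: sed turns each key into "^KEY " (caret, key, space), so a key can only match at the start of a record and cannot match a longer number. A trailing $ instead of the space would only match lines consisting of the key alone, since every record has more fields after it. This assumes digit-only keys and single spaces between fields, as in the sample; make_patterns and grep_pick are hypothetical helper names. With about 200,000 regex patterns, grep may still be slow or memory-hungry.
Code:
```shell
#!/bin/sh
# Sketch of the anchored-pattern approach; helper names are hypothetical.
# Assumes digit-only keys (no regex metacharacters) and a single space
# between the key and the next field, as in the sample records.

# make_patterns KEYFILE: print one "^KEY " pattern per key. The trailing
# space (instead of $) allows the rest of the record to follow, and stops
# 28433004 from matching a longer key such as 284330041.
make_patterns() {
    sed 's/.*/^& /' "$1"
}

# grep_pick KEYFILE RECORDFILE: match records against the generated patterns
# and return grep's exit status.
grep_pick() {
    pat_file=$(mktemp)
    make_patterns "$1" > "$pat_file"
    grep -f "$pat_file" "$2"
    status=$?
    rm -f "$pat_file"
    return "$status"
}

# Alternative with GNU or BSD grep (-w is not POSIX): fixed strings, whole
# words only, but a key could then also match a later field:
#   grep -F -w -f key.txt records.txt
```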
# 6  
05-05-2015
Hi rbatte1,

I tried the grep method too. It didn't work well. Thanks.

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Select records and fields

Hi All I would like to modify a file like this: >antax gioq21 tris notes abcdefghij klmnopqrs >betax gion32 ter notes2 tuvzabcdef ahgskslsoo in this: >tris abcdefghij klmnopqrs >ter tuvzabcdef ahgskslsoo So, I would like to remove the first two fields (and output field 3) in record... (4 Replies)
Discussion started by: giuliangiuseppe

2. Shell Programming and Scripting

Split a large file in n records and skip a particular record

Hello All, I have a large file, more than 50,000 lines, and I want to split it in even 5000 records. Which I can do using sed '1d;$d;' <filename> | awk 'NR%5000==1{x="F"++i;}{print > x}'Now I need to add one more condition that is not to break the file at 5000th record if the 5000th record... (20 Replies)
Discussion started by: ibmtech

3. Shell Programming and Scripting

Block of records to select from a file

Hello: I am new to shell script programming. Now I would like to select specific records block from a file. For example, current file "xyz.txt" is containing 1million records and want to select the block of records from line number 50000 to 100000 and save into a file. Can anyone suggest me how... (3 Replies)
Discussion started by: nvkuriseti

4. Shell Programming and Scripting

awk - splitting 1 large file into multiple based on same key records

Hello gurus, I am new to "awk" and trying to break a large file having 4 million records into several output files each having half million but at the same time I want to keep the similar key records in the same output file, not to exist accross the files. e.g. my data is like: Row_Num,... (6 Replies)
Discussion started by: kam66

5. Shell Programming and Scripting

Automatically select records from several files and then run a C executable file inside the script

Dear list its my first post and i would like to greet everyone What i would like to do is select records 7 and 11 from each files in a folder then run an executable inside the script for the selected parameters. The file format is something like this 7 100 200 7 100 250 7 100 300 ... (1 Reply)
Discussion started by: Gtolis

6. Shell Programming and Scripting

How to Pick Random records from a large file

Hi, I have a huge file say with 2000000 records. The file has 42 fields. I would like to pick randomly 1000 records from this huge file. Can anyone help me how to do this? (1 Reply)
Discussion started by: ajithshankar@ho

7. Shell Programming and Scripting

select records from one file based on a second file

Hi all: I have two files: file1: 74 DS 9871 199009871 1 1990 4 1 165200 Sc pr de te sa ox 1.0 1.0 13.0000 35.7560 5.950 3.0 3.0 13.0100 35.7550 5.970 ** 74 DS 99004 74DS99004 6738 1990 4 1 165200 Eb pr de te sa ox 1.0 1.0 13.0000 ... (7 Replies)
Discussion started by: rleal

8. Shell Programming and Scripting

Extract data from large file 80+ million records

Hello, I have got one file with more than 120 million records (35 GB in size). I have to extract some relevant data from the file based on some parameter and generate another output file. What will be the best and fastest way to extract the new file? Sample file format:... (2 Replies)
Discussion started by: learner16s

9. UNIX for Dummies Questions & Answers

Quick question about finding a large file

what is the correct command for finding the largest file and displaying it without any error information? I can find it, but how do I display it in the same command? (6 Replies)
Discussion started by: raidkridley

10. Shell Programming and Scripting

Using a variable to select records with awk

As part of a bigger task, I had to read thru a file and separate records into various batches based on a field. Specifically, separate records based on the value in the batch field as defined below. The batch field left-justified numbers. The datafile is here > cat infile 12345 1 John Smith ... (5 Replies)
Discussion started by: joeyg