Post 302852719 by panyam on Thursday 12th of September 2013 09:25:06 AM
awk to parse huge files

Hello All,

I have a situation as below:

(1) Read a source file (a single file with 1.2 million rows).
(2) Read the destination files one by one and replace a few fields in each with the corresponding matching field from the source file.

I tried the following (please note this is pseudocode, not the complete code):

Code:
awk -F"|" 'NR==FNR { array[$1]=$2; next }
           { gsub(<fields in dest file>, array[<field positions in dest file>], $0); print $0 }' source_file dest_files*.dat

The flaw in the above code is that every row gets printed, regardless of whether there is a matching string, and performance is also poor.

Any suggestions would be appreciated.

Regards,
Ravi
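
One possible direction, as a minimal sketch rather than a drop-in fix: test for the key with `in` before touching the row, and assign the field directly instead of running gsub over the whole record. The file layouts here are assumptions made up for illustration (source_file as key|value, a destination file dest_1.dat as id|key|payload, with the key in field 2 and the field to replace in field 3); the real field positions are not given in the post.

```shell
#!/bin/sh
# Hypothetical sample data: file names and layouts are assumptions.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# source_file: key|replacement value
printf '%s\n' 'K1|alpha' 'K2|beta' > source_file

# dest_1.dat: id|key|payload (key assumed in field 2, payload in field 3)
printf '%s\n' '1|K1|old' '2|K9|old' '3|K2|old' > dest_1.dat

result=$(awk -F'|' -v OFS='|' '
    # First file on the command line: build the lookup table key -> value.
    NR == FNR { map[$1] = $2; next }
    # Later files: act only on rows whose key exists in the table.
    # "in" tests membership without creating an empty array element,
    # whereas a bare map[$2] reference would add one per unmatched key.
    ($2 in map) { $3 = map[$2]; print }
' source_file dest_1.dat)

printf '%s\n' "$result"

# Clean up the scratch directory.
cd / && rm -rf "$workdir"
```

With the sample data this prints `1|K1|alpha` and `3|K2|beta`; the row with key K9 is skipped because it has no match in the source file. If unmatched rows should pass through unchanged instead, the `print` can be moved into a separate pattern-less rule. Whether this is fast enough for 1.2 million source rows depends on the awk implementation and available memory, so it is worth timing on real data.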
 

Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.