awk input large file


 
# 1  
Old 05-21-2013
awk input large file

Hi... Does anyone know how to feed a huge file, about 25 GB, to awk?

If it is a single file, then this works:

Code:
awk '{print}' <hugefile

But suppose I have to use something like this:

Code:
awk 'FNR==NR{x[++a]=$0;next}{print $0,x[FNR]}' hugefile1 hugefile2

Then how do I redirect the input? And is there any provision to assign more memory to an awk program?


Those who know, kindly answer.

Last edited by Akshay Hegde; 05-21-2013 at 07:32 AM..
# 2  
Old 05-21-2013
I am not 100% sure here... However, awk/gzip and other similar utilities use the /var or /tmp filesystem to store temporary data, and awk is largefile-aware, so you shouldn't get any issues as long as the place you are redirecting to has enough space.

Are you getting any error message?
# 3  
Old 05-21-2013
I am not getting any error message.

I tried to redirect like this:
Code:
awk 'FNR==NR{x[++a]=$0;next}{print $0,x[FNR]}' <hugefile1 <hugefile2 >outputfile

The output file was empty, so I thought I might be doing something wrong while inputting.
# 4  
Old 05-21-2013
<hugefile1 <hugefile2 is the problem; it should be just hugefile1 hugefile2. awk opens its file operands itself. With two < redirections the shell applies them in order and awk ends up reading only hugefile2 on stdin, so the FNR==NR block consumes every line and nothing is ever printed.
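A minimal sketch of the corrected invocation, with tiny stand-in files in place of the 25 GB ones (filenames taken from the thread):

```shell
# awk opens file operands itself; only the output needs a redirection.
printf 'a\nb\n' > hugefile1          # stand-in for the real 25 GB file
printf '1\n2\n' > hugefile2
awk 'FNR==NR{x[++a]=$0; next} {print $0, x[FNR]}' hugefile1 hugefile2 > outputfile
cat outputfile
# -> 1 a
#    2 b
```

While the first operand is read, FNR==NR holds and lines go into x[]; during the second operand, each line is printed alongside the cached line with the same line number.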
# 5  
Old 05-21-2013
OK, I will try, but I guess I will get an error message. Do you think an array can hold 25 GB of data?
# 6  
Old 05-21-2013
Quote:
Originally Posted by Akshay Hegde
OK, I will try, but I guess I will get an error message. Do you think an array can hold 25 GB of data?
Never tried it, so I can't comment.
# 7  
Old 05-22-2013
I am going to try now; I will post the result.

Hi Vidyadhar85... I finally received this error message:

Code:
cmd. line:1: (FILENAME=t2.tmp FNR=26245271) fatal: dupnode: r->stptr: can't allocate 14 bytes of memory (Cannot allocate memory)

Code:
awk 'FNR==NR{
		FLG[++i]=$0
		next
	    }
	    {
		print $0,FLG[FNR]
	    }' t2.tmp t1.tmp >out.dat
