Just an idea
Why don't you split the file into smaller files of 1GB each? Then run Pederarbo's awk script on each of the split files. Awk reads its input as a stream, one line at a time, so it never needs to hold a whole file in memory and is hard to beat for this kind of job.
Once the files have been cleaned, you can append them back into a single file.
Splitting the file is just an idea, but it might save time, because you would be handling smaller sets of data, each flowing through in one continuous stream, rather than one large 10GB file.
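To make the idea concrete, here is a rough sketch of the split, clean, and append steps in Python. It is only an illustration under some assumptions: the function names are made up for this example, and `clean_line` is a placeholder (it just trims trailing whitespace) standing in for whatever Pederarbo's awk script actually does. The split happens on line boundaries so that no record is cut in half.

```python
import os
import shutil
import tempfile


def split_by_lines(src_path, out_dir, max_bytes):
    """Split src_path into chunk files of at most about max_bytes each,
    always breaking between lines. Returns the chunk paths in order."""
    chunk_paths = []
    out = None
    written = 0
    with open(src_path, "rb") as src:
        for line in src:
            # Start a new chunk at the beginning, or when this line would
            # push a non-empty chunk past the size limit.
            if out is None or (written > 0 and written + len(line) > max_bytes):
                if out is not None:
                    out.close()
                path = os.path.join(out_dir, f"chunk_{len(chunk_paths):05d}")
                out = open(path, "wb")
                chunk_paths.append(path)
                written = 0
            out.write(line)
            written += len(line)
    if out is not None:
        out.close()
    return chunk_paths


def clean_line(line):
    """Placeholder cleansing rule (an assumption, not the real awk logic):
    strip trailing whitespace and end the line with a single newline."""
    return line.rstrip() + b"\n"


def clean_chunk(path):
    """Stream one chunk through clean_line and write it to '<path>.clean'."""
    cleaned = path + ".clean"
    with open(path, "rb") as src, open(cleaned, "wb") as dst:
        for line in src:
            dst.write(clean_line(line))
    return cleaned


def join_chunks(paths, dest_path):
    """Append the given files, in order, into dest_path."""
    with open(dest_path, "wb") as dst:
        for path in paths:
            with open(path, "rb") as part:
                shutil.copyfileobj(part, dst)


def split_clean_join(src_path, dest_path, max_bytes=1024 ** 3):
    """Split into ~1GB chunks by default, clean each, then rejoin.
    Returns the number of chunks that were created."""
    with tempfile.TemporaryDirectory() as tmp:
        chunks = split_by_lines(src_path, tmp, max_bytes)
        cleaned = [clean_chunk(p) for p in chunks]
        join_chunks(cleaned, dest_path)
    return len(chunks)
```

Calling `split_clean_join("big.txt", "big.clean.txt")` (both file names are hypothetical) would run the three steps end to end. Note that the temporary chunks need roughly as much free disk space again as the original file, and whether this beats a single pass over the 10GB file is something you would have to measure.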