Need optimized awk/perl/shell to give statistics for a large delimited file


# 1  

I have a file of around 24 GB with 14 columns, delimited by "|".

My requirement: can anyone provide the fastest and best way to get the results below?

Number of records in the file
First and second columns: unique counts
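
A minimal single-pass sketch for the counts as stated, assuming the data is in a file named file and a POSIX awk:

Code:
# One pass: record count plus distinct-value counts for columns 1 and 2.
# a[] and b[] remember values already seen; c1 and c2 count first sightings.
awk -F'|' '!a[$1]++ { c1++ } !b[$2]++ { c2++ }
    END { print "records:", NR; print "distinct col1:", c1; print "distinct col2:", c2 }' file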

Thanks for your time
Karti

------ Post updated at 04:03 PM ------

Correction:

Number of records in the file
First and second columns: the distinct column values, not the counts.
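
A simple baseline for the distinct values, assuming standard coreutils and a file named file (sort -u spills to disk, so it works even when the distinct values do not fit in memory):

Code:
wc -l file                              # number of records
cut -d'|' -f1 file | LC_ALL=C sort -u   # distinct values in column 1
cut -d'|' -f2 file | LC_ALL=C sort -u   # distinct values in column 2

This makes three passes over the 24 GB file; LC_ALL=C lets sort compare raw bytes, which is typically much faster. A single-pass awk version appears in the reply below.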
# 3  
Thanks. I need to redirect the distinct column 1 and column 2 values to the files dis_col1.txt and dis_col2.txt, respectively. The file size is huge (24 GB). Thanks for your quick reply and time.
# 4  
something like:

Code:
awk -F\| '!a[$1]++ { print $1 > "dis_col1.txt"; } !b[$2]++ { print $2 > "dis_col2.txt"; } END { print NR; }' file
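
A note on the idiom: !a[$1]++ is true only the first time a value appears, so each distinct value is written exactly once, in input order, and END { print NR } reports the record count, all in a single pass over the 24 GB file. The trade-off is that the arrays a and b hold every distinct value in memory; if the two columns have very high cardinality, the disk-backed sort -u baseline above, redirected to dis_col1.txt and dis_col2.txt, may be the safer choice.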
