Sponsored Content
Homework and Emergencies Emergency UNIX and Linux Support Help to make awk script more efficient for large files Post 302526565 by otheus on Wednesday 1st of June 2011 12:04:42 AM
Old 06-01-2011
lol, yeah, you were reinventing the wheel. Oh well.

sort -s -k 3,6

for instance, sorts on the 3rd through 6th fields of the input. The -s allows the sort to be "stable" so you can make another sort later and the ordering will be consistent.

You can use uniq -c to count the number of times each line appears in a sorted list. Other options give you the ability to count only repeated lines, only count unique lines, skip the first N fields, etc.

comm is pretty useful too, reporting on lines common to both files (or only in the first but not the second, or vice-versa)
 

8 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Is there a way to make this more efficient

I have the following code. printf "Test Message Report" > report.txt while read line do msgid=$(printf "%n" "$line" | cut -c1-6000| sed -e 's///g' -e 's|.*ex:Msg\(.*\)ex:Msg.*|\1|') putdate=$(printf "%n" "$line" | cut -c1-6000| sed -e 's///g' -e 's|.*PutDate\(.*\)PutTime.*|\1|')... (9 Replies)
Discussion started by: gugs
9 Replies

2. Shell Programming and Scripting

Sed or awk script to remove text / or perform calculations from large CSV files

I have a large CSV files (e.g. 2 million records) and am hoping to do one of two things. I have been trying to use awk and sed but am a newbie and can't figure out how to get it to work. Any help you could offer would be greatly appreciated - I'm stuck trying to remove the colon and wildcards in... (6 Replies)
Discussion started by: metronomadic
6 Replies

3. Shell Programming and Scripting

AWK Shell Program to Split Large Files

Hi, I need some help creating a tidy shell program with awk or other language that will split large length files efficiently. Here is an example dump: <A001_MAIL.DAT> 0001 Ronald McDonald 01 H81 0002 Elmo St. Elmo 02 H82 0003 Cookie Monster 01 H81 0004 Oscar ... (16 Replies)
Discussion started by: mkastin
16 Replies

4. Shell Programming and Scripting

Running rename command on large files and make it faster

Hi All, I have some 80,000 files in a directory which I need to rename. Below is the command which I am currently running and it seems, it is taking fore ever to run this command. This command seems too slow. Is there any way to speed up the command. I have have GNU Parallel installed on my... (6 Replies)
Discussion started by: shoaibjameel123
6 Replies

5. Programming

Help with make this Fortran code more efficient (in HPC manner)

Hi there, I had run into some fortran code to modify. Obviously, it was written without thinking of high performance computing and not parallelized... Now I would like to make the code "on track" and parallel. After a whole afternoon thinking, I still cannot find where to start. Can any one... (3 Replies)
Discussion started by: P_E_M_Lee
3 Replies

6. Shell Programming and Scripting

Process multiple large files with awk

Hi there, I'm camor and I'm trying to process huge files with bash scripting and awk. I've got a dataset folder with 10 files (16 millions of row each one - 600MB), and I've got a sorted file with all keys inside. For example: a sample_1 200 a.b sample_2 10 a sample_3 10 a sample_1 10 a... (4 Replies)
Discussion started by: camor
4 Replies

7. Shell Programming and Scripting

Combining awk command to make it more efficient

VARIABLE="jhovan 5259 5241 0 20:11 ? 00:00:00 /proc/self/exe --type=gpu-process --channel=5182.0.1597089149 --supports-dual-gpus=false --gpu-driver-bug-workarounds=2,45,57 --disable-accelerated-video-decode --gpu-vendor-id=0x80ee --gpu-device-id=0xbeef --gpu-driver-vendor... (3 Replies)
Discussion started by: SkySmart
3 Replies

8. Shell Programming and Scripting

How to make awk command faster for large amount of data?

I have nginx web server logs with all requests that were made and I'm filtering them by date and time. Each line has the following structure: 127.0.0.1 - xyz.com GET 123.ts HTTP/1.1 (200) 0.000 s 3182 CoreMedia/1.0.0.15F79 (iPhone; U; CPU OS 11_4 like Mac OS X; pt_br) These text files are... (21 Replies)
Discussion started by: brenoasrm
21 Replies
UNIQ(1) 						    BSD General Commands Manual 						   UNIQ(1)

NAME
uniq -- report or filter out repeated lines in a file SYNOPSIS
uniq [-cdu] [-f fields] [-s chars] [input_file [output_file]] DESCRIPTION
The uniq utility reads the standard input comparing adjacent lines, and writes a copy of each unique input line to the standard output. The second and succeeding copies of identical adjacent input lines are not written. Repeated lines in the input will not be detected if they are not adjacent, so it may be necessary to sort the files first. The following options are available: -c Precede each output line with the count of the number of times the line occurred in the input, followed by a single space. -d Don't output lines that are not repeated in the input. -f fields Ignore the first fields in each input line when doing comparisons. A field is a string of non-blank characters separated from adja- cent fields by blanks. Field numbers are one based, i.e. the first field is field one. -s chars Ignore the first chars characters in each input line when doing comparisons. If specified in conjunction with the -f option, the first chars characters after the first fields fields will be ignored. Character numbers are one based, i.e. the first character is character one. -u Don't output lines that are repeated in the input. If additional arguments are specified on the command line, the first such argument is used as the name of an input file, the second is used as the name of an output file. The uniq utility exits 0 on success, and >0 if an error occurs. COMPATIBILITY
The historic +number and -number options have been deprecated but are still supported in this implementation. SEE ALSO
sort(1) STANDARDS
The uniq utility is expected to be IEEE Std 1003.2 (``POSIX.2'') compatible. BSD
January 6, 2007 BSD
All times are GMT -4. The time now is 08:56 AM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy