Full Discussion: awk eating too much memory?
Post 302560822 by binlib on Saturday 1st of October 2011 01:31:39 PM
Here are three ways to count the deleted, added, and unchanged lines. Experiment with your own data and OS to see which works best:
Code:
# generate raw data: about a million random lines, roughly 45% to old.raw and 55% to new.raw
awk -v n=1e6 '
BEGIN {
  srand()
  while (--n > 0)
    printf("abc%dzzz\n", n*rand()) > ARGV[1 + (rand() < 0.55)]
  exit
}
' old.raw new.raw

printf "method:\tdeleted\tadded\tunchanged\n"

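# method 1: a single awk pass; every distinct line of old.raw is held in memory
awk '
NR == FNR {                              # first file (old.raw)
  if (!($0 in a)) { ++o; a[$0] = -1 }    # o = distinct old lines, each marked -1
  next
}
{                                        # second file (new.raw)
  if ((x = ++a[$0]) > 1) next            # this line was already counted
  if (x < 1) { ++c; a[$0] = 1 }          # was -1: present in both files
  else if (x < 2) ++e                    # was unset: present only in new.raw
  #print
}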
END { printf("awk:\t%d\t%d\t%d\n", o - c, e, c) }
' old.raw new.raw #> n.awku

# method 2: sort -u each file, then count the merged union (all = distinct lines overall)
sort -u old.raw > o.sortu
oc=$(wc -l < o.sortu)
sort -u new.raw > n.sortu
nc=$(wc -l < n.sortu)
all=$(sort -mu o.sortu n.sortu |wc -l)
printf "sort:\t%d\t%d\t%d\n" $((all-nc)) $((all-oc)) $((oc+nc-all)) 

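# method 3: reuses o.sortu and n.sortu from method 2
# comm prints old-only lines in column 1, new-only in column 2, common lines in column 3
comm o.sortu n.sortu | awk -F'\t' '
 { if ($1 != "") ++a; else if ($2 != "") ++b; else ++c }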
 END { printf("comm:\t%d\t%d\t%d\n", a, b, c) }'
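
As a quick sanity check, the three methods can be run on a tiny hand-made pair of files where the answer is known in advance. This is only a sketch and not part of the original test: the file names old.txt and new.txt are made up, it assumes mktemp -d is available, and it forces LC_ALL=C so that sort and comm agree on ordering.

```shell
# Hypothetical data: "a" is deleted, "d" and "e" are added, "b" and "c" are unchanged.
tmp=$(mktemp -d)
printf '%s\n' a b c b   > "$tmp/old.txt"
printf '%s\n' b c d e d > "$tmp/new.txt"

# method 1: single awk pass, distinct old lines kept in an array
m1=$(awk '
NR == FNR {
  if (!($0 in a)) { ++o; a[$0] = -1 }
  next
}
{
  if ((x = ++a[$0]) > 1) next
  if (x < 1) { ++c; a[$0] = 1 }
  else ++e
}
END { printf("%d %d %d", o - c, e, c) }
' "$tmp/old.txt" "$tmp/new.txt")

# method 2: sorted unique files plus the size of their merged union
LC_ALL=C sort -u "$tmp/old.txt" > "$tmp/o.sortu"
LC_ALL=C sort -u "$tmp/new.txt" > "$tmp/n.sortu"
oc=$(wc -l < "$tmp/o.sortu")
nc=$(wc -l < "$tmp/n.sortu")
all=$(LC_ALL=C sort -mu "$tmp/o.sortu" "$tmp/n.sortu" | wc -l)
m2="$((all - nc)) $((all - oc)) $((oc + nc - all))"

# method 3: count the three comm columns
m3=$(LC_ALL=C comm "$tmp/o.sortu" "$tmp/n.sortu" | awk -F'\t' '
  { if ($1 != "") ++a; else if ($2 != "") ++b; else ++c }
  END { printf("%d %d %d", a, b, c) }')

# each line shows: deleted added unchanged
printf 'awk:  %s\n' "$m1"   # awk:  1 2 2
printf 'sort: %s\n' "$m2"   # sort: 1 2 2
printf 'comm: %s\n' "$m3"   # comm: 1 2 2

rm -rf "$tmp"
```

If all three lines agree here, any difference on real data is more likely to come from the data itself (for example, lines containing tabs or empty lines confuse the comm method) than from the counting logic.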
