10-23-2012
I believe that 'sort -u' keeps just the first occurrence of each unique key, so you first sort without -u to put the record you want at the front of each key group, and then run 'sort -u' on the key so that record is the one saved.
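A minimal sketch of that two-pass idea, assuming colon-separated key:value records and assuming the record to keep for each key is the one with the largest numeric value. The keep_best name and the sample data are made up for illustration. The -s (stable) flag is added on the second pass because which duplicate -u keeps is not guaranteed by every sort implementation.

```shell
#!/bin/sh
# Hypothetical helper: read key:value records on stdin and keep, for each
# key, the record with the largest numeric value.
keep_best() {
    # Pass 1: group by key, largest value first within each key.
    # Pass 2: stable unique on the key alone, so the first record of each
    #         group is the one that survives.
    LC_ALL=C sort -t: -k1,1 -k2,2nr | LC_ALL=C sort -t: -s -u -k1,1
}

printf '%s\n' 'a:1' 'b:5' 'a:3' 'b:2' | keep_best
```

Changing the secondary key in the first pass (for example a date field instead of -k2,2nr) changes which record counts as "first" without touching the second pass.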
However, I agree it is a bit of a shame to use so much storage and processing when you could just tuck the records into an associative array and overwrite any old values, especially in cases where the data starts out sorted in some relevant way. The only drawback is that the speed of shell operations might be a drag on big volumes.

You can scale up sort in parallel using pipes and sort -m, but for the unsorted lookup solution at machine speed, C++ or at least Java can work a hash table faster, and you can pre-size the hash table big enough to make good use of RAM and VM even in a 32-bit app. I like big powers of 2, since taking the hash modulo the table size then becomes a mask of the low bits. Empty hash table entries are just pointers in an array, costing 4 or 8 bytes each, which is pretty cheap and does no harm for smaller data sets!

A hash beats a tree for query and churn speed, but a tree does provide sorted output and scales more automatically. A linear hash (tables in power-of-two sizes that can double the hash table size for congested buckets) has better dynamic scaling, but slower query and churn than a straight hash. I have not found a lot of hash implementations that reveal they are linear.
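The associative-array approach can be sketched in awk, which is usually at hand from the shell. This assumes colon-separated key:value records and assumes the last record seen for a key should win; the keep_last name and the sample data are illustrative only. The order of awk's "for (k in ...)" loop is unspecified, so the sketch pipes through sort purely to make the result predictable to read.

```shell
#!/bin/sh
# Hypothetical helper: keep only the last record seen for each key.
keep_last() {
    awk -F: '
        { last[$1] = $0 }                       # overwrite any older record for this key
        END { for (k in last) print last[k] }   # one record per key, arbitrary order
    ' | LC_ALL=C sort
}

printf '%s\n' 'a:1' 'b:5' 'a:3' 'b:2' | keep_last
```

Unlike the two-sort pipeline, this makes a single pass over the input and needs memory in proportion to the number of distinct keys, not the number of records.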
HASH(3) BSD Library Functions Manual HASH(3)
NAME
hash -- hash database access method
SYNOPSIS
#include <sys/types.h>
#include <db.h>
DESCRIPTION
The routine dbopen() is the library interface to database files. One of the supported file formats is hash files. The general description
of the database access methods is in dbopen(3), this manual page describes only the hash specific information.
The hash data structure is an extensible, dynamic hashing scheme.
The access method specific data structure provided to dbopen() is defined in the <db.h> include file as follows:
typedef struct {
        u_int bsize;
        u_int ffactor;
        u_int nelem;
        u_int cachesize;
        u_int32_t (*hash)(const void *, size_t);
        int lorder;
} HASHINFO;
The elements of this structure are as follows:
bsize The bsize element defines the hash table bucket size, and is, by default, 256 bytes. It may be preferable to increase the page size
for disk-resident tables and tables with large data items.
ffactor
The ffactor element indicates a desired density within the hash table. It is an approximation of the number of keys allowed to accumulate
in any one bucket, determining when the hash table grows or shrinks. The default value is 8.
nelem The nelem element is an estimate of the final size of the hash table. If not set or set too low, hash tables will expand gracefully
as keys are entered, although a slight performance degradation may be noticed. The default value is 1.
cachesize
A suggested maximum size, in bytes, of the memory cache. This value is only advisory, and the access method will allocate more memory
rather than fail.
hash The hash element is a user defined hash function. Since no hash function performs equally well on all possible data, the user may
find that the built-in hash function does poorly on a particular data set. User specified hash functions must take two arguments (a
pointer to a byte string and a length) and return a 32-bit quantity to be used as the hash value.
lorder The byte order for integers in the stored database metadata. The number should represent the order as an integer; for example, big
endian order would be the number 4,321. If lorder is 0 (no order is specified) the current host order is used. If the file already
exists, the specified value is ignored and the value specified when the tree was created is used.
If the file already exists (and the O_TRUNC flag is not specified), the values specified for the bsize, ffactor, lorder and nelem arguments
are ignored and the values specified when the tree was created are used.
If a hash function is specified, hash_open() will attempt to determine if the hash function specified is the same as the one with which the
database was created, and will fail if it is not.
Backward compatible interfaces to the older dbm and ndbm routines are provided, however these interfaces are not compatible with previous
file formats.
ERRORS
The hash access method routines may fail and set errno for any of the errors specified for the library routine dbopen(3).
SEE ALSO
btree(3), dbopen(3), mpool(3), recno(3)
Per-Ake Larson, Dynamic Hash Tables, Communications of the ACM, April 1988.
Margo Seltzer, A New Hash Package for UNIX, USENIX Proceedings, Winter 1991.
BUGS
Only big and little endian byte order is supported.
BSD
August 18, 1994 BSD