11-06-2008
Huge files manipulation
Hi, I need a fast way to delete duplicate entries from very large files (>2 GB). The files are plain text.
I tried all the usual methods (awk, sort, uniq, sed, grep, ...) but every attempt ended with the same result: a memory core dump.
I'm using large HP-UX servers.
Any advice would be very welcome.
Thanks in advance.
PS: I do not want to split the files.
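One direction that might be worth trying, sketched here under assumptions and not tested on HP-UX: let sort drop the duplicates itself with -u and point its temporary files at a file system with plenty of free space, so the work is merged on disk instead of being held in memory. The function name dedup_lines and its scratch-directory argument are hypothetical. The -T option is widely supported but is not required by POSIX, so check sort(1) on the target system. The output comes out sorted, so the original line order is not preserved.

```shell
#!/bin/sh
# dedup_lines INPUT OUTPUT [SCRATCH_DIR]   (hypothetical helper)
# Writes the unique lines of INPUT to OUTPUT in sorted order.
dedup_lines() {
    in=$1
    out=$2
    # Fall back to $TMPDIR, then /tmp, when no scratch directory is given.
    scratch=${3:-${TMPDIR:-/tmp}}
    # LC_ALL=C compares raw bytes, which avoids locale collation overhead.
    # -u keeps one copy of each distinct line.
    # -T names the directory for temporary files (assumed to be supported).
    # -o writes the result to OUTPUT.
    LC_ALL=C sort -u -T "$scratch" -o "$out" "$in"
}
```

A call could look like `dedup_lines huge.txt huge.dedup.txt /bigdisk/scratch`, where all three paths are placeholders. The scratch directory may need roughly as much free space as the input file.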
largefile(5) Standards, Environments, and Macros largefile(5)
NAME
largefile - large file status of utilities
DESCRIPTION
A large file is a regular file whose size is greater than or equal to 2 Gbyte (2**31 bytes). A small file is a regular file whose size is
less than 2 Gbyte.
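
The threshold in this definition can be checked from a shell. The following is a minimal sketch, assuming a shell whose test command compares 64-bit integers. The names classify_size and classify_file are illustrative and are not part of any utility described on this page.

```shell
#!/bin/sh
# Threshold from the definition above: 2**31 bytes = 2147483648.
LARGE_FILE_THRESHOLD=2147483648

# classify_size BYTES   (hypothetical helper)
# Prints "large" if BYTES >= 2**31, otherwise "small".
classify_size() {
    if [ "$1" -ge "$LARGE_FILE_THRESHOLD" ]; then
        echo large
    else
        echo small
    fi
}

# classify_file PATH   (hypothetical helper)
# Classifies a regular file by its size in bytes.
classify_file() {
    [ -f "$1" ] || { echo "not a regular file: $1" >&2; return 1; }
    # Some wc implementations pad the count with spaces, so strip them.
    # Note that wc -c may read the whole file on some systems.
    size=$(wc -c < "$1" | tr -d ' ')
    classify_size "$size"
}
```

For example, `classify_size 2147483647` prints `small` and `classify_size 2147483648` prints `large`, since the boundary value itself counts as large.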
Large file aware utilities
A utility is called large file aware if it can process large files in the same manner as it does small files. A utility that is large file
aware is able to handle large files as input and generate as output large files that are being processed. The exception is where additional
files are used as system configuration files or support files that can augment the processing. For example, the file utility supports the
-m option for an alternative "magic" file and the -f option for a support file that can contain a list of file names. It is unspecified
whether a utility that is large file aware will accept configuration or support files that are large files. If a large file aware utility
does not accept configuration or support files that are large files, it will cause no data loss or corruption upon encountering such files
and will return an appropriate error.
The following /usr/bin utilities are large file aware:
adb awk bdiff cat chgrp
chmod chown cksum cmp compress
cp csh csplit cut dd
dircmp du egrep fgrep file
find ftp getconf grep gzip
head join jsh ksh ln
ls mdb mkdir mkfifo more
mv nawk page paste pathchck
pg rcp remsh rksh rm
rmdir rsh sed sh sort
split sum tail tar tee
test touch tr uncompress uudecode
uuencode wc zcat
The following /usr/xpg4/bin utilities are large file aware:
awk cp chgrp chown du
egrep fgrep file grep ln
ls more mv rm sed
sh sort tail tr
The following /usr/xpg6/bin utilities are large file aware:
getconf ls tr
The following /usr/sbin utilities are large file aware:
install mkfile mknod mvdir swap
See the USAGE section of the swap(1M) manual page for limitations of swap on block devices greater than 2 Gbyte on a 32-bit operating
system.
The following /usr/ucb utilities are large file aware:
chown from ln ls sed
sum touch
The /usr/bin/cpio and /usr/bin/pax utilities are large file aware, but cannot archive a file whose size exceeds 8 Gbyte - 1 byte.
The /usr/bin/truss utility has been modified to read a dump file and display information relevant to large files, such as offsets.
cachefs file systems
The following /usr/bin utilities are large file aware for cachefs file systems:
cachefspack cachefsstat
The following /usr/sbin utilities are large file aware for cachefs file systems:
cachefslog cachefswssize cfsadmin fsck
mount umount
nfs file systems
The following utilities are large file aware for nfs file systems:
/usr/lib/autofs/automountd /usr/sbin/mount
/usr/lib/nfs/rquotad
ufs file systems
The following /usr/bin utility is large file aware for ufs file systems:
df
The following /usr/lib/nfs utility is large file aware for ufs file systems:
rquotad
The following /usr/xpg4/bin utility is large file aware for ufs file systems:
df
The following /usr/sbin utilities are large file aware for ufs file systems:
clri dcopy edquota ff fsck
fsdb fsirand fstyp labelit lockfs
mkfs mount ncheck newfs quot
quota quotacheck quotaoff quotaon repquota
tunefs ufsdump ufsrestore umount
Large file safe utilities
A utility is called large file safe if it causes no data loss or corruption when it encounters a large file. A utility that is large file
safe is unable to process properly a large file, but returns an appropriate error.
The following /usr/bin utilities are large file safe:
audioconvert audioplay audiorecord comm diff
diff3 diffmk ed lp mail
mailcompat mailstats mailx pack pcat
red rmail sdiff unpack vi
view
The following /usr/xpg4/bin utilities are large file safe:
ed vi view
The following /usr/xpg6/bin utility is large file safe:
ed
The following /usr/sbin utilities are large file safe:
lpfilter lpforms
The following /usr/ucb utilities are large file safe:
Mail lpr
The following /usr/lib utility is large file safe:
sendmail
SEE ALSO
lf64(5), lfcompile(5), lfcompile64(5)
SunOS 5.10 7 Nov 2003 largefile(5)