Sponsored Content
Full Discussion: Sort large file
Top Forums UNIX for Dummies Questions & Answers Sort large file Post 302284021 by hegemaro on Wednesday 4th of February 2009 01:53:38 PM
Old 02-04-2009
I've worked with very large files before but not quite that large. Given several million lines, I found I could reduce the overall time by splitting the file into smaller units using grep, sort and save each unit, then combine the results.

For example, given a file of cksum output which will always begin with a numeric check sum,

Code:
#!/bin/ksh

    I=0
    while [ $I -lt 10 ]
    do
        grep "^$I" file | sort -u > $I.sort
        (( I += 1 ))
    done

    cat ?.sort > file.sorted

This could, of course, be expanded to include any characters replacing the while loop with a for loop.
 

10 More Discussions You Might Find Interesting

1. Filesystems, Disks and Memory

Strange difference in file size when copying LARGE file..

Hi, Im trying to take a database backup. one of the files is 26 GB. I am using cp -pr to create a backup copy of the database. after the copying is complete, if i do du -hrs on the folders i saw a difference of 2GB. The weird fact is that the BACKUP folder was 2 GB more than the original one! ... (1 Reply)
Discussion started by: 0ktalmagik
1 Replies

2. Shell Programming and Scripting

Split large file and add header and footer to each file

I have one large file, after every 200 line i have to split the file and the add header and footer to each small file? It is possible to add different header and footer to each file? (1 Reply)
Discussion started by: ashish4422
1 Replies

3. Shell Programming and Scripting

Performance issue in UNIX while generating .dat file from large text file

Hello Gurus, We are facing some performance issue in UNIX. If someone had faced such kind of issue in past please provide your suggestions on this . Problem Definition: /Few of load processes of our Finance Application are facing issue in UNIX when they uses a shell script having below... (19 Replies)
Discussion started by: KRAMA
19 Replies

4. Shell Programming and Scripting

Script to search a large file with a list of terms in another file

Hi- I am trying to search a large file with a number of different search terms that are listed one per line in 3 different files. Most importantly I need to be able to do a case insensitive search. I have tried just using egrep -f but it doesn't seam to be able to handle the -i option when... (3 Replies)
Discussion started by: dougzilla
3 Replies

5. UNIX for Advanced & Expert Users

Script to sort the files and append the extension .sort to the sorted version of the file

Hello all - I am to this forum and fairly new in learning unix and finding some difficulty in preparing a small shell script. I am trying to make script to sort all the files given by user as input (either the exact full name of the file or say the files matching the criteria like all files... (3 Replies)
Discussion started by: pankaj80
3 Replies

6. Shell Programming and Scripting

Script to sort large file with frequency

Hello, I have a very large file of around 2 million records which has the following structure: I have used the standard awk program to sort: # wordfreq.awk --- print list of word frequencies { # remove punctuation #gsub(/_]/, "", $0) for (i = 1; i <= NF; i++) freq++ } END { for (word... (3 Replies)
Discussion started by: gimley
3 Replies

7. Shell Programming and Scripting

Sort help: How to sort collected 'file list' by date stamp :

Hi Experts, I have a filelist collected from another server , now want to sort the output using date/time stamp filed. - Filed 6, 7,8 are showing the date/time/stamp. Here is the input: #---------------------------------------------------------------------- -rw------- 1 root ... (3 Replies)
Discussion started by: rveri
3 Replies

8. UNIX for Advanced & Expert Users

Help optimizing sort of large files

I'm doing a hobby project that has me sorting huge files with sort of monotonous keys. It's very slow -- the current file is about 300 GB and has been sorting for a day. I know that sort has this --batch-size and --buffer-size parameters, but I'd like a jump start if possible to limit the... (42 Replies)
Discussion started by: kogorman3
42 Replies

9. Linux

Split a large textfile (one file) into multiple file to base on ^L

Hi, Anyone can help, I have a large textfile (one file), and I need to split into multiple file to break each file into ^L. My textfile ========== abc company abc address abc contact ^L my company my address my contact my skills ^L your company your address ========== (3 Replies)
Discussion started by: fspalero
3 Replies

10. UNIX for Beginners Questions & Answers

sed awk: split a large file to unique file names

Dear Users, Appreciate your help if you could help me with splitting a large file > 1 million lines with sed or awk. below is the text in the file input file.txt scaffold1 928 929 C/T + scaffold1 942 943 G/C + scaffold1 959 960 C/T +... (6 Replies)
Discussion started by: kapr0001
6 Replies
largefile(5)                                            Standards, Environments, and Macros                                           largefile(5)

NAME
largefile - large file status of utilities DESCRIPTION
A large file is a regular file whose size is greater than or equal to 2 Gbyte ( 2**31 bytes). A small file is a regular file whose size is less than 2 Gbyte. Large file aware utilities A utility is called large file aware if it can process large files in the same manner as it does small files. A utility that is large file aware is able to handle large files as input and generate as output large files that are being processed. The exception is where additional files are used as system configuration files or support files that can augment the processing. For example, the file utility supports the -m option for an alternative "magic" file and the -f option for a support file that can contain a list of file names. It is unspecified whether a utility that is large file aware will accept configuration or support files that are large files. If a large file aware utility does not accept configuration or support files that are large files, it will cause no data loss or corruption upon encountering such files and will return an appropriate error. The following /usr/bin utilities are large file aware: adb awk bdiff cat chgrp chmod chown cksum cmp compress cp csh csplit cut dd dircmp du egrep fgrep file find ftp getconf grep gzip head join jsh ksh ln ls mdb mkdir mkfifo more mv nawk page paste pathchck pg rcp remsh rksh rm rmdir rsh sed sh sort split sum tail tar tee test touch tr uncompress uudecode uuencode wc zcat The following /usr/xpg4/bin utilities are large file aware: awk cp chgrp chown du egrep fgrep file grep ln ls more mv rm sed sh sort tail tr The following /usr/xpg6/bin utilities are large file aware: getconf ls tr The following /usr/sbin utilities are large file aware: install mkfile mknod mvdir swap See the USAGE section of the swap(1M) manual page for limitations of swap on block devices greater than 2 Gbyte on a 32-bit operating sys- tem. The following /usr/ucb utilities are large file aware: chown from ln ls sed sum touch The /usr/bin/cpio and /usr/bin/pax utilities are large file aware, but cannot archive a file whose size exceeds 8 Gbyte - 1 byte. The /usr/bin/truss utilities has been modified to read a dump file and display information relevant to large files, such as offsets. cachefs file systems The following /usr/bin utilities are large file aware for cachefs file systems: cachefspack cachefsstat The following /usr/sbin utilities are large file aware for cachefs file systems: cachefslog cachefswssize cfsadmin fsck mount umount nfs file systems The following utilities are large file aware for nfs file systems: /usr/lib/autofs/automountd /usr/sbin/mount /usr/lib/nfs/rquotad ufs file systems The following /usr/bin utility is large file aware for ufs file systems: df The following /usr/lib/nfs utility is large file aware for ufs file systems: rquotad The following /usr/xpg4/bin utility is large file aware for ufs file systems: df The following /usr/sbin utilities are large file aware for ufs file systems: clri dcopy edquota ff fsck fsdb fsirand fstyp labelit lockfs mkfs mount ncheck newfs quot quota quotacheck quotaoff quotaon repquota tunefs ufsdump ufsrestore umount Large file safe utilities A utility is called large file safe if it causes no data loss or corruption when it encounters a large file. A utility that is large file safe is unable to process properly a large file, but returns an appropriate error. The following /usr/bin utilities are large file safe: audioconvert audioplay audiorecord comm diff diff3 diffmk ed lp mail mailcompat mailstats mailx pack pcat red rmail sdiff unpack vi view The following /usr/xpg4/bin utilities are large file safe: ed vi view The following /usr/xpg6/bin utility is large file safe: ed The following /usr/sbin utilities are large file safe: lpfilter lpforms The following /usr/ucb utilities are large file safe: Mail lpr The following /usr/lib utility is large file safe: sendmail SEE ALSO
lf64(5), lfcompile(5), lfcompile64(5) SunOS 5.10 7 Nov 2003 largefile(5)
All times are GMT -4. The time now is 10:41 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy