Hello,
I have a very large file of around 2 million records which has the following structure:
Quote:
English characters#Hindi in Utf8 format
Mohit#मोहित
Shailesh#शैलेश
Bagde#बागडे
Mohit#मोहित
Shailesh#शैलेश
Goud#गौड
Mohit#मोहित
Shailesh#शैलेश
Ladava#लाडवा
Mohit#मोहित
Shailesh#शैलेश
Mehetre#मेहेत्रे
Mohit#मोहित
I have used a standard awk program to sort the file,
and also a Perl program I found on the net.
While both work beautifully for small files of around fifty thousand lines, they run out of memory when I execute them on the very large file.
I am working on a Windows Vista machine and have even tried increasing the paging memory size to around 8Mb, but to no avail.
I believe there is a function in Perl where you can set a variable to 99999, which allows very large files to be processed. I have tried to insert that in the Perl program, but I still get an out-of-memory error.
Could anybody provide a solution that can run on a very large file of around 9 MB?
Many thanks.
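One common way around this kind of memory limit is an external merge sort: sort the file in pieces that fit in memory, write each sorted piece to a temporary file, then merge the pieces while holding only one line per piece. What follows is a minimal sketch in Python rather than awk or Perl, since the original programs are not shown. The names `external_sort`, `chunk_lines` and `unique` are illustrative only. It assumes the file is UTF-8 with one record per line and that plain code-point order is acceptable (it does not apply Hindi or locale-aware collation). Dropping duplicates is optional here, because the post only mentions sorting.

```python
import contextlib
import heapq
import itertools
import os
import tempfile


def external_sort(src_path, dst_path, chunk_lines=100_000, unique=False):
    """Sort the lines of src_path into dst_path using bounded memory."""
    chunk_paths = []
    try:
        with open(src_path, encoding="utf-8") as src:
            while True:
                # Read at most chunk_lines lines so memory use stays bounded.
                chunk = list(itertools.islice(src, chunk_lines))
                if not chunk:
                    break
                # The last line of the file may lack a trailing newline.
                chunk = [ln if ln.endswith("\n") else ln + "\n" for ln in chunk]
                chunk.sort()  # code-point order, not locale collation
                fd, path = tempfile.mkstemp(suffix=".chunk")
                chunk_paths.append(path)
                with os.fdopen(fd, "w", encoding="utf-8") as tmp:
                    tmp.writelines(chunk)

        with contextlib.ExitStack() as stack:
            files = [stack.enter_context(open(p, encoding="utf-8"))
                     for p in chunk_paths]
            with open(dst_path, "w", encoding="utf-8") as dst:
                previous = None
                # heapq.merge pulls lines lazily from each sorted chunk file.
                for line in heapq.merge(*files):
                    if unique and line == previous:
                        continue  # skip a repeat of the line just written
                    dst.write(line)
                    previous = line
    finally:
        # Remove the temporary chunk files whether or not the sort succeeded.
        for p in chunk_paths:
            os.remove(p)
```

A call such as `external_sort("names.txt", "names.sorted.txt", unique=True)` (both file names are hypothetical) would write the sorted, de-duplicated records to the second file. Lowering `chunk_lines` trades speed for a smaller memory footprint.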
JOIN(1) General Commands Manual JOIN(1)
NAME
join - relational database operator
SYNOPSIS
join [-an] [-e s] [-o list] [-tc] file1 file2
DESCRIPTION
Join forms, on the standard output, a join of the two relations specified by the lines of file1 and file2. If file1 is `-', the standard input is used.
File1 and file2 must be sorted in increasing ASCII collating sequence on the fields on which they are to be joined, normally the first in each line.
There is one line in the output for each pair of lines in file1 and file2 that have identical join fields. The output line normally consists of the common field, then the rest of the line from file1, then the rest of the line from file2.
Fields are normally separated by blank, tab or newline. In this case, multiple separators count as one, and leading separators are discarded.
These options are recognized:
-an In addition to the normal output, produce a line for each unpairable line in file n, where n is 1 or 2.
-e s Replace empty output fields by string s.
-o list Each output line comprises the fields specified in list, each element of which has the form n.m, where n is a file number and m is a field number.
-tc Use character c as a separator (tab character). Every appearance of c in a line is significant.
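The default behaviour described above (pairing lines from two sorted inputs on their first field) can be illustrated with a small merge join. This is only a sketch in Python of the idea, not the join source: `join_sorted` and `_groups` are hypothetical names, the -a, -e and -o options are not modelled, and the `sep` argument only approximates -tc. It assumes both inputs are already sorted on the first field in the same order that Python string comparison uses.

```python
from itertools import groupby


def _groups(lines, sep):
    """Yield (join_field, [remaining fields, ...]) for each run of equal keys."""
    rows = (line.rstrip("\n").split(sep) for line in lines)
    rows = (r for r in rows if r)  # skip lines that have no fields at all
    for key, grp in groupby(rows, key=lambda r: r[0]):
        yield key, [r[1:] for r in grp]


def join_sorted(lines1, lines2, sep=None):
    """Merge-join two sorted line sequences on their first field.

    With sep=None, fields are split on runs of whitespace, so multiple
    separators count as one and leading separators are discarded.
    With a sep character, every appearance of it is significant.
    """
    out_sep = " " if sep is None else sep
    groups1, groups2 = _groups(lines1, sep), _groups(lines2, sep)
    g1, g2 = next(groups1, None), next(groups2, None)
    while g1 is not None and g2 is not None:
        (k1, rows1), (k2, rows2) = g1, g2
        if k1 < k2:
            g1 = next(groups1, None)   # key only in the first input
        elif k1 > k2:
            g2 = next(groups2, None)   # key only in the second input
        else:
            # One output line for each pair of lines with identical join fields.
            for r1 in rows1:
                for r2 in rows2:
                    yield out_sep.join([k1] + r1 + r2)
            g1, g2 = next(groups1, None), next(groups2, None)
```

For instance, `list(join_sorted(open("file1"), open("file2"), sep="#"))` would pair records from two hypothetical `#`-separated files that share the same first field.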
SEE ALSO
sort(1), comm(1), awk(1).
BUGS
With default field separation, the collating sequence is that of sort -b; with -t, the sequence is that of a plain sort.
The conventions of join, sort, comm, uniq, look and awk(1) are wildly incongruous.
7th Edition April 29, 1985 JOIN(1)