Script to sort large file with frequency Post: 302663315

10 More Discussions You Might Find Interesting

1. HP-UX

Need to split a large data file using a Unix script

Greetings all: I am still new to the Unix environment and I need help with the following requirement. I have a large sequential file sorted on a field (say store#) that is being split into several smaller files, one for each store. That means if there are 500 stores, there will be 500 files. This...

2. Shell Programming and Scripting

script to splite large file to number of small files

Dear all, could you please help me split a file containing around 240,000,000 lines into 4 files of roughly equal size? Note that each file should start with the start flag (MSISDN) and end with the end flag (End); also, the number of lines between the...

3. UNIX for Dummies Questions & Answers

Sort large file

I was wondering how sort works. Do file size and time to sort increase geometrically? I have a 5.3 billion line file I'd like to use with sort -u. I'm wondering if that'll take forever because of a geometric expansion. If it takes 100 hours that's fine, but not 100 days. Thanks so much.

4. Shell Programming and Scripting

Script to search a large file with a list of terms in another file

Hi- I am trying to search a large file with a number of different search terms that are listed one per line in 3 different files. Most importantly I need to be able to do a case insensitive search. I have tried just using egrep -f but it doesn't seem to be able to handle the -i option when...

5. Shell Programming and Scripting

Word Frequency Sort

hello, Here is a program for creating a word-frequency # wf.gk --- program to generate word frequencies from a file { # remove punctuation: This will remove all punctuation from the file gsub(/[^[:alnum:]_[:blank:]]/, "", $0) # Start frequency analysis for (i = 1; i <= NF; i++) freq[$i]++ } END # Print output...

6. UNIX for Advanced & Expert Users

Script to sort the files and append the extension .sort to the sorted version of the file

Hello all - I am new to this forum and fairly new to learning Unix, and I am finding some difficulty in preparing a small shell script. I am trying to make a script to sort all the files given by the user as input (either the exact full name of the file or say the files matching the criteria like all files...

7. Shell Programming and Scripting

Script to pull hashes out of large text file

I am attempting to write a script that will pull out NTLM hashes from a text file that contains about 500,000 lines of data. Not all accounts contain hashes and I only need the ones that do contain hashes. Here is a sample of what the data looks like: There are thousands of other lines in...

8. UNIX for Advanced & Expert Users

Help optimizing sort of large files

I'm doing a hobby project that has me sorting huge files with sort of monotonous keys. It's very slow -- the current file is about 300 GB and has been sorting for a day. I know that sort has these --batch-size and --buffer-size parameters, but I'd like a jump start if possible to limit the...

9. Shell Programming and Scripting

Frequency of Words in a File, sed script from 1980

tr -cs A-Za-z\' '\n' | tr A-Z a-z | sort | uniq -c | sort -k1,1nr -k2 | sed ${1:-25} < book7.txt This is not my script, it can be found way back from 1980 but once it worked fine to give me the most used words in a text file. Now the shell is complaining about an error in sed sed: -e...

10. Shell Programming and Scripting

Script to compare files in 2 folders and delete the large file

Hello, my first thread here. I've been searching and fiddling around for about a week and I cannot find a solution. I have been converting all of my home videos to HEVC and sometimes the files end up smaller and sometimes they don't. I am currently comparing all the video files...

XML::Filter::Sort::BufferMgr(3pm)			User Contributed Perl Documentation			 XML::Filter::Sort::BufferMgr(3pm)

NAME

       XML::Filter::Sort::BufferMgr - Implementation class used by XML::Filter::Sort

DESCRIPTION

       The documentation is targeted at developers wishing to extend or replace this class.  For user documentation, see XML::Filter::Sort.

       Two classes are used to implement buffering records and spooling them back out in sorted order as SAX events.  One instance of the
       XML::Filter::Sort::Buffer class is used to buffer each record and one or more instances of the XML::Filter::Sort::BufferMgr class are used
       to manage the buffers.

API METHODS

       The API of this module as used by XML::Filter::Sort consists of the following sequence of method calls:

       1.  When the first 'record' in a sequence is encountered, XML::Filter::Sort creates an XML::Filter::Sort::BufferMgr object using the "new()"
	   method.

       2.  XML::Filter::Sort calls the buffer manager's "new_buffer()" method to get an XML::Filter::Sort::Buffer object and all SAX events are
	   directed to this object until the end of the record is encountered.	The following events are supported by the current buffer
	   implementation:

	     start_element()
	     characters()
	     comment()
	     processing_instruction()
	     end_element()

       3.  When the end of the record is detected, XML::Filter::Sort calls the buffer manager's "close_buffer()" method, which in turn calls the
	   buffer's "close()" method.  The "close()" method returns a list of values for the sort keys and the buffer manager uses these to store
	   the buffer for later recall.  Subsequent records are handled as per step 2.

       4.  When the last record has been buffered, XML::Filter::Sort calls the buffer manager's "to_sax()" method.  The buffer manager retrieves
	   each of the buffers in sorted order and calls the buffer's "to_sax()" method.
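
The four steps above can be sketched as a toy model. This is a Python analogy, not the Perl module's code: the method names follow the ones quoted in the steps, but the argument lists, the single sort key, and the tuple-based event format are assumptions made only for illustration.

```python
from collections import defaultdict


class Buffer:
    """Hypothetical record buffer: stores events and reports one sort key."""

    def __init__(self, key_element):
        self.key_element = key_element  # element whose text is the sort key
        self.events = []
        self.open_element = None
        self.key = ""

    def start_element(self, name):
        self.open_element = name
        self.events.append(("start_element", name))

    def characters(self, text):
        if self.open_element == self.key_element:
            self.key += text
        self.events.append(("characters", text))

    def end_element(self, name):
        self.open_element = None
        self.events.append(("end_element", name))

    def close(self):
        # Step 3: return the list of sort key values (one key in this sketch).
        return [self.key]

    def to_sax(self, handler):
        for event in self.events:
            handler(event)


class BufferMgr:
    """Hypothetical single-key buffer manager."""

    def __init__(self, key_element):
        self.key_element = key_element
        self.store = defaultdict(list)  # key value -> buffers sharing that key

    def new_buffer(self):
        # Step 2: hand out a fresh buffer for the next record.
        return Buffer(self.key_element)

    def close_buffer(self, buffer):
        # Step 3: ask the buffer for its keys and file it under them.
        keys = buffer.close()
        self.store[keys[0]].append(buffer)

    def to_sax(self, handler):
        # Step 4: replay every buffer in sorted key order.
        for key in sorted(self.store):
            for buffer in self.store[key]:
                buffer.to_sax(handler)


mgr = BufferMgr("name")                  # step 1
for name in ["pear", "apple", "fig"]:
    buf = mgr.new_buffer()
    buf.start_element("name")
    buf.characters(name)
    buf.end_element("name")
    mgr.close_buffer(buf)

replayed = []
mgr.to_sax(replayed.append)
names = [event[1] for event in replayed if event[0] == "characters"]
```

After the replay, `names` holds the record texts in sorted key order rather than in arrival order.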

       Each buffer attempts to match the sort key paths as SAX events are received.  Once a value has been found for a given key, that same path
       match is not attempted against subsequent events.  For efficiency, the code to match each key is compiled into a closure.  For even more
       efficiency, this compilation is done once when the XML::Filter::Sort object is created.	The "compile_matches()" method in the buffer
       manager class calls the "compile_matches()" method in the buffer class to achieve this.
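
A minimal sketch of the "compile once, stop matching once found" idea, written in Python rather than Perl. The slash-separated path syntax, the `compile_match` helper, and the `KeyCollector` class are hypothetical names chosen for this illustration; the module's real key-path syntax and internals may differ.

```python
def compile_match(path):
    """Build a matcher closure for one sort-key path (hypothetical helper)."""
    target = tuple(path.split("/"))  # parsed once, captured by the closure

    def match(stack, text):
        # Return the text when the current element path equals the key path.
        return text if tuple(stack) == target else None

    return match


class KeyCollector:
    """Toy buffer that only tracks sort key values."""

    def __init__(self, matchers):
        self.matchers = matchers              # compiled once, reusable per record
        self.values = [None] * len(matchers)
        self.stack = []
        self.attempts = 0                     # counts matcher calls, for illustration

    def start_element(self, name):
        self.stack.append(name)

    def characters(self, text):
        for i, match in enumerate(self.matchers):
            if self.values[i] is not None:
                continue                      # key already found: skip this matcher
            self.attempts += 1
            self.values[i] = match(self.stack, text)

    def end_element(self, name):
        self.stack.pop()


matchers = [compile_match("person/lastname"), compile_match("person/firstname")]
collector = KeyCollector(matchers)
for element, text in [("firstname", "Ada"), ("lastname", "Lovelace"), ("note", "x")]:
    if not collector.stack:
        collector.start_element("person")
    collector.start_element(element)
    collector.characters(text)
    collector.end_element(element)
collector.end_element("person")
```

In this run only three matcher calls are made for three text events and two keys, because each matcher is skipped as soon as its key has a value.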

DATA STRUCTURES

       In the current implementation, the XML::Filter::Sort::BufferMgr class simply uses a hash to store the buffer objects.  If only one sort key
       is defined, only a single hash is required.  The values in the hash are arrayrefs containing the list of buffers for records with
       identical keys.

       If two or more sort keys are defined, the hash values will be XML::Filter::Sort::BufferMgr objects which in turn will contain the buffers.
       The following illustration may clarify the relationship (BM=buffer manager, B=buffer):

					BM
			+----------------+---------------+
			|				 |
		       BM				BM
		  +-----+--------+		   +-----+----------+
		  |		 |		   |		    |
		 BM		BM		  BM		   BM
	    +-----+----+    +----+------+     +----+----+    +------+------+
	    |	  |    |    |	 |	|     |    |	|    |	    |	   |
	 [B,B,B] [B] [B,B] [B] [B,B] [B,B,B] [B] [B,B] [B] [B,B] [B,B,B] [B,B]

       This layered storage structure is transparent to the XML::Filter::Sort object which instantiates and interacts with only one buffer manager
       (the one at the top of the tree).
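
The layered structure can be modelled roughly as follows. This is a Python sketch under assumptions, not the module's implementation: `NestedBufferMgr`, `add`, and `in_order` are hypothetical names, dictionaries stand in for Perl hashes, lists stand in for arrayrefs, and plain strings stand in for buffer objects.

```python
class NestedBufferMgr:
    """Toy buffer manager: one level of the tree per sort key."""

    def __init__(self, depth):
        self.depth = depth  # number of sort keys handled from this level down
        self.store = {}     # key value -> list of buffers, or a child manager

    def add(self, keys, buffer):
        head, rest = keys[0], keys[1:]
        if self.depth == 1:
            # Last key: records with identical keys share one list.
            self.store.setdefault(head, []).append(buffer)
        else:
            # More keys remain: delegate to a child manager for this key value.
            if head not in self.store:
                self.store[head] = NestedBufferMgr(self.depth - 1)
            self.store[head].add(rest, buffer)

    def in_order(self):
        # Walk the tree in sorted key order, yielding buffers at the leaves.
        for key in sorted(self.store):
            entry = self.store[key]
            if self.depth == 1:
                yield from entry
            else:
                yield from entry.in_order()


top = NestedBufferMgr(depth=2)  # the caller only ever talks to this one
top.add(("Smith", "Zoe"), "rec1")
top.add(("Jones", "Bob"), "rec2")
top.add(("Smith", "Al"), "rec3")
top.add(("Smith", "Zoe"), "rec4")
ordered = list(top.in_order())
```

Records with fully identical keys keep their arrival order inside the leaf list, so `rec1` is replayed before `rec4` here.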

COPYRIGHT

       Copyright 2002 Grant McLean <grantm@cpan.org>

       This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

perl v5.12.4							    2002-06-14					 XML::Filter::Sort::BufferMgr(3pm)
