Large file data handling issue
Post 302731311 by methyl on Wednesday 14th of November 2012 07:34:57 PM
Is this "file" actually a "stream" with no record terminators?
 

10 More Discussions You Might Find Interesting

1. HP-UX

Need to split a large data file using a Unix script

Greetings all: I am still new to the Unix environment and I need help with the following requirement. I have a large sequential file, sorted on a field (say store#), that needs to be split into several smaller files, one for each store. That means if there are 500 stores, there will be 500 files. This... (1 Reply)
Discussion started by: SAIK
1 Reply
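
For the split-by-store request in discussion 1, a minimal awk sketch, assuming the store number is the first field and the file is comma-delimited (both assumptions). Since the input is already sorted on store#, closing each finished output file keeps awk from exhausting file descriptors across 500 stores:

    # One pass over the sorted file; a new output file per store number.
    awk -F',' '
        $1 != prev { if (out) close(out); out = "store_" $1 ".txt"; prev = $1 }
        { print > out }
    ' sorted_input.txt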

2. Shell Programming and Scripting

Performance issue in UNIX while generating .dat file from large text file

Hello Gurus, We are facing a performance issue in UNIX. If anyone has faced this kind of issue in the past, please share your suggestions. Problem definition: a few of the load processes of our Finance application hit the issue in UNIX when they use a shell script having the below... (19 Replies)
Discussion started by: KRAMA
19 Replies
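
Since the script in discussion 2 is truncated, only a generic observation is possible: the most common cause of this symptom is a while-read loop that forks external commands once per line; collapsing the per-line work into a single awk pass routinely turns hours into minutes. A sketch with a purely illustrative transformation:

    # Slow: one cut process forked per input line.
    while IFS= read -r line; do
        echo "$line" | cut -d'|' -f1,3
    done < input.txt > out.dat

    # Fast: one awk process streams the entire file.
    awk -F'|' -v OFS='|' '{ print $1, $3 }' input.txt > out.dat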

3. Shell Programming and Scripting

Extract data from large file (80+ million records)

Hello, I have got one file with 120+ million records (35 GB in size). I have to extract some relevant data from the file based on some parameter and generate another output file. What will be the best and fastest way to extract the new file? Sample file format:... (2 Replies)
Discussion started by: learner16s
2 Replies
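
For the extraction job in discussion 3, a single streaming pass is generally the fastest approach at 35 GB; this sketch assumes pipe-delimited records and an equality test on field 3 (field number, delimiter and value are all assumptions, since the sample format above is truncated):

    # One sequential read, no sorting, no temporary files.
    awk -F'|' '$3 == "TARGET" { print }' bigfile.dat > extracted.dat

If the selection is just a fixed string anywhere in the line, grep -F is usually faster still.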

4. Shell Programming and Scripting

UNIX file handling - issue in reading a file

I have been automating the daily check activity for a server, using SQLs to retrieve the data and a while loop for reading the data from the file for several activities. BUT I got a show stopper with the below one, where the data is getting stored in $temp_file but not being read by the while... (1 Reply)
Discussion started by: KuldeepSinghTCS
1 Reply
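
Two classic causes of the symptom in discussion 4: a command inside the loop (ssh, sqlplus and similar) silently consumes the loop's stdin, or the loop runs in a pipeline subshell so the reads seem to vanish. A sketch of the robust pattern (some_command is a stand-in for whatever the loop body runs):

    # Feed the file to the loop directly, and point any stdin-hungry
    # command inside the body at /dev/null so it cannot steal input.
    while IFS= read -r line; do
        some_command "$line" < /dev/null
    done < "$temp_file"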

5. Shell Programming and Scripting

Severe performance issue while 'grep'ing on large volume of data

Background
-------------
The Unix flavor can be any among Solaris, AIX, HP-UX and Linux. I have the below 2 flat files.
File-1
------
Contains 50,000 rows with 2 fields in each row, separated by pipe. Row structure is like Object_Id|Object_Name, as follows:
111|XXX
222|YYY
333|ZZZ ... (6 Replies)
Discussion started by: Souvik
6 Replies
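
For discussion 5, grep -f with 50,000 patterns rescans every line of the big file against every pattern; loading File-1 into an awk hash and streaming the other file makes each lookup constant-time. This sketch assumes the first pipe-delimited field of File-2 holds the Object_Id to match (File-2's layout is an assumption, as the post above is truncated):

    # NR==FNR is true only while the first file (the 50,000 keys) is read.
    awk -F'|' '
        NR == FNR  { name[$1] = $2; next }
        $1 in name { print $0 "|" name[$1] }
    ' file1 file2 > joined.out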

6. Red Hat

Advice regarding filesystems handling large number of files

Hi All, I have a CentOS operating system installed. I work with a really huge number of files, which are not only huge in number but in some cases really huge in size. There can be 1 million to 2 million files in one directory alone. Some of the files are even several gigabytes in... (2 Replies)
Discussion started by: shoaibjameel123
2 Replies
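
For discussion 6, besides picking a filesystem that indexes directories well (ext4 with dir_index, or XFS), the usual mitigation is hashing files into subdirectories so no single directory holds millions of entries. A sketch for flat files in the current directory (the 256-bucket scheme is one arbitrary choice):

    # Bucket files into 256 subdirectories keyed on an md5 prefix of the name.
    for f in *; do
        [ -f "$f" ] || continue
        d=$(printf '%s' "$f" | md5sum | cut -c1-2)
        mkdir -p "$d" && mv -- "$f" "$d/"
    done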

7. Shell Programming and Scripting

UNIX file handling issue

I have a huge file in which pipe(|)-delimited records consist of semicolon(;)-separated fields, e.g. abc;def;ghi|jkl;mno;pqr|123;456;789. I need to replace the 50th (semicolon-separated) field of each record with 9006. The 50th field can have no value, e.g. ;;. Can someone help me with the appropriate command? (3 Replies)
Discussion started by: Gurkamal83
3 Replies
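
A sketch for the 50th-field replacement in discussion 7, treating every pipe-delimited chunk as a record and its semicolons as field separators; it assumes each record carries at least 50 semicolon-separated fields (empty ones, as in ;;, still count toward the 50):

    awk -F'|' -v OFS='|' '{
        for (i = 1; i <= NF; i++) {
            n = split($i, a, ";")
            a[50] = "9006"                       # force the 50th field
            rec = a[1]
            for (j = 2; j <= n; j++) rec = rec ";" a[j]
            $i = rec
        }
        print
    }' infile > outfile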

8. UNIX for Dummies Questions & Answers

File handling issue

Hi All, I am running into an issue. I have a very big file and want to split it into smaller chunks. This file has multiple headers/trailers, and between each header/trailer there are records. The number of records in each header/trailer combination can vary. Also, headers can start with... (3 Replies)
Discussion started by: Gurkamal83
3 Replies
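
For the header/trailer split in discussion 8, awk can start a new output file at every header; this sketch assumes header records begin with HDR and that the file starts with one (both are assumptions, since the real markers are truncated above):

    # Open a fresh chunk file each time a header record appears.
    awk '
        /^HDR/ { if (out) close(out); out = sprintf("chunk_%04d.dat", ++n) }
        { print > out }
    ' bigfile.dat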

9. Shell Programming and Scripting

Output large volume of data to CSV file

I have a program that outputs the ownership and permissions of each directory and file on the server to a CSV file. I am getting an error message when I run the program, and nothing is written to the CSV file. Error: the file access permissions do not allow the specified action cannot... (2 Replies)
Discussion started by: dellanicholson
2 Replies
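
The error in discussion 9 usually means the program cannot create or write its output where it is pointed, or cannot enter some directory it scans. Writing the CSV to a directory you own and discarding unreadable paths is a quick isolation test; a sketch using GNU find (path and column layout are illustrative):

    # %p,%u,%g,%m = path, owner, group, octal permission bits.
    find /target/dir -printf '"%p","%u","%g","%m"\n' 2>/dev/null > /tmp/perms.csv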

10. Shell Programming and Scripting

Large file masking incorrectly happening: Ç delimiter issue

The OS version is Red Hat Enterprise Linux Server release 6.10. I have a script to mask some columns with **** in a data file which is delimited with Ç. I am using awk for the masking; when I mask a small file the awk works fine and masks the required column, but when the file is... (6 Replies)
Discussion started by: LinuxUser8092
6 Replies
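
Multibyte delimiters such as Ç in discussion 10 are a frequent source of "small file works, big file breaks" behavior, because awk's field splitting depends on the locale and on every record being encoded consistently. Forcing a byte-oriented locale makes the delimiter a fixed byte sequence; this sketch masks field 3 (column number and filenames are assumptions):

    # Treat input as raw bytes so the multibyte Ç splits identically on every record.
    LC_ALL=C awk -F'Ç' -v OFS='Ç' '{ $3 = "****"; print }' big.dat > masked.dat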

NAME
    MARC::File::USMARC - USMARC-specific file handling

SYNOPSIS
    use MARC::File::USMARC;

    my $file = MARC::File::USMARC->in( $filename );
    while ( my $marc = $file->next() ) {
        # Do something
    }
    $file->close();
    undef $file;

EXPORT
    None.

METHODS
    decode( $string [, &filter_func ] )
        Constructor for handling data from a USMARC file. This function takes care of all the tag directory parsing & mangling. Any warnings or coercions can be checked in the "warnings()" function.

        The $filter_func is an optional reference to a user-supplied function that determines on a tag-by-tag basis if you want the tag passed to it to be put into the MARC record. The function is passed the tag number and the raw tag data, and must return a boolean. The return of a true value tells MARC::File::USMARC::decode that the tag should get put into the resulting MARC record.

        For example, if you only want title and subject tags in your MARC record, try this:

            sub filter {
                my ($tagno, $tagdata) = @_;
                return ($tagno == 245) || ($tagno >= 600 && $tagno <= 699);
            }

            my $marc = MARC::File::USMARC->decode( $string, &filter );

        Why would you want to do such a thing? The big reason is that creating fields is processor-intensive, and if your program is doing read-only data analysis and needs to be as fast as possible, you can save time by not creating fields that you'll be ignoring anyway. Another possible use is if you're only interested in printing certain tags from the record: you can filter them when you read from disc and not have to delete unwanted tags yourself.

    update_leader()
        If any changes get made to the MARC record, the first 5 bytes of the leader (the length) will be invalid. This function updates the leader with the correct length of the record as it would be if written out to a file.

    _build_tag_directory()
        Function for internal use only: builds the tag directory that gets put in front of the data in a MARC record. Returns two array references and two lengths: the tag directory, the data fields themselves, the length of all data (including the leader that we expect will be added), and the size of the leader and tag directory.

    encode()
        Returns a string of characters suitable for writing out to a USMARC file, including the leader, directory and all the fields.

RELATED MODULES
    MARC::Record

TODO
    Make some sort of autodispatch so that you don't have to explicitly specify the MARC::File::X subclass, sort of like how DBI knows to use DBD::Oracle or DBD::Mysql.

    Create a toggle-able option to check inside the field data for end-of-field characters. Presumably it would be good to have it turned on all the time, but it's nice to be able to opt out if you don't want to take the performance hit.

LICENSE
    This code may be distributed under the same terms as Perl itself. Please note that these modules are not products of or supported by the employers of the various contributors to the code.

AUTHOR
    Andy Lester, "<andy@petdance.com>"