How to extract data from a huge file? Post: 302159589

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

search and grab data from a huge file

Folks, in my working directory there are multiple large files, each of which contains only a single line. The line is too long to use "grep", so any help? For example, if I want to find out whether these files contain a string like "93849", what command should I use? Also, there is oder_id number...

2. Shell Programming and Scripting

How to extract a piece of information from a huge file

Hello All, I need some assistance to extract a piece of information from a huge file. The file is like this one : database information ccccccccccccccccc ccccccccccccccccc ccccccccccccccccc ccccccccccccccccc os information cccccccccccccccccc cccccccccccccccccc...

3. Shell Programming and Scripting

insert a header in a huge data file without using an intermediate file

I have a file with extracted data and need to insert a header with a constant string, say: H|PayerDataExtract. If I use sed, I have to redirect the output to a separate file, like sed ' sed commands' ExtractDataFile.dat > ExtractDataFileWithHeader.dat. The same is true for awk and...

4. Shell Programming and Scripting

How to extract a subset from a huge dataset

Hi all, I have a huge file of 450G. Its tab-delimited format is as below x1 A 50020 1 x1 B 50021 8 x1 C 50022 9 x1 A 50023 10 x2 D 50024 5 x2 C 50025 7 x2 F 50026 8 x2 N 50027 1 : : Now, I want to extract a subset from this file. In this subset, column 1 is x10, column 2 is...

5. Shell Programming and Scripting

Three Difference File Huge Data Comparison Problem.

I have three different files: Part of File 1 ARTPHDFGAA . . Part of File 2 ARTGHHYESA . . Part of File 3 ARTPOLYWEA . .

6. Shell Programming and Scripting

Help- counting delimiter in a huge file and split data into 2 files

I’m new to Linux script and not sure how to filter out bad records from huge flat files (over 1.3GB each). The delimiter is a semi colon “;” Here is the sample of 5 lines in the file: Name1;phone1;address1;city1;state1;zipcode1 Name2;phone2;address2;city2;state2;zipcode2;comment...

7. Shell Programming and Scripting

Extract header data from one file and combine it with data from another file

Hi, great minds. I have some files, in fact header files, from a CTD profiler. I tried a lot of C programming but could not get the output I expected, because my programming skills are very poor. Finally, I joined the Unix forum in the hope that I may get what I want from you people. Here I have attached...

8. Shell Programming and Scripting

Extract few content from a huge list of files

I have a huge list of files (about 300,000) which have a pattern like this. .I 1 .U 87049087 .S Am J Emerg .M Allied Health Personnel/*; Electric Countershock/*; .T Refibrillation managed by EMT-Ds: .P ARTICLE. .W Some patients converted from ventricular fibrillation to organized...

9. UNIX for Advanced & Expert Users

Need Optimization shell/awk script to aggregate (sum) for all the columns of Huge data file

Optimization shell/awk script to aggregate (sum) for all the columns of Huge data file File delimiter "|" Need to have Sum of all columns, with column number : aggregation (summation) for each column File not having the header Like below - Column 1 "Total Column 2 : "Total ... ......

10. UNIX for Advanced & Expert Users

File comparisons for huge data files (around 60G) - need the best, optimized way to do this

I have 2 large files (.dat) of around 70 g, with 12 columns, but the data is not sorted in either file. I need your input on the best optimized method/command to achieve this and redirect the non-matching lines to a third file (diff.dat). File 1 - 15 columns File 2 - 15 columns Data is...

MARC::Charset(3pm)					User Contributed Perl Documentation					MARC::Charset(3pm)

NAME

       MARC::Charset - convert MARC-8 encoded strings to UTF-8

SYNOPSIS

	   # import the marc8_to_utf8 function
	   use MARC::Charset 'marc8_to_utf8';

	   # prepare STDOUT for utf8
	   binmode(STDOUT, 'utf8');

	   # print out some marc8 as utf8
	   print marc8_to_utf8($marc8_string);

DESCRIPTION

       MARC::Charset allows you to turn MARC-8 encoded strings into UTF-8 strings. MARC-8 is a single-byte character encoding that predates
       Unicode, and allows you to put non-Roman scripts in MARC bibliographic records.

	   http://www.loc.gov/marc/specifications/spechome.html

EXPORTS

   ignore_errors()
       Tells MARC::Charset whether or not to ignore all encoding errors, and returns the current setting.  This is helpful if you have records
       that contain both MARC8 and UNICODE characters.

	   my $ignore = MARC::Charset->ignore_errors();

	   MARC::Charset->ignore_errors(1); # ignore errors
	   MARC::Charset->ignore_errors(0); # DO NOT ignore errors
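
       Because ignore_errors() is a global setting and returns the current value, one possible pattern is to save that value, switch
       to lenient mode for a single conversion, and then restore it. The sketch below assumes only the behaviour described above;
       the helper name lenient_marc8_to_utf8 is hypothetical and not part of MARC::Charset.

```perl
use strict;
use warnings;
use MARC::Charset 'marc8_to_utf8';

# Hypothetical helper (not part of MARC::Charset): convert one string
# with errors ignored, then put the global flag back as it was.
sub lenient_marc8_to_utf8 {
    my ($marc8) = @_;

    my $previous = MARC::Charset->ignore_errors();   # read the current setting
    MARC::Charset->ignore_errors(1);                 # ignore errors for this call only
    my $utf8 = marc8_to_utf8($marc8);
    MARC::Charset->ignore_errors($previous);         # restore the earlier setting

    return $utf8;
}
```

       This keeps a lenient conversion in one place from silently changing how the rest of a program treats encoding errors.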

   assume_unicode()
       Tells MARC::Charset whether or not to assume UNICODE when an error is encountered in ignore_errors mode and returns the current setting.
       This is helpful if you have records that contain both MARC8 and UNICODE characters.

	   my $setting = MARC::Charset->assume_unicode();

	   MARC::Charset->assume_unicode(1); # assume characters are unicode (utf-8)
	   MARC::Charset->assume_unicode(0); # DO NOT assume characters are unicode

   assume_encoding()
       Tells MARC::Charset whether or not to assume a specific encoding when an error is encountered in ignore_errors mode and returns the current
       setting.  This is helpful if you have records that contain both MARC8 and other characters.

	   my $setting = MARC::Charset->assume_encoding();

	   MARC::Charset->assume_encoding('cp850'); # assume characters are cp850
	   MARC::Charset->assume_encoding(''); # DO NOT assume any encoding

   marc8_to_utf8()
       Converts a MARC-8 encoded string to UTF-8.

	   my $utf8 = marc8_to_utf8($marc8);

       If you'd like to ignore errors pass in a true value as the 2nd parameter or call MARC::Charset->ignore_errors() with a true value:

	   my $utf8 = marc8_to_utf8($marc8, 'ignore-errors');

	 or

	   MARC::Charset->ignore_errors(1);
	   my $utf8 = marc8_to_utf8($marc8);

   utf8_to_marc8()
       Attempts to convert a UTF-8 string to MARC-8.

	   my $marc8 = utf8_to_marc8($utf8);

       If you'd like to ignore errors, including characters that can't be converted to MARC-8, pass in a true value as the second parameter:

	   my $marc8 = utf8_to_marc8($utf8, 'ignore-errors');

	 or

	   MARC::Charset->ignore_errors(1);
	   my $marc8 = utf8_to_marc8($utf8);
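
       Since the two functions are inverses, a quick sanity check is to convert a string to MARC-8 and back and compare the result
       with the original. The following is a minimal sketch, not part of the module: round_trips is a hypothetical helper, and it
       assumes that a failed conversion yields an undefined value.

```perl
use strict;
use warnings;
use MARC::Charset qw(marc8_to_utf8 utf8_to_marc8);

# Hypothetical helper: returns 1 if $utf8 survives
# UTF-8 -> MARC-8 -> UTF-8 unchanged, 0 otherwise.
sub round_trips {
    my ($utf8) = @_;

    my $marc8 = utf8_to_marc8($utf8);
    return 0 unless defined $marc8;      # assumed: undef signals a failed conversion

    my $back = marc8_to_utf8($marc8);
    return (defined $back && $back eq $utf8) ? 1 : 0;
}

print round_trips('Hello, MARC') ? "round trip ok\n" : "round trip lossy\n";
```

       The comparison is an exact string match, so input that comes back in a different Unicode normalization form would be
       reported as lossy even though the characters are equivalent.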

DEFAULT CHARACTER SETS

       If you need to alter the default character sets you can set the $MARC::Charset::DEFAULT_G0 and $MARC::Charset::DEFAULT_G1 variables to the
       appropriate character set code:

	   use MARC::Charset::Constants qw(:all);
	   $MARC::Charset::DEFAULT_G0 = BASIC_ARABIC;
	   $MARC::Charset::DEFAULT_G1 = EXTENDED_ARABIC;

SEE ALSO

       o   MARC::Charset::Constants

       o   MARC::Charset::Table

       o   MARC::Charset::Code

       o   MARC::Charset::Compiler

       o   MARC::Record

       o   MARC::XML

AUTHOR

       Ed Summers (ehs@pobox.com)

perl v5.12.4							    2011-08-05							MARC::Charset(3pm)