Edit a Huge one line file Post: 302655339

10 More Discussions You Might Find Interesting

1. UNIX for Advanced & Expert Users

Insert a line as the first line into a very huge file

Hello, I need to insert a line (like a header) as the first line of a very huge file (about 3 ml rows). I am able to do it with sed, but redirecting the output and creating a new file takes quite some time. I was wondering if there was a more efficient way of doing it? Any help would be...

2. UNIX for Dummies Questions & Answers

edit each line in the file

I am trying to edit each line in a file. The file has several columns delimitted by '|'. I need to take out the last two columns. Each line starts with a unique word through which I am storing the lines in a variable and cutting the last two colums. But, when I am echoing the line, it is...

3. UNIX for Dummies Questions & Answers

How to remove FIRST Line of huge text file on Solaris

i need help..!!!! i have one big text file estimate data file size 50 - 100GB with 70 Mega Rows. on OS SUN Solaris version 8 How i can remove first line of the text file. Please suggest me for solutions. Thank you very much in advance:)

4. Shell Programming and Scripting

Edit a line in a file with perl

Hi, How can I edit a line in a file? For example, a.txt contains: start: 1 2 3 4 stop: a b c d and I want to change "3" to "9" and to add "5" after "4" the result should be (a.txt): start: 1 9 3 4 5 stop: a b c d Thanks, zed

5. Shell Programming and Scripting

Implement in one line sed or awk having no delimiter and file size is huge

I have file which contains around 5000 lines. The lines are fixed legth but having no delimiter.Each line line contains nearly 3000 characters. I want to delete the lines a> if it starts with 1 and if 576th postion is a digit i,e 0-9 or b> if it starts with 0 or 9(i,e header and footer) ...

6. Shell Programming and Scripting

How to edit file to have one line entry?

Hello All, My file content is: DROP TABLE "FACT_WORLD"; CREATE TABLE "FACT_WORLD" ( "AR_ID" INTEGER NOT NULL, "ORG_ID" INTEGER NOT NULL ) DATA CAPTURE NONE COMPRESS YES; I want to change this file to have entries in one...

7. Shell Programming and Scripting

Optimised way for search & replace a value on one line in a very huge file (File Size is 24 GB).

Hi Experts, I had to edit (a particular value) in header line of a very huge file so for that i wanted to search & replace a particular value on a file which was of 24 GB in Size. I managed to do it but it took long time to complete. Can anyone please tell me how can we do it in a optimised...

8. Shell Programming and Scripting

Edit first line of a text file

Hi friends, Issue1: I have a text file with the first line like this #chrom start end Readcount_A Normalized_Readcount_A ReadcountB Normalized_Readcount_B Fc_A_vs_B pvalue_A_vs_B FDR_A_vs_B Fc_B_vs_A pvalue_B_vs_A FDR_B_vs_A How can I change it to the...

9. UNIX for Dummies Questions & Answers

Need to replace new line characters in a huge file

Hi , I would like to replace new line characters(\n) in a huge file of about 2 million records . I tried this one (:%s/\n//g) but it's hanging there and no result. Does this command do not work if the file is big. Please let me know if you have any other options Regards Raj

10. Shell Programming and Scripting

Reading ALL BUT the first and last line of a huge file

Hi. Pardon me if I'm posting a duplicate thread but.. I have a text file with over 150 Million records, file size is in the range if MB(close to GB). The requirement is to read ALL the lines excepting the FIRST LINE which is the file header and the LAST LINE which is it's trailer record. ...

LEARN ABOUT DEBIAN

jcode

Jcode(3pm)						User Contributed Perl Documentation						Jcode(3pm)

NAME

       Jcode - Japanese Charset Handler

SYNOPSIS

	use Jcode;
	#
	# traditional
	Jcode::convert($str, $ocode, $icode, "z");
	# or OOP!
	print Jcode->new($str)->h2z->tr($from, $to)->utf8;

DESCRIPTION

       <Japanese document is now available as Jcode::Nihongo. >

       Jcode.pm supports both the object-oriented and the traditional approach.  With the object-oriented approach, you can write:

	 $iso_2022_jp = Jcode->new($str)->h2z->jis;

       Which is more elegant than:

	 $iso_2022_jp = $str;
	 &jcode::convert(\$iso_2022_jp, 'jis', &jcode::getcode(\$str), "z");

       For those unfamiliar with objects, Jcode.pm still supports "getcode()" and "convert()."

       If the perl version is 5.8.1 or later, Jcode acts as a wrapper to Encode, the standard charset handler module for Perl 5.8 or later.

Methods
       Methods mentioned here all return Jcode object unless otherwise mentioned.

       Constructors

       $j = Jcode->new($str [, $icode])
	 Creates Jcode object $j from $str.  Input code is automatically checked unless you explicitly set $icode.  For available charsets, see
	 getcode below.

	 For perl 5.8.1 or better, $icode can be any encoding name that Encode understands.

	   $j = Jcode->new($european, 'iso-latin1');

	 When the object is stringified, it returns the EUC-converted string so you can <print $j> instead of <print $j->euc>.

	 Passing Reference
	   Instead of a scalar value, you can pass a reference:

	   Jcode->new(\$str);

	   This saves a little time, in exchange for the value of $str itself being converted.  (In a way, $str is now "tied" to the Jcode
	   object.)

       $j->set($str [, $icode])
	 Sets $j's internal string to $str.  Handy when you use a Jcode object repeatedly (saves the time and memory of creating a new object).

	  # converts mailbox to SJIS format
	  my $jconv = new Jcode;
	  $/ = 00;
	  while(<>){
	      print $jconv->set(\$_)->mime_decode->sjis;
	  }

       $j->append($str [, $icode]);
	 Appends $str to $j's internal string.

       $j = jcode($str [, $icode]);
	 Shortcut for Jcode->new(), so you can chain a conversion directly, as in jcode($str)->sjis.

       Encoded Strings

       In general, you can retrieve encoded string as $j->encoded.

       $sjis = jcode($str)->sjis
       $euc = $j->euc
       $jis = $j->jis
       $sjis = $j->sjis
       $ucs2 = $j->ucs2
       $utf8 = $j->utf8
	 What you code is what you get :)

       $iso_2022_jp = $j->iso_2022_jp
	 Same as "$j->h2z->jis".  Hankaku Kanas are forcibly converted to Zenkaku.

	 For perl 5.8.1 and better, you can also use any encoding names and aliases that Encode supports.  For example:

	   $european = $j->iso_latin1; # replace '-' with '_' for names.

	 FYI: Encode::Encoder uses similar trick.

	 $j->fallback($fallback)
	   For perl 5.8.1 or better, Jcode stores the internal string in UTF-8.  Any character that does not map to ->encoding is replaced
	   with a '?', which is the Encode standard.

	     my $unistr = "\x{262f}"; # YIN YANG
	     my $j = jcode($unistr);  # $j->euc is '?'

	   You can change this behavior by specifying a fallback, as with Encode.  The values are the same as Encode's.  "Jcode::FB_PERLQQ",
	   "Jcode::FB_XMLCREF", and "Jcode::FB_HTMLCREF" are aliased to those of Encode for convenience.

	     print $j->fallback(Jcode::FB_PERLQQ)->euc;   # '\x{262f}'
	     print $j->fallback(Jcode::FB_XMLCREF)->euc;  # '&#x262f;'
	     print $j->fallback(Jcode::FB_HTMLCREF)->euc; # '&#9775;'

	   The global variable $Jcode::FALLBACK stores the default fallback so you can override that by assigning the value.

	     $Jcode::FALLBACK = Jcode::FB_PERLQQ; # set default fallback scheme
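
	 Since the fallback constants are described as aliases of Encode's, their effect can be sketched with Encode alone.  The following is a
	 minimal illustration, not Jcode's own implementation; the helper name to_euc is hypothetical, and it assumes a perl whose Encode
	 provides the 'euc-jp' encoding.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: encode a copy of $text to EUC-JP using the given
# Encode fallback value.  A copy is used because encode() may modify its
# input in place when a CHECK value is set.
sub to_euc {
    my ($text, $check) = @_;
    my $copy = $text;
    return Encode::encode('euc-jp', $copy, $check);
}

my $unistr = "\x{262f}";    # YIN YANG, which has no EUC-JP mapping

print to_euc($unistr, Encode::FB_DEFAULT),  "\n";   # substitution character
print to_euc($unistr, Encode::FB_PERLQQ),   "\n";   # Perl-style escape
print to_euc($unistr, Encode::FB_XMLCREF),  "\n";   # hexadecimal character reference
print to_euc($unistr, Encode::FB_HTMLCREF), "\n";   # decimal character reference
```

	 The four lines printed correspond to the default '?' substitution and the three fallback styles listed above.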

       [@lines =] $jcode->jfold([$width, $newline_str, $kref])
	 Folds lines in the jcode string every $width (default: 72), where $width is the number of "halfwidth" characters.  Fullwidth characters
	 are counted as two.  Lines are broken with the newline string specified by $newline_str (default: "\n").

	 Rudimentary kinsoku support is now available for Perl 5.8.1 and better.

       $length = $jcode->jlength();
	 returns character length properly, rather than byte length.
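
	 The difference between byte length and character length can be seen with core Perl alone.  This is a small sketch using Encode rather
	 than Jcode; the helper name char_length is hypothetical and the input is assumed to be UTF-8 encoded.

```perl
use strict;
use warnings;
use Encode ();

# UTF-8 bytes for a three-character Japanese string (U+65E5 U+672C U+8A9E).
my $bytes = "\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e";

# Hypothetical helper: number of characters in a UTF-8 encoded byte string.
sub char_length {
    my ($octets) = @_;
    return length(Encode::decode('UTF-8', $octets));
}

# length() on the raw bytes counts octets; after decoding it counts characters.
printf "bytes: %d, characters: %d\n", length($bytes), char_length($bytes);
```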

       Methods that use MIME::Base64

       To use methods below, you need MIME::Base64.  To install, simply

	  perl -MCPAN -e 'CPAN::Shell->install("MIME::Base64")'

       If your perl is 5.6 or better, there is no need since MIME::Base64 is bundled.

       $mime_header = $j->mime_encode([$lf, $bpl])
	 Converts $str to a MIME-Header as documented in RFC1522.  When $lf is specified, it uses $lf to fold lines (default: "\n").  When $bpl is
	 specified, it uses $bpl as the number of bytes per line (default: 76; this number must not exceed 76).

	 For Perl 5.8.1 or better, you can also encode MIME Header as:

	   $mime_header = $j->MIME_Header;

	 In that case the resulting $mime_header is MIME-B-encoded UTF-8, whereas "$j->mime_encode()" returns MIME-B-encoded ISO-2022-JP.  Most
	 modern MUAs support both.

       $j->mime_decode;
	 Decodes MIME-Header in Jcode object.  For perl 5.8.1 or better, you can also do the same as:

	   Jcode->new($str, 'MIME-Header')
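
	 Because the 'MIME-Header' name above is an Encode encoding, the decoding step can also be sketched with Encode directly.  The sample
	 header below is an illustrative assumption (a MIME-B-encoded UTF-8 word), not taken from Jcode's documentation.

```perl
use strict;
use warnings;
use Encode ();

# MIME-B-encoded UTF-8 header word carrying U+65E5 U+672C U+8A9E.
my $header = '=?UTF-8?B?5pel5pys6Kqe?=';

# Decode the header into a Perl character string.
my $text = Encode::decode('MIME-Header', $header);
printf "decoded length: %d characters\n", length($text);

# Round trip: encode the characters again and decode the result.
my $again = Encode::encode('MIME-Header', $text);
print "round trip ok\n" if Encode::decode('MIME-Header', $again) eq $text;
```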

       Hankaku vs. Zenkaku

       $j->h2z([$keep_dakuten])
	 Converts X201 kana (Hankaku) to X208 kana (Zenkaku).  When $keep_dakuten is set, it leaves dakuten as is (that is, "ka + dakuten" is left
	 as is instead of being converted to "ga").

	 You can retrieve the number of matches via $j->nmatch;

       $j->z2h
	 Converts X208 kana (Zenkaku) to X201 kana (Hankaku).

	 You can retrieve the number of matches via $j->nmatch;

       Regexp emulators

       To use "->m()" and "->s()", you need perl 5.8.1 or better.

       $j->tr($from, $to, $opt);
	 Applies "tr/$from/$to/" on Jcode object where $from and $to are EUC-JP strings.  On perl 5.8.1 or better, $from and $to can also be
	 flagged UTF-8 strings.

	 If $opt is set, "tr/$from/$to/$opt" is applied.  $opt must be 'c', 'd' or the combination thereof.

	 You can retrieve the number of matches via $j->nmatch;

	 The following methods are available only for perl 5.8.1 or better.

       $j->s($pattern, $replace, $opt);
	 Applies "s/$pattern/$replace/$opt".  $pattern and $replace must be in EUC-JP or flagged UTF-8.  $opt is the same as the regexp options.
	 See perlre for regexp options.

	 Like "$j->tr()", "$j->s()" returns the object itself so you can chain the operations as follows:

	   $j->tr("a-z", "A-Z")->s("foo", "bar");

       [@match = ] $j->m($pattern, $opt);
	 Applies "m/$pattern/$opt".  Note that this method DOES NOT RETURN AN OBJECT, so you can't chain it the way you can "$j->s()".

       Instance Variables

       If you need to access instance variables of a Jcode object, use the access methods below instead of accessing them directly (that's
       what OOP is all about).

       FYI, Jcode uses a ref to an array instead of a ref to a hash (the common way) to optimize speed.  (Actually, you don't have to know
       this as long as you use the access methods; once again, that's OOP.)

       $j->r_str
	 Reference to the EUC-coded String.

       $j->icode
	 Input character code of the most recent operation.

       $j->nmatch
	 Number of matches (Used in $j->tr, etc.)

Subroutines
       ($code, [$nmatch]) = getcode($str)
	 Returns char code of $str. Return codes are as follows

	  ascii   Ascii (Contains no Japanese Code)
	  binary  Binary (Not Text File)
	  euc	  EUC-JP
	  sjis	  SHIFT_JIS
	  jis	  JIS (ISO-2022-JP)
	  ucs2	  UCS2 (Raw Unicode)
	  utf8	  UTF8

	 When array context is used instead of scalar, it also returns how many character codes are found.  As mentioned above, $str can be \$str
	 instead.

	 jcode.pl Users:  This function is 100% upward-compatible with jcode::getcode() -- well, almost;

	  * When its return value is an array, the order is the opposite;
	    jcode::getcode() returns $nmatch first.

	  * jcode::getcode() returns 'undef' when the number of EUC characters
	    is equal to that of SJIS.  Jcode::getcode() returns EUC.  For
	    Jcode.pm there are no in-betweens.

       Jcode::convert($str, [$ocode, $icode, $opt])
	 Converts $str to the char code specified by $ocode.  When $icode is also specified, it assumes $icode for the input string instead of the
	 one checked by getcode().  As mentioned above, $str can be \$str instead.

	 jcode.pl Users:  This function is 100% upward-compatible with jcode::convert() !

BUGS

       For perl 5.8.1 or later, Jcode acts as a wrapper to Encode, meaning Jcode is subject to the bugs therein.

ACKNOWLEDGEMENTS

       This package owes a lot in motivation, design, and code, to the jcode.pl for Perl4 by Kazumasa Utashiro <utashiro@iij.ad.jp>.

       Hiroki Ohzaki <ohzaki@iod.ricoh.co.jp> has helped me polish regexp from the very first stage of development.

       JEncode by makamaka@donzoko.net has inspired me to integrate Encode to Jcode.  He has also contributed Japanese POD.

       And folks at Jcode Mailing list <jcode5@ring.gr.jp>.  Without them, I couldn't have coded this far.

SEE ALSO

       Encode

       Jcode::Nihongo

       <http://www.iana.org/assignments/character-sets>

COPYRIGHT

       Copyright 1999-2005 Dan Kogai <dankogai@dan.co.jp>

       This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

perl v5.8.8							    2005-02-19								Jcode(3pm)
