Split a huge data into few different files?! Post: 302366815

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Perl script error to split huge data one by one.

Below is my perl script: #!/usr/bin/perl open(FILE,"$ARGV") or die "$!"; @DATA = <FILE>; close FILE; $join = join("",@DATA); @array = split( ">",$join); for($i=0;$i<=scalar(@array);$i++){ system ("/home/bin/./program_name_count_length MULTI_sequence_DATA_FILE -d...

2. Shell Programming and Scripting

Problem running Perl Script with huge data files

Hello Everyone, I have a perl script that reads two types of data files (txt and XML). These data files are huge and large in number. I am using something like this : foreach my $t (@text) { open TEXT, $t or die "Cannot open $t for reading: $!\n"; while(my $line=<TEXT>){ ...

3. Shell Programming and Scripting

Help- counting delimiter in a huge file and split data into 2 files

I’m new to Linux script and not sure how to filter out bad records from huge flat files (over 1.3GB each). The delimiter is a semi colon “;” Here is the sample of 5 lines in the file: Name1;phone1;address1;city1;state1;zipcode1 Name2;phone2;address2;city2;state2;zipcode2;comment...

4. Shell Programming and Scripting

how to split a huge file by every 100 lines

into small files. i need to add a head.txt and tail.txt into small files at the begin and end, and give a name as q1.xml q2.xml q3.xml .... thank you very much.

5. Shell Programming and Scripting

Split a file into several files using a data

Hi All, I have file(File1) with data like below: 102100|LName|Gender|Company|Branch|Bday|Salary|Age 102100|bbbb|male|cccc|dddd|19900814|15000|20| 102101|asdg|male|gggg|ksgu|19911216||| 102102|bdbm|male|kkkk|acke|19931018||23| 102102|kfjg|male|kkkc|gkgg|19921213|14000|24|...

6. UNIX for Dummies Questions & Answers

Split a huge 7 GB File Based on Pattern into 4 files

Hi, I have a Huge 7 GB file which has around 1 million records, i want to split this file into 4 files to contain around 250k messages each. Please help me as Split command cannot work here as it might miss tags.. Format of the file is as below ...

7. Shell Programming and Scripting

Split a folder with huge number of files in n folders

We have a folder XYZ with a large number of files (>350,000). How can I split the folder and create, say, 10 of them, XYZ1 to XYZ10, with 35,000 files each? (It doesn't matter which files go where.)

8. Shell Programming and Scripting

Split JSON to different data files

Hi Gurus, I have below JSON file, now I want to rewrite this file into a new file. I will appreciate if anyone can help me to provide the solution...I can't use jq. { "_id": "3ad893cb4cf1560add7b4caffd4b6126", "_rev": "1-1f0ce165e1d210319cf6e9f9c6ff654f", "name":...

9. UNIX for Advanced & Expert Users

File comparisons for huge data files (around 60G) - need the best optimized way to do this

I have 2 large files (.dat) of around 70 G, 12 columns, but the data is not sorted in either file. I need your input on the best optimized method/command to achieve this and redirect the non-matching lines to a third file (diff.dat). File 1 - 15 columns File 2 - 15 columns Data is...

10. Solaris

Split huge File System

Gents, I have a huge NAS file system mounted as /sys, 10 TB in size, and I want to split it into separate 1 TB file systems to be mounted on the server. How can I do that without changing anything in the source? Please give your support.

LEARN ABOUT ULTRIX

sed

sed(1)							      General Commands Manual							    sed(1)

Name
       sed - stream text editor

Syntax
       sed [-n] [-e script] [-f sfile] [file...]

Description
       The sed command copies the named files (standard input default) to the standard output, edited according to a script of commands.  The -f
       option causes the script to be taken from file sfile; these options accumulate.  If there is just one -e option and no -f's, the flag -e
       may be omitted.  The -n option suppresses the default output; inclusion in the script of a comment command of the form `#n' also suppresses
       the default output.  (See the description of the `#' command.)

       A script consists of editing commands of the following form:

	      [address [, address] ] function [arguments]

       Nominally, there is one command per line; but commands can be concatenated on a line by being separated with semicolons.

       In normal operation sed cyclically copies a line of input into a pattern space (unless there is something left after a `D' command),
       applies in sequence all commands whose addresses select that pattern space, and at the end of the script copies the pattern space to the
       standard output (except under -n) and deletes the pattern space.
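
       The cycle described above can be observed with a short sketch.  It assumes a POSIX-style sed (for example GNU sed) rather than the
       ULTRIX binary itself, and the three input lines are made up for illustration.

```shell
# Default cycle: every pattern space is copied to standard output,
# so an empty script behaves like cat.
default_out=$(printf 'one\ntwo\nthree\n' | sed '')

# With -n the end-of-cycle copy is suppressed; only the explicit p prints.
quiet_out=$(printf 'one\ntwo\nthree\n' | sed -n '2p')

# Without -n, line 2 is printed by p and again by the end-of-cycle copy.
doubled_out=$(printf 'one\ntwo\nthree\n' | sed '2p')

printf '%s\n' "$default_out" '---' "$quiet_out" '---' "$doubled_out"
```

       Comparing the second and third results shows why -n is normally paired with `p'.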

       An address is either a decimal number that counts input lines cumulatively across files, a `$' that addresses the last line of input, or  a
       context address, `/regular expression/', in the style of ed(1) modified thus:

	  o    In a context address, the construction \?regular expression?, where ? is any character, is identical to /regular expression/.  Note
	       that in the context address \xabc\xdefx, the second x stands for itself, so that the regular expression is abcxdef.

	  o    The escape sequence `\n' matches a new line embedded in the pattern space.

	  o    A command line with no addresses selects every pattern space.

	  o    A command line with one address selects each pattern space that matches the address.

	  o    A command line with two addresses selects the inclusive range from the first pattern space that matches the first  address  through
	       the  next  pattern  space  that matches the second.  (If the second address is a number less than or equal to the line number first
	       selected, only one line is selected.)  Thereafter the process is repeated, looking again for the first address.

       Editing commands can be applied only to non-selected pattern spaces by use of the negation function `!' (below).
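
       The address forms above can be tried out as follows.  This is a sketch that assumes a POSIX-style sed such as GNU sed; the helper
       function `data' and its five input lines are hypothetical.

```shell
# Hypothetical five-line input reused by every command below.
data() { printf 'start\nfoo\nbar\nend\ntail\n'; }

first=$(data | sed -n '1p')                  # decimal line number
last=$(data | sed -n '$p')                   # $ addresses the last input line
ctx=$(data | sed -n '/^ba/p')                # context address
alt=$(data | sed -n '\,^ba,p')               # same address with , as the delimiter
range=$(data | sed -n '/^foo/,/^end/p')      # inclusive two-address range
single=$(data | sed -n '3,1p')               # second number <= first line: one line only
outside=$(data | sed -n '/^foo/,/^end/!p')   # ! selects the lines outside the range

printf '%s\n' "$first" "$last" "$ctx" "$alt" "$range" "$single" "$outside"
```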

       In the following list of functions the maximum number of permissible addresses for each function is indicated in parentheses.

       An argument denoted text consists of one or more lines, all but the last of which end with `\' to hide the new line.  Backslashes  in  text
       are  treated  like  backslashes in the replacement string of an `s' command, and may be used to protect initial blanks and tabs against the
       stripping that is done on every script line.

       An argument denoted rfile or wfile must terminate the command line and must be preceded by exactly one blank.  Each wfile is created before
       processing begins.  There can be at most 10 distinct wfile arguments.
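
       Because each `w' command names its own wfile, a short script can route lines from one large input into several output files.  The
       following is a sketch only: it assumes a POSIX-style sed such as GNU sed, and the file names records.txt, group_a.txt and rest.txt, as
       well as the semicolon-delimited sample records, are hypothetical.  Each `w' command is given in its own -e option because the wfile
       name must terminate the command line.

```shell
# Work in a scratch directory so the sketch leaves nothing behind elsewhere.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical semicolon-delimited input.
printf 'A;1\nB;2\nA;3\nC;4\n' > records.txt

# Lines beginning with "A;" go to group_a.txt, all other lines to rest.txt.
# -n keeps standard output empty, since every line is written to a file.
sed -n -e '/^A;/w group_a.txt' -e '/^A;/!w rest.txt' records.txt

cat group_a.txt
cat rest.txt
```

       With the limit of 10 distinct wfile arguments stated above, this approach suits a split into a few files rather than many.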

       (1)a\
       text
	       Append.	Place text on the output before reading the next input line.

       (2)b label
	       Branch to the `:' command bearing the label.  If label is empty, branch to the end of the script.

       (2)c\
       text
	       Change.	 Delete  the  pattern space.  With 0 or 1 address or at the end of a 2-address range, place text on the output.  Start the
	       next cycle.

       (2)d    Delete the pattern space.  Start the next cycle.

       (2)D    Delete the initial segment of the pattern space through the first new line.  Start the next cycle.

       (2)g    Replace the contents of the pattern space by the contents of the hold space.

       (2)G    Append the contents of the hold space to the pattern space.

       (2)h    Replace the contents of the hold space by the contents of the pattern space.

       (2)H    Append the contents of the pattern space to the hold space.

       (1)i\
       text
	       Insert.	Place text on the standard output.

       (2)n    Copy the pattern space to the standard output.  Replace the pattern space with the next line of input.

       (2)N    Append the next line of input to the pattern space with an embedded new line.  (The current line number changes.)

       (2)p    Print.  Copy the pattern space to the standard output.

       (2)P    Copy the initial segment of the pattern space through the first new line to the standard output.

       (1)q    Quit.  Branch to the end of the script.	Do not start a new cycle.

       (2)r rfile
	       Read the contents of rfile.  Place them on the output before reading the next input line.

       (2)s/regular expression/replacement/flags
	       Substitute the replacement string for instances of the regular expression in the pattern space.	Any character may be used  instead
	       of `/'.  For a more complete description see ed(1).  The flags is zero or more of

	       g       Global.	Substitute for all nonoverlapping instances of the regular expression rather than just the first one.

	       p       Print the pattern space if a replacement was made.

	       w wfile Write.  Append the pattern space to wfile if a replacement was made.

       (2)t label
	       Test.   Branch  to  the `:' command bearing the label if any substitutions have been made since the most recent reading of an input
	       line or execution of a `t'.  If label is empty, branch to the end of the script.

       (2)w wfile
	       Write.  Append the pattern space to wfile.

       (2)x    Exchange the contents of the pattern and hold spaces.

       (2)y/string1/string2/
	       Transform.  Replace all occurrences of characters in string1 with the corresponding character in string2.  The lengths  of  string1
	       and string2 must be equal.

       (2)! function
	       Don't.  Apply the function (or group, if function is `{') only to lines not selected by the address(es).

       (0): label
	       This command does nothing; it bears a label for `b' and `t' commands to branch to.

       (1)=    Place the current line number on the standard output as a line.

       (2){    Execute the following commands through a matching `}' only when the pattern space is selected.

       (0)     An empty command is ignored.

       (0)#    With one exception, any line whose first nonblank character is a number sign is a comment and is ignored.  The exception is that if
	       the first such line encountered contains only the number sign followed by the letter `n' the default output is suppressed as if the
	       -n option were in force.
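
       A few of the functions above are easier to follow with concrete input.  The sketch below assumes a POSIX-style sed such as GNU sed,
       and all input strings are made up for illustration.

```shell
# s with the g flag replaces every match on the line, not only the first.
subst=$(printf 'a-b-c\n' | sed 's/-/+/g')

# y maps each character of string1 to the character at the same position in string2.
upper=$(printf 'abc\n' | sed 'y/abc/ABC/')

# G appends the (initially empty) hold space after a new line: double spacing.
spaced=$(printf 'x\ny\n' | sed 'G')

# Hold space as an accumulator: G puts the lines seen so far after the
# current line, h saves the result, and the last line prints it all reversed.
reversed=$(printf '1\n2\n3\n' | sed -n '1!G;h;$p')

# N pulls the next line into the pattern space; \n matches the embedded new line.
joined=$(printf 'a\nb\nc\nd\n' | sed 'N;s/\n/ /')

printf '%s\n' "$subst" "$upper" "$spaced" "$reversed" "$joined"
```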

Options
       -e 'command;command...'
	       Uses command;command... as the editing script.  If no -f option is given, the -e keyword can be omitted.  For example, the following
	       two commands are functionally identical:
	       % sed -e 's/DIGITAL/Digital/g' summary > summary.out
	       % sed 's/DIGITAL/Digital/g' summary > summary.out

       -f sfile
	       Uses specified file as input file of commands to be executed.  Can be used with -e option to apply both	explicit  commands  and  a
	       separate script file.

       -n      Suppresses  all	normal	output, writing only lines explicitly written by the `p' or `P' commands or by an `s' command with the `p'
	       flag.
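
       The options can be combined as in the sketch below.  It assumes a POSIX-style sed such as GNU sed; the script file fix.sed and the
       contents of the summary file are hypothetical.

```shell
workdir=$(mktemp -d)

# Hypothetical script file and input file.
printf 's/DIGITAL/Digital/g\n' > "$workdir/fix.sed"
printf 'DIGITAL UNIX\nplain line\n' > "$workdir/summary"

# -e and the bare script form give the same result.
with_e=$(sed -e 's/DIGITAL/Digital/g' "$workdir/summary")
bare=$(sed 's/DIGITAL/Digital/g' "$workdir/summary")

# -f reads the same command from a file; -e adds an explicit command to it.
with_f=$(sed -f "$workdir/fix.sed" "$workdir/summary")
combined=$(sed -f "$workdir/fix.sed" -e 's/UNIX/Unix/' "$workdir/summary")

# -n with the p flag of s prints only the lines where a replacement was made.
changed=$(sed -n 's/DIGITAL/Digital/gp' "$workdir/summary")

printf '%s\n' "$with_e" "$combined" "$changed"
```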

See Also
       awk(1), ed(1), grep(1), lex(1)

																	    sed(1)
