07-09-2009
Can I split a 10 GB file into 1 GB pieces using my repeating data pattern?
I'm not a Unix guy, so excuse my ignorance... I'm the database ETL guy.
I'm trying to be proactive and devise a plan B for an ETL process where I expect a file 10X larger than what I process daily for a recast job. The ETL may handle it, but I just don't know.
This file may need to be split, and we don't want to lose related data. I assume it would be easier to do it at the Unix level rather than in the ETL tool, provided the Unix commands have no file size limitations.
The file will most likely be 10 GB, give or take a few GB. The exact size is unknown at this time.
The basic file format is as follows, with the first 3 characters being the record type (100, 401, 404, 410, 411).
The file must be split into segments roughly equal to a daily run, approximately 1 GB in size, and each split has to occur just before a 100 record, as all the rows that follow a 100 belong together.
1001104vvbvnbvd
4011104ghghghgh
404111kjdkfjkdf
404111kjdkfjkdf
404111kjdkfjkdf
404111kjdkfjkdf
4103445kkjkljlk
4103445kkjkljlk
4113445kkjkljlk
4043445kkjkljlk
10011ffgfgg1250
4011104fffhghgh
404111kjddfjkdf
404111kjdkrtrdf
etc...
Thanks in advance. I think we use HP-UX.
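One way the rule above could be sketched is with awk: keep a running byte count and start a new output file only when a 100 record arrives after the current file has reached the size limit. This is a minimal sketch, not a tested HP-UX solution. The function name split_at_100, its arguments, and the output naming (PREFIX.001, PREFIX.002, ...) are made up for illustration. It assumes a POSIX awk that accepts -v, single-byte characters with newline-terminated lines, and an awk build that can read and write files over 2 GB, which should be checked on the target system.

```shell
#!/bin/sh
# split_at_100 INPUT MAX_BYTES PREFIX
# Hypothetical helper (not from the original post): writes PREFIX.001,
# PREFIX.002, ... and only ever breaks just before a "100" record.
split_at_100() {
    awk -v max="$2" -v prefix="$3" '
        BEGIN { part = 1; out = sprintf("%s.%03d", prefix, part) }

        # A 100 record starts a new group. Switch files only here, and
        # only if the current file has already reached the size limit.
        substr($0, 1, 3) == "100" && bytes >= max {
            close(out)
            part++
            out = sprintf("%s.%03d", prefix, part)
            bytes = 0
        }

        # Every record goes to the current output file.
        {
            print > out
            bytes += length($0) + 1   # +1 for the newline (single-byte assumption)
        }
    ' "$1"
}
```

Called as, for example, split_at_100 bigfile.dat 1000000000 bigfile.part. Because the break waits for the next 100 record, each piece can run past the limit by up to the size of one group, and the last piece may be much smaller than the others.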