Thanks,
Tomorrow I will try and let you know if I need more help.
---------- Post updated at 10:47 PM ---------- Previous update was at 10:42 AM ----------
Hi Corona,
I tried your code and it is working fine. Thank you so much, but I have a couple of doubts which I wanted to share with you.
After running the above code I checked the data inside the files using the more and cat commands. It comes out as expected, but when I open these files in Notepad on a Windows system, all records appear on one line.
Notepad view of the file:
Output of the more command:
Do we have to insert a newline for each record?
Also, can you comment on the performance of this command on 2 million records?
Regards
Ibrar Ahmad
Last edited by Don Cragun; 05-07-2015 at 01:09 AM..
Reason: Change ICODE tags to CODE tags for multi-line data.
*nix and Windows are two seriously different systems. One of the differences is how line ends are designated in text files: *nix uses <NL> (= <newline>, 0x0A, \n), while Windows uses the combination of <CR> (= <carriage return>, 0x0D, \r) followed by <NL>.
So you should stay on one system. If working on both is unavoidable, you'll need to take extra care to convert the files. You can use conversion tools like unix2dos, iconv, or recode, or you can print the \r explicitly in the above awk solution.
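Since the original command isn't quoted in this post, here is a minimal sketch of printing the \r explicitly; the use of the first field as the file-name key (and the outfile_..._ naming) is an assumption, not necessarily what the earlier code did:

```shell
# Create a tiny sample input: key field first, then data.
printf 'A first record\nB second record\nA third record\n' > input.txt

# Append a carriage return before awk's own newline, so Windows
# editors like Notepad see proper CRLF line endings.
awk '{ print $0 "\r" > ("outfile_" $1 "_") }' input.txt
```

Alternatively, write plain \n-terminated files and run unix2dos on them afterwards, if that tool is installed on your system.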
I don't have a file with 2 million records at hand, so it's difficult to predict, especially on a Windows system. Still, I guess it should run in a few seconds unless many files have to be opened and closed again and again.
Using:
could have a problem on some versions of awk because the precedence between concatenation of strings and the output redirection operator in print statements is not specified by the standards. Some implementations of awk will treat this code as:
(possibly giving a syntax error, and certainly not producing the output files you want) and others will treat it as:
Since it is working on your system, we can assume that your version of awk is doing the latter.
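The code fences from this post were lost in this excerpt, but the ambiguity being described can be sketched as follows (field and file names here are illustrative assumptions):

```shell
# Ambiguous form: the standard does not specify whether string
# concatenation or the > redirection binds more tightly in print:
#     awk '{ print $0 > "outfile_" $1 "_" }'
# Some awks parse it as:  ( print $0 > "outfile_" ) $1 "_"   # wrong files / syntax error
# Others parse it as:       print $0 > ( "outfile_" $1 "_" ) # the intended split
# Parenthesizing the file-name expression makes it unambiguous everywhere:
printf 'A one\nB two\n' > input.txt
awk '{ print $0 > ("outfile_" $1 "_") }' input.txt
```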
This code doesn't close and reopen files, so, if it doesn't give an error (too many open files) it should run pretty quickly. It might run slightly faster if you move appending the "_" out of the loop and just do it once instead of two million times:
As was noted before, if this fails on your two-million-record file with a "too many open files" error, you'll have to build in code to close and reopen files. Opening and closing files for each output line will run considerably slower. Doing anything smarter than that would require you to evaluate the input file: are lines directed to the same output file closely grouped in the input? Are there common runs of adjacent lines going to the same output file? That information can be used to make smarter decisions about when to close a file and when (if ever) a file needs to be reopened.
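For reference, the slow-but-safe variant mentioned above (close after every line, reopen in append mode) might look like this; the key field and file names are assumptions:

```shell
rm -f outfile_A_ outfile_B_           # start clean, since >> appends
printf 'A one\nB two\nA three\n' > input.txt

awk '{
    f = "outfile_" $1 "_"
    print $0 >> f    # append, so reopening does not truncate earlier lines
    close(f)         # never more than one output file open at a time
}' input.txt
```

This never hits the open-file limit, at the cost of one open/close pair per input line.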
Since you couldn't even guess at how many output files would be produced from your input file, I assume you have not tried to evaluate any of the above questions that might help produce more efficient code if you do have to close and reopen files.
Hi.
In general, for collecting related lines, one could do an initial stable sort on the field of interest; then, while reading with subsequent (say, awk) code, close the previous file and open a new one whenever the field content changes.
Obviously this touches the data an extra time (at least), but it is conceptually simple and would never run into the limit for open files. Assuming a pipe connection, no extra files (other than any created by the sort) are produced.
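A sketch of that sort-then-split pipeline, assuming the first field is the key being grouped on:

```shell
printf 'B two\nA one\nA three\n' > input.txt

# Stable sort on field 1, then each output file is opened exactly once;
# sorted input guarantees a closed key never reappears.
sort -s -k1,1 input.txt |
awk '{
    f = "outfile_" $1 "_"
    if (f != prev) {              # key changed: finish the previous file
        if (prev != "") close(prev)
        prev = f
    }
    print > f
}'
```

Only one output file is ever open at a time, and the relative order of lines within each key is preserved by the stable sort.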