File comparisons for huge data files (around 60G) - need the most optimized way to do this. Post 303025154 by vgersh99 on Thursday 25th of October 2018 10:23:01 AM
Old 10-25-2018
Look into man grep with the options -F and -f,
or man fgrep.
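
As a hedged sketch of how that might look (file names are placeholders; the pattern file is assumed to be much smaller than the 60G data file, since grep holds the -f patterns in memory):

    # Find every line of big.dat that contains any of the lines of list.txt,
    # read as fixed strings (-F) from a pattern file (-f) rather than as
    # regular expressions:
    grep -F -f list.txt big.dat > matches.txt

    # Lines of big.dat that match none of the strings:
    grep -v -F -f list.txt big.dat > non_matches.txt

    # For a line-by-line comparison, -x restricts matches to whole lines:
    grep -x -F -f file_a.dat file_b.dat > common_lines.txt

Whichever file is smaller should be the -f argument, and on GNU systems fgrep is simply shorthand for grep -F.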
 

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

search and grab data from a huge file

folks, In my working directory, there are multiple large files which only contain one line each. The line is too long to use "grep", so any help? For example, if I want to find whether these files contain a string like "93849", what command should I use? Also, there is oder_id number... (1 Reply)
Discussion started by: ting123
1 Reply
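
One possible approach, as a hedged sketch (the separator, search string, and file glob are placeholders): break the single long line on a delimiter that cannot occur inside the search string, so every piece is short enough for grep.

    for f in *.dat; do
        # Split the one long line on "," (pick a separator that cannot occur
        # inside the search string) and test whether any piece contains it.
        if tr ',' '\n' < "$f" | grep -q '93849'; then
            printf '%s\n' "$f"
        fi
    done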

2. Shell Programming and Scripting

How to extract data from a huge file?

Hi, I have a huge file of bibliographic records in some standard format. I need a script to do some repeatable tasks as follows: 1. Create folders for the strings starting with "item_*" from the input file 2. Create a file "contents" in each folder having "license.txt(tab... (5 Replies)
Discussion started by: srsahu75
5 Replies

3. Shell Programming and Scripting

insert a header in a huge data file without using an intermediate file

I have a file with extracted data and need to insert a header with a constant string, say: H|PayerDataExtract. If I use sed, I have to redirect the output to a separate file, like sed 'sed commands' ExtractDataFile.dat > ExtractDataFileWithHeader.dat. The same is true for awk and... (10 Replies)
Discussion started by: deepaktanna
10 Replies
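
A hedged sketch of two ways this is often handled (the header string and file name are taken from the excerpt; GNU sed is assumed for the first form). Note that sed -i still writes a temporary copy behind the scenes, while ed holds the whole file in a memory buffer and writes it back in place, so neither is free for very large files.

    # GNU sed, editing "in place" (a temporary file is used internally):
    sed -i '1i H|PayerDataExtract' ExtractDataFile.dat

    # POSIX ed, no separate output file on disk (file is buffered in memory):
    printf '1i\nH|PayerDataExtract\n.\nw\nq\n' | ed -s ExtractDataFile.dat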

4. Shell Programming and Scripting

Split huge data into a few different files?!

Input file data contents: >seq_1 MSNQSPPQSQRPGHSHSHSHSHAGLASSTSSHSNPSANASYNLNGPRTGGDQRYRASVDA >seq_2 AGAAGRGWGRDVTAAASPNPRNGGGRPASDLLSVGNAGGQASFASPETIDRWFEDLQHYE >seq_3 ATLEEMAAASLDANFKEELSAIEQWFRVLSEAERTAALYSLLQSSTQVQMRFFVTVLQQM ARADPITALLSPANPGQASMEAQMDAKLAAMGLKSPASPAVRQYARQSLSGDTYLSPHSA... (7 Replies)
Discussion started by: patrick87
7 Replies
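
A minimal sketch (the file and extension names are placeholders, and the input is assumed to begin with a ">" header line as in the sample): write each ">seq_N" record to its own file, named after its header.

    awk '/^>/ { if (out) close(out); out = substr($1, 2) ".fa" }
         { print > out }' input.fa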

5. Shell Programming and Scripting

Splitting the Huge file into several files...

Hi, I have to write a script to split a huge file into several pieces. The file's columns are pipe (|) delimited. A data sample is: 6625060|1420215|07308806|N|20100120|5572477081|+0002.79|+0000.00|0004|0001|......... (3 Replies)
Discussion started by: lakteja
3 Replies
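
A hedged sketch (the line count and names are placeholders; -d for numeric suffixes is a GNU coreutils option): cut the file into fixed-size pieces on line boundaries, so no pipe-delimited record is split across two output files.

    # Produces piece_00, piece_01, ... of 1,000,000 complete lines each.
    split -l 1000000 -d huge_file.dat piece_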

6. Shell Programming and Scripting

Problem running Perl Script with huge data files

Hello Everyone, I have a perl script that reads two types of data files (txt and XML). These data files are huge and large in number. I am using something like this : foreach my $t (@text) { open TEXT, $t or die "Cannot open $t for reading: $!\n"; while(my $line=<TEXT>){ ... (4 Replies)
Discussion started by: ad23
4 Replies

7. Shell Programming and Scripting

Huge data comparison problem across three different files.

I have got three different files: Part of File 1: ARTPHDFGAA . . Part of File 2: ARTGHHYESA . . Part of File 3: ARTPOLYWEA . . (4 Replies)
Discussion started by: patrick87
4 Replies

8. Shell Programming and Scripting

Help: counting delimiters in a huge file and splitting data into 2 files

I'm new to Linux scripting and not sure how to filter out bad records from huge flat files (over 1.3GB each). The delimiter is a semicolon ";". Here is a sample of 5 lines in the file: Name1;phone1;address1;city1;state1;zipcode1 Name2;phone2;address2;city2;state2;zipcode2;comment... (7 Replies)
Discussion started by: lv99
7 Replies
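
A minimal sketch based on the sample lines in the excerpt (output file names are placeholders): records with exactly six semicolon-separated fields go to one file and everything else to another, in a single pass over the input.

    awk -F';' 'NF == 6 { print > "good.txt"; next }
                       { print > "bad.txt"  }' input.txt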

9. UNIX for Dummies Questions & Answers

File comparison of huge files

Hi all, I hope you are well. I am very happy to see your contributions and am eager to become part of them. I have the following question. I have two huge files to compare (almost 3GB each). The files are simulation outputs. The format of the files is as below. For a clear picture, please see... (9 Replies)
Discussion started by: kaaliakahn
9 Replies
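
One hedged sketch for a line-oriented comparison (file names are placeholders): sort both files first (sort spills to temporary files, so memory use stays bounded) and let comm report the differences.

    sort file1 > file1.sorted
    sort file2 > file2.sorted
    # -3 suppresses lines common to both, leaving lines unique to each file:
    comm -3 file1.sorted file2.sorted > differences.txt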

10. UNIX for Advanced & Expert Users

Need optimized shell/awk script to aggregate (sum) all the columns of a huge data file

I need an optimized shell/awk script to aggregate (sum) all the columns of a huge data file. The file delimiter is "|". I need the sum of every column, reported with its column number, i.e. aggregation (summation) for each column. The file has no header. Output like below: Column 1 : Total  Column 2 : Total ... ...... (2 Replies)
Discussion started by: kartikirans
2 Replies
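
A minimal sketch (the file name is a placeholder, and two decimal places are assumed from the sample data): one awk pass over the pipe-delimited, headerless file, keeping a running total per column and printing "Column N : total" at the end.

    awk -F'|' '
        { for (i = 1; i <= NF; i++) sum[i] += $i; if (NF > maxnf) maxnf = NF }
        END { for (i = 1; i <= maxnf; i++) printf "Column %d : %.2f\n", i, sum[i] }
    ' datafile.dat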
IPC::Run::Win32Helper(3)				User Contributed Perl Documentation				  IPC::Run::Win32Helper(3)

NAME
IPC::Run::Win32Helper - helper routines for IPC::Run on Win32 platforms.

SYNOPSIS
   use IPC::Run::Win32Helper;   # Exports all by default

DESCRIPTION
IPC::Run needs to use sockets to redirect subprocess I/O so that the select() loop will work on Win32. This seems to only work on WinNT and Win2K at this time; not sure if it will ever work on Win95 or Win98. If you have experience in this area, please contact me at barries@slaysys.com, thanks!

FUNCTIONS
optimize()
Most common incantations of "run()" (not "harness()", "start()", or "finish()") now use temporary files to redirect input and output instead of pumper processes.

Temporary files are used when sending to child processes if input is taken from a scalar with no filter subroutines. This is the only time we can assume that the parent is not interacting with the child's redirected input as it runs.

Temporary files are used when receiving from children when output is to a scalar or subroutine, with or without filters, but only if the child in question closes its inputs or takes input from unfiltered SCALARs or named files. Normally, a child inherits its STDIN from its parent; to close it, use "0<&-" or the "noinherit => 1" option.

If data is sent to the child from CODE refs, filehandles, or from scalars through filters, then the child's outputs will not be optimized, because "optimize()" assumes the parent is interacting with the child. It is ok if the output is filtered or handled by a subroutine, however.

This assumes that all named files are real files (as opposed to named pipes) and won't change, and that a process is not communicating with the child indirectly (through means not visible to IPC::Run). These can be invalid assumptions, but they are the 99% case. Write me if you need an option to enable or disable optimizations; I suspect it will work like the "binary()" modifier.

To detect cases that you might want to optimize by closing inputs, try setting the "IPCRUNDEBUG" environment variable to the special "notopt" value:

   C:> set IPCRUNDEBUG=notopt
   C:> my_app_that_uses_IPC_Run.pl

optimizer() rationalizations
Only for that limited case can we be sure that it's ok to batch all the input into a temporary file. If STDIN is from a SCALAR or from a named file or filehandle (again, only in "run()"), then outputs to CODE refs are also assumed to be safe enough to batch through a temp file; otherwise only outputs to SCALAR refs are batched.

This can cause a bit of grief if the parent process benefits from or relies on a bit of "early returns" coming in before the child program exits. As long as the output is redirected to a SCALAR ref, this will not be visible. When output is redirected to a subroutine or (deprecated) filters, the subroutine will not get any data until after the child process exits, and it is likely to get bigger chunks of data at once.

The reason for the optimization is that, without it, "pumper" processes are used to overcome the inconsistencies of the Win32 API. We need to use anonymous pipes to connect to the child processes' stdin, stdout, and stderr, yet select() does not work on these; select() only works on sockets on Win32. So for each redirected child handle, there is normally a "pumper" process that connects to the parent using a socket--so the parent can select() on that fd--and to the child on an anonymous pipe--so the child can read/write a pipe.

Using a socket to connect directly to the child (as at least one MSDN article suggests) seems to cause the trailing output from most children to be lost. I think this is because child processes rarely close their stdout and stderr explicitly, and the winsock dll does not seem to flush output when a process that uses it exits without explicitly closing them.

Because of these pumpers and the inherent slowness of Win32 CreateProcess(), child processes with redirects are quite slow to launch, so this routine looks for the very common case of reading/writing to/from scalar references in a run() routine and converts such reads and writes into temporary file reads and writes.

Such files are marked as FILE_ATTRIBUTE_TEMPORARY to increase speed and as FILE_FLAG_DELETE_ON_CLOSE so they are cleaned up when the child process exits (for input files). The user's default permissions are used for both the temporary files and the directory that contains them; hope your Win32 permissions are secure enough for you. Files are created with the Win32API::File defaults of FILE_SHARE_READ|FILE_SHARE_WRITE.

Setting the debug level to "details" or "gory" will give detailed information about the optimization process; setting it to "basic" or higher will tell whether or not a given call is optimized. Setting it to "notopt" will highlight those calls that aren't optimized.

win32_parse_cmd_line

   @words = win32_parse_cmd_line( q{foo bar 'baz baz' "bat bat"} );

returns 4 words. This parses like the Bourne shell (see the bit about shellwords() in Text::ParseWords), assuming we're trying to be a little cross-platform here. The only difference is that the backslash is *not* treated as an escape except when it precedes punctuation, since it's used all over the place in DOS path specs.

TODO: globbing? Probably not (it's unDOSish).

TODO: shebang emulation? Probably, but perhaps that should be part of Run.pm so all spawned processes get the benefit.

LIMITATIONS: shellwords dies silently on malformed input like a"

win32_spawn
Spawns a child process, possibly with STDIN, STDOUT, and STDERR (file descriptors 0, 1, and 2, respectively) redirected.

LIMITATIONS: Cannot redirect higher file descriptors due to lack of support for this in the Win32 environment. This can be worked around by marking a handle as inheritable in the parent (or leaving it marked; this is the default in perl), obtaining its Win32 handle with "Win32API::GetOSFHandle(FH)" or "Win32API::FdGetOsFHandle($fd)", and passing it to the child using the command line, the environment, or any other IPC mechanism (it's a plain old integer). The child can then use "OsFHandleOpen()" or "OsFHandleOpenFd()" and possibly open FOO ">&BAR" or open FOO ">&$fd" as need be. Ach, the pain! Remember to check the Win32 handle against INVALID_HANDLE_VALUE.

AUTHOR
Barrie Slaymaker <barries@slaysys.com>. Funded by Perforce Software, Inc.

COPYRIGHT
Copyright 2001, Barrie Slaymaker. All Rights Reserved. You may use this under the terms of either the GPL 2.0 or the Artistic License.

perl v5.12.1                          2010-04-01                          IPC::Run::Win32Helper(3)