Reducing input file size after pattern search
Post 302996288 by Don Cragun on Sunday 23rd of April 2017 04:17:28 PM
Hi Xterra,
I think you don't understand what is being suggested. If you have a file containing a million records, where the 1st line of each record is one of four values, and you want to create four output files, each containing all of the records that share the same 1st line, then you do not want to read that input file 4 times. You want to read it once and create all 4 of your output files in one pass. Doing this, you read a million records, write a million records, and you're done.
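
To make the one-pass idea concrete, here is a minimal sketch in Python. It is not code from this thread: the function name `split_by_first_line`, the `open_output` callback, and the fixed `lines_per_record` layout are all assumptions made for illustration, since the actual record format has not been shown here.

```python
import io
from itertools import islice


def split_by_first_line(lines, open_output, lines_per_record=2):
    """Read the input once and route each record by its first line.

    lines            -- any iterable of text lines (e.g. an open file)
    open_output      -- hypothetical callback: key -> writable file-like object
    lines_per_record -- assumed fixed record length; adjust to the real data
    """
    outputs = {}   # one output stream per distinct first line
    counts = {}    # records written per key
    it = iter(lines)
    while True:
        record = list(islice(it, lines_per_record))
        if not record:
            break  # end of input
        key = record[0].rstrip("\n")
        if key not in outputs:
            outputs[key] = open_output(key)  # opened lazily, exactly once
        outputs[key].writelines(record)
        counts[key] = counts.get(key, 0) + 1
    return counts


# Small in-memory demonstration (no real files are touched).
sample = io.StringIO("A\nx1\nB\nx2\nA\nx3\n")
buffers = {}


def open_in_memory(key):
    buffers[key] = io.StringIO()
    return buffers[key]


record_counts = split_by_first_line(sample, open_in_memory)
```

With real files, `open_output` could be something like `lambda key: open("out_" + key + ".txt", "w")` (the file names are hypothetical, and the caller would need to close the streams afterwards). Every input record is read once and written once, no matter how many distinct first lines there are.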

What you are asking to do instead is read a million records, write ~250000 records to one file, and write ~750000 records to another file; then you read ~750000 records, write ~250000 records to one file, and write ~500000 records to another file; then you read ~500000 records, write ~250000 records to one file, and write ~250000 records to another file; and then you read ~250000 records, write ~250000 records to one file, and write 0 records to another file. Why would you want to read ~2.5 million records and write ~2.5 million records instead of reading 1 million records and writing 1 million records?

The code that you currently have is reading 4 million records and writing 1 million records (i.e., 5 million I/O operations). What you are asking to do would read 2.5 million records and write 2.5 million records (i.e., 5 million I/O operations). Even if we skip the last read and write and just rename one of the last two output files, your plan still has 4.5 million I/O operations instead of the 2 million I/O operations being proposed by RudiC and jim mcnamara.
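
The record counts above can be checked with a short tally. This is only a sketch of the arithmetic: the function names are made up for illustration, and it assumes the four groups are exactly equal in size (the post only says "~250000" each).

```python
def current_io(total, groups):
    """Re-read the whole input once per group; each record is written once."""
    return total * groups, total


def cascade_io(total, groups, skip_last_pass=False):
    """Each pass reads what is left, writes one group plus the remainder."""
    share = total // groups          # assumes equally sized groups
    reads = writes = 0
    remaining = total
    passes = groups - 1 if skip_last_pass else groups
    for _ in range(passes):
        reads += remaining
        writes += remaining          # matched records + leftover records
        remaining -= share
    return reads, writes


def single_pass_io(total):
    """Read every record once and write it once."""
    return total, total


TOTAL, GROUPS = 1_000_000, 4
totals = {
    "current": sum(current_io(TOTAL, GROUPS)),
    "cascade": sum(cascade_io(TOTAL, GROUPS)),
    "cascade, last pass skipped": sum(cascade_io(TOTAL, GROUPS, skip_last_pass=True)),
    "single pass": sum(single_pass_io(TOTAL)),
}
for name, ops in totals.items():
    print(f"{name}: {ops} record I/O operations")
```

Under that equal-size assumption the tally gives 5 million, 5 million, 4.5 million, and 2 million operations respectively, matching the figures in the paragraph above.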

Is there something else that you haven't told us about your data that would affect what I assume you are trying to do?
