
BSD, Linux, and UNIX shell scripting Post awk, bash, csh, ksh, perl, php, python, sed, sh, shell scripts, and other shell scripting languages questions here.

How to control grep output intact for each matching line?

Tags
grep, line, multiple files, output, pipe, shell scripts, tac

# 1  
Old 4 Weeks Ago
How to control grep output intact for each matching line?

I have multiple (~80) files (some as big as 30 GB, with more than 1 billion lines!) to grep for a pattern, with the matches redirected to a single file. I have a 96-core machine, so I sent each grep job to the background to speed up the search. A sample of one input file:
Code:
file1.tab
chr1A_part1    123241847    123241848
chr1A_part1    123241848    123241849
chr1A_part1    123241849    123241850
chr1A_part1    123241850    123241851
......

The input files uniformly have 3 fields in each row, and so should the output file. This is the loop I ran:
Code:
for file in $(cat files.list); do 
grep -F chr1A ${file} >> subset_chr1A.tab &
done

but I found that some of the matching lines were broken and the output file became a mess!
Code:
subset_chr1A.tab
chr1A_part1    123241847    123241848
chr1A_part1    123241848    123241849
chr1A_part1    1232
41849    123241850
ch1
chr1A_part1    12
3241850    
chr1A_part1    123441848    123441849
123541851
...

It seems to me the problem comes from how the output is written, since 80 grep jobs for 80 files are all appending to the same output file. By default grep prints whole matching lines, so I assumed each row would be written intact, but that is not what happened in my case.

What is wrong here?

Last edited by yifangt; 4 Weeks Ago at 02:20 PM.. Reason: typos
# 2  
Old 4 Weeks Ago
Buffering will make a mess of this, bundling arbitrary blocks into one write. These arbitrary blocks don't care much where lines begin and end. Long enough lines could conceivably take more than one write!

If you have GNU grep, --line-buffered may help, but will have a big performance cost.
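
A rough sketch of the line-buffered variant, assuming a grep that supports --line-buffered (it is an option of GNU grep, not a POSIX one) and a files.list like the one in the first post. With it, grep flushes after every matching line, so each line normally reaches the shared file in one append-mode write. That is not a guarantee on every filesystem (NFS in particular), and the order of lines coming from different files is arbitrary. The sample files and the temporary directory are invented for the demonstration.

```shell
#!/bin/sh
# Sketch only: sample data and directory are made up for illustration.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two tiny stand-ins for the real multi-gigabyte inputs.
printf 'chr1A_part1\t1\t2\nchr2B_part1\t5\t6\n' > file1.tab
printf 'chr1A_part2\t3\t4\nchr3D_part1\t7\t8\n' > file2.tab
printf '%s\n' file1.tab file2.tab > files.list

: > subset_chr1A.tab            # start from an empty output file

# One background grep per input file. --line-buffered flushes after
# every matching line instead of after every full stdio buffer.
while IFS= read -r file; do
    grep -F --line-buffered chr1A "$file" >> subset_chr1A.tab &
done < files.list

wait                            # block until every background grep has exited
```

The wait at the end matters: without it the script could move on while some greps are still writing.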

You could also send the output to separate files and cat them together later.
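
A minimal sketch of the separate-files approach, again assuming a files.list like the one in the first post. The .chr1A.part suffix and the sample data are hypothetical names chosen for this example. Each grep gets its own output file, so no two processes ever write to the same file, and the pieces are joined only after wait returns.

```shell
#!/bin/sh
# Sketch only: sample data and the ".chr1A.part" suffix are made up.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'chr1A_part1\t1\t2\nchr2B_part1\t5\t6\n' > file1.tab
printf 'chr1A_part2\t3\t4\nchr3D_part1\t7\t8\n' > file2.tab
printf '%s\n' file1.tab file2.tab > files.list

# Run the greps in parallel, each writing to its own partial file.
while IFS= read -r file; do
    grep -F chr1A "$file" > "$file.chr1A.part" &
done < files.list

wait    # all partial files are complete once this returns

# Join the pieces in the order given by files.list, removing each
# partial file after it has been copied.
while IFS= read -r file; do
    cat "$file.chr1A.part" && rm "$file.chr1A.part"
done < files.list > subset_chr1A.tab
```

A side effect of joining in list order is that the combined file has a reproducible row order, which the shared-file version cannot offer.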
The Following User Says Thank You to Corona688 For This Useful Post:
yifangt (4 Weeks Ago)
# 3  
Old 4 Weeks Ago
I will go with the second suggestion. Thanks!
# 4  
Old 4 Weeks Ago
Why not forgo the loop?

Code:
grep -F chr1A file*.tab > subset_chr1A.tab
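
One caveat, offered as a hedged note rather than as part of the original post: when grep is given more than one file, it normally prefixes each match with the file name, which would glue an extra colon-separated piece onto the first field of every row. GNU and BSD grep accept -h to suppress that prefix. A small sketch with invented sample files:

```shell
#!/bin/sh
# Sketch only: sample data is made up for illustration.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'chr1A_part1\t1\t2\nchr2B_part1\t5\t6\n' > file1.tab
printf 'chr1A_part2\t3\t4\nchr3D_part1\t7\t8\n' > file2.tab

# With several input files, plain grep writes rows that start with
# "file1.tab:" or "file2.tab:".
grep -F chr1A file*.tab > with_names.tab

# -h drops the file-name prefix, so the rows keep their three fields.
grep -hF chr1A file*.tab > subset_chr1A.tab
```

Because a single grep process writes the output here, the rows cannot interleave, and they appear in the order the shell expands file*.tab.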

# 5  
Old 4 Weeks Ago
True, the limit is likely to be disk, not CPU.
# 6  
Old 4 Weeks Ago
Thanks Rudic!
Before I try your method, does this grep -F chr1A file*.tab load all 80 files (~2400 GB!) into memory first?
# 7  
Old 4 Weeks Ago
I don't think it consumes much memory: it reads the files line by line, tests each line against the pattern, and then either drops it or outputs it.

Unix & Linux Forums Content Copyright 1993-2018. All Rights Reserved.