Attach filename to wc results on massive number of files
Post 303032281 by bakunin on Thursday 14th of March 2019 02:46:34 PM
Quote:
Originally Posted by yifangt
The problem with my script is that the echo -n $f" "; part always completes first, while the ${f}_R1.fq.gz | wc -l part lags far behind, so the results were not aligned as expected.
Actually this is a very interesting problem. It is hard to simulate without actually creating some terabytes of files similar in size to what you have to process. Therefore, before I start doing that, I'd like to offer a few theories which you may verify:

My suspicion is that the problem lies in the buffered nature of <stdout>. The buffer is flushed from time to time, and because the output of echo is already available it gets written to the file right away, whereas the output of zcat | wc -l only arrives much later, since zcat is still running at that point. The command in the Code block below might help. I used printf instead of echo, but that is not the point: before the output statement can be executed, the subshell inside it has to finish, so the line should get printed either completely or not at all. Because the whole command is put in the background, the original order of the filenames will no longer be retained. That may be of no concern to you, but you should be aware of it.

Another point is the number of processes you start: starting an (in principle unlimited) number of background processes at the same time is always a bit of a hazard. The script might work well with 10 or 20 files generating 10 or 20 background processes, but a directory may just as well hold millions of files. No system would survive an attempt to start a million background processes, no matter how small they are and how many processors you have. You may want to implement some logic so that only some maximum number of background processes runs concurrently.

Code:
printf "%s\t%s\n" "$f" $(zcat "${f}_R1.fq.gz" | wc -l) >> raw_reads_count.table1 &
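
To keep the number of concurrent background processes limited, something along these lines might do. It is only a sketch under a few assumptions: MAX_JOBS, count_one and count_all are names made up for illustration, the sample names are passed as arguments, and zcat and wc behave like the GNU versions. It starts the jobs in batches of at most MAX_JOBS and waits for each batch to finish before starting the next one.

```shell
# Sketch only: MAX_JOBS, count_one and count_all are hypothetical names.
MAX_JOBS=${MAX_JOBS:-4}   # upper limit of concurrent background jobs

# Print "<name><TAB><line count>" as one line. The command substitution
# has to finish before printf runs, so name and count are written together.
count_one() {
    printf '%s\t%s\n' "$1" "$(zcat "${1}_R1.fq.gz" | wc -l)"
}

# Usage: count_all OUTFILE NAME...
# Appends one line per NAME to OUTFILE, running at most MAX_JOBS jobs at once.
count_all() {
    out=$1
    shift
    running=0
    for f in "$@"; do
        count_one "$f" >> "$out" &
        running=$((running + 1))
        if [ "$running" -ge "$MAX_JOBS" ]; then
            wait          # batch is full: wait until all of it has finished
            running=0
        fi
    done
    wait                  # wait for the last, possibly incomplete, batch
}
```

It could be called like count_all raw_reads_count.table1 sampleA sampleB, where the sample names are placeholders. The lines still arrive in whatever order the jobs finish, so run the table through sort afterwards if the order matters. Waiting for a whole batch is simple but leaves processors idle while the slowest job of the batch is still running; bash 4.3 and later offers wait -n, which returns as soon as any single job has finished and could be used to refill the slots one at a time.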

I hope this helps.

bakunin
