You could also do it without sed, with just shell parameter expansion...
Yes, but the original problem was: read a lot (~700k) of files and extract only a certain part of line 3. Shell expansion can extract that part, but it is not easy to stop reading after only 3 lines. Therefore I figured there must be a tradeoff between the fork() that shell expansion avoids and the lesser I/O the sed solution produces.
Which optimisation weighs heavier probably differs from system to system and depends on so many factors that I didn't even try to take measurements. I could have, but the disks on all my systems come from several EMC VMaxes (we even boot from LUNs via VIOS), and I doubt that the thread OP has an I/O subsystem capable of shoveling up to 700MB/s to/from disk. This will, IMHO, have such a big impact on the time it takes to read the 700k files that I could just as well roll a die.
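To illustrate where the I/O saving comes from: sed can print the wanted part of line 3 and then quit, so the rest of the file is never read. A minimal sketch (the sample file contents and the "first word" extraction are invented for illustration, not the thread's actual data):

```shell
# Create a sample file; only line 3 matters.
printf 'one\ntwo\nthree-a three-b\nfour\n' > sample.txt

# Print the first word of line 3, then quit so the rest of the
# file is never read. Works with both GNU and BSD sed.
sed -n '3{s/[[:space:]].*//p;q;}' sample.txt
# prints: three-a
```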
Remember that we're processing a single directory containing 690,000 files. So, we have some constraints...
In theory, for i in *.txt should work; but even though no exec is involved, we are still talking about a list of arguments that is probably well over 7.5MB (and the shell will waste time sorting this list even though the order in which the files are processed doesn't matter for this project).
I can't use:
because the behavior of find is undefined if the directory changes while find is reading it.
Invoking sed (or any other utility) 690,000 times to determine the directory to which a file should be moved will take forever. Similarly, invoking mv 690,000 times will take forever. We need to efficiently determine to which directory each file in a list should move, and then move those files in large groups (not individually).
Once I have the list of files to go to a directory, I can use:
on OS X to minimize the number of needed invocations of mv.
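As a sketch of that batching idea (the list file name, target directory, and file names below are placeholders, not the project's actual ones): BSD xargs on OS X has -J to place the batched arguments before the target directory, and a portable equivalent wraps mv in sh -c:

```shell
# Throwaway list file and two files to move (placeholder names;
# the scheme assumes file names without whitespace, as in the thread).
mkdir -p Syllable/7
printf '%s\n' a.txt b.txt > list7
touch a.txt b.txt

# Move the listed files in large batches; works with both BSD and
# GNU xargs. On OS X alone, "xargs -J % mv % Syllable/7" would also do.
xargs sh -c 'mv "$@" Syllable/7' sh < list7
```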
Even if we do this entirely with shell built-ins to determine the directory to which a file should be moved, echo filename >> listN or printf filename >> listN will still be opening and closing the list files 690,000 times.
There will be more target directories than there are available open files in an awk script on OS X, but we don't know the maximum number of hyphens (nor the number of different values for the number of hyphens) in the 1st word of the 3rd line of these 690,000 files. (We do know that there can be at least 18 hyphens.) I think I can use a pipeline with:
and easily read just the first 3 lines of each file being processed, opening and closing each list file only once. The first stage of the above pipeline could also be replaced by the find command mentioned before, which would simplify the 1st awk script in the pipeline. (This could fairly easily be extended to let awk spawn more copies of itself to handle an unlimited number of open list files, but I don't think that will be needed for this project.)
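A rough sketch of that idea (the hyphen-count scheme is from this thread; the sample files and listN names are invented): awk reads file names on stdin, pulls only line 3 from each file, and writes each name to a listN file that stays open until awk exits:

```shell
# Two sample files; only line 3 is inspected (contents invented).
printf 'x\ny\nab-cd-ef rest\n' > one.txt
printf 'x\ny\nnohyphen rest\n' > two.txt

printf '%s\n' one.txt two.txt |
awk '
{
    file = $0
    # Read only the first 3 lines of the file, then stop.
    for (i = 1; i <= 3; i++)
        if ((getline line < file) <= 0) { close(file); next }
    close(file)

    # Count the hyphens in the first word of line 3.
    split(line, w, " ")
    n = gsub(/-/, "-", w[1])

    # Each list file is opened once and stays open until awk exits.
    print file > ("list" n)
}'
```

Here one.txt (2 hyphens in the first word of line 3) lands in list2 and two.txt (no hyphens) in list0.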
I have a good start on this pipeline, but it will take me a while to finish the code and test it.
This User Gave Thanks to Don Cragun For This Post:
What if we execute the commands one by one? Would that be easier? There was a time when I executed many different command lines by writing them all in a text editor and saving the result as a Unix Executable File. I don't know what you all call this process; I found it out myself, so I don't know what it is called.
So, there was a time I did this to organise files: using mv to move each file whose name contains "x" into folder x, according to the number of x's in the name.
The script was like this [This script wasn't about this problem]
And I dragged that Unix Executable File into Terminal and hit Enter. Terminal executed all of them one by one.
My commands were kind of amateur, but I managed to execute them in an orderly manner.
At that time I had over a million files, and these commands could do the work.
I also wasn't sure how many x's there could be in a filename, so I wrote as many commands as I thought would be needed.
But of course, this was just matching on the filename, so it was easy; I didn't expect that working with file contents would be so hard.
No, you can't look at the names of files and magically guess how many hyphens are in the 3rd lines of those files. And, as noted before, using find | ... | mv ... may miss files, depending on the filesystem type, when you have a directory with this many files in it...
If I have correctly understood what you want to do, the following script will move *.txt files whose names do not contain any space characters from /Users/Nexeu/Documents/Dict to subdirectories under /Users/Nexeu/Documents/Syllable. The target directory and subdirectories will be created if they do not already exist. This script will give errors if you try to move files with more than 33 different values for the number of hyphens contained on the 3rd line. If you save the output containing those errors, extract the lines from that output that start and end with a ' character, and feed those lines into a modified script that runs in $SRCDIR and only runs the 2nd awk script, it will create list files for another 17 target subdirectories, and the last part of the script will use those list files to move the files into the proper target directories. Or, you can just run the entire script again to process up to 33 more different hyphen counts (but that will take longer if there are still lots of files to process).
When tested on a MacBook Pro running OS X Yosemite 10.10.3, it did what I expected with a couple of hundred files with 35 different hyphen counts. Obviously, it has not been tested in an environment with 690,000 files.
If you want it to provide a verbose list of the mv commands it uses while moving files from .../Dict to subdirectories of .../Syllable, uncomment the next to the last line in the script and comment out the line before that.
Good luck!
In post #5 in this thread you showed us that you were using the prompt:
I made the obviously bad assumption that that meant you were running OS X on a MacBook Pro (which has a BSD-based xargs, not the GNU xargs you're using).
With the list files created by the current script, the following should finish the job for you:
This is untested code (I don't have a mv utility that has a -t option), but it should come close to doing what you need if you're using GNU xargs and rm utilities.
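With GNU tools, the per-list step might look like the following sketch (the list file, target directory, and file names are placeholders, not the actual ones from the script):

```shell
# Placeholder list file and files to move.
mkdir -p Syllable/2
printf '%s\n' p.txt q.txt > list2
touch p.txt q.txt

# GNU mv -t takes the target directory first, so xargs can append as
# many file names as fit on one command line; -d '\n' (GNU xargs)
# splits the list on newlines only. Remove the list once mv succeeds.
xargs -d '\n' mv -t Syllable/2 < list2 && rm list2
```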
The first four lines of the output you showed us say that the files κ.txt, μci.txt, μg.txt, and μm.txt do not have the expected format:
on line 3.