I've got a directory of about 6000 txt files that look like this:
I need to run a command on them to combine them so that, in the end, I have one big file with all the needed columns taken from all 6000 files. I've got the "combining" program, but my problem is that once I've combined the first two files, that output file should be the input file for adding the third one, and so on.
Here is a schematic:
I've created a list of all the files that need to be combined (named master-infile) and wrote this loop:
But the looping variables seem to be all wrong and all sorts of weird combinations of files get combined. I guess I need to get a count of output files? Any ideas?
Any help would be greatly appreciated!
Last edited by vbe; 08-20-2016 at 10:55 AM..
Reason: icode tags => code tags
That's quite an academic specification. How about showing some more details about the concatenation of lines and the columns therein, and the logic behind it?
Do you know about the paste command?
Each of the files consists of 7 columns: the first three columns are the same in each file and the other 4 columns are the actual data (but we need to use only the last three of those). The merge keeps the first three columns and then adds the last three columns of each file, matched by the first column. For example, infile1 could be:
infile2 could be:
infile3 could be
Then, after merging step 1, outfile1 would look like this:
Then, after adding infile3, outfile2 would look like this:
I hope this clears things up.
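The merge rule described above can be sketched in awk, purely as an illustration. This is not the real combining.executable: the function name merge_step, the file contents, and the space-separated layout are all made up for the example, and it assumes every key in column 1 appears in both files.

```shell
#!/bin/sh
# Hypothetical stand-in for the merge rule described above; NOT the real
# combining.executable. All file contents below are made up.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two toy 7-column files: columns 1-3 identical, columns 4-7 are data.
printf '%s\n' 'id1 chr1 100 a1 b1 c1 d1' 'id2 chr1 200 a2 b2 c2 d2' > infile1
printf '%s\n' 'id1 chr1 100 e1 f1 g1 h1' 'id2 chr1 200 e2 f2 g2 h2' > infile2

# merge_step BASE NEW: print every line of BASE followed by the last three
# columns of the line in NEW that has the same first column.
merge_step() {
    awk '
        # First file read by awk is NEW: remember its last three columns by key.
        NR == FNR { extra[$1] = $(NF-2) " " $(NF-1) " " $NF; next }
        # Second file read is BASE: append the remembered columns.
        $1 in extra { print $0, extra[$1] }
    ' "$2" "$1"
}

# Keep columns 1-3 plus the last three data columns of infile1 ...
awk '{ print $1, $2, $3, $5, $6, $7 }' infile1 > base
# ... then append the last three columns of infile2, matched on column 1.
merge_step base infile2 > outfile1
cat outfile1
```

With these toy inputs, each line of outfile1 carries the three shared columns, then three data columns from infile1, then three from infile2.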
Many thanks!
Hi zajtat,
If combining.executable infile1 infile2 > outfile1 produces the output you said it does, then running the command:
won't give you the output you said you want. Wouldn't it put the data from infile3 before the data from the end of infile2 in outfile2, discarding everything from infile1? Wouldn't you need to use:
instead?
Do all of the ~6000 text files you want to combine have names that end in .txt (or some other single filename extension)? Are there any other files in that directory that have names that end in the same filename extension? If the 1st answer is yes and the 2nd answer is no, why do you need master-infile? Why not just use for i in *.txt? And, no, you don't want nested loops for this. What you have shown us in post #1 will combine two copies of infile1 into files named 1 through 6000, and then, for each of your other input files, overwrite each of those numbered files with a combination of two copies of the next file in master-infile, keeping only the 6000 copies of the combination of two copies of the last input file named in master-infile.
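As a minimal sketch of that single-loop idea, assuming the input names really do end in .txt (the three names below are made up and live in a scratch directory), a glob visits each file exactly once, in sorted order, with no list file and no inner loop:

```shell
#!/bin/sh
# Toy demonstration with made-up file names in a scratch directory.
demo=$(mktemp -d)
cd "$demo" || exit 1
touch a.txt b.txt c.txt

count=0
for i in *.txt
do
    [ -e "$i" ] || continue      # skip the literal pattern if nothing matched
    count=$((count + 1))
    echo "visit $count: $i"
done
```

Each input file is seen once, so the body of the loop is the only place where a combining step would need to go.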
Is my interpretation of the commands you need to run correct? If not, please explain more clearly what arguments you are trying to pass to the command combining.executable.
Do you really want to keep outfile1 through outfile6000, or do you just want one outfile to be the combination of the 6000 input files? (Putting 12000 files in a single directory is usually a great way to slow down processing any files in that directory.)
Are you always processing exactly 6000 files, or do you just want to combine all of your text files (either based on a filename matching pattern or on the list of files in master-infile) no matter whether that is two files or a thousand files?
What is the format of the real names of your input files?
What name do you really want for your output file(s)?
The syntax for combining.executable is correct, and it would not put the data from infile3 in front of the data from infile2 in outfile2. The executable will add columns from infile3 to the end of outfile1. I've used the syntax to illustrate that the outcome of one command should be the input for the next.
I'm sorry, but your interpretation of the commands is not correct. We do not need to keep outfile1 through outfile6000, we just need outfile1 to be input for the executable in order to generate outfile2, then outfile1 can be deleted. Outfile2 will be the input for the executable to generate outfile3. Then outfile2 can be deleted and outfile3 can be used as input to generate outfile4, etc.
I may not need a master file or the loop, it was just the way I tried to solve the problem. But it does not necessarily have to be the case. The point is that I need to run a program (executable) where the output of step1 is the input of step2, and the output of step2 is the input for step 3, etc.
The format of my input files is text. They are the only ones in the folder.
The name of the output files does not matter.
In post #3 in this thread, you said that the command:
combining.executable infile1 infile2 > outfile1
puts the entire contents of lines from infile1 (the 1st input file operand) followed by the last three fields of matching lines from infile2 (the 2nd input file operand) into outfile1.
Then you said that the command:
combining.executable infile3 outfile1 > outfile2
changes its behavior completely and puts the entire contents of lines from outfile1 (the 2nd input file operand) followed by the last three fields of matching lines from infile3 (the 1st input file operand) into outfile2.
Are you absolutely positive that combining.executable magically knows that the behavior should be different for the 1st pair of files being combined than it is for every other pair of files it is asked to combine?
I understand the concept of running a program repeatedly with the output of each invocation being one of the inputs to the next invocation. And that isn't hard to do; it just can't be done with the nested loops you showed us. If you answer the questions I posed above and the remaining questions I asked in my previous post, I think we will easily be able to suggest something that will work for you.
Therefore, I repeat: What is the format of the real names of your input files? (Please show us the actual names of your 1st input file and your last input file.) If all files don't have the same text with just a number that changes from file to file, we need to have a filename matching pattern that will match all of the input filenames you want to process and a way to determine the order in which those files should be processed.
And, is the number of input files a constant?
And, what name do you really want for your output file?
The executable takes the columns of the file specified in argument 1 (infile1) and adds them to the file specified in argument 2 (infile2). It creates a new file that starts with the columns from infile2, followed by the columns from infile1. So, the first command line is:
combining.executable infile1 infile2 > outfile1
This will take infile2 as the basis and add columns to it from infile1. The outfile1 will start with columns from infile2 and finish with columns from infile1.
Then we run the following command:
combining.executable infile3 outfile1 > outfile2
This will take the columns from the file specified as argument 1 (infile3) and add them to the file specified as argument 2 (outfile1). It is the same procedure as in step 1, so the program does the same thing; nothing changes. The resulting outfile2 will start with the columns from outfile1 and finish with the columns from infile3. So, all 3 files are combined and nothing is deleted or replaced.
For the next step, we'll need outfile2 and infile4. So, we can delete the outfile1 as its info is now in outfile2. And so on...
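The chaining described above could be sketched as a single loop, along the following lines. This is only a sketch under stated assumptions: combine() is a hypothetical awk stand-in so the loop can be tried without the real program, the input files (in1.dat and so on) and the output name combined.out are made up, and it assumes that seeding the result with a plain copy of the first file is acceptable.

```shell
#!/bin/sh
# Sketch only. combine() is a hypothetical stand-in for
#     combining.executable NEWFILE BASEFILE
# so that the loop can be tried without the real program.
combine() {
    awk 'NR == FNR { extra[$1] = $(NF-2) " " $(NF-1) " " $NF; next }
         $1 in extra { print $0, extra[$1] }' "$1" "$2"
}

scratch=$(mktemp -d)
cd "$scratch" || exit 1
# Made-up 7-column input files, one line each.
for n in 1 2 3 4
do
    printf 'id1 chr1 100 x %s %s %s\n' "a$n" "b$n" "c$n" > "in$n.dat"
done

result=combined.out       # arbitrary output name; must not match the glob
first=yes
for f in in*.dat
do
    if [ "$first" = yes ]
    then
        cp "$f" "$result"                 # the first file seeds the result
        first=no
    else
        # Argument 1 is the new file, argument 2 is the result so far.
        # The new output replaces the old one only if the step succeeded.
        combine "$f" "$result" > "$result.tmp" &&
            mv "$result.tmp" "$result"
    fi
done
cat "$result"
```

Only two files exist at any moment, the result so far and a temporary file, so nothing like outfile1 through outfile6000 accumulates in the directory. With the real program, the combine call would presumably become combining.executable "$f" "$result".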
This is your next question:
What is the format of the real names of your input files? (Please show us the actual names of your 1st input file and your last input file.)
The answer: the files are simple text files, here is the name of my first input file:
9464294024_R01C01header
and the last file name is:
9479475073_R12C02header
here is a small subsample of the files:
Next question: And, is the number of input files a constant?
I'm not sure I understand this question. There are exactly 5427 files that need to be combined into one.
Next question: And, what name do you really want for your output file?
The name of the output file does not really matter for me. It can be anything as long as it contains the columns of all the 5427 files combined into one text file.
Many thanks!