Looping through input/output


 
# 1  
Old 08-20-2016
Looping through input/output

Hi,

I've got a directory of about 6000 txt files that look like this:

Code:
a b c d
e f g h
k l m n

I need to execute a command on them to combine them and, in the end, have one big file with all the needed columns taken from all 6000 files. I've got the "combining" program, but my problem is that once I've combined the first two files, that output file should be the input file for adding the third one, and so on.
Here is a schematic:

Code:
combining.executable infile1 infile2 > outfile1
combining.executable infile3 outfile1 > outfile2
combining.executable infile4 outfile2 > outfile3
combining.executable infile5 outfile3 > outfile4
etc

I've created a list of all the files that need to be combined (named master-infile) and wrote this loop:

Code:
for i in $(cat master-infile);
do
   for((a=1;a<=6000;i++);
   do
      combining.executable ${i} ${i} > ${a}
   done
done

But the looping variables seem to be all wrong and all sorts of weird combinations of files get combined. I guess I need to get a count of output files? Any ideas?

Any help would be greatly appreciated!

Last edited by vbe; 08-20-2016 at 10:55 AM.. Reason: icode tags => code tags
# 2  
Old 08-20-2016
That's quite an academic specification. How about showing some more details about the concatenation of lines and the columns therein? The logic behind it?
Do you know about the paste command?
# 3  
Old 08-20-2016
Each of the files consists of 7 columns: the first three columns are the same in each file and the other 4 columns are the actual data (but we need to use only the last three of those). The merging would mean to keep the first three columns and then add the last three columns of each file, matched by the first column. For example, infile1 could be

Code:
a b c 1 2 3 4
d e f 5 6 7 8
g h j 9 10 11 12

infile2 could be:

Code:
a b c 11 3 4 5 
g h j 9 8 7 6
d e f 1 2 3 4

infile3 could be

Code:
d e f 2 3 4 5
a b c 5 6 7 8
g h j 9 10 11 12



Then, after merging step 1, outfile1 would look like this:

Code:
a b c 1 2 3 4 3 4 5
d e f 5 6 7 8 2 3 4
g h j 9 10 11 12 8 7 6

then after adding infile3, the outfile2 would look like this:

Code:
a b c 1 2 3 4 3 4 5 6 7 8
d e f 5 6 7 8 2 3 4 3 4 5
g h j 9 10 11 12 8 7 6 10 11 12

I hope this clears things up.

Many thanks!

Last edited by vbe; 08-20-2016 at 10:54 AM.. Reason: code tags please not icode
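For readers following along, the merge described in this post can be approximated with awk. This is only a sketch under assumptions: merge_sketch is a hypothetical stand-in, not combining.executable, and its argument order (base file first, file to add second) was picked for the illustration. It matches lines on the first column and appends the last three fields of the second file.

```shell
# Hypothetical stand-in for the merge described in this post; this is NOT
# combining.executable, only an awk sketch of the same idea.
# Usage: merge_sketch BASE EXTRA
#   prints every line of BASE whose first field also occurs in EXTRA,
#   followed by the last three fields of the matching EXTRA line.
merge_sketch() {
    awk '
        # pass 1 (EXTRA): remember the last three fields, keyed by column 1
        NR == FNR { extra[$1] = $(NF-2) " " $(NF-1) " " $NF; next }
        # pass 2 (BASE): print the whole line plus the remembered fields
        ($1 in extra) { print $0, extra[$1] }
    ' "$2" "$1"
}

# Rebuild the three sample files from this post in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 'a b c 1 2 3 4' 'd e f 5 6 7 8' 'g h j 9 10 11 12' > "$tmp/infile1"
printf '%s\n' 'a b c 11 3 4 5' 'g h j 9 8 7 6' 'd e f 1 2 3 4'   > "$tmp/infile2"
printf '%s\n' 'd e f 2 3 4 5' 'a b c 5 6 7 8' 'g h j 9 10 11 12' > "$tmp/infile3"

merge_sketch "$tmp/infile1" "$tmp/infile2" > "$tmp/outfile1"
merge_sketch "$tmp/outfile1" "$tmp/infile3" > "$tmp/outfile2"
cat "$tmp/outfile2"
```

Run on the sample data, this reproduces the outfile1 and outfile2 listings shown in this post. The line order follows the base file, so the differing row order in infile2 and infile3 does not matter.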
# 4  
Old 08-20-2016
Hi zajtat,
If combining.executable infile1 infile2 > outfile1 produces the output you said it does, then running the command:
Code:
combining.executable infile3 outfile1 > outfile2

won't give you the output you said you want: it would put the data from infile3 before the data from the end of infile2 in outfile2 and discard everything from infile1, wouldn't it? Wouldn't you need to use:
Code:
combining.executable outfile1 infile3 > outfile2

instead?

Do all of the ~6000 text files you want to combine have names that end in .txt (or some other single filename extension)? Are there any other files in that directory that have names that end in the same filename extension? If the 1st answer is yes and the 2nd answer is no, why do you need master-infile? Why not just use for i in *.txt? And, no, you don't want nested loops for this. What you have shown us in post #1 will combine two copies of infile1 into files named 1 through 6000, and then, for each of your other input files, overwrite each of those numbered files with a combination of two copies of the next file in master-infile, keeping only the 6000 copies of the combination of two copies of the last input file named in master-infile.
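To make that overwriting visible, here is a scaled-down replay of the nested loops, a sketch only. It assumes the inner counter in post #1 was meant to be incremented (a++), uses 3 list entries and 3 numbered outputs instead of 6000, and swaps in a hypothetical stub named combine that just records the two names it was given instead of running combining.executable.

```shell
# Hypothetical stub: records which two names were passed, nothing more.
combine() { printf '%s+%s\n' "$1" "$2"; }

tmp=$(mktemp -d)
printf '%s\n' infile1 infile2 infile3 > "$tmp/master-infile"

# Outer loop: one pass per name in the list, as in post #1.
for i in $(cat "$tmp/master-infile"); do
    # Inner loop: "1 2 3" stands in for the counter running from 1 to 6000.
    for a in 1 2 3; do
        combine "$i" "$i" > "$tmp/$a"   # every pass overwrites files 1, 2, 3
    done
done

# Each numbered file now holds only the last list entry combined with itself.
cat "$tmp/1" "$tmp/2" "$tmp/3"
```

All three numbered files end up holding infile3+infile3: nothing from infile1 or infile2 survives, and no output of one step is ever fed into the next.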

Is my interpretation of the commands you need to run correct? If not, please explain more clearly what arguments you are trying to pass to the command combining.executable.

Do you really want to keep outfile1 through outfile6000, or do you just want one outfile to be the combination of the 6000 input files? (Putting 12000 files in a single directory is usually a great way to slow down processing any files in that directory.)

Are you always processing exactly 6000 files, or do you just want to combine all of your text files (either based on a filename matching pattern or on the list of files in master-infile) no matter whether that is two files or a thousand files?

What is the format of the real names of your input files?

What name do you really want for your output file(s)?
# 5  
Old 08-20-2016
Hi,

The syntax for the combining.executable is correct and it would not put the data from infile3 in front of the infile2 in outfile2. The executable will add columns from infile3 to the end of outfile1. I've used the syntax to illustrate that the outcome of one command should be the input for the other.

I'm sorry, but your interpretation of the commands is not correct. We do not need to keep outfile1 through outfile6000, we just need outfile1 to be input for the executable in order to generate outfile2, then outfile1 can be deleted. Outfile2 will be the input for the executable to generate outfile3. Then outfile2 can be deleted and outfile3 can be used as input to generate outfile4, etc.

I may not need a master file or the loop, it was just the way I tried to solve the problem. But it does not necessarily have to be the case. The point is that I need to run a program (executable) where the output of step1 is the input of step2, and the output of step2 is the input for step 3, etc.

The format of my input files is text. They are the only ones in the folder.
The name of the output files does not matter.
# 6  
Old 08-20-2016
Quote:
Originally Posted by zajtat
Hi,

The syntax for the combining.executable is correct and it would not put the data from infile3 in front of the infile2 in outfile2. The executable will add columns from infile3 to the end of outfile1. I've used the syntax to illustrate that the outcome of one command should be the input for the other.

I'm sorry, but your interpretation of the commands is not correct. We do not need to keep outfile1 through outfile6000, we just need outfile1 to be input for the executable in order to generate outfile2, then outfile1 can be deleted. Outfile2 will be the input for the executable to generate outfile3. Then outfile2 can be deleted and outfile3 can be used as input to generate outfile4, etc.
In post #3 in this thread, you said that the command:
Code:
combining.executable infile1 infile2 > outfile1

puts the entire contents of lines from infile1 (the 1st input file operand) followed by the last three fields of matching lines from infile2 (the 2nd input file operand) into outfile1.

Then you said that the command:
Code:
combining.executable infile3 outfile1 > outfile2

changes its behavior completely and puts the entire contents of lines from outfile1 (the 2nd input file operand) followed by the last three fields of matching lines from infile3 (the 1st input file operand) into outfile2.

Are you absolutely positive that combining.executable magically knows that the behavior should be different for the 1st pair of files being combined than it is for every other pair of files it is asked to combine?
Quote:
I may not need a master file or the loop, it was just the way I tried to solve the problem. But it does not necessarily have to be the case. The point is that I need to run a program (executable) where the output of step1 is the input of step2, and the output of step2 is the input for step 3, etc.

The format of my input files is text. They are the only ones in the folder.
The name of the output files does not matter.
I understand the concept of running a program repeatedly with the output of each invocation being one of the inputs to the next invocation. And that isn't hard to do; it just can't be done with the nested loops you showed us. If you answer the questions I posed above and the remaining questions I asked in my previous post, I think we will easily be able to suggest something that will work for you.

Therefore, I repeat: What is the format of the real names of your input files? (Please show us the actual names of your 1st input file and your last input file.) If all files don't have the same text with just a number that changes from file to file, we need to have a filename matching pattern that will match all of the input filenames you want to process and a way to determine the order in which those files should be processed.

And, is the number of input files a constant?

And, what name do you really want for your output file?
# 7  
Old 08-20-2016
I'm sorry my explanations were not clear.

The executable command takes the columns of the file specified in argument 1 (infile1) and adds them to the file specified in argument 2 (infile2). It creates a new file that starts with columns from infile2, followed by columns from infile1. So, the first command line is:

combining.executable infile1 infile2 > outfile1

This will take infile2 as the basis and add columns to it from infile1. The outfile1 will start with columns from infile2 and finish with columns from infile1.

Then we run the following command:

combining.executable infile3 outfile1 > outfile2

This will take the columns from the file specified as argument 1 (infile3) and add them to the file specified as argument 2 (outfile1). It is the same procedure as in step 1, so the program does the same thing, nothing changes. The resulting outfile2 will start with columns from outfile1 and finish with columns from infile3. So, all 3 files are combined and nothing is deleted/replaced.

For the next step, we'll need outfile2 and infile4. So, we can delete the outfile1 as its info is now in outfile2. And so on...

This is your next question:

What is the format of the real names of your input files? (Please show us the actual names of your 1st input file and your last input file.)

The answer: the files are simple text files, here is the name of my first input file:

9464294024_R01C01header

and the last file name is:

9479475073_R12C02header

here is a small subsample of the files:

Code:
9464294024_R01C01header  9477371149_R12C02header  9477871078_R06C01header  9477875165_R03C01header  9477885102_R05C01header  9479475073_R10C02header
9464294024_R01C02header  9477371157_R01C01header  9477871078_R06C02header  9477875165_R03C02header  9477885102_R05C02header  9479475073_R11C01header

Next question: And, is the number of input files a constant?

I'm not sure I understand this question. There are exactly 5427 files that need to be combined into one.

Next question: And, what name do you really want for your output file?

The name of the output file does not really matter for me. It can be anything as long as it contains the columns of all the 5427 files combined into one text file.

Many thanks!

Last edited by Don Cragun; 08-20-2016 at 09:53 PM.. Reason: Change ICODE tags to CODE tags for long data lines.
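With the answers from post #7 in hand (input names ending in header, any output name, argument 1 being the new file and argument 2 the running result), the chain can be sketched as a single loop. This is a sketch under assumptions, not a tested solution for the real data: combine_step is a hypothetical awk stand-in so the example runs without combining.executable, and combine_all, the three sample file names and combined.txt are made up for the illustration.

```shell
# Hypothetical stand-in for "combining.executable NEW ACCUMULATED": prints
# each ACCUMULATED line followed by the last three fields of the matching
# NEW line (matched on column 1). For the real run, replace the body with:
#   combining.executable "$1" "$2"
combine_step() {
    awk 'NR == FNR { add[$1] = $(NF-2) " " $(NF-1) " " $NF; next }
         ($1 in add) { print $0, add[$1] }' "$1" "$2"
}

# Fold every file named *header in DIR into OUTPUT.
# Usage: combine_all DIR OUTPUT
combine_all() {
    dir=$1
    out=$2
    acc=$(mktemp) || return 1        # running result
    new_acc=$(mktemp) || return 1    # scratch file for the next result
    first=1
    for f in "$dir"/*header; do
        [ -f "$f" ] || continue      # skip the literal pattern if nothing matches
        if [ "$first" -eq 1 ]; then
            cp "$f" "$acc"           # the first file seeds the running result
            first=0
        else
            combine_step "$f" "$acc" > "$new_acc" || return 1
            mv "$new_acc" "$acc"     # output of this step is input of the next
        fi
    done
    rm -f "$new_acc"
    mv "$acc" "$out"
}

# Small demonstration with the sample data from post #3.
work=$(mktemp -d)
printf '%s\n' 'a b c 1 2 3 4' 'd e f 5 6 7 8' 'g h j 9 10 11 12' > "$work/s1_R01C01header"
printf '%s\n' 'a b c 11 3 4 5' 'g h j 9 8 7 6' 'd e f 1 2 3 4'   > "$work/s2_R01C01header"
printf '%s\n' 'd e f 2 3 4 5' 'a b c 5 6 7 8' 'g h j 9 10 11 12' > "$work/s3_R01C01header"
combine_all "$work" "$work/combined.txt"
cat "$work/combined.txt"
```

Only two scratch files exist at any moment, so no outfile1 through outfile5427 pile up. The shell expands *header in sorted order, and that order decides the order of the added columns. The output name should not end in header, or a rerun in the same directory would pick it up as an input.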