Split files by pairwise combination


 
# 1 (12-06-2014)

I have 2 files

Code:
 
 $ cat  tmp
 A1 File1a B1 File1b
 A2 File2a B2 File2b
 A1 File1a B3 File3b


and

Code:
 
 $ cat  tmp1
 A1/B1 File3
 A1/B1 File4
 A1/B1 File5
 A1/B1 File6
 A1/B1 File7
 A2/B2 File8
 A2/B2 File9
 A2/B2 File10
 A2/B2 File11
 A2/B2 File12
 A2/B2 File13
 A1/B3 File14
 A1/B3 File15
 A1/B3 File16
 A1/B3 File17



I want to split the data into one output file per A/B combination, named after the combination (out_A1_B1 and so on). Each file should contain the A line, the B line and all matching A/B lines, with a hard-coded third column holding A, B or A/B:

Code:
 
 out_A1_B1
  
 A1 File1a A
 B1 File1b B
 A1/B1 File3 A/B
 A1/B1 File4 A/B
 A1/B1 File5 A/B
 A1/B1 File6 A/B
 A1/B1 File7 A/B
  
  
 out_A2_B2
  
 A2 File2a A
 B2 File2b B
 A2/B2 File8 A/B
 A2/B2 File9 A/B
 A2/B2 File10 A/B
 A2/B2 File11 A/B
 A2/B2 File12 A/B
 A2/B2 File13 A/B
  
 out_A1_B3
  
 A1 File1a A
 B3 File3b B
 A1/B3 File14 A/B
 A1/B3 File15 A/B
 A1/B3 File16 A/B
 A1/B3 File17 A/B





What is wrong with my attempt?

Code:
 
 awk '{print $1 FS $2 FS "A""\n"$3 FS $4 FS "B" >> tmp_$1_$2 ; grep "$1/$3" tmp1 >> tmp_$1_$2 }' tmp  
  
 for file in tmp_*
 do
 awk 'NF==2{$3="A/B"; print $0}' $file > out-$file
 done


Last edited by senhia83; 12-06-2014 at 06:50 PM.
# 2 (12-07-2014)
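You are mixing awk and shell, which cannot work. Inside the awk program, grep "$1/$3" tmp1 >> tmp_$1_$2 is not a grep call at all: awk sees grep and tmp1 as (empty) awk variables and "$1/$3" as a literal string, so it rejects the statement. tmp_$1_$2 is not a quoted file name either, and the name should be built from $1 and $3, not $2. Even if you got your approach running, e.g. with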
Code:
awk    '       {print $1 FS $2 FS "A""\n"$3 FS $4 FS "B" >> "tmp_"$1"_"$3 
                 system ("grep "$1"/"$3 " file2 >> tmp_"$1"_"$3) }' file1

, the A/B lines would still lack field 3, so you would need the second pass as well. Overall you would run quite a few programs in quite a few processes and touch files several times, which is not very efficient.
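For comparison, the two-pass idea from the question can be made to work roughly as follows. This is a sketch, assuming the input files are named tmp and tmp1 as in the question; it recreates them from the sample data so it runs on its own. It builds the file name as a quoted string from $1 and $3, runs grep through system(), closes each file before grep appends to it, and in the second pass adds the third column to the two-field lines while still printing the A and B lines (the original NF==2{...; print $0} printed only the two-field lines). It still starts one grep per input line, so it remains the slower option.

```shell
#!/bin/sh
# Sketch: the two-pass approach from the question, with its bugs fixed.

# Sample data from the question; drop these two here-docs to use your own tmp/tmp1.
cat > tmp <<'EOF'
A1 File1a B1 File1b
A2 File2a B2 File2b
A1 File1a B3 File3b
EOF
cat > tmp1 <<'EOF'
A1/B1 File3
A1/B1 File4
A1/B1 File5
A1/B1 File6
A1/B1 File7
A2/B2 File8
A2/B2 File9
A2/B2 File10
A2/B2 File11
A2/B2 File12
A2/B2 File13
A1/B3 File14
A1/B3 File15
A1/B3 File16
A1/B3 File17
EOF

rm -f tmp_*                       # pass 1 appends with >>, so start from scratch

# Pass 1: write the A and B lines, then let grep append the matching A/B lines.
awk '{
    f = "tmp_" $1 "_" $3          # e.g. tmp_A1_B1, built as a quoted string
    print $1, $2, "A" >> f
    print $3, $4, "B" >> f
    close(f)                      # flush our lines before grep appends to f
    system("grep \"^" $1 "/" $3 " \" tmp1 >> " f)
}' tmp

# Pass 2: add the third column to the two-field A/B lines, print every line.
for file in tmp_*
do
    awk 'NF == 2 { $3 = "A/B" } 1' "$file" > "out_${file#tmp_}"
done
```

The ^ anchor and the trailing space in the grep pattern keep A1/B1 from also matching a hypothetical A1/B10.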


Try
Code:
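awk     'NR==FNR        {# file1: A line plus B line, keyed by the (A,B) pair
                         OUT[$1,$3]=$1 FS $2 FS substr($1,1,1) "\n"
                         OUT[$1,$3]=OUT[$1,$3] $3 FS $4 FS substr($3,1,1) "\n"
                         next}
                        {# file2: split "A1/B1" into T[1], T[2] and append to that pair
                         split ($1, T, "/")
                         OUT[T[1],T[2]]=OUT[T[1],T[2]] $1 FS $2 FS substr(T[1],1,1) "/" substr(T[2],1,1) "\n"
                        }
         END            {# SUBSEP="_" turns the key A1,B1 into out_A1_B1
                         for (i in OUT) printf "%s",  OUT[i] > "out_" i}
        ' SUBSEP="_" file1 file2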
cf out*
out_A1_B1:
A1 File1a A
B1 File1b B
A1/B1 File3 A/B
A1/B1 File4 A/B
A1/B1 File5 A/B
A1/B1 File6 A/B
A1/B1 File7 A/B
out_A1_B3:
A1 File1a A
B3 File3b B
A1/B3 File14 A/B
A1/B3 File15 A/B
A1/B3 File16 A/B
A1/B3 File17 A/B
out_A2_B2:
A2 File2a A
B2 File2b B
A2/B2 File8 A/B
A2/B2 File9 A/B
A2/B2 File10 A/B
A2/B2 File11 A/B
A2/B2 File12 A/B
A2/B2 File13 A/B

That runs awk only once and touches each file involved only once. Depending on data size, though, it may need a fair amount of memory, because all output is held in the OUT array until the END block writes it.
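If memory is a concern, a single-pass variant can write each line as soon as it is read instead of collecting everything in OUT until END. The sketch below assumes, like the solution above, that file1 holds the A and B lines and is read before file2, so each output file starts with its A and B lines; it also writes the literal labels A, B and A/B the question asked for instead of deriving them with substr(). It recreates the sample data so it runs on its own. awk keeps one output file open per combination, so with a very large number of combinations you may hit an open-file limit and would then need close() plus >> appends.

```shell
#!/bin/sh
# Sketch: stream each line straight to its output file instead of buffering
# everything in an array until END.

# Sample data from the thread, so the sketch runs on its own.
cat > file1 <<'EOF'
A1 File1a B1 File1b
A2 File2a B2 File2b
A1 File1a B3 File3b
EOF
cat > file2 <<'EOF'
A1/B1 File3
A1/B1 File4
A1/B1 File5
A1/B1 File6
A1/B1 File7
A2/B2 File8
A2/B2 File9
A2/B2 File10
A2/B2 File11
A2/B2 File12
A2/B2 File13
A1/B3 File14
A1/B3 File15
A1/B3 File16
A1/B3 File17
EOF

awk 'NR == FNR {                      # first file: one line per A/B pair
         f = "out_" $1 "_" $3
         print $1, $2, "A" > f         # ">" truncates only on the first open
         print $3, $4, "B" > f
         next
     }
     {                                 # second file: first column is "A/B"
         split($1, T, "/")
         print $1, $2, "A/B" > ("out_" T[1] "_" T[2])
     }' file1 file2
```

The parentheses around the second file name matter: without them, awk implementations may disagree on how much of the concatenation belongs to the redirection.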

Last edited by RudiC; 12-07-2014 at 06:03 AM.