Make multiple files of equal length

# 1  
08-15-2011

I have 150 files, each with 4 columns but a different number of rows, that I need to combine column-wise. There is no common column to join on. I want to use the "paste" command in Unix to do it, but before that I have to make all my files the same length.

Is there a way, using awk or sed, to pad each of the smaller files with rows of zeros (0) up to n rows, where n is the length of the biggest file, so that all files end up the same length?
Below is an example of one of my input files:
Code:
01-16A1-325     01-16A1-325     01-16A1-325     01-16A1-325
01-16A1-325     01-16A1-325     01-16A1-325     01-16A1-325
A     T     G     C
11     47     0     1
11     47     0     0
11     48     0     0
12     50     0     0
12     53     0     0
13     56     0     0
13     60     0     0
13     62     0     0
13     63     0     0
13     64     0     0
13     66     0     0
14     68     0     0
14     70     0     0
14     72     0     0
Thanks
# 2  
08-15-2011
Quote:
Originally Posted by manishabh
I want to use "paste " command in unix to do it but before that I have to get all my files to be of equal length.
No, you don't. You can use paste on files of different lengths and then replace empty fields with the desired default value. In my opinion, that's the easiest approach.

Regards,
Alister
# 3  
08-15-2011
Code:
awk 'NF<4 {for (i=NF+1;i<=4;i++) $NF=$NF OFS "0"}1' OFS="\t" infile
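A quick check of what this one-liner does, wrapped in a shell function so it can read a file or stdin (the name pad_columns and the sample lines are made up for illustration). It appends tab-separated 0 fields to any line with fewer than 4 fields; it does not add rows:

```shell
# pad_columns (hypothetical name): the one-liner above, reading files or stdin.
# Lines with fewer than 4 fields get extra tab-separated 0 fields;
# lines that already have 4 fields are printed unchanged.
pad_columns() {
  awk 'NF<4 {for (i=NF+1;i<=4;i++) $NF=$NF OFS "0"}1' OFS="\t" "$@"
}

# Made-up sample: a 2-field line followed by a complete 4-field line.
printf '1\t2\n5\t6\t7\t8\n' | pad_columns
```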

# 4  
08-16-2011
If you really want to get a file with 600 columns, with some arbitrary 0s in the missing columns, this pipe may work:
Code:
grep . *
one.t:0 9 8 7
three.t:q w e r
three.t:a s d f
three.t:z x y v
two.t:1 2 3 4
two.t:5 6 7 8

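find . -type f | while IFS= read -r f; do
  wc -l "$f"       # GNU wc prints "<line count> <file name>"
done | sort -nr -k1,1 | cut -d' ' -f2- |
xargs paste -d' ' | awk '
  # Files are pasted longest first, so line 1 has every field and
  # missing fields on later lines are always at the end.
  NR == 1 { N=NF }
  NR != 1 { for (i=1; i<=N; i++) if ($i == "") $i=0 }
  1'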
q w e r 1 2 3 4 0 9 8 7
a s d f 5 6 7 8 0 0 0 0
z x y v 0 0 0 0 0 0 0 0

# 5  
08-16-2011
Quote:
Originally Posted by alister
No, you don't. You can use paste on files of different lengths and then replace empty fields with the desired default value. In my opinion, that's the easiest approach.

Regards,
Alister
Hi Alister,

Thanks for your reply. I wanted to do exactly what you suggested, but after running paste I realized that if two files have different numbers of rows, this is what happens:
% paste file1 file2 > file3
If the content of file1 is:
1
2
3
and file2 is:
a
b
c
d
the resulting file3 would be:
1 a
2 b
3 c
d

That is why I wanted to make the number of rows in my files equal.
Am I right?
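One way to check what paste actually writes for the row that file1 is missing, assuming a POSIX sed (its l command prints a tab as \t and marks each line end with $). The two files are recreated from the example above:

```shell
# Recreate the two example files.
printf '1\n2\n3\n' > file1
printf 'a\nb\nc\nd\n' > file2

# Print every pasted line unambiguously: tabs as \t, line ends as $.
paste file1 file2 | sed -n l
```

The last line comes out as \td$: paste still writes the tab, so the missing value is an empty first field rather than a shifted column. That empty field is what the "paste first, replace empty fields afterwards" approach relies on.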

---------- Post updated at 06:29 AM ---------- Previous update was at 06:28 AM ----------

Quote:
Originally Posted by yazu
If you really want to get a file with 600 columns, with some arbitrary 0s in the missing columns, this pipe may work:
Hi Yazu,

Thanks for your reply. I want to make the number of rows the same; the number of columns is already equal.
# 6  
08-16-2011
You have not stated anything specific about your file format except for the number of columns, so I have made the following assumptions: The 4 column input files are tab-delimited. All whitespace below consists of tabs, not spaces. Also, I'm assuming that the colon character, :, does not appear in any of the input.

If any of those assumptions is invalid, only minor changes are required.
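Before running the script, these assumptions can be checked mechanically. A minimal sketch (check_inputs and the file named sample are hypothetical) that reports any line without exactly 4 tab-separated fields, or containing a colon:

```shell
# check_inputs (hypothetical helper): report lines that break the assumptions,
# i.e. not exactly 4 tab-separated fields, or containing a colon.
check_inputs() {
  awk '
    BEGIN { FS = "\t" }
    NF != 4 { print FILENAME ": line " FNR ": " NF " fields" }
    /:/     { print FILENAME ": line " FNR ": contains a colon" }' "$@"
}

# Made-up sample: line 2 uses a space instead of a tab between 5 and 6.
printf '1\t2\t3\t4\n5 6\t7\t8\n' > sample
check_inputs sample
```

No output means every line has 4 tab-separated fields and no colons.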

Code:
$ cat c1
1       2       3       4
$ cat c2
5       6       7       8
5       6       7       8
$ cat c3
9       10      11      12
9       10      11      12
9       10      11      12
$ paste -d: c1 c2 c3 | awk '{$1=$1; for(i=1; i<=NF; i++) if(!length($i)) $i=0 OFS 0 OFS 0 OFS 0}1' FS=: OFS=\\t          
1       2       3       4       5       6       7       8       9       10      11      12
0       0       0       0       5       6       7       8       9       10      11      12
0       0       0       0       0       0       0       0       9       10      11      12

This can be simplified slightly if the input data never contains empty fields. In that case, paste can use the same delimiter as the input data, since there is no need to distinguish between an empty input field and the empty field paste emits for a file that has run out of lines.
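A minimal sketch of that simplification, assuming tab-delimited 4-column input with no empty fields (pad_paste is a hypothetical name). With a tab as paste's delimiter too, a file that has run out of lines contributes exactly one empty field, which awk replaces with four 0 columns:

```shell
# Sample inputs from the post (tab-delimited, 4 columns).
printf '1\t2\t3\t4\n' > c1
printf '5\t6\t7\t8\n5\t6\t7\t8\n' > c2
printf '9\t10\t11\t12\n9\t10\t11\t12\n9\t10\t11\t12\n' > c3

# pad_paste (hypothetical name): paste the files with tabs, then turn each
# empty field (one per exhausted file) into four 0 columns.
# Only valid if no input field is ever empty.
pad_paste() {
  paste "$@" | awk '
    BEGIN { FS = OFS = "\t" }
    { for (i = 1; i <= NF; i++) if ($i == "") $i = 0 OFS 0 OFS 0 OFS 0 }
    1'
}

pad_paste c1 c2 c3
```

On these sample files it gives the same three rows as the colon version above.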

Regards,
Alister

Last edited by alister; 08-16-2011 at 12:34 PM..
# 7  
08-16-2011
Quote:
Originally Posted by rdcwayx
Code:
awk 'NF<4 {for (i=NF+1;i<=4;i++) $NF=$NF OFS "0"}1' OFS="\t" infile

Thanks for the code. The only issue is that I am trying to make the number of rows equal, because the number of columns is already equal.
I have tried to modify your code, but I still haven't got the result I need.
My longest file has 3581 rows, so I did the following:

awk 'NR< 3581 {for (i=NR+1;i<=3581;i++) $NR=$NR OFS "0"}1' OFS="\t" infile
However what I get when I do this is the following:
01-16A1-325 0 0 0 0 0 0 0 0 0 01-16A1-325 01-16A1-325 0 0 0 0 0 0 0 0 A T G 0 0 0 0 0 0 0 11 47 0 1 0 0 0 0 0 0 11 47 0 0
0 0 0 0 0 11 48 0 0

0 0 0 0 12 50 0 0


0 0 0 12 53 0 0



0 0 13 56 0 0




0 13 60 0 0






Can you help me modify the code correctly?

Thanks
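For the row padding asked about here, one option is to let awk print every line unchanged and add the missing rows in its END block, where NR already holds the file's line count. A minimal sketch, assuming tab-delimited 4-column files; pad_to_longest, the .padded suffix and the sample files f1 and f2 are all hypothetical:

```shell
# pad_to_longest (hypothetical helper): for each FILE, write FILE.padded,
# appending rows of four tab-separated 0s until it has as many lines
# as the longest input file.
pad_to_longest() {
  # Pass 1: find the largest line count (NR in END is the number of lines).
  max=0
  for f in "$@"; do
    n=$(awk 'END { print NR }' "$f")
    if [ "$n" -gt "$max" ]; then max=$n; fi
  done
  # Pass 2: copy each file, then add the missing rows after its last line.
  for f in "$@"; do
    awk -v max="$max" '
      BEGIN { OFS = "\t" }
      { print }
      END { for (i = NR + 1; i <= max; i++) print 0, 0, 0, 0 }' "$f" > "$f.padded"
  done
}

# Example with two small made-up files (3 rows and 1 row).
printf '11\t47\t0\t1\n11\t47\t0\t0\n11\t48\t0\t0\n' > f1
printf '12\t50\t0\t0\n' > f2
pad_to_longest f1 f2
paste f1.padded f2.padded
```

Because every padded copy then has the same number of rows, paste no longer produces short lines.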

---------- Post updated at 07:44 AM ---------- Previous update was at 07:31 AM ----------

Quote:
Originally Posted by alister
You have not stated anything specific about your file format except for the number of columns, so I have made the following assumptions: The 4 column input files are tab-delimited. All whitespace below consists of tabs, not spaces. Also, I'm assuming that the colon character, :, does not appear in any of the input.
Hi Alister,

Your assumptions are correct, but I am not sure I am getting the correct format after running your code. I want to make the number of rows equal because, for example, I have 3 files with different numbers of rows: file1 has 3581 rows, file2 has 3578 rows and file3 has 3508 rows.
All I want to do is make the numbers of rows equal and then use paste to combine them. All files have 4 columns. So I am wondering whether awk '{$1=$1; for(i=1; i<=NF; i++) is appropriate. If I understand correctly, shouldn't it be NR instead of NF?
Thanks again
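On the NF versus NR question: with FS=: in alister's command, awk sees one record per output row, and each field is one file's line, so NF counts files rather than rows. A small demonstration using the c1, c2 and c3 sample files from his post:

```shell
# Sample inputs from alister's post (tab-delimited, 4 columns).
printf '1\t2\t3\t4\n' > c1
printf '5\t6\t7\t8\n5\t6\t7\t8\n' > c2
printf '9\t10\t11\t12\n9\t10\t11\t12\n9\t10\t11\t12\n' > c3

# With ":" as the field separator, each field is one file's (possibly empty) line.
paste -d: c1 c2 c3 | awk -F: '{ print "row " NR ": " NF " fields" }'
```

Every row reports 3 fields, one per file, which is why the loop runs over NF.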

---------- Post updated at 08:07 AM ---------- Previous update was at 07:44 AM ----------

Quote:
Originally Posted by alister
You have not stated anything specific about your file format except for the number of columns, so I have made the following assumptions: The 4 column input files are tab-delimited. All whitespace below consists of tabs, not spaces. Also, I'm assuming that the colon character, :, does not appear in any of the input.
Hi Alister,

Your code worked like a charm for me! Please ignore my other post saying that I was not getting the correct format. Thanks a lot. Also thanks for this great forum that helps so many techies when they need it the most.