I have 150 files, each with 4 columns but a varying number of rows, that I need to combine column-wise. The files have no column in common. I want to use "paste " command in unix to do it but before that I have to get all my files to be of equal length.
Is there a way, using awk or sed, to pad each of the shorter files with rows of zeros (0) up to n rows, where n is the number of rows in the longest file, so that all files end up the same length?
Below are examples of my input files:
I want to use "paste " command in unix to do it but before that I have to get all my files to be of equal length.
No, you don't. You can use paste on files of different lengths and then replace empty fields with the desired default value. In my opinion, that's the easiest approach.
Regards,
Alister
Hi Alister,
Thanks for your reply. I wanted to do exactly what you suggested but after running paste I realized that if we have two files of different row lengths, then this is what it does to it:
% paste file1 file2 > file3
If the content of file1 is:
1
2
3
and file2 is:
a
b
c
d
the resulting file3 would be:
1 a
2 b
3 c
d
That is why I wanted to make the row lengths of my files equal.
Am I right?
---------- Post updated at 06:29 AM ---------- Previous update was at 06:28 AM ----------
Quote:
Originally Posted by yazu
If your really want to get a file with 600 columns and with some arbitrary 0-s in missed columns, this pipe may work:
Hi Yazu,
Thanks for your reply. All my files already have the same number of columns; it is the number of rows that I want to make the same.
You have not stated anything specific about your file format except for the number of columns, so I have made the following assumptions: The 4 column input files are tab-delimited. All whitespace below consists of tabs, not spaces. Also, I'm assuming that the colon character, :, does not appear in any of the input.
If any of those assumptions is invalid, only minor changes are required.
This can be simplified slightly if there are never any null fields in the input data. In that case, the delimiter used by paste can be the same as the delimiter used by the input data, since there would never be any need to distinguish between a null input data field and paste simulating an empty line in a source file.
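A minimal sketch of the approach described above, not necessarily the exact code that was posted. It assumes tab-delimited 4-column files and that `:` never appears in the data; the function name `paste_fill` and the sample files are hypothetical, made up for illustration.

```shell
# Hypothetical helper: paste files of unequal length, then fill the gaps.
# paste joins the files with ":" so that a file that has run out of lines
# shows up as an empty ":"-delimited field, distinct from the tabs that
# separate the 4 columns inside each file.
paste_fill() {
    paste -d: "$@" |
    awk -F: -v OFS='\t' '{
        $1 = $1                          # force a rebuild of $0 with tabs
        for (i = 1; i <= NF; i++)
            if ($i == "")                # this file had no line at this row
                $i = "0\t0\t0\t0"        # stand in four zero columns
        print
    }'
}

# Made-up sample data: file1 has two rows, file2 has one.
tmp=$(mktemp -d)
printf '1\t2\t3\t4\n5\t6\t7\t8\n' > "$tmp/file1"
printf 'a\tb\tc\td\n'             > "$tmp/file2"

paste_fill "$tmp/file1" "$tmp/file2"
```

The second output row carries file1's four values followed by four zeros in place of the missing file2 row. If a different character is safer than `:` for your data, change both the `-d` argument to paste and the `-F` argument to awk.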
Thanks for the code. The only issue is that I am trying to make the number of rows equal; the number of columns is already equal.
I have tried to modify your code, but I still have not reached the result I need.
My longest file has 3581 rows, so I tried the following:
awk 'NR< 3581 {for (i=NR+1;i<=3581;i++) $NR=$NR OFS "0"}1' OFS="\t" infile
However what I get when I do this is the following:
01-16A1-325 0 0 0 0 0 0 0 0 0 01-16A1-325 01-16A1-325 0 0 0 0 0 0 0 0 A T G 0 0 0 0 0 0 0 11 47 0 1 0 0 0 0 0 0 11 47 0 0
0 0 0 0 0 11 48 0 0
0 0 0 0 12 50 0 0
0 0 0 12 53 0 0
0 0 13 56 0 0
0 13 60 0 0
Can you help me modify the code correctly?
Thanks
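For readers who do want to pad each file first, here is a minimal sketch of one way to do it, assuming tab-delimited 4-column files. The attempt above edits fields of existing rows inside the main awk block; appending rows is easier in an `END` block, once the real row count `NR` is known. The helper name `pad_rows` and the sample file are hypothetical.

```shell
# Hypothetical helper: pad_rows N FILE
# Prints FILE unchanged, then appends rows of four zeros until N rows exist.
pad_rows() {
    awk -v n="$1" -v OFS='\t' '
        { print }                        # pass every existing row through
        END {
            for (i = NR + 1; i <= n; i++)
                print 0, 0, 0, 0         # one padding row, tab-separated
        }
    ' "$2"
}

# Made-up sample: a 2-row file padded to 4 rows.
tmp=$(mktemp -d)
printf '1\t2\t3\t4\n5\t6\t7\t8\n' > "$tmp/short"
pad_rows 4 "$tmp/short"
```

With 3581 as the target, each file would be run through `pad_rows 3581` and the padded copies handed to paste. A file that already has N or more rows is printed as-is.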
---------- Post updated at 07:44 AM ---------- Previous update was at 07:31 AM ----------
Hi Alister,
Your assumptions are correct, but I am not sure I am getting the correct format after running your code. I want to make the number of rows equal because I have, for example, 3 files of different lengths: file1 has 3581 rows, file2 has 3578 rows, and file3 has 3508 rows.
All I want to do is make the number of rows equal and then use paste to combine the files. All files have 4 columns. So I am wondering whether awk '{$1=$1; for(i=1; i<=NF; i++) is appropriate. If I understand correctly, shouldn't it be NR instead of NF?
Thanks again
---------- Post updated at 08:07 AM ---------- Previous update was at 07:44 AM ----------
Hi Alister,
Your code worked like a charm for me! Please ignore my other post, where I said I was not getting the correct format. Thanks a lot. Thanks also for this great forum, which helps so many techies when they need it the most.