join on a file with multiple lines, fields


 
# 1  
Old 12-14-2008

I've looked at the join command, which does what I need when two rows share a common field. However, when more than two rows share that field, I need to join all of them.

Thus I have one file with multiple rows to be joined on an index number:

Code:
1 randomtext1
2 rtext2
2 rtext3
3 rtext4
3 rtext5
3 rtext6
.
.
.

I need this:
Code:
1 randomtext1
2 rtext2 rtext3
3 rtext4 rtext5 rtext6
.
.
.

where the repeated index number may have an arbitrary number of rtext's to be pivoted to columns for that index number. The number suffixes on rtext are only there for clarity.

I've considered using uniq to extract the repeated text, using those results to remove the specific lines in awk, and then repeating this process in a script before running 'join', but I'm convinced there is a complete, easier solution.

Update: I have progressed to a solution with gawk, but now I am losing the order of the rows when they become columns.
Code:
BEGIN {
FS=" ";
maxFLD=2;

getline var_tempnew
split(var_tempnew, array_temp, " ")
}
{

if ($1 == array_temp[1]){

	do
	{
	$0 = $0 " " array_temp[2]
	getline var_tempnew
	split(var_tempnew, array_temp, " ")
	} while ($1 == array_temp[1]) 
	print $0
	
} else {
    print var_tempnew
	var_tempnew = $0
	split(var_tempnew, array_temp, " ")
}
}
END {
print var_tempnew
}
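
For input that is already grouped by key, as in the sample above, a single pass that compares each key with the previous one may be enough, with no getline and no arrays. A minimal sketch, assuming two-field rows; the function name merge_adjacent is made up for illustration:

```shell
# Sketch: merge consecutive rows that share the same first field.
# Assumes the input is already grouped by key and each row has two fields.
merge_adjacent() {
  awk '
    NR > 1 && $1 != prev { print line; line = "" }   # key changed: flush the finished row
    { line = (line == "" ? $1 : line) " " $2; prev = $1 }
    END { if (NR) print line }                        # flush the last row, if any input was read
  ' "$@"
}

printf '%s\n' '1 randomtext1' '2 rtext2' '2 rtext3' '3 rtext4' '3 rtext5' '3 rtext6' | merge_adjacent
```

Because each output row is printed as soon as its key changes, both the row order and the column order follow the input.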


Last edited by crimper; 12-14-2008 at 04:20 AM.. Reason: progress updates
# 2  
Old 12-14-2008
If you have GNU awk, you could try something like this:

Code:
WHINY_USERS=unix.com awk '
END {
  for (i in _) 
    printf "%d %s\n", i, _[i]
  }
{
  idx = sprintf("%.10d", $1)
  _[idx] = _[idx] ? _[idx] FS $2 : $2
  }' infile

Or do you want to preserve the original order of $1, which may not be numeric?

Last edited by radoulov; 12-14-2008 at 10:05 AM..
# 3  
Old 12-14-2008
If you want to preserve the original order, you can try this:
Code:
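awk '{
  if (cnt[$1] ++ == 0) {
    idx[j++] = $1              # remember each key once, in first-seen order
    rst[$1] = $2
  } else
    rst[$1] = rst[$1] " " $2   # append, so values keep their input order
}
END {
  for (i = 0; i < j; i ++)
    print idx[i], rst[idx[i]]
}' infile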

# 4  
Old 12-14-2008
Both solutions seem to work; however, each has caveats, given that I wasn't very specific about the requirements.
New data set:
Code:
10a textile
b10b wtf
b20b omg
b20b woot
20b teasdf ha
20b tesdf2 he
30c woot1
30c woot2

radoulov's solution works, though it assumes integer indices. I like its simplicity, so I tried modifying the code to work with non-integer indices, but it loses the row order significantly (the row-to-column transposed order is OK, though):
Code:
END {
  for (i in _) 
    printf "%s\n", _[i]
  }
{
  _[$1] = _[$1] ? _[$1] FS $0 : $0
}

This suggests a problem with using strings as array indices, but ivhb's solution uses the same feature.

Code:
{
  if (cnt[$1] ++ == 0)
  {
    idx[j++] = $1
     rst[$1] = $0
  } else {
     rst[$1] = rst[$1] FS $0 
  }
}
END {
  for (i = 0; i < j; i ++)
    print rst[idx[i]]
}

This works OK as modified.

Final script FTW:
Code:
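{
  if (cnt[$1]++ == 0)
  {
    idx[j++] = $1          # first time this key is seen: remember its position
    rst[$1] = $0           # and start the output row with the whole line
  } else {
    vindex = $1            # save the key; sub() below modifies $0 and re-splits it
    sub(/^[[:alnum:]_]+ \y/, "")   # strip the leading key and space (\y is gawk-only)
    rst[vindex] = rst[vindex] FS $0
  }
}
END {
  for (i = 0; i < j; i++)
    print rst[idx[i]]
    #print idx[i], rst[idx[i]]
}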

The only remaining problem is that the regexp hard-codes a space as the FS. Is there a way to escape the FS in a regexp?
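
One way to sidestep the question is to avoid the regexp altogether and rebuild the remainder of each row from fields 2..NF, so the separator never has to be spelled out in a pattern. A minimal sketch under that assumption; pivot_rows and its separator argument are hypothetical, not something from the posts above:

```shell
# Sketch: pivot rows per key, in first-seen order, for an arbitrary field separator.
# Usage: pivot_rows SEP [file...]   (SEP is used as both FS and OFS)
pivot_rows() {
  sep=$1; shift
  awk -F"$sep" -v OFS="$sep" '
    {
      rest = $2
      for (k = 3; k <= NF; k++) rest = rest OFS $k   # fields 2..NF, no regexp needed
      if (!($1 in rst)) { order[++n] = $1; rst[$1] = $1 }
      rst[$1] = rst[$1] OFS rest
    }
    END { for (i = 1; i <= n; i++) print rst[order[i]] }
  ' "$@"
}

printf '%s\n' '20b teasdf ha' '20b tesdf2 he' '30c woot1' '30c woot2' | pivot_rows ' '
```

With the default single-space separator, runs of blanks in the input are collapsed to one space in the output, since the fields are rejoined with OFS.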
# 5  
Old 12-15-2008
Code:
awk '{_[$1]=sprintf("%s %s",_[$1],$2)}
END{
for(i in _)
      print i _[i]    # _[i] already starts with a space
}' filename

# 6  
Old 12-15-2008
Quote:
Originally Posted by crimper
[...]
The only remaining problem is that the regexp hard-codes a space as the FS. Is there a way to escape the FS in a regexp?
What should be the desired output, given the sample data above?