Combining many lines to one using awk or any unix cmd


 
# 8  
Old 05-06-2009
Quote:
Originally Posted by vgersh99
Make sure your fields have no leading and/or trailing spaces.
Otherwise:
Code:
BEGIN {
  FS=OFS=","
  SEP=" "
}
function trim(str)
{
    sub("^[ ]*", "", str);
    sub("[ ]*$", "", str);
    return str;
}
 
{
  idx=trim($1) OFS trim($2)
  idxA[idx]
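  for(i=3; i<=NF; i++) {
    v=trim($i)
    # The stored values are joined with SEP, so splitting them on OFS never
    # found a repeat once a cell held two values; track seen values instead.
    # Skip empty values and values already recorded for this key/column.
    if (v=="" || (idx,i,v) in seen) continue
    seen[idx,i,v]
    cols[idx,i]=(cols[idx,i]=="" ? v : cols[idx,i] SEP v)
  }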
  nf=NF
}
END {
  for(i in idxA) {
    printf("%s%c", i, OFS)
    for(j=3; j<=nf; j++)
      printf("%s%c", cols[i,j], (j==nf)?RS:OFS)
  }
}


Thank you very much, vgersh99.
It works absolutely fantastically.
I would appreciate it if you could explain the code,
as this will help me and other readers of this forum.
I think this is one of the toughest threads on this forum board.
Again, thank you, sir.
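
For readers who find the awk hard to follow, here is a minimal Python sketch of the same idea. It is not vgersh99's code, only an illustration: key each row on the first two trimmed fields, and for every remaining column keep each distinct non-empty value once, in first-seen order. The names `merge_rows` and `key_cols` are made up for this example, and unlike the awk `for(i in idxA)` loop it emits keys in first-seen order (this relies on Python 3.7+ dict ordering).

```python
def merge_rows(lines, key_cols=2, fs=",", sep=" "):
    """Merge rows that share the first key_cols fields (hypothetical helper)."""
    merged = {}  # key tuple -> one list of distinct values per remaining column
    for line in lines:
        fields = [f.strip() for f in line.rstrip("\n").split(fs)]
        key = tuple(fields[:key_cols])
        cols = merged.setdefault(key, [])
        for i, value in enumerate(fields[key_cols:]):
            if i >= len(cols):
                cols.append([])        # first time this column is seen for the key
            if value and value not in cols[i]:
                cols[i].append(value)  # keep each distinct value once
    return [fs.join(list(key) + [sep.join(c) for c in cols])
            for key, cols in merged.items()]


rows = [
    "ID,place,org,animal,country",
    "ITS234,chicago,zoo,Tiger,America",
    "ITS234,chicago,USzoo,lion,America",
    "ITS237,Seattle,zoo,Tiger,Russia",
    "ITS234,chicago,INzoo,zebra,America",
]
for out in merge_rows(rows):
    print(out)
```

The input does not need to be sorted, because every key gets its own entry in the dictionary.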
# 9  
Old 05-06-2009
Quote:
Originally Posted by zenith
The key is the first 2 columns of the file.
If the first 2 columns match, then the remaining columns of those records are combined into one column each.

This is complex to implement.
Help is highly appreciated.

Assuming the first two keys are already sorted in your file:

Code:
$ 
$ cat input.txt
ID,place,org,animal,country
ITS234,chicago,zoo,Tiger,America
ITS234,chicago,USzoo,lion,America
ITS234,chicago,INzoo,zebra,America
ITS235,New York,zoo_1,Tiger,America
ITS235,New York,zoo_2,Tiger,America
ITS236,Dallas,zoo,Tiger,America
ITS236,Dallas,zoo,Camel,America
ITS237,Seattle,zoo,Tiger,America
ITS237,Seattle,zoo,Tiger,Russia
ITS237,Seattle,zoo,Tiger,Australia
ITS238,Memphis,park,Tiger,Russia
ITS238,Memphis,zoo,Eagle,America
ITS238,Memphis,library,Kangaroo,Australia
ITS299,Moscow,Mall,Jaguar,Russia
$
$ awk -F"," '$1","$2 == LastKey {
>   if ($3 != ORG) {ORG = ORG" "$3}
>   if ($4 != ANML) {ANML = ANML" "$4}
>   if ($5 != CTRY) {CTRY = CTRY" "$5}
> }
> $1","$2 != LastKey {
>   if (ORG != "") {print LastKey","ORG","ANML","CTRY}
>   LastKey = $1","$2
>   ORG = $3; ANML = $4; CTRY = $5
> }
> END {print LastKey","ORG","ANML","CTRY}' input.txt
ID,place,org,animal,country
ITS234,chicago,zoo USzoo INzoo,Tiger lion zebra,America
ITS235,New York,zoo_1 zoo_2,Tiger,America
ITS236,Dallas,zoo,Tiger Camel,America
ITS237,Seattle,zoo,Tiger,America Russia Australia
ITS238,Memphis,park zoo library,Tiger Eagle Kangaroo,Russia America Australia
ITS299,Moscow,Mall,Jaguar,Russia
$
$

If they are not, you will have to sort the file first and pipe the result to the awk script:

Code:
$ 
$ # first 2 keys are not sorted in this file
$
$ cat input.txt
ID,place,org,animal,country
ITS237,Seattle,zoo,Tiger,Australia
ITS234,chicago,zoo,Tiger,America
ITS234,chicago,USzoo,lion,America
ITS234,chicago,INzoo,zebra,America
ITS235,New York,zoo_1,Tiger,America
ITS235,New York,zoo_2,Tiger,America
ITS236,Dallas,zoo,Tiger,America
ITS299,Moscow,Mall,Jaguar,Russia
ITS236,Dallas,zoo,Camel,America
ITS237,Seattle,zoo,Tiger,America
ITS237,Seattle,zoo,Tiger,Russia
ITS238,Memphis,park,Tiger,Russia
ITS238,Memphis,zoo,Eagle,America
ITS238,Memphis,library,Kangaroo,Australia
$
$ sort -t"," -k1,2 input.txt |
> awk -F"," '$1","$2 == LastKey {
>   if ($3 != ORG) {ORG = ORG" "$3}
>   if ($4 != ANML) {ANML = ANML" "$4}
>   if ($5 != CTRY) {CTRY = CTRY" "$5}
> }
> $1","$2 != LastKey {
>   if (ORG != "") {print LastKey","ORG","ANML","CTRY}
>   LastKey = $1","$2;
>   ORG = $3; ANML = $4; CTRY = $5
> }
> END {print LastKey","ORG","ANML","CTRY}'
ID,place,org,animal,country
ITS234,chicago,INzoo USzoo zoo,zebra lion Tiger,America
ITS235,New York,zoo_1 zoo_2,Tiger,America
ITS236,Dallas,zoo,Camel Tiger,America
ITS237,Seattle,zoo,Tiger,America Australia Russia
ITS238,Memphis,library park zoo,Kangaroo Tiger Eagle,Australia Russia America
ITS299,Moscow,Mall,Jaguar,Russia
$
$

Hope that helps,
tyler_durden

__________________________________________________
"Without pain, without sacrifice, we would have nothing."
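
A note for readers on the awk above: each test such as `if ($3 != ORG)` compares the new field with the whole accumulated string, so a value that reappears after the cell already holds two values would be appended a second time. The sample data never triggers this. Below is a rough Python sketch of the same sort-then-merge approach that tracks distinct values per column instead. It assumes a header line first, exactly two key columns, and the same number of fields in every row; `merge_sorted` is an illustrative name, not something from the thread.

```python
from itertools import groupby


def merge_sorted(lines, fs=",", sep=" "):
    """Sort data rows on their fields, then merge each run of equal keys."""
    header, *data = [line.rstrip("\n") for line in lines]
    rows = sorted(row.split(fs) for row in data)
    out = [header]
    for key, group in groupby(rows, key=lambda r: r[:2]):
        cols = []
        for column in zip(*(r[2:] for r in group)):
            # dict.fromkeys keeps the first occurrence of each value, in order
            cols.append(sep.join(dict.fromkeys(column)))
        out.append(fs.join(key + cols))
    return out


lines = [
    "ID,place,org,animal,country",
    "ITS237,Seattle,zoo_c,Tiger,America",
    "ITS237,Seattle,zoo_a,Tiger,America",
    "ITS237,Seattle,zoo_b,Tiger,Russia",
]
for out_line in merge_sorted(lines):
    print(out_line)
```

The header is set aside before sorting, so the result does not depend on the header happening to sort ahead of the data rows.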
# 10  
Old 05-06-2009
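If you have Python and are able to use it:
Code:
# Python 3 version of the original Python 2 snippet.
d = {}
with open("file") as f:
    firstline = f.readline().strip()
    for line in f:
        fields = line.strip().split(",")
        # Key on the first column only; collect each later value once.
        values = d.setdefault(fields[0], [])
        for v in fields[1:]:
            if v not in values:
                values.append(v)
print(firstline)
for key, values in d.items():
    print("%s,%s" % (key, ",".join(values)))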

output:
Code:
# ./test.py
ID,place,org,animal,country
ITS234,chicago,zoo,Tiger,America,lion,zebra

# 11  
Old 05-06-2009
tyler_durden, can you please explain the code?
# 12  
Old 05-07-2009
Quote:
Originally Posted by zenith
tyler_durden
Can you please explain the code.

Nope, I really don't want to rob you of the joy of discovering things by yourself. :)

The script is pretty brief and self-explanatory. Try it out on some sample data, comment out different portions and see how the result changes, check the syntax in the man pages or the online gawk manual (http://www.gnu.org/software/gawk/manual/gawk.html), put in some effort, and soon enough you'll figure it out yourself. And then you'll *know* it real good.

tyler_durden

# 13  
Old 05-07-2009
Code:
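nawk -F"," '
NR == 1 { print; next }          # pass the header line through unchanged
{
    item = $1 "," $2 "," $3      # group on the first three fields
    arr[item] = arr[item] "," $4 # append each animal to its group
    tmp = $5                     # remember the last country seen
}
END {
    for (i in arr)
        print i arr[i] "," tmp
}
' a.txt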

# 14  
Old 05-18-2009
I have modified the above script to handle more columns:
Code:
  
    nawk 'BEGIN {
      FS=OFS="|"
      SEP=" "
    }
    function trim(str)
    {
        sub("^[ ]*", "", str);
        sub("[ ]*$", "", str);
        return str;
    }
    {
      idx=trim($1) OFS trim($2) OFS trim($3) OFS trim($4) OFS trim($5) OFS trim($6) OFS trim($7) OFS trim($8)
      idxA[idx]
      for(i=9; i<=NF; i++) {
        v=trim($i)
        # Values are joined with SEP, so splitting on OFS never found repeats;
        # skip empty values and values already recorded for this key/column.
        if (v=="" || (idx,i,v) in seen) continue
        seen[idx,i,v]
        cols[idx,i]=(cols[idx,i]=="" ? v : cols[idx,i] SEP v)
      }
      nf=NF
    }
    END {
      for(i in idxA) {
        printf("%s%c", i, OFS)
        for(j=9; j<=nf; j++)
          printf("%s%c", cols[i,j], (j==nf)?RS:OFS)
      }
    }' file

file
Quote:
MTPA|ABC|input|L|||||tlmpaa|lmno|beint||||||||||tlmpaa - lmno - beint
KLMP|ABC|tmandu|L|12354|45687|54521|This is test'|sonsula-9|ukpar|tmdbp||||||||sonsula-9 - ukpar - tmdbp

Output :

Quote:
KLMP|ABC|tmandu|L|12354|45687|54521|This is test'|sonsula-9|ukpar|tmdbp||||||||sonsula-9 - ukpar - tmdbp
MTPA|ABC|input|L|||||tlmpaa|lmno|beint||||||||

It is eating away the last 3 columns when there are empty fields in the first 8 fields:
Quote:
MTPA|ABC|input|L|||||tlmpaa|lmno|beint||||||||
Help is appreciated
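
One possible explanation, going only by the two sample rows shown: the rows do not have the same number of fields. Split on `|`, the MTPA row has 21 fields and the KLMP row has 19. The script stores `nf=NF` from the last record read, so if the KLMP row is read last, the END loop stops at column 19 and the MTPA row loses its trailing fields, which matches the output shown. This is a guess from the sample, not a confirmed diagnosis, and the empty key fields may be unrelated. A quick Python check of the per-row field counts (`field_counts` is a made-up helper; the rows are copied from the post):

```python
def field_counts(lines, fs="|"):
    """Return the number of fs-separated fields in each line (hypothetical helper)."""
    return [len(line.rstrip("\n").split(fs)) for line in lines]


sample = [
    "MTPA|ABC|input|L|||||tlmpaa|lmno|beint||||||||||tlmpaa - lmno - beint",
    "KLMP|ABC|tmandu|L|12354|45687|54521|This is test'|sonsula-9|ukpar|tmdbp||||||||sonsula-9 - ukpar - tmdbp",
]
counts = field_counts(sample)
print(counts)
print("widest row has", max(counts), "fields; the last row has", counts[-1])
```

If the counts differ in the real file too, one option would be to keep the largest field count seen, for example `if (NF > nf) nf = NF` in place of `nf=NF`, so the END loop covers the widest row.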