Group By in Unix


 
# 1  
Old 11-04-2010

Hi,
I have a file with header, data, and trailer records:

Head|currentdate|EOF
Data|AAA|BBB|CCC|DDD|EEE|Source1
Data|AAA|BBB|CCC|DDD|EEE|Source1
Data|AAA|BBB|CCC|DDD|EEE|Source2
Data|AAA|BBB|CCC|DDD|EEE|Source2
Data|AAA|BBB|CCC|DDD|EEE|Source2
End|rec|EOF

Now I need the count of only the "Data" records (5 records in the example above),
grouped by the source system. My output should be:

Source1 2
Source2 3
How can I achieve this in UNIX? Your help will be highly appreciated.
# 2  
Old 11-04-2010
Code:
 
grep '^Data|' file | cut -d'|' -f 7 | sort | uniq -c

Get the data lines, cut out the source-system field, sort the values, and count each one. Someone once did a sort of SQL in shell, years ago. Now you can get JDBC tools for flat files that let you query them.
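The requested output lists the source name first and the count second, whereas `uniq -c` prints the count first. Here is a minimal sketch that swaps the two columns. It assumes the source system is the 7th `|`-separated field, as in the sample, and that source names contain no whitespace. The file name `sample.txt` and the variable `result` are only for illustration.

```shell
# Recreate the sample from the first post (sample.txt is a hypothetical name).
cat > sample.txt <<'SAMPLE'
Head|currentdate|EOF
Data|AAA|BBB|CCC|DDD|EEE|Source1
Data|AAA|BBB|CCC|DDD|EEE|Source1
Data|AAA|BBB|CCC|DDD|EEE|Source2
Data|AAA|BBB|CCC|DDD|EEE|Source2
Data|AAA|BBB|CCC|DDD|EEE|Source2
End|rec|EOF
SAMPLE

# Keep the Data lines, take field 7, count each value,
# then reorder "count value" into "value count".
result=$(grep '^Data|' sample.txt | cut -d'|' -f7 | sort | uniq -c |
         awk '{print $2, $1}')
printf '%s\n' "$result"
```

For the sample this should print `Source1 2` and `Source2 3`, one per line.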
# 3  
Old 11-04-2010
or:
Code:
sed -n 's/^Data.*|//p' file | sort | uniq -c

Code:
sed '1d;$d;s/.*|//' file | sort | uniq -c

awk:
Code:
awk -F'|' 'NF>3{A[$NF]++}END{for(i in A) print i,A[i]}' file
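Two details of the awk version are worth spelling out. `NF>3` picks the data lines only because the header and trailer have three fields each, and `for (i in A)` visits keys in an unspecified order. A hedged variant that tests the first field explicitly and sorts the result might look like this; `count_by_source` and `group_sample.txt` are hypothetical names, not from the thread.

```shell
# Hypothetical sample file mirroring the layout in the first post.
printf '%s\n' \
    'Head|currentdate|EOF' \
    'Data|AAA|BBB|CCC|DDD|EEE|Source2' \
    'Data|AAA|BBB|CCC|DDD|EEE|Source1' \
    'Data|AAA|BBB|CCC|DDD|EEE|Source2' \
    'End|rec|EOF' > group_sample.txt

# count_by_source FILE: count lines whose first field is exactly "Data",
# keyed by the last field, and print "source count" in sorted order.
count_by_source() {
    awk -F'|' '$1 == "Data" { n[$NF]++ }
               END { for (s in n) print s, n[s] }' "$1" | sort
}

counts=$(count_by_source group_sample.txt)
printf '%s\n' "$counts"
```

The trailing `sort` is what makes the output order stable; awk alone does not guarantee it.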

# 4  
Old 11-05-2010
I wrote an aggregator in C:
Code:
sed -n 's/^Data.*|//p' file | aggsx -l
 
$ aggsx --help
Usage:
aggsx [ -b ] [ -l ] [ -p <prefix> ] [ -u ] [ -d ] [ -h ]
Computes the count distinct, count null, min, count of min, max,
count of max, average (mean) of not null values if numeric,
median of not null values, largest of the most popular values,
count of that most popular value.
If -l is present, first prints out all values in order and their counts,
null last, but no aggregates.
If -b is present, prints out like -l and then prints aggregates.
If -p is present, the aggregate is prefixed with '<prefix>|'.
If -u is present, just immediately prints out unique values.
If -d is present, just immediately prints out duplicated values.
If -h is present, prefixes values line with header line:
CtD|CtN|Min|CtMin|Max|CtMax|Avg|Med|MPop|CtMPop
 
$ cat mysrc/aggsx.c
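#include <stdio.h>
#include <limits.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>     /* strcmp(), strlen(), strcpy() */
/* PIPE_MAX is not defined by every <limits.h>; fall back to a fixed size. */
#ifndef PIPE_MAX
#define PIPE_MAX 65536
#endif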
static  long double     sum = 0.0 ;
static  long double     nval ;
static  unsigned long   lct = 0 ;
static  unsigned long   nct = 0 ;
static  unsigned long   ll2 ;
static  unsigned long   nvc = 0 ;
static  unsigned long   dct = 0 ;
static  unsigned long   act = 0 ;
static  unsigned long   ll ;
static  unsigned long   mpc = 0 ;
static  unsigned long   *vct = NULL ;   /* value counts */
static  unsigned long   *lp ;
static  char            **vl = NULL ;   /* value list */
static  char            **cpp ;
static  char            *cp ;
static  char            *cp2 ;
static  char            *cp3 ;
static  char            *me = "" ;
static  char            *mp = "" ;
static  char            *pfx = NULL ;
static  int             i ;
static  int             d = 0 ; /* -d option */
static  int             u = 0 ; /* -u option */
static  int             l = 0 ; /* -l option */
static  int             b = 0 ; /* -b option */
static  int             num = 1 ;
static  int             lfm ;   /* line feed missing */
static  char            buf[66000] ;
static  void            fmv( char *val )
{
        unsigned long   cv ;
        unsigned long   cl = 0 ;
        unsigned long   ch ;
        int             r ;
        char            **cf ;
        char            **ct ;
        char            **ce ;
        unsigned long   *lf ;
        unsigned long   *lt ;
        if ( dct )
                for ( cl = 0, ch = dct - 1 ; cl <= ch ; )
                {
                        cv = ( ch + cl ) >> 1 ;
                        r = strcmp( val, vl[cv] );
                        if ( r > 0 )
                        {
                                cl = cv + 1 ;
                        }
                        else if ( r < 0 )
                        {
                                if ( cv )
                                        ch = cv - 1 ;
                                else
                                        break ;
                        }
                        else
                        {
                                lt = vct + cv ;
                                *lt += 1 ;
                                if ( d
                                  && *lt == 2 ) /* report dups */
                                {
                                        if ( 0 > printf( "%s\n", val )
                                          || fflush( stdout ) )
                                        {
                                                if ( ferror( stdout ) )
                                                {
                                                        perror( "stdout" );
                                                        exit( 1 );
                                                }
                                                exit( 0 );
                                        }
                                }
                                return ;
                        }
                }
        if ( u ) /* report unique */
        {
                if ( 0 > printf( "%s\n", val )
                  || fflush( stdout ) )
                {
                        if ( ferror( stdout ) )
                        {
                                perror( "stdout" );
                                exit( 1 );
                        }
                        exit( 0 );
                }
        }
        cv = dct ;
        if ( ++dct > act )
        {
                act += 1024 ;
                if ( !( vl = realloc( vl, act * sizeof( char* ) ) ) )
                {
                        perror( "realloc()" );
                        exit( 1 );
                }
                if ( !( vct = realloc( vct, act * sizeof( long ) ) ) )
                {
                        perror( "realloc()" );
                        exit( 1 );
                }
        }
        for ( ce = vl + cl,
                cf = ( ( ct = vl + cv ) - 1 ),
                lf = ( ( lt = vct + cv ) - 1 ) ;
              ct > ce ;
              cf--, ct--, lf--, lt-- )
        {
                *ct = *cf ;
                *lt = *lf ;
        }
        *lt = 1 ;
        if ( !( *ct = malloc( strlen( val ) + 1 ) ) )
        {
                perror( "malloc()" );
                exit( 1 );
        }
        strcpy( *ct, val );
        return ;
}
int main( int argc, char **argv ){
        setvbuf( stdin, NULL, _IOFBF, PIPE_MAX );
        setvbuf( stdout, NULL, _IOFBF, PIPE_MAX );
        for ( i = 1 ; i < argc ; i++ )
        {
                /* Test the current argument, argv[i], not always argv[1]. */
                if ( !strcmp( argv[i], "-b" ) )
                {
                        b = 1 ;
                        continue ;
                }
                if ( !strcmp( argv[i], "-l" ) )
                {
                        l = 1 ;
                        continue ;
                }
                if ( !strcmp( argv[i], "-p" )
                  && ++i < argc )
                {
                        pfx = argv[i];
                        continue ;
                }
                if ( !strcmp( argv[i], "-u" ) )
                {
                        u = 1 ;
                        continue ;
                }
                if ( !strcmp( argv[i], "-d" ) )
                {
                        d = 1 ;
                        continue ;
                }
                if ( !strcmp( argv[i], "-h" ) )
                {
                        fputs( 
"CtD|CtN|Min|CtMin|Max|CtMax|Avg|Med|MPop|CtMPop|Ct\n",
                                stdout );
                        continue ;
                }
                fputs(
"Usage:\n"
"\n"
"aggsx [ -b ] [ -l ] [ -p <prefix> ] [ -u ] [ -d ] [ -h ]\n"
"\n"
"Computes the count distinct, count null, min, count of min, max,\n"
"count of max, average (mean) of not null values if numeric,\n"
"median of not null values, largest of the most popular values,\n"
"count of that most popular value.\n"
"\n"
"If -l is present, first prints out all values in order and their counts,\n"
"null last, but no aggregates.\n"
"If -b is present, prints out like -l and then prints aggregates.\n"
"If -p is present, the aggregate is prefixed with '<prefix>|'.\n"
"If -u is present, just immediately prints out unique values.\n"
"If -d is present, just immediately prints out duplicated values.\n"
"If -h is present, prefixes values line with header line:\n"
"CtD|CtN|Min|CtMin|Max|CtMax|Avg|Med|MPop|CtMPop\n"
"\n"                    , stderr );
                exit( 1 );
        }
        while( fgets( buf, sizeof( buf ), stdin ) )
        {
                lct++ ;
                for ( cp = buf, cp2 = cp3 = NULL, lfm = 1 ; *cp ; cp++ )
                {
                        switch( *cp )
                        {
                        case '\n':
                                lfm = 0 ;
                                /* intentional fall through */
                        case '\r':
                                /* intentional fall through */
                        case ' ':
                                /* intentional fall through */
                        case '\t':
                                continue ;
                                /* intentional fall through */
                        default:
                                if ( !cp2 )
                                {
                                        cp2 = cp ;
                                }
                                cp3 = cp ;
                        }
                }
                if ( lfm )
                {
                        fprintf( stderr, "\nFatal: Data line %lu too long!\n",
                                                lct );
                        exit( 1 );
                }
                if ( cp3 )
                {
                        *(++cp3) = '\0' ;       /* cut off trailing white space */
                        cp = cp2 ;
                }
                if ( strcmp( cp, "<null>" ) )
                {
                        fmv( cp );
                }
                else
                {
                        nct++ ;
                }
        }
        if ( ferror( stdin ) )
        {
                perror( "stdin" );
                exit( 1 );
        }
        if ( u
          || d )
                exit( 0 );
        if ( l || b )
        {
                for ( ll = 0, cpp = vl, lp = vct ;
                        ll < dct ;
                        ll++, lp++, cpp++ )
                {
                        if ( 0 > printf( "%lu\t%s\n", *lp, *cpp ) )
                        {
                                if ( ferror( stdout ) )
                                {
                                        perror( "stdout" );
                                        exit( 1 );
                                }
                                exit( 0 );
                        }
                }
                if ( nct
                  && 0 > printf( "%lu\t%s\n", nct, "<null>" ) )
                {
                        if ( ferror( stdout ) )
                        {
                                perror( "stdout" );
                                exit( 1 );
                        }
                        exit( 0 );
                }
                if ( !b )
                {
                        exit( 0 );
                }
        }
        for ( ll = 0L, cpp = vl, lp = vct, ll2 = ( ( lct - nct ) >> 1 ) + nct ;
              ll < dct ; ll++, lp++, cpp++ )
        {
                cp = *cpp ;
                if ( *lp >= mpc )
                {
                        mpc = *lp ;
                        mp = cp ;
                }
                if ( *cp
                  && num )
                {
                        errno = 0 ;
                        nval = strtod( cp, &cp2 );
                        if ( errno              /* underflow or overflow */
                          || ( cp2 == cp )      /* didn't like the characters */
                          || *cp2 )             /* didn't like some */
                        {
                                num = 0 ;
                        }
                        else
                        {
                                sum += ( nval * *lp ) ;
                                nvc += *lp ;
                        }
                }
                if ( ll2 <= lct  )
                {
                        me = cp ;
                        ll2 += *lp ;
                }
        }
        if ( num
          && nvc )
        {
                sum /= nvc ;
                sprintf( buf, "%-30.20LG", sum );
                for ( cp = buf + strlen( buf ) - 1 ;
                      cp >= buf && *cp == ' ' ;
                      cp-- )
                {
                        *cp = '\0' ;    /* strip the padding added by %-30 */
                }
        }
        else
        {
                strcpy( buf, "N/A" );
        }
        if ( ( ( pfx
              && 0 > printf( "%s|", pfx ) )
            || 0 > printf( "%lu|%lu|%s|%lu|%s|%lu|%s|%s|%s|%lu|%lu\n",
                                dct, nct,
                                ( dct ? vl[0] : "" ),
                                ( dct ? vct[0] : 0 ),
                                ( dct ? vl[dct - 1] : "" ),
                                ( dct ? vct[dct - 1] : 0 ),
                                buf, me, mp, mpc, lct ) )
          && ferror( stdout ) )
        {
                perror( "stdout" );
                exit( 1 );
        }
        exit( 0 );
}
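
For readers without a C compiler at hand, the streaming behaviour of the `-u` and `-d` options can be approximated in awk. This is only a sketch of the idea, not a replacement for aggsx: it does not trim white space or treat `<null>` specially as the C code does. The variable names are illustrative.

```shell
# Sample values, one per line, such as the sed command above would produce.
values=$(printf '%s\n' Source1 Source1 Source2 Source2 Source2 Source3)

# Roughly like -u: print each value the first time it is seen.
uniques=$(printf '%s\n' "$values" | awk '!seen[$0]++')

# Roughly like -d: print a value once, when its second occurrence arrives.
dups=$(printf '%s\n' "$values" | awk 'seen[$0]++ == 1')

printf 'unique:\n%s\nduplicated:\n%s\n' "$uniques" "$dups"
```

Like the C version, both filters emit output while reading, so they need no prior sort.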
