sed or awk editing help


 
# 43  
Old 11-03-2010
Try some C:
Code:
$ cat mysrc/tps.c
 
#include <stdio.h>
#include <stdlib.h>     /* exit() */
#include <string.h>     /* strcmp() */
 
int main( int argc, char ** argv ){
 
        int c ;         /* character read */
        int is = '|' ;  /* in sep. */
        int os = EOF ;  /* out sep.; EOF = not set, use in sep. */
        int sf = 0 ;    /* spaces found */
        int pf = 0 ;    /* pipes found */
        int lf = 1 ;    /* linefeed found */
 
        for ( c = 1 ; c < argc ; c++ ){
 
                if ( !strcmp( argv[c], "-is" )
                  && ++c < argc ){
                        is = (unsigned char)argv[c][0] ;
                        continue ;
                 }
 
                if ( !strcmp( argv[c], "-os" )
                  && ++c < argc ){
                        os = (unsigned char)argv[c][0] ;
                        continue ;
                 }
 
                fputs(
"\n"
"Usage: tps [ -is <in_sep> ] [ -os <out_sep> ]\n"
"\n"
"Trims leading and trailing spaces from <in_sep> delimited fields.  Removes\n"
"trailing delimiters.  Replaces delimiter with <out_sep> if present.\n"
"The default <in_sep> is a pipe (|).\n"
"\n"
                        , stderr );
 
                exit( 1 );
         }
 
        if ( os == EOF ){
                os = is ;       /* no -os given: keep the input separator */
         }
 
        do {
                switch( c = getchar()){
 
                case EOF:
                        if ( ferror( stdin )){
                                perror( "stdin" );
                                exit( 1 );
                         }
                        exit( 0 );
 
                case ' ':
                        if ( !pf && !lf ){
                                sf++ ;
                         }
                        continue ;
 
                case '\n':
                        pf = 0 ;
                        lf = 1 ;
                        sf = 0 ;
                        break ;
 
                default:
                        if ( c == is ){
                                pf++ ;
                                lf = 0 ;
                                sf = 0 ;
                                c = os ;
                                continue ;
                         }
 
                        while ( pf ){
                                pf-- ;
                                if ( EOF == putchar( os )){
                                        if ( ferror( stdout )){
                                                perror( "stdout" );
                                                exit( 1 );
                                         }
                                        exit( 0 );
                                 }
                         }
 
                        lf = 0 ;
 
                        while ( sf ){
                                sf-- ;
                                if ( EOF == putchar( ' ' )){
                                        if ( ferror( stdout )){
                                                perror( "stdout" );
                                                exit( 1 );
                                         }
                                        exit( 0 );
                                 }
                         }
 
                        break ;
                 }
 
                if ( EOF == putchar( c )){
                        if ( ferror( stdout )){
                                perror( "stdout" );
                                exit( 1 );
                         }
                        exit( 0 );
                 }
 
        } while ( c != EOF );
 
        exit( 0 );
 }
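
For anyone who wants to compare results without compiling, the behaviour described in the usage text (trim leading and trailing spaces per field, drop trailing delimiters) can be roughly approximated in awk. This is only a sketch under assumptions: the function name tps_awk is made up, the separator is hard-coded to the default pipe, and it has not been benchmarked against the C version.

```shell
# Hypothetical helper, not part of the C program above.
# Reads stdin, writes stdout; separator fixed to '|'.
tps_awk() {
    awk -F'|' -v OFS='|' '{
        # strip leading and trailing spaces from every field
        for (i = 1; i <= NF; i++)
            gsub(/^ +| +$/, "", $i)
        # drop delimiters left dangling at the end of the line
        sub(/[|]+$/, "")
        print
    }'
}

printf '%s\n' 'a | b |' | tps_awk
```

Spaces inside a field (such as "jk lm") are left alone, since only the ends of each field are touched.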

BTW: I once took a 9 GB Sharebase DB and trimmed and compressed (with compress, not even gzip) the tables as flat files, and it came out at about 450 MB = 5%! There are some nice JDBC products out there that act like an entire RDBMS for querying delimited files. One, which I inspired to expand through my suggestions and potential patronage, takes multiple delimited flat files in zip file subdirectories and models them, concatenated, as a table. The zip file is just modeled as another directory, so you can select directories, zip files, zipped directories and zipped files with a mix of constants and file wildcards in the "table" path. That way you can partition and compress your data and still query it without the RDBMS and its related disk cost.

Last edited by DGPickett; 11-03-2010 at 10:58 AM..
# 44  
Old 11-03-2010
Code:
bash-3.00$ time echo "   ,abc,  def,ghi,   ,     ,  jk lm,   " | perl -plne 's/(,*)\s+,+/$1,/g;'
,abc,  def,ghi,,,  jk lm,   

real    0m0.022s
user    0m0.009s
sys     0m0.013s


Last edited by ahmad.diab; 11-03-2010 at 12:48 PM..
# 45  
Old 11-03-2010
What's that, ahmad.diab? I get lost: 2 orders of magnitude? Wait, you're timing echo!
Code:
echo "   ,abc,  def,ghi,   ,     ,  jk lm,   " | time perl -plne 's/(,*)\s+,+/$1,/g;'

Scrutinizer can generate comparable numbers, since he has the same big data set and CPU/system as in the prior benchmarks.

It looks like you are trimming just leading spaces, rather than either "just all spaces" or "both leading and trailing".
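
To make the "both leading and trailing" variant concrete, here is a minimal sed sketch. It is an assumption about what that variant would look like, not something posted earlier in the thread, and the function name trim_both is made up. It removes spaces on either side of every comma and at the ends of the line, and leaves spaces inside a field alone.

```shell
# Hypothetical helper: trim spaces at both ends of each comma-separated field.
trim_both() {
    sed -e 's/ *, */,/g' -e 's/^ *//' -e 's/ *$//'
}

echo "   ,abc,  def,ghi,   ,     ,  jk lm,   " | trim_both
# prints: ,abc,def,ghi,,,jk lm,
```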

Last edited by DGPickett; 11-03-2010 at 12:58 PM..
# 46  
Old 11-03-2010
What do you mean by "2 orders of magnitude"?

---------- Post updated at 18:01 ---------- Previous update was at 17:53 ----------

What about the spaces in between, in the example below? Try it yourself by adding spaces in between.

Code:
echo "   ,abc,  def,ghi,   ,     ,  jk lm,   " | time perl -plne 's/(,*)\s+,+/$1,/g;'

Code:
time perl -plne 's/(,*)\s+,+/$1,/g;' infile
,abc,,def,ghi,,,,,jk,lm,,
,abc,,def,ghi,,,,,jk,lm,,
,abc,,def,ghi,,,,,jk,lm,,
,abc,,def,ghi,,,,,jk,lm,,

real    0m0.018s
user    0m0.008s
sys     0m0.009s

cat infile:-
Code:
   ,abc,  def,ghi,   ,     ,   ,  jk lm,   
   ,abc,  def,ghi,   ,     ,   ,  jk lm,   
   ,abc,  def,ghi,   ,     ,   ,  jk lm,   
   ,abc,  def,ghi,   ,     ,   ,  jk lm,


Last edited by ahmad.diab; 11-03-2010 at 01:07 PM..
# 47  
Old 11-03-2010
@Ahmad, that will not do; it will add commas.
Code:
     ,          abc,             ,  sd   ,      ,   ,     ,

becomes:
Code:
,,          abc,,  sd,,,,,,,,

@DG, I could only get your C-program to work with the default separator (|) but then it eats separators:
Code:
     |          abc|             |  sd   |      |   |     |

becomes
Code:
|abc||sd


Last edited by Scrutinizer; 11-03-2010 at 01:12 PM..
# 48  
Old 11-03-2010
Try it again; I just misspelled the command:

Code:
perl -plne 's/(,*)\s+,+/$1,/g;' infile

# 49  
Old 11-03-2010
Code:
mawk -F, '{for (i=1;i<=NF;i++)sub(/^ +$/,"",$i)}1' OFS=,  infile

is on par with the fastest sed. gawk took 2.5 times as long.
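
As a rough illustration of what that field loop does, here is a sketch with two substitutions of mine: plain awk instead of mawk, and -v OFS=, instead of the trailing assignment. The function name blank_empty_fields is made up. Only fields consisting entirely of spaces are emptied; spaces inside fields that have content are kept.

```shell
# Hypothetical wrapper around the field loop; reads stdin.
blank_empty_fields() {
    awk -F, -v OFS=, '{
        # empty a field only if it is nothing but spaces
        for (i = 1; i <= NF; i++)
            sub(/^ +$/, "", $i)
        print
    }'
}

printf '%s\n' '  ,  abc,   , sd  ,  ,' | blank_empty_fields
# prints: ,  abc,, sd  ,,
```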

---------- Post updated at 17:25 ---------- Previous update was at 17:23 ----------

Quote:
Originally Posted by ahmad.diab
Try it again; I just misspelled the command:

Code:
perl -plne 's/(,*)\s+,+/$1,/g;' infile

That looks better, but in the field that contains "sd", the spaces after the characters get cut, and it leaves spaces after the last comma (in the last field).
Code:
,          abc,,  sd,,,,

---------- Post updated at 17:38 ---------- Previous update was at 17:25 ----------

Code:
mawk '{sub(/^ +,/,",");gsub(/, +,/,",,");gsub(/, +,/,",,");sub(/, +$/,",")}1'  infile

is the fastest of all solutions tested so far (20% faster than the fastest sed).
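
A note on why gsub(/, +,/,",,") appears twice: gsub matches do not overlap, and each match consumes the closing comma, so that comma cannot also open the next match. With several all-space fields in a row, every second one survives the first pass. The sketch below shows this with plain awk (my substitution for mawk); the function names one_pass and two_pass are made up for the illustration.

```shell
# Hypothetical helpers: same substitution applied once vs. twice.
one_pass() {
    awk '{ gsub(/, +,/, ",,"); print }'
}
two_pass() {
    awk '{ gsub(/, +,/, ",,"); gsub(/, +,/, ",,"); print }'
}

printf '%s\n' 'a,  ,  ,  ,b' | one_pass   # prints: a,,  ,,b
printf '%s\n' 'a,  ,  ,  ,b' | two_pass   # prints: a,,,,b
```

Two passes are enough because after the first pass no two all-space fields remain next to each other.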

Last edited by Scrutinizer; 11-03-2010 at 02:12 PM..