Sort, sed, and zero padding date column csv bash scripting | Unix Linux Forums | Shell Programming and Scripting

  Go Back    


Shell Programming and Scripting Post questions about KSH, CSH, SH, BASH, PERL, PHP, SED, AWK and OTHER shell scripts and shell scripting languages here.

Sort, sed, and zero padding date column csv bash scripting

Shell Programming and Scripting


Closed Thread    
 
Thread Tools Search this Thread Display Modes
    #1  
Old 07-20-2013
sean1357 sean1357 is offline
Registered User
 
Join Date: Jul 2013
Last Activity: 20 July 2013, 7:05 PM EDT
Posts: 3
Thanks: 0
Thanked 0 Times in 0 Posts
Sort, sed, and zero padding date column csv bash scripting

Hello people,

I am having problem to sort, sed and zero padding of column in csv file.

7th column only.

Input of csv file:

Code:
1,2,3,4,5,6,4/1/2010 12:00 AM,8
1,2,3,4,5,6,3/11/2010 9:39 AM,8
1,2,3,4,5,6,5/12/2011 3:43 PM,8
1,2,3,4,5,6,12/20/2009 7:23 PM,8

Output:

Code:
1,2,3,4,5,6,12/20/2009,8
1,2,3,4,5,6,03/11/2010,8
1,2,3,4,5,6,04/01/2010,8
1,2,3,4,5,6,05/12/2011,8

Very thank you for you help.

Last edited by Don Cragun; 07-20-2013 at 03:11 PM.. Reason: Add CODE tags
Sponsored Links
    #2  
Old 07-20-2013
Yoda's Avatar
Yoda Yoda is offline Forum Advisor  
Jedi Master
 
Join Date: Jan 2012
Last Activity: 26 November 2014, 4:40 PM EST
Location: Galactic Empire
Posts: 3,389
Thanks: 236
Thanked 1,210 Times in 1,136 Posts
How about using awk and sort?

Code:
awk -F, '
        {
                sub( / .*/, X, $7 )
                split ( $7, V, "/" )
                $7 = sprintf ( "%02d/%02d/%d", V[1], V[2], V[3] )
                print
        }
' OFS=, file | sort -t"," -k 7.9 -k 7.5 -k 7

Sponsored Links
    #3  
Old 07-20-2013
sean1357 sean1357 is offline
Registered User
 
Join Date: Jul 2013
Last Activity: 20 July 2013, 7:05 PM EDT
Posts: 3
Thanks: 0
Thanked 0 Times in 0 Posts
It works. Thank you very muh for your help. Now, I having another problem how to remove row from range date.

range from 04/01/2010 - 05/04/2013, but the problem is that 04/01/2010 or 05/04/2013 maybe doesn't existed in csv file. Is the a way that you just compare date of input?

Example:

Code:
1,2,3,4,5,6,03/02/2009,8   -> need to remove column
1,2,3,4,5,6,08/03/2009,8   -> need to remove column
1,2,3,4,5,6,04/01/2010,8   -> keep from here
....
....
1,2,3,4,5,6,05/01/2013,8   -> keep to end
1,2,3,4,5,6,05/11/2013,9  -> need to remove from this point

    #4  
Old 07-20-2013
Yoda's Avatar
Yoda Yoda is offline Forum Advisor  
Jedi Master
 
Join Date: Jan 2012
Last Activity: 26 November 2014, 4:40 PM EST
Location: Galactic Empire
Posts: 3,389
Thanks: 236
Thanked 1,210 Times in 1,136 Posts
What is your SHELL and OS ?
Sponsored Links
    #5  
Old 07-20-2013
sean1357 sean1357 is offline
Registered User
 
Join Date: Jul 2013
Last Activity: 20 July 2013, 7:05 PM EDT
Posts: 3
Thanks: 0
Thanked 0 Times in 0 Posts
Thanks. I'm using Cygwin bash script on Windows 7. Is this will be a problem?
Sponsored Links
    #6  
Old 07-20-2013
Don Cragun's Avatar
Don Cragun Don Cragun is online now Forum Staff  
Moderator
 
Join Date: Jul 2012
Last Activity: 28 November 2014, 7:01 AM EST
Location: San Jose, CA, USA
Posts: 5,109
Thanks: 196
Thanked 1,707 Times in 1,450 Posts
Quote:
Originally Posted by Yoda View Post
How about using awk and sort?

Code:
awk -F, '
        {
                sub( / .*/, X, $7 )
                split ( $7, V, "/" )
                $7 = sprintf ( "%02d/%02d/%d", V[1], V[2], V[3] )
                print
        }
' OFS=, file | sort -t"," -k 7.9 -k 7.5 -k 7

I'm a little bit confused by this. Yoda's code is usually right on, but the sort keys used here make the last two digits of the year to the end of the line the primary sort key, the last digit of the day to the end of the line the secondary sort key, and the start of the date field to the end of the line the tertiary sort key. (This seems to have worked with the sample data only because the 8th field in all of the input lines was identical and the last digit of the day was a constant on the two input lines with the same year.) I think the sort he intended to perform on the awk output was:

Code:
sort -t"," -k 7.7,7 -k 7.1,7.5

which makes the entire year the primary sort key and the month and day together as the secondary sort key.

With the new requirement to exclude a date range from the output, you could try something like:

Code:
awk -F, -v sd="04/01/2010" -v ed="05/04/2013" '
BEGIN { ss = substr(sd, 7) substr(sd,1,2) substr(sd,4,2)
        es = substr(ed, 7) substr(ed,1,2) substr(ed,4,2)
}
{       sub(/ .*/, X, $7)
        split($7, V, "/")
        cs = sprintf("%d%02d%02d", V[3], V[1], V[2])
        if(cs >= ss && cs <= es) next
        $7 = sprintf("%02d/%02d/%d", V[1], V[2], V[3])
        print
}' OFS=, input.csv | sort -t"," -k 7.7,7 -k 7.1,7.5

If you are using a Solaris/SunOS system, use /usr/xpg4/bin/awk , /usr/xpg6/bin/awk , or nawk instead of /usr/bin/awk or /bin/awk .

With the input file:

Code:
1,2,3,4,5,6,4/1/2010 12:00 AM,1
1,2,3,4,5,6,3/11/2010 9:39 AM,2
1,2,3,4,5,6,5/12/2011 3:43 PM,3
1,2,3,4,5,6,12/20/2009 7:23 PM,4
1,2,3,4,5,6,03/02/2009 1:23 AM,5
1,2,3,4,5,6,08/03/2009 2:34 AM,6
1,2,3,4,5,6,04/01/2010 3:45 AM,7
1,2,3,4,5,6,05/01/2013 4:56 AM,8
1,2,3,4,5,6,05/11/2013 12:03 PM,9

the output produced is:

Code:
1,2,3,4,5,6,03/02/2009,5
1,2,3,4,5,6,08/03/2009,6
1,2,3,4,5,6,12/20/2009,4
1,2,3,4,5,6,03/11/2010,2
1,2,3,4,5,6,05/11/2013,9

which drops the lines shown in red in the input because they are in the date exclusion range.
The Following User Says Thank You to Don Cragun For This Useful Post:
Yoda (07-20-2013)
Sponsored Links
Closed Thread

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

More UNIX and Linux Forum Topics You Might Find Helpful
Thread Thread Starter Forum Replies Last Post
Sort data by date and then search by column samrat dutta Shell Programming and Scripting 2 05-17-2013 11:23 AM
Sort csv file by duplicated column value jl487 UNIX for Dummies Questions & Answers 3 04-26-2013 02:56 AM
Use sed on column (csv) file if data in colmns is greater > than? Chris Eagleson UNIX for Dummies Questions & Answers 2 10-17-2012 03:18 PM
How to insert a sequence number column inside a pipe delimited csv file using shell scripting? nithins007 Shell Programming and Scripting 5 09-25-2011 04:03 AM
CSV formatting with prefixing, appending and padding field meself Shell Programming and Scripting 7 01-13-2009 12:40 AM



All times are GMT -4. The time now is 08:11 AM.