It's not working for me, maybe because I'm being an idiot.

I'm writing my code in Notepad++ and running it through a shell. I'm using a .sh file to combine all my .dat files into one big .dat file, then a GAWK file to restructure it into a .csv file.

.sh file

# input files from each day of data in June and combine into one big file
find /u/Picarro/DataLog/2011/june -type f -name "*.dat" -exec cat {} \; > june.dat

#use new combined data as input file
# the csv file to create for all data called 'june.csv' in the respective directory

# gawk files to create csv file

#produce the OUT file from the IN file(s)
$GAWK $IN_all > $OUT_all

GAWK file

#!/bin/gawk -f

# This file is to restructure the picarro data into the correct .csv columns for R

# create a header with same headings as variable in table
# also set other variables before parsing data
BEGIN {
        OFS=","         # tells awk that the output separator is a comma
        ORS=""          # tells awk to not print newline after each print command so all records are
                        # on the same line until we want a new line "\n"
        getline         # removes 1st line of input file ie header so we can replace it with correct one
}

# rearrange the yyyy-mm-dd | hh:mm:ss date and time to single date column of dd/mm/yyyy hh:mm needed for openair
{print substr($1,9,2) "/" substr($1,6,2) "/" substr($1,1,4) " " substr($2,1,5)}

# print the rest of variables as columns 
{print (" ", $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19, $20)}

$1 $2 != prev {
        print "\n"      # newline after each 5 seconds of data has been parsed
        prev = $1 $2
}

I've tried putting the code you gave me into various places in the GAWK file but it doesn't seem to work.

Where am I going wrong?

Don't append it to your awk code; just run that line on the file that you need to clean the headers from. Put this after the find line in your script:
awk '!/^\/\//' /u/Picarro/DataLog/2011/june/june.dat > /u/Picarro/DataLog/2011/june/june.dat.tmp
mv /u/Picarro/DataLog/2011/june/june.dat.tmp /u/Picarro/DataLog/2011/june/june.dat
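
If you want to see what that filter does and does not remove before touching the real data, here is a small sketch on a throwaway file. The sample lines are made up for illustration and are not real Picarro output; the point is only that a header without a leading "//" passes straight through.

```shell
# Hypothetical sample: one "//" header, one plain header, one data row
tmp=$(mktemp)
printf '%s\n' '// DATE TIME,CO2_sync' \
              'DATE   TIME   CO2_sync' \
              '2011-06-01 00:00:05 391.2' > "$tmp"

# Same filter as above: drop only lines that START with "//"
filtered=$(awk '!/^\/\//' "$tmp")
printf '%s\n' "$filtered"
rm -f "$tmp"
```

So if the filter seems to do nothing, it is worth checking whether the headers in the file actually begin with "//".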

Thanks for the help bartus, still not working. The headers are still interspersed throughout the data frame. I've tried putting the new code you gave me in different places too but it doesn't do anything to the file.

Any way around this?

That smells like some Windows mess. Can you please post the output of this:
head -1 /u/Picarro/DataLog/2011/june/june.dat | od -c

That will show exactly which characters the header contains.
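
As a rough illustration of what to look for, here is od -c run on a made-up header line that ends the Windows way (carriage return plus newline). The text is invented; only the od -c call is the same as above.

```shell
# Made-up header line with a Windows-style CRLF ending
dump=$(printf 'DATE  TIME\r\n' | od -c)
printf '%s\n' "$dump"
```

A file with Unix line endings would show only \n at the end of the line; a \r just before it is the usual sign that the file came from Windows.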

And you could try this to strip the headers
awk  '!/\/\/ *DATE .*/' /u/Picarro/DataLog/2011/june/june.dat

Looking at your awk script, I think this might be the culprit:
# print the rest of variables as columns 
{print (" ", $3, ...

If you have a space there at the beginning, then the regex in awk will not match that line.
When you pipe the header through 'od', as I suggested, it will show exactly which characters the line starts with.
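
A minimal sketch of that effect, assuming the header in the output really does start with a space (the two sample lines are invented). The second pattern, which allows spaces or tabs before the "//", is my own variation and not something from your script:

```shell
# Made-up lines: a header with a leading space, and one data row
sample=$(printf '%s\n' ' // DATE TIME,CO2_sync' '01/06/2011 00:00,391.2')

# Anchored at column 1: the leading space hides the header from the regex
strict=$(printf '%s\n' "$sample" | awk '!/^\/\//')

# Allow optional spaces or tabs before the "//"
loose=$(printf '%s\n' "$sample" | awk '!/^[ \t]*\/\//')

printf 'strict kept %s line(s), loose kept %s line(s)\n' \
    "$(printf '%s\n' "$strict" | wc -l | tr -d ' ')" \
    "$(printf '%s\n' "$loose" | wc -l | tr -d ' ')"
```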

Hey mirni,

when I put in the code you gave me

head -1 /u/Picarro/DataLog/2011/june/june.dat | od -c

it returned
0000000   D   A   T   E                                                
0000020                                           T   I   M   E        
0000060                   F   R   A   C   _   D   A   Y   S   _   S   I
0000100   N   C   E   _   J   A   N   1                           F   R
0000120   A   C   _   H   R   S   _   S   I   N   C   E   _   J   A   N
0000140   1                               E   P   O   C   H   _   T   I
0000160   M   E                                                        
0000200           A   L   A   R   M   _   S   T   A   T   U   S        
0000220                                                   s   p   e   c
0000240   i   e   s                                                    
0000260                           s   o   l   e   n   o   i   d   _   v
0000300   a   l   v   e   s                                            
0000320   M   P   V   P   o   s   i   t   i   o   n                    
0000340                                           O   u   t   l   e   t
0000360   V   a   l   v   e                                            
0000400                   C   a   v   i   t   y   P   r   e   s   s   u
0000420   r   e                                                   C   a
0000440   v   i   t   y   T   e   m   p                                
0000460                                   W   a   r   m   B   o   x   T
0000500   e   m   p                                                    
0000520           E   t   a   l   o   n   T   e   m   p                
0000540                                                   D   a   s   T
0000560   e   m   p                                                    
0000600                           C   O   2   _   s   y   n   c        
0000640   C   O   2   _   d   r   y   _   s   y   n   c                
0000660                                           C   H   4   _   s   y
0000700   n   c                                                        
0000720                   C   H   4   _   d   r   y   _   s   y   n   c
0000740                                                           H   2
0000760   O   _   s   y   n   c                                        
0001000                                  \r  \n

Those are the headers that are interspersed throughout the combined .dat file. Is this bad?

I don't see the "//" characters in front of the header. Can you also post the output of:
head -1 /u/Picarro/DataLog/2011/june/june.dat | cat -Te
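
In the meantime, if the headers in the .dat file really are plain "DATE ... TIME ..." lines with \r \n endings, as the od output suggests, something along these lines might strip them. This is only a sketch under that assumption: the sample rows are invented, and matching on the first two fields being DATE and TIME is my guess at a safe test, not something I have tried on your data.

```shell
# Made-up sample mimicking the od output: plain DATE/TIME header, CRLF endings,
# with the header repeated where a second daily file was appended
tmp=$(mktemp)
printf 'DATE   TIME   CO2_sync   \r\n2011-06-01   00:00:05   391.2\r\nDATE   TIME   CO2_sync   \r\n2011-06-02   00:00:05   392.0\r\n' > "$tmp"

cleaned=$(awk '
    { sub(/\r$/, "") }                      # strip the trailing carriage return
    $1 == "DATE" && $2 == "TIME" { next }   # skip every header line
    { print }                               # keep the data rows as they are
' "$tmp")
printf '%s\n' "$cleaned"
rm -f "$tmp"
```

Note that this drops every header, including the first one, so the getline in your gawk script would then swallow a data row instead of a header.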

The "// DATE TIME" header is in the .csv file. I think it appears once I combine the two columns using the GAWK code I posted above.

The result from
head -1 /u/Picarro/DataLog/2011/june/june.dat | cat -Te

is:
DATE                      TIME                      FRAC_DAYS_SINCE_JAN1      FRAC_HRS_SINCE_JAN1       EPOCH_TIME                ALARM_STATUS              species                   solenoid_valves           MPVPosition               OutletValve               CavityPressure            CavityTemp                WarmBoxTemp               EtalonTemp                DasTemp                   CO2_sync                  CO2_dry_sync              CH4_sync                  CH4_dry_sync              H2O_sync                  ^M$

To clarify,

The headers shown above are in the .dat files (both the single files and the big combined file).

The "// DATE TIME" header (which replaces the separate "DATE" and "TIME" headers with a single column) arises in the .csv file, after the .dat file has been 'GAWKed'.

Does that help?
