Removing repeating lines from a data frame (AWK)


 
# 1  
Old 07-19-2011

Hey Guys!

I have written a script which combines lots of files into one big .csv file.

However, each of the original files had headers on its first line, and now that I've combined the files the headers are interspersed throughout the new combined data frame. For example, throughout the data I will have a line like:

//DATE TIME FRAC_DAYS_SINCE_JAN1 FRAC_HRS_SINCE_JAN1 EPOCH_TIME ALARM_STATUS

with each of these headers in a different column.

Is there a way of removing these lines wherever the headers appear?

Thanks!

Last edited by gd9629; 07-19-2011 at 06:50 AM..
# 2  
Old 07-19-2011
Presuming your CSV file has all the columns from the various files aligned, would the following work?
Code:
grep -v '^//DATE'

or is it more complex?
# 3  
Old 07-19-2011
Code:
awk '!/^\/\//' file
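
For readers new to awk, here is a rough sketch of what that one-liner does. The pattern `/^\/\//` matches lines that start with `//`, the `!` negates it, and because no action is given awk falls back to its default action of printing the line. The sample rows and the `sample` temp file are made up for illustration; they are not the original poster's data.

```shell
# Hypothetical sample standing in for "file" from the post above.
sample=$(mktemp)
printf '%s\n' \
  '//DATE TIME  FRAC_DAYS_SINCE_JAN1  EPOCH_TIME' \
  '25/06/2011 07:03  175.2938079  1308985385' \
  '//DATE TIME  FRAC_DAYS_SINCE_JAN1  EPOCH_TIME' \
  '25/06/2011 07:03  175.2938657  1308985390' > "$sample"

# Short form from the post: negated pattern, default action (print).
awk_short=$(awk '!/^\/\//' "$sample")

# The same rule written out in full.
awk_long=$(awk '{ if ($0 !~ /^\/\//) print $0 }' "$sample")

# A grep filter that should give the same result on this input.
grep_out=$(grep -v '^//' "$sample")

printf '%s\n' "$awk_short"
```

All three commands drop every line beginning with `//`, including the first header, so this form suits the case where only the numeric rows are wanted.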

# 4  
Old 07-19-2011
Sorry, I edited my original post to make it clearer. Yes, the .csv has all the files and headers aligned; I just need to remove all the header text (except the very first line) from the .csv file so that I can process the data.

grep doesn't seem to work, maybe because I'm editing the file using GAWK? I'm pretty new to programming, so I'm not sure how to find my way around this apparent conflict lol.

bartus, that code just joined the first two column headers together and put them after the rest of the other headers.

Any ideas?

Cheers!
# 5  
Old 07-19-2011
Can you post sample data showing some lines that should be kept and the header to be removed and the desired output for that sample? Use code tags for that please.
# 6  
Old 07-19-2011
My .csv file looks like the sample below, with each header in a separate column and the data in the columns beneath. These headers are interspersed throughout the file.

Code:
//DATE TIME 	             FRAC_DAYS_SINCE_JAN1	EPOCH_TIME
25/06/2011 07:03	   175.2938079		        1308985385
25/06/2011 07:03	   175.2938657		        1308985390
25/06/2011 07:03	   175.2939236                  1308985395
25/06/2011 07:03	   175.2939815		        1308985400
25/06/2011 07:03	   175.2940394		        1308985405
25/06/2011 07:03	   175.2940972		        1308985410
25/06/2011 07:03	   175.2941551		        1308985415
25/06/2011 07:03	   175.294213		        1308985420

I want to remove all the interspersed headers from the file so that there is just numeric data, like below:

Code:
25/06/2011 07:03	   175.2938079		        1308985385
25/06/2011 07:03	   175.2938657		        1308985390
25/06/2011 07:03	   175.2939236                    1308985395
25/06/2011 07:03	   175.2939815		        1308985400
25/06/2011 07:03	   175.2940394		        1308985405
25/06/2011 07:03	   175.2940972		        1308985410
25/06/2011 07:03	   175.2941551		        1308985415
25/06/2011 07:03	   175.294213		        1308985420

So basically I need a way of cutting out lines which begin with "//DATE TIME".

Is there any way of doing this?

Thanks
# 7  
Old 07-19-2011
As you can see below my code is working as expected...
Code:
solaris% cat file
//DATE TIME                  FRAC_DAYS_SINCE_JAN1       EPOCH_TIME
25/06/2011 07:03           175.2938079                  1308985385
25/06/2011 07:03           175.2938657                  1308985390
25/06/2011 07:03           175.2939236                  1308985395
//DATE TIME                  FRAC_DAYS_SINCE_JAN1       EPOCH_TIME
25/06/2011 07:03           175.2939815                  1308985400
25/06/2011 07:03           175.2940394                  1308985405
25/06/2011 07:03           175.2940972                  1308985410
25/06/2011 07:03           175.2941551                  1308985415
25/06/2011 07:03           175.294213                   1308985420
solaris% awk '!/^\/\//' file                 
25/06/2011 07:03           175.2938079                  1308985385
25/06/2011 07:03           175.2938657                  1308985390
25/06/2011 07:03           175.2939236                  1308985395
25/06/2011 07:03           175.2939815                  1308985400
25/06/2011 07:03           175.2940394                  1308985405
25/06/2011 07:03           175.2940972                  1308985410
25/06/2011 07:03           175.2941551                  1308985415
25/06/2011 07:03           175.294213                   1308985420
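
Post # 4 asks to keep the very first header line and to do the filtering inside a gawk script rather than with a separate grep. A minimal sketch of one way to do that follows, assuming the wanted header is on line 1 of the combined file and that every header starts with `//`. The function name `strip_repeated_headers` and the sample rows are hypothetical.

```shell
# Hypothetical combined file: header on line 1, repeated further down.
sample=$(mktemp)
printf '%s\n' \
  '//DATE TIME  FRAC_DAYS_SINCE_JAN1  EPOCH_TIME' \
  '25/06/2011 07:03  175.2938079  1308985385' \
  '25/06/2011 07:03  175.2938657  1308985390' \
  '//DATE TIME  FRAC_DAYS_SINCE_JAN1  EPOCH_TIME' \
  '25/06/2011 07:03  175.2939236  1308985395' > "$sample"

strip_repeated_headers() {
  awk '
    NR > 1 && /^\/\// { next }   # skip every header except the one on line 1
    { print }                    # any further per-line processing would go here
  ' "$1"
}

cleaned=$(strip_repeated_headers "$sample")
printf '%s\n' "$cleaned"
```

Because the `next` rule comes first, later rules in the same script only ever see the first header and the data rows, so the skip can sit at the top of an existing gawk program without a separate grep pass.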
