How to extract specific data and count number containing sets from a file?


 
# 1  
Old 08-04-2010

Hello everybody!

I am quite new here and hope you can help me.

Using an awk script I am trying to extract data from several files. The structure of the input files is as follows:

TimeStep parameter1 parameter2 parameter3 parameter4

e.g.

1 X Y Z L
1 D H Z I
1 H Y E W
2 D H G F
2 R T U V
3
.
.
.

I would like to count the number of entries in time steps 1, 2, 3, ... that contain certain parameters.

I have already written a script that extracts entries with specific parameters. However, I am still struggling to find out how many entries of time steps 1, 2 and 3 contain e.g. parameter Z in $4 (2 for time step 1 and 0 for all others in the example above).

I would prefer to do everything within a single awk script, because I have a lot of data and later only want to change the parameter selection.

I already tried to do it with for and while loops, but it did not work as I wanted it to... well, I am just starting with awk.

Thanks for your help guys!
# 2  
Old 08-04-2010
Hi

Not sure whether I understood you correctly.

Code:
# awk '{if ($x==y)a[$1]++;else a[$1]+=0;}END{for (i in a)print i,a[i]}' x=4 y=Z file
1 2
2 0
#

where x is the number of the column you want to search in and y is the parameter you want to search for.

Guru.
# 3  
Old 08-05-2010
Thanks a lot, Guru, it is almost doing what I wanted.


I use the following script to count the number of entries for each $1 = 1, 2, 3, ... that are consistent with the defined values for parameters A, B, C and D.
Code:
BEGIN     {     
    r=5; #parameterA
    x=9; #parameterB
    solv2="TFE"; #parameterC
    solv1="TIP3"; #parameterD
               }


/CA 1/ #pattern for the rows to search in, in order to skip the header of the file

    {if ( ($9*1 < r) && (( $7 ~ solv1 )||($7 ~ solv2)) )    a[$1]++;else a[$1]+=0;} #$9*1 to avoid wrong counting that occurred sometimes



END{for (i in a)print i,a[i]}

The output is something like this:
Code:
.
.
.
 90 CA 1 67 18 5744 TFE O1 8.17278
 90 CA 1 67 19 6988 TFE O1 8.51086
 90 CA 1 67 20 7806 TIP3 OH2 4.75067
 90 CA 1 67 21 10479 TIP3 OH2 4.67777
 90 CA 1 67 22 10845 TIP3 OH2 7.16528
 90 CA 1 67 23 11554 TIP3 OH2 4.19535
10 7
11 6
12 7
13 12
14 6
15 6
NAME 0
16 8
30 4

.
.
.

So it is messing up the order, including parts of the header in the output, and printing the whole input file at the beginning... I am confused.

The input file looks like this:
Code:
 NAME DWT26R1_CA1_PEP1.DAT
 FRAMES[PS] 5000
 SKIPPED 500
 STEP 50
 PROCESSED 90
 1 CA 1 98 1 2643 TFE F21 9.5831
 1 CA 1 98 2 2654 TFE O1 6.25134
 1 CA 1 98 3 2681 TFE O1 5.01697
 1 CA 1 98 4 2751 TFE O1 6.45506
 1 CA 1 98 15 5702 TFE O1 9.63541
 1 CA 1 98 16 6096 TFE O1 4.69877
 1 CA 1 98 17 6337 TFE O1 6.64662
 1 CA 1 98 18 8167 TIP3 OH2 5.73264 
 2 CA 1 103 18 6096 TFE O1 6.27655
 2 CA 1 103 19 6337 TFE O1 8.68132
 2 CA 1 103 20 8167 TIP3 OH2 3.85201
 2 CA 1 103 21 8178 TIP3 OH2 7.49269
 2 CA 1 103 22 8481 TIP3 OH2 6.79798
 2 CA 1 103 23 8591 TIP3 OH2 3.98057
 2 CA 1 103 24 9917 TIP3 OH2 5.53047
.
.
.

Cheers,
Daniel
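
A likely explanation for the behaviour described in this post, offered as a sketch rather than a confirmed diagnosis: in awk, a pattern has to sit on the same line as the opening brace of its action. With /CA 1/ alone on its line, awk treats it as a pattern with the default action (print the matching line), and the { ... } block on a later line then runs for every record, header lines included. That would account for both the echoed input and the "NAME 0" entry. The shuffled order comes from for (i in a), whose iteration order is unspecified. The version below keeps pattern and action together and prints the time steps in ascending order. The file name sample.dat, the function name count_hits, the variable max and the shortened data are illustrative assumptions, not part of the original script.

```sh
# Small sample in the format shown above (file name is an assumption).
cat > sample.dat <<'EOF'
 NAME DWT26R1_CA1_PEP1.DAT
 FRAMES[PS] 5000
 1 CA 1 98 3 2681 TFE O1 5.01697
 1 CA 1 98 16 6096 TFE O1 4.69877
 1 CA 1 98 18 8167 TIP3 OH2 5.73264
 2 CA 1 103 18 6096 TFE O1 6.27655
 2 CA 1 103 20 8167 TIP3 OH2 3.85201
 2 CA 1 103 23 8591 TIP3 OH2 3.98057
EOF

# count_hits is a hypothetical wrapper so the command can be reused.
count_hits() {
    awk -v r=5 -v solv1=TIP3 -v solv2=TFE '
    /CA 1/ {                            # pattern and action on the same line
        if ($1 > max) max = $1          # remember the highest time step seen
        if ($9 + 0 < r && ($7 ~ solv1 || $7 ~ solv2)) a[$1]++
    }
    END {
        for (i = 1; i <= max; i++)      # numeric loop gives ascending order
            print i, a[i] + 0           # +0 prints 0 for steps without a hit
    }' "$@"
}

count_hits sample.dat
```

Because header lines never reach the action, no a[$1]+=0 branch is needed; steps without a match are filled in as 0 by the loop in END.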

---------- Post updated at 06:18 AM ---------- Previous update was at 02:41 AM ----------

I managed to get the output in the correct order by changing the for loop:

Code:
END {for ( i=1; i<100; i++) print i,a[i]}

However, I still have a question.

I have several input files representing successive data sets, but the time step ($1) starts at 1 in each file. I need the value to keep increasing when reading from a new file instead of starting at 1 again.

cheers,
daniel

Last edited by Daniel8472; 08-05-2010 at 04:58 AM..
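
For the follow-up question about several input files that each restart at time step 1, one possible approach is to add an offset whenever a new file begins. This is only a sketch, assuming the files are passed to a single awk call in chronological order and that each file numbers its time steps from 1 upward. FNR == 1 is true on the first line of every input file. The names count_all, offset, last, part1.dat and part2.dat are made up for illustration.

```sh
# Two tiny example files; both restart at time step 1 (names are assumptions).
printf ' NAME A\n 1 CA 1 98 3 2681 TFE O1 4.5\n 2 CA 1 98 4 2751 TIP3 OH2 3.2\n 2 CA 1 98 5 2800 TFE O1 9.1\n' > part1.dat
printf ' NAME B\n 1 CA 1 98 3 2681 TFE O1 4.9\n 1 CA 1 98 6 2900 TIP3 OH2 2.0\n' > part2.dat

# count_all is a hypothetical wrapper around the awk call.
count_all() {
    awk -v r=5 -v solv1=TIP3 -v solv2=TFE '
    FNR == 1 { offset = last }          # new file: shift by the highest step so far
    /CA 1/ {
        step = $1 + offset              # running time step across all files
        if (step > last) last = step
        if ($9 + 0 < r && ($7 ~ solv1 || $7 ~ solv2)) a[step]++
    }
    END {
        for (i = 1; i <= last; i++) print i, a[i] + 0
    }' "$@"
}

count_all part1.dat part2.dat
```

With these two files the steps of part2.dat continue at 3, because part1.dat ended at step 2. The order of the file arguments determines the numbering, so the files have to be listed in the intended sequence.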
