

How to extract specific data and count number containing sets from a file?


 
# 1  
Old 08-04-2010

Hello everybody!

I am quite new here and hope you can help me.

Using an awk script I am trying to extract data from several files. The structure of the input files is as follows:

TimeStep parameter1 parameter2 parameter3 parameter4

e.g.

1 X Y Z L
1 D H Z I
1 H Y E W
2 D H G F
2 R T U V
3
.
.
.

I would like to count the number of entries in each time step (1, 2, 3, ...) that contain certain parameters.

I have already written a script that extracts entries with specific parameters. However, I am still struggling to find out how many entries of time steps 1, 2 and 3 contain, e.g., parameter Z in $4 (2 for time step 1 and 0 for all others in the example above).

I would prefer to do everything within a single awk script because I have a lot of data and later only want to change the parameter selection.

I already tried to do it with for and while loops, but it did not work as I wanted it to... well, I am just starting with awk.

Thanks for your help guys!
# 2  
Old 08-04-2010
Hi

Not sure whether I understood you correctly.

Code:
# awk '{if ($x==y)a[$1]++;else a[$1]+=0;}END{for (i in a)print i,a[i]}' x=4 y=Z file
1 2
2 0
#

where x is the column number to search in and y is the parameter to search for.

Guru.
# 3  
Old 08-05-2010
Thanks a lot, Guru, it is almost doing what I wanted.


I use the following script to count the entries for each value of $1 (1, 2, 3, ...) that are consistent with the defined values of parameters A, B, C and D.
Code:
BEGIN     {     
    r=5; #parameterA
    x=9; #parameterB
    solv2="TFE"; #parameterC
    solv1="TIP3"; #parameterD
               }


/CA 1/ #pattern for the rows to search in, in order to skip the header of the file

    {if ( ($9*1 < r) && (( $7 ~ solv1 )||($7 ~ solv2)) )    a[$1]++;else a[$1]+=0;} #$9*1 to avoid wrong counting that occurred sometimes



END{for (i in a)print i,a[i]}

The output is something like this:
Code:
.
.
.
 90 CA 1 67 18 5744 TFE O1 8.17278
 90 CA 1 67 19 6988 TFE O1 8.51086
 90 CA 1 67 20 7806 TIP3 OH2 4.75067
 90 CA 1 67 21 10479 TIP3 OH2 4.67777
 90 CA 1 67 22 10845 TIP3 OH2 7.16528
 90 CA 1 67 23 11554 TIP3 OH2 4.19535
10 7
11 6
12 7
13 12
14 6
15 6
NAME 0
16 8
30 4

.
.
.

So it is messing up the order, including parts of the header in the output, and printing the whole input file at the beginning... I am confused.

The input file looks like this:
Code:
 NAME DWT26R1_CA1_PEP1.DAT
 FRAMES[PS] 5000
 SKIPPED 500
 STEP 50
 PROCESSED 90
 1 CA 1 98 1 2643 TFE F21 9.5831
 1 CA 1 98 2 2654 TFE O1 6.25134
 1 CA 1 98 3 2681 TFE O1 5.01697
 1 CA 1 98 4 2751 TFE O1 6.45506
 1 CA 1 98 15 5702 TFE O1 9.63541
 1 CA 1 98 16 6096 TFE O1 4.69877
 1 CA 1 98 17 6337 TFE O1 6.64662
 1 CA 1 98 18 8167 TIP3 OH2 5.73264 
 2 CA 1 103 18 6096 TFE O1 6.27655
 2 CA 1 103 19 6337 TFE O1 8.68132
 2 CA 1 103 20 8167 TIP3 OH2 3.85201
 2 CA 1 103 21 8178 TIP3 OH2 7.49269
 2 CA 1 103 22 8481 TIP3 OH2 6.79798
 2 CA 1 103 23 8591 TIP3 OH2 3.98057
 2 CA 1 103 24 9917 TIP3 OH2 5.53047
.
.
.

Cheers,
Daniel
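
A possible explanation for the symptoms in this post, offered as a sketch rather than a confirmed diagnosis: in awk a pattern and its action must start on the same line. With `/CA 1/` alone on one line, awk applies the default action (print the line), which would echo every matching input row. The `{...}` block on the following line then has no pattern, so it runs for every line, header included, which would produce the `NAME 0` entry. The version below keeps the opening brace on the pattern line and sorts the result numerically, because the order of `for (i in a)` is unspecified. The sample rows are a shortened, hypothetical copy of the input shown above, and `$9 + 0` is used in place of `$9*1` to force a numeric comparison.

```shell
# Hypothetical sample file in the same layout as the input posted above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
 NAME DWT26R1_CA1_PEP1.DAT
 FRAMES[PS] 5000
 1 CA 1 98 3 2681 TFE O1 5.01697
 1 CA 1 98 16 6096 TFE O1 4.69877
 1 CA 1 98 18 8167 TIP3 OH2 5.73264
 2 CA 1 103 20 8167 TIP3 OH2 3.85201
 2 CA 1 103 21 8178 TIP3 OH2 7.49269
 2 CA 1 103 23 8591 TIP3 OH2 3.98057
EOF

counts=$(awk '
BEGIN { r = 5; solv1 = "TIP3"; solv2 = "TFE" }
# Pattern and opening brace share a line, so the action runs only on "CA 1" rows.
/CA 1/ {
    if (($9 + 0 < r) && ($7 ~ solv1 || $7 ~ solv2)) a[$1]++
    else a[$1] += 0          # make sure every time step gets an entry
}
END { for (i in a) print i, a[i] }
' "$sample" | sort -n)

echo "$counts"
rm -f "$sample"
```

With this sample, time step 1 has one row below the cutoff and time step 2 has two, and neither header line reaches the output.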

---------- Post updated at 06:18 AM ---------- Previous update was at 02:41 AM ----------

I managed to get the output in the correct order by changing the for loop:

Code:
END {for ( i=1; i<100; i++) print i,a[i]}
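
The fixed upper bound of 100 prints empty counts for steps that do not exist and would miss any step above 99. One way around that, sketched here with made-up input rows and a hypothetical variable name `max`, is to remember the highest time step while reading and loop up to it:

```shell
ordered=$(printf '%s\n' '3 CA x' '1 CA x' '1 CA x' '4 CA x' | awk '
/CA/ {
    a[$1]++
    if ($1 + 0 > max) max = $1 + 0   # remember the highest time step seen
}
END {
    # Walk the steps numerically; a[i] + 0 prints 0 for steps with no entry.
    for (i = 1; i <= max; i++) print i, a[i] + 0
}')
echo "$ordered"
```

Here step 2 never occurs in the input, so it is reported with a count of 0 instead of an empty field.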

However, I still have a question.

I have several input files representing successive data sets. However, the time step ($1) starts at 1 in each file. I need it to keep increasing across files instead of restarting at 1 when a new file is read.

cheers,
daniel

Last edited by Daniel8472; 08-05-2010 at 04:58 AM..
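
One way to continue the time step across files, given as a minimal sketch and not as an answer from the thread: awk resets `FNR` to 1 at the start of each input file, so that moment can be used to add the highest step of the previous file to a running offset. The sketch assumes the files are passed in chronological order and that each one numbers its steps from 1. The file names and the variables `offset`, `last` and `max` are hypothetical, and the distance and solvent filter from the script above is left out to keep the example short.

```shell
dir=$(mktemp -d)
printf '%s\n' ' NAME part1' ' 1 CA 1 x' ' 1 CA 1 x' ' 2 CA 1 x' > "$dir/part1.dat"
printf '%s\n' ' NAME part2' ' 1 CA 1 x' ' 2 CA 1 x' ' 2 CA 1 x' > "$dir/part2.dat"

continued=$(awk '
# First line of each file: shift by the highest step of the previous file.
FNR == 1 { offset += last; last = 0 }
/CA 1/ {
    if ($1 + 0 > last) last = $1 + 0   # highest step in the current file
    step = $1 + offset                 # step number continued across files
    a[step]++
    if (step > max) max = step
}
END { for (i = 1; i <= max; i++) print i, a[i] + 0 }
' "$dir/part1.dat" "$dir/part2.dat")

echo "$continued"
rm -rf "$dir"
```

In this example the second file's steps 1 and 2 are reported as steps 3 and 4.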
