Using grep and a parameter file to return unique values
Post 302893573 by clippertm on Thursday 20th of March 2014 01:31:22 AM
Print unique values across all files

Hello Everyone!

I have updated the first post so that my intentions are easier to understand, and also attached sample files (post #18).

I have over 500 text files in a directory, totalling over 1 GB of data. The data in those files is organised in lines:

Quote:
5021=0|4=748|12=ABC|3078=7484561|4102=748
5021=0|4=749|12=ABC|3214=748|3078=7486512
5021=0|4=748|12=DEF|3078=7481564151|855=748
5021=0|4=750|12=ABC|987=748|3078=7481231
5021=0|4=750|12=DEF|3078=41561|6321=748
5021=0|4=750|12=DEF|3078=7812|8412=748
5021=0|4=750|12=DEF|3078=121888|8855=748
5021=0|4=749|12=ABC|3078=12688|2222=748
5021=0|4=748|12=GHI|3078=812135|8745=748
5021=0|4=748|12=ABC|3078=812121|9647=748
5021=0|4=753|12=GHI|7444=748|3078=121888

My intention is to return one line per parameter match across all files.

The first parameter is: '4=[1 to 2000]'

The second parameter is: '3078='

So when grep, awk, etc. finds a line that contains both '4=1' and '3078=', it prints the line and starts looking for a line that contains '4=2' and '3078=', and so on up to '4=2000'.

This should happen across all 500 files (-m 1 does not work in this case, as 4=1 and 4=2 might both be contained in one file and not in the 499 others).

Please also note that '4=[1 to 2000]' and '3078=' are not always at the same position in a line.

Can you please, please, please help me? I am at a loss as to what to do.

Last edited by clippertm; 03-21-2014 at 06:33 AM..
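
One possible approach to the requirement described above, as a minimal sketch rather than a tested solution for the real data: let awk split each line on '|', look at every field regardless of position, and remember which values of '4=N' have already been printed. The function name first_matches is hypothetical, and the sketch assumes that fields are always separated by '|' and that the value after '4=' is a plain integer.

```shell
# Print the first line, across all given files, for each value N (1 to 2000)
# of the field 4=N, provided the same line also has a 3078= field.
# first_matches is an illustrative name, not an existing tool.
first_matches() {
    awk -F'|' '
    {
        n = 0; has3078 = 0
        # Scan every field, since 4=N and 3078= can sit at any position.
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^4=[0-9]+$/)  n = substr($i, 3) + 0   # number after "4="
            else if ($i ~ /^3078=/) has3078 = 1
        }
        # Print only the first qualifying line for each N.
        if (has3078 && n >= 1 && n <= 2000 && !(n in seen)) {
            seen[n] = 1
            print
            if (++found == 2000) exit   # all values found, stop reading
        }
    }' "$@"
}
```

It could be called from the data directory as first_matches *.txt. Because a single awk process reads all the files in turn, the list of values already seen carries over from one file to the next, which is what -m 1 cannot do. Lines come out in the order they are first encountered, not sorted by N; piping the output through sort would be a separate step if ordering matters.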