Using grep and a parameter file to return unique values (Post 302893573 by clippertm on Thursday 20th of March 2014 01:31:22 AM)
03-20-2014
Print unique values across all files

Hello Everyone!

I have updated the first post so that my intentions are easier to understand, and also attached sample files (post #18).

I have over 500 text files in a directory, totalling over 1 GB of data. The data in those files is organised in lines:

Quote:
5021=0|4=748|12=ABC|3078=7484561|4102=748
5021=0|4=749|12=ABC|3214=748|3078=7486512
5021=0|4=748|12=DEF|3078=7481564151|855=748
5021=0|4=750|12=ABC|987=748|3078=7481231
5021=0|4=750|12=DEF|3078=41561|6321=748
5021=0|4=750|12=DEF|3078=7812|8412=748
5021=0|4=750|12=DEF|3078=121888|8855=748
5021=0|4=749|12=ABC|3078=12688|2222=748
5021=0|4=748|12=GHI|3078=812135|8745=748
5021=0|4=748|12=ABC|3078=812121|9647=748
5021=0|4=753|12=GHI|7444=748|3078=121888
My intention is to return one line per parameter match across all files.

The first parameter is: '4=[1 to 2000]'

The second parameter is: '3078='

So when grep, awk, etc. finds a line that contains both '4=1' and '3078=', it prints the line and starts looking for a line that contains '4=2' and '3078='.

This should happen across all 500 files (-m 1 does not work in this case, as 4=1 and 4=2 might both be contained in one file and not in the 499 others).

Please also note that '4=[1 to 2000]' and '3078=' are not always at the same position in a line.
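
To make the requirement concrete, here is a rough sketch of how it could be expressed in awk. It is only an illustration under some assumptions: fields are always separated by '|' as in the sample, the value after '4=' is a plain integer, and the function name first_match_per_id is made up for this example. Splitting on '|' and anchoring the test to the start of a field avoids matching fields such as '3214=748' or '4102=748' by accident, and a single awk process over all files keeps track of which '4=N' values have already been printed.

```shell
#!/bin/sh
# Hypothetical helper (the name is an assumption, not from the thread).
# Prints the first line found for each 4=N (N from 1 to 2000) that also
# has a 3078= field, looking across every file given as an argument.
first_match_per_id() {
    awk -F'|' '
    {
        id = 0; has3078 = 0
        # The two fields can sit anywhere in the line, so check them all.
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^4=[0-9]+$/)  id = substr($i, 3) + 0
            else if ($i ~ /^3078=/) has3078 = 1
        }
        if (id >= 1 && id <= 2000 && has3078 && !(id in seen)) {
            seen[id] = 1
            print
            if (++found == 2000) exit   # every id has been matched
        }
    }' "$@"
}
```

It might be called as first_match_per_id /path/to/files/*.txt (the path is a placeholder). Lines come out in the order they are met, not sorted by the '4=' value, so the output could be piped through sort if a particular order is wanted.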

Can you please please please help me? I am at a loss as to what to do.

Last edited by clippertm; 03-21-2014 at 06:33 AM..
 
