Dynamically accept search pattern and display lines based on it Post: 302777789

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Search file for pattern and grab some lines before pattern

I want to search a file for a string and then if the string is found I need the line that the string is on - but also the previous two lines from the file (that the pattern will not be found in) This is on solaris Can you help?

2. Shell Programming and Scripting

Pattern matching in file and then display 10 lines above every time

Hi, I have to write a shell script like this: I have a huge log file named abc.log. I have to search for a pattern named "pattern"; it may occur 1000 times in the log file, and every time the script finds the pattern it should display the 10 lines above it. I appreciate your help.

3. Shell Programming and Scripting

Print a pattern between the xml tags based on a search pattern

Hi all, I am trying to extract the values ( text between the xml tags) based on the Order Number. here is the sample input <?xml version="1.0" encoding="UTF-8"?> <NJCustomer> <Header> <MessageIdentifier>Y504173382</MessageIdentifier> ...

4. Shell Programming and Scripting

Search and replace - pattern-based

Hey folks! I am new to shell-scripting, but I have a problem that I would like to solve using a script. I create very large html forms, used for randomized trials. In these forms, each question is supplied with a variable that looks something like this: PROJECT_formNN Where NN is the question...

5. Shell Programming and Scripting

Extracting few lines from a file based on identifiers dynamically

i have something like this in a file called mysqldump.sql -- -- Table structure for table `Table11` -- DROP TABLE IF EXISTS `Table11`; /*!40101 SET @saved_cs_client = @@character_set_client */; /*!40101 SET character_set_client = utf8 */; CREATE TABLE `Table11` ( `id` int(11) NOT NULL...

6. Shell Programming and Scripting

Need one liner to search pattern and print everything expect 6 lines from where pattern match made

I need to search for a pattern in a big file and print everything except the next 6 lines from where the pattern match was made.

7. UNIX for Dummies Questions & Answers

Updating value based on search pattern

I have a file with following data <Field FieldName="CHCFA21_01_01" FieldType="Text"> <Output CapturedValue=""> <DataSource Name="" Value="" /> </Output> </Field> <Field FieldName="CHCFA21_01_02" FieldType="Date"> <Output CapturedValue=""> ...

8. Shell Programming and Scripting

Display 2 lines before and after a particular pattern

Hi team, Is it possible to display 2 lines after a particular pattern in a shell script. For example in a file which has the below contents. Mummy Daddy Son Daughter Children Aunty Uncle Grandma Grandpa Son Father Mother Brother-in-law I want to display 2 lines before and after...

9. Shell Programming and Scripting

Search pattern on logfile and search for day/dates and skip duplicate lines if any

Hi, I've written a script to search for an Oracle ORA- error in a log file, print that line and the .trc file associated with it, as well as the dateline of when I assumed the error occurred. In most cases it is the first dateline previous to the error. Unfortunately, this is not a foolproof script....

10. Shell Programming and Scripting

Grep pattern and display all lines below

Hi, I need to grep for a pattern and display all lines below the pattern. For example, say my file has the below lines: file1 file2 file3 file4 file5. I need to grep for the pattern file3 and display all lines below the pattern. Do we have an option to get this data? Let me know if you require...

LEARN ABOUT DEBIAN

kinosearch1::docs::fileformat

KinoSearch1::Docs::FileFormat(3pm)			User Contributed Perl Documentation			KinoSearch1::Docs::FileFormat(3pm)

NAME

       KinoSearch1::Docs::FileFormat - overview of invindex file format

OVERVIEW

       It is not necessary to understand the guts of the Lucene-derived "invindex" file format in order to use KinoSearch1, but it may be helpful
       if you are interested in tweaking for high performance, exotic usage, or debugging and development.

       On a file system, all the files in an invindex exist in one, flat directory.  Conceptually, the files have a hierarchical relationship: an
       invindex is made up of "segments", each of which is an independent inverted index, and each segment is made up of several subsections.

	   [invindex]--|
		       |-"segments" file
		       |
		       |-[segments]------|
					 |--[seg _0]--|
					 |	      |--[postings]
					 |	      |--[stored fields]
					 |	      |--[deletions]
					 |
					 |--[seg _1]--|
					 |	      |--[postings]
					 |	      |--[stored fields]
					 |	      |--[deletions]
					 |
					 |--[ ... ]---|

       The "segments" file keeps a list of the segments that make up an invindex.  When a new segment is being written, KinoSearch1 may put files
       into the directory, but until the segments file is updated, a Searcher reading the index won't know about them.

       Each segment is an independent inverted index.  All the files which belong to a given segment share a common prefix which consists of an
       underscore followed by 1 or more decimal digits: _0, _67, _1058.  A fully optimized index has only a single segment.
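
       The prefix rule above is easy to apply when inspecting an invindex directory.  What follows is only a sketch: the function name and
       the directory listing are hypothetical, and the only fact taken from the text is that a segment prefix is an underscore followed by
       one or more decimal digits.

```python
import re
from collections import defaultdict

# Prefix rule from the text: an underscore followed by one or more decimal digits.
SEGMENT_FILE = re.compile(r"^(_[0-9]+)\.(\w+)$")

def group_by_segment(filenames):
    """Map each segment prefix to the sorted list of its file extensions."""
    groups = defaultdict(list)
    for name in filenames:
        match = SEGMENT_FILE.match(name)
        if match:  # the "segments" file has no prefix, so it is skipped
            groups[match.group(1)].append(match.group(2))
    return {seg: sorted(exts) for seg, exts in groups.items()}

# Hypothetical directory listing, not taken from a real invindex.
listing = ["segments", "_0.cfs", "_0.del", "_67.cfs"]
print(group_by_segment(listing))
```

       A result with a single key would correspond to what the text calls a fully optimized index.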

       In theory there are many files which make up each segment.  However, when you look inside an invindex not in the process of being updated,
       you'll probably see only the segments file and files with either a .cfs or .del extension.  The .cfs file, a "compound" file which is
       consolidated when a segment is finalized, "contains" all the other per-segment files.

       Segments are written once, and with the exception of the deletions file, are never modified once written.  They are deleted when their data
       is written to new segments during the process of optimization.

A segment's component parts
       Each segment can be said to have four logical parts: postings, stored fields, the deletions file, and the term vectors data.

   Stored fields
       The stored fields are organized into two files.

       o   [seg_name].fdx - Field inDeX - pointers to field data

       o   [seg_name].fdt - Field DaTa - the actual stored fields

       When a document turns up as a hit in a search and must be retrieved, KinoSearch1 looks at the Field inDeX file to see where in the data
       file the document's stored fields start, then retrieves all of them from the .fdt file in one lump.

	   _1.fdx--|
		   |--[doc#0  =>   0]----->_1.fdt--|
		   |				   |--[bodytext]
		   |				   |--[title]
		   |				   |--[url]
		   |--[doc#1  => 305]----->_1.fdt--|		 # byte 305
		   |				   |--[bodytext]
		   |				   |--[title]
		   |				   |--[url]
		   |--[...]--------------->_1.fdt--|--[...]
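
       The two-step lookup in the diagram can be modelled in memory.  This is a toy sketch, not the on-disk encoding (which this document
       does not describe): fdx is a plain list of byte offsets, fdt is a byte string, and the assumption that one document's record ends
       where the next one begins is mine.

```python
# Toy stand-ins for the two files; names and contents are hypothetical.
fdx = [0, 305]                    # doc number -> byte offset into fdt
fdt = b"A" * 305 + b"B" * 120     # two documents' stored fields, back to back

def stored_fields_blob(doc_num):
    """Return the raw bytes holding all stored fields of one document."""
    start = fdx[doc_num]
    # Assumption: a record ends where the next record starts, or at end of data.
    if doc_num + 1 < len(fdx):
        end = fdx[doc_num + 1]
    else:
        end = len(fdt)
    return fdt[start:end]

print(len(stored_fields_blob(0)), len(stored_fields_blob(1)))
```

       The point of the pointer file is that fetching a document costs one small lookup plus one contiguous read, regardless of how many
       fields the document has.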

       If a field is marked as "vectorized", its "term vectors" are also stored in the .fdt file.

   Postings
       "Posting" is a technical term from the field of Information Retrieval which refers to a single instance of one term indexing one
       document.  If you are looking at the index in the back of a book, and you see that "freedom" is referenced on pages 8, 86, and 240, that
       would be three postings, which taken together form a "posting list".  The same terminology applies to an index in electronic form.

       The postings data is spread out over 4 main files (not including field normalization data, which we'll get to in a moment).  From lowest to
       highest in the hierarchy, they are...

       [seg_name].prx - PRoXimity data. A list of the positions at which terms appear in any given document.  The .prx file is just a raw stream
       of VInts; the document numbers and terms are implicitly indicated by files higher up the hierarchy.

       [seg_name].frq - FReQuency data for terms.  If a term has a frequency of 5 in a given document, that implies that there will be 5 entries
       in the .prx file.  The terms themselves are implicitly specified by the .tis file.

	   _1.frq--|
		   |--[doc#40 => 2]----->_1.prx--|--[54,107]
		   |--[doc#0  => 1]----->_1.prx--|--[6]
		   |--[doc#6  => 1]----->_1.prx--|--[504]
		   |--[doc#36 => 3]----->_1.prx--|--[2,33,747]
		   |--[...]------------->_1.prx--|--[...]

       [seg_name].tis - TermInfoS.  Among the items stored here is the term's doc_freq, which is the number of documents the term appears in.  If
       a term has a doc_freq of 22 in a given collection, that implies that there will be 22 corresponding entries in the .frq file.  Terms are
       ordered lexically, first by field, then by term text.

	   _1.tis--|
		   |--[...]----------------------->_1.frq--|--[...]
		   |--[bodytext:mule	  =>  1]-->_1.frq--|--[doc#40 => 2]
		   |--[bodytext:multitude =>  3]-->_1.frq--|--[doc#0  => 1]
		   |					   |--[doc#6  => 1]
		   |					   |--[doc#36 => 3]
		   |--[bodytext:navigate  =>  1]-->_1.frq--|--[doc#21 => 1]
		   |--[...]----------------------->_1.frq--|--[...]
		   |--[title:amendment	  => 27]-->_1.frq--|--[doc#21 => 1]
		   |					   |--[doc#22 => 1]
		   |--[...]----------------------->_1.frq--|--[...]

       [seg_name].tii - TermInfos Index.  This file, which is decompressed and loaded into RAM as soon as the IndexReader is initialized, contains
       a small subset of the .tis data, with pointers to locations in the .tis file.  It is used to locate the right general vicinity in the .tis
       file as quickly as possible.

	   _1.tii--|
		   |--[bodytext:a => 20]---------->_1.tis--|--[bodytext:a] # byte 20
		   |					   |--[bodytext:about]
		   |					   |--[bodytext:absolute]
		   |					   |--[...]
		   |--[bodytext:mule => 27065]---->_1.tis--|--[bodytext:mule]
		   |					   |--[bodytext:multitude]
		   |					   |--[...]
		   |--[title:amendment => 56992]-->_1.tis--|--[title:amendment]
							   |--[...]
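
       The role of the .tii file can be sketched as a binary search over a small sorted table.  The keys and offsets below are copied from
       the diagram above; the function name is hypothetical, and Python's tuple and string comparison stands in for the "first by field,
       then by term text" ordering.

```python
from bisect import bisect_right

# (field, term) keys in sorted order, paired with byte offsets into the .tis file.
tii_keys = [("bodytext", "a"), ("bodytext", "mule"), ("title", "amendment")]
tii_offsets = [20, 27065, 56992]

def tis_start_offset(field, term):
    """Return the .tis offset where a forward scan for (field, term) should begin."""
    # Find the last index entry that sorts at or before the wanted key.
    i = bisect_right(tii_keys, (field, term)) - 1
    if i < 0:
        return None  # the key sorts before every entry in the index
    return tii_offsets[i]

print(tis_start_offset("bodytext", "multitude"))
print(tis_start_offset("bodytext", "freedom"))
```

       The index only narrows the search to the right vicinity; whether the term actually exists is decided by scanning the .tis entries
       from the returned offset.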

       Here's a simplified version of how a search for "freedom" against a given segment plays out:

       1.  The searcher asks the .tii file, "Do you know anything about 'freedom'?"  The .tii file replies, "Can't say for sure, but if the .tis
	   file does, 'freedom' is probably somewhere around byte 21008".

       2.  The .tis file tells the searcher "Yes, we have 2 documents which contain 'freedom'.  You'll find them in the .frq file starting at byte
	   66991."

       3.  The .frq file says "document number 40 has 1 'freedom', and document 44 has 8.  If you need to know more, like if any 'freedom' is part
	   of the phrase 'freedom of speech', take a look at the .prx file starting at..."

       4.  If the searcher is only looking for 'freedom' in isolation, that's where it stops.  It already knows enough to assign the documents
	   scores against "freedom", with the 8-freedom document scoring higher than the single-freedom document.
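
       When the searcher does need positions (step 3), each frequency count says how many entries to consume from the .prx stream.  The
       sketch below rests on several assumptions: VInts are taken to be the variable-length integers of Lucene's file format (seven data
       bits per byte, low-order group first, high bit set on every byte except the last), positions are stored as absolute values for
       simplicity, and all function names are hypothetical.  The sample numbers come from the .frq diagram earlier in this section.

```python
def encode_vint(n):
    """Encode a non-negative integer as a VInt (assumed Lucene-style layout)."""
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)  # high bit set: more bytes follow
        n >>= 7
    out.append(n)
    return bytes(out)

def decode_vints(data):
    """Decode a raw byte stream into a list of integers."""
    values, value, shift = [], 0, 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            values.append(value)
            value, shift = 0, 0
    return values

def positions_by_doc(frq_entries, prx_bytes):
    """Hand each document as many positions as its frequency says it has."""
    positions = decode_vints(prx_bytes)
    result, cursor = {}, 0
    for doc_num, freq in frq_entries:
        result[doc_num] = positions[cursor:cursor + freq]
        cursor += freq
    return result

frq_entries = [(40, 2), (0, 1), (6, 1), (36, 3)]
prx_bytes = b"".join(encode_vint(p) for p in [54, 107, 6, 504, 2, 33, 747])
print(positions_by_doc(frq_entries, prx_bytes))
```

       Note that the .prx stream itself carries no document numbers or terms; they are recoverable only because the .frq entries are read
       in the same order.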

   Deletions
       When a document is "deleted" from a segment, it is not actually purged from the postings data and the stored fields data right away; it is
       merely marked as "deleted", via the .del file.  The .del file contains a bit vector with one bit for each document in the segment; if bit
       #254 is set then document 254 is deleted, and if it turns up in a search it will be masked out.

       It is only when a segment's contents are rewritten to a new segment during the segment-merging process that deleted documents truly go
       away.
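
       A bit vector of this kind can be sketched in a few lines.  The class and function names are hypothetical, and the bit order within
       each byte (least significant bit first) is an assumption; the document only says there is one bit per document.

```python
class DeletionsBitVector:
    """One bit per document; a set bit marks the document as deleted."""

    def __init__(self, num_docs):
        self.bits = bytearray((num_docs + 7) // 8)

    def delete(self, doc_num):
        # Assumed layout: bit (doc_num % 8) of byte (doc_num // 8).
        self.bits[doc_num >> 3] |= 1 << (doc_num & 7)

    def is_deleted(self, doc_num):
        return bool(self.bits[doc_num >> 3] & (1 << (doc_num & 7)))

def mask_deleted(hits, deletions):
    """Drop deleted documents from a list of hit document numbers."""
    return [doc for doc in hits if not deletions.is_deleted(doc)]

dels = DeletionsBitVector(300)
dels.delete(254)
print(mask_deleted([40, 254, 44], dels))
```

       Marking a document costs one bit flip, which is why deletion can be cheap while the actual removal waits for a segment merge.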

   Field Normalization Files
       For the sake of simplicity, the example search scenario above omits the role played by the field normalization files, or "fieldnorms" for
       short.  These files have the (theoretical) suffix of .f followed by an integer -- .f0, .f1, etc.  Each segment contains one such file for
       every indexed field.

       By default, the fieldnorms' job is to make sure that a field which is 100 terms long and contains 10 mentions of the word 'freedom' scores
       higher than a field which also contains 10 mentions of the word 'freedom', but is 1000 terms in length.  The idea is that the higher the
       density of the desired term, the more relevant the document.

       The fieldnorms files contain one byte per document per indexed field, and all of them must be loaded into RAM before a search can be
       executed.
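
       The effect of length normalization can be illustrated with a toy score.  The formula here, 1/sqrt(field length), is an assumption
       borrowed from classic Lucene-style scoring; this document does not state the formula KinoSearch1 uses, and the one-byte encoding
       of the stored value is not modelled.

```python
import math

def length_norm(num_terms):
    """Assumed normalization factor: shorter fields get a larger weight."""
    return 1.0 / math.sqrt(num_terms)

def toy_score(term_freq, num_terms):
    """Hypothetical score: raw term frequency scaled by the length norm."""
    return term_freq * length_norm(num_terms)

short_field = toy_score(10, 100)    # 10 mentions in 100 terms
long_field = toy_score(10, 1000)    # 10 mentions in 1000 terms
print(short_field > long_field)
```

       Any factor that shrinks as the field grows would produce the same ordering; the square root merely softens the penalty.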

Document Numbers
       Document numbers are ephemeral.  They change every time a document gets moved from one segment to a new one during optimization.  If you
       need to assign a primary key to each document, you need to create a field and populate it with an externally generated unique identifier.
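
       One way to generate such an identifier before indexing is sketched below.  The field name "id" and the helper function are
       hypothetical; a UUID is just one possible source of externally generated unique values.

```python
import uuid

def with_primary_key(doc):
    """Return a copy of the document with a stable, externally generated 'id' field."""
    keyed = dict(doc)
    # Keep an existing key so that re-indexing the same document does not change it.
    keyed.setdefault("id", uuid.uuid4().hex)
    return keyed

doc = with_primary_key({"title": "amendment", "bodytext": "freedom of speech"})
print(sorted(doc))
```

       Because the key lives in a stored field rather than in the document number, it survives the move from one segment to another.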

Not compatible with Java Lucene
       The file format used by KinoSearch1 is closely related to the Lucene compound index format. (The technical specification for Lucene's file
       format is distributed along with Lucene.)  However, indexes generated by Lucene and KinoSearch1 are not compatible.

COPYRIGHT

       Copyright 2005-2010 Marvin Humphrey

LICENSE, DISCLAIMER, BUGS, etc.
       See KinoSearch1 version 1.00.

perl v5.14.2							    2011-11-15					KinoSearch1::Docs::FileFormat(3pm)
