Sponsored Content
Full Discussion: awk/sed for parsing file
Top Forums Shell Programming and Scripting awk/sed for parsing file Post 302262011 by radoulov on Wednesday 26th of November 2008 07:16:27 AM
Old 11-26-2008
Quote:
Originally Posted by subin_bala
And can u pls expain simply how code is working ??
Yes.
It's an AWK script:

Code:
awk '...' var1=value [var2=value ... varn=value] inputfile(s)

var=value assigns a value to a variable var accessible inside the AWK code.
So id=123 is the desired id to be passed to the program.

Following the code logic we have:

1. Construct the logical record (r) by concatenating all the records seen so far:

record r -> if record is not empty: r ? -> add a record separator (RS, newline by default) and the current record ($0): r RS $0, else (record is empty, it's the first access -> assign the value of the current record: : $0
This is the meaning of the following expression:

Code:
{ r = r ? r RS $0 : $0 }

2. Check if the current record matches the pattern "cm:" followed by the value of the variable id (see above): $0 ~ "cm:" id. If the test returns true, auto increment the value of the variable f (f for flag, marker): { f++ }.

Code:
$0 ~ "cm:" id { f++ }

3. If the current record does not match the pattern ^[\t ] : the line does not begin with a blank character (tab or space), these are your E, D etc records, do the following:
- check if the value of the variable f in Boolean context returns true (is not an empty string or has a numeric value 0): if it's true (not 0, see 2. above), this logical record contains our id, so we print it: print r.
- reset the r and the f variables, we will initialize them after if needed.

Code:
!/^[\t ]/ {
  if (f) print r
  r = f = 0
  }

4. After reading the entire input check if we have something to print.
This is because of the build (r) -> set (f) -> check after (!/^[\t ]/) logic:
we print the previous when we reach the current. So without the END block we may miss the last one.

Code:
END {
  if (f) print r
  }

Hope this helps.
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

awk sed parsing

hi , i would like to parse some file with the fallowing data : data data data "unwanted data" data data "unwanted data" data data data data #unwanted data. what i want it to have any coments between "" and after # to be erased using awk or/and sed. has anyone an idea? thanks. (3 Replies)
Discussion started by: Darsh
3 Replies

2. Shell Programming and Scripting

parsing xml with awk/sed

Hi people!, I need extract from the file (test-file.txt) the values between <context> and </context> tag's , the total are 7 lines,but i can only get 5 or 2 lines!!:confused: Please look my code: #awk '/context/{flag=1} /\/context/{flag=0} !/context/{ if (flag==1) p rint $0; }'... (3 Replies)
Discussion started by: ricgamch
3 Replies

3. Shell Programming and Scripting

Parsing a file (sed/awk?)

Hello people, newbie question. I'm trying to parse these type of file 1 "CAR " " C1 " " " 6 0 C1 2 "CAR " " O1A" " " 8 0 O1A 3 "CAR " " O1B" " " 8 -1 O1B 4 "CAR " " C2 " " " 6 0 C2 5 "CAR " " C3 " " " 6 ... (10 Replies)
Discussion started by: aristegui
10 Replies

4. Shell Programming and Scripting

String parsing with awk/sed/?

If I have a string that has some name followed by an ID#(ex.B123456) followed by some more #'s and/or letters, would it be possible to just grab the ID portion of this string? If so how? I am pretty new with these text tools so any help is appreciated. Example: "Name_One-B123456A-12348A" (2 Replies)
Discussion started by: airon23bball
2 Replies

5. Shell Programming and Scripting

Line Parsing using sed and awk

Hi Guys, I need help with processing data in a file, line by line. My file test.txt has X_Building_X5946/X0 BUT/U_msp/RdBuMon_d2_B_00 BUT/U_msp/FfRmDaMix_d2_Pi3 Test_Long xp=849.416 yp=245.82 xn=849.488 yn=245.82 w=0.476 l=0.072 fault_layer="Al_T01_Mod" $ $X=849416 $Y=245582... (2 Replies)
Discussion started by: naveen@
2 Replies

6. Shell Programming and Scripting

Another parsing line awk or sed problem

Hi, After looking on different forums, I'm still in trouble to parse a parameters line received in KSH. $* is equal to "/AAA:111 /BBB:222 /CCC:333 /DDD:444" I would like to parse it and be able to access anyone from his name in my KSH after. like echo myArray => display 111 ... (1 Reply)
Discussion started by: RickTrader
1 Replies

7. Shell Programming and Scripting

Parsing with awk or sed

I want to delete corrupt records from a file through awk or sed. Can anyone help me with this Thanks Striker Change subject to a descriptive one, ty. (1 Reply)
Discussion started by: Rahul_us
1 Replies

8. UNIX for Advanced & Expert Users

Parsing through a file with awk/sed

I don't necessary have a problem, as I have a solution. It is just that there may be a better solution. GOAL: Part one: Parse data from a file using the "\" as a delimiter and extracting only the last delimiter. Part two: Parse same file and extract everything but the last delimited item. ... (8 Replies)
Discussion started by: OrangeYaGlad
8 Replies

9. Shell Programming and Scripting

awk/sed line parsing

I'm new to shell programming, but I think I learn best by following an example. I'm trying to cook up an awk/sed script, but I obviously lack the required syntax skills to achieve it. The output that I get from running my ksh script looks like this: I need to search each numbered line for... (10 Replies)
Discussion started by: iskatel
10 Replies

10. UNIX for Advanced & Expert Users

Interesting awk/Perl/sed parsing challenge

I have a log with entries like: out/target/product/imx53_smd/obj/STATIC_LIBRARIES/libwebcore_intermediates/Source/WebCore/bindings/V8HTMLVideoElement.cpp : target thumb C++: libwebcore <=... (8 Replies)
Discussion started by: glev2005
8 Replies
bup-margin(1)						      General Commands Manual						     bup-margin(1)

NAME
bup-margin - figure out your deduplication safety margin SYNOPSIS
bup margin [options...] DESCRIPTION
bup margin iterates through all objects in your bup repository, calculating the largest number of prefix bits shared between any two entries. This number, n, identifies the longest subset of SHA-1 you could use and still encounter a collision between your object ids. For example, one system that was tested had a collection of 11 million objects (70 GB), and bup margin returned 45. That means a 46-bit hash would be sufficient to avoid all collisions among that set of objects; each object in that repository could be uniquely identified by its first 46 bits. The number of bits needed seems to increase by about 1 or 2 for every doubling of the number of objects. Since SHA-1 hashes have 160 bits, that leaves 115 bits of margin. Of course, because SHA-1 hashes are essentially random, it's theoretically possible to use many more bits with far fewer objects. If you're paranoid about the possibility of SHA-1 collisions, you can monitor your repository by running bup margin occasionally to see if you're getting dangerously close to 160 bits. OPTIONS
--predict Guess the offset into each index file where a particular object will appear, and report the maximum deviation of the correct answer from the guess. This is potentially useful for tuning an interpolation search algorithm. --ignore-midx don't use .midx files, use only .idx files. This is only really useful when used with --predict. EXAMPLE
$ bup margin Reading indexes: 100.00% (1612581/1612581), done. 40 40 matching prefix bits 1.94 bits per doubling 120 bits (61.86 doublings) remaining 4.19338e+18 times larger is possible Everyone on earth could have 625878182 data sets like yours, all in one repository, and we would expect 1 object collision. $ bup margin --predict PackIdxList: using 1 index. Reading indexes: 100.00% (1612581/1612581), done. 915 of 1612581 (0.057%) SEE ALSO
bup-midx(1), bup-save(1) BUP
Part of the bup(1) suite. AUTHORS
Avery Pennarun <apenwarr@gmail.com>. Bup unknown- bup-margin(1)
All times are GMT -4. The time now is 07:49 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy