Full Discussion: Extract lines in awk
UNIX for Dummies Questions & Answers: Post 302798041 by Yoda on Tuesday 23rd of April 2013 04:47:07 PM
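For the 1st condition (print every line up to and including the first line matching st, plus any later line that also matches it):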
Code:
awk -v st="ASBC, 1845," 'BEGIN{p=1}{n=match($0,st);if(n) p=0}n||p' file
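Here is the 1st one-liner expanded to one statement per line and run on a small made-up input (only the "ASBC, 1845," marker comes from the post). This is an illustrative sketch assuming a POSIX awk. The flag p starts at 1, so every line is printed until the first line matching st switches it off. After that, a line is printed only if it matches st itself, because the pattern is n || p.

```shell
#!/bin/sh
# Made-up sample input; only the "ASBC, 1845," marker comes from the post.
sample() {
    printf '%s\n' 'header' 'ASBC, 1845, record' 'after one' \
        'ASBC, 1845, again' 'after two'
}

# 1st condition, one statement per line:
out1=$(sample | awk -v st="ASBC, 1845," '
BEGIN { p = 1 }          # start in "print everything" mode
{
    n = match($0, st)    # position of st (used as a regex) in the line, 0 if absent
    if (n) p = 0         # the first matching line ends "print everything" mode
}
n || p                   # print while p is set, and every matching line
')

printf '%s\n' "$out1"
# Expected output:
# header
# ASBC, 1845, record
# ASBC, 1845, again
```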

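For the 2nd condition (print only the lines after the first line matching st, skipping any line that itself matches it):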
Code:
awk -v st="ASBC, 1845," 'BEGIN{p=0}{n=match($0,st);if(n) p=1}!n&&p' file
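The 2nd one-liner works the other way round. p starts at 0 and is switched on by the first matching line, and !n keeps the matching lines themselves out of the output. Because match() treats st as a regular expression, a marker containing characters such as "." or "*" can match more than intended. The second awk program in this sketch is an alternative that is not from the original post: it uses index() for a plain substring test. The sample input is made up.

```shell
#!/bin/sh
# Made-up sample input; only the "ASBC, 1845," marker comes from the post.
sample() {
    printf '%s\n' 'header' 'ASBC, 1845, record' 'after one' \
        'ASBC, 1845, again' 'after two'
}

# 2nd condition as posted, one statement per line:
out2=$(sample | awk -v st="ASBC, 1845," '
BEGIN { p = 0 }          # print nothing before the first match
{
    n = match($0, st)    # nonzero when the line matches st as a regex
    if (n) p = 1         # the first matching line switches printing on
}
!n && p                  # print later lines, but never a matching line
')

# Alternative (not from the post): index() does a plain substring test,
# so characters such as "." or "*" in st are not treated specially.
out3=$(sample | awk -v st="ASBC, 1845," '
index($0, st) { p = 1; next }   # matching line: switch printing on, skip it
p                               # print every later non-matching line
')

printf '%s\n' "$out2"
# Expected output:
# after one
# after two
```

Both programs give the same result on this input. They differ only when st contains regex metacharacters.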


10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

extract the lines

Hi, I have a text file with 15 columns and I want to extract the lines whose 7th column is ABCD. I think we can do this using awk but could not frame the command. Please help. TIA Prvn (2 Replies)
Discussion started by: prvnrk

2. Shell Programming and Scripting

Need awk to extract lines and sort

Hi, My data looks like this. CHR SNP BP A1 TEST NMISS OR STAT P 0 SNP_A-8282315 0 2 ADD 1530 1.074 0.7707 0.4409 0 SNP_A-8282315 0 2... (11 Replies)
Discussion started by: genehunter

3. Shell Programming and Scripting

AWK: How to extract text lines between two strings

Hi. I have a text test1.txt file like:Receipt Line1 Line2 Line3 End Receipt Line4 Line5 Line6 Canceled Receipt Line7 Line8 Line9 End (9 Replies)
Discussion started by: TQ3

4. Shell Programming and Scripting

Need to extract some lines from output via AWK

Hello Friends, I have got this output below and I want to extract the name of symlink which is highlighted in red and the path above it highlighted in blue. At the end I want to append path and symlink. /var/tmp/asirohi/jdk/jre /var/tmp/asirohi/jdk/jre/.systemPrefs... (3 Replies)
Discussion started by: asirohi

5. Shell Programming and Scripting

Awk to extract lines with a defined number of characters

This is my problem, my file (file A) contains the following information: Now, I would like to create a file (file B) containing only the lines with 10 or more characters but less than 20 with their corresponding ID: Then, I need to compare the entries and determine their frequency. Thus, I... (7 Replies)
Discussion started by: Xterra

6. Shell Programming and Scripting

awk? extract quoted "" strings from multiple lines.

I am trying to extract multiple strings from snmp-mib files like below. ----- $ cat IF-MIB.mib <snip> linkDown NOTIFICATION-TYPE OBJECTS { ifIndex, ifAdminStatus, ifOperStatus } STATUS current DESCRIPTION "A linkDown trap signifies that the SNMP entity, acting in... (5 Replies)
Discussion started by: genzo

7. UNIX for Dummies Questions & Answers

Extract lines with specific words plus 2 lines before and after

Dear all, Greetings. I would like to ask for your help to extract lines with specific words, plus the 2 lines before and after these lines, by using awk or sed. For example, the input file is: 1 ak1 abc1.0 1 ak2 abc1.0 1 ak3 abc1.0 1 ak4 abc1.0 1 ak5 abc1.1 1 ak6 abc1.1 1 ak7... (7 Replies)
Discussion started by: Amanda Low

8. Shell Programming and Scripting

Search for a pattern,extract value(s) from next line, extract lines having those extracted value(s)

I have hundreds of files to process. In each file I need to look for a pattern then extract value(s) from next line and then search for value(s) selected from point (2) in the same file at a specific position. HEADER ELECTRON TRANSPORT 18-MAR-98 1A7V TITLE CYTOCHROME... (7 Replies)
Discussion started by: AshwaniSharma09

9. UNIX for Dummies Questions & Answers

awk - Extract 4 lines in Column to Rows Tab Delimited between tags

I have tried the following to no avail. xargs -n8 < test.txt awk '{if(NR%6!=0){p=""}else{p="\n"};printf $0" "p}' Mod_Alm_log.txt > test.txt I have tried different variations of the above; the problem is that it mixes lines together. And it includes the tags "%a and %A" I need them to be all tab... (16 Replies)
Discussion started by: mytouchsr

10. Shell Programming and Scripting

ksh sed - Extract specific lines with multiple occurrences of interesting lines

Data file example I look for primary and * to isolate the interesting slot number. slot=`sed '/^primary$/,/\*/!d' filename | tail -1 | sed s'/*//' | awk '{print $1" "$2}'` Now I want to get the Touch line for only the associate slot number, in this case, because the asterisk... (2 Replies)
Discussion started by: popeye
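Two of the questions above (items 1 and 7) have short awk answers. The sketches below are not taken from those threads and use made-up input. For item 1, they assume whitespace-separated columns. For item 7, they assume a hypothetical pattern and an input small enough to hold in memory.

```shell
#!/bin/sh
# Item 1: print lines whose 7th whitespace-separated field is exactly ABCD.
# Assumption: columns are separated by blanks; for tab-only data add -F'\t'.
col7=$(printf '%s\n' 'a b c d e f ABCD h' 'a b c d e f XYZ h' | awk '$7 == "ABCD"')

# Item 7: print each matching line plus up to 2 lines before and after it.
# Assumptions: the whole input fits in memory; the pattern is hypothetical.
# Overlapping windows are printed once, in file order.
context=$(printf '%s\n' l1 l2 l3 l4 hit l6 l7 l8 l9 | awk -v pat='^hit$' '
{ line[NR] = $0; if ($0 ~ pat) hit[NR] = 1 }   # remember lines and matches
END {
    for (i = 1; i <= NR; i++)                  # for every line...
        for (j = i - 2; j <= i + 2; j++)       # ...look for a match within 2 lines
            if (j in hit) { print line[i]; break }
}')

printf '%s\n' "$col7" "---" "$context"
# Expected output:
# a b c d e f ABCD h
# ---
# l3
# l4
# hit
# l6
# l7
```

Buffering the whole input keeps the window logic simple. A streaming version would keep only the last two lines in a ring buffer plus a counter of trailing lines still to print.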
PDF2TXT(1)							  PDFMiner Manual							PDF2TXT(1)

NAME
pdf2txt - extracts text contents of PDF files

SYNOPSIS
pdf2txt [option...] file...

DESCRIPTION
pdf2txt extracts text contents from a PDF file. It extracts all the text that is to be rendered programmatically, i.e. text represented as ASCII or Unicode strings. It cannot recognize text drawn as images, which would require optical character recognition. It also extracts the corresponding locations, font names, font sizes, and writing direction (horizontal or vertical) for each text portion. You need to provide a password for protected PDF documents whose access is restricted. You cannot extract any text from a PDF document that does not grant extraction permission.

OPTIONS
-o file
    Specifies the output file name. The default is to print the extracted contents to standard output in text format.
-p pageno[,pageno,...]
    Specifies the comma-separated list of the page numbers to be extracted. Page numbers start at one. By default, it extracts text from all the pages.
-c codec
    Specifies the output codec.
-t type
    Specifies the output format. The following formats are currently supported:
    text  Text format. This is the default.
    html  HTML format. It is not recommended.
    xml   XML format. It provides the most information.
    tag   "Tagged PDF" format. A tagged PDF has its own contents annotated with HTML-like tags. pdf2txt tries to extract its content streams rather than inferring its text locations. Tags used here are defined in the PDF Reference, Sixth Edition[1] (§10.7 "Tagged PDF").
-D writing-mode
    Specifies the writing mode of text outputs:
    lr-tb  Left-to-right, top-to-bottom.
    tb-rl  Top-to-bottom, right-to-left.
    auto   Determine the writing mode automatically.
-M char-margin, -L line-margin, -W word-margin
    These are the parameters used for layout analysis. In an actual PDF file, a run of text might be split into several chunks, depending on the authoring software. Therefore, text extraction needs to splice text chunks back together. Two text chunks whose distance is smaller than the char-margin are considered continuous and are grouped into one. Also, two lines whose distance is smaller than the line-margin are grouped into a text box, which is a rectangular area that contains a "cluster" of text portions. Furthermore, blank characters (spaces) may need to be inserted when the distance between two words is greater than the word-margin, because a blank between words might not be represented as a space, but only indicated by the positioning of each word.
    Each value is specified not as an actual length, but as a proportion of the length to the size of the character in question. The default values are char-margin = 1.0, line-margin = 0.3, and word-margin = 0.2.
-n
    Suppress layout analysis.
-A
    Force layout analysis for all the text strings, including text contained in figures.
-V
    Enable detection of vertical writing.
-s scale
    Specifies the output scale. This option can be used in HTML format only.
-m n
    Specifies the maximum number of pages to extract. By default, all the pages in a document are extracted.
-P password
    Provides the user password to access PDF contents.
-d
    Increase the debug level.

EXAMPLES
Extract text as an HTML file whose filename is output.html:
$ pdf2txt -o output.html samples/naacl06-shinyama.pdf

Extract a Japanese HTML file in vertical writing:
$ pdf2txt -c euc-jp -D tb-rl -o output.html samples/jo.pdf

Extract text from an encrypted PDF file:
$ pdf2txt -P mypassword -o output.txt secret.pdf

SEE ALSO
dumppdf(1)

AUTHORS
Jakub Wilk <jwilk@debian.org>
    Wrote this manual page for the Debian system.
Yusuke Shinyama <yusuke@cs.nyu.edu>
    Author of PDFMiner and its original HTML documentation.

NOTES
1. PDF Reference, Sixth Edition
   http://www.adobe.com/devnet/acrobat/pdfs/pdf_reference_1-7.pdf

pdf2txt    08/24/2011    PDF2TXT(1)
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.