Extracting a block of text from a large file using variables?


 
# 1  
Old 03-03-2014

Hi UNIX Members,

I've been tasked with the following:
Extract a block of data in column form.
#This data changes each time, so I want to automate the procedure for future runs.

Please note the following:
line = a line read from a file_list that leads to the data
The file is named mol_1($line).opt.out; the block of text is in the opt.out file.

So far I have:
Set variables holding the line numbers that the text I require lies between ($ORB and $MULL).
######I would like to extract the text between these two line numbers#####
I'd like to write it to a separate temporary data file, 'HOMO' (Highest Occupied Molecular Orbital), which I can then work with.

Scope of file: $ORB = 18172, $MULL = 18278
The lines to extract are 18172-18278 (however, these change from file to file, which is why they are held in variables).



The current script I have is:
Code:
#!/bin/bash -f
while read line
do

#mkdir HOMO
#cd ./B3LYP_631Gstar2.$line

OBEGIN=$(cat -n $line.opt.out | grep "ORBITAL ENERGIES" | tail -1 | cut -f 1)
OEND=4
ORB=$(( $OBEGIN + $OEND ))
#echo $ORB

MBEGIN=$(cat -n $line.opt.out | grep "MULLIKEN POPULATION ANALYSIS" | tail -1 | cut -f 1)
MEND=3
MULL=$(( $MBEGIN - $MEND ))

###### Works up to here; the above shows what I'm working with ####
### Below are the 3 methods I've tried

#sed '$ORB,$MULL!d' $line.opt.out > HOMO
#awk 'NR==$ORB,NR==$MULL' $line.opt.out | cut -f1  > HOMO
#grep -E '($ORB: ){5}$MULL' < $line.opt.out > HOMO

#echo $MULL
done<$1


Summary of task: extracting a block of text from a large file, using the variables $ORB and $MULL as the first and last line numbers
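
A likely reason all three commented attempts fail: inside single quotes the shell does not expand $ORB and $MULL, so sed receives the literal text, and awk treats ORB and MULL as its own (unset) variables. grep, for its part, matches patterns rather than line numbers. A minimal sketch of two common ways to pass the numbers in, assuming ORB and MULL already hold valid line numbers; the demo file and the function names are made up for illustration:

```shell
#!/bin/bash
# Hypothetical demo file with ten lines, standing in for $line.opt.out.
demo=$(mktemp)
trap 'rm -f "$demo"' EXIT
printf 'row %s\n' 1 2 3 4 5 6 7 8 9 10 > "$demo"

ORB=3    # first line wanted (assumed already computed)
MULL=5   # last line wanted (assumed already computed)

# Double quotes let the shell substitute the numbers before sed runs;
# -n suppresses default output and p prints only the addressed range.
extract_with_sed() {
    sed -n "${ORB},${MULL}p" "$1"
}

# awk -v copies the shell values into awk variables (first, last).
extract_with_awk() {
    awk -v first="$ORB" -v last="$MULL" 'NR >= first && NR <= last' "$1"
}

extract_with_sed "$demo"
extract_with_awk "$demo"
```

Either function could be redirected to HOMO inside the loop, for example extract_with_sed "$line.opt.out" > HOMO.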

#######EXTRA#####
If you know how to extract data that is in columns (say I need only column 2 and column 4 out of 6 columns) into columns in another file (sort of like an Excel file), that'd be much appreciated!
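
For the column question, awk's field variables are the usual tool. A small sketch, assuming the columns are separated by whitespace; the sample data and the function name pick_columns are invented for the example:

```shell
#!/bin/bash
# Hypothetical six-column sample standing in for the extracted HOMO file.
homo=$(mktemp)
trap 'rm -f "$homo"' EXIT
cat > "$homo" <<'EOF'
a1 b1 c1 d1 e1 f1
a2 b2 c2 d2 e2 f2
EOF

# Print fields 2 and 4 of every line, separated by a tab so the
# result can be opened as columns in a spreadsheet.
pick_columns() {
    awk -v OFS='\t' '{ print $2, $4 }' "$1"
}

pick_columns "$homo"
```

Redirecting the output (pick_columns HOMO > columns.tsv, with columns.tsv being an arbitrary name) would give a tab-separated file.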


Warm Regards, Klor

Last edited by Scrutinizer; 03-03-2014 at 04:33 PM.. Reason: code tags
# 2  
Old 03-03-2014
Hello,

Could you please confirm that this is not homework? If it is, kindly post the query accordingly. If not, kindly let us know the input and the expected output.



Thanks,
R. Singh
# 3  
Old 03-03-2014
Hi Mr. Singh,
No, this isn't homework; it's a way to automate my tasks and learn scripting whilst working on computer modelling. I am a third-level Chemistry student at university, working on computational modelling, and my lecturer proposed this idea as an 'extra learning opportunity'.

####It is not graded##

The input file is large, but the arrangement is as follows:
Code:
ORBITAL ENERGIES

Title1      Title2      Title3      Title4      Title5
Data1      Data2      Data3      Data4      Data5
Data1.1    Data2.1   Data3.1   Data4.1   Data5.1
Data1.2    Data2.2   Data3.2   Data4.2   Data5.2

MULLIKEN POPULATION ANALYSIS

The expected output is:

Code:
Data2      Data3     Data4
Data2.1    Data3.1   Data4.1

Warm Regards,
Jason Deacon

Last edited by Scrutinizer; 03-03-2014 at 04:40 PM.. Reason: code tags
# 4  
Old 03-03-2014
What would be the parameters to apply on the sample input to get this output?
# 5  
Old 03-03-2014
Hello Klor,

The following could help you.

EDIT: This should work for any input file with this layout.

Code:
awk '/ORBITAL ENERGIES/ {getline;getline;getline;{a=$2" "$3" "$4};getline;{b=$2" "$3" "$4};getline;{c=$2" "$3" "$4};getline} END{print a ORS b ORS c}' check_parameters

Output will be as follows.

Code:
Data2 Data3 Data4
Data2.1 Data3.1 Data4.1
Data2.2 Data3.2 Data4.2


NOTE: The following code will work only for the sample input given in this thread.

Code:
awk 'NR==4 || NR==5 {print $2" "$3" "$4}' check_parameters

Output will be as follows.

Code:
Data2 Data3 Data4
Data2.1 Data3.1 Data4.1


NOTE: check_parameters is the input file name.

Thanks,
R. Singh

Last edited by RavinderSingh13; 03-03-2014 at 05:48 PM.. Reason: Adding complete solution
# 6  
Old 03-03-2014
Thank you Mr. Singh,
I apologize that I didn't elaborate more.
The input file has many ORBITAL ENERGIES titles, which is why I used tail -1 when cutting the line number: it is the last one I am interested in.

Also, the number of data lines will vary: sometimes I may need rows Data2.5/Data4.5 through Data2.9/Data4.9, sometimes rows Data2.10/Data4.10 through Data2.15/Data4.15, so the line numbers need to be held in variables and integrated into the code, as they won't be fixed each time.

@bartus
The parameters are as follows:
#line is defined as mol1 (molecule 1), but it will soon include 3-7 names listed in a column, and the loop will read the first line, then the second, and so on.
Anything marked with # is excluded for now until I can piece together a functioning script.

OBEGIN = the line number of the last matching title in the file (tail -1), extracted with cut -f 1
OEND = the number of lines from that title until the numeric data starts (important)
therefore ORB = the starting line of the important numeric data

MBEGIN = the line number of the next title in the text file
MEND = the number of lines between the final numeric data and that title
MULL = as it is the next title, we must go in reverse and subtract, therefore MULL is the line number of the final numeric data
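
The arithmetic described above can be sketched with grep -n, a common alternative to cat -n piped into grep. The miniature file below is invented, and its layout (a dashed rule, a blank line and a column-title line before the data) is only an assumption about what the real opt.out file looks like, so OEND and MEND may need adjusting:

```shell
#!/bin/bash
# Hypothetical miniature of an .opt.out file; the real one is far longer.
out=$(mktemp)
trap 'rm -f "$out"' EXIT
cat > "$out" <<'EOF'
ORBITAL ENERGIES
old block
MULLIKEN POPULATION ANALYSIS
ORBITAL ENERGIES
----------------

NO OCC E
0 2.0000 -10.1
1 0.0000 0.2

----------------------------
MULLIKEN POPULATION ANALYSIS
EOF

OEND=4   # lines from the last ORBITAL ENERGIES title to the first data line
MEND=3   # lines from the last data line to the MULLIKEN title

# grep -n prefixes each match with "lineno:", so cutting on ":" keeps the number.
# tail -1 keeps only the last occurrence of each title.
OBEGIN=$(grep -n "ORBITAL ENERGIES" "$out" | tail -1 | cut -d: -f1)
MBEGIN=$(grep -n "MULLIKEN POPULATION ANALYSIS" "$out" | tail -1 | cut -d: -f1)

ORB=$(( OBEGIN + OEND ))    # first data line
MULL=$(( MBEGIN - MEND ))   # last data line
echo "$ORB $MULL"
```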

The #awk, #sed and #grep lines afterwards are my failed attempts at extracting the block; fixed numbers would be okay there, however variables don't work for some reason.

The #echo lines were there to make certain the variables had the correct numbers.

The input file has

~18000 lines of text (including 4 other ORBITAL ENERGIES and MULLIKEN titles)
ORBITAL ENERGIES
data
MULLIKEN POPULATION ANALYSIS
~500 lines of text

The desired output is

all data lines.

#######I can manage the below after I extract the data lines####

After that I can do the same again with variables created from where
Data = 2.0000
Data = 2.0000
Data = 0.0000
Data = 0.0000
When 2 goes to 0, I will be able to extract those lines using grep

and therefore only have
data = 2.000
data = 0.000

lines in the final file, from which I can create a graph.
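
The "2 goes to 0" step could also be done in awk by remembering the previous line. This is only a sketch: it assumes the occupancy sits in column 2 of the extracted block, which may not match the real file, and the sample rows and the name find_frontier are invented:

```shell
#!/bin/bash
# Hypothetical extracted block: column 2 is assumed to hold the occupancy.
homo=$(mktemp)
trap 'rm -f "$homo"' EXIT
cat > "$homo" <<'EOF'
0 2.0000 -10.10
1 2.0000 -0.35
2 0.0000 0.05
3 0.0000 0.40
EOF

# Remember the previous line; when the occupancy drops from non-zero to zero,
# print the last occupied and the first unoccupied line, then stop.
find_frontier() {
    awk '
        NR > 1 && prev_occ > 0 && $2 + 0 == 0 { print prev; print; exit }
        { prev = $0; prev_occ = $2 + 0 }
    ' "$1"
}

find_frontier "$homo"
```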

Warm Regards, Jason
# 7  
Old 03-04-2014
I think you may be looking for something like this:
Code:
awk '
  $0~ostart{
    c=0
  }
  $0~ostart,$0~mstart{
    A[++c]=$0
  }
  END{
    for(i=oend; i<=c-mend; i++) print A[i]
  }
' ostart="ORBITAL ENERGIES" oend=3 mstart="MULLIKEN POPULATION ANALYSIS" mend=3 file
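
To tie this back to the original loop over file_list, here is a hedged sketch of the same buffering idea wrapped in a function. The names extract_block, skip and trim are my own; skip and trim count lines to drop from the top and bottom of the block, which differs slightly from how oend and mend are used above. The tiny input files and the HOMO.$line output names are made up:

```shell
#!/bin/bash
# Sketch only: the directory layout and file names below are invented.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Two tiny stand-ins for mol1.opt.out and mol2.opt.out.
for mol in mol1 mol2; do
    cat > "$work/$mol.opt.out" <<EOF
ORBITAL ENERGIES

Title
$mol-first
$mol-last

MULLIKEN POPULATION ANALYSIS
EOF
done
printf '%s\n' mol1 mol2 > "$work/file_list"

# Buffer the last ORBITAL ENERGIES..MULLIKEN block (the counter restarts at
# every ORBITAL ENERGIES title), then drop `skip` lines from the top and
# `trim` lines from the bottom of that block.
extract_block() {
    awk -v skip=3 -v trim=2 '
        /ORBITAL ENERGIES/ { n = 0 }
        /ORBITAL ENERGIES/, /MULLIKEN POPULATION ANALYSIS/ { buf[++n] = $0 }
        END { for (i = skip + 1; i <= n - trim; i++) print buf[i] }
    ' "$1"
}

# One output file per name listed in file_list.
while read -r line; do
    extract_block "$work/$line.opt.out" > "$work/HOMO.$line"
done < "$work/file_list"
```

Because the file is read once and no line numbers are computed beforehand, this avoids the separate grep passes, at the cost of holding one block in memory.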
