AWK/SED line based search

01-02-2012

Maybe something like this?

Code:

awk '(NR % 1000)==1{if(n)close("file-" n); n++}{print > ("file-" n)}' file

The files are named: file-1, file-2, file-3....

Franklin52

01-02-2012

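The size of one line, e.g. 40556271 1319211119.897235 0.0047939, is roughly 44 bytes. On that assumption, the following command splits the file into smaller chunks of about 1000 lines each (split -b cuts at byte offsets, so this only lands on line boundaries if every line has the same length):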

Code:

split -b 44000 infile

I have created a file with similar entries. The total number of lines in the file is 5531904, and the total size is around 233 MB.

Code:

root@bt > wc -l < infile
5531904
root@bt > ls -lrt infile
-rw-r--r-- 1 orange orange 233M Jan  2 18:00 infile

With the above logic,

Code:

root@bt > time split -a 5 -b 44000 infile

real    0m3.896s
user    0m0.052s
sys     0m1.044s

Fast enough?

The generated files have about 1000 lines each. Line counts of a few of them:

Code:

root@bt > wc -l < xaahcl
1000
root@bt > wc -l < xaahcm
1000

Does this help?
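If line lengths can vary, `split -l` cuts on line boundaries instead of byte offsets, so the chunks do not depend on the 44-byte estimate. A minimal sketch on a generated stand-in file (the name `demo_infile`, the `chunk_` prefix and the contents are made up for illustration):

```shell
# Work in a scratch directory so the demo files do not clutter anything.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Stand-in input: 2500 lines of varying length (made-up data).
awk 'BEGIN { for (i = 1; i <= 2500; i++) print i, 1319211119 + i, i / 7 }' > demo_infile

# -l 1000 : 1000 lines per output file, regardless of line length
# -a 5    : five-letter suffixes, so thousands of chunks fit
split -a 5 -l 1000 demo_infile chunk_

wc -l chunk_*
```

This still splits by line count, not by the value in column 1.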

--ahamed

ahamed101

01-02-2012

Hi Franklin,

NR % 1000 would split the input into smaller files of 1000 lines each, if I am not mistaken.
In my case I am interested in column 1 value ranges of width 1000 (say 1319211119 to 1319212119), and such a range does not correspond to 1000 lines every time.

I hope I am clear.

new_one

01-02-2012

Do you provide the range manually? "say 1319211119 to 1319212119"?

No, you are not clear. You are contradicting your own statements.

Let us start once more.
How do you want to split the files? Yes, we know there is a number in the first column. What is next?

--ahamed

Last edited by ahamed101; 01-02-2012 at 09:00 AM..

ahamed101

01-02-2012

@ahamed-

The logic that I was using so far was like this-

Code:

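a="1319130869"
b=$((a + 1000))   # upper bound of the first range (was unset on the first pass)
count="1"
for i in $(seq 1 as_many_times_as_i_need)
do
    # every pass rescans in.txt from its first line
    awk -v myvarA="$a" -v myvarB="$b" '($1 >= myvarA && $1 <= myvarB) {print $0}' in.txt > out$count.txt
    a=$b
    b=$((a + 1000))
    count=$((count + 1))
done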

As I was saying, when I run this on a file as huge as mine, searching from the start every time is a waste. Since my 'a', 'b' and column 1 values only ever increase, I thought it would be quicker to save the last matching NR (for the previous 'a' and 'b' range) and, after I increment 'a' and 'b', pick up from where I left off.

---------- Post updated at 08:12 AM ---------- Previous update was at 08:01 AM ----------

Quote:

Originally Posted by ahamed101

Let us start once more.
How do you want to split the files? Yes, we know there is a number in the first column. What is next?
--ahamed

I know what the first value is, so everything from the first value ('a') up to first value + 1000 ('b') (measured in actual value, not in number of lines) gets written to a new file.
I want to do this repeatedly (in some cases up to 10000 times) with different a and b values. After each iteration 'a' becomes 'b' and 'b' becomes its previous value + 1000.

This is the main logic, and it works (previous code).

Now for the successive iterations I want to do something like this.

Take a variable c that holds NR values. Once I reach the last row with value 'b', I store that line number in 'c', increment 'a' and 'b', and then resume the search from line number 'c', so that I don't have to begin from the start again.

Please bear with me; I hope the working logic is at least clear now. Sorry for the trouble, guys.
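Since column 1 only ever increases, the "resume from line c" idea can also be had by reading the file once and switching output files whenever a value crosses the current upper bound. A rough sketch of that, under some assumptions: the sample data and the names `demo_in.txt` and `demo_out<N>.txt` are invented here, and the ranges are half-open (a <= value < a+1000), unlike the inclusive `<=` in the loop above.

```shell
# Work in a scratch directory so the demo files do not clutter anything.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Made-up sample: column 1 increases, but a range is not a fixed number of lines.
cat > demo_in.txt <<'EOF'
1319130869.10 0.1
1319131000.25 0.2
1319131868.99 0.3
1319131869.00 0.4
1319133900.50 0.5
EOF

awk -v a=1319130869 -v width=1000 '
BEGIN { b = a + width; n = 1 }
{
    # Advance the range until it covers this value. Empty ranges are skipped
    # but still counted, so file numbers match range numbers.
    while ($1 >= b) { close("demo_out" n ".txt"); a = b; b = a + width; n++ }
    # Lines below the very first lower bound are ignored.
    if ($1 >= a) print > ("demo_out" n ".txt")
}' demo_in.txt

wc -l demo_out*.txt
```

With this sample the third range contains no lines, so no `demo_out3.txt` is created. Closing each file as soon as its range is finished keeps the number of open files at one, which matters when there are thousands of ranges.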

new_one

01-02-2012

Something like this?

Code:

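awk 'BEGIN{p=1}
# p is a flag: when it is set, the current line starts a new range and a new file
p{ end=start+1000;p=0;if(n)close("file" n);++n;if(!howmanytimes--)exit }
{
  print > ("file" n)
  # this line reached the end of the range, so the next line opens a new file
  if($1>=end){
    p=1; start=$1;
  }
}'  start=1319130869 howmanytimes=10 infile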

--ahamed

Last edited by ahamed101; 01-02-2012 at 09:43 AM.. Reason: Updated the code!
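To see what the `p` flag does, the same logic can be traced on a tiny made-up input. This is only an illustrative sketch: the file names `demo_infile` and `demo_file<N>` and the small numbers are invented so that the ranges are easy to follow.

```shell
# Work in a scratch directory so the demo files do not clutter anything.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Made-up input with small numbers in column 1.
printf '%s\n' '100 a' '600 b' '1100 c' '1500 d' '2200 e' '2300 f' > demo_infile

awk 'BEGIN { p = 1 }                 # p=1: the next line opens a new range
p {                                  # runs only when the flag is set
    end = start + 1000               # upper bound of the new range
    p = 0                            # clear the flag until the bound is reached
    ++n                              # move on to the next output file
    if (!howmanytimes--) exit        # stop after the requested number of files
}
{
    print > ("demo_file" n)          # every line goes to the current file
    if ($1 >= end) {                 # bound reached: set the flag again
        p = 1; start = $1
    }
}' start=100 howmanytimes=10 demo_infile

head demo_file*
```

Here `1100 c` ends up in `demo_file1`: the line that reaches the bound is still written to the file it closes, and its value becomes the next `start`.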

ahamed101

01-02-2012

Wow! Perfect. Just great!

Thanks a million, ahamed!
Can't thank you enough. I can see from the output that it works as I want. Now I am trying to understand it; it is a bit too advanced for me. Can you tell me what p=1 inside the if block and p=0 in the first block do?

Thanks again bro!

new_one
