AWK/SED line based search


 
# 8  
01-02-2012
Maybe something like this?
Code:
awk 'NR % 1000 == 1 { if (n) close("file-" n); n++ } { print > ("file-" n) }' file

The files are named file-1, file-2, file-3, ...
# 9  
01-02-2012
One line, e.g. 40556271 1319211119.897235 0.0047939, is roughly 44 bytes. Assuming every line has that length, the following command splits the file into chunks of about 1000 lines each (44 x 1000 = 44000 bytes).
Code:
split -b 44000 infile
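A caveat on -b: it cuts at exact byte offsets, so unless every line is exactly the same length, a line can end up split across two output files. A minimal sketch using the POSIX -l option instead, which counts newlines; the function name split_lines and the part- prefix are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: split_lines INPUT PREFIX
# Splits INPUT into pieces of exactly 1000 lines (the last piece may be shorter).
# -l counts newlines, so no line is ever cut in half; -a 5 gives 5-letter suffixes
# (PREFIXaaaaa, PREFIXaaaab, ...), as in the -b example.
split_lines() {
    split -l 1000 -a 5 "$1" "$2"
}
```

Usage: split_lines infile part- produces part-aaaaa, part-aaaab, and so on, each holding 1000 complete lines.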

I created a file with similar entries. It has 5531904 lines and is about 233 MB in total:
Code:
root@bt > wc -l < infile
5531904
root@bt > ls -lrt infile
-rw-r--r-- 1 orange orange 233M Jan  2 18:00 infile

Using that byte count:
Code:
root@bt > time split -a 5 -b 44000 infile

real    0m3.896s
user    0m0.052s
sys     0m1.044s

Fast enough?

Each generated file has about 1000 lines. Line counts of a few of them:
Code:
root@bt > wc -l < xaahcl
1000
root@bt > wc -l < xaahcm
1000

Does this help?

--ahamed
# 10  
01-02-2012
Hi Franklin,

If I am not mistaken, NR % 1000 would split the input into files of 1000 lines each.
In my case I need ranges of 1000 in the column 1 value (say 1319211119 to 1319212119), and such a range does not always contain 1000 lines.

I hope I am clear.
# 11  
01-02-2012
Do you provide the range manually? "say 1319211119 to 1319212119"?

No, you are not clear. You are contradicting your own statements.

Let us start once more.
How do you want to split the files? Yes, we know there is a number in the first column. What is next?

--ahamed

Last edited by ahamed101; 01-02-2012 at 09:00 AM.
# 12  
01-02-2012
@ahamed-

The logic I have been using so far is this:

Code:
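a=1319130869
b=$((a + 1000))          # upper bound of the first range
n=10                     # number of ranges; set as many as you need
count=1
for i in $(seq 1 "$n")
do
    # < rather than <= so a line equal to b is not written to two files
    awk -v myvarA="$a" -v myvarB="$b" '($1 >= myvarA && $1 < myvarB) { print $0 }' in.txt > "out$count.txt"
    a=$b
    b=$((a + 1000))
    count=$((count + 1))
done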

As I was saying, when I run this on a file as huge as mine, searching from the start every time is wasteful. Since my 'a' and 'b' values and the column 1 values increase linearly, I thought it would be quicker to save the last matching NR for the previous a-to-b range and, after incrementing 'a' and 'b', pick up from where I left off.
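Since column 1 only increases, the whole job can also be done in a single pass instead of one awk run per range: compute each line's range index as int((col1 - start) / width) + 1 and write the line to that file. A minimal sketch under that assumption; the function name split_by_value and the outN.txt file names are hypothetical. Ranges are half-open, [start, start+width), and a range with no lines produces no file. It relies on sorted input, because it closes each file once it moves past it, and reopening a closed file with > would truncate it.

```shell
#!/bin/sh
# Hypothetical helper: split_by_value INPUT START WIDTH
# One pass over INPUT: a line whose column 1 lies in
# [START + (k-1)*WIDTH, START + k*WIDTH) is written to out<k>.txt.
# Assumes column 1 is sorted ascending.
split_by_value() {
    awk -v start="$2" -v width="$3" '
        $1 < start { next }                       # before the first range
        {
            k = int(($1 - start) / width) + 1     # 1-based range index
            if (k != cur) {                       # sorted input: previous range is finished
                if (cur) close("out" cur ".txt")  # free its file descriptor
                cur = k
            }
            print > ("out" k ".txt")
        }' "$1"
}
```

Usage: split_by_value in.txt 1319130869 1000 reads in.txt once and writes out1.txt, out2.txt, and so on.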

---------- Post updated at 08:12 AM ---------- Previous update was at 08:01 AM ----------

Quote:
Originally Posted by ahamed101
Let us start once more.
How do you want to split the files? Yes, we know there is a number in the first column. What is next?
--ahamed
I know the first value, so I write every line from the first value ('a') up to the first value + 1000 ('b') to a new file. The 1000 is a difference in value, not a number of lines.
I want to repeat this with different a and b values (in some cases up to 10000 times). After each iteration, 'a' becomes the old 'b', and 'b' becomes the new 'a' + 1000.

This is the main logic, and it works (previous code).

Now for the successive iterations I want to do something like this.

Take a variable c that holds an NR value. Once I reach the last row with value 'b', I store that line number in 'c', increment 'a' and 'b', and then resume the search from line 'c', so that I don't have to start from the beginning again.

Please bear with me; I hope the logic is clear now. Sorry for the trouble, guys :)
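The "resume from line c" idea can be sketched with tail -n +c feeding awk, where awk stops at the first line of the next range and reports that line's absolute number as the new c. This is a minimal sketch, assuming column 1 is sorted ascending; resume_split and the outN.txt names are hypothetical. Note that tail still has to read the bytes before line c to count newlines, but it does no field splitting, and awk exits as soon as it reaches b instead of reading the rest of the file.

```shell
#!/bin/sh
# Hypothetical helper: resume_split INPUT START WIDTH COUNT
# Writes lines with START <= $1 < START+WIDTH to out1.txt, the next range
# to out2.txt, and so on, COUNT times. Assumes column 1 is sorted ascending.
resume_split() {
    input=$1 a=$2 width=$3 count=$4
    c=1          # absolute line number where the current range begins
    i=1
    while [ "$i" -le "$count" ]; do
        b=$((a + width))
        # Start reading at line c. awk exits at the first line with $1 >= b
        # and prints its absolute line number, which becomes the new c.
        c=$(tail -n +"$c" "$input" |
            awk -v a="$a" -v b="$b" -v c="$c" -v out="out$i.txt" '
                $1 >= b { print c + NR - 1; found = 1; exit }
                $1 >= a { print > out }
                END     { if (!found) print c + NR }')
        a=$b
        i=$((i + 1))
    done
}
```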
# 13  
01-02-2012
Something like this?

Code:
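awk 'BEGIN { p = 1 }               # p = 1: the next line opens a new output file
p {                                # runs once, on the first line of each chunk
    end = start + 1000             # this chunk ends once column 1 reaches end
    p = 0
    if (n) close("file" n)         # release the previous file descriptor
    ++n
    if (!howmanytimes--) exit      # stop after howmanytimes files
}
{
    print > ("file" n)
    if ($1 >= end) {               # boundary line: it stays in this file,
        p = 1; start = $1          # and the next chunk is measured from it
    }
}' start=1319130869 howmanytimes=10 infile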

--ahamed

Last edited by ahamed101; 01-02-2012 at 09:43 AM. Reason: Updated the code!
# 14  
01-02-2012
Wow! Perfect, just great!

Thanks a million ahamed!
Can't thank you enough. I can see from the output that it works the way I want. Now I am trying to understand it; it is a bit too advanced for me. Can you tell me what p=1 in the if block and p=0 in the first block do?

Thanks again bro!
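On the question about p: in the script from post #13, p is a flag meaning "the next line starts a new output file". BEGIN sets it so the first line opens file1; the p { ... } block runs once per chunk, computes end, and clears the flag; the main block sets it again when a line reaches end. A minimal sketch wrapping that script in a function (the name chunk_by_flag and passing the parameters with -v are hypothetical) makes two side effects visible: the boundary line itself stays in the current file, and the next chunk is measured from that line's value rather than from the previous end.

```shell
#!/bin/sh
# Hypothetical wrapper: chunk_by_flag INPUT START WIDTH MAXFILES
# Same logic as the p-flag script, with the range width as a parameter.
chunk_by_flag() {
    awk -v start="$2" -v width="$3" -v howmanytimes="$4" '
        BEGIN { p = 1 }                  # first line must open file1
        p {                              # first line of a chunk
            end = start + width          # chunk ends once $1 reaches end
            p = 0                        # stay in this file until then
            if (n) close("file" n)
            ++n
            if (!howmanytimes--) exit
        }
        {
            print > ("file" n)
            if ($1 >= end) { p = 1; start = $1 }   # next line starts a new file
        }' "$1"
}
```

With input 1 to 25, START 1 and WIDTH 10, file1 gets lines 1 to 11 (11 is the boundary line), file2 gets 12 to 21, and file3 gets 22 to 25.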


10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

Search a multi-line shell command output and execute logic based on result

The following is a multi-line shell command example: $cargo build Compiling prawn v0.1.0 (/Users/ag/rust/prawn) error: failed to resolve: could not find `setup_panix` in `human_panic` --> src/main.rs:14:22 | 14 | human_panic::setup_panix!(); | ... (2 Replies)
Discussion started by: yogi

2. Shell Programming and Scripting

awk command to search based on 5 user input fields

Field1=”” Field2=”” Field3=”” Field4=”” Field5=”” USER INPUT UP TO 5 FIELDS awk -F , '{ if ( $3 == Field1 && $6 == Field2 && $8 == Field3 && $9 == Field4 && $10 == Field5) print $0 }' /tmp/rodney.outD INPUT FILE (Rodney.outD): ... (3 Replies)
Discussion started by: rmerrird

3. Shell Programming and Scripting

Multiple line search, replace second line, using awk or sed

All, I appreciate any help you can offer here as this is well beyond my grasp of awk/sed... I have an input file similar to: &LOG &LOG Part: "@DB/TC10000021855/--F" &LOG &LOG &LOG Part: "@DB/TC10000021852/--F" &LOG Cloning_Action: RETAIN &LOG Part: "@DB/TCCP000010713/--A" &LOG &LOG... (5 Replies)
Discussion started by: KarmaPoliceT2

4. Shell Programming and Scripting

Search several string and convert into a single line for each search string using awk command AIX?.

I need to search the file using strings "Request Type" , " Request Method" , "Response Type" and by using result set find the xml tags and convert into a single line?. below are the scenarios. Cat test Nov 10, 2012 5:17:53 AM INFO: Request Type Line 1.... (5 Replies)
Discussion started by: laknar

5. Shell Programming and Scripting

Split a line based on : using sed

Hi, i have a file say file1 having following data /abc/def:ghi/jkl/ some other text Now i want to extract only ghi/jkl/using sed, can some one please help me. Thanks Sarbjit (2 Replies)
Discussion started by: sarbjit

6. Shell Programming and Scripting

Printing previous line based on pattern using sed

Hi, I have a written a shell script to get the previous line based on the pattern. For example if a file has below lines: ---------------------------------------------- #UNBLOCK_As _per #As per 205.162.42.92 #BLOCK_As_per #----------------------- #input checks abc.com... (5 Replies)
Discussion started by: Anjan1

7. Shell Programming and Scripting

Append specific lines to a previous line based on sequential search criteria

I'll try explain this as best I can. Let me know if it is not clear. I have large text files that contain data as such: 143593502 09-08-20 09:02:13 xxxxxxxxxxx xxxxxxxxxxx 09-08-20 09:02:11 N line 1 test line 2 test line 3 test 143593503 09-08-20 09:02:13... (3 Replies)
Discussion started by: jesse

8. Shell Programming and Scripting

using sed to conditionally extract stanzas of a file based on a search string

Dear All, I have a file with the syntax below (composed of several <log ..... </log> stanzas) I need to search this file for a number e.g. 2348022225919, and if it is found in a stanza, copy the whole stanza/section (<log .... </log>) to another output file. The numbers to search for are... (0 Replies)
Discussion started by: aitayemi

9. Shell Programming and Scripting

sed search and replace in next line

Hello, I am hoping someone can provide some guidance on using context based search and replace to search for a pattern and then do a search and replace in the line that follows it. For example, I have a file that looks like this: <bold>bold text </italic> somecontent morecontent... (3 Replies)
Discussion started by: charissaf67

10. Shell Programming and Scripting

search file, change existing value based on input (awk help)

I have a file (status.file) of the form: valueA 3450 valueB -20 valueC -340 valueD 48 I am tailing a data.file, and need to search and modify a value in status.file...the tail is: tail -f data.file | awk '{ print $3, ($NF - $(NF-1)) }' which will produce lines that look like this: ... (3 Replies)
Discussion started by: nortonloaf