Awk to extract lines with a defined number of characters


 
# 1  
Old 06-17-2010

Here is my problem. My file (file A) contains the following information:
Quote:
> ID 1
DFNSALKDNJRGNLANGKNGRIIGINREVN
> ID 2
KJDFKDSJGNIHG
> ID 3
BDSBGOBAOEURBOUEABG
> ID 4
DNFKSAD
> ID 5
KJDFKDSJGNIHG
> ID 6
BDSBGOBAOEURBOUEABG
Now I would like to create a file (file B) containing only the lines with 10 or more characters but fewer than 20, together with their corresponding IDs:
Quote:
> ID 2
KJDFKDSJGNIHG
> ID 3
BDSBGOBAOEURBOUEABG
> ID 5
KJDFKDSJGNIHG
> ID 6
BDSBGOBAOEURBOUEABG
Then I need to compare the entries in file B and count how often each one occurs. The result should be a third file (file C) with the following information:
Quote:
> ID 2; freq 2
KJDFKDSJGNIHG
> ID 3; freq 2
BDSBGOBAOEURBOUEABG
I think it could be done using AWK or grep.
Any help will be greatly appreciated.
# 2  
Old 06-17-2010
Try:
Code:
$ cat x
> ID 1
DFNSALKDNJRGNLANGKNGRIIGINREVN
> ID 2
KJDFKDSJGNIHG
> ID 3
BDSBGOBAOEURBOUEABG
> ID 4
DNFKSAD
> ID 5
KJDFKDSJGNIHG
> ID 6
BDSBGOBAOEURBOUEABG
$ awk '/^>/ {R=$0} !/>/ && length($0) >= 10 && length($0) < 20 {print R "\n" $0}' x
> ID 2
KJDFKDSJGNIHG
> ID 3
BDSBGOBAOEURBOUEABG
> ID 5
KJDFKDSJGNIHG
> ID 6
BDSBGOBAOEURBOUEABG
$

I didn't understand the requirement for file C.
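
For reuse with other length limits, the same filter can be wrapped in a small shell function that takes the bounds as arguments. This is only a sketch: the function name `filter_by_length` and the awk variables `min`, `max` and `header` are made up for illustration and do not come from the posts above.

```shell
# filter_by_length MIN MAX FILE   (hypothetical helper, not from the thread)
# Prints each ID line followed by its sequence when MIN <= length < MAX.
filter_by_length() {
    awk -v min="$1" -v max="$2" '
        /^>/ { header = $0; next }              # remember the most recent ID line
        length($0) >= min && length($0) < max {
            print header                        # ID line belonging to this sequence
            print $0                            # the sequence itself
        }
    ' "$3"
}
```

With the sample file from this post, `filter_by_length 10 20 x > file_b` should write the same four entries shown above to `file_b`.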
# 3  
Old 06-17-2010
Try this:


Code:
awk '/\> ID/{x=$0 ; next}{if ( length >= 10 && length < 20 ){a[$0]++;b[x]=$0}}END {for (i in a) for (j in b) if(i==b[j]){print j "\t freq " a[i]"\n" i;break;}}' file
> ID 3   freq 2
BDSBGOBAOEURBOUEABG
> ID 2   freq 2
KJDFKDSJGNIHG


Guru.
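
One caveat with the command above: awk does not define the order in which `for (i in a)` visits array keys, so the entries can come out in any order, and the ID printed for a repeated sequence may be any of the IDs that share it. The sketch below keeps the ID of the first occurrence and prints in input order, using the `; freq` format from the first post. The function name `count_sequences` and the array names are hypothetical. It also assumes that sequences occurring only once should still be listed; the thread does not say either way.

```shell
# count_sequences FILE   (hypothetical helper, not from the thread)
# For each distinct sequence of 10 to 19 characters, prints the ID line of
# its first occurrence with a frequency count, then the sequence itself.
count_sequences() {
    awk '
        /^>/ { header = $0; next }              # anchored match on the ID line
        length($0) >= 10 && length($0) < 20 {
            if (!($0 in count)) {               # first time this sequence is seen
                first_id[$0] = header           # keep the ID of the first occurrence
                order[++n] = $0                 # remember input order
            }
            count[$0]++
        }
        END {
            for (i = 1; i <= n; i++) {
                seq = order[i]
                print first_id[seq] "; freq " count[seq]
                print seq
            }
        }
    ' "$1"
}
```

To list only repeated sequences, the two `print` statements in the `END` block could be guarded with `if (count[seq] > 1)`.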
# 4  
Old 06-17-2010
Anchar,

Your answer works very well! File C should compare every sequence line against the others and record how often each one occurs.

Guru,

I am not getting the same result. This is what I am getting:

Quote:
$ awk '/\> ID/{x=$0 ; next}{if ( length >= 10 && length < 20 ){a[$0]++;b[x]=$0}}END {for (i in a) for (j in b) if(i==b[j]){print j "\t freq " a[i]"\n" i;break;}}' TestFas.txt
freq 2
BDSBGOBAOEURBOUEABG
# 5  
Old 06-17-2010
Hi
I think yours is a Sun machine. Use 'nawk' in place of 'awk'. It should work fine.

Guru.
# 6  
Old 06-17-2010
nawk

Guru,

It did not work.

Quote:
$ nawk '/\> ID/{x=$0 ; next}{if ( length >= 10 && length < 20 ){a[$0]++;b[x]=$0}}END {for (i in a) for (j in b) if(i==b[j]){print j "\t freq " a[i]"\n" i;break;}}' TestFas.txt
-bash: nawk: command not found
# 7  
Old 06-17-2010
Hi
Which Unix flavor are you using? If it is Linux, try 'gawk'.

Guru.
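
A possible explanation for the differing output, assuming the `awk` on that system is GNU awk: in GNU awk, `\>` inside a regular expression is an operator that matches the empty string at the end of a word, not a literal `>`. With that reading, `/\> ID/` never matches the ID lines, `x` stays empty, and the output shows a blank ID, which is consistent with the result quoted in post 4. Switching to `gawk` would not change this; anchoring the pattern as `/^> ID/` avoids the backslash altogether. The small check below reports how the local awk treats both patterns. The function name `header_regex_check` is made up for illustration.

```shell
# header_regex_check   (hypothetical helper, not from the thread)
# Feeds one ID line to awk and reports whether each pattern matches it.
header_regex_check() {
    printf '> ID 1\n' | awk '
        /\> ID/ { escaped = 1 }     # pattern used in the posts above
        /^> ID/ { anchored = 1 }    # anchored alternative without a backslash
        END {
            print "escaped:  " (escaped  ? "match" : "no match")
            print "anchored: " (anchored ? "match" : "no match")
        }'
}
```

If running `header_regex_check` reports "no match" for the escaped pattern, replacing `/\> ID/` with `/^> ID/` in the command from post 3 is worth trying.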