Matching 2 chars of a string that repeat


 
# 1  
Old 12-03-2009

Hello Unix gurus,

I have a gzipped file where each line contains two street addresses in the US. I want to count, for each state, the lines where the two states do not match.

What I have so far is:
$ gzcat matched_10_09.txt.gz |cut -c 106-107,184-185 | head -5
CTCT
CTNY
CTCT
CTFL
CTMA

This cuts the two State fields out of each line. I want to count how often each State appears in a pair where the two States differ. So I want to pipe the output to a sed or awk process that drops pairs like CTCT in the example above (and likewise NYNY, FLFL, etc.). It should compare the first 2 characters to the last 2 characters and, if they match, delete (or skip) the line. Then I will pipe the output to 'wc' to get my desired result. In the above case, that would be 3.
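
The filtering step described above could be sketched as follows, working on the four-character lines that cut produces. This is only one possible approach; `sample` and `count_mismatches` are made-up names for illustration. awk's substr(s, start, length) is used to compare the first two characters with the last two.

```shell
# Sample of the four-character output produced by the cut command above.
sample='CTCT
CTNY
CTCT
CTFL
CTMA'

# Hypothetical helper: keep only lines whose first two characters differ
# from the last two, then count what is left.
# tr strips the padding that some wc implementations put before the number.
count_mismatches() {
    awk 'substr($0, 1, 2) != substr($0, 3, 2)' | wc -l | tr -d ' '
}

printf '%s\n' "$sample" | count_mismatches
```

For the five sample lines this prints 3. On the real data the same filter would sit after the cut command: gzcat matched_10_09.txt.gz | cut -c 106-107,184-185 | count_mismatches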

I am always very grateful for assistance with my questions, but sometimes I feel like each answer I get is like being given a fish rather than learning how to fish, so to speak. So, if you could briefly mention how your solution works, I would be MOST grateful.

Thanks in advance.
# 2  
Old 12-03-2009
Try it with awk:
Code:
gzcat matched_10_09.txt.gz |awk '{if(substr($0,106,2)!=substr($0,184,2)){print substr($0,106,2)substr($0,184,2)}}'
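
How it works, roughly: substr($0, 106, 2) takes 2 characters of the whole line ($0) starting at column 106, and substr($0, 184, 2) does the same at column 184. When the two pieces differ, they are printed side by side. A spelled-out sketch of the same idea, with the start columns passed in as parameters (the function name and the p1/p2 variables are hypothetical, chosen only for this example):

```shell
# Hypothetical helper: print both state codes when they differ.
# $1 and $2 are the 1-based start columns of the two state fields.
print_mismatched_states() {
    awk -v p1="$1" -v p2="$2" '
    {
        s1 = substr($0, p1, 2)       # first state code
        s2 = substr($0, p2, 2)       # second state code
        if (s1 != s2) print s1 s2    # skip lines where both states match
    }'
}

# Tiny demo with the fields at columns 3 and 7 instead of 106 and 184.
printf 'xxCTyyCT\nxxCTyyNY\n' | print_mismatched_states 3 7
```

The demo prints CTNY. With the real file it would be: gzcat matched_10_09.txt.gz | print_mismatched_states 106 184 | wc -l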

# 3  
Old 12-03-2009
Quote:
Originally Posted by sitney
What I want to do is count the number of occurrences that each State has where it is not the same.
Code:
gzcat matched_10_09.txt.gz | awk '{x=substr($0,106,2);y=substr($0,184,2);if(x != y){a[x]++;a[y]++}}END{for(i in a)print i,a[i]}'
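
A short walk-through, as a sketch rather than the definitive reading: x and y hold the two state codes; when they differ, the associative array a gets one count added for each of the two states; the END block runs after the last line and prints every state with its total. The order of for (i in a) is unspecified in awk, so the version below pipes through sort. The function name `count_states` and the p1/p2 parameters are assumptions made for this example.

```shell
# Hypothetical helper: per-state count of lines whose two states differ.
# $1 and $2 are the 1-based start columns of the two state fields.
count_states() {
    awk -v p1="$1" -v p2="$2" '
    {
        x = substr($0, p1, 2)
        y = substr($0, p2, 2)
        if (x != y) { a[x]++; a[y]++ }   # count both states of a mismatched pair
    }
    END { for (i in a) print i, a[i] }   # one line per state: code, count
    ' | sort
}

# Demo on the four-character sample from the first post (fields at 1 and 3).
printf 'CTCT\nCTNY\nCTCT\nCTFL\nCTMA\n' | count_states 1 3
```

On the sample this gives CT 3, FL 1, MA 1 and NY 1, one per line. Note that a state is counted whether it appears in the first or the second address; if only the first address should count, dropping a[y]++ would do that.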

# 4  
Old 12-04-2009
Quote:
Originally Posted by danmero
Code:
gzcat matched_10_09.txt.gz | awk '{x=substr($0,106,2);y=substr($0,184,2);if(x != y){a[x]++;a[y]++}}END{for(i in a)print i,a[i]}'

That was very elegant! It actually gives a count for each state. I will try to parse the syntax so I can build such awk commands myself. Thanks.
# 5  
Old 12-04-2009
Code:
awk 'BEGIN {FS=""} !($106$107==$184$185)' urfile

# 6  
Old 12-04-2009
Code:
$ gzcat matched_10_09.txt.gz |sed '/^.\{105\}\(..\).\{76\}\1/d'

-or-
Code:
$ gzcat matched_10_09.txt.gz |sed -r '/^.{105}(..).{76}\1/d'
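
In case the pattern looks cryptic, here is how I read it: skip the 105 characters before column 106, capture the next two characters in a group, skip the 76 characters that sit in columns 108 to 183, then require the same two characters again with the backreference \1. Lines that match are deleted with d. Pinning the pattern to the start of the line with ^ matters, because otherwise sed is free to find the same shape starting at a later offset on a longer line. A small sketch with the two widths as parameters (`drop_matching` is a made-up name for this example):

```shell
# Hypothetical helper: delete lines whose two 2-character fields are equal.
# $1 = number of characters before the first field
# $2 = number of characters between the two fields
drop_matching() {
    sed "/^.\{$1\}\(..\).\{$2\}\1/d"
}

# Demo: fields at columns 3-4 and 7-8, so skip 2, then a gap of 2.
printf 'xxCTyyCT\nxxCTyyNY\n' | drop_matching 2 2
```

The demo prints only xxCTyyNY. For the real file the call would be: gzcat matched_10_09.txt.gz | drop_matching 105 76 | wc -l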


Last edited by Scrutinizer; 12-04-2009 at 09:25 AM..