Uppercase/lowercase comparison of one character per line with awk??


 
# 8  
Old 01-07-2010
Hey, ivpz:

a is an array which holds the results of the split operation. It isn't used for anything except that split() requires a place to put the fields it creates. I'm only interested in the return value which indicates how many elements are in the array, which tells me if there are more commas or more periods.
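To illustrate the return-value trick in isolation, here is a minimal sketch. The helper name `more_dots_or_commas` is made up for this example and is not part of the script in this thread. split() returns the number of pieces, which is the number of separators plus one, so comparing the two return values compares the counts of periods and commas.

```shell
# Hypothetical helper: print "+" if the argument has more periods than
# commas, otherwise "-".
more_dots_or_commas() {
    printf '%s\n' "$1" | awk '{
        dots   = split($0, a, /\./)   # pieces when splitting on a literal period
        commas = split($0, a, /,/)    # pieces when splitting on a comma
        print (dots > commas ? "+" : "-")
    }'
}

more_dots_or_commas '..,.'    # three periods, one comma
more_dots_or_commas ',gc,,'   # no periods, three commas
```

The array `a` is overwritten by the second call, which does no harm because its contents are never read.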

Regarding the errors in the script's output, it may well be related to those lines which should be ignored. You didn't mention any "rogues" in your original post. The code's logic assumes that the file is well-formed, i.e. every line will be used and that they number exactly as required for the number of comparisons to be made.
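One way to test that assumption before running the script is to check each run of identical lines against the number of comparisons the line seems to call for. The sketch below is only a guess at the rule, inferred from the sample data in this thread rather than from any stated spec: fields 2-5 pair with fields 6-9, a pair counts when either field is non-zero, and one extra comparison is due when the line contains a period or a comma. `check_runs` is a hypothetical name.

```shell
# Hypothetical checker: report runs of identical lines whose length does
# not match the expected number of comparisons (assumed rule, see above).
check_runs() {
    awk '
    # Number of comparisons a line calls for under the assumed rule.
    function expected(line,    f, i, e) {
        split(line, f, " ")
        e = 0
        for (i = 2; i <= 5; i++)
            if (f[i] + 0 || f[i+4] + 0) e++      # pair with a non-zero count
        if (line ~ /[.,]/) e++                   # final period/comma comparison
        return e
    }
    # Print a message for a finished run that has the wrong length.
    function report() {
        if (count && count != expected(prev))
            printf "line %d: run of %d, expected %d\n", start, count, expected(prev)
    }
    $0 != prev { report(); prev = $0; count = 0; start = NR }
    { count++ }
    END { report() }
    ' "$@"
}

# usage: check_runs data
```

No output means every run has the length this rule predicts. Any line it reports is a candidate "rogue" worth posting as a sample.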

Could you please provide an example or two of the "rogue" lines which should be ignored? Do they occur between sets of dupes or embedded within them? A minimal data sample with these special cases would help me help you. I assume that when you say they include letters outside AaCcGgTt, that doesn't include the '^F' I'm seeing in your sample data. Is that a control character (Ctrl-F) in the data or is it a literal caret ("^") followed by a literal upper case eff ("F")?
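If it helps, the two cases can be told apart mechanically. In caret notation, ^F stands for the single byte 0x06 (form feed would be 0x0C, usually shown as ^L). The sketch below, with the made-up helper name `classify_caret_f`, reports for each line whether it holds that byte, the two-character text "^F", or neither.

```shell
# Hypothetical helper: classify how "^F" is stored in each input line.
classify_caret_f() {
    awk '
    BEGIN { ctrl_f = sprintf("%c", 6) }   # the single byte 0x06 (Ctrl-F)
    index($0, ctrl_f) { print NR ": control character 0x06"; next }
    index($0, "^F")   { print NR ": literal caret followed by F"; next }
                      { print NR ": neither" }
    ' "$@"
}

# Three made-up lines: one with the control byte, one with the literal
# two-character text, one with neither.
printf '.,\006.\n.,^F.\n.,..\n' | classify_caret_f
```

index() is used instead of a regular expression so that the caret is treated as an ordinary character.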



alister

Last edited by alister; 01-07-2010 at 10:34 AM..
# 9  
Old 01-07-2010
Hi Alister,

I don't think any of those characters cause the error. Many of the lines have them too and the output results are OK. When I took out the lines with incorrect output and reran them with the script, the results were fine. Looking at the lines before them, I came to realise that some lines have no duplicates, i.e. they are unique and contain no . or ,

Correct me if I'm wrong, but will the last line of your script always look for duplicates, since it is an array statement? Any suggestions on how to modify this line?

Below is the example I'm talking about:

Code:
ccCCcc$c$cCC$CC$ccccCc$CCCccccCcccccCCCcCCcCccCccCCCCCCCcCcCCcCCCcccCCCCCC 0 37 0 0 0 32 0 0
ggGGgGGgggGGGGggggGgggggGGGgGGgGggGggGGGGGGGgGgGGgGGGgggGGGGGGg 0 0 35 0 0 0 28 0
,,.$.,..,....,.G,g,T.gG..Gg.,,,........,g,.....,g...,^F. 0 0 3 1 0 0 5 0
,,.$.,..,....,.G,g,T.gG..Gg.,,,........,g,.....,g...,^F. 0 0 3 1 0 0 5 0
,,.$.,..,....,.G,g,T.gG..Gg.,,,........,g,.....,g...,^F. 0 0 3 1 0 0 5 0
...,.,,,..,c,...$..,,...,.cgA,,.GG,...........,,,,.G,,.Ng,,.G.,. 1 0 4 0 0 2 2 0
...,.,,,..,c,...$..,,...,.cgA,,.GG,...........,,,,.G,,.Ng,,.G.,. 1 0 4 0 0 2 2 0
...,.,,,..,c,...$..,,...,.cgA,,.GG,...........,,,,.G,,.Ng,,.G.,. 1 0 4 0 0 2 2 0
...,.,,,..,c,...$..,,...,.cgA,,.GG,...........,,,,.G,,.Ng,,.G.,. 1 0 4 0 0 2 2 0

If I replaced line 2 of the above example with an empty line, I still got a + but the following lines were then correct:

Code:
ccCCcc$c$cCC$CC$ccccCc$CCCccccCcccccCCCcCCcCccCccCCCCCCCcCcCCcCCCcccCCCCCC 0 37 0 0 0 32 0 0 +
ggGGgGGgggGGGGggggGgggggGGGgGGgGggGggGGGGGGGgGgGGgGGGgggGGGGGGg 0 0 35 0 0 0 28 0 +
 +
,,.$.,..,....,.G,g,T.gG..Gg.,,,........,g,.....,g...,^F. 0 0 3 1 0 0 5 0 -
,,.$.,..,....,.G,g,T.gG..Gg.,,,........,g,.....,g...,^F. 0 0 3 1 0 0 5 0 +
,,.$.,..,....,.G,g,T.gG..Gg.,,,........,g,.....,g...,^F. 0 0 3 1 0 0 5 0 +
...,.,,,..,c,...$..,,...,.cgA,,.GG,...........,,,,.G,,.Ng,,.G.,. 1 0 4 0 0 2 2 0 +
...,.,,,..,c,...$..,,...,.cgA,,.GG,...........,,,,.G,,.Ng,,.G.,. 1 0 4 0 0 2 2 0 -
...,.,,,..,c,...$..,,...,.cgA,,.GG,...........,,,,.G,,.Ng,,.G.,. 1 0 4 0 0 2 2 0 +
...,.,,,..,c,...$..,,...,.cgA,,.GG,...........,,,,.G,,.Ng,,.G.,. 1 0 4 0 0 2 2 0 +

# 10  
Old 01-07-2010
Code:
$ cat dna.awk 
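# New line content: restart at the first count pair (fields 2 and 6).
old0!=$0 { old0=$0; i=2 }

# Pairs remain: compare field i with field i+4.
i<=5 {
    # Skip pairs where both counts are zero.
    while (!($i || $(i+4)) && i<=5)
        i++
    if (i<=5) {
        print $0, ($i>$(i+4) ? "+" : "-")
        i++
        next
    }
}

# All pairs used: compare the number of periods with the number of commas.
i==6 && /\.|,/ {
    print $0, (split($0, a, /\./) > split($0, a, /,/) ? "+" : "-")
}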

$ cat data
.......GGGG,.G,,G...G.,.T...G.,..,.,,^F, 0 0 8 1 0 0 0 0
.......GGGG,.G,,G...G.,.T...G.,..,.,,^F, 0 0 8 1 0 0 0 0
.......GGGG,.G,,G...G.,.T...G.,..,.,,^F, 0 0 8 1 0 0 0 0
,gc,,cga,g,c,,,,,,, 0 0 0 0 1 3 3 0
,gc,,cga,g,c,,,,,,, 0 0 0 0 1 3 3 0
,gc,,cga,g,c,,,,,,, 0 0 0 0 1 3 3 0
,gc,,cga,g,c,,,,,,, 0 0 0 0 1 3 3 0
ccCCcc$c$cCC$CC$ccccCc$CCCccccCcccccCCCcCCcCccCccCCCCCCCcCcCCcCCCcccCCCCCC 0 37 0 0 0 32 0 0
ggGGgGGgggGGGGggggGgggggGGGgGGgGggGggGGGGGGGgGgGGgGGGgggGGGGGGg 0 0 35 0 0 0 28 0
.....,,..,,...,,......,...cA.c,cC. 1 1 0 0 0 3 0 0
.....,,..,,...,,......,...cA.c,cC. 1 1 0 0 0 3 0 0
.....,,..,,...,,......,...cA.c,cC. 1 1 0 0 0 3 0 0

$ awk -f dna.awk data
.......GGGG,.G,,G...G.,.T...G.,..,.,,^F, 0 0 8 1 0 0 0 0 +
.......GGGG,.G,,G...G.,.T...G.,..,.,,^F, 0 0 8 1 0 0 0 0 +
.......GGGG,.G,,G...G.,.T...G.,..,.,,^F, 0 0 8 1 0 0 0 0 +
,gc,,cga,g,c,,,,,,, 0 0 0 0 1 3 3 0 -
,gc,,cga,g,c,,,,,,, 0 0 0 0 1 3 3 0 -
,gc,,cga,g,c,,,,,,, 0 0 0 0 1 3 3 0 -
,gc,,cga,g,c,,,,,,, 0 0 0 0 1 3 3 0 -
ccCCcc$c$cCC$CC$ccccCc$CCCccccCcccccCCCcCCcCccCccCCCCCCCcCcCCcCCCcccCCCCCC 0 37 0 0 0 32 0 0 +
ggGGgGGgggGGGGggggGgggggGGGgGGgGggGggGGGGGGGgGgGGgGGGgggGGGGGGg 0 0 35 0 0 0 28 0 +
.....,,..,,...,,......,...cA.c,cC. 1 1 0 0 0 3 0 0 +
.....,,..,,...,,......,...cA.c,cC. 1 1 0 0 0 3 0 0 -
.....,,..,,...,,......,...cA.c,cC. 1 1 0 0 0 3 0 0 +

# 11  
Old 01-08-2010
Hi Alister,

Thanks for the modified script. The errors are not corrected but some of the lines got deleted. When I ran it on my data of about 4.5 million lines, nearly 3,000 got deleted. However, I managed to correct the bug by removing the backslash in the last 2 lines and everything looks fine now:

Code:
i==6 && /\.|,/ {
    print $0, (split($0, a, /\./) > split($0, a, /,/) ? "+" : "-")

to:

Code:
i==6 && /.|,/ {
    print $0, (split($0, a, /./) > split($0, a, /,/) ? "+" : "-")

Once again, thank you for your help. Took me nearly a week to get this done.
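
For anyone adapting this change, it may be worth checking what the unescaped dot does before relying on it. In an awk regular expression, `.` matches any single character, so `/.|,/` matches every non-empty line, and `split($0, a, /./)` returns the line length plus one rather than the number of periods plus one. A small sketch comparing the two forms (the function name `dot_counts` is hypothetical):

```shell
# Hypothetical helper: show split() counts with an escaped and an
# unescaped dot, next to the line length plus one for comparison.
dot_counts() {
    printf '%s\n' "$1" | awk '{
        escaped   = split($0, a, /\./)   # separators are literal periods only
        unescaped = split($0, a, /./)    # every character is a separator
        print escaped, unescaped, length($0) + 1
    }'
}

dot_counts 'cC 0 1'   # a line with no periods at all
dot_counts 'a.b'      # a line with one period
```

In both calls the second and third numbers agree, which suggests that with the unescaped dot the final comparison depends on the line length and no longer on how many periods the line contains.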