08-20-2010
OK, here is something to horrify the Unix programmers. I am trying to analyze the datafile "fake", and the script does a couple of things:
take the lines between "matrix" and ";end;"
remove the first line
remove the last line
use your awk expression to duplicate all remaining lines
count all of the lines with wc and save the count
use grep and wc to count the occurrences of each of a number of expressions
put the line counts in front of the data
remove all of the junk files
Here is another question. If I want to run
./myscript INPUTFILE
how do I code that into the script? For now I just put the name of the file into the script. This is probably an easy one... I am just new to this! I don't even know the names of what to search for.
Here is the code. I would love any feedback.
Below that is a datafile.
code:
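# keep the block from the "matrix" line through the ";end;" line
awk '/matrix/,/;end;/' INPUTFILE > ZZoutput
# drop the last line (";end;"), then the first line ("matrix")
sed '$d' ZZoutput > ZZoutfile
sed '1d' ZZoutfile > ZZoutfile1
# print lines containing _X or _Y once, every other line twice
awk '
/_X/ || /_Y/ { print; next; }
{ print; print; }
' ZZoutfile1 > ZZ_number_of_taxa
# count the lines that match each expression
grep 'Gg' ZZ_number_of_taxa > ZAGg
wc -l ZAGg > ZQGg
grep 'Hs' ZZ_number_of_taxa > ZAHs
wc -l ZAHs > ZQHs
grep 'Panp' ZZ_number_of_taxa > ZAPanp
wc -l ZAPanp > ZQPanp
grep 'Ptro' ZZ_number_of_taxa > ZAPtro
wc -l ZAPtro > ZQPtro
grep 'Pts' ZZ_number_of_taxa > ZAPts
wc -l ZAPts > ZQPts
grep 'Ptv' ZZ_number_of_taxa > ZAPtv
wc -l ZAPtv > ZQPtv
# total line count first, then the per-expression counts, then the data
wc -l ZZ_number_of_taxa > ZZlinecount
cat ZZlinecount ZQ* ZZ_number_of_taxa > dataset
# remove the temporary files
rm ZZ*
rm ZA*
rm ZQ*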
datafile:
junk stuff
matrix
Gg447874 CTTGAACATT
Gg447875 CTTGAACATT
Hs287867 CTTGAACATT
Hs287868 CTTGAACATT
Hs287869 CTTGAACATT
Hs287870 CTTGAACATT
Hs287871 CTTGAACATT
Hs287872 CTTGAACATT
;end;
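The same steps can probably be done in one awk pass with no temporary files. What follows is only a sketch, not the script from this post: the function name summarize and the "total" label are made up for illustration, and each count is printed with the expression it counts instead of a temporary file name.

```shell
#!/bin/sh
# Sketch only: extract, duplicate and count in a single awk pass.
# Assumption: output is "count label" lines followed by the data lines.
summarize() {
    awk '
        BEGIN { ntags = split("Gg Hs Panp Ptro Pts Ptv", tag, " ") }
        /;end;/ { inside = 0 }               # stop before the closing line
        inside {
            copies = (/_X/ || /_Y/) ? 1 : 2  # _X/_Y lines once, others twice
            for (i = 1; i <= copies; i++) {
                lines[++total] = $0          # remember the line for output
                for (k = 1; k <= ntags; k++)
                    if (index($0, tag[k])) count[k]++
            }
        }
        /matrix/ { inside = 1 }              # start after the opening line
        END {
            print total + 0, "total"
            for (k = 1; k <= ntags; k++) print count[k] + 0, tag[k]
            for (i = 1; i <= total; i++) print lines[i]
        }
    ' "$1"
}

# Run only when a file name was given on the command line.
if [ "$#" -ge 1 ]; then
    summarize "$1" > dataset
fi
```

For the datafile above, dataset would start with "16 total", "4 Gg" and "12 Hs", then zero counts for the other four expressions, then the 16 data lines.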
---------- Post updated at 08:54 PM ---------- Previous update was at 08:53 PM ----------
Whoops, I meant to analyze the datafile
"INPUTFILE" at the beginning.
You probably knew what I meant.
Thanks again for any advice.
best
mikey
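On the ./myscript INPUTFILE question: inside a shell script the first command-line argument is available as "$1" (the term to search for is "positional parameters"). A minimal sketch follows, assuming a POSIX sh; the function name extract_block and the messages are invented for illustration.

```shell
#!/bin/sh
# Sketch only. Running "./myscript fake" sets $1 to "fake".

extract_block() {
    infile=$1
    if [ ! -r "$infile" ]; then
        echo "cannot read input file: $infile" >&2
        return 1
    fi
    # same as the first three commands of the script, joined by a pipe
    awk '/matrix/,/;end;/' "$infile" | sed -e '1d' -e '$d'
}

# Run only when a file name was given on the command line.
if [ "$#" -ge 1 ]; then
    extract_block "$1"
else
    echo "usage: $0 INPUTFILE" >&2
fi
```

Quoting "$1" keeps file names with spaces in one piece. After chmod +x myscript, ./myscript INPUTFILE would print the lines between "matrix" and ";end;".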