how to add duplicate lines

08-20-2010


Hi,
I have a file that looks like this:

a_X data
a_Y data
b data
c data
d_X data
d_Y data

I want to duplicate the lines without the _X or _Y suffix. In other words, I want it to look like this:

a_X data
a_Y data
b data
b data
c data
c data
d_X data
d_Y data

I have no idea how to go about this. Another detail (or complication) is that there is a header and a footer that I would prefer not to be duplicated. Is there any way to restrict this operation so that it only applies between the line that says "matrix" and the line that says ";end;"? This last part is really not necessary, but would be helpful.

I have the feeling that something like sed would be good, but I can't figure it out!

Thanks!

Mikey

mikey11415

08-20-2010

Not sed at all. Use awk. If a line has _ in it, print it once. If not, print it twice.

jpradley

08-20-2010

Something like this:

Code:

awk '
        /_X/ || /_Y/ { print; next; }
        { print; print; }
' input-file-name

This is simple enough that you're on your own to figure out why it works.
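For anyone who wants to see the program above at work, here is a minimal sketch that feeds it the sample lines from the first post. The function names dup_plain and dup_plain_sed are made up for illustration; the sed variant is only included because the original question mentioned sed, and it assumes the same "_X or _Y anywhere on the line" test.

```shell
# Hypothetical helper: duplicate every line that lacks _X or _Y.
dup_plain() {
    awk '
        /_X/ || /_Y/ { print; next }   # tagged line: print once, skip the rule below
                     { print; print }  # untagged line: print twice
    ' "$@"
}

# The same idea in sed: lines print once by default, and !p adds a second
# copy of every line that does not match.
dup_plain_sed() {
    sed '/_[XY]/!p' "$@"
}

# Sample lines from the first post, fed on standard input.
printf '%s\n' 'a_X data' 'a_Y data' 'b data' 'c data' 'd_X data' 'd_Y data' | dup_plain
```

Both helpers read standard input when no file name is given, so `dup_plain input-file-name` works as well.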

agama

08-20-2010

Wow, agama, thanks!

OK, this is my first real use of awk, more or less. Here goes:

/_X/ || /_Y/
matches any line that contains _X or _Y;
print that line, then move on to the next line.
Then there is no search term, so every remaining line is printed, then printed again (effectively printing lines without _X or _Y twice--genius!!!)
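One detail of the analysis above: /_X/ and /_Y/ match anywhere on the line, including inside the data column. If only the label should decide, a possible variant tests the first field instead. This is a sketch, assuming the label is the first whitespace-separated field and the suffix sits at its end; dup_by_label is a made-up name.

```shell
# Hypothetical variant: test only the first field (the label), not the whole line.
dup_by_label() {
    awk '
        $1 ~ /_[XY]$/ { print; next }   # label ends in _X or _Y: print once
                      { print; print }  # any other label: print twice
    ' "$@"
}

# The label "b" has no suffix, but its data happens to contain _X.
printf '%s\n' 'a_X data' 'b has_X_inside' | dup_by_label
```

With the whole-line test the second line would be printed once; with the field test it is duplicated.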

THANK YOU!!!

Is there any way to start the awk searching after the first appearance of a string, like the word "matrix"?

Best

Mikey

mikey11415

08-20-2010

Quote:

THANK YOU!!!
Is there any way to start the awk searching after the first appearance of a string, like the word "matrix"?

You are most welcome. Your analysis of the programme was spot on.

I'm sure some will suggest something less easy to read -- I prefer to err on the side of easy to maintain:

Code:

awk '
    /matrix/     { snarf = 1; next }    # assumes you do not want the matrix line itself
    snarf < 1    { next; }              # skip everything before the first matrix line
    /_X/ || /_Y/ { print; next; }
                 { print; print; }
' input-file-name
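The first post also asked for the header and footer to be left alone and for the operation to stop at ";end;". One way to sketch that, assuming the markers are matched with the same unanchored patterns used in this thread and that the header, footer and marker lines should each appear once in the output, is shown here. The name dup_in_matrix and the variable inside are made up for illustration.

```shell
# Hypothetical helper: every line is kept, but only untagged lines between
# the "matrix" and ";end;" markers get a second copy.
dup_in_matrix() {
    awk '
        /;end;/            { inside = 0 }   # end marker closes the block
        inside && !/_[XY]/ { print }        # extra copy for untagged lines in the block
                           { print }        # every line is printed once
        /matrix/           { inside = 1 }   # start marker opens the block for later lines
    ' "$@"
}

printf '%s\n' 'header' 'matrix' 'a_X data' 'b data' ';end;' 'footer' | dup_in_matrix
```

The start marker is checked last so that the "matrix" line itself is not duplicated, and the end marker is checked first for the same reason.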

agama

08-20-2010

OK, here is something to horrify the Unix programmers. Here I am trying to analyze the datafile "fake", and I am trying to do a couple of things:

take the lines between matrix and end
remove the first line
remove the last line

Then I used your awk expression to duplicate the remaining lines (those without _X or _Y),

then use wc to count up all of the lines and stick that somewhere,

then grep and wc to count the occurrences of each of a number of expressions,

then stick the line counts in front of the data,

then remove all of the junk files...

Here is another question.

If I want to write

./myscript INPUTFILE

how do I code that into the script? Here I just put the name of the file into the script. This is probably an easy one... I am just new to this! I don't even know the names of what to search for.

Here is the code; I would love any feedback.
And below that is a datafile.

Code:

awk '/matrix/,/;end;/' INPUTFILE > ZZoutput
sed '$d' ZZoutput > ZZoutfile
sed '1d' ZZoutfile > ZZoutfile1
awk '
/_X/ || /_Y/ { print; next; }
{ print; print; }
' ZZoutfile1 > ZZ_number_of_taxa
grep 'Gg' ZZ_number_of_taxa > ZAGg
wc -l ZAGg > ZQGg
grep 'Hs' ZZ_number_of_taxa > ZAHs
wc -l ZAHs > ZQHs
grep 'Panp' ZZ_number_of_taxa > ZAPanp
wc -l ZAPanp > ZQPanp
grep 'Ptro' ZZ_number_of_taxa > ZAPtro
wc -l ZAPtro > ZQPtro
grep 'Pts' ZZ_number_of_taxa > ZAPts
wc -l ZAPts > ZQPts
grep 'Ptv' ZZ_number_of_taxa > ZAPtv
wc -l ZAPtv > ZQPtv
wc -l ZZ_number_of_taxa > ZZlinecount
cat ZZlinecount ZQ* ZZ_number_of_taxa > dataset
rm ZZ*
rm ZA*
rm ZQ*

datafile:

junk stuff
matrix
Gg447874 CTTGAACATT
Gg447875 CTTGAACATT
Hs287867 CTTGAACATT
Hs287868 CTTGAACATT
Hs287869 CTTGAACATT
Hs287870 CTTGAACATT
Hs287871 CTTGAACATT
Hs287872 CTTGAACATT
;end;
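The steps described in this post (extract the block, duplicate, count, put the counts in front) can also be sketched as a single awk pass with no intermediate files. This is only a sketch under assumptions: the function name summarise and the tags variable are made up for illustration, and the count lines are printed as "N tag" rather than in the "N filename" form that wc -l writes.

```shell
# Hypothetical one-pass version of the script above: no temporary files.
summarise() {
    awk -v tags='Gg Hs Panp Ptro Pts Ptv' '
        BEGIN    { ntags = split(tags, tag, " ") }
        /;end;/  { inside = 0 }                     # stop before the end marker
        inside   {
            copies = ($0 ~ /_[XY]/) ? 1 : 2         # same duplication rule as before
            for (c = 1; c <= copies; c++) {
                line[++total] = $0                  # remember the line for later output
                for (t = 1; t <= ntags; t++)
                    if (index($0, tag[t])) count[t]++   # substring test, like the grep calls
            }
        }
        /matrix/ { inside = 1 }                     # start after the matrix line
        END {
            print total + 0                         # overall line count first
            for (t = 1; t <= ntags; t++) print count[t] + 0, tag[t]
            for (i = 1; i <= total; i++) print line[i]
        }
    ' "$@"
}

# Possible usage, with INPUTFILE standing for the data file:
#   summarise INPUTFILE > dataset
```

Storing the lines in an array is what lets the counts come out before the data; for very large files the original multi-step approach may be the safer choice.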

---------- Post updated at 08:54 PM ---------- Previous update was at 08:53 PM ----------

Whoops, I meant to analyze the datafile "INPUTFILE" at the beginning. You probably knew what I meant.

Thanks again for any advice.

Best

Mikey

mikey11415

08-20-2010

Quote:

Originally Posted by mikey11415

if i want to write

./myscript INPUTFILE

how do I code that into the script.

Parameters passed from the command line into a script can be referenced in the script using $1, $2, $3, and so on. In your case you just need to change INPUTFILE to "$1" in your script (the double quotes keep a file name that contains spaces in one piece).

I prefer to assign input parameters to meaningful variable names so that it's obvious when you use them what they reference. For instance:

Code:

inputfile="$1"

And then you can use "$inputfile" where you have INPUTFILE in the script.

Do note that there cannot be spaces around the equals sign; with spaces, the shell treats inputfile as a command name and reports an error.
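Putting the advice above together, a script that accepts ./myscript INPUTFILE might start like the sketch below. It is written into a scratch directory here only so that the example is self-contained; the script body, the usage message and the sample input are assumptions for illustration, and only the first step of the original script (extracting the block and dropping the two marker lines) is shown.

```shell
# Write a hypothetical myscript into a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/myscript" <<'EOF'
#!/bin/sh
# usage: ./myscript INPUTFILE
if [ "$#" -ne 1 ]; then
    echo "usage: $0 INPUTFILE" >&2
    exit 1
fi
inputfile="$1"                      # no spaces around the equals sign
# Keep the lines from "matrix" to ";end;", then drop the two marker lines.
awk '/matrix/,/;end;/' "$inputfile" | sed '1d;$d'
EOF
chmod +x "$workdir/myscript"

# Sample input modelled on the datafile in the thread.
printf '%s\n' 'junk stuff' 'matrix' 'Gg447874 CTTGAACATT' 'Hs287867 CTTGAACATT' ';end;' \
    > "$workdir/INPUTFILE"

# Run through sh so the example does not depend on the scratch directory
# allowing execution; in a normal directory ./myscript INPUTFILE does the same.
sh "$workdir/myscript" "$workdir/INPUTFILE"
```

Checking "$#" up front gives a usage message instead of a confusing awk error when the file name is forgotten.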

agama
