how to add duplicate lines


 
# 1  
Old 08-20-2010

Hi,
I have a file that looks like this:

a_X data
a_Y data
b data
c data
d_X data
d_Y data

I want to duplicate the lines that do not contain _X or _Y. In other words, I want it to look like this:

a_X data
a_Y data
b data
b data
c data
c data
d_X data
d_Y data

I have no idea how to go about this. Another detail (or complication) is that there is a header and a footer that I would prefer not to be duplicated. Is there any way to restrict this operation to the lines between a line that says "matrix" and a line that says ";end;"? This last part is really not necessary, but would be helpful.

I have the feeling that something like sed would be good, but I can't figure it out!

Thanks!

Mikey
# 2  
Old 08-20-2010
Not sed at all. Use awk. If a line has _ in it, print it once. If not, print it twice.
# 3  
Old 08-20-2010
Something like this:

Code:
awk '
        /_X/ || /_Y/ { print; next; }
        { print; print; }
' input-file-name

This is simple enough that you're on your own to figure out why it works Smilie
# 4  
Old 08-20-2010
Wow, agama, thanks!

ok, this is my first real use of awk, more or less. here goes:

/_X/ || /_Y/
means search for exactly _X or _Y
print that line, then move onto the next line
then, there is no search term, but you print, then print (effectively printing lines without _X or _Y twice--genius!!!)
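
To check that reading against the sample from the first post, here is a small self-contained run. The temporary file and the variable names sample and result are only for illustration. One caveat: /_X/ matches _X anywhere in the line, not only in the first field, so a data column containing _X would also count as a match.

```shell
# Sample data from the first post, written to a temporary file.
sample=$(mktemp)
printf '%s\n' 'a_X data' 'a_Y data' 'b data' 'c data' 'd_X data' 'd_Y data' > "$sample"

# Lines matching _X or _Y are printed once; every other line is printed twice.
result=$(awk '
    /_X/ || /_Y/ { print; next }   # pattern matched: print once, skip the rule below
                 { print; print }  # no pattern: runs for every line that gets here
' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

The b and c lines should come out twice and the four _X/_Y lines once, which is the layout asked for in the first post.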

THANK YOU!!!

Is there any way to start the awk searching after the first appearance of a string, like the word "matrix"?

Best

Mikey
# 5  
Old 08-20-2010
Quote:
THANK YOU!!!
Is there any way to start the awk searching after the first appearance of a string, like the word "matrix"?
You are most welcome. Your analysis of the programme was spot on.

I'm sure some will suggest something less easy to read -- I prefer to err on the side of easy to maintain:

Code:
awk '
    /matrix/     { snarf = 1; next }    # assumes you do not want the matrix line itself
    snarf < 1    { next }               # skip everything before the first matrix line
    /_X/ || /_Y/ { print; next }
                 { print; print }
' input-file-name
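
The first post also asked for the header and footer to be left alone, with duplication limited to the lines between "matrix" and ";end;". A possible variant along those lines is sketched below. It is an assumption about what was wanted, not agama's code: it keeps every line of the file, and only the lines strictly between the two markers are eligible for the extra copy. The function name dup_between and the flag name inside are made up for the example.

```shell
# dup_between is a hypothetical helper name, not something from the thread.
# Usage: dup_between FILE
dup_between() {
    awk '
        /;end;/                   { inside = 0 }  # footer starts here: stop duplicating
        inside && !(/_X/ || /_Y/) { print }       # extra copy, only inside the block
                                  { print }       # every line is printed at least once
        /matrix/                  { inside = 1 }  # start duplicating after this line
    ' "$1"
}
```

Because the /matrix/ rule comes last and the /;end;/ rule comes first, the two marker lines themselves are printed once and never duplicated.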

# 6  
Old 08-20-2010
OK, here is something to horrify the unix programmers. here i am trying to analyze the datafile "fake", i am trying to do a couple of things

take the lines between matrix and end.
remove first line
remove last line

then i used your awk expression to duplicate all remaining lines

then count up all of the lines with wc and stick that somewhere

then grep and wc to count the occurrences of each of a number of expressions

then stick the line counts in front of the datafiles.

then remove all of the junk files...

here is another question

if i want to write

./myscript INPUTFILE

how do I code that into the script. here i just put the name of the file into the script. this is probably an easy one...i am just new to this! i don't even know the names of what to search for.

Here is the code: I would love any feedback.
And below that is a datafile

Code:

awk '/matrix/,/;end;/' INPUTFILE > ZZoutput
sed '$d' ZZoutput > ZZoutfile
sed '1d' ZZoutfile > ZZoutfile1
awk '
/_X/ || /_Y/ { print; next; }
{ print; print; }
' ZZoutfile1 > ZZ_number_of_taxa
grep 'Gg' ZZ_number_of_taxa > ZAGg
wc -l ZAGg > ZQGg
grep 'Hs' ZZ_number_of_taxa > ZAHs
wc -l ZAHs > ZQHs
grep 'Panp' ZZ_number_of_taxa > ZAPanp
wc -l ZAPanp > ZQPanp
grep 'Ptro' ZZ_number_of_taxa > ZAPtro
wc -l ZAPtro > ZQPtro
grep 'Pts' ZZ_number_of_taxa > ZAPts
wc -l ZAPts > ZQPts
grep 'Ptv' ZZ_number_of_taxa > ZAPtv
wc -l ZAPtv > ZQPtv
wc -l ZZ_number_of_taxa > ZZlinecount
cat ZZlinecount ZQ* ZZ_number_of_taxa > dataset
rm ZZ*
rm ZA*
rm ZQ*





datafile:

junk stuff
matrix
Gg447874 CTTGAACATT
Gg447875 CTTGAACATT
Hs287867 CTTGAACATT
Hs287868 CTTGAACATT
Hs287869 CTTGAACATT
Hs287870 CTTGAACATT
Hs287871 CTTGAACATT
Hs287872 CTTGAACATT
;end;

---------- Post updated at 08:54 PM ---------- Previous update was at 08:53 PM ----------

whoops i meant to analyze the datafile
"INPUTFILE" at the beginning.
you probably knew what i meant.

thanks again for any advice

best

mikey
# 7  
Old 08-20-2010
Quote:
Originally Posted by mikey11415
if i want to write

./myscript INPUTFILE

how do I code that into the script.
Parameters passed from the command line into a script can be referenced in the script as $1, $2, $3, and so on. In your case you just need to change INPUTFILE to "$1" in your script (the quotes keep file names with spaces in one piece).

I prefer to assign input parameters to meaningful variable names so that it's obvious when you use them what they reference. For instance:

Code:
inputfile="$1"

And then you can use $inputfile where you have INPUTFILE in the script.

Do note that there cannot be spaces around the equals sign, or the shell will try to run inputfile as a command and you'll get an error.
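
Putting the pieces together, here is one way the script from post #6 might look with the file name taken from the command line and without the temporary ZZ*, ZA* and ZQ* files. This is only a sketch under a few assumptions: the function name build_dataset is made up, the prefixes are the ones grepped for in post #6, and the counts are printed as bare numbers (grep -c) rather than in the "count filename" form that wc -l FILE produces.

```shell
# build_dataset is a hypothetical name; call it with the input file as its argument.
build_dataset() {
    inputfile="$1"   # no spaces around the equals sign

    # Keep only the lines strictly between "matrix" and ";end;",
    # printing each one twice unless it contains _X or _Y.
    body=$(awk '
        /;end;/  { inside = 0 }
        inside   { print; if (!(/_X/ || /_Y/)) print }
        /matrix/ { inside = 1 }
    ' "$inputfile")

    # Total line count first, then one count per prefix, then the data itself.
    printf '%s\n' "$body" | wc -l | tr -d ' '
    for prefix in Gg Hs Panp Ptro Pts Ptv; do
        # grep exits non-zero when the count is 0, hence the "|| true"
        printf '%s\n' "$body" | grep -c "$prefix" || true
    done
    printf '%s\n' "$body"
}
```

To use it as ./myscript INPUTFILE, the script file would end with a line such as build_dataset "$1" > dataset, and would need to be made executable with chmod +x myscript.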
 