Top Forums > Shell Programming and Scripting > Request to check: remove duplicates only in first column
Post 302676871 by shitson on Wednesday 25th of July 2012 08:11:55 AM
Code:
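#!/usr/bin/python
# Print each line with the leading fields it shares with the previous line removed.

import sys

if len(sys.argv) < 2:
    print("usage: %s <file_path>" % sys.argv[0])
    sys.exit(69)

# Read the whole file; the with-block closes it afterwards.
with open(sys.argv[1], 'r') as f:
    lines = f.readlines()

for count, line in enumerate(lines):
    if count == 0:
        # The first line has no predecessor, so print it unchanged.
        print(line.rstrip())
        continue

    left = line.split()
    right = lines[count - 1].split()

    # Skip the leading fields that repeat the previous line, stopping
    # at the end of the shorter line to avoid an IndexError.
    index = 0
    while index < min(len(left), len(right)) and left[index] == right[index]:
        index += 1

    print(' '.join(left[index:]))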

What about this?

To use this code do the following:
  1. Copy and paste this into a file; I would recommend calling it duplicate.py
  2. Make the file executable:
    Code:
    chmod +x duplicate.py

  3. Then run the script, passing it the path to your file:
    Code:
    ./duplicate.py path_to_file
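
If you want to see what the script does before running it on a real file, here is a rough sketch of the same comparison wrapped in a function. The name strip_shared_prefix and the sample rows are made up for illustration and are not from the original file. Unlike the script, this sketch also normalises whitespace on the first line and stops at the end of the shorter line, so identical lines give an empty string instead of an error.

```python
def strip_shared_prefix(lines):
    """Drop from each line the leading fields it shares with the line before it."""
    result = []
    previous = []  # fields of the previous line; empty before the first line
    for line in lines:
        fields = line.split()
        index = 0
        # Walk forward while both lines still have a field here and the fields match.
        while index < min(len(fields), len(previous)) and fields[index] == previous[index]:
            index += 1
        result.append(' '.join(fields[index:]))
        previous = fields
    return result


# Hypothetical sample data, only to show the effect.
sample = [
    "gene1 drugA 10",
    "gene1 drugB 20",
    "gene2 drugB 30",
]

for row in strip_shared_prefix(sample):
    print(row)
```

With that sample, the second row loses its repeated first column and prints as "drugB 20", while the first and third rows come out unchanged.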


Edit: added very basic error checking. This was a rush job and I'm not very good with Python!
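
If the error checking needs to go a bit further, something like the sketch below might do. It keeps the usage check and exit status 69 from the script, and additionally reports a file that cannot be opened. The helper names read_lines and main, and the exit status 1 for an unreadable file, are my own assumptions rather than anything from the original script.

```python
import sys


def read_lines(path):
    """Return the lines of the file at path, or None if it cannot be read."""
    try:
        with open(path, 'r') as handle:
            return handle.readlines()
    except (IOError, OSError) as err:
        sys.stderr.write("cannot read %s: %s\n" % (path, err))
        return None


def main(argv):
    """Check the arguments and the input file; return an exit status."""
    if len(argv) < 2:
        sys.stderr.write("usage: %s <file_path>\n" % argv[0])
        return 69  # same status the script uses for a missing argument
    lines = read_lines(argv[1])
    if lines is None:
        return 1  # assumed status for an unreadable file
    # The duplicate-stripping loop would run over `lines` here.
    return 0


# Calling main without a file argument prints the usage line to stderr.
print(main(["duplicate.py"]))
```

In a real script the last line would be replaced by sys.exit(main(sys.argv)).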

Last edited by shitson; 07-25-2012 at 09:36 AM.
 
