UNIX for Dummies Questions & Answers: post 302314909 by Digby on Monday 11th of May 2009 04:28:14 AM
Sort and uniq lines of a file while keeping a header line

I have a file that contains some duplicate lines. It also has a header line that I would like to keep at the top.

I could do this by extracting the header from the file, running 'sort -u' on the remaining lines, and recombining the two parts. But the files are quite big, so if there is a way to do it with a single command, that would be great.

Does anyone have any ideas? Thanks in advance.

Here's an example file:

Code:
MAKEBITS 1.0 574331 /home/woodd/workspace2/ibis_3d/scripts/pdblist2fps.py
a139_371_frag1 1 2 3 4 5 6 10 11
a139_371_frag2 2 4 5 158 159 160 161 162
a139_371_frag3 2 6 159 160 161 258 259 260
a139_371_frag4 1 2 3 4 5 6 10 11
a139_371_frag5 1 2 3 4 5 6 10 11
a139_371_frag6 1 2 3 4 5 6 10 11
a139_IMD_frag1 1 3 4 6 57 58 59 89
a139_371_frag1 1 2 3 4 5 6 10 11
a139_371_frag2 2 4 5 158 159 160 161 162
a139_371_frag3 2 6 159 160 161 258 259 260
a139_371_frag4 1 2 3 4 5 6 10 11
a139_371_frag5 1 2 3 4 5 6 10 11
a139_371_frag6 1 2 3 4 5 6 10 11
a139_IMD_frag1 1 3 4 6 57 58 59 89
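
One possible way to do this in a single pass is sketched below. It is only a sketch, assuming a POSIX-style shell and that the header is exactly the first line of the input. The function name sort_keep_header is made up for illustration and does not come from the post. The shell's read builtin consumes the header line, and sort -u then sees only the remaining lines on the same input stream.

```shell
# sort_keep_header: hypothetical helper name, not from the original post.
# Reads standard input, prints the first line unchanged, then prints the
# remaining lines sorted with exact duplicates removed.
sort_keep_header() {
    # Read the header; also accept a lone header with no trailing newline.
    IFS= read -r header || [ -n "$header" ] || return 0
    printf '%s\n' "$header"
    # sort inherits the rest of stdin, starting at the second line.
    sort -u
}
```

It could be called as `sort_keep_header < infile > outfile`, where infile and outfile are placeholder names. On the example file above this should leave the MAKEBITS header followed by seven unique lines, since 'sort -u' compares whole lines and each fragment name appears twice with identical data. The sort order depends on the locale, so setting LC_ALL=C may be worth considering if a byte-wise order is wanted.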

 
