Post 302450563 by Michael Stora on Friday 3rd of September 2010 04:49:13 AM
awk: sort lines by count of a character or string in a line

I want to sort lines by how many times a string occurs in each line (the most times first).
I know how to do this in two passes (add a count field in the first pass then sort on it in the second pass).
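
For reference, the two-pass approach could be sketched roughly as follows. The function name sort_by_count is made up for illustration, and the -s (stable sort) flag is assumed to be available, as it is in GNU and BSD sort. The argument is treated as a regular expression, so a character such as '.' would need escaping.

```shell
# Hypothetical helper: read lines on stdin and print them ordered by how
# many times the regex given as $1 occurs in each line, most first.
sort_by_count() {
    # Pass 1: prefix each line with its match count and a tab.
    # gsub() returns the number of matches; "&" puts each match back
    # unchanged, so $0 is not altered.
    awk -v pat="$1" '{ n = gsub(pat, "&"); print n "\t" $0 }' |
        # Pass 2: numeric, descending sort on the count only; -s keeps
        # lines with equal counts in their original order.
        sort -t "$(printf '\t')" -k1,1nr -s |
        # Strip the count again.
        cut -f2-
}
```

Something like `find "$dir" -mindepth 1 -type d | sort_by_count /` would then list the deepest directories first, provided no path contains a newline.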

However, can it be done more efficiently with a single AWK command? My AWK has improved tremendously in the last few days, but I'm not there yet.

I have written a script that "raises" all files in subdirectories up to the supplied path, appending -1, -2, -3, etc. to the end of the filename (before the '.' file extension, if present) when multiple files have the same name. It leaves all or most of those subdirectories empty. I want to clean up the empty directories at the end.

Code:
find "$dir" -mindepth 1 -type d

This finds them all, but I need to sort them deepest first; otherwise my attempts to rmdir them will fail whenever a folder still contains deeper empty folders.
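
As an aside, if the only goal is removing the empty directories, find can visit the deepest entries first by itself with -depth, which sidesteps the sort. This is a different technique from the awk sort asked about here. A minimal sketch, assuming a find that supports -mindepth and -empty (GNU and BSD find do); remove_empty_dirs is a hypothetical name:

```shell
# Hypothetical helper: remove every empty directory below "$1",
# leaving "$1" itself and any non-empty directories in place.
remove_empty_dirs() {
    # -depth makes find process a directory's contents before the
    # directory itself, so children are removed before the parent is
    # tested with -empty.
    find "$1" -mindepth 1 -depth -type d -empty -exec rmdir {} \;
}
```

Called as `remove_empty_dirs "$dir"`, it should leave any directory that still holds a file untouched.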

Start with this:
 
./dir
./dir/dir2
./dir/dira
./dir/dirb
./dir/dirc
./dir2
./dir2/dira
./dir2/dirb
./dir2/dirc
./dir3
./dir3/dira
./dir3/dirb
./dir3/dirb/dirI
./dir3/dirb/dirII
./dir3/dirb/dirIII
./dir3/dirb/dirIV
./dir3/dirb/dirV
./dir3/dirb/dirVI
./dir3/dirc

And end up with this:
 
./dir3/dirb/dirI
./dir3/dirb/dirII
./dir3/dirb/dirIII
./dir3/dirb/dirIV
./dir3/dirb/dirV
./dir3/dirb/dirVI
./dir/dir2
./dir/dira
./dir/dirb
./dir/dirc
./dir2/dira
./dir2/dirb
./dir2/dirc
./dir3/dira
./dir3/dirb
./dir3/dirc
./dir
./dir2
./dir3
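
One way this ordering might be produced with a single awk command is to collect lines into buckets keyed by their count and print the buckets from the highest count down. This is only a sketch: deepest_first is a hypothetical wrapper name, the pattern defaults to "/" as an assumption that fits the directory case, and the whole input is held in memory.

```shell
# Hypothetical wrapper: print stdin lines ordered by the number of matches
# of the regex $1 (default "/"), most matches first. Lines with the same
# count keep their input order.
deepest_first() {
    awk -v pat="${1:-/}" '
    {
        n = gsub(pat, "&")              # count matches; "&" leaves $0 as it was
        bucket[n] = bucket[n] $0 "\n"   # append the line to the bucket for n
        if (n > max) max = n            # remember the highest count seen
    }
    END {
        # Walk the buckets from the highest count down to zero.
        for (i = max; i >= 0; i--) printf "%s", bucket[i]
    }'
}
```

With the listing above as input, `find "$dir" -mindepth 1 -type d | deepest_first` should group the three-level paths first, then the two-level ones, then the top-level directories.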

Mike
 
