Post 302450563 by Michael Stora on Friday 3rd of September 2010 04:49:13 AM
awk: sort lines by count of a character or string in a line

I want to sort lines by how many times a string occurs in each line (the most times first).
I know how to do this in two passes (add a count field in the first pass then sort on it in the second pass).

However, can it be done more optimally with a single AWK command? My AWK has improved tremendously in the last few days, but I'm not there yet.

I have written a script that "raises" all files in subdirectories up to the supplied path, appending -1, -2, -3, etc. to the end of the filename (before the '.' file extension, if present) when multiple files share the same name. It leaves all or most of those subdirectories empty. I want to clean up the empty directories at the end.

Code:
find "$dir" -mindepth 1 -type d

finds them all, but I need to sort them deepest first; otherwise my attempts to rmdir them will fail when a folder contains deeper empty folders.
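For the cleanup itself, sorting may not be needed at all. This is a minimal sketch, assuming a find that supports -mindepth (as the command above already does) together with the POSIX -depth option, which visits a directory's contents before the directory itself. The function name remove_empty_dirs is made up for illustration. Directories that still hold files make rmdir fail, so its error messages are discarded here.

```shell
# Hypothetical helper: remove empty directories below "$1", deepest first.
# -depth makes find visit children before their parent, so by the time a
# parent is handed to rmdir its empty children are already gone.
remove_empty_dirs() {
    # rmdir refuses to remove non-empty directories; hide those errors.
    find "$1" -mindepth 1 -depth -type d -exec rmdir {} \; 2>/dev/null
    return 0
}
```

Calling remove_empty_dirs "$dir" leaves "$dir" itself and any directory that still contains files in place.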

Start with this:
 
./dir
./dir/dir2
./dir/dira
./dir/dirb
./dir/dirc
./dir2
./dir2/dira
./dir2/dirb
./dir2/dirc
./dir3
./dir3/dira
./dir3/dirb
./dir3/dirb/dirI
./dir3/dirb/dirII
./dir3/dirb/dirIII
./dir3/dirb/dirIV
./dir3/dirb/dirV
./dir3/dirb/dirVI
./dir3/dirc

And end up with this:
 
./dir3/dirb/dirI
./dir3/dirb/dirII
./dir3/dirb/dirIII
./dir3/dirb/dirIV
./dir3/dirb/dirV
./dir3/dirb/dirVI
./dir/dir2
./dir/dira
./dir/dirb
./dir/dirc
./dir2/dira
./dir2/dirb
./dir2/dirc
./dir3/dira
./dir3/dirb
./dir3/dirc
./dir
./dir2
./dir3

Mike
 
