awk: sort lines by count of a character or string in a line


 
# 1  
09-03-2010

I want to sort lines by how many times a string occurs in each line (the most times first).
I know how to do this in two passes (add a count field in the first pass then sort on it in the second pass).
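For comparison, here is one way to write the two-pass approach: decorate each line with its count, sort, then undecorate. It assumes POSIX `awk`, `sort` and `cut`. The function name `sort_by_count` is hypothetical. Occurrences of `$1` are counted with `index()` as a literal, non-overlapping string, not as a regular expression.

```shell
#!/bin/sh
# Hypothetical helper: sort stdin lines by how often the literal string $1
# occurs in each line, highest count first (decorate / sort / undecorate).
sort_by_count() {
    # Pass 1: prefix each line with its occurrence count and a tab.
    # Note: awk -v processes backslash escapes in the value of $1.
    awk -v s="$1" '
    {
        n = 0
        rest = $0
        # index() is a literal (non-regex) search; skip past each match.
        while (length(s) > 0 && (i = index(rest, s)) > 0) {
            n++
            rest = substr(rest, i + length(s))
        }
        print n "\t" $0
    }' |
    LC_ALL=C sort -t "$(printf '\t')" -k1,1nr |  # count field, numeric, descending; ties: byte order of the line
    cut -f2-                                     # pass 2 done: strip the count field again
}
```

For example, `find "$dir" -mindepth 1 -type d | sort_by_count /` orders the directories by their number of `/` characters, deepest first. With `LC_ALL=C`, directories at the same depth come out in byte order.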

However, can it be done more efficiently with a single AWK command? My AWK has improved tremendously in the last few days, but I'm not there yet.
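A single `awk` process can also do it. The counts are small non-negative integers, so one option is bucketing, though it is not the only one. Each line is filed under its count, and the `END` block prints the buckets from the highest count down. No comparison sort is needed. This sketch assumes a POSIX `awk`. The function name `sort_by_count_awk` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: same ordering in one awk process. Lines are bucketed
# by the number of literal occurrences of $1 and printed from the highest
# count down; lines with equal counts keep their input order.
sort_by_count_awk() {
    # Note: awk -v processes backslash escapes in the value of $1.
    awk -v s="$1" '
    BEGIN { max = 0 }   # numeric start, so bucket 0 is found in END
    {
        n = 0
        rest = $0
        # index() is a literal (non-regex) search; skip past each match.
        while (length(s) > 0 && (i = index(rest, s)) > 0) {
            n++
            rest = substr(rest, i + length(s))
        }
        line[n, ++size[n]] = $0   # k-th line seen with count n
        if (n > max)
            max = n
    }
    END {
        for (c = max; c >= 0; c--)           # highest count first
            for (k = 1; k <= size[c]; k++)   # input order within a count
                print line[c, k]
    }'
}
```

Lines with the same count keep their input order. Feeding it the unsorted listing from the question should therefore reproduce the requested deepest-first order. The trade-off is that every line is held in memory until `END`, which is harmless for a directory listing.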

I have written a script that "raises" all files in subdirectories up into the supplied path, appending -1, -2, -3, etc. to the end of the filename (before the '.' file extension, if present) when there are multiple files with the same name. It leaves most of those subdirectories empty, and I want to clean up the empty directories at the end.

Code:
find "$dir" -mindepth 1 -type d

finds them all, but I need to sort them deepest first. Otherwise my attempts to rmdir them will fail when a folder still contains deeper empty folders.

Start with this:
 
./dir
./dir/dir2
./dir/dira
./dir/dirb
./dir/dirc
./dir2
./dir2/dira
./dir2/dirb
./dir2/dirc
./dir3
./dir3/dira
./dir3/dirb
./dir3/dirb/dirI
./dir3/dirb/dirII
./dir3/dirb/dirIII
./dir3/dirb/dirIV
./dir3/dirb/dirV
./dir3/dirb/dirVI
./dir3/dirc

And end up with this:
 
./dir3/dirb/dirI
./dir3/dirb/dirII
./dir3/dirb/dirIII
./dir3/dirb/dirIV
./dir3/dirb/dirV
./dir3/dirb/dirVI
./dir/dir2
./dir/dira
./dir/dirb
./dir/dirc
./dir2/dira
./dir2/dirb
./dir2/dirc
./dir3/dira
./dir3/dirb
./dir3/dirc
./dir
./dir2
./dir3

Mike
# 2  
09-03-2010
Hi,

Why would you want to delve into the depths if you have to remove all the directories anyway?

You can use "-maxdepth" instead of "-mindepth",
or
you can simply use rm -rf instead.

If I understood your issue correctly.

Regards,
# 3  
09-03-2010
Nope. I've got a script that moves all the regular files in subdirectories into the main directory, appending a number to the end when there are duplicate file names.
I want to clean up the remaining empty directories SAFELY. There is a chance one or more of them contains a special file, in which case I want the rmdir to fail. I do not want to use -rf!

-mindepth is being used to exclude the main directory. Specifically, -mindepth 1 excludes the starting point itself ('.').

Mike
# 4  
09-03-2010
I believe a simple reverse sort will suffice for your purpose, no?

sort -r will produce:

Code:
./dir/dirc
./dir/dirb
./dir/dira
./dir/dir2
./dir3/dirc
./dir3/dirb/dirVI
./dir3/dirb/dirV
./dir3/dirb/dirIV
./dir3/dirb/dirIII
./dir3/dirb/dirII
./dir3/dirb/dirI
./dir3/dirb
./dir3/dira
./dir3
./dir2/dirc
./dir2/dirb
./dir2/dira
./dir2
./dir
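One caveat: `sort` orders lines according to the current locale. The listing above appears to come from a locale whose collation ignores the `/` characters. With `LC_ALL=C`, `sort -r` compares bytes, and the order then reliably puts every directory after all of its descendants. A parent path is a proper prefix of each descendant's path, so it sorts before them, and reversing puts it after. This is a minimal sketch with three assumptions: a `find` that supports `-mindepth` (as the thread already uses), no directory names containing a newline, and a hypothetical function name `rmdir_deepest_first`.

```shell
#!/bin/sh
# Hypothetical helper: remove the empty directories below $1, deepest first.
# Under LC_ALL=C, sort compares bytes, so a parent path (a proper prefix of
# its descendants' paths) sorts before them, and sort -r puts it after them.
rmdir_deepest_first() {
    find "$1" -mindepth 1 -type d |
    LC_ALL=C sort -r |
    while IFS= read -r d; do
        # rmdir refuses non-empty directories and reports why;
        # "|| continue" keeps going (and stays safe under set -e).
        rmdir -- "$d" || continue
    done
}
```

A directory that still holds a file (special or not) makes `rmdir` print an error, and that directory is left in place along with its ancestors.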

This User Gave Thanks to Scrutinizer For This Post:
# 5  
09-03-2010
Edit: Actually, I think this would work! Another solution.

Mike

Last edited by Michael Stora; 09-03-2010 at 06:55 AM.
# 6  
09-03-2010
Or you can simply add -depth to your find command and execute rmdir, discarding any error messages (2>/dev/null):

Code:
find <source> -mindepth 1 -depth -type d -exec rmdir {} + 2>/dev/null

In addition, some find implementations (GNU find, for instance) support the -empty option:

Quote:
-empty File is empty and is either a regular file or a directory.
This User Gave Thanks to radoulov For This Post:
# 7  
09-03-2010
-empty will exclude folders containing other empty folders, so I don't want that.

However, -depth does exactly what I'm looking for. For each branch, it lists the deeper directories first, so they can be deleted in the correct order. THANKS!

$ find . -mindepth 1 -depth -type d
./dir/dir2
./dir/dira
./dir/dirb
./dir/dirc
./dir
./dir2/dira
./dir2/dirb
./dir2/dirc
./dir2
./dir3/dira
./dir3/dirb/dirI
./dir3/dirb/dirII
./dir3/dirb/dirIII
./dir3/dirb/dirIV
./dir3/dirb/dirV
./dir3/dirb/dirVI
./dir3/dirb
./dir3/dirc
./dir3


So ultimately I ended up using:
Code:
find "$dir" -mindepth 1 -depth -type d -exec rmdir {} +
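The batched `-exec rmdir {} +` form is safe for two reasons. `-depth` makes find list each directory after its contents, and POSIX specifies that `rmdir` processes its operands in the order given, so a parent is only attempted after its subdirectories. Below, the command above is wrapped in a hypothetical function `rmdir_depth_first` so it can be checked on a scratch tree:

```shell
#!/bin/sh
# Hypothetical wrapper around the command above.
# -depth lists every directory after its contents (post-order), and rmdir
# processes its operands in order, so batching with "+" still removes
# children before their parents. rmdir reports and skips non-empty ones.
rmdir_depth_first() {
    find "$1" -mindepth 1 -depth -type d -exec rmdir {} +
}
```

Leaving stderr alone, as here, keeps rmdir's complaint visible when a directory still holds a special file. The nonzero exit status from find then signals that something was left behind.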

Mike

PS. I'm still interested in the general AWK solution...

Last edited by Michael Stora; 09-03-2010 at 07:06 AM.