Problem piping find output to awk, 1st line filename is truncated, other lines are fine.


 
Forum: UNIX for Advanced & Expert Users
# 1  
08-25-2013

Today I needed to look through a load of large backup files, so I wrote the following line to find them, sort them by size, and print each file's size in GB along with its name. The result was odd: the output was all as expected except for the first line, which had its filename heavily truncated. I thought the problem might be with that particular filename, so I reversed the sort order. Again the first filename was heavily truncated, but this time it was a different file, one that had been listed correctly before I changed the ordering. [Note: I used '**' as a field separator for awk; none of the filenames contain the string '**'.]
Code:
find . -type f -size +500M -printf '%s**%p\n' | sort -n | awk 'FS="**" {gb=$1/(2^30); printf("%f GB\t%s\n", gb, $2)}'

So I wrote the script below to create some directories containing files of different lengths, and simplified the command line. Please have a look at what happens with the different find commands below. Can someone explain why the first line always has its filename truncated? I can't work it out. Thanks.
Code:
#!/bin/bash

mkdir "Test Dir 1"
echo "Test File 1 extra chars so diff file lengths" > "Test Dir 1/Test File 1"

mkdir "Test Dir 2"
echo "Test File 2 fewer extra chars" > "Test Dir 2/Test File 2"

mkdir "Test Dir 3"
echo "Test File 3 even fewer" > "Test Dir 3/Test File 3"

mkdir "Test Dir 4"
echo "Test File 4 a few" > "Test Dir 4/Test File 4"

Code:
# Test 1 - no piping - All OK:

$ find . -type f -printf '%s**%p\n'
23**./Test Dir 3/Test File 3
30**./Test Dir 2/Test File 2
18**./Test Dir 4/Test File 4
45**./Test Dir 1/Test File 1


# Test 2 - pipe to sort - All OK:

$ find . -type f -printf '%s**%p\n' | sort -n
18**./Test Dir 4/Test File 4
23**./Test Dir 3/Test File 3
30**./Test Dir 2/Test File 2
45**./Test Dir 1/Test File 1


# Test 3 - pipe to awk - First line filename truncated:

$ find . -type f -printf '%s**%p\n' | awk 'FS="**" {printf("%d \t%s\n", $1, $2)}'
23     Dir
30     ./Test Dir 2/Test File 2
18     ./Test Dir 4/Test File 4
45     ./Test Dir 1/Test File 1


# Test 4 - pipe to sort, then to awk - First line filename truncated:

$ find . -type f -printf '%s**%p\n' | sort -n | awk 'FS="**" {printf("%d \t%s\n", $1, $2)}'
18     Dir
23     ./Test Dir 3/Test File 3
30     ./Test Dir 2/Test File 2
45     ./Test Dir 1/Test File 1


# Test 5 - pipe to reverse sort, then to awk - First line filename truncated:

$ find . -type f -printf '%s**%p\n' | sort -nr | awk 'FS="**" {printf("%d \t%s\n", $1, $2)}'
45     Dir
30     ./Test Dir 2/Test File 2
23     ./Test Dir 3/Test File 3
18     ./Test Dir 4/Test File 4

Thanks all.
# 2  
08-25-2013
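The input field separator in awk needs to be set before the first line is read. With FS="**" used as a pattern, the assignment only happens after line 1 has already been split on whitespace, so it takes effect from line 2 onwards. It is also probably a good idea to remove the special meaning of the asterisks. Try: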
Code:
awk -F'[*][*]' '{printf ....
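A minimal reproduction of the timing issue, using two records copied from the test output above. It uses the bracketed separator [*][*] rather than "**" to stay clear of the undefined regex discussed later in the thread, and it assumes a POSIX-conforming awk (the late FS assignment behaves differently in some very old awks).

```shell
# Two records in the size**path shape produced by find -printf '%s**%p\n'.
input='23**./Test Dir 3/Test File 3
30**./Test Dir 2/Test File 2'

# FS assigned in the pattern: line 1 was already split on blanks when it
# was read, so its $2 is "Dir"; the new FS only applies from line 2 onwards.
printf '%s\n' "$input" | awk 'FS="[*][*]" {print $2}'

# FS set before any input is read: every line is split on "**".
printf '%s\n' "$input" | awk 'BEGIN {FS="[*][*]"} {print $2}'
printf '%s\n' "$input" | awk -F'[*][*]' '{print $2}'
```

In a POSIX awk (and in the awk behind Test 3 above) the first command prints Dir for line 1 and the full path for line 2, while the BEGIN and -F forms print both paths in full.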


# 3  
08-25-2013
Thanks Scrutinizer, that works.

I thought that anything before the awk {} block was evaluated before the first line was read. Oops, so obvious once you know.
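A quick way to confirm that the expression before { } is a pattern evaluated once per input line, while BEGIN runs once before any input is read. This is a small sketch; the counter variable n is just for illustration.

```shell
# BEGIN runs once before input is read; the pattern (n += 1) is evaluated
# for every line; END runs once after the last line.
printf 'a\nb\nc\n' | awk '
  BEGIN    { print "BEGIN runs once, before any input" }
  (n += 1) { }
  END      { print "pattern evaluated " n " times" }'
```

With three input lines this prints the BEGIN message followed by "pattern evaluated 3 times".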

To anyone who's interested, this also works:
Code:
$ find . -type f -printf '%s**%p\n' | sort -nr | awk 'BEGIN {FS="**"} {printf("%d \t%s\n", $1, $2)}'
45     ./Test Dir 1/Test File 1
30     ./Test Dir 2/Test File 2
23     ./Test Dir 3/Test File 3
18     ./Test Dir 4/Test File 4

as does...

$ find . -type f -printf '%s**%p\n' | sort -nr | awk 'BEGIN {FS="[*][*]"} {printf("%d \t%s\n", $1, $2)}'
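The same fix applied to the original GB one-liner from the first post, as a sketch: it keeps the size**path layout and GNU find's -printf (a GNU extension), and only moves the FS assignment into BEGIN with the bracketed separator.

```shell
# List regular files over 500M (GNU find units), smallest first, size in GB.
# %s is the size in bytes and %p the path; -printf is specific to GNU find.
find . -type f -size +500M -printf '%s**%p\n' \
  | sort -n \
  | awk 'BEGIN { FS = "[*][*]" }    # set FS before line 1 is read
         { printf("%f GB\t%s\n", $1 / (2^30), $2) }'
```

Dividing by 2^30 gives binary gigabytes, as in the original command.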

# 4  
08-25-2013
Glad it helps.

Note: FS="**" only works in some awks, where it may happen to mean "zero or more asterisks".

However, POSIX leaves this undefined:

Quote:
The <asterisk>, <plus-sign>, <question-mark>, and <left-brace> shall be special except when used in a bracket expression (see RE Bracket Expression). Any of the following uses produce undefined results:

If these characters appear first in an ERE, or immediately following a <vertical-line>, <circumflex>, or <left-parenthesis>

If a <left-brace> is not part of a valid interval expression (see EREs Matching Multiple Characters)
So it would be best to either use:
Code:
FS="[*]*"    # for zero or more asterisks

or
Code:
FS="[*][*]"  # for exactly two asterisks...

(Source: POSIX Base Definitions, Regular Expressions: ERE Special Characters)
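To see what "exactly two asterisks" means in practice, here is a quick check of FS="[*][*]" on two hypothetical records: a single asterisk inside a path is left alone, and in a run of three the leftmost pair is taken as the separator.

```shell
# A lone '*' inside the path is not a separator, so NF is 2.
printf '%s\n' '45**./odd*name' | awk 'BEGIN { FS = "[*][*]" } { print NF, $2 }'
# prints: 2 ./odd*name

# With three asterisks, the leftmost "**" splits the line and the
# third '*' becomes the first character of field 2.
printf '%s\n' '45***./x' | awk 'BEGIN { FS = "[*][*]" } { print NF, $2 }'
# prints: 2 *./x
```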

# 5  
08-25-2013
Thanks again.

I'll take your advice and use that instead in future. I've used '**' as a field separator with my system's awk in the past, so I knew it worked there. It's clearly a bad habit, and I'll stop.

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

Get an output of lines in pattern 1st line then 10th line then 11th line then 20th line and so on.

Input file: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 (6 Replies)
Discussion started by: Sagar Singh

2. Shell Programming and Scripting

Script using awk to find and replace a line, how to ignore comment lines

Hello, I have some code that works more or less. This is called by a make file to adjust some hard-coded definitions in the src code. The script generated some values by looking at some of the src files and then writes those values to specific locations in other files. The awk code is used to... (3 Replies)
Discussion started by: LMHmedchem

3. Shell Programming and Scripting

(n)awk: print regex search output lines in one line

Hello. I have been looking high and low for the solution for this. It seems there should be a simple answer, but alas. I have a big xml file, and I need to extract certain information from specific items. The information I need can be found between a specific set of tags. Let's call them... (2 Replies)
Discussion started by: Tobias-Reiper

4. Shell Programming and Scripting

awk problem two lines in the same line

Hi guy, I have an output command like this: Policy Name: NBU.POL.ORA.PROD Policy Type: Oracle Active: yes HW/OS/Client: Linux RedHat2.6 node1 Iclude: /usr/openv/netbackup/scripts/backup_ora1.bash I would like to parse the... (1 Reply)
Discussion started by: luca72m

5. Solaris

ps output truncated

Hi, I have Solaris-10 server. /usr/ucb/ps auxww is showing full path if I am running it from root. But if I run it from non-root user, its output is truncated. I don't want to use any other alternate command. Please suggest, what can be its solution. Terminal is set to term. (21 Replies)
Discussion started by: solaris_1977

6. Shell Programming and Scripting

Find in first column and replace the line with Awk, and output new file

Find in first column and replace the line with Awk, and output new file File1.txt"2011-11-02","Georgia","Atlanta","x","","" "2011-11-03","California","Los Angeles","x","","" "2011-11-04","Georgia","Atlanta","x","x","x" "2011-11-05","Georgia","Atlanta","x","x","" ... (4 Replies)
Discussion started by: charles33

7. Shell Programming and Scripting

awk find a string, print the line 2 lines below it

I am parsing a nagios config, searching for a string, and then printing the line 2 lines later (the "members" string). Here's the data: define hostgroup{ hostgroup_name chat-dev alias chat-dev members thisisahostname } define hostgroup{ ... (1 Reply)
Discussion started by: mglenney

8. UNIX for Dummies Questions & Answers

Using awk to get a line number to delete, piping through sed

Alright, I'm sure there's a more efficient way to do this... I'm not an expert by any means. What I'm trying to do is search a file for lines that match the two input words (first name, last name) in order to remove that line. The removal part is what I'm struggling with. Here is my code: echo... (4 Replies)
Discussion started by: lazypeterson

9. UNIX for Dummies Questions & Answers

piping the output of find command to grep

Hi, I did not understand why the following did not work out as I expected: find . -name "pqp.txt" | grep -v "Permission" I thought I would be able to catch whichever paths containing my pqp.txt file without receiving the display of messages such as "find: cannot access... Permisson... (1 Reply)
Discussion started by: 435 Gavea

10. Shell Programming and Scripting

ps output truncated

Hi! I have some shell scripts receiving in input lots of parameters and I need to select the ones having a particular value in one parameter. A typical shell command line is: PROMPT > shell_name.ksh -avalue_a -bvalue_b -cvalue_c -dvalue_d ... I used a combination of ps and grep commands... (1 Reply)
Discussion started by: pciatto