Find the average based on similar names in the first column Post: 302739361

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Change names in a column based on the symbols in another column

If the 4th column has - sign then the names in 3rd column has to change to some user defined names (as shown in output). Thanx input1 1 a aaaaa + 2 b bbbbb + 3 c ccccc + 4 d ddddd + 5 e eeeee + 6 f xxxxx + 8 h hhhhh +...

2. Shell Programming and Scripting

Script to find the average of a given column and also for specified number of rows?

Hi Friends, In continuation to my earlier post https://www.unix.com/shell-programming-scripting/99166-script-find-average-given-column-also-specified-number-rows.html I am extending my problem as follows. Input: Column1 Column2 MAS 1 MAS 4 ...

3. Shell Programming and Scripting

AWK: how to get average based on certain column

Hi, I'm new to shell programming, can anyone help me on this? I want to do following operations - 1. Average salary for each country 2. Total salary for each city and data that looks like - salary country city 10000 zzz BN 25000 zzz BN 30000 zzz BN 10000 yyy ZN 15000 yyy ZN ...

4. Shell Programming and Scripting

Joining multiple files based on one column with different and similar values (shell or perl)

Hi, I have nine files looking similar to file1 & file2 below. File1: 1 ABCA1 1 ABCC8 1 ABR:N 1 ACACB 1 ACAP2 1 ACOT1 1 ACSBG 1 ACTR1 1 ACTRT 1 ADAMT 1 AEN:N 1 AKAP1File2: 1 A4GAL 1 ACTBL 1 ACTL7

5. Shell Programming and Scripting

Help with merge two file based on similar column content

Input file 1: A1BG A1BG A1BG A1CF A1CF BCAS BCAS A2LD1 A2M A2M HAT . . Input file 2: A1BG All A1CF TEMP

6. Homework & Coursework Questions

Find the Maximum value and average of a column

Use and complete the template provided. The entire template must be completed. If you don't, your post may be deleted! 1. The problem statement, all variables and given/known data: I am trying to complete a script which will allow me to find: a) reads a value from the keyboard. (ask the...

7. Shell Programming and Scripting

Average values in a column based on range

Hi i have data with two columns like below. I want to find average of column values like if the value in column 2 is between 0-250000 the average of column 1 is some xx and average of column2 is ww then if value is 250001-5000000 average of column 1 is yy and average of column 2 is zz. And my...

8. Shell Programming and Scripting

Calculate the average of a column based on the value of another column

Hi, I would like to calculate the average of column 'y' based on the value of column 'pos'. For example, here is file1 id pos y c 11 1 220 aa 11 4333 207 f 11 5333 112 ee 11 11116 305 e 11 11117 310 r 11 22228 781 gg 11 ...

9. UNIX for Dummies Questions & Answers

To find similar items in a column

HI, I have a long file which looks like "1xxx_0_1" "1xxx" 500 5 "ABC*3-DEF*3-LL" "2yyy_0_1" "2yyy" 600 10 "ABC*2-DEF*2-LL" "3ddd_0_1" "3ddd" 150 52 "ABC*3-DEF*3-LL" "1xxx_0_1" "1xxx" 500 5 "ABC*3-DEF*3-LL" "2yyy_0_1" "2yyy" 600 10 "ABC*2-DEF*2-LL" ...

10. Shell Programming and Scripting

Check first column - average second column based on a condition

Hi, My input file Gene1 1 Gene1 2 Gene1 3 Gene1 0 Gene2 0 Gene2 0 Gene2 4 Gene2 8 Gene3 9 Gene3 9 Gene4 0 Condition: If the first column matches, then look in the second column. If there is a value of zero in the second column, then don't consider that record while averaging. ...

LEARN ABOUT POSIX

diff

DIFF(P) 						     POSIX Programmer's Manual							   DIFF(P)

NAME

       diff - compare two files

SYNOPSIS

       diff [-c| -e| -f| -C n][-br] file1 file2

DESCRIPTION

       The  diff  utility  shall compare the contents of file1 and file2 and write to standard output a list of changes necessary to convert file1
       into file2. This list should be minimal. No output shall be produced if the files are identical.

OPTIONS

       The diff utility shall conform to the Base Definitions volume of IEEE Std 1003.1-2001, Section 12.2, Utility Syntax Guidelines.

       The following options shall be supported:

       -b     Cause any amount of white space at the end of a line to be treated as a single <newline> (that is, the white-space  characters  pre-
	      ceding the <newline> are ignored) and other strings of white-space characters, not including <newline>s, to compare equal.

       -c     Produce output in a form that provides three lines of context.

       -C n   Produce output in a form that provides n lines of context (where n shall be interpreted as a positive decimal integer).

       -e     Produce output in a form suitable as input for the ed utility, which can then be used to convert file1 into file2.

       -f     Produce  output in an alternative form, similar in format to -e, but not intended to be suitable as input for the ed utility, and in
	      the opposite order.

       -r     Apply diff recursively to files and directories of the same name when file1 and file2 are both directories.

OPERANDS

       The following operands shall be supported:

       file1, file2
	      A pathname of a file to be compared. If either the file1 or file2 operand is '-' , the standard input shall be used in its place.

       If both file1 and file2 are directories, diff shall not compare block special files, character special files, or FIFO special files to  any
       files and shall not compare regular files to directories. Further details are as specified in Diff Directory Comparison Format . The behav-
       ior of diff on other file types is implementation-defined when found in directories.

       If only one of file1 and file2 is a directory, diff shall be applied to the non-directory file and the file contained in the directory file
       with a filename that is the same as the last component of the non-directory file.

STDIN

       The standard input shall be used only if one of the file1 or file2 operands references standard input. See the INPUT FILES section.

INPUT FILES

       The input files may be of any type.

ENVIRONMENT VARIABLES

       The following environment variables shall affect the execution of diff:

       LANG   Provide  a  default  value  for  the  internationalization  variables  that  are	unset or null. (See the Base Definitions volume of
	      IEEE Std 1003.1-2001, Section 8.2, Internationalization Variables for the  precedence  of  internationalization  variables  used	to
	      determine the values of locale categories.)

       LC_ALL If set to a non-empty string value, override the values of all the other internationalization variables.

       LC_CTYPE
	      Determine the locale for the interpretation of sequences of bytes of text data as characters (for example, single-byte as opposed to
	      multi-byte characters in arguments and input files).

       LC_MESSAGES
	      Determine the locale that should be used to affect the format and contents of diagnostic messages  written  to  standard	error  and
	      informative messages written to standard output.

       LC_TIME
	      Determine the locale for affecting the format of file timestamps written with the -C and -c options.

       NLSPATH
	      Determine the location of message catalogs for the processing of LC_MESSAGES .

       TZ     Determine  the timezone used for calculating file timestamps written with the -C and -c options. If TZ is unset or null, an unspeci-
	      fied default timezone shall be used.

ASYNCHRONOUS EVENTS

       Default.

STDOUT

   Diff Directory Comparison Format
       If both file1 and file2 are directories, the following output formats shall be used.

       In the POSIX locale, each file that is present in only one directory shall be reported using the following format:

	      "Only in %s: %s
", <directory pathname>, <filename>

       In the POSIX locale, subdirectories that are common to the two directories may be reported with the following format:

	      "Common subdirectories: %s and %s
", <directory1 pathname>,
		  <directory2 pathname>

       For each file common to the two directories if the two files are not to be compared, the following  format  shall  be  used  in	the  POSIX
       locale:

	      "File %s is a %s while file %s is a %s
", <directory1 pathname>,
		  <file type of directory1 pathname>, <directory2 pathname>,
		  <file type of directory2 pathname>

       For each file common to the two directories, if the files are compared and are identical, no output shall be written. If the two files dif-
       fer, the following format is written:

	      "diff %s %s %s
", <diff_options>, <filename1>, <filename2>

       where <diff_options> are the options as specified on the command line.

       All directory pathnames listed in this section shall be relative to the original command line arguments. All other names of files listed in
       this section shall be filenames (pathname components).

   Diff Binary Output Format
       In  the	POSIX locale, if one or both of the files being compared are not text files, an unspecified format shall be used that contains the
       pathnames of two files being compared and the string "differ" .

       If both files being compared are text files, depending on the options specified, one of the following formats shall be used  to	write  the
       differences.

   Diff Default Output Format
       The default (without -e, -f, -c, or -C options) diff utility output shall contain lines of these forms:

	      "%da%d
", <num1>, <num2>

	      "%da%d,%d
", <num1>, <num2>, <num3>

	      "%dd%d
", <num1>, <num2>

	      "%d,%dd%d
", <num1>, <num2>, <num3>

	      "%dc%d
", <num1>, <num2>

	      "%d,%dc%d
", <num1>, <num2>, <num3>

	      "%dc%d,%d
", <num1>, <num2>, <num3>

	      "%d,%dc%d,%d
", <num1>, <num2>, <num3>, <num4>

       These  lines  resemble ed subcommands to convert file1 into file2. The line numbers before the action letters shall pertain to file1; those
       after shall pertain to file2. Thus, by exchanging a for d and reading the line in reverse order, one can  also  determine  how  to  convert
       file2 into file1. As in ed, identical pairs (where num1= num2) are abbreviated as a single number.

       Following each of these lines, diff shall write to standard output all lines affected in the first file using the format:

	      "< %s", <line>

       and all lines affected in the second file using the format:

	      "> %s", <line>

       If  there  are lines affected in both file1 and file2 (as with the c subcommand), the changes are separated with a line consisting of three
       hyphens:

	      "---
"

   Diff -e Output Format
       With the -e option, a script shall be produced that shall, when provided as input to ed, along with an appended w (write) command,  convert
       file1  into file2. Only the a (append), c (change), d (delete), i (insert), and s (substitute) commands of ed shall be used in this script.
       Text lines, except those consisting of the single character period ( '.' ), shall be output as they appear in the file.

   Diff -f Output Format
       With the -f option, an alternative format of script shall be produced. It is similar to that produced by -e,  with  the	following  differ-
       ences:

	1. It  is  expressed in reverse sequence; the output of -e orders changes from the end of the file to the beginning; the -f from beginning
	   to end.

	2. The command form <lines> <command-letter> used by -e is reversed. For example, 10c with -e would be c10 with -f.

	3. The form used for ranges of line numbers is <space>-separated, rather than comma-separated.

   Diff -c or -C Output Format
       With the -c or -C option, the output format shall consist of affected lines along with surrounding lines of  context.  The  affected  lines
       shall  show  which ones need to be deleted or changed in file1, and those added from file2.  With the -c option, three lines of context, if
       available, shall be written before and after the affected lines. With the -C option, the user can specify how many  lines  of  context  are
       written. The exact format follows.

       The name and last modification time of each file shall be output in the following format:

	      "*** %s %s
", file1, <file1 timestamp>
	      "--- %s %s
", file2, <file2 timestamp>

       Each <file> field shall be the pathname of the corresponding file being compared. The pathname written for standard input is unspecified.

       In the POSIX locale, each <timestamp> field shall be equivalent to the output from the following command:

	      date "+%a %b %e %T %Y"

       without	the  trailing  <newline>, executed at the time of last modification of the corresponding file (or the current time, if the file is
       standard input).

       Then, the following output formats shall be applied for every set of changes.

       First, a line shall be written in the following format:

	      "***************
"

       Next, the range of lines in file1 shall be written in the following format if the range contains two or more lines:

	      "*** %d,%d ****
", <beginning line number>, <ending line number>

       and the following format otherwise:

	      "*** %d ****
", <ending line number>

       The ending line number of an empty range shall be the number of the preceding line, or 0 if the range is at the start of the file.

       Next, the affected lines along with lines of context (unaffected lines) shall be written. Unaffected lines shall be written in the  follow-
       ing format:

	      "  %s", <unaffected_line>

       Deleted lines shall be written as:

	      "- %s", <deleted_line>

       Changed lines shall be written as:

	      "! %s", <changed_line>

       Next, the range of lines in file2 shall be written in the following format if the range contains two or more lines:

	      "--- %d,%d ----
", <beginning line number>, <ending line number>

       and the following format otherwise:

	      "--- %d ----
", <ending line number>

       Then,  lines of context and changed lines shall be written as described in the previous formats. Lines added from file2 shall be written in
       the following format:

	      "+ %s", <added_line>

STDERR

       The standard error shall be used only for diagnostic messages.

OUTPUT FILES

       None.

EXTENDED DESCRIPTION

       None.

EXIT STATUS

       The following exit values shall be returned:

	0     No differences were found.

	1     Differences were found.

       >1     An error occurred.

CONSEQUENCES OF ERRORS

       Default.

       The following sections are informative.

APPLICATION USAGE

       If lines at the end of a file are changed and other lines are added, diff output may show this as a delete and add, as a change,  or  as  a
       change  and add; diff is not expected to know which happened and users should not care about the difference in output as long as it clearly
       shows the differences between the files.

EXAMPLES

       If dir1 is a directory containing a directory named x, dir2 is a directory containing a directory named x, dir1/x and dir2/x  both  contain
       files named date.out, and dir2/x contains a file named y, the command:

	      diff -r dir1 dir2

       could produce output similar to:

	      Common subdirectories: dir1/x and dir2/x
	      Only in dir2/x: y
	      diff -r dir1/x/date.out dir2/x/date.out
	      1c1
	      < Mon Jul  2 13:12:16 PDT 1990
	      ---
	      > Tue Jun 19 21:41:39 PDT 1990

RATIONALE

       The -h option was omitted because it was insufficiently specified and does not add to applications portability.

       Historical  implementations  employ  algorithms that do not always produce a minimum list of differences; the current language about making
       every effort is the best this volume of IEEE Std 1003.1-2001 can do, as there is no metric that could be employed to judge the  quality	of
       implementations	against  any  and all file contents. The statement "This list should be minimal'' clearly implies that implementations are
       not expected to provide the following output when comparing two 100-line files that differ in only one character on a single line:

	      1,100c1,100
	      all 100 lines from file1 preceded with "< "
	      ---
	      all 100 lines from file2 preceded with "> "

       The "Only in" messages required when the -r option is specified are not used by most historical implementations if the -e  option  is  also
       specified. It is required here because it provides useful information that must be provided to update a target directory hierarchy to match
       a source hierarchy. The "Common subdirectories" messages are written by System V and 4.3 BSD when the -r  option  is  specified.  They  are
       allowed	here but are not required because they are reporting on something that is the same, not reporting a difference, and are not needed
       to update a target hierarchy.

       The -c option, which writes output in a format using lines of context, has been included. The format is useful for a  variety  of  reasons,
       among them being much improved readability and the ability to understand difference changes when the target file has line numbers that dif-
       fer from another similar, but slightly different, copy. The patch utility is most valuable when working with difference listings using  the
       context format.	The BSD version of -c takes an optional argument specifying the amount of context. Rather than overloading -c and breaking
       the Utility Syntax Guidelines for diff, the standard developers decided to add a separate option for specifying a context diff with a spec-
       ified  amount  of  context  (  -C).  Also, the format for context diffs was extended slightly in 4.3 BSD to allow multiple changes that are
       within context lines from each other to be merged together. The output format contains an additional four  asterisks  after  the  range	of
       affected lines in the first filename. This was to provide a flag for old programs (like old versions of patch) that only understand the old
       context format. The version of context described here does not require that multiple changes within context lines be merged,  but  it  does
       not prohibit it either. The extension is upwards-compatible, so any vendors that wish to retain the old version of diff can do so by adding
       the extra four asterisks (that is, utilities that currently use diff and understand the new merged format  will	also  understand  the  old
       unmerged format, but not vice versa).

       The substitute command was added as an additional format for the -e option. This was added to provide implementations with a way to fix the
       classic "dot alone on a line" bug present in many versions of diff. Since many implementations have fixed this bug, the standard developers
       decided not to standardize broken behavior, but rather to provide the necessary tool for fixing the bug. One way to fix this bug is to out-
       put two periods whenever a lone period is needed, then terminate the append command with a period, and then use the substitute  command	to
       convert the two periods into one period.

       The  BSD-derived  -r  option  was added to provide a mechanism for using diff to compare two file system trees. This behavior is useful, is
       standard practice on all BSD-derived systems, and is not easily reproducible with the find utility.

       The requirement that diff not compare files in some circumstances, even though they have the same name, is based on the	actual	output	of
       historical  implementations.  The  message  specified  here  is already in use when a directory is being compared to a non-directory. It is
       extended here to preclude the problems arising from running into FIFOs and other files that would cause diff to hang waiting for input with
       no  indication  to  the user that diff was hung. In most common usage, diff -r should indicate differences in the file hierarchies, not the
       difference of contents of devices pointed to by the hierarchies.

       Many early implementations of diff require seekable files. Since the System Interfaces volume of IEEE Std 1003.1-2001 supports named pipes,
       the  standard  developers decided that such a restriction was unreasonable. Note also that the allowed filename - almost always refers to a
       pipe.

       No directory search order is specified for diff. The historical ordering is, in fact, not optimal, in that it prints out all of the differ-
       ences at the current level, including the statements about all common subdirectories before recursing into those subdirectories.

       The message:

	      "diff %s %s %s
", <diff_options>, <filename1>, <filename2>

       does not vary by locale because it is the representation of a command, not an English sentence.

FUTURE DIRECTIONS

       None.

SEE ALSO

       cmp , comm , ed , find

COPYRIGHT

       Portions of this text are reprinted and reproduced in electronic form from IEEE Std 1003.1, 2003 Edition, Standard for Information Technol-
       ogy -- Portable Operating System Interface (POSIX), The Open Group Base Specifications Issue 6, Copyright (C) 2001-2003 by the Institute of
       Electrical  and	Electronics  Engineers, Inc and The Open Group. In the event of any discrepancy between this version and the original IEEE
       and The Open Group Standard, the original IEEE and The Open Group Standard is the referee document. The original Standard can  be  obtained
       online at http://www.opengroup.org/unix/online.html .

IEEE
/The Open Group						       2003								   DIFF(P)

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Change names in a column based on the symbols in another column

Discussion started by: repinementer

2. Shell Programming and Scripting

Script to find the average of a given column and also for specified number of rows?

Discussion started by: ks_reddy

3. Shell Programming and Scripting

AWK: how to get average based on certain column

Discussion started by: shell123

4. Shell Programming and Scripting

Joining multiple files based on one column with different and similar values (shell or perl)

Discussion started by: seqbiologist