Linux and UNIX Man Pages

Linux & Unix Commands - Search Man Pages

analyze(7) [redhat man page]

ANALYZE(7)							   SQL Commands 							ANALYZE(7)

NAME
ANALYZE - collect statistics about a database SYNOPSIS
ANALYZE [ VERBOSE ] [ table [ (column [, ...] ) ] ] INPUTS VERBOSE Enables display of progress messages. table The name (possibly schema-qualified) of a specific table to analyze. Defaults to all tables in the current database. column The name of a specific column to analyze. Defaults to all columns. OUTPUTS ANALYZE The command is complete. DESCRIPTION
ANALYZE collects statistics about the contents of PostgreSQL tables, and stores the results in the system table pg_statistic. Subsequently, the query planner uses the statistics to help determine the most efficient execution plans for queries. With no parameter, ANALYZE examines every table in the current database. With a parameter, ANALYZE examines only that table. It is further possible to give a list of column names, in which case only the statistics for those columns are updated. NOTES It is a good idea to run ANALYZE periodically, or just after making major changes in the contents of a table. Accurate statistics will help the planner to choose the most appropriate query plan, and thereby improve the speed of query processing. A common strategy is to run VAC- UUM [vacuum(7)] and ANALYZE once a day during a low-usage time of day. Unlike VACUUM FULL, ANALYZE requires only a read lock on the target table, so it can run in parallel with other activity on the table. For large tables, ANALYZE takes a random sample of the table contents, rather than examining every row. This allows even very large tables to be analyzed in a small amount of time. Note however that the statistics are only approximate, and will change slightly each time ANALYZE is run, even if the actual table contents did not change. This may result in small changes in the planner's estimated costs shown by EXPLAIN. The collected statistics usually include a list of some of the most common values in each column and a histogram showing the approximate data distribution in each column. One or both of these may be omitted if ANALYZE deems them uninteresting (for example, in a unique-key column, there are no common values) or if the column data type does not support the appropriate operators. There is more information about the statistics in the User's Guide. The extent of analysis can be controlled by adjusting the default_statistics_target parameter variable, or on a column-by-column basis by setting the per-column statistics target with ALTER TABLE ALTER COLUMN SET STATISTICS (see ALTER TABLE [alter_table(7)]). The target value sets the maximum number of entries in the most-common-value list and the maximum number of bins in the histogram. The default target value is 10, but this can be adjusted up or down to trade off accuracy of planner estimates against the time taken for ANALYZE and the amount of space occupied in pg_statistic. In particular, setting the statistics target to zero disables collection of statistics for that column. It may be useful to do that for columns that are never used as part of the WHERE, GROUP BY, or ORDER BY clauses of queries, since the planner will have no use for statistics on such columns. The largest statistics target among the columns being analyzed determines the number of table rows sampled to prepare the statistics. Increasing the target causes a proportional increase in the time and space needed to do ANALYZE. COMPATIBILITY
SQL92 There is no ANALYZE statement in SQL92. SQL - Language Statements 2002-11-22 ANALYZE(7)

Check Out this Related Man Page

EXPLAIN(7)							   SQL Commands 							EXPLAIN(7)

NAME
EXPLAIN - show the execution plan of a statement SYNOPSIS
EXPLAIN [ ANALYZE ] [ VERBOSE ] statement DESCRIPTION
This command displays the execution plan that the PostgreSQL planner generates for the supplied statement. The execution plan shows how the table(s) referenced by the statement will be scanned -- by plain sequential scan, index scan, etc. -- and if multiple tables are refer- enced, what join algorithms will be used to bring together the required rows from each input table. The most critical part of the display is the estimated statement execution cost, which is the planner's guess at how long it will take to run the statement (measured in units of disk page fetches). Actually two numbers are shown: the start-up time before the first row can be returned, and the total time to return all the rows. For most queries the total time is what matters, but in contexts such as a subquery in EXISTS, the planner will choose the smallest start-up time instead of the smallest total time (since the executor will stop after getting one row, anyway). Also, if you limit the number of rows to return with a LIMIT clause, the planner makes an appropriate interpolation between the endpoint costs to estimate which plan is really the cheapest. The ANALYZE option causes the statement to be actually executed, not only planned. The total elapsed time expended within each plan node (in milliseconds) and total number of rows it actually returned are added to the display. This is useful for seeing whether the planner's estimates are close to reality. Important: Keep in mind that the statement is actually executed when the ANALYZE option is used. Although EXPLAIN will discard any output that a SELECT would return, other side effects of the statement will happen as usual. If you wish to use EXPLAIN ANALYZE on an INSERT, UPDATE, DELETE, CREATE TABLE AS, or EXECUTE statement without letting the command affect your data, use this approach: BEGIN; EXPLAIN ANALYZE ...; ROLLBACK; PARAMETERS
ANALYZE Carry out the command and show the actual run times. VERBOSE Include the output column list for each node in the plan tree. statement Any SELECT, INSERT, UPDATE, DELETE, VALUES, EXECUTE, DECLARE, or CREATE TABLE AS statement, whose execution plan you wish to see. NOTES
There is only sparse documentation on the optimizer's use of cost information in PostgreSQL. Refer to in the documentation for more infor- mation. In order to allow the PostgreSQL query planner to make reasonably informed decisions when optimizing queries, the ANALYZE [analyze(7)] statement should be run to record statistics about the distribution of data within the table. If you have not done this (or if the statis- tical distribution of the data in the table has changed significantly since the last time ANALYZE was run), the estimated costs are unlikely to conform to the real properties of the query, and consequently an inferior query plan might be chosen. Genetic query optimization (GEQO) randomly tests execution plans. Therefore, when the number of join relations exceeds geqo_threshold caus- ing genetic query optimization to be used, the execution plan is likely to change each time the statement is executed. In order to measure the run-time cost of each node in the execution plan, the current implementation of EXPLAIN ANALYZE can add consider- able profiling overhead to query execution. As a result, running EXPLAIN ANALYZE on a query can sometimes take significantly longer than executing the query normally. The amount of overhead depends on the nature of the query. EXAMPLES
To show the plan for a simple query on a table with a single integer column and 10000 rows: EXPLAIN SELECT * FROM foo; QUERY PLAN --------------------------------------------------------- Seq Scan on foo (cost=0.00..155.00 rows=10000 width=4) (1 row) If there is an index and we use a query with an indexable WHERE condition, EXPLAIN might show a different plan: EXPLAIN SELECT * FROM foo WHERE i = 4; QUERY PLAN -------------------------------------------------------------- Index Scan using fi on foo (cost=0.00..5.98 rows=1 width=4) Index Cond: (i = 4) (2 rows) Here is an example of a query plan for a query using an aggregate function: EXPLAIN SELECT sum(i) FROM foo WHERE i < 10; QUERY PLAN --------------------------------------------------------------------- Aggregate (cost=23.93..23.93 rows=1 width=4) -> Index Scan using fi on foo (cost=0.00..23.92 rows=6 width=4) Index Cond: (i < 10) (3 rows) Here is an example of using EXPLAIN EXECUTE to display the execution plan for a prepared query: PREPARE query(int, int) AS SELECT sum(bar) FROM test WHERE id > $1 AND id < $2 GROUP BY foo; EXPLAIN ANALYZE EXECUTE query(100, 200); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------- HashAggregate (cost=39.53..39.53 rows=1 width=8) (actual time=0.661..0.672 rows=7 loops=1) -> Index Scan using test_pkey on test (cost=0.00..32.97 rows=1311 width=8) (actual time=0.050..0.395 rows=99 loops=1) Index Cond: ((id > $1) AND (id < $2)) Total runtime: 0.851 ms (4 rows) Of course, the specific numbers shown here depend on the actual contents of the tables involved. Also note that the numbers, and even the selected query strategy, might vary between PostgreSQL releases due to planner improvements. In addition, the ANALYZE command uses random sampling to estimate data statistics; therefore, it is possible for cost estimates to change after a fresh run of ANALYZE, even if the actual distribution of data in the table has not changed. COMPATIBILITY
There is no EXPLAIN statement defined in the SQL standard. SEE ALSO
ANALYZE [analyze(7)] SQL - Language Statements 2010-05-14 EXPLAIN(7)
Man Page