Sort & Split records in a file
Post 302221608 by Sunitha_edi82 on Tuesday 5th of August 2008, 12:45:33 AM
Hi,

Thanks for the reply.

I have written the script like this:

#!/usr/bin/ksh
#############################################################################
#----------------------------------------------------------------------------
# Check for the input files.
#----------------------------------------------------------------------------
cd /app/chdata/workflow/suppl/esoutput/spd/flatfile
# Sort the file, then route each record to an output file named after its
# first two characters. The target of awk's >> has to be a quoted string or
# a variable, not a bare path, so the name is built into "file" first.
sort /app/chdata/workflow/suppl/esoutput/spd/flatfile/testfile | awk '{ file = substr($0, 1, 2) "_out.txt"; print >> file }'

However, when I tried executing it, it says "cannot execute".
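From what I understand, "cannot execute" usually means the shell could not run the script file itself, either because the execute bit is not set or because /usr/bin/ksh is not at the path named on the #! line, rather than anything inside the script. Assuming the script was saved as sortsplit.ksh (a made-up name here), either of these should get it started:

  chmod +x sortsplit.ksh    # set the execute bit once, then run ./sortsplit.ksh
  ksh sortsplit.ksh         # or hand the file to ksh directly; no execute bit needed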

Also, is my script right?

If not, please correct me.

Regards,
Sunitha
 

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

How to split multiple records file in n files

Hello, each record has a length of 7 characters. I have 2 types of records, 010 and 011, and there is no end-of-line character. For example, my file looks like this: 010hello 010bonjour011both 011sisters. I would like to have 2 files: 010.txt (2 records) hello bonjour and ... (1 Reply)
Discussion started by: jeuffeu
1 Reply
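For that one, fold can re-introduce line breaks at a fixed width and awk can then bucket each record by its type prefix. A minimal sketch, assuming the record width really is 7 bytes as stated (the sample looks wider, so adjust -b accordingly), a 3-character type, and inputfile as a placeholder name:

  # cut the newline-less stream into fixed-width records, then split by type
  fold -b 7 inputfile | awk '{ print substr($0, 4) >> (substr($0, 1, 3) ".txt") }'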

2. Shell Programming and Scripting

sort a file which has 3.7 million records

hi, I'm trying to sort a file which has 3.7 million records and getting the following error... any help is appreciated... sort: Write error while merging. Thanks (6 Replies)
Discussion started by: greenworld
6 Replies
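"Write error while merging" almost always means sort ran out of room for its temporary merge files, not that anything is wrong with the data. Pointing sort at a filesystem with enough free space is the usual cure (/bigtmp is a placeholder path):

  TMPDIR=/bigtmp sort -o outfile infile    # most sort implementations honor TMPDIR
  sort -T /bigtmp -o outfile infile        # or use -T where the sort supports it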

3. Shell Programming and Scripting

sort and split file by 2 cols (1 col after the other)

Dear All, I am a newbie to shell scripting so this one is really over my head. I have a text file with five fields as below: 76576.867188 6232.454102 2.008904 55.000000 3 76576.867188 6232.454102 3.607231 55.000000 4 76576.867188 6232.454102 1.555146 65.000000 3 76576.867188 6232.454102... (19 Replies)
Discussion started by: Ghetz
19 Replies
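sort handles the "one column after the other" part with stacked -k keys, and awk can then split on whichever column should drive the file names. A minimal sketch, assuming the sort keys are columns 4 and 5 and the split is on column 4 (all of that is guesswork from the truncated post):

  sort -k4,4n -k5,5n file.txt | awk '{ print >> ($4 ".txt") }'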

4. Shell Programming and Scripting

Split a single record to multiple records & add folder name to each line

Hi Gurus, I need to cut a single record in the file (asdf) into multiple records based on the number of bytes (44 characters), so every record will have 44 characters. All the records should be in the same file. To each of these lines I need to add the folder (<date>) name. I have a dir. in which... (20 Replies)
Discussion started by: ram2581
20 Replies
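Here fold does the fixed-width cutting and awk appends the directory name. A sketch under the stated assumptions (44-byte records, script run from inside the <date> folder; asdf.split is an invented output name):

  # cut into 44-byte lines, then append the enclosing directory's name to each
  fold -b 44 asdf | awk -v d="$(basename "$PWD")" '{ print $0, d }' > asdf.split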

5. Shell Programming and Scripting

Sort a the file & refine data column & row format

cat file1.txt
field1 "user1": field2:"data-cde" field3:"data-pqr" field4:"data-mno"
field1 "user1": field2:"data-dcb" field3:"data-mxz" field4:"data-zul"
field1 "user2": field2:"data-cqz" field3:"data-xoq" field4:"data-pos"
Now I need to have the data like below. I have just... (7 Replies)
Discussion started by: ckaramsetty
7 Replies

6. Shell Programming and Scripting

How to read records in a file and sort it?

I have a file which has a number of pipe-delimited records. I am able to read the records... but I want to sort them after reading. i=0 while IFS="|" read -r usrId dataOwn expire email group secProf startDt endDt smhRole RoleCat DataProf SysRole MesgRole SearchProf do print $usrId $dataOwn... (4 Replies)
Discussion started by: harish468
4 Replies
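Since the while/read loop consumes standard input, the simplest place to sort is in front of it. A sketch assuming the sort key is the first pipe-delimited field, usrId (the post does not say which field matters):

  sort -t'|' -k1,1 inputfile |
  while IFS="|" read -r usrId dataOwn expire email group secProf rest
  do
      print "$usrId" "$dataOwn"
  done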

7. Shell Programming and Scripting

Split file based on records

I have to split a file based on number of lines, and the below command works fine: split -l 2 Inputfile -d Outputfile. My input file contains header, detail and trailer info as below: H D D D D T. My split files for the above command contain: First File: H D. Second File: ... (11 Replies)
Discussion started by: Ajay Venkatesan
11 Replies
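split by itself cannot treat the H and T lines specially, but stripping them first and stitching the header back onto each piece afterwards is straightforward. A sketch of one common variant of this requirement (dropping the trailer and carrying the header into every chunk is an assumption; part_ is an invented prefix):

  header=$(head -n 1 Inputfile)
  sed '1d;$d' Inputfile | split -l 2 - part_
  for f in part_*
  do
      { printf '%s\n' "$header"; cat "$f"; } > "$f.new"   # header + detail lines
  done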

8. Shell Programming and Scripting

Split a large file in n records and skip a particular record

Hello All, I have a large file, more than 50,000 lines, and I want to split it into even 5000-record chunks, which I can do using sed '1d;$d;' <filename> | awk 'NR%5000==1{x="F"++i;}{print > x}'. Now I need to add one more condition, which is not to break the file at the 5000th record if the 5000th record... (20 Replies)
Discussion started by: ibmtech
20 Replies
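The usual trick for "do not break here" rules is to make the file switch lazy: mark that a switch is due at the 5000-line boundary, but only perform it once the current record is safe to start a new file on. Since the post's exact rule is cut off, /SAFE/ below is only a placeholder pattern:

  sed '1d;$d' file | awk '
      NR == 1                  { x = "F" ++i }         # open the first chunk
      NR % 5000 == 1 && NR > 1 { pending = 1 }         # a switch is now due
      pending && /SAFE/        { x = "F" ++i; pending = 0 }
                               { print > x }'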

9. Shell Programming and Scripting

Want to grep records in alphabetical order from a file and split into other files

Hi All, I have one file containing thousands of table names in a single column. Now I want that file split into multiple files, e.g. one file containing table names starting from A, another containing all tables starting from B... and so on, till Z. I tried the below but it did not work. for i in... (6 Replies)
Discussion started by: shekhar_4_u
6 Replies
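No 26-iteration loop is needed; awk can route every table name to its bucket in a single pass. A minimal sketch (tables.txt is a placeholder; toupper normalizes names that start in lower case, and the NF guard skips blank lines):

  awk 'NF { print > (toupper(substr($0, 1, 1)) ".txt") }' tables.txt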

10. Shell Programming and Scripting

File Move & Sort by Name - Kick out Bad File Names & More

I have a dilemma: we have users who are copying files to "directory 1." These images have file names which include the year each was taken. I need to put together a script to do the following: examine the file naming convention, ensuring it's the proper format (e.g. test-1983_filename-123.tif)... (8 Replies)
Discussion started by: Nvizn
8 Replies
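A case pattern can do the naming-convention check before anything is moved. Only a sketch: the pattern below is reverse-engineered from the single example name test-1983_filename-123.tif, and good/ and bad/ are invented destination folders:

  for f in "directory 1"/*.tif
  do
      case ${f##*/} in
          *-[0-9][0-9][0-9][0-9]_*-[0-9]*.tif) mv "$f" good/ ;;   # matches the convention
          *)                                   mv "$f" bad/  ;;   # kick out bad names
      esac
  done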
PEGASUS-ANALYZER(1)													       PEGASUS-ANALYZER(1)

NAME
       pegasus-analyzer - debugs a workflow.

SYNOPSIS
       pegasus-analyzer [--help|-h] [--quiet|-q] [--strict|-s]
                        [--monitord|-m|-t] [--verbose|-v]
                        [--output-dir|-o output_dir] [--dag dag_filename]
                        [--dir|-d|-i input_dir] [--print|-p print_options]
                        [--debug-job job] [--debug-dir debug_dir]
                        [--type workflow_type] [--conf|-c property_file]
                        [--files] [--top-dir dir_name]
                        [workflow_directory]

DESCRIPTION
       pegasus-analyzer is a command-line utility for parsing the
       jobstate.log file and reporting successful and failed jobs. When
       executed without any options, it will query the SQLite or MySQL
       database and retrieve failed job information for the particular
       workflow. When invoked with the --files option, it will retrieve
       information from several log files, isolating jobs that did not
       complete successfully, and printing their stdout and stderr so that
       users can get detailed information about their workflow runs.

OPTIONS
       -h, --help
              Prints a usage summary with all the available command-line
              options.

       -q, --quiet
              Only print the output and error filenames instead of their
              contents.

       -s, --strict
              Get jobs' output and error filenames from the job's submit
              file.

       -m, -t, --monitord
              Invoke pegasus-monitord before analyzing the jobstate.log
              file. Although pegasus-analyzer can be executed during the
              workflow execution as well as after the workflow has already
              completed execution, pegasus-monitord is always invoked with
              the --replay option. Since multiple instances of
              pegasus-monitord should not be executed simultaneously in
              the same workflow directory, the user should ensure that no
              other instances of pegasus-monitord are running. If the
              run_directory is writable, pegasus-analyzer will create a
              jobstate.log file there, rotating an older log if it is
              found. If the run_directory is not writable (e.g. when the
              user debugging the workflow is not the same user that ran
              the workflow), pegasus-analyzer will exit and ask the user
              to provide the --output-dir option, in order to provide an
              alternative location for pegasus-monitord log files.

       -v, --verbose
              Sets the log level for pegasus-analyzer. If omitted, the
              default level will be set to WARNING. When this option is
              given, the log level is changed to INFO. If this option is
              repeated, the log level will be changed to DEBUG.

       -o output_dir, --output-dir output_dir
              This option provides an alternative location for all
              monitoring log files for a particular workflow. It is mainly
              used when a user does not have write privileges to a
              workflow directory and needs to generate the log files
              needed by pegasus-analyzer. If this option is used in
              conjunction with the --monitord option, it will invoke
              pegasus-monitord using output_dir to store all output files.
              Because workflows can have sub-workflows, pegasus-monitord
              will create its files prepending the workflow wf_uuid to
              each filename. This way, multiple workflow files can be
              stored in the same directory. pegasus-analyzer has built-in
              logic to find the specific jobstate.log file by looking at
              the workflow braindump.txt file first and figuring out the
              corresponding wf_uuid. If output_dir does not exist, it will
              be created.

       --dag dag_filename
              In this option, dag_filename specifies the path to the DAG
              file to use. pegasus-analyzer will get the directory
              information from the dag_filename. This option overrides the
              --dir option below.

       -d input_dir, -i input_dir, --dir input_dir
              Makes pegasus-analyzer look for the jobstate.log file in the
              input_dir directory. If this option is omitted,
              pegasus-analyzer will look in the current directory.

       -p print_options, --print print_options
              Tells pegasus-analyzer what extra information it should
              print for failed jobs. print_options is a comma-delimited
              list of options that include pre, invocation, and/or all,
              which activates all printing options. With the pre option,
              pegasus-analyzer will print the pre-script information for
              failed jobs. For the invocation option, pegasus-analyzer
              will print the invocation command, so users can manually run
              the failed job.

       --debug-job job
              When given this option, pegasus-analyzer turns on its
              debug_mode, which can be used to debug a particular job. In
              this mode, pegasus-analyzer will create a shell script in
              the debug_dir (see below for specifying it), copy all
              necessary files to this local directory, and then execute
              the job locally.

       --debug-dir debug_dir
              When in debug_mode, pegasus-analyzer will create a temporary
              debug directory. Users can give this option in order to
              specify a particular debug_dir directory to be used instead.

       --type workflow_type
              In this option, users specify what workflow_type they want
              to debug. At this moment, the only workflow_type available
              is condor, and it is the default value if this option is not
              specified.

       -c property_file, --conf property_file
              This option is used to specify an alternative property file,
              which may contain the path to the database to be used by
              pegasus-analyzer. If this option is not specified, the
              config file specified in the braindump.txt file will take
              precedence.

       --files
              This option allows users to run pegasus-analyzer using the
              files in the workflow directory instead of the database as
              the source of information. pegasus-analyzer will output the
              same information; this option only changes where the data
              comes from.

       --top-dir dir_name
              This option enables pegasus-analyzer to show information
              about sub-workflows when using the database mode. When
              debugging a top-level workflow with failures in
              sub-workflows, the analyzer will automatically print the
              command users should use to debug a failed sub-workflow.
              This allows the analyzer to find the database it needs to
              access.

ENVIRONMENT VARIABLES
       pegasus-analyzer does not require that any environmental variables
       be set. It locates its required Python modules based on its own
       location, and therefore should not be moved outside of Pegasus' bin
       directory.

EXAMPLE
       The simplest way to use pegasus-analyzer is to go to the
       run_directory and invoke the analyzer:

              $ pegasus-analyzer .

       which will cause pegasus-analyzer to print information about the
       workflow in the current directory.

       pegasus-analyzer output contains a summary, followed by detailed
       information about each job that either failed, or is in an unknown
       state. Here is the summary section of the output:

              **************************Summary***************************

               Total jobs         :     75 (100.00%)
               # jobs succeeded   :     41 (54.67%)
               # jobs failed      :      0 (0.00%)
               # jobs unsubmitted :     33 (44.00%)
               # jobs unknown     :      1 (1.33%)

       jobs_succeeded are jobs that have completed successfully.
       jobs_failed are jobs that have finished, but that did not complete
       successfully. jobs_unsubmitted are jobs that are listed in the
       dag_file, but no information about them was found in the
       jobstate.log file. Finally, jobs_unknown are jobs that have
       started, but have not reached completion.

       After the summary section, pegasus-analyzer will display
       information about each job in the job_failed and job_unknown
       categories.

              ******************Failed jobs' details**********************

              =======================findrange_j3=========================

               last state: POST_SCRIPT_FAILURE
                     site: local
              submit file: /home/user/diamond-submit/findrange_j3.sub
              output file: /home/user/diamond-submit/findrange_j3.out.000
               error file: /home/user/diamond-submit/findrange_j3.err.000

              --------------------Task #1 - Summary-----------------------

              site        : local
              hostname    : server-machine.domain.com
              executable  : (null)
              arguments   : -a findrange -T 60 -i f.b2 -o f.c2
              error       : 2
              working dir :

       In the example above, the findrange_j3 job has failed, and the
       analyzer displays information about the job, showing that the job
       finished with a POST_SCRIPT_FAILURE, and lists the submit, output
       and error files for this job. Whenever pegasus-analyzer detects
       that the output file contains a kickstart record, it will display
       the breakdown containing each task in the job (in this case we
       only have one task). Because pegasus-analyzer was not invoked with
       the --quiet flag, it will also display the contents of the output
       and error files (or the stdout and stderr sections of the
       kickstart record), which in this case are both empty.

       In the case of SUBDAG and subdax jobs, pegasus-analyzer will
       indicate it, and show the command needed for the user to debug
       that sub-workflow. For example:

              =================subdax_black_ID000009=====================

               last state: JOB_FAILURE
                     site: local
              submit file: /home/user/run1/subdax_black_ID000009.sub
              output file: /home/user/run1/subdax_black_ID000009.out
               error file: /home/user/run1/subdax_black_ID000009.err
               This job contains sub workflows!
               Please run the command below for more information:
               pegasus-analyzer -d /home/user/run1/blackdiamond_ID000009.000

              -----------------subdax_black_ID000009.out-----------------

              Executing condor dagman ...

              -----------------subdax_black_ID000009.err-----------------

       This tells the user the subdax_black_ID000009 sub-workflow failed,
       and that it can be debugged by using the indicated pegasus-analyzer
       command.
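       Other options compose in the same way. For instance, to analyze a
       run directory from its files alone while also printing pre-script
       and invocation details for failed jobs (the path below is only
       illustrative):

              $ pegasus-analyzer --files -p pre,invocation /home/user/run1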
SEE ALSO
       pegasus-status(1), pegasus-monitord(1), pegasus-statistics(1).

AUTHORS
       Fabio Silva <fabio at isi dot edu>
       Karan Vahi <vahi at isi dot edu>

       Pegasus Team http://pegasus.isi.edu

05/24/2012                                                PEGASUS-ANALYZER(1)