Concatenate and sort to remove duplicates
Post 303027508 by bakunin on Monday 17th of December 2018 11:59:19 AM
How about using sed instead of paste to pre-process the file? We would first join each block into a single line, then transform the lines back into blocks again after running them through uniq. Here is a naive try which might need refinement:

Transform the blocks to lines:
(Edited: see RudiC's post, which uses the same idea. Basically, you replace all newline characters inside a block with a temporary replacement character to get one line. RudiC used "\r", but you can use any other string as well, as long as it does not occur in the data.)
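A minimal sketch of that joining step, not the original code: it assumes the blocks are separated by blank lines, uses awk's paragraph mode rather than sed, and picks "|" as a purely hypothetical replacement character. The sample data is made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample input: two identical blocks and one different block,
# separated by blank lines (an assumption about the file layout).
input=$(mktemp)
printf 'a\nb\n\nc\nd\n\na\nb\n' > "$input"

# Join every block into one line. RS="" switches awk to paragraph mode,
# FS="\n" makes each line of a block one field, and $1 = $1 forces awk to
# rebuild the record with OFS ("|" here, an arbitrary placeholder that
# must not occur in the data).
joined=$(awk 'BEGIN { RS = ""; FS = "\n"; OFS = "|" } { $1 = $1; print }' "$input")

printf '%s\n' "$joined"
rm -f "$input"
```

Each block ends up on a single line, so line-oriented tools can compare whole blocks.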

Or, even simpler, use fmt ("1000" has to be higher than the number of characters a resulting line could grow to; replace it with a higher number if it does not suffice). Notice, though, that transforming this back into blocks is a bit more effort because there is no replacement character marking the former newlines:
Code:
fmt -1000 /path/to/file > newfile

Transform the lines back to blocks (enter the ^M literally as an <ENTER>; sed expects a backslash directly before that literal newline):
Code:
sed 's/<replacement-for-newline>/\^M/g' /path/to/newfile > file
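As a rough sketch of the deduplicate-and-split step under the same assumptions ("|" as a hypothetical replacement character, blank lines between blocks, made-up sample data): sort -u is used here instead of plain uniq because uniq only drops adjacent duplicates.

```shell
#!/bin/sh
# Hypothetical one-line-per-block file, with "|" standing in for newlines.
newfile=$(mktemp)
printf 'a|b\nc|d\na|b\n' > "$newfile"

# sort -u drops the duplicate line. The sed script then turns every "|"
# back into a newline (note the backslash before the literal line break),
# and G appends an empty line after each block.
blocks=$(sort -u "$newfile" | sed -e 's/|/\
/g' -e 'G')

printf '%s\n' "$blocks"
rm -f "$newfile"
```

Keep in mind that sorting changes the order of the blocks; if the original order matters, a different deduplication step is needed.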

I hope this helps.

bakunin