UNIX for Dummies Questions & Answers: Merging tables: identifying common and unique elements
Post 302822491 by lsantome on Monday 17th of June 2013 05:09:20 PM
Hi MadeInGermany, thank you for your quick reply!

A1: Yes, every table is contained in a single file. I merge them two by two, based on their filename (pattern) with the following code:

Code:
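# Collect the unique sample prefixes (the filename part before the first "_")
for sample in $(for file in *_*.tab; do echo "${file%%_*}"; done | sort -u); do
    # Merge this sample's input files; the "_" in the glob keeps the
    # output file $sample.tab from being read back in on a rerun
    cat "$sample"_*.tab \
    | cut -f1-33 \
    | sort -u -k2,2 \
    > "$sample.tab"
done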

Explanation:
- The pattern defines which files are going to be merged
- Open files and select columns 1 to 33
- Sort rows by column 2, keeping only one row per distinct column-2 value
- Create an output file based on the pattern used in step one.
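
The four steps above can be tried on made-up data. This is only a sketch: the file names (s1_a.tab, s1_b.tab, s2_a.tab), their contents and the scratch directory are hypothetical, and the input glob is written as "_*.tab" so that an existing output file is not picked up as input.

```shell
set -e
workdir=$(mktemp -d)   # scratch directory for the made-up input files

# Hypothetical inputs: two tables for sample "s1", one for sample "s2" (tab-separated)
printf 'r1\tgeneA\nr2\tgeneB\n' > "$workdir/s1_a.tab"
printf 'r3\tgeneB\nr4\tgeneC\n' > "$workdir/s1_b.tab"
printf 'r5\tgeneD\n'            > "$workdir/s2_a.tab"

# Step 1: the filename prefix before the first "_" names each sample
samples=$(for file in "$workdir"/*_*.tab; do
    name=${file##*/}        # strip the directory part
    echo "${name%%_*}"      # keep the text before the first "_"
done | sort -u)

# Steps 2-4: per sample, concatenate, select columns 1-33, deduplicate on column 2
for sample in $samples; do
    cat "$workdir/$sample"_*.tab | cut -f1-33 | sort -u -k2,2 > "$workdir/$sample.tab"
done

cut -f2 "$workdir/s1.tab"   # show column 2 of the merged s1 table
```

The last command should list geneA, geneB and geneC once each, because geneB occurs in both s1 inputs and only one of those rows survives. Which of the two geneB rows is kept is left to sort, so the sketch does not rely on it. Note also that sort splits fields on blanks by default; if a column can contain spaces, giving sort a tab as field separator with -t may be needed.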

A2: No, identical lines do not have the same line number.

Thank you again

Best,

lsantome
 
