Full Discussion: problem with join
Top Forums UNIX for Dummies Questions & Answers problem with join Post 302540124 by peanuts48 on Tuesday 19th of July 2011 04:34:28 PM
Problem with join

I want to join two files that have a lot of rows.
The file named gen1 has 2 columns:
Code:
head gen1

1008567 0.4026931012
1119535 0.7088912314
1120590 0.7093805634
1145994 0.7287952590
1148140 0.7313924434
1155173 0.7359550430
1188481 0.7598914553
1201155 0.7663406553
1206921 0.7706542068
1452629 1.0168528755


The file named gen2 has 3 columns:
Code:
head gen2

1008567 rs9442372 1
1119535 rs11260554 1
1120590 rs10907175 1
1145994 rs2887286 1
1148140 rs3813199 1
1155173 rs11260562 1
1188481 rs12563338 1
1201155 rs6685064 1
1206921 rs3753340 1
1452629 rs9439462 1

The first columns of the two files are the same, but if I try to join them, the output is not right!

Code:
 
join gen1 gen2 | head 
 
 rs9442372 126931012
 rs11260554 18912314
 rs10907175 13805634
 rs2887286 187952590
 rs3813199 113924434
 rs11260562 19550430
 rs12563338 18914553
 rs6685064 163406553
 rs3753340 106542068
 rs9439462 168528755

Does anyone know why join is working so strangely?

Thanks in advance!
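
A possible explanation, offered as a guess from the join output shown in the post rather than a confirmed diagnosis: the joined lines look like what a terminal shows when each gen1 line ends in a carriage return (Windows-style CRLF line endings). The CR moves the cursor back to column 1, so the gen2 fields are printed over the start of the gen1 fields. For example, ` rs9442372 1` printed over `1008567 0.4026931012` leaves ` rs9442372 126931012`. Below is a minimal sketch to check for this and work around it, using two-line stand-ins for gen1 and gen2 in a temporary directory. The names `workdir`, `cr_lines`, `joined`, and `gen1.unix` are made up for illustration.

```shell
# Hypothetical reproduction: gen1 gets CRLF line endings, gen2 plain LF.
workdir=$(mktemp -d)
printf '1008567 0.4026931012\r\n1119535 0.7088912314\r\n' > "$workdir/gen1"
printf '1008567 rs9442372 1\n1119535 rs11260554 1\n' > "$workdir/gen2"

# Count the lines in gen1 that contain a carriage return.
# A non-zero count suggests CRLF line endings.
cr_lines=$(grep -c "$(printf '\r')" "$workdir/gen1")
echo "lines with CR in gen1: $cr_lines"

# Strip the carriage returns, then join on the first field as before.
tr -d '\r' < "$workdir/gen1" > "$workdir/gen1.unix"
joined=$(join "$workdir/gen1.unix" "$workdir/gen2")
echo "$joined"
```

If the count is 0 for the real gen1, carriage returns are not the cause. join also expects both inputs to be sorted on the join field, so that is worth checking as well.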
 
