SAS or SSD for Ubuntu 14.04 and data analysis


 
# 1  
Old 02-11-2016

I am in the process of building a workstation and have a question related to performance. I am a scientist who deals with big data (average file size 30-50 GB). My OS is Ubuntu 14.04, and so far I have a dual Xeon E5-2630 system (6 cores each) with 128 GB of memory. I/O buffering is an issue, so I am adding a 256/512? PCIe card and either 2 SSD or SAS drives for the OS and software. Since the PCIe card will be separate, its main purpose will be file transfer, so would SAS or SSD be a better fit for the OS? I am leaning towards SAS because of the buffering issue, but wanted to ask more knowledgeable users. I forgot to mention that there will also be a separate 1 or 2 TB drive. Any recommendations for the size of the SAS or SSD drives? Thanks.
# 2  
Old 02-12-2016
Do you access your data files randomly or sequentially?
# 3  
Old 02-12-2016
I access files sequentially. Thank you.
# 4  
Old 02-12-2016
SSDs are an order of magnitude or more faster than high-rpm SAS disks.
SSD capacity is limited, usually to 1-2 TB. With 128 GB of memory, you could easily load whatever file you want from the SSDs into memory; the usual term for this is a RAM disk. Ubuntu supports this. It also caches files very effectively without much human intervention other than configuration.
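A rough sketch of the RAM disk idea in Python, for illustration only. It assumes a tmpfs mount at /dev/shm (the usual default on Ubuntu, commonly sized at about half of physical memory); the function name `stage_in_ram` and the `headroom` parameter are made up for this example and are not part of any tool mentioned in this thread.

```python
import os
import shutil


def stage_in_ram(src_path, ram_dir="/dev/shm", headroom=0.2):
    """Copy src_path into a RAM-backed directory if it fits.

    ram_dir is assumed to be a tmpfs mount (an assumption, not checked here).
    headroom is the fraction of free space to leave untouched.
    Returns the path of the copy, or None if the file would not fit.
    """
    size = os.path.getsize(src_path)
    free = shutil.disk_usage(ram_dir).free
    # Refuse to fill the RAM disk completely: tmpfs pages compete with
    # the page cache and with the analysis process itself.
    if size > free * (1.0 - headroom):
        return None
    dst_path = os.path.join(ram_dir, os.path.basename(src_path))
    shutil.copyfile(src_path, dst_path)
    return dst_path
```

With 30-50 GB files and 128 GB of memory, one file at a time should fit comfortably, but the copy in tmpfs has to be deleted by hand once the analysis is done, otherwise it keeps occupying memory.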

Learn about pdflush: The Linux Page Cache and pdflush

There is also vmtouch, which lets you force any file to be read entirely into memory. That would definitely favor SSD.

https://hoytech.com/vmtouch/ Also note some other tools on that site.
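If vmtouch is not installed, a similar warm-up effect can be approximated by simply reading the file from start to end so the kernel keeps its pages in the page cache. The Python sketch below does that; it is not how vmtouch itself is implemented, and the name `warm_page_cache` and the 64 MiB chunk size are arbitrary choices made for this example.

```python
import os


def warm_page_cache(path, chunk_size=64 * 1024 * 1024):
    """Read path sequentially so the kernel caches its pages.

    Returns the number of bytes read. Whether the pages stay cached
    depends on how much free memory the system has afterwards.
    """
    total = 0
    # buffering=0 avoids a second, Python-level buffer on top of the
    # kernel page cache.
    with open(path, "rb", buffering=0) as f:
        # Optional hint that access will be sequential; only available
        # on platforms that provide posix_fadvise.
        if hasattr(os, "posix_fadvise"):
            os.posix_fadvise(f.fileno(), 0, 0, os.POSIX_FADV_SEQUENTIAL)
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total
```

Calling `warm_page_cache` on a data file before starting the analysis means the first pass reads at SSD speed and later passes can be served from memory, provided nothing else pushes the pages out of the cache in the meantime.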

So, I would suggest: SSDs and vmtouch (or an analogous tool).
This User Gave Thanks to jim mcnamara For This Post:
# 5  
Old 02-12-2016
You might also consider an mSATA SSD:
Samsung SSD 840 EVO mSATA | Samsung SSD
This User Gave Thanks to jgt For This Post:
# 6  
Old 02-13-2016
Not directly related, but I had a long workshop yesterday about our new storage system (EMC VMax 200k). EMC claims that they had intended the 300 GB 15k SAS drives for high-performance use but are phasing them out now because (quoting from memory) with the development of flash SSDs it's just not worth it any more. They also claim that, because they use SLC-based hardware, they see even lower disk-replacement rates than with rotational disks, even in heavy-duty transactional storage systems. The much lower energy consumption of the SSDs compared to the 15k SAS disks contributes to this. There is simply less heat involved, and that shows when you pack some ~2500 disks into a rack.

You haven't said where you are going to place the workstation, but in case it is going to be somewhere near your desk: 15k disks are awfully LOUD, in addition to being premier heating devices, while SSDs are completely silent.

I hope this helps.

bakunin
These 2 Users Gave Thanks to bakunin For This Post:
# 7  
Old 02-13-2016
Thank you all.