memcpy error

10-20-2011

Quote:

Originally Posted by Corona688

... which did a good job of confusing me to your point. I still don't understand what your benchmarks are supposed to prove. Luckily it doesn't matter.

Re-reading my post (#11), I see how it can be misunderstood. I did not intend for the first paragraph to have anything to do with the rest of the post. I should have indicated that clearly (either with language or formatting) or I should have made it a separate post.

When the second paragraph begins, "Even so, memset() is largely irrelevant in this scenario," the scenario I'm referring to has nothing to do with the memset "benchmark" in the immediately preceding opening paragraph. I was referring instead to what had been the topic of the thread at that point: a single very large allocation, and how calloc() handles it in today's open source systems. (I'm curious whether the proprietary Unices behave similarly. I assume so, but I have no specific information.)

The memset benchmark isn't intended to prove anything except that memset-ing 2 GB takes on the order of a fraction of a second rather than a minute or an hour. Nothing more. As far as benchmarks go, it wasn't a particularly ambitious one.

Regards and apologies for the confusion,
Alister

Last edited by alister; 10-20-2011 at 10:12 AM.

alister

10-20-2011

Thanks for the explanation, and sorry for being dense in my initial reading of it.

Quote:

Originally Posted by alister

The memset benchmark isn't intended to prove anything except that memset-ing 2 GB takes on the order of a fraction of a second rather than a minute or an hour. Nothing more.

Memsetting one gig of RAM takes 1.4 seconds for me. Imagine the amount of actual work that computer could've done in that time instead. Congratulations on your fast computer though.

Corona688

10-20-2011

Quote:

Originally Posted by Corona688

Thanks for the explanation, and sorry for being dense in my initial reading of it. Memsetting one gig of RAM takes 1.4 seconds for me. Imagine the amount of actual work that computer could've done in that time instead. Congratulations on your fast computer though.

Yeah man, honestly I don't know what he's talking about with his benchmark. I just ran my own quick test comparing calloc against the malloc -> memset sequence, and they are darn near identical. That makes sense, because calloc has to bring in pages, and once you touch a malloc'd page it has to be pulled in too. Worse, my laptop only has a gig of RAM, so deity forbid I attempt to calloc (or malloc and memset) a gig; I'll start swapping! But, unsurprisingly, I can call malloc for a gig of RAM and it'll return immediately. So long as I don't touch the pages, it never slows down.
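The observation that a bare malloc returns immediately, while the first write to each page is what costs time, can be sketched as follows. This is a minimal illustration, assuming a POSIX system where `sysconf(_SC_PAGESIZE)` reports the page size and the kernel backs anonymous memory lazily; the helper names `page_size`, `touch_each_page`, and `malloc_then_touch` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

/* Page size as reported by the system; 4 KiB is an assumed fallback. */
static size_t page_size(void)
{
    long pg = sysconf(_SC_PAGESIZE);
    return pg > 0 ? (size_t) pg : 4096;
}

/* Writes one byte at every page-sized stride of [buf, buf + len). The first
 * write into an untouched page is what forces the kernel to back it with
 * physical memory. Returns the number of writes (len / page, rounded up). */
size_t touch_each_page(char *buf, size_t len)
{
    size_t page = page_size();
    size_t touched = 0;

    assert(buf != NULL || len == 0);
    for (size_t off = 0; off < len; off += page) {
        buf[off] = 1;
        touched++;
    }
    return touched;
}

/* mallocs len bytes (cheap on a lazily allocating system: mostly just address
 * space), then touches it page by page. Returns the number of writes, or 0 if
 * len is 0 or the allocation failed. */
size_t malloc_then_touch(size_t len)
{
    if (len == 0)
        return 0;
    char *buf = malloc(len);
    if (buf == NULL)
        return 0;
    size_t n = touch_each_page(buf, len);
    free(buf);
    return n;
}
```

Timing `malloc_then_touch` against a plain `malloc` of the same size shows where the cost lands on a given system; the loop is essentially what the `-rm p` page test in the benchmark later in the thread does.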

Point is, calloc is different than malloc and shouldn't be used unless you need all your memory zero'd. And really, what application does? Most mallocs are followed by something "useful" like a memcpy, or by filling the malloc'd memory with useful data. Further, you most certainly wouldn't use calloc for a sparse array; that'd just be crazy.

P.S. here's uname -a

Code:

Linux laptop 2.6.32-34-generic #77-Ubuntu SMP Tue Sep 13 19:40:53 UTC 2011 i686 GNU/Linux

edit: I just ran it on a work machine: malloc followed by memset for 1 GB and calloc for 1 GB were also identical, at about 2 seconds. This is on a P570 frame with 12 GB of memory and 2 CPUs allocated. So...I'd love to know what computer does it in "fractions of a second".

edit2: I suppose I also stirred this up by saying "I'll wait", as if to imply it'd take ages. But in computer terms, 2 seconds is "ages". Plus, if you had to swap, it'd really be "I'll wait": on my poor laptop with 1 GB of RAM, asking it to memset (or calloc) a gig started it swapping; the music playing in the background began to skip and the hard drive started working as memory was paged out to disk. It was BAD, lol. After I killed the process, it still took about 10 seconds for the poor thing to recover, and my music player hung and wouldn't come back, so I had to kill it too, lol. Fortunately, the same program without the memset (just the malloc) ran and finished immediately, because it only added address space to the process, never a physical page, and so never did any actual work. Hence the BIG difference between malloc and calloc that started this whole off-topic thread.

Last edited by DreamWarrior; 10-20-2011 at 09:26 PM.

DreamWarrior

10-20-2011

Quote:

Originally Posted by DreamWarrior

Point is, calloc is different than malloc and shouldn't be used unless you need all your memory zero'd.

alister explained why this is irrelevant for large amounts of memory: 1) calloc doesn't bother zeroing it, because 2) it maps the memory in with mmap instead, meaning 3) the kernel zeroes it for you at the time of paging in and not before.
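The mmap point can be checked directly: a fresh anonymous mapping is handed out zero-filled by the kernel, so nothing needs to memset it. A minimal sketch, assuming Linux or a BSD where `MAP_ANONYMOUS` is available; `map_zeroed` and `all_zero` are hypothetical helper names, not part of any C library.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS with glibc in strict modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Returns len bytes of fresh anonymous memory, or NULL on failure.
 * The kernel supplies these pages zero-filled, which is why a calloc that
 * obtains large blocks this way can skip memset entirely. */
void *map_zeroed(size_t len)
{
    assert(len > 0);   /* mmap rejects a zero length */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Returns 1 if every byte in [p, p + len) is zero. Note that scanning the
 * region still faults its pages in; it just never has to write zeroes. */
int all_zero(const unsigned char *p, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (p[i] != 0)
            return 0;
    return 1;
}
```

Release such a region with `munmap(p, len)`, not `free`.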

Corona688

10-21-2011

Quote:

Originally Posted by Corona688

Seems that's not true for either of the kernels I tested with. Both took a performance hit identical to malloc+memset (which means they both bring in pages). In fact, I'd bet the zeroing itself is not the problem; creating the physical pages is. Either way, on every machine (three so far) where I've run a quick calloc(1gb, 1) versus malloc(1gb) -> memset(p, 0, 1gb) comparison, the two are indistinguishable as far as performance is concerned. Both lose badly to a plain malloc(1gb), which is instantaneous.

Regardless...I'll stick to my guns, calloc is pointless; use malloc and initialize the memory in-situ as appropriate afterwards.

DreamWarrior

10-21-2011

Quote:

Originally Posted by DreamWarrior

...calloc has to bring in pages...

No. It does not. It may, but nothing requires it. Small allocations may be handled by already resident pages. Large allocations are mmap'd, and since those pages will be zeroed by the kernel, calloc doesn't need to touch them. None of the callocs in any of the standard C libraries used by the popular open source Unix flavors (I looked at Linux/glibc, FreeBSD, NetBSD, and OpenBSD) will call memset to zero a page that the kernel will already have zeroed before making it available to the process.

The most obvious explanation for why your system shows no difference between malloc+memset and calloc is that your C library's calloc is naive. Or perhaps your code is flawed. Or perhaps your kernel's VM subsystem is prefaulting for some reason. Or perhaps your system's environment has enabled malloc/calloc options which affect their behavior (such as filling the allocation with "junk" or zeroes). Perhaps one of the bazillion Linux kernel compile options is to blame. If it were my system, I'd look into it just to satisfy my curiosity.
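One way to check the "kernel is prefaulting" possibility is to count page faults around the calloc call itself. A rough diagnostic sketch, assuming a system whose `getrusage` fills in `ru_minflt` (Linux and the BSDs do); `minor_faults` and `faults_during_calloc` are hypothetical names. A delta near `len / page size` suggests every page was touched inside calloc, while a small delta suggests the pages were left alone. Transparent huge pages or similar kernel features can shrink the count, so treat it as a hint rather than proof.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/resource.h>

/* Minor (no-I/O) page faults taken so far by this process, or -1 on error. */
long minor_faults(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_minflt;
}

/* Returns how many minor faults happened inside calloc(1, len) itself,
 * or -1 if the allocation failed. */
long faults_during_calloc(size_t len)
{
    long before = minor_faults();
    void *p = calloc(1, len);
    long after = minor_faults();

    assert(after >= before);   /* the counter only ever grows */
    if (p == NULL)
        return -1;
    long delta = after - before;
    free(p);
    return delta;
}
```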

Quote:

Point is, calloc is different than malloc and shouldn't be used unless you need all your memory zero'd.

Obviously. My point is only that under certain conditions malloc and calloc are practically identical (both will return zeroed memory without calling memset). See for yourself in the malloc.c source links I provided in an earlier post. You'll find that both are implemented using the same internal routines. Further, if you follow the code path for a large allocation, you'll see that a calloc never memsets (unless certain options which are disabled by default are enabled).
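The large-allocation code path described above can be approximated with a simplified calloc. This is only a sketch under stated assumptions: the 128 KiB cut-over stands in for glibc's default mmap threshold (which is tunable and adjusts dynamically), the `sketch_calloc`/`sketch_free` names and the header layout are invented for illustration, and a real allocator also recycles freed chunks, which is why small requests still need an explicit memset.

```c
#define _DEFAULT_SOURCE   /* MAP_ANONYMOUS with glibc */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

/* Assumed cut-over point, modelled on glibc's default mmap threshold. */
#define SKETCH_MMAP_THRESHOLD (128u * 1024u)

/* Bookkeeping stored just before the user pointer; max_align_t keeps the
 * returned pointer suitably aligned for any type. */
typedef union {
    struct { size_t len; int mmapped; } h;
    max_align_t align;
} sketch_hdr;

void *sketch_calloc(size_t nmemb, size_t size)
{
    /* calloc must fail, not wrap around, if nmemb * size overflows. */
    if (size != 0 && nmemb > SIZE_MAX / size) { errno = ENOMEM; return NULL; }
    size_t want = nmemb * size;
    if (want > SIZE_MAX - sizeof(sketch_hdr)) { errno = ENOMEM; return NULL; }
    size_t total = want + sizeof(sketch_hdr);
    sketch_hdr *hdr;

    if (total >= SKETCH_MMAP_THRESHOLD) {
        /* Fresh anonymous pages arrive zero-filled: no memset needed. */
        void *m = mmap(NULL, total, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (m == MAP_FAILED)
            return NULL;   /* errno set by mmap */
        hdr = m;
        hdr->h.mmapped = 1;
    } else {
        /* Heap memory may be recycled and hold old data, so zero it. */
        hdr = malloc(total);
        if (hdr == NULL)
            return NULL;
        memset(hdr, 0, total);
        hdr->h.mmapped = 0;
    }
    hdr->h.len = total;
    return hdr + 1;
}

void sketch_free(void *p)
{
    if (p == NULL)
        return;
    sketch_hdr *hdr = (sketch_hdr *) p - 1;
    assert(hdr->h.len >= sizeof *hdr);
    if (hdr->h.mmapped)
        munmap(hdr, hdr->h.len);
    else
        free(hdr);
}
```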

Quote:

I just ran it on a work machine: malloc followed by memset for 1 GB and calloc for 1 GB were also identical, at about 2 seconds. This is on a P570 frame with 12 GB of memory and 2 CPUs allocated. So...I'd love to know what computer does it in "fractions of a second".

I must retract my earlier quarter of a second figure. I cannot reproduce it. I must have misread the value. Perhaps it was 2.50s instead of 0.25s.

Here's some code and timings from OS X running on a 2.16 GHz Core2Duo Macbook with 2 GB of 667 MHz DDR2 (similar results were observed using NetBSD on similar hardware). Without any command line arguments, the executable will attempt to calloc 1 GiB. With command line arguments, it will malloc and memset 1 GiB:

Code:

$ cat large-calloc.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define ALLOCSIZE 1073741824U

int
main (int argc, char **argv) {
	void *vp;

	if (argc == 1) {
		if ((vp = calloc(1, ALLOCSIZE)) == NULL)
			return 1;
	}
	else {
		if ((vp = malloc(ALLOCSIZE)) == NULL)
			return 1;
		memset(vp, 0, ALLOCSIZE);
	}
	printf("%p\n", vp);
	return 0;
}
$ cc -Wall -pedantic large-calloc.c 
$ time ./a.out
0x2008000

real    0m0.005s    # calloc
user    0m0.001s
sys     0m0.004s
$ time ./a.out with-memset
0x2008000

real    0m2.124s    # malloc + memset
user    0m0.946s
sys     0m1.168s
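Rather than timing whole processes with `time`, the allocation step can be timed in-process with a monotonic clock. A minimal sketch, assuming POSIX `clock_gettime` with `CLOCK_MONOTONIC`; `time_zeroed_alloc` is a hypothetical helper. One caveat worth checking: some optimizing compilers can turn `malloc` followed by `memset(p, 0, n)` into a `calloc` call, so build without optimization or inspect the generated code before trusting a comparison.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Milliseconds between two CLOCK_MONOTONIC readings. */
static double ms_between(struct timespec a, struct timespec b)
{
    return (double) (b.tv_sec - a.tv_sec) * 1e3
         + (double) (b.tv_nsec - a.tv_nsec) / 1e6;
}

/* Times one zeroed allocation of len bytes: calloc(1, len) if use_calloc is
 * nonzero, otherwise malloc(len) + memset. Returns elapsed milliseconds,
 * or -1.0 if the allocation failed. */
double time_zeroed_alloc(size_t len, int use_calloc)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    char *p = use_calloc ? calloc(1, len) : malloc(len);
    if (p != NULL && !use_calloc)
        memset(p, 0, len);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    if (p == NULL)
        return -1.0;
    assert(len == 0 || p[len - 1] == 0);   /* both paths must yield zeroes */
    free(p);
    return ms_between(t0, t1);
}
```

Calling it in a loop for both methods, alternating the order, helps separate the first-run cost (the OS finding free pages) from the steady state.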

Quote:

I'll stick to my guns, calloc is pointless; use malloc and initialize the memory in-situ as appropriate afterwards.

That's your prerogative, but, for a large allocation with a reasonably recent C library, you're choosing to use memset to zero malloc'd memory that is probably already zeroed, instead of using calloc, which knows whether the memory is already zeroed and can avoid the overhead of a redundant memset.

So long as they're not aimed at my code, use your guns as you see fit.

Regards,
Alister

alister

10-21-2011

Quote:

Originally Posted by alister

And that's fine. It still seems to, on my systems, require the pages to be backed immediately and that's time consuming. It certainly still performs very similarly to malloc + memset. In fact, allocating 700 MB (all my poor laptop can handle without swapping) it is only 40ms faster than malloc + memset. While this is an eternity in computer time, given the total time for the calloc is about 700 ms, that meager 40 ms savings tells me that the bulk of the time is spent backing the pages, a job which both the first memset after a malloc and, apparently on my system, calloc need to do.

Quote:

Originally Posted by alister

The most obvious explanation for why your system shows no difference between malloc+memset and calloc is that your C library's calloc is naive. Or perhaps your code is flawed. Or perhaps your kernel's VM subsystem is prefaulting for some reason. Or perhaps your system's environment has enabled malloc/calloc options which affect their behavior (such as filling the allocation with "junk" or zeroes). Perhaps one of the bazillion Linux kernel compile options is to blame. If it were my system, I'd look into it just to satisfy my curiosity.

I'm sure there are many reasons, but at work I'm not the sys admin, so I don't configure the systems. I just code effectively for the systems as configured. Further, my personal system is a stock Ubuntu system, an arguably popular *nix choice. So, any code I wrote for that would fall victim to calloc's performance.

My point is that, to some extent, you can code for either the system you're running on or an expected worst case. In my case, it happens that I'm running on the worst case. That means you might, possibly, consider turning away from calloc.

Quote:

Originally Posted by alister

Splendid, but you're 0 for 3 on systems I have available to me. Those being two AIX machines at various O/S levels (5.3 and 6) and Ubuntu for which I provided a uname for prior and here's the libc version:

Code:

GNU C Library (Ubuntu EGLIBC 2.11.1-0ubuntu7.8) stable release version 2.11.1

While I know it's not your job to figure out why these systems don't perform as you say (nor am I asking you to), I'm simply supplying information to show that I'm not running some obscure setup. If I didn't need zeroed memory, I'd be doing myself a disservice by using calloc rather than plain malloc for large allocations on any of these. Time and time again I've been making that point -- is it falling on deaf ears?

Quote:

Originally Posted by alister

Code:

snip your code

Ok, well here's some code I wrote:

Code:

dreamwarrior@dreamwarrior-laptop:~/CStuff$ cat tst_mal.c
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <inttypes.h>
#include <unistd.h>
#include <ctype.h>	/* toupper */
#include <sys/time.h>	/* gettimeofday, struct timeval */

typedef enum
{
	ALLOCMETH_USE_MALLOC		= 'm',
	ALLOCMETH_USE_MALLOC_AND_MEMSET	= 'M',
	ALLOCMETH_USE_CALLOC		= 'c',
	
} allocMethods_t;

typedef enum
{
	RESETMETH_USE_MEMSET		= 'm',
	RESETMETH_USE_PAGE		= 'p',

} resetMethods_t;

typedef struct
{
	size_t		 sz;
	int		 allocMethod;
	int		 reset;
	int		 resetMethod;
	
	/* for reset in page mode */
	
	size_t		 rpmBlkSz;	/* here it's bytes */
	int		 rpmRptIvl;
	
} args_t;

typedef struct
{
	uint64_t	s;
	uint64_t	e;
	
} timeStat_t;

typedef enum
{
	STMOP_START,
	STMOP_END,
	
} setTimeStatOp_t;

typedef enum
{
	TSPREC_US,
	TSPREC_MS,
	TSPREC_S,
	
} timeStatPrecsion_t;

#define DEFAULT_SZ		(1024 * 1024 * 512)	/* 512 MB */
#define DEFAULT_ALLOC_METHOD	(ALLOCMETH_USE_MALLOC)
#define DEFAULT_RESET_METHOD	(RESETMETH_USE_MEMSET)

static void _processArgs(int argc, char **argv, args_t *intoArgs);

static void resetPageTest(void *memPtr, args_t *args);

static void _setTimeStat(timeStat_t *setIn, setTimeStatOp_t op);
static double _getTimingFromTimeStat(timeStat_t *getFrom, timeStatPrecsion_t prec);

int main(int argc, char **argv)
{
	args_t		 args =	{ .sz = DEFAULT_SZ,
				  .allocMethod = DEFAULT_ALLOC_METHOD,
				  .reset = 0,
				  .resetMethod = DEFAULT_RESET_METHOD,
				  .rpmBlkSz = sysconf(_SC_PAGESIZE),
				  .rpmRptIvl = 0 };
	
	void		*p;
	timeStat_t	 tmStat, tmStatRt;

	_setTimeStat(&tmStatRt, STMOP_START);
	
	_processArgs(argc, argv, &args);
	
	printf("%s'ing %lu bytes...",
		args.allocMethod == ALLOCMETH_USE_CALLOC ? "calloc" : "malloc",
		(unsigned long) args.sz);
	fflush(stdout);
	
	_setTimeStat(&tmStat, STMOP_START);
	
	if (args.allocMethod == ALLOCMETH_USE_CALLOC)
	{
		p = calloc(1, args.sz);
	}
	else
	{
		p = malloc(args.sz);
	}
	
	_setTimeStat(&tmStat, STMOP_END);
	
	if (p == NULL)
	{
		printf("FAILED!\n");
		return EXIT_FAILURE;
	}
	
	printf("OK.\nAllocated @ %p in %.2lfms\n",
		p, _getTimingFromTimeStat(&tmStat, TSPREC_MS));
	
	#define msetTst(opStr, val) \
		printf("%s memory...", opStr); fflush(stdout); \
		_setTimeStat(&tmStat, STMOP_START); \
		memset(p, val, args.sz); \
		_setTimeStat(&tmStat, STMOP_END); \
		printf("OK - done in %.2lfms.\n", \
			_getTimingFromTimeStat(&tmStat, TSPREC_MS));
	
	if (args.allocMethod == ALLOCMETH_USE_MALLOC_AND_MEMSET)
	{
		msetTst("Memsetting", 0);
	}
	
	if (args.reset)
	{
		if (args.resetMethod == RESETMETH_USE_MEMSET)
		{
			msetTst("Resetting", '*');
		}
		else
		{
			/* RESETMETH_USE_PAGE */
			
			resetPageTest(p, &args);
		}
	}
	
	#undef msetTst
	
	_setTimeStat(&tmStatRt, STMOP_END);
	
	printf("Done; executed in %.2lfs\n",
		_getTimingFromTimeStat(&tmStatRt, TSPREC_S));
}

/* Prints the usage text; exits with EXIT_FAILURE when andDie is nonzero. */
static void _printUsage(int andDie)
{
	printf("Usage:  tst_mal <args>\n"
		   "  <args> =>\n"
		   "    -sz <size>[<unit>]      Set allocation size to <size>\n"
		   "                              <unit> => 'M', 'K', 'G'\n"
		   "                                        (bytes is default if not supplied)\n\n"
		   "    -am <method>            Set allocation method to <method>\n"
		   "                              <method> =>\n");

	#define prnAllocMeth(meth, methDesc) \
		printf("                                '%c' = %-20s%s\n", \
			meth, methDesc, meth == DEFAULT_ALLOC_METHOD ? "(default)" : "");
	
	prnAllocMeth(ALLOCMETH_USE_MALLOC, "Use malloc");
	prnAllocMeth(ALLOCMETH_USE_MALLOC_AND_MEMSET, "Use malloc + memset");
	prnAllocMeth(ALLOCMETH_USE_CALLOC, "Use calloc");
	
	#undef prnAllocMeth
	
	printf("    -reset                  Reset memory after initial allocation\n"
	       "    -rm <reset_method>      Sets the reset method to <reset_method>, only\n"
	       "                              useful if -reset is supplied\n"
	       "                              <reset_method> =>\n");

	#define prnResetMeth(meth, methDesc) \
		printf("                                '%c' = %-19s%s\n", \
			meth, methDesc, meth == DEFAULT_RESET_METHOD ? "(default)" : "");

	prnResetMeth(RESETMETH_USE_MEMSET, "Use memset");
	prnResetMeth(RESETMETH_USE_PAGE, "Use page algorithm");
	
	#undef prnResetMeth
	
	printf("\n"
	       "    The following arguments are only useful if \"-reset -rm %c\" is supplied:\n\n" 
	       "    -rpm_blkKb <block_kb>   Sets size of a block (in K) as <block_kb>\n"
	       "                              default is the system page size\n"
	       "    -rpm_rptIvlS <ivl_s>    Sets seconds between reporting progress to <ivl_s>\n"
	       "                              Supply 0 to not report\n",
	       RESETMETH_USE_PAGE);
		
	if (andDie) exit(EXIT_FAILURE);
}

/* Parses the command line into *intoArgs. */
static void _processArgs(int argc, char **argv, args_t *intoArgs)
{

	int	i;
	
	for (i = 1; i < argc; i++)
	{
		#define _getParm(arg, parm)	\
			i++; \
			if (i == argc) \
			{ \
				printf("Must supply <%s> for -%s\n\n", parm, arg); \
				_printUsage(1); \
			}
			
		if (argv[i][0] == '-')
		{
			if (strcmp(argv[i]+1, "sz") == 0)
			{
				char	*e;
				
				_getParm("sz", "size");
				
				intoArgs->sz = strtoull(argv[i], &e, 10);
				if (*e != '\0')
				{
					switch(toupper((unsigned char) *e))
					{
					  default:
					  	printf("Illegal <unit> %c given to -sz; ignored\n\n",
					  		*e);
					  	break;
					  	
					  case 'G':	intoArgs->sz *= 1024 * 1024 * 1024;	break;
					  case 'M':	intoArgs->sz *= 1024 * 1024;		break;
					  case 'K':	intoArgs->sz *= 1024;				break;
					}
				}				
			}
			else if (strcmp(argv[i]+1, "am") == 0)
			{
				_getParm("am", "method");
				
				if (argv[i][1] != '\0')
				{
					goto illegalAllocMethod;	/* code reuse through goto; blasphemy! */
				}
				
				switch(argv[i][0])
				{
				  default:
				  illegalAllocMethod:
					printf("Illegal <method> %s given to -am\n\n",
						argv[i]);
					_printUsage(1);
					break;
					
				  case ALLOCMETH_USE_MALLOC:
				  case ALLOCMETH_USE_MALLOC_AND_MEMSET:
				  case ALLOCMETH_USE_CALLOC:
				  	  intoArgs->allocMethod = argv[i][0];
				  	  break;
				}
			}
			else if (strcmp(argv[i]+1, "reset") == 0)
			{
				intoArgs->reset = 1;
			}
			else if (strcmp(argv[i]+1, "rm") == 0)
			{
				_getParm("rm", "reset_method");
				
				if (argv[i][1] != '\0')
				{
					goto illegalResetMethod;
				}
				
				switch(argv[i][0])
				{
				  default:
				  illegalResetMethod:
					printf("Illegal <reset_method> %s given to -rm\n\n",
						argv[i]);
					_printUsage(1);
					break;
					
				  case RESETMETH_USE_MEMSET:
				  case RESETMETH_USE_PAGE:
				  	  intoArgs->resetMethod = argv[i][0];
				  	  break;
				}
			}
			else if (strcmp(argv[i]+1, "rpm_blkKb") == 0)
			{
				_getParm("rpm_blkKb", "block_kb");
				
				intoArgs->rpmBlkSz = strtoull(argv[i], NULL, 10) * 1024;
			}
			else if (strcmp(argv[i]+1, "rpm_rptIvlS") == 0)
			{
				_getParm("rpm_rptIvlS", "ivl_s");
				
				intoArgs->rpmRptIvl = atoi(argv[i]);
			}
			else
			{
				if (strcmp(argv[i]+1, "?") != 0)
				{
					printf("Unknown argument [%s]\n\n", argv[i]);
				}
				_printUsage(1);
			}
		}
		else
		{
			printf("Illegal argument [%s]\n\n", argv[i]);
			_printUsage(1);
		}
		
		#undef _getParm
	}
}

static void resetPageTest(void *memPtr, args_t *args)
{
	timeStat_t	 tmStat;
	
	char		*cPtr;
	
	size_t		 numBlks;
	size_t		 curBlk;
	
	time_t		 tmS, tmC;
	
	_setTimeStat(&tmStat, STMOP_START);
	
	numBlks = args->sz / args->rpmBlkSz;
	
	tmS = 0;
	
	for (cPtr = memPtr, curBlk = 0;
		 curBlk < numBlks;
		 cPtr += args->rpmBlkSz, curBlk++)
	{
		tmC = time(NULL);
		
		if (args->rpmRptIvl && tmC - tmS >= args->rpmRptIvl)
		{
			printf("Setting block %lu of %lu\n",
				(unsigned long) curBlk, (unsigned long) numBlks);
			
			tmS = tmC;
		}
		
		*cPtr = '\0';
	}
	
	_setTimeStat(&tmStat, STMOP_END);
	
	printf("Page test (for block size %lu) complete in:  %.2lfms\n",
		(unsigned long) args->rpmBlkSz, _getTimingFromTimeStat(&tmStat, TSPREC_MS));
}

static void _setTimeStat(timeStat_t *setIn, setTimeStatOp_t op)
{
	struct timeval	 tv;
	uint64_t		*setP;
	
	gettimeofday(&tv, NULL);
	
	switch(op)
	{
	  default: return;

	  case STMOP_START:	setP = &setIn->s; break;
	  case STMOP_END:	setP = &setIn->e; break;
	}
	
	/* widen before multiplying: tv_sec * 1000000 overflows a 32-bit long */
	*setP = (uint64_t) tv.tv_sec * 1000000 + (uint64_t) tv.tv_usec;
}

static double _getTimingFromTimeStat(timeStat_t *getFrom, timeStatPrecsion_t prec)
{
	double		precDiv;
	
	switch(prec)
	{
	  default:	return (double) -1.0;

	  case TSPREC_US:	precDiv = 1.0;			break;
	  case TSPREC_MS:	precDiv = 1000.0;		break;
	  case TSPREC_S:	precDiv = 1000000.0;	break; 
	}
	
	return ((double) (getFrom->e - getFrom->s)) / precDiv;
}
dreamwarrior@dreamwarrior-laptop:~/CStuff$ cc tst_mal.c -o tst_mal
dreamwarrior@dreamwarrior-laptop:~/CStuff$ ./tst_mal -am c -sz 700m
calloc'ing 734003200 bytes...OK.
Allocated @ 0x8bbb9008 in 2976.96ms
Done; executed in 2.98s
dreamwarrior@dreamwarrior-laptop:~/CStuff$ ./tst_mal -am M -sz 700m
malloc'ing 734003200 bytes...OK.
Allocated @ 0x8bc1d008 in 0.14ms
Memsetting memory...OK - done in 791.18ms.
Done; executed in 0.79s
dreamwarrior@dreamwarrior-laptop:~/CStuff$ ./tst_mal -am c -sz 700m
calloc'ing 734003200 bytes...OK.
Allocated @ 0x8bbb1008 in 722.44ms
Done; executed in 0.72s
dreamwarrior@dreamwarrior-laptop:~/CStuff$ ./tst_mal -am M -sz 700m
malloc'ing 734003200 bytes...OK.
Allocated @ 0x8bbe7008 in 0.14ms
Memsetting memory...OK - done in 762.96ms.
Done; executed in 0.76s
dreamwarrior@dreamwarrior-laptop:~/CStuff$ ./tst_mal -am c -sz 700m
calloc'ing 734003200 bytes...OK.
Allocated @ 0x8bc65008 in 718.69ms
Done; executed in 0.72s
dreamwarrior@dreamwarrior-laptop:~/CStuff$ ./tst_mal -am M -sz 700m
malloc'ing 734003200 bytes...OK.
Allocated @ 0x8bc77008 in 0.14ms
Memsetting memory...OK - done in 766.16ms.
Done; executed in 0.77s

So, let's see: the first calloc took FOREVER (but, for the record, the first malloc+memset would have too, while the OS put some pages together). After that, calloc and malloc+memset perform within 40-50 ms of each other as I alternate between them.

What about a straight malloc? It's already represented and timed above, but here's the program doing just that:

Code:

dreamwarrior@dreamwarrior-laptop:~/CStuff$ ./tst_mal -am m -sz 700m
malloc'ing 734003200 bytes...OK.
Allocated @ 0x8bb1c008 in 0.14ms
Done; executed in 0.00s

Better:

Code:

dreamwarrior@dreamwarrior-laptop:~/CStuff$ ./tst_mal -am m -sz 2G
malloc'ing 2147483648 bytes...OK.
Allocated @ 0x37782008 in 0.13ms
Done; executed in 0.00s

I don't think anyone's surprised though, right?

Now, let's see what a memset really takes once all those pages are put together, backed, and ready to go:

Code:

dreamwarrior@dreamwarrior-laptop:~/CStuff$ ./tst_mal -am M -reset -sz 700m
malloc'ing 734003200 bytes...OK.
Allocated @ 0x8bc6f008 in 0.14ms
Memsetting memory...OK - done in 759.59ms.
Resetting memory...OK - done in 208.10ms.
Done; executed in 0.97s
dreamwarrior@dreamwarrior-laptop:~/CStuff$ ./tst_mal -am c -reset -sz 700m
calloc'ing 734003200 bytes...OK.
Allocated @ 0x8bc38008 in 726.57ms
Resetting memory...OK - done in 207.61ms.
Done; executed in 0.93s

Hmm...so, somewhere between a quarter and a third of the time of the original allocation (about 208 ms versus 730-760 ms). Surprised? Well, it still sucks, so I certainly wouldn't advocate calloc followed by a memset to 0; that'd just be dumb...oh wait, I never did!

Quote:

Originally Posted by alister

HAHA, my friend, I believe I can code just fine. I know what I'm talking about, and I believe you do as well.

At the end of the day, I'm the one developing for my systems, and I'm the one who has to maintain their integrity. A 2-second calloc call would be disastrous in my environment. I can spread that cost over the code's runtime by calling malloc and filling in the memory as needed with useful values (which are almost never all zeros), letting the OS back the pages gradually as each one is touched rather than all at once (which appears to be what calloc is doing on my systems).
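The approach described here, malloc up front and initialize in place only as far as the program actually uses the memory, might look roughly like this. It is a hypothetical sketch (`lazy_table`, `lazy_slot`, and the `LAZY_EMPTY` marker are invented here), assuming the useful initial value is not all-zero bits, so calloc's zeroing would not help anyway.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define LAZY_EMPTY (-1L)   /* assumed "unused slot" marker; deliberately not zero */

/* A large malloc'd table whose slots are initialized only up to the
 * highest index the program has asked for so far. */
typedef struct {
    long   *slots;
    size_t  cap;     /* number of slots allocated */
    size_t  ready;   /* slots [0, ready) hold valid values */
} lazy_table;

/* Allocates cap slots without initializing them; returns 1 on success. */
int lazy_init(lazy_table *t, size_t cap)
{
    t->slots = NULL;
    t->cap = t->ready = 0;
    if (cap == 0 || cap > SIZE_MAX / sizeof *t->slots)
        return 0;
    t->slots = malloc(cap * sizeof *t->slots);
    if (t->slots == NULL)
        return 0;
    t->cap = cap;
    return 1;
}

/* Returns slot i, first initializing any never-used slots up to and
 * including i. Pages get backed gradually as this high-water mark advances. */
long *lazy_slot(lazy_table *t, size_t i)
{
    assert(i < t->cap);
    while (t->ready <= i)
        t->slots[t->ready++] = LAZY_EMPTY;
    return &t->slots[i];
}

void lazy_free(lazy_table *t)
{
    free(t->slots);
    t->slots = NULL;
    t->cap = t->ready = 0;
}
```

The trade-off is a bounds check and a branch on every access in exchange for never paying for pages the program doesn't reach.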

Furthermore, in case I haven't already driven home my point that calloc isn't good, let's run the final test I've created in the suite. It simulates touching only part of each allocated page, something akin to what a sparse array might do: I'm mallocing some memory and then using only some of it. The key is to do it on a machine that is memory-constrained; in my case, that means allocating a gig when that's all the RAM I have. First, here are some "normal" results:

Code:

dreamwarrior@dreamwarrior-laptop:~/CStuff$ ./tst_mal -am c -sz 700m -reset -rm p
calloc'ing 734003200 bytes...OK.
Allocated @ 0x8bbaf008 in 716.15ms
Page test (for block size 4096) complete in:  61.09ms
Done; executed in 0.78s
dreamwarrior@dreamwarrior-laptop:~/CStuff$ ./tst_mal -am c -sz 700m -reset -rm p
calloc'ing 734003200 bytes...OK.
Allocated @ 0x8bba6008 in 717.34ms
Page test (for block size 4096) complete in:  60.98ms
Done; executed in 0.78s
dreamwarrior@dreamwarrior-laptop:~/CStuff$ ./tst_mal -am c -sz 700m -reset -rm p
calloc'ing 734003200 bytes...OK.
Allocated @ 0x8bc41008 in 717.20ms
Page test (for block size 4096) complete in:  61.31ms
Done; executed in 0.78s
dreamwarrior@dreamwarrior-laptop:~/CStuff$ ./tst_mal -am m -sz 700m -reset -rm p
malloc'ing 734003200 bytes...OK.
Allocated @ 0x8bcac008 in 0.14ms
Page test (for block size 4096) complete in:  677.57ms
Done; executed in 0.68s
dreamwarrior@dreamwarrior-laptop:~/CStuff$ ./tst_mal -am m -sz 700m -reset -rm p
malloc'ing 734003200 bytes...OK.
Allocated @ 0x8bbd5008 in 0.14ms
Page test (for block size 4096) complete in:  670.40ms
Done; executed in 0.67s
dreamwarrior@dreamwarrior-laptop:~/CStuff$ ./tst_mal -am m -sz 700m -reset -rm p
malloc'ing 734003200 bytes...OK.
Allocated @ 0x8bb4f008 in 0.14ms
Page test (for block size 4096) complete in:  674.71ms
Done; executed in 0.68s
dreamwarrior@dreamwarrior-laptop:~/CStuff$ ./tst_mal -am c -sz 700m -reset -rm p
calloc'ing 734003200 bytes...OK.
Allocated @ 0x8bafe008 in 719.14ms
Page test (for block size 4096) complete in:  60.92ms
Done; executed in 0.78s
dreamwarrior@dreamwarrior-laptop:~/CStuff$ ./tst_mal -am c -sz 700m -reset -rm p
calloc'ing 734003200 bytes...OK.
Allocated @ 0x8bbd8008 in 716.25ms
Page test (for block size 4096) complete in:  61.62ms
Done; executed in 0.78s
dreamwarrior@dreamwarrior-laptop:~/CStuff$ ./tst_mal -am m -sz 700m -reset -rm p
malloc'ing 734003200 bytes...OK.
Allocated @ 0x8bb17008 in 0.14ms
Page test (for block size 4096) complete in:  673.26ms
Done; executed in 0.67s
dreamwarrior@dreamwarrior-laptop:~/CStuff$ ./tst_mal -am c -sz 700m -reset -rm p
calloc'ing 734003200 bytes...OK.
Allocated @ 0x8bcba008 in 716.20ms
Page test (for block size 4096) complete in:  61.00ms
Done; executed in 0.78s

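What you'll notice here is that the total program time is consistently longer for calloc (about 0.78 s) than for malloc (about 0.67-0.68 s). Since I'm using the system page size as the block size, each page is touched exactly once, forcing each page to be physically created. With calloc, that page walk takes only about 61 ms, because the pages were already backed during the calloc call itself.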

So, now let's thrash my poor laptop to death. It only has a gig, so this will do it: lots of swapping and disk work....

Code:

dreamwarrior@dreamwarrior-laptop:~/CStuff$ ./tst_mal -am m -sz 1G -reset -rm p -rpm_rptIvlS 5
malloc'ing 1073741824 bytes...OK.
Allocated @ 0x777ba008 in 0.13ms
Setting block 0 of 262144
Setting block 190686 of 262144
Setting block 204808 of 262144
Setting block 219368 of 262144
Setting block 235476 of 262144
Setting block 253338 of 262144
Page test (for block size 4096) complete in:  26537.83ms
Done; executed in 26.54s
dreamwarrior@dreamwarrior-laptop:~/CStuff$ ./tst_mal -am m -sz 1G -reset -rm p -rpm_rptIvlS 5
malloc'ing 1073741824 bytes...OK.
Allocated @ 0x77798008 in 0.14ms
Setting block 0 of 262144
Setting block 219046 of 262144
Setting block 235146 of 262144
Setting block 244933 of 262144
Setting block 258882 of 262144
Page test (for block size 4096) complete in:  21310.03ms
Done; executed in 21.31s
dreamwarrior@dreamwarrior-laptop:~/CStuff$ ./tst_mal -am m -sz 1G -reset -rm p -rpm_rptIvlS 5
malloc'ing 1073741824 bytes...OK.
Allocated @ 0x77713008 in 0.14ms
Setting block 0 of 262144
Setting block 228043 of 262144
Setting block 233084 of 262144
Setting block 243446 of 262144
Setting block 253290 of 262144
Setting block 260604 of 262144
Page test (for block size 4096) complete in:  24659.43ms
Done; executed in 24.66s

Now...wait for it.... I'll run the same code with the only difference being calloc was called to allocate the memory (you can see the code, right):

Code:

dreamwarrior@dreamwarrior-laptop:~/CStuff$ ./tst_mal -am c -sz 1G -reset -rm p -rpm_rptIvlS 5
calloc'ing 1073741824 bytes...OK.
Allocated @ 0x7787d008 in 26556.11ms
Setting block 0 of 262144
Setting block 5480 of 262144
Setting block 9077 of 262144
Setting block 13079 of 262144
Setting block 16522 of 262144
Setting block 24810 of 262144
Setting block 36661 of 262144
Setting block 42406 of 262144
Setting block 54158 of 262144
Setting block 58628 of 262144
Setting block 76180 of 262144
Setting block 84026 of 262144
Setting block 97622 of 262144
Setting block 108027 of 262144
Setting block 116878 of 262144
Setting block 125458 of 262144
Setting block 127187 of 262144
Setting block 133809 of 262144
Setting block 140831 of 262144
Setting block 152345 of 262144
Setting block 161335 of 262144
Setting block 169500 of 262144
Setting block 174486 of 262144
Setting block 184043 of 262144
Setting block 188973 of 262144
Setting block 194954 of 262144
Setting block 201163 of 262144
Setting block 214170 of 262144
Setting block 222869 of 262144
Setting block 231034 of 262144
Setting block 240252 of 262144
Setting block 247003 of 262144
Setting block 254878 of 262144
Setting block 260254 of 262144
Page test (for block size 4096) complete in:  177205.24ms
Done; executed in 204.90s

Hmmm...204 seconds, roughly 8 to 10 times the malloc version! WHAT?! Now, what say you? Not to mention that the calloc call alone took about as long as the longest entire malloc run above!

Still convinced? Maybe your machines are different, and I'm sure they are, but mine (and probably the rest of the Linux world running stock Ubuntu) perform badly with calloc.

Oh, and the one that should arguably do the absolute worst, malloc + memset + the subsequent reset:

Code:

dreamwarrior@dreamwarrior-laptop:~/CStuff$ ./tst_mal -am M -sz 1G -reset -rm p -rpm_rptIvlS 5
malloc'ing 1073741824 bytes...OK.
Allocated @ 0x776ef008 in 0.14ms
Memsetting memory...OK - done in 32560.51ms.
Setting block 0 of 262144
Setting block 2216 of 262144
Setting block 6796 of 262144
Setting block 12435 of 262144
Setting block 26046 of 262144
Setting block 35270 of 262144
Setting block 42186 of 262144
Setting block 54684 of 262144
Setting block 62691 of 262144
Setting block 74020 of 262144
Setting block 87550 of 262144
Setting block 98919 of 262144
Setting block 107850 of 262144
Setting block 114000 of 262144
Setting block 121389 of 262144
Setting block 130370 of 262144
Setting block 140634 of 262144
Setting block 149115 of 262144
Setting block 157734 of 262144
Setting block 161347 of 262144
Setting block 165615 of 262144
Setting block 174104 of 262144
Setting block 176905 of 262144
Setting block 181753 of 262144
Setting block 188402 of 262144
Setting block 195635 of 262144
Setting block 201081 of 262144
Setting block 209574 of 262144
Setting block 218822 of 262144
Setting block 227971 of 262144
Setting block 236019 of 262144
Setting block 243988 of 262144
Setting block 250202 of 262144
Setting block 256309 of 262144
Page test (for block size 4096) complete in:  173445.75ms
Done; executed in 206.56s

Not even 2 seconds behind, and there's about that much variation from run to run with this stuff. Yep, calloc...certainly worth it on my box.

DreamWarrior
