I’m getting a Segmentation Fault when I run this code. Surprisingly, when I set thread_count to 16 or less, it doesn’t give any error. When I debug the code using gdb, the code gets an error at the line local_answer += vec_1[j] * vec_2[j]; in the Calculate() thread function. What is the reason for this behavior? How can I fix that?
I’m compiling with this gcc command:
gcc test.c -o DP -lpthread -lm -mcmodel=large -g
And here’s the code:
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <time.h>
#include <math.h>
#include <pthread.h>

double *vec_1 = NULL;
double *vec_2 = NULL;
int vec_length = 0;
int thread_count = 0;
double answer = 0;
double *partial_results = NULL;
pthread_mutex_t mutex;

void *Calculate(void *arg)
{
    int myId = (int) arg;
    int myStart = myId * vec_length / thread_count;
    int myEnd = (myId + 1) * vec_length / thread_count;
    double local_answer = 0;
    int j;
    for (j = myStart; j < myEnd; j++) {
        local_answer += vec_1[j] * vec_2[j];
    }
    pthread_mutex_lock(&mutex);
    partial_results[myId] = local_answer;
    pthread_mutex_unlock(&mutex);
}

int main(int argc, const char *argv[])
{
    srand((unsigned int) time(NULL));
    pthread_mutex_init(&mutex, NULL);
    int num_iterations = 5;
    vec_length = 1000000000;
    thread_count = 25;
    partial_results = (double*) malloc(thread_count * sizeof(double));
    double avg_time = 0;
    int i;
    vec_1 = (double*) malloc(vec_length * sizeof(double));
    vec_2 = (double*) malloc(vec_length * sizeof(double));
    if (vec_1 == NULL || vec_2 == NULL) {
        printf("Memory Allocation failed");
        exit(0);
    }
    int j;
    for (j = 0; j < vec_length; j++) {
        vec_1[j] = ((double) rand() / (double) (RAND_MAX)) + 1;
        vec_2[j] = ((double) rand() / (double) (RAND_MAX)) + 1;
    }
    for (i = 0; i < num_iterations; i++) {
        pthread_t threads[thread_count];
        pthread_attr_t attr;
        void* status;
        struct timeval t1, t2;
        gettimeofday(&t1, NULL);
        pthread_attr_init(&attr);
        pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
        int t;
        for (t = 0; t < thread_count; t++) {
            if (pthread_create(&threads[t], NULL, Calculate, (void*)(t))) {
                printf("ERROR in pthread_create()");
                exit(-1);
            }
        }
        pthread_attr_destroy(&attr);
        answer = 0;
        for (t = 0; t < thread_count; t++) {
            if (pthread_join(threads[t], &status)) {
                printf("ERROR in pthread_join()");
                exit(-1);
            }
            answer += partial_results[t];
        }
        gettimeofday(&t2, NULL);
        avg_time += (t2.tv_sec - t1.tv_sec) * 1000.0 + (t2.tv_usec - t1.tv_usec) / 1000.0;
    }
    printf("Average time Spent : %lf \n", avg_time / num_iterations);
    pthread_mutex_destroy(&mutex);
    return 0;
}
Answer
Your vec_length has type int. With gcc on Linux x86 or x86_64, int is represented in 32-bit two’s complement format. This is sufficient to accommodate the value you’re using for vec_length, 1,000,000,000, but not to accommodate most integer multiples of that value. You compute several such multiples, and the resulting overflow of a signed integer formally produces undefined behavior.
In practice, it is likely that gcc’s actual behavior upon signed integer overflow is reproducible. In that case, you can write a program to demonstrate for yourself that the results are negative for several small-integer multiples of your vector length. Where that occurs, your program will attempt to access outside the bounds of each of the two vectors, at the line where indeed the error is indicated, with a segfault being a likely result. (And even if the overflow results were not reproducible, obtaining a negative result for some of those undefined multiplication behaviors would still be well within the realm of possibility.)
You have several alternatives, among them:
- use a wider data type for your indexing computations:
int myStart = myId * (int64_t) vec_length / thread_count;
- use only thread_count values that evenly divide the vec_length, and use parentheses to ensure that the division is performed first in your indexing computations:
int myStart = myId * (vec_length / thread_count);
// ...
vec_length = 1000000000;
thread_count = 32; // or 10 or 8 or 1000
A few other things:
- The code presented does not use any math.h functions. It therefore does not need to #include math.h, and you do not need to link in libm.
- To compile a Pthreads program with GCC, you ought to use the -pthread flag, in which case you also do not need to explicitly link in libpthread.
- As discussed in comments, you do not need the complication of a pthread_attr_t.
- As discussed in comments, your particular use of a mutex is an unnecessary performance drain.