A GPU algorithm for tracking yeast cells in phase-contrast microscopy images
Marcellino, Livia;
2019-01-01
Abstract
When information and measurements obtained from sequences of microscopy images are subject to time constraints, suitably fast algorithms must be implemented to process the whole data set. In this work, we deal with sequences of images obtained from time-lapse microscopy in order to detect single yeast cells in a microfluidic chip over time. The underlying idea consists in determining a minimum-cost configuration for each pair of frames, which can be expressed by setting up and solving a linear programming (LP) problem. Laboratories seldom have access to HPC hardware for this purpose. For this reason, we propose an efficient GPU-parallel software package, implemented in CUDA and based on the simplex method, a common tool for solving LP problems. Our parallel strategy minimizes thread divergence and maximizes device occupancy, in order to maximize the overall throughput. Memory transfers between host and device have also been minimized to exploit data locality. Experimental results on real image sequences show a promising speedup with respect to the CPU version, making the software suitable for real-time applications.
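To make the minimum-cost idea concrete, the sketch below is a minimal, hedged illustration (not the authors' CUDA implementation) of how matching cells between two consecutive frames can be posed as an LP: each entry of the cost matrix is the squared distance between a centroid in the previous frame and one in the next, and the row/column constraints force a one-to-one assignment. The function name `match_cells` and the use of SciPy's `linprog` are assumptions made purely for illustration; the paper instead solves the LP with a GPU-parallel simplex method.

```python
import numpy as np
from scipy.optimize import linprog

def match_cells(prev_centroids, next_centroids):
    """Match each cell in the previous frame to one cell in the next
    frame by solving the assignment problem as a linear program."""
    n = len(prev_centroids)
    # Cost c_ij: squared Euclidean distance between every centroid pair.
    diff = prev_centroids[:, None, :] - next_centroids[None, :, :]
    cost = (diff ** 2).sum(axis=2)

    # Equality constraints: each row and each column of X sums to 1.
    A_eq = np.zeros((2 * n, n * n))
    for i in range(n):
        A_eq[i, i * n:(i + 1) * n] = 1.0  # cell i in frame t assigned once
        A_eq[n + i, i::n] = 1.0           # cell i in frame t+1 receives once
    b_eq = np.ones(2 * n)

    res = linprog(cost.ravel(), A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, 1), method="highs")
    # The assignment polytope has integral vertices, so an LP solver
    # returns a 0/1 solution and rounding is safe.
    return np.round(res.x).reshape(n, n)

prev = np.array([[0.0, 0.0], [5.0, 5.0]])
nxt = np.array([[5.1, 4.9], [0.2, -0.1]])
print(match_cells(prev, nxt))  # crossing match: [[0, 1], [1, 0]]
```

Because the constraint matrix of the assignment problem is totally unimodular, the LP relaxation already yields an integral matching; this is what makes a general-purpose LP solver such as the simplex method a valid tool for frame-to-frame cell tracking.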