|
Parallel Colt 0.7.2 | |||||||||
PREV CLASS NEXT CLASS | FRAMES NO FRAMES | |||||||||
SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD |
java.lang.Object jcuda.jcublas.JCublas
public class JCublas
JCublas - Java bindings for CUBLAS, the NVIDIA CUDA BLAS library
www.jcuda.de
This file comment is partially taken from the cublas.h header file:
CUBLAS is an implementation of BLAS (Basic Linear Algebra Subroutines) on top
of the CUDA driver. It allows access to the computational resources of NVIDIA
GPUs. The library is self-contained at the API level, i.e. no direct
interaction with the CUDA driver is necessary.
The basic model by which applications use the CUBLAS library is to create
matrix and vector object in GPU memory space, fill them with data, then call
a sequence of BLAS functions, and finally upload the results from GPU memory
space back to the host. To accomplish this, CUBLAS provides helper functions
for creating and destroying objects in GPU space, and to write data to, and
retrieve data from, these objects.
Since the BLAS core functions (as opposed to the helper functions) do not
return error status directly (for reasons of compatibility with existing BLAS
libraries) CUBLAS provides a separate function to retrieve the last error
that was recorded, to aid in debugging.
Currently, only a subset of the BLAS core functions is implemented.
Nested Class Summary | |
---|---|
static class |
JCublas.ARCHType
Enumeration of common CPU architectures. |
static class |
JCublas.LogLevel
The log levels which may be used to control the internal logging of the JCublas library |
static class |
JCublas.OSType
Enumeration of common operating systems, independent of version or architecture. |
Field Summary | |
---|---|
static int |
CUBLAS_STATUS_ALLOC_FAILED
Resource allocation failed |
static int |
CUBLAS_STATUS_ARCH_MISMATCH
function requires an architectural feature absent from the architecture of the device |
static int |
CUBLAS_STATUS_EXECUTION_FAILED
GPU program failed to execute |
static int |
CUBLAS_STATUS_INTERNAL_ERROR
An internal CUBLAS operation failed |
static int |
CUBLAS_STATUS_INVALID_VALUE
Unsupported numerical value was passed to function |
static int |
CUBLAS_STATUS_MAPPING_ERROR
Access to GPU memory space failed |
static int |
CUBLAS_STATUS_NOT_INITIALIZED
Library not initialized |
static int |
CUBLAS_STATUS_SUCCESS
Operation completed successfully |
static int |
JCUBLAS_STATUS_INTERNAL_ERROR
An internal JCublas operation failed |
static int |
JCUBLAS_STATUS_MEMORY_ALREADY_USED
Device memory with the specified name was already allocated |
static int |
JCUBLAS_STATUS_MEMORY_NOT_FOUND
Device memory with the specified name could not be found |
Method Summary | |
---|---|
static int |
cublasAlloc(int n,
int elemSize,
String name)
Wrapper for CUBLAS function. cublasStatus cublasAlloc (int n, int elemSize, void **devicePtr) creates an object in GPU memory space capable of holding an array of n elements, where each element requires elemSize bytes of storage. |
static void |
cublasCaxpy(int n,
JCuComplex alpha,
String x,
int offsetx,
int incx,
String y,
int offsety,
int incy)
Wrapper for CUBLAS function. |
static void |
cublasCaxpy(int n,
JCuComplex alpha,
String x,
int incx,
String y,
int incy)
|
static void |
cublasCcopy(int n,
String x,
int offsetx,
int incx,
String y,
int offsety,
int incy)
Wrapper for CUBLAS function. |
static void |
cublasCcopy(int n,
String x,
int incx,
String y,
int incy)
|
static void |
cublasCgemm(char transa,
char transb,
int m,
int n,
int k,
JCuComplex alpha,
String A,
int offsetA,
int lda,
String B,
int offsetB,
int ldb,
JCuComplex beta,
String C,
int offsetC,
int ldc)
Wrapper for CUBLAS function. |
static void |
cublasCgemm(char transa,
char transb,
int m,
int n,
int k,
JCuComplex alpha,
String A,
int lda,
String B,
int ldb,
JCuComplex beta,
String C,
int ldc)
|
static void |
cublasCrot(int n,
String x,
int offsetx,
int incx,
String y,
int offsety,
int incy,
float c,
JCuComplex s)
Wrapper for CUBLAS function. |
static void |
cublasCrot(int n,
String x,
int incx,
String y,
int incy,
float c,
JCuComplex s)
|
static void |
cublasCrotg(String pca,
int offsetpca,
JCuComplex cb,
String psc,
int offsetpsc,
String pcs,
int offsetpcs)
Wrapper for CUBLAS function. |
static void |
cublasCrotg(String pca,
JCuComplex cb,
String psc,
String pcs)
|
static void |
cublasCscal(int n,
JCuComplex alpha,
String x,
int incx)
|
static void |
cublasCscal(int n,
JCuComplex alpha,
String x,
int offsetx,
int incx)
Wrapper for CUBLAS function. |
static void |
cublasCsrot(int n,
String x,
int offsetx,
int incx,
String y,
int offsety,
int incy,
float c,
float s)
Wrapper for CUBLAS function. |
static void |
cublasCsrot(int n,
String x,
int incx,
String y,
int incy,
float c,
float s)
|
static void |
cublasCsscal(int n,
float alpha,
String x,
int incx)
|
static void |
cublasCsscal(int n,
float alpha,
String x,
int offsetx,
int incx)
Wrapper for CUBLAS function. |
static void |
cublasCswap(int n,
String x,
int offsetx,
int incx,
String y,
int offsety,
int incy)
Wrapper for CUBLAS function. |
static void |
cublasCswap(int n,
String x,
int incx,
String y,
int incy)
|
static void |
cublasDaxpy(int n,
double alpha,
String x,
int offsetx,
int incx,
String y,
int offsety,
int incy)
Wrapper for CUBLAS function. |
static void |
cublasDaxpy(int n,
double alpha,
String x,
int incx,
String y,
int incy)
|
static void |
cublasDcopy(int n,
String x,
int offsetx,
int incx,
String y,
int offsety,
int incy)
Wrapper for CUBLAS function. |
static void |
cublasDcopy(int n,
String x,
int incx,
String y,
int incy)
|
static void |
cublasDgemm(char transa,
char transb,
int m,
int n,
int k,
double alpha,
String A,
int offsetA,
int lda,
String B,
int offsetB,
int ldb,
double beta,
String C,
int offsetC,
int ldc)
Wrapper for CUBLAS function. |
static void |
cublasDgemm(char transa,
char transb,
int m,
int n,
int k,
double alpha,
String A,
int lda,
String B,
int ldb,
double beta,
String C,
int ldc)
|
static void |
cublasDgemv(char trans,
int m,
int n,
double alpha,
String A,
int offsetA,
int lda,
String x,
int offsetx,
int incx,
double beta,
String y,
int offsety,
int incy)
Wrapper for CUBLAS function. |
static void |
cublasDgemv(char trans,
int m,
int n,
double alpha,
String A,
int lda,
String x,
int incx,
double beta,
String y,
int incy)
|
static void |
cublasDger(int m,
int n,
double alpha,
String x,
int offsetx,
int incx,
String y,
int offsety,
int incy,
String A,
int offsetA,
int lda)
Wrapper for CUBLAS function. |
static void |
cublasDger(int m,
int n,
double alpha,
String x,
int incx,
String y,
int incy,
String A,
int lda)
|
static void |
cublasDrot(int n,
String x,
int offsetx,
int incx,
String y,
int offsety,
int incy,
double sc,
double ss)
Wrapper for CUBLAS function. |
static void |
cublasDrot(int n,
String x,
int incx,
String y,
int incy,
double sc,
double ss)
|
static void |
cublasDrotg(String sa,
int offsetsa,
String sb,
int offsetsb,
String sc,
int offsetsc,
String ss,
int offsetss)
Wrapper for CUBLAS function. |
static void |
cublasDrotg(String sa,
String sb,
String sc,
String ss)
|
static void |
cublasDrotm(int n,
String x,
int offsetx,
int incx,
String y,
int offsety,
int incy,
double[] sparam)
Wrapper for CUBLAS function. |
static void |
cublasDrotmg(double[] sd1,
double[] sd2,
double[] sx1,
double sy1,
double[] sparam)
Wrapper for CUBLAS function. |
static void |
cublasDscal(int n,
double alpha,
String x,
int incx)
|
static void |
cublasDscal(int n,
double alpha,
String x,
int offsetx,
int incx)
Wrapper for CUBLAS function. |
static void |
cublasDswap(int n,
String x,
int offsetx,
int incx,
String y,
int offsety,
int incy)
Wrapper for CUBLAS function. |
static void |
cublasDswap(int n,
String x,
int incx,
String y,
int incy)
|
static void |
cublasDsymm(char side,
char uplo,
int m,
int n,
double alpha,
String A,
int offsetA,
int lda,
String B,
int offsetB,
int ldb,
double beta,
String C,
int offsetC,
int ldc)
Wrapper for CUBLAS function. |
static void |
cublasDsymm(char side,
char uplo,
int m,
int n,
double alpha,
String A,
int lda,
String B,
int ldb,
double beta,
String C,
int ldc)
|
static void |
cublasDsyr(char uplo,
int n,
double alpha,
String x,
int offsetx,
int incx,
String A,
int offsetA,
int lda)
Wrapper for CUBLAS function. |
static void |
cublasDsyr(char uplo,
int n,
double alpha,
String x,
int incx,
String A,
int lda)
|
static void |
cublasDsyr2k(char uplo,
char trans,
int n,
int k,
double alpha,
String A,
int offsetA,
int lda,
String B,
int offsetB,
int ldb,
double beta,
String C,
int offsetC,
int ldc)
Wrapper for CUBLAS function. |
static void |
cublasDsyr2k(char uplo,
char trans,
int n,
int k,
double alpha,
String A,
int lda,
String B,
int ldb,
double beta,
String C,
int ldc)
|
static void |
cublasDsyrk(char uplo,
char trans,
int n,
int k,
double alpha,
String A,
int lda,
double beta,
String C,
int ldc)
|
static void |
cublasDsyrk(char uplo,
char trans,
int n,
int k,
double alpha,
String A,
int offsetA,
int lda,
double beta,
String C,
int offsetC,
int ldc)
Wrapper for CUBLAS function. |
static void |
cublasDtrmm(char side,
char uplo,
char transa,
char diag,
int m,
int n,
double alpha,
String A,
int offsetA,
int lda,
String B,
int offsetB,
int ldb)
Wrapper for CUBLAS function. |
static void |
cublasDtrmm(char side,
char uplo,
char transa,
char diag,
int m,
int n,
double alpha,
String A,
int lda,
String B,
int ldb)
|
static void |
cublasDtrsm(char side,
char uplo,
char transa,
char diag,
int m,
int n,
double alpha,
String A,
int offsetA,
int lda,
String B,
int offsetB,
int ldb)
Wrapper for CUBLAS function. |
static void |
cublasDtrsm(char side,
char uplo,
char transa,
char diag,
int m,
int n,
double alpha,
String A,
int lda,
String B,
int ldb)
|
static void |
cublasDtrsv(char uplo,
char trans,
char diag,
int n,
String A,
int offsetA,
int lda,
String x,
int offsetx,
int incx)
Wrapper for CUBLAS function. |
static void |
cublasDtrsv(char uplo,
char trans,
char diag,
int n,
String A,
int lda,
String x,
int incx)
|
static int |
cublasFree(String name)
Wrapper for CUBLAS function. cublasStatus cublasFree (const void *devicePtr) destroys the object in GPU memory space pointed to by devicePtr. Return Values ------------- CUBLAS_STATUS_NOT_INITIALIZED if CUBLAS library has not been initialized CUBLAS_STATUS_INTERNAL_ERROR if the object could not be deallocated CUBLAS_STATUS_SUCCESS if object was destroyed successfully |
static int |
cublasGetError()
Wrapper for CUBLAS function. cublasStatus cublasGetError() returns the last error that occurred on invocation of any of the CUBLAS BLAS functions. |
static int |
cublasGetMatrix(int rows,
int cols,
String A,
int lda,
double[] B,
int ldb)
Extended wrapper supporting double array arguments |
static int |
cublasGetMatrix(int rows,
int cols,
String A,
int lda,
DoubleBuffer B,
int ldb)
Wrapper for CUBLAS function. cublasStatus cublasGetMatrix (int rows, int cols, int elemSize, const void *A, int lda, void *B, int ldb) copies a tile of rows x cols elements from a matrix A in GPU memory space to a matrix B in CPU memory space. |
static int |
cublasGetMatrix(int rows,
int cols,
String A,
int lda,
float[] B,
int ldb)
Extended wrapper supporting float array arguments |
static int |
cublasGetMatrix(int rows,
int cols,
String A,
int lda,
FloatBuffer B,
int ldb)
Wrapper for CUBLAS function. cublasStatus cublasGetMatrix (int rows, int cols, int elemSize, const void *A, int lda, void *B, int ldb) copies a tile of rows x cols elements from a matrix A in GPU memory space to a matrix B in CPU memory space. |
static int |
cublasGetMatrix(int rows,
int cols,
String A,
int offsetA,
int lda,
double[] B,
int offsetB,
int ldb)
Extended wrapper offering additional parameters to specify the offsets inside the matrices |
static int |
cublasGetMatrix(int rows,
int cols,
String A,
int offsetA,
int lda,
DoubleBuffer B,
int offsetB,
int ldb)
Extended wrapper offering additional parameters to specify the offsets inside the matrices |
static int |
cublasGetMatrix(int rows,
int cols,
String A,
int offsetA,
int lda,
float[] B,
int offsetB,
int ldb)
Extended wrapper offering additional parameters to specify the offsets inside the matrices |
static int |
cublasGetMatrix(int rows,
int cols,
String A,
int offsetA,
int lda,
FloatBuffer B,
int offsetB,
int ldb)
Extended wrapper offering additional parameters to specify the offsets inside the matrices |
static int |
cublasGetMatrix(int rows,
int cols,
String A,
int offsetA,
int lda,
JCuComplex[] B,
int offsetB,
int ldb)
Extended wrapper offering additional parameters to specify the offsets inside the matrices |
static int |
cublasGetMatrix(int rows,
int cols,
String A,
int offsetA,
int lda,
JCuDoubleComplex[] B,
int offsetB,
int ldb)
Extended wrapper offering additional parameters to specify the offsets inside the matrices |
static int |
cublasGetMatrix(int rows,
int cols,
String A,
int lda,
JCuComplex[] B,
int ldb)
Extended wrapper supporting complex array arguments |
static int |
cublasGetMatrix(int rows,
int cols,
String A,
int lda,
JCuDoubleComplex[] B,
int ldb)
Extended wrapper supporting complex array arguments |
static int |
cublasGetVector(int n,
String x,
int incx,
double[] y,
int incy)
Extended wrapper supporting double array arguments |
static int |
cublasGetVector(int n,
String x,
int incx,
DoubleBuffer y,
int incy)
Wrapper for CUBLAS function. cublasStatus cublasGetVector (int n, int elemSize, const void *x, int incx, void *y, int incy) copies n elements from a vector x in GPU memory space to a vector y in CPU memory space. |
static int |
cublasGetVector(int n,
String x,
int incx,
float[] y,
int incy)
Extended wrapper supporting float array arguments |
static int |
cublasGetVector(int n,
String x,
int incx,
FloatBuffer y,
int incy)
Wrapper for CUBLAS function. cublasStatus cublasGetVector (int n, int elemSize, const void *x, int incx, void *y, int incy) copies n elements from a vector x in GPU memory space to a vector y in CPU memory space. |
static int |
cublasGetVector(int n,
String x,
int offsetx,
int incx,
double[] y,
int offsety,
int incy)
Extended wrapper offering additional parameters to specify the offsets inside the vectors. |
static int |
cublasGetVector(int n,
String x,
int offsetx,
int incx,
DoubleBuffer y,
int offsety,
int incy)
Extended wrapper offering additional parameters to specify the offsets inside the vectors. |
static int |
cublasGetVector(int n,
String x,
int offsetx,
int incx,
float[] y,
int offsety,
int incy)
Extended wrapper offering additional parameters to specify the offsets inside the vectors. |
static int |
cublasGetVector(int n,
String x,
int offsetx,
int incx,
FloatBuffer y,
int offsety,
int incy)
Extended wrapper offering additional parameters to specify the offsets inside the vectors. |
static int |
cublasGetVector(int n,
String x,
int offsetx,
int incx,
JCuComplex[] y,
int offsety,
int incy)
Extended wrapper offering additional parameters to specify the offsets inside the vectors. |
static int |
cublasGetVector(int n,
String x,
int offsetx,
int incx,
JCuDoubleComplex[] y,
int offsety,
int incy)
Extended wrapper offering additional parameters to specify the offsets inside the vectors. |
static int |
cublasGetVector(int n,
String x,
int incx,
JCuComplex[] y,
int incy)
Extended wrapper supporting complex array arguments |
static int |
cublasGetVector(int n,
String x,
int incx,
JCuDoubleComplex[] y,
int incy)
Extended wrapper supporting complex array arguments |
static int |
cublasIcamax(int n,
String x,
int incx)
|
static int |
cublasIcamax(int n,
String x,
int offsetx,
int incx)
Wrapper for CUBLAS function. |
static int |
cublasIcamin(int n,
String x,
int incx)
|
static int |
cublasIcamin(int n,
String x,
int offsetx,
int incx)
Wrapper for CUBLAS function. |
static int |
cublasIdamax(int n,
String x,
int incx)
|
static int |
cublasIdamax(int n,
String x,
int offsetx,
int incx)
Wrapper for CUBLAS function. |
static int |
cublasIdamin(int n,
String x,
int incx)
|
static int |
cublasIdamin(int n,
String x,
int offsetx,
int incx)
Wrapper for CUBLAS function. |
static int |
cublasInit()
Wrapper for CUBLAS function. |
static int |
cublasInit(boolean emulation)
Wrapper for CUBLAS function. The emulation flag indicates whether the emulation mode of CUBLAS should be used. |
static int |
cublasIsamax(int n,
String x,
int incx)
|
static int |
cublasIsamax(int n,
String x,
int offsetx,
int incx)
Wrapper for CUBLAS function. |
static int |
cublasIsamin(int n,
String x,
int incx)
|
static int |
cublasIsamin(int n,
String x,
int offsetx,
int incx)
Wrapper for CUBLAS function. |
static void |
cublasSaxpy(int n,
float alpha,
String x,
int offsetx,
int incx,
String y,
int offsety,
int incy)
Wrapper for CUBLAS function. |
static void |
cublasSaxpy(int n,
float alpha,
String x,
int incx,
String y,
int incy)
|
static void |
cublasScopy(int n,
String x,
int offsetx,
int incx,
String y,
int offsety,
int incy)
Wrapper for CUBLAS function. |
static void |
cublasScopy(int n,
String x,
int incx,
String y,
int incy)
|
static int |
cublasSetMatrix(int rows,
int cols,
double[] A,
int offsetA,
int lda,
String B,
int offsetB,
int ldb)
Extended wrapper offering additional parameters to specify the offsets inside the matrices |
static int |
cublasSetMatrix(int rows,
int cols,
double[] A,
int lda,
String B,
int ldb)
Extended wrapper supporting double array arguments |
static int |
cublasSetMatrix(int rows,
int cols,
DoubleBuffer A,
int offsetA,
int lda,
String B,
int offsetB,
int ldb)
Extended wrapper offering additional parameters to specify the offsets inside the matrices |
static int |
cublasSetMatrix(int rows,
int cols,
DoubleBuffer A,
int lda,
String B,
int ldb)
Wrapper for CUBLAS function. cublasStatus cublasSetMatrix (int rows, int cols, int elemSize, const void *A, int lda, void *B, int ldb) copies a tile of rows x cols elements from a matrix A in CPU memory space to a matrix B in GPU memory space. |
static int |
cublasSetMatrix(int rows,
int cols,
float[] A,
int offsetA,
int lda,
String B,
int offsetB,
int ldb)
Extended wrapper offering additional parameters to specify the offsets inside the matrices |
static int |
cublasSetMatrix(int rows,
int cols,
float[] A,
int lda,
String B,
int ldb)
Extended wrapper supporting float array arguments |
static int |
cublasSetMatrix(int rows,
int cols,
FloatBuffer A,
int offsetA,
int lda,
String B,
int offsetB,
int ldb)
Extended wrapper offering additional parameters to specify the offsets inside the matrices |
static int |
cublasSetMatrix(int rows,
int cols,
FloatBuffer A,
int lda,
String B,
int ldb)
Wrapper for CUBLAS function. cublasStatus cublasSetMatrix (int rows, int cols, int elemSize, const void *A, int lda, void *B, int ldb) copies a tile of rows x cols elements from a matrix A in CPU memory space to a matrix B in GPU memory space. |
static int |
cublasSetMatrix(int rows,
int cols,
JCuComplex[] A,
int offsetA,
int lda,
String B,
int offsetB,
int ldb)
Extended wrapper offering additional parameters to specify the offsets inside the matrices |
static int |
cublasSetMatrix(int rows,
int cols,
JCuComplex[] A,
int lda,
String B,
int ldb)
Extended wrapper supporting complex array arguments |
static int |
cublasSetMatrix(int rows,
int cols,
JCuDoubleComplex[] A,
int offsetA,
int lda,
String B,
int offsetB,
int ldb)
Extended wrapper offering additional parameters to specify the offsets inside the matrices |
static int |
cublasSetMatrix(int rows,
int cols,
JCuDoubleComplex[] A,
int lda,
String B,
int ldb)
Extended wrapper supporting complex array arguments |
static int |
cublasSetVector(int n,
double[] x,
int offsetx,
int incx,
String y,
int offsety,
int incy)
Extended wrapper offering additional parameters to specify the offsets inside the vectors. |
static int |
cublasSetVector(int n,
double[] x,
int incx,
String y,
int incy)
Extended wrapper supporting double array arguments |
static int |
cublasSetVector(int n,
DoubleBuffer x,
int offsetx,
int incx,
String y,
int offsety,
int incy)
Extended wrapper offering additional parameters to specify the offsets inside the vectors. |
static int |
cublasSetVector(int n,
DoubleBuffer x,
int incx,
String y,
int incy)
Wrapper for CUBLAS function. cublasStatus cublasSetVector (int n, int elemSize, const void *x, int incx, void *y, int incy) copies n elements from a vector x in CPU memory space to a vector y in GPU memory space. |
static int |
cublasSetVector(int n,
float[] x,
int offsetx,
int incx,
String y,
int offsety,
int incy)
Extended wrapper offering additional parameters to specify the offsets inside the vectors. |
static int |
cublasSetVector(int n,
float[] x,
int incx,
String y,
int incy)
Extended wrapper supporting float array arguments |
static int |
cublasSetVector(int n,
FloatBuffer x,
int offsetx,
int incx,
String y,
int offsety,
int incy)
Extended wrapper offering additional parameters to specify the offsets inside the vectors. |
static int |
cublasSetVector(int n,
FloatBuffer x,
int incx,
String y,
int incy)
Wrapper for CUBLAS function. cublasStatus cublasSetVector (int n, int elemSize, const void *x, int incx, void *y, int incy) copies n elements from a vector x in CPU memory space to a vector y in GPU memory space. |
static int |
cublasSetVector(int n,
JCuComplex[] x,
int offsetx,
int incx,
String y,
int offsety,
int incy)
Extended wrapper offering additional parameters to specify the offsets inside the vectors. |
static int |
cublasSetVector(int n,
JCuComplex[] x,
int incx,
String y,
int incy)
Extended wrapper supporting complex array arguments |
static int |
cublasSetVector(int n,
JCuDoubleComplex[] x,
int offsetx,
int incx,
String y,
int offsety,
int incy)
Extended wrapper offering additional parameters to specify the offsets inside the vectors. |
static int |
cublasSetVector(int n,
JCuDoubleComplex[] x,
int incx,
String y,
int incy)
Extended wrapper supporting complex array arguments |
static void |
cublasSgbmv(char trans,
int m,
int n,
int kl,
int ku,
float alpha,
String A,
int offsetA,
int lda,
String x,
int offsetx,
int incx,
float beta,
String y,
int offsety,
int incy)
Wrapper for CUBLAS function. |
static void |
cublasSgbmv(char trans,
int m,
int n,
int kl,
int ku,
float alpha,
String A,
int lda,
String x,
int incx,
float beta,
String y,
int incy)
|
static void |
cublasSgemm(char transa,
char transb,
int m,
int n,
int k,
float alpha,
String A,
int offsetA,
int lda,
String B,
int offsetB,
int ldb,
float beta,
String C,
int offsetC,
int ldc)
Wrapper for CUBLAS function. |
static void |
cublasSgemm(char transa,
char transb,
int m,
int n,
int k,
float alpha,
String A,
int lda,
String B,
int ldb,
float beta,
String C,
int ldc)
|
static void |
cublasSgemv(char trans,
int m,
int n,
float alpha,
String A,
int offsetA,
int lda,
String x,
int offsetx,
int incx,
float beta,
String y,
int offsety,
int incy)
Wrapper for CUBLAS function. |
static void |
cublasSgemv(char trans,
int m,
int n,
float alpha,
String A,
int lda,
String x,
int incx,
float beta,
String y,
int incy)
|
static void |
cublasSger(int m,
int n,
float alpha,
String x,
int offsetx,
int incx,
String y,
int offsety,
int incy,
String A,
int offsetA,
int lda)
Wrapper for CUBLAS function. |
static void |
cublasSger(int m,
int n,
float alpha,
String x,
int incx,
String y,
int incy,
String A,
int lda)
|
static int |
cublasShutdown()
Wrapper for CUBLAS function. cublasStatus cublasShutdown() releases CPU-side resources used by the CUBLAS library. |
static void |
cublasSrot(int n,
String x,
int offsetx,
int incx,
String y,
int offsety,
int incy,
float sc,
float ss)
Wrapper for CUBLAS function. |
static void |
cublasSrot(int n,
String x,
int incx,
String y,
int incy,
float sc,
float ss)
|
static void |
cublasSrotg(String sa,
int offsetsa,
String sb,
int offsetsb,
String sc,
int offsetsc,
String ss,
int offsetss)
Wrapper for CUBLAS function. |
static void |
cublasSrotg(String sa,
String sb,
String sc,
String ss)
|
static void |
cublasSrotm(int n,
String x,
int offsetx,
int incx,
String y,
int offsety,
int incy,
float[] sparam)
Wrapper for CUBLAS function. |
static void |
cublasSrotm(int n,
String x,
int incx,
String y,
int incy,
double[] sparam)
|
static void |
cublasSrotm(int n,
String x,
int incx,
String y,
int incy,
float[] sparam)
|
static void |
cublasSrotmg(float[] sd1,
float[] sd2,
float[] sx1,
float sy1,
float[] sparam)
Wrapper for CUBLAS function. |
static void |
cublasSsbmv(char uplo,
int n,
int k,
float alpha,
String A,
int offsetA,
int lda,
String x,
int offsetx,
int incx,
float beta,
String y,
int offsety,
int incy)
Wrapper for CUBLAS function. |
static void |
cublasSsbmv(char uplo,
int n,
int k,
float alpha,
String A,
int lda,
String x,
int incx,
float beta,
String y,
int incy)
|
static void |
cublasSscal(int n,
float alpha,
String x,
int incx)
|
static void |
cublasSscal(int n,
float alpha,
String x,
int offsetx,
int incx)
Wrapper for CUBLAS function. |
static void |
cublasSspmv(char uplo,
int n,
float alpha,
String AP,
int offsetAP,
String x,
int offsetx,
int incx,
float beta,
String y,
int offsety,
int incy)
Wrapper for CUBLAS function. |
static void |
cublasSspmv(char uplo,
int n,
float alpha,
String AP,
String x,
int incx,
float beta,
String y,
int incy)
|
static void |
cublasSspr(char uplo,
int n,
float alpha,
String x,
int offsetx,
int incx,
String AP,
int offsetAP)
Wrapper for CUBLAS function. |
static void |
cublasSspr(char uplo,
int n,
float alpha,
String x,
int incx,
String AP)
|
static void |
cublasSspr2(char uplo,
int n,
float alpha,
String x,
int offsetx,
int incx,
String y,
int offsety,
int incy,
String AP,
int offsetAP)
Wrapper for CUBLAS function. |
static void |
cublasSspr2(char uplo,
int n,
float alpha,
String x,
int incx,
String y,
int incy,
String AP)
|
static void |
cublasSswap(int n,
String x,
int offsetx,
int incx,
String y,
int offsety,
int incy)
Wrapper for CUBLAS function. |
static void |
cublasSswap(int n,
String x,
int incx,
String y,
int incy)
|
static void |
cublasSsymm(char side,
char uplo,
int m,
int n,
float alpha,
String A,
int offsetA,
int lda,
String B,
int offsetB,
int ldb,
float beta,
String C,
int offsetC,
int ldc)
Wrapper for CUBLAS function. |
static void |
cublasSsymm(char side,
char uplo,
int m,
int n,
float alpha,
String A,
int lda,
String B,
int ldb,
float beta,
String C,
int ldc)
|
static void |
cublasSsymv(char uplo,
int n,
float alpha,
String A,
int offsetA,
int lda,
String x,
int offsetx,
int incx,
float beta,
String y,
int offsety,
int incy)
Wrapper for CUBLAS function. |
static void |
cublasSsymv(char uplo,
int n,
float alpha,
String A,
int lda,
String x,
int incx,
float beta,
String y,
int incy)
|
static void |
cublasSsyr(char uplo,
int n,
float alpha,
String x,
int offsetx,
int incx,
String A,
int offsetA,
int lda)
Wrapper for CUBLAS function. |
static void |
cublasSsyr(char uplo,
int n,
float alpha,
String x,
int incx,
String A,
int lda)
|
static void |
cublasSsyr2(char uplo,
int n,
float alpha,
String x,
int offsetx,
int incx,
String y,
int offsety,
int incy,
String A,
int offsetA,
int lda)
Wrapper for CUBLAS function. |
static void |
cublasSsyr2(char uplo,
int n,
float alpha,
String x,
int incx,
String y,
int incy,
String A,
int lda)
|
static void |
cublasSsyr2k(char uplo,
char trans,
int n,
int k,
float alpha,
String A,
int offsetA,
int lda,
String B,
int offsetB,
int ldb,
float beta,
String C,
int offsetC,
int ldc)
Wrapper for CUBLAS function. |
static void |
cublasSsyr2k(char uplo,
char trans,
int n,
int k,
float alpha,
String A,
int lda,
String B,
int ldb,
float beta,
String C,
int ldc)
|
static void |
cublasSsyrk(char uplo,
char trans,
int n,
int k,
float alpha,
String A,
int lda,
float beta,
String C,
int ldc)
|
static void |
cublasSsyrk(char uplo,
char trans,
int n,
int k,
float alpha,
String A,
int offsetA,
int lda,
float beta,
String C,
int offsetC,
int ldc)
Wrapper for CUBLAS function. |
static void |
cublasStbmv(char uplo,
char trans,
char diag,
int n,
int k,
String A,
int offsetA,
int lda,
String x,
int offsetx,
int incx)
Wrapper for CUBLAS function. |
static void |
cublasStbmv(char uplo,
char trans,
char diag,
int n,
int k,
String A,
int lda,
String x,
int incx)
|
static void |
cublasStbsv(char uplo,
char trans,
char diag,
int n,
int k,
String A,
int offsetA,
int lda,
String x,
int offsetx,
int incx)
Wrapper for CUBLAS function. |
static void |
cublasStbsv(char uplo,
char trans,
char diag,
int n,
int k,
String A,
int lda,
String x,
int incx)
|
static void |
cublasStpmv(char uplo,
char trans,
char diag,
int n,
String AP,
int offsetAP,
String x,
int offsetx,
int incx)
Wrapper for CUBLAS function. |
static void |
cublasStpmv(char uplo,
char trans,
char diag,
int n,
String AP,
String x,
int incx)
|
static void |
cublasStpsv(char uplo,
char trans,
char diag,
int n,
String AP,
int offsetAP,
String x,
int offsetx,
int incx)
Wrapper for CUBLAS function. |
static void |
cublasStpsv(char uplo,
char trans,
char diag,
int n,
String AP,
String x,
int incx)
|
static void |
cublasStrmm(char side,
char uplo,
char transa,
char diag,
int m,
int n,
float alpha,
String A,
int offsetA,
int lda,
String B,
int offsetB,
int ldb)
Wrapper for CUBLAS function. |
static void |
cublasStrmm(char side,
char uplo,
char transa,
char diag,
int m,
int n,
float alpha,
String A,
int lda,
String B,
int ldb)
|
static void |
cublasStrmv(char uplo,
char trans,
char diag,
int n,
String A,
int offsetA,
int lda,
String x,
int offsetx,
int incx)
Wrapper for CUBLAS function. |
static void |
cublasStrmv(char uplo,
char trans,
char diag,
int n,
String A,
int lda,
String x,
int incx)
|
static void |
cublasStrsm(char side,
char uplo,
char transa,
char diag,
int m,
int n,
float alpha,
String A,
int offsetA,
int lda,
String B,
int offsetB,
int ldb)
Wrapper for CUBLAS function. |
static void |
cublasStrsm(char side,
char uplo,
char transa,
char diag,
int m,
int n,
float alpha,
String A,
int lda,
String B,
int ldb)
|
static void |
cublasStrsv(char uplo,
char trans,
char diag,
int n,
String A,
int offsetA,
int lda,
String x,
int offsetx,
int incx)
Wrapper for CUBLAS function. |
static void |
cublasStrsv(char uplo,
char trans,
char diag,
int n,
String A,
int lda,
String x,
int incx)
|
static void |
cublasZgemm(char transa,
char transb,
int m,
int n,
int k,
JCuDoubleComplex alpha,
String A,
int offsetA,
int lda,
String B,
int offsetB,
int ldb,
JCuDoubleComplex beta,
String C,
int offsetC,
int ldc)
Wrapper for CUBLAS function. |
static void |
cublasZgemm(char transa,
char transb,
int m,
int n,
int k,
JCuDoubleComplex alpha,
String A,
int lda,
String B,
int ldb,
JCuDoubleComplex beta,
String C,
int ldc)
|
static void |
printMatrix(int cols,
String A,
int lda)
|
static void |
printVector(int n,
String x)
|
static void |
setEmulation(boolean emulation)
Set the flag which indicates whether a call to cublasInit should initialize JCublas in emulation mode |
static void |
setLogLevel(JCublas.LogLevel logLevel)
Set the specified log level for the JCublas library. This method may only be called after JCublas has been initialized with a call to JCublas.cublasInit() ! Currently supported log levels: LOG_QUIET: Never print anything LOG_ERROR: Print error messages LOG_INFO: Print information about the CUBLAS functions that are executed LOG_TRACE: Print fine-grained memory management information in methods like cublasSetVector |
Methods inherited from class java.lang.Object |
---|
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final int CUBLAS_STATUS_SUCCESS
public static final int CUBLAS_STATUS_NOT_INITIALIZED
public static final int CUBLAS_STATUS_ALLOC_FAILED
public static final int CUBLAS_STATUS_INVALID_VALUE
public static final int CUBLAS_STATUS_ARCH_MISMATCH
public static final int CUBLAS_STATUS_MAPPING_ERROR
public static final int CUBLAS_STATUS_EXECUTION_FAILED
public static final int CUBLAS_STATUS_INTERNAL_ERROR
public static final int JCUBLAS_STATUS_MEMORY_ALREADY_USED
public static final int JCUBLAS_STATUS_MEMORY_NOT_FOUND
public static final int JCUBLAS_STATUS_INTERNAL_ERROR
Method Detail |
---|
public static void setEmulation(boolean emulation)
emulation
- Whether emulation mode should be usedpublic static int cublasInit() throws Throwable
IOException
URISyntaxException
UnsatisfiedLinkError
Throwable
public static int cublasInit(boolean emulation) throws Throwable
emulation
- Indicates whether emulation mode should be used
Throwable
public static int cublasShutdown()
public static int cublasGetError()
public static int cublasAlloc(int n, int elemSize, String name)
public static int cublasFree(String name)
public static void setLogLevel(JCublas.LogLevel logLevel)
logLevel
- The log level to use.public static void printVector(int n, String x)
public static void printMatrix(int cols, String A, int lda)
public static int cublasSetVector(int n, FloatBuffer x, int incx, String y, int incy)
public static int cublasSetVector(int n, FloatBuffer x, int offsetx, int incx, String y, int offsety, int incy)
public static int cublasSetVector(int n, float[] x, int incx, String y, int incy)
public static int cublasSetVector(int n, float[] x, int offsetx, int incx, String y, int offsety, int incy)
public static int cublasSetVector(int n, JCuComplex[] x, int incx, String y, int incy)
public static int cublasSetVector(int n, JCuComplex[] x, int offsetx, int incx, String y, int offsety, int incy)
public static int cublasGetVector(int n, String x, int incx, FloatBuffer y, int incy)
public static int cublasGetVector(int n, String x, int offsetx, int incx, FloatBuffer y, int offsety, int incy)
public static int cublasGetVector(int n, String x, int incx, float[] y, int incy)
public static int cublasGetVector(int n, String x, int offsetx, int incx, float[] y, int offsety, int incy)
public static int cublasGetVector(int n, String x, int incx, JCuComplex[] y, int incy)
public static int cublasGetVector(int n, String x, int offsetx, int incx, JCuComplex[] y, int offsety, int incy)
public static int cublasSetMatrix(int rows, int cols, FloatBuffer A, int lda, String B, int ldb)
public static int cublasSetMatrix(int rows, int cols, FloatBuffer A, int offsetA, int lda, String B, int offsetB, int ldb)
public static int cublasSetMatrix(int rows, int cols, float[] A, int lda, String B, int ldb)
public static int cublasSetMatrix(int rows, int cols, float[] A, int offsetA, int lda, String B, int offsetB, int ldb)
public static int cublasSetMatrix(int rows, int cols, JCuComplex[] A, int lda, String B, int ldb)
public static int cublasSetMatrix(int rows, int cols, JCuComplex[] A, int offsetA, int lda, String B, int offsetB, int ldb)
public static int cublasGetMatrix(int rows, int cols, String A, int lda, FloatBuffer B, int ldb)
public static int cublasGetMatrix(int rows, int cols, String A, int offsetA, int lda, FloatBuffer B, int offsetB, int ldb)
public static int cublasGetMatrix(int rows, int cols, String A, int lda, float[] B, int ldb)
public static int cublasGetMatrix(int rows, int cols, String A, int offsetA, int lda, float[] B, int offsetB, int ldb)
public static int cublasGetMatrix(int rows, int cols, String A, int lda, JCuComplex[] B, int ldb)
public static int cublasGetMatrix(int rows, int cols, String A, int offsetA, int lda, JCuComplex[] B, int offsetB, int ldb)
public static int cublasSetVector(int n, DoubleBuffer x, int incx, String y, int incy)
public static int cublasSetVector(int n, DoubleBuffer x, int offsetx, int incx, String y, int offsety, int incy)
public static int cublasSetVector(int n, double[] x, int incx, String y, int incy)
public static int cublasSetVector(int n, double[] x, int offsetx, int incx, String y, int offsety, int incy)
public static int cublasSetVector(int n, JCuDoubleComplex[] x, int incx, String y, int incy)
public static int cublasSetVector(int n, JCuDoubleComplex[] x, int offsetx, int incx, String y, int offsety, int incy)
public static int cublasGetVector(int n, String x, int incx, DoubleBuffer y, int incy)
public static int cublasGetVector(int n, String x, int offsetx, int incx, DoubleBuffer y, int offsety, int incy)
public static int cublasGetVector(int n, String x, int incx, double[] y, int incy)
public static int cublasGetVector(int n, String x, int offsetx, int incx, double[] y, int offsety, int incy)
public static int cublasGetVector(int n, String x, int incx, JCuDoubleComplex[] y, int incy)
public static int cublasGetVector(int n, String x, int offsetx, int incx, JCuDoubleComplex[] y, int offsety, int incy)
public static int cublasSetMatrix(int rows, int cols, DoubleBuffer A, int lda, String B, int ldb)
public static int cublasSetMatrix(int rows, int cols, DoubleBuffer A, int offsetA, int lda, String B, int offsetB, int ldb)
public static int cublasSetMatrix(int rows, int cols, double[] A, int lda, String B, int ldb)
public static int cublasSetMatrix(int rows, int cols, double[] A, int offsetA, int lda, String B, int offsetB, int ldb)
public static int cublasSetMatrix(int rows, int cols, JCuDoubleComplex[] A, int lda, String B, int ldb)
public static int cublasSetMatrix(int rows, int cols, JCuDoubleComplex[] A, int offsetA, int lda, String B, int offsetB, int ldb)
public static int cublasGetMatrix(int rows, int cols, String A, int lda, DoubleBuffer B, int ldb)
public static int cublasGetMatrix(int rows, int cols, String A, int offsetA, int lda, DoubleBuffer B, int offsetB, int ldb)
public static int cublasGetMatrix(int rows, int cols, String A, int lda, double[] B, int ldb)
public static int cublasGetMatrix(int rows, int cols, String A, int offsetA, int lda, double[] B, int offsetB, int ldb)
public static int cublasGetMatrix(int rows, int cols, String A, int lda, JCuDoubleComplex[] B, int ldb)
public static int cublasGetMatrix(int rows, int cols, String A, int offsetA, int lda, JCuDoubleComplex[] B, int offsetB, int ldb)
public static void cublasSrotm(int n, String x, int offsetx, int incx, String y, int offsety, int incy, float[] sparam)
void cublasSrotm (int n, float *x, int incx, float *y, int incy, const float* sparam) applies the modified Givens transformation, h, to the 2 x n matrix ( transpose(x) ) ( transpose(y) ) The elements of x are in x[lx + i * incx], i = 0 to n-1, where lx = 1 if incx >= 0, else lx = 1 + (1 - n) * incx, and similarly for y using ly and incy. With sparam[0] = sflag, h has one of the following forms: sflag = -1.0f sflag = 0.0f sflag = 1.0f sflag = -2.0f (sh00 sh01) (1.0f sh01) (sh00 1.0f) (1.0f 0.0f) h = ( ) ( ) ( ) ( ) (sh10 sh11) (sh10 1.0f) (-1.0f sh11) (0.0f 1.0f) Input ----- n number of elements in input vectors x single precision vector with n elements incx storage spacing between elements of x y single precision vector with n elements incy storage spacing between elements of y sparam 5-element vector. sparam[0] is sflag described above. sparam[1] through sparam[4] contain the 2x2 rotation matrix h: sparam[1] contains sh00, sparam[2] contains sh10, sparam[3] contains sh01, and sprams[4] contains sh11. Output ------ x rotated vector x (unchanged if n <= 0) y rotated vector y (unchanged if n <= 0) Reference: http://www.netlib.org/blas/srotm.f Error status for this function can be retrieved via cublasGetError(). Error Status ------------ CUBLAS_STATUS_NOT_INITIALIZED if CUBLAS library has not been initialized CUBLAS_STATUS_EXECUTION_FAILED if function failed to launch on GPU
public static void cublasSrotm(int n, String x, int incx, String y, int incy, float[] sparam)
public static void cublasSrotmg(float[] sd1, float[] sd2, float[] sx1, float sy1, float[] sparam)
void cublasSrotmg (float *psd1, float *psd2, float *psx1, const float *psy1, float *sparam) constructs the modified Givens transformation matrix h which zeros the second component of the 2-vector transpose(sqrt(sd1)*sx1,sqrt(sd2)*sy1). With sparam[0] = sflag, h has one of the following forms: sflag = -1.0f sflag = 0.0f sflag = 1.0f sflag = -2.0f (sh00 sh01) (1.0f sh01) (sh00 1.0f) (1.0f 0.0f) h = ( ) ( ) ( ) ( ) (sh10 sh11) (sh10 1.0f) (-1.0f sh11) (0.0f 1.0f) sparam[1] through sparam[4] contain sh00, sh10, sh01, sh11, respectively. Values of 1.0f, -1.0f, or 0.0f implied by the value of sflag are not stored in sparam. Input ----- sd1 single precision scalar sd2 single precision scalar sx1 single precision scalar sy1 single precision scalar Output ------ sd1 changed to represent the effect of the transformation sd2 changed to represent the effect of the transformation sx1 changed to represent the effect of the transformation sparam 5-element vector. sparam[0] is sflag described above. sparam[1] through sparam[4] contain the 2x2 rotation matrix h: sparam[1] contains sh00, sparam[2] contains sh10, sparam[3] contains sh01, and sprams[4] contains sh11. Reference: http://www.netlib.org/blas/srotmg.f This functions does not set any error status.
public static void cublasDrotm(int n, String x, int offsetx, int incx, String y, int offsety, int incy, double[] sparam)
void cublasDrotm (int n, double *x, int incx, double *y, int incy, const double* sparam) applies the modified Givens transformation, h, to the 2 x n matrix ( transpose(x) ) ( transpose(y) ) The elements of x are in x[lx + i * incx], i = 0 to n-1, where lx = 1 if incx >= 0, else lx = 1 + (1 - n) * incx, and similarly for y using ly and incy. With sparam[0] = sflag, h has one of the following forms: sflag = -1.0 sflag = 0.0 sflag = 1.0 sflag = -2.0 (sh00 sh01) (1.0 sh01) (sh00 1.0) (1.0 0.0) h = ( ) ( ) ( ) ( ) (sh10 sh11) (sh10 1.0) (-1.0 sh11) (0.0 1.0) Input ----- n number of elements in input vectors x double-precision vector with n elements incx storage spacing between elements of x y double-precision vector with n elements incy storage spacing between elements of y sparam 5-element vector. sparam[0] is sflag described above. sparam[1] through sparam[4] contain the 2x2 rotation matrix h: sparam[1] contains sh00, sparam[2] contains sh10, sparam[3] contains sh01, and sprams[4] contains sh11. Output ------ x rotated vector x (unchanged if n <= 0) y rotated vector y (unchanged if n <= 0) Reference: http://www.netlib.org/blas/drotm.f Error status for this function can be retrieved via cublasGetError(). Error Status ------------ CUBLAS_STATUS_NOT_INITIALIZED if CUBLAS library has not been initialized CUBLAS_STATUS_ARCH_MISMATCH if invoked on device without DP support CUBLAS_STATUS_EXECUTION_FAILED if function failed to launch on GPU
public static void cublasSrotm(int n, String x, int incx, String y, int incy, double[] sparam)
public static void cublasDrotmg(double[] sd1, double[] sd2, double[] sx1, double sy1, double[] sparam)
void cublasDrotmg (double *psd1, double *psd2, double *psx1, const double *psy1, double *sparam) constructs the modified Givens transformation matrix h which zeros the second component of the 2-vector transpose(sqrt(sd1)*sx1,sqrt(sd2)*sy1). With sparam[0] = sflag, h has one of the following forms: sflag = -1.0 sflag = 0.0 sflag = 1.0 sflag = -2.0 (sh00 sh01) (1.0 sh01) (sh00 1.0) (1.0 0.0) h = ( ) ( ) ( ) ( ) (sh10 sh11) (sh10 1.0) (-1.0 sh11) (0.0 1.0) sparam[1] through sparam[4] contain sh00, sh10, sh01, sh11, respectively. Values of 1.0, -1.0, or 0.0 implied by the value of sflag are not stored in sparam. Input ----- sd1 single precision scalar sd2 single precision scalar sx1 single precision scalar sy1 single precision scalar Output ------ sd1 changed to represent the effect of the transformation sd2 changed to represent the effect of the transformation sx1 changed to represent the effect of the transformation sparam 5-element vector. sparam[0] is sflag described above. sparam[1] through sparam[4] contain the 2x2 rotation matrix h: sparam[1] contains sh00, sparam[2] contains sh10, sparam[3] contains sh01, and sprams[4] contains sh11. Reference: http://www.netlib.org/blas/drotmg.f This functions does not set any error status.
public static int cublasIsamax(int n, String x, int offsetx, int incx)
int cublasIsamax (int n, const float *x, int incx) finds the smallest index of the maximum magnitude element of single precision vector x; that is, the result is the first i, i = 0 to n - 1, that maximizes abs(x[1 + i * incx])). Input ----- n number of elements in input vector x single precision vector with n elements incx storage spacing between elements of x Output ------ returns the smallest index (0 if n <= 0 or incx <= 0) Reference: http://www.netlib.org/blas/isamax.f Error status for this function can be retrieved via cublasGetError(). Error Status ------------ CUBLAS_STATUS_NOT_INITIALIZED if CUBLAS library has not been initialized CUBLAS_STATUS_EXECUTION_FAILED if function failed to launch on GPU
public static int cublasIsamax(int n, String x, int incx)
public static int cublasIsamin(int n, String x, int offsetx, int incx)
int cublasIsamin (int n, const float *x, int incx) finds the smallest index of the minimum magnitude element of single precision vector x; that is, the result is the first i, i = 0 to n - 1, that minimizes abs(x[1 + i * incx])). Input ----- n number of elements in input vector x single precision vector with n elements incx storage spacing between elements of x Output ------ returns the smallest index (0 if n <= 0 or incx <= 0) Reference: http://www.netlib.org/scilib/blass.f Error status for this function can be retrieved via cublasGetError(). Error Status ------------ CUBLAS_STATUS_NOT_INITIALIZED if CUBLAS library has not been initialized CUBLAS_STATUS_EXECUTION_FAILED if function failed to launch on GPU
public static int cublasIsamin(int n, String x, int incx)
public static void cublasSaxpy(int n, float alpha, String x, int offsetx, int incx, String y, int offsety, int incy)
void cublasSaxpy (int n, float alpha, const float *x, int incx, float *y, int incy) multiplies single precision vector x by single precision scalar alpha and adds the result to single precision vector y; that is, it overwrites single precision y with single precision alpha * x + y. For i = 0 to n - 1, it replaces y[ly + i * incy] with alpha * x[lx + i * incx] + y[ly + i * incy], where lx = 1 if incx >= 0, else lx = 1 +(1 - n) * incx, and ly is defined in a similar way using incy. Input ----- n number of elements in input vectors alpha single precision scalar multiplier x single precision vector with n elements incx storage spacing between elements of x y single precision vector with n elements incy storage spacing between elements of y Output ------ y single precision result (unchanged if n <= 0) Reference: http://www.netlib.org/blas/saxpy.f Error status for this function can be retrieved via cublasGetError(). Error Status ------------ CUBLAS_STATUS_NOT_INITIALIZED if CUBLAS library has not been initialized CUBLAS_STATUS_EXECUTION_FAILED if function failed to launch on GPU
public static void cublasSaxpy(int n, float alpha, String x, int incx, String y, int incy)
public static void cublasScopy(int n, String x, int offsetx, int incx, String y, int offsety, int incy)
void cublasScopy (int n, const float *x, int incx, float *y, int incy) copies the single precision vector x to the single precision vector y. For i = 0 to n-1, copies x[lx + i * incx] to y[ly + i * incy], where lx = 1 if incx >= 0, else lx = 1 + (1 - n) * incx, and ly is defined in a similar way using incy. Input ----- n number of elements in input vectors x single precision vector with n elements incx storage spacing between elements of x y single precision vector with n elements incy storage spacing between elements of y Output ------ y contains single precision vector x Reference: http://www.netlib.org/blas/scopy.f Error status for this function can be retrieved via cublasGetError(). Error Status ------------ CUBLAS_STATUS_NOT_INITIALIZED if CUBLAS library has not been initialized CUBLAS_STATUS_EXECUTION_FAILED if function failed to launch on GPU
public static void cublasScopy(int n, String x, int incx, String y, int incy)
public static void cublasSrot(int n, String x, int offsetx, int incx, String y, int offsety, int incy, float sc, float ss)
void cublasSrot (int n, float *x, int incx, float *y, int incy, float sc, float ss) multiplies a 2x2 matrix ( sc ss) with the 2xn matrix ( transpose(x) ) (-ss sc) ( transpose(y) ) The elements of x are in x[lx + i * incx], i = 0 ... n - 1, where lx = 1 if incx >= 0, else lx = 1 + (1 - n) * incx, and similarly for y using ly and incy. Input ----- n number of elements in input vectors x single precision vector with n elements incx storage spacing between elements of x y single precision vector with n elements incy storage spacing between elements of y sc element of rotation matrix ss element of rotation matrix Output ------ x rotated vector x (unchanged if n <= 0) y rotated vector y (unchanged if n <= 0) Reference http://www.netlib.org/blas/srot.f Error status for this function can be retrieved via cublasGetError(). Error Status ------------ CUBLAS_STATUS_NOT_INITIALIZED if CUBLAS library has not been initialized CUBLAS_STATUS_EXECUTION_FAILED if function failed to launch on GPU
public static void cublasSrot(int n, String x, int incx, String y, int incy, float sc, float ss)
public static void cublasSrotg(String sa, int offsetsa, String sb, int offsetsb, String sc, int offsetsc, String ss, int offsetss)
void cublasSrotg (float *sa, float *sb, float *sc, float *ss) constructs the Givens tranformation ( sc ss ) G = ( ) , scˆ2 + ssˆ2 = 1, (-ss sc ) which zeros the second entry of the 2-vector transpose(sa, sb). The quantity r = (+/-) sqrt (saˆ2 + sbˆ2) overwrites sa in storage. The value of sb is overwritten by a value z which allows sc and ss to be recovered by the following algorithm: if z=1 set sc = 0.0 and ss = 1.0 if abs(z) < 1 set sc = sqrt(1-zˆ2) and ss = z if abs(z) > 1 set sc = 1/z and ss = sqrt(1-scˆ2) The function srot (n, x, incx, y, incy, sc, ss) normally is called next to apply the transformation to a 2 x n matrix. Input ----- sa single precision scalar sb single precision scalar Output ------ sa single precision r sb single precision z sc single precision result ss single precision result Reference: http://www.netlib.org/blas/srotg.f This function does not set any error status.
public static void cublasSrotg(String sa, String sb, String sc, String ss)
public static void cublasSscal(int n, float alpha, String x, int offsetx, int incx)
void sscal (int n, float alpha, float *x, int incx) replaces single precision vector x with single precision alpha * x. For i = 0 to n - 1, it replaces x[ix + i * incx] with alpha * x[ix + i * incx], where ix = 1 if incx >= 0, else ix = 1 + (1 - n) * incx. Input ----- n number of elements in input vectors alpha single precision scalar multiplier x single precision vector with n elements incx storage spacing between elements of x Output ------ x single precision result (unchanged if n <= 0 or incx <= 0) Reference: http://www.netlib.org/blas/sscal.f Error status for this function can be retrieved via cublasGetError(). Error Status ------------ CUBLAS_STATUS_NOT_INITIALIZED if CUBLAS library has not been initialized CUBLAS_STATUS_EXECUTION_FAILED if function failed to launch on GPU
public static void cublasSscal(int n, float alpha, String x, int incx)
public static void cublasSswap(int n, String x, int offsetx, int incx, String y, int offsety, int incy)
void cublasSswap (int n, float *x, int incx, float *y, int incy) replaces single precision vector x with single precision alpha * x. For i = 0 to n - 1, it replaces x[ix + i * incx] with alpha * x[ix + i * incx], where ix = 1 if incx >= 0, else ix = 1 + (1 - n) * incx. Input ----- n number of elements in input vectors alpha single precision scalar multiplier x single precision vector with n elements incx storage spacing between elements of x Output ------ x single precision result (unchanged if n <= 0 or incx <= 0) Reference: http://www.netlib.org/blas/sscal.f Error status for this function can be retrieved via cublasGetError(). Error Status ------------ CUBLAS_STATUS_NOT_INITIALIZED if CUBLAS library has not been initialized CUBLAS_STATUS_EXECUTION_FAILED if function failed to launch on GPU
public static void cublasSswap(int n, String x, int incx, String y, int incy)
public static void cublasCaxpy(int n, JCuComplex alpha, String x, int offsetx, int incx, String y, int offsety, int incy)
void cublasCaxpy (int n, cuComplex alpha, const cuComplex *x, int incx, cuComplex *y, int incy) multiplies single-complex vector x by single-complex scalar alpha and adds the result to single-complex vector y; that is, it overwrites single-complex y with single-complex alpha * x + y. For i = 0 to n - 1, it replaces y[ly + i * incy] with alpha * x[lx + i * incx] + y[ly + i * incy], where lx = 0 if incx >= 0, else lx = 1 + (1 - n) * incx, and ly is defined in a similar way using incy. Input ----- n number of elements in input vectors alpha single-complex scalar multiplier x single-complex vector with n elements incx storage spacing between elements of x y single-complex vector with n elements incy storage spacing between elements of y Output ------ y single-complex result (unchanged if n <= 0) Reference: http://www.netlib.org/blas/caxpy.f Error status for this function can be retrieved via cublasGetError(). Error Status ------------ CUBLAS_STATUS_NOT_INITIALIZED if CUBLAS library has not been initialized CUBLAS_STATUS_EXECUTION_FAILED if function failed to launch on GPU
public static void cublasCaxpy(int n, JCuComplex alpha, String x, int incx, String y, int incy)
public static void cublasCcopy(int n, String x, int offsetx, int incx, String y, int offsety, int incy)
void cublasCcopy (int n, const cuComplex *x, int incx, cuComplex *y, int incy) copies the single-complex vector x to the single-complex vector y. For i = 0 to n-1, copies x[lx + i * incx] to y[ly + i * incy], where lx = 1 if incx >= 0, else lx = 1 + (1 - n) * incx, and ly is defined in a similar way using incy. Input ----- n number of elements in input vectors x single-complex vector with n elements incx storage spacing between elements of x y single-complex vector with n elements incy storage spacing between elements of y Output ------ y contains single complex vector x Reference: http://www.netlib.org/blas/ccopy.f Error status for this function can be retrieved via cublasGetError(). Error Status ------------ CUBLAS_STATUS_NOT_INITIALIZED if CUBLAS library has not been initialized CUBLAS_STATUS_EXECUTION_FAILED if function failed to launch on GPU
public static void cublasCcopy(int n, String x, int incx, String y, int incy)
public static void cublasCscal(int n, JCuComplex alpha, String x, int offsetx, int incx)
void cublasCscal (int n, cuComplex alpha, cuComplex *x, int incx) replaces single-complex vector x with single-complex alpha * x. For i = 0 to n - 1, it replaces x[ix + i * incx] with alpha * x[ix + i * incx], where ix = 1 if incx >= 0, else ix = 1 + (1 - n) * incx. Input ----- n number of elements in input vectors alpha single-complex scalar multiplier x single-complex vector with n elements incx storage spacing between elements of x Output ------ x single-complex result (unchanged if n <= 0 or incx <= 0) Reference: http://www.netlib.org/blas/cscal.f Error status for this function can be retrieved via cublasGetError(). Error Status ------------ CUBLAS_STATUS_NOT_INITIALIZED if CUBLAS library has not been initialized CUBLAS_STATUS_EXECUTION_FAILED if function failed to launch on GPU
public static void cublasCscal(int n, JCuComplex alpha, String x, int incx)
public static void cublasCrotg(String pca, int offsetpca, JCuComplex cb, String psc, int offsetpsc, String pcs, int offsetpcs)
void cublasCrotg (cuComplex *ca, cuComplex cb, float *sc, cuComplex *cs) constructs the complex Givens tranformation ( sc cs ) G = ( ) , scˆ2 + cabs(cs)ˆ2 = 1, (-cs sc ) which zeros the second entry of the complex 2-vector transpose(ca, cb). The quantity ca/cabs(ca)*norm(ca,cb) overwrites ca in storage. The function crot (n, x, incx, y, incy, sc, cs) is normally called next to apply the transformation to a 2 x n matrix. Input ----- ca single-precision complex precision scalar cb single-precision complex scalar Output ------ ca single-precision complex ca/cabs(ca)*norm(ca,cb) sc single-precision cosine component of rotation matrix cs single-precision complex sine component of rotation matrix Reference: http://www.netlib.org/blas/crotg.f This function does not set any error status.
public static void cublasCrotg(String pca, JCuComplex cb, String psc, String pcs)
public static void cublasCrot(int n, String x, int offsetx, int incx, String y, int offsety, int incy, float c, JCuComplex s)
void cublasCrot (int n, cuComplex *x, int incx, cuComplex *y, int incy, float sc, cuComplex cs) multiplies a 2x2 matrix ( sc cs) with the 2xn matrix ( transpose(x) ) (-conj(cs) sc) ( transpose(y) ) The elements of x are in x[lx + i * incx], i = 0 ... n - 1, where lx = 1 if incx >= 0, else lx = 1 + (1 - n) * incx, and similarly for y using ly and incy. Input ----- n number of elements in input vectors x single-precision complex vector with n elements incx storage spacing between elements of x y single-precision complex vector with n elements incy storage spacing between elements of y sc single-precision cosine component of rotation matrix cs single-precision complex sine component of rotation matrix Output ------ x rotated single-precision complex vector x (unchanged if n <= 0) y rotated single-precision complex vector y (unchanged if n <= 0) Reference: http://netlib.org/lapack/explore-html/crot.f.html Error status for this function can be retrieved via cublasGetError(). Error Status ------------ CUBLAS_STATUS_NOT_INITIALIZED if CUBLAS library has not been initialized CUBLAS_STATUS_EXECUTION_FAILED if function failed to launch on GPU
public static void cublasCrot(int n, String x, int incx, String y, int incy, float c, JCuComplex s)
public static void cublasCsrot(int n, String x, int offsetx, int incx, String y, int offsety, int incy, float c, float s)
void csrot (int n, cuComplex *x, int incx, cuCumplex *y, int incy, float c, float s) multiplies a 2x2 rotation matrix ( c s) with a 2xn matrix ( transpose(x) ) (-s c) ( transpose(y) ) The elements of x are in x[lx + i * incx], i = 0 ... n - 1, where lx = 1 if incx >= 0, else lx = 1 + (1 - n) * incx, and similarly for y using ly and incy. Input ----- n number of elements in input vectors x single-precision complex vector with n elements incx storage spacing between elements of x y single-precision complex vector with n elements incy storage spacing between elements of y c cosine component of rotation matrix s sine component of rotation matrix Output ------ x rotated vector x (unchanged if n <= 0) y rotated vector y (unchanged if n <= 0) Reference http://www.netlib.org/blas/csrot.f Error status for this function can be retrieved via cublasGetError(). Error Status ------------ CUBLAS_STATUS_NOT_INITIALIZED if CUBLAS library has not been initialized CUBLAS_STATUS_EXECUTION_FAILED if function failed to launch on GPU
public static void cublasCsrot(int n, String x, int incx, String y, int incy, float c, float s)
public static void cublasCsscal(int n, float alpha, String x, int offsetx, int incx)
void cublasCsscal (int n, float alpha, cuComplex *x, int incx) replaces single-complex vector x with single-complex alpha * x. For i = 0 to n - 1, it replaces x[ix + i * incx] with alpha * x[ix + i * incx], where ix = 1 if incx >= 0, else ix = 1 + (1 - n) * incx. Input ----- n number of elements in input vectors alpha single precision scalar multiplier x single-complex vector with n elements incx storage spacing between elements of x Output ------ x single-complex result (unchanged if n <= 0 or incx <= 0) Reference: http://www.netlib.org/blas/csscal.f Error status for this function can be retrieved via cublasGetError(). Error Status ------------ CUBLAS_STATUS_NOT_INITIALIZED if CUBLAS library has not been initialized CUBLAS_STATUS_EXECUTION_FAILED if function failed to launch on GPU
public static void cublasCsscal(int n, float alpha, String x, int incx)
public static void cublasCswap(int n, String x, int offsetx, int incx, String y, int offsety, int incy)
void cublasCswap (int n, const cuComplex *x, int incx, cuComplex *y, int incy) interchanges the single-complex vector x with the single-complex vector y. For i = 0 to n-1, interchanges x[lx + i * incx] with y[ly + i * incy], where lx = 1 if incx >= 0, else lx = 1 + (1 - n) * incx, and ly is defined in a similar way using incy. Input ----- n number of elements in input vectors x single-complex vector with n elements incx storage spacing between elements of x y single-complex vector with n elements incy storage spacing between elements of y Output ------ x contains-single complex vector y y contains-single complex vector x Reference: http://www.netlib.org/blas/cswap.f Error status for this function can be retrieved via cublasGetError(). Error Status ------------ CUBLAS_STATUS_NOT_INITIALIZED if CUBLAS library has not been initialized CUBLAS_STATUS_EXECUTION_FAILED if function failed to launch on GPU
public static void cublasCswap(int n, String x, int incx, String y, int incy)
public static int cublasIcamax(int n, String x, int offsetx, int incx)
int cublasIcamax (int n, const float *x, int incx) finds the smallest index of the element having maximum absolute value in single-complex vector x; that is, the result is the first i, i = 0 to n - 1 that maximizes abs(real(x[1+i*incx]))+abs(imag(x[1 + i * incx])). Input ----- n number of elements in input vector x single-complex vector with n elements incx storage spacing between elements of x Output ------ returns the smallest index (0 if n <= 0 or incx <= 0) Reference: http://www.netlib.org/blas/icamax.f Error status for this function can be retrieved via cublasGetError(). Error Status ------------ CUBLAS_STATUS_NOT_INITIALIZED if CUBLAS library has not been initialized CUBLAS_STATUS_EXECUTION_FAILED if function failed to launch on GPU
public static int cublasIcamax(int n, String x, int incx)
public static int cublasIcamin(int n, String x, int offsetx, int incx)
int cublasIcamin (int n, const float *x, int incx) finds the smallest index of the element having minimum absolute value in single-complex vector x; that is, the result is the first i, i = 0 to n - 1 that minimizes abs(real(x[1+i*incx]))+abs(imag(x[1 + i * incx])). Input ----- n number of elements in input vector x single-complex vector with n elements incx storage spacing between elements of x Output ------ returns the smallest index (0 if n <= 0 or incx <= 0) Reference: see ICAMAX. Error status for this function can be retrieved via cublasGetError(). Error Status ------------ CUBLAS_STATUS_NOT_INITIALIZED if CUBLAS library has not been initialized CUBLAS_STATUS_EXECUTION_FAILED if function failed to launch on GPU
public static int cublasIcamin(int n, String x, int incx)
public static void cublasSgbmv(char trans, int m, int n, int kl, int ku, float alpha, String A, int offsetA, int lda, String x, int offsetx, int incx, float beta, String y, int offsety, int incy)
void cublasSgbmv (char trans, int m, int n, int kl, int ku, float alpha, const float *A, int lda, const float *x, int incx, float beta, float *y, int incy) performs one of the matrix-vector operations y = alpha*op(A)*x + beta*y, op(A)=A or op(A) = transpose(A) alpha and beta are single precision scalars. x and y are single precision vectors. A is an m by n band matrix consisting of single precision elements with kl sub-diagonals and ku super-diagonals. Input ----- trans specifies op(A). If trans == 'N' or 'n', op(A) = A. If trans == 'T', 't', 'C', or 'c', op(A) = transpose(A) m specifies the number of rows of the matrix A. m must be at least zero. n specifies the number of columns of the matrix A. n must be at least zero. kl specifies the number of sub-diagonals of matrix A. It must be at least zero. ku specifies the number of super-diagonals of matrix A. It must be at least zero. alpha single precision scalar multiplier applied to op(A). A single precision array of dimensions (lda, n). The leading (kl + ku + 1) x n part of the array A must contain the band matrix A, supplied column by column, with the leading diagonal of the matrix in row (ku + 1) of the array, the first super-diagonal starting at position 2 in row ku, the first sub-diagonal starting at position 1 in row (ku + 2), and so on. Elements in the array A that do not correspond to elements in the band matrix (such as the top left ku x ku triangle) are not referenced. lda leading dimension of A. lda must be at least (kl + ku + 1). x single precision array of length at least (1+(n-1)*abs(incx)) when trans == 'N' or 'n' and at least (1+(m-1)*abs(incx)) otherwise. incx storage spacing between elements of x. incx must not be zero. beta single precision scalar multiplier applied to vector y. If beta is zero, y is not read. y single precision array of length at least (1+(m-1)*abs(incy)) when trans == 'N' or 'n' and at least (1+(n-1)*abs(incy)) otherwise. If beta is zero, y is not read. incy storage spacing between elements of y. incy must not be zero. Output ------ y updated according to y = alpha*op(A)*x + beta*y Reference: http://www.netlib.org/blas/sgbmv.f Error status for this function can be retrieved via cublasGetError(). Error Status ------------ CUBLAS_STATUS_NOT_INITIALIZED if CUBLAS library has not been initialized CUBLAS_STATUS_INVALID_VALUE if n, kl, or ku < 0; if incx or incy == 0 CUBLAS_STATUS_EXECUTION_FAILED if function failed to launch on GPU
public static void cublasSgbmv(char trans, int m, int n, int kl, int ku, float alpha, String A, int lda, String x, int incx, float beta, String y, int incy)
public static void cublasSgemv(char trans, int m, int n, float alpha, String A, int offsetA, int lda, String x, int offsetx, int incx, float beta, String y, int offsety, int incy)
cublasSgemv (char trans, int m, int n, float alpha, const float *A, int lda, const float *x, int incx, float beta, float *y, int incy) performs one of the matrix-vector operations y = alpha * op(A) * x + beta * y, where op(A) is one of op(A) = A or op(A) = transpose(A) where alpha and beta are single precision scalars, x and y are single precision vectors, and A is an m x n matrix consisting of single precision elements. Matrix A is stored in column major format, and lda is the leading dimension of the two-dimensional array in which A is stored. Input ----- trans specifies op(A). If transa = 'n' or 'N', op(A) = A. If trans = trans = 't', 'T', 'c', or 'C', op(A) = transpose(A) m specifies the number of rows of the matrix A. m must be at least zero. n specifies the number of columns of the matrix A. n must be at least zero. alpha single precision scalar multiplier applied to op(A). A single precision array of dimensions (lda, n) if trans = 'n' or 'N'), and of dimensions (lda, m) otherwise. lda must be at least max(1, m) and at least max(1, n) otherwise. lda leading dimension of two-dimensional array used to store matrix A x single precision array of length at least (1 + (n - 1) * abs(incx)) when trans = 'N' or 'n' and at least (1 + (m - 1) * abs(incx)) otherwise. incx specifies the storage spacing between elements of x. incx must not be zero. beta single precision scalar multiplier applied to vector y. If beta is zero, y is not read. y single precision array of length at least (1 + (m - 1) * abs(incy)) when trans = 'N' or 'n' and at least (1 + (n - 1) * abs(incy)) otherwise. incy specifies the storage spacing between elements of x. incx must not be zero. Output ------ y updated according to alpha * op(A) * x + beta * y Reference: http://www.netlib.org/blas/sgemv.f Error status for this function can be retrieved via cublasGetError(). Error Status ------------ CUBLAS_STATUS_NOT_INITIALIZED if CUBLAS library has not been initialized CUBLAS_STATUS_INVALID_VALUE if m or n are < 0, or if incx or incy == 0 CUBLAS_STATUS_EXECUTION_FAILED if function failed to launch on GPU
public static void cublasSgemv(char trans, int m, int n, float alpha, String A, int lda, String x, int incx, float beta, String y, int incy)
public static void cublasSger(int m, int n, float alpha, String x, int offsetx, int incx, String y, int offsety, int incy, String A, int offsetA, int lda)
cublasSger (int m, int n, float alpha, const float *x, int incx, const float *y, int incy, float *A, int lda) performs the symmetric rank 1 operation A = alpha * x * transpose(y) + A, where alpha is a single precision scalar, x is an m element single precision vector, y is an n element single precision vector, and A is an m by n matrix consisting of single precision elements. Matrix A is stored in column major format, and lda is the leading dimension of the two-dimensional array used to store A. Input ----- m specifies the number of rows of the matrix A. It must be at least zero. n specifies the number of columns of the matrix A. It must be at least zero. alpha single precision scalar multiplier applied to x * transpose(y) x single precision array of length at least (1 + (m - 1) * abs(incx)) incx specifies the storage spacing between elements of x. incx must not be zero. y single precision array of length at least (1 + (n - 1) * abs(incy)) incy specifies the storage spacing between elements of y. incy must not be zero. A single precision array of dimensions (lda, n). lda leading dimension of two-dimensional array used to store matrix A Output ------ A updated according to A = alpha * x * transpose(y) + A Reference: http://www.netlib.org/blas/sger.f Error status for this function can be retrieved via cublasGetError(). Error Status ------------ CUBLAS_STATUS_NOT_INITIALIZED if CUBLAS library has not been initialized CUBLAS_STATUS_INVALID_VALUE if n < 0, incx == 0, incy == 0 CUBLAS_STATUS_EXECUTION_FAILED if function failed to launch on GPU
public static void cublasSger(int m, int n, float alpha, String x, int incx, String y, int incy, String A, int lda)
public static void cublasSsbmv(char uplo, int n, int k, float alpha, String A, int offsetA, int lda, String x, int offsetx, int incx, float beta, String y, int offsety, int incy)
void cublasSsbmv (char uplo, int n, int k, float alpha, const float *A, int lda, const float *x, int incx, float beta, float *y, int incy) performs the matrix-vector operation y := alpha*A*x + beta*y alpha and beta are single precision scalars. x and y are single precision vectors with n elements. A is an n x n symmetric band matrix consisting of single precision elements, with k super-diagonals and the same number of sub-diagonals. Input ----- uplo specifies whether the upper or lower triangular part of the symmetric band matrix A is being supplied. If uplo == 'U' or 'u', the upper triangular part is being supplied. If uplo == 'L' or 'l', the lower triangular part is being supplied. n specifies the number of rows and the number of columns of the symmetric matrix A. n must be at least zero. k specifies the number of super-diagonals of matrix A. Since the matrix is symmetric, this is also the number of sub-diagonals. k must be at least zero. alpha single precision scalar multiplier applied to A*x. A single precision array of dimensions (lda, n). When uplo == 'U' or 'u', the leading (k + 1) x n part of array A must contain the upper triangular band of the symmetric matrix, supplied column by column, with the leading diagonal of the matrix in row (k+1) of the array, the first super-diagonal starting at position 2 in row k, and so on. The top left k x k triangle of the array A is not referenced. When uplo == 'L' or 'l', the leading (k + 1) x n part of the array A must contain the lower triangular band part of the symmetric matrix, supplied column by column, with the leading diagonal of the matrix in row 1 of the array, the first sub-diagonal starting at position 1 in row 2, and so on. The bottom right k x k triangle of the array A is not referenced. lda leading dimension of A. lda must be at least (k + 1). x single precision array of length at least (1 + (n - 1) * abs(incx)). incx storage spacing between elements of x. incx must not be zero. beta single precision scalar multiplier applied to vector y. If beta is zero, y is not read. y single precision array of length at least (1 + (n - 1) * abs(incy)). If beta is zero, y is not read. incy storage spacing between elements of y. incy must not be zero. Output ------ y updated according to alpha*A*x + beta*y Reference: http://www.netlib.org/blas/ssbmv.f Error status for this function can be retrieved via cublasGetError(). Error Status ------------ CUBLAS_STATUS_INVALID_VALUE if k or n < 0, or if incx or incy == 0 CUBLAS_STATUS_EXECUTION_FAILED if function failed to launch on GPU
public static void cublasSsbmv(char uplo, int n, int k, float alpha, String A, int lda, String x, int incx, float beta, String y, int incy)
public static void cublasSspmv(char uplo, int n, float alpha, String AP, int offsetAP, String x, int offsetx, int incx, float beta, String y, int offsety, int incy)
void cublasSspmv (char uplo, int n, float alpha, const float *AP, const float *x, int incx, float beta, float *y, int incy) performs the matrix-vector operation y = alpha * A * x + beta * y Alpha and beta are single precision scalars, and x and y are single precision vectors with n elements. A is a symmetric n x n matrix consisting of single precision elements that is supplied in packed form. Input ----- uplo specifies whether the matrix data is stored in the upper or the lower triangular part of array AP. If uplo == 'U' or 'u', then the upper triangular part of A is supplied in AP. If uplo == 'L' or 'l', then the lower triangular part of A is supplied in AP. n specifies the number of rows and columns of the matrix A. It must be at least zero. alpha single precision scalar multiplier applied to A*x. AP single precision array with at least ((n * (n + 1)) / 2) elements. If uplo == 'U' or 'u', the array AP contains the upper triangular part of the symmetric matrix A, packed sequentially, column by column; that is, if i <= j, then A[i,j] is stored is AP[i+(j*(j+1)/2)]. If uplo == 'L' or 'L', the array AP contains the lower triangular part of the symmetric matrix A, packed sequentially, column by column; that is, if i >= j, then A[i,j] is stored in AP[i+((2*n-j+1)*j)/2]. x single precision array of length at least (1 + (n - 1) * abs(incx)). incx storage spacing between elements of x. incx must not be zero. beta single precision scalar multiplier applied to vector y; y single precision array of length at least (1 + (n - 1) * abs(incy)). If beta is zero, y is not read. incy storage spacing between elements of y. incy must not be zero. Output ------ y updated according to y = alpha*A*x + beta*y Reference: http://www.netlib.org/blas/sspmv.f Error status for this function can be retrieved via cublasGetError(). Error Status ------------ CUBLAS_STATUS_NOT_INITIALIZED if CUBLAS library has not been initialized CUBLAS_STATUS_INVALID_VALUE if n < 0, or if incx or incy == 0 CUBLAS_STATUS_EXECUTION_FAILED if function failed to launch on GPU
public static void cublasSspmv(char uplo, int n, float alpha, String AP, String x, int incx, float beta, String y, int incy)
public static void cublasSspr(char uplo, int n, float alpha, String x, int offsetx, int incx, String AP, int offsetAP)
void cublasSspr (char uplo, int n, float alpha, const float *x, int incx, float *AP) performs the symmetric rank 1 operation A = alpha * x * transpose(x) + A, where alpha is a single precision scalar and x is an n element single precision vector. A is a symmetric n x n matrix consisting of single precision elements that is supplied in packed form. Input ----- uplo specifies whether the matrix data is stored in the upper or the lower triangular part of array AP. If uplo == 'U' or 'u', then the upper triangular part of A is supplied in AP. If uplo == 'L' or 'l', then the lower triangular part of A is supplied in AP. n specifies the number of rows and columns of the matrix A. It must be at least zero. alpha single precision scalar multiplier applied to x * transpose(x). x single precision array of length at least (1 + (n - 1) * abs(incx)). incx storage spacing between elements of x. incx must not be zero. AP single precision array with at least ((n * (n + 1)) / 2) elements. If uplo == 'U' or 'u', the array AP contains the upper triangular part of the symmetric matrix A, packed sequentially, column by column; that is, if i <= j, then A[i,j] is stored is AP[i+(j*(j+1)/2)]. If uplo == 'L' or 'L', the array AP contains the lower triangular part of the symmetric matrix A, packed sequentially, column by column; that is, if i >= j, then A[i,j] is stored in AP[i+((2*n-j+1)*j)/2]. Output ------ A updated according to A = alpha * x * transpose(x) + A Reference: http://www.netlib.org/blas/sspr.f Error status for this function can be retrieved via cublasGetError(). Error Status ------------ CUBLAS_STATUS_NOT_INITIALIZED if CUBLAS library has not been initialized CUBLAS_STATUS_INVALID_VALUE if n < 0, or incx == 0 CUBLAS_STATUS_EXECUTION_FAILED if function failed to launch on GPU
public static void cublasSspr(char uplo, int n, float alpha, String x, int incx, String AP)
public static void cublasSspr2(char uplo, int n, float alpha, String x, int offsetx, int incx, String y, int offsety, int incy, String AP, int offsetAP)
void cublasSspr2 (char uplo, int n, float alpha, const float *x, int incx, const float *y, int incy, float *AP) performs the symmetric rank 2 operation A = alpha*x*transpose(y) + alpha*y*transpose(x) + A, where alpha is a single precision scalar, and x and y are n element single precision vectors. A is a symmetric n x n matrix consisting of single precision elements that is supplied in packed form. Input ----- uplo specifies whether the matrix data is stored in the upper or the lower triangular part of array A. If uplo == 'U' or 'u', then only the upper triangular part of A may be referenced and the lower triangular part of A is inferred. If uplo == 'L' or 'l', then only the lower triangular part of A may be referenced and the upper triangular part of A is inferred. n specifies the number of rows and columns of the matrix A. It must be at least zero. alpha single precision scalar multiplier applied to x * transpose(y) + y * transpose(x). x single precision array of length at least (1 + (n - 1) * abs (incx)). incx storage spacing between elements of x. incx must not be zero. y single precision array of length at least (1 + (n - 1) * abs (incy)). incy storage spacing between elements of y. incy must not be zero. AP single precision array with at least ((n * (n + 1)) / 2) elements. If uplo == 'U' or 'u', the array AP contains the upper triangular part of the symmetric matrix A, packed sequentially, column by column; that is, if i <= j, then A[i,j] is stored is AP[i+(j*(j+1)/2)]. If uplo == 'L' or 'L', the array AP contains the lower triangular part of the symmetric matrix A, packed sequentially, column by column; that is, if i >= j, then A[i,j] is stored in AP[i+((2*n-j+1)*j)/2]. Output ------ A updated according to A = alpha*x*transpose(y)+alpha*y*transpose(x)+A Reference: http://www.netlib.org/blas/sspr2.f Error status for this function can be retrieved via cublasGetError(). Error Status ------------ CUBLAS_STATUS_NOT_INITIALIZED if CUBLAS library has not been initialized CUBLAS_STATUS_INVALID_VALUE if n < 0, incx == 0, incy == 0 CUBLAS_STATUS_EXECUTION_FAILED if function failed to launch on GPU
public static void cublasSspr2(char uplo, int n, float alpha, String x, int incx, String y, int incy, String AP)
public static void cublasSsymv(char uplo, int n, float alpha, String A, int offsetA, int lda, String x, int offsetx, int incx, float beta, String y, int offsety, int incy)
void cublasSsymv (char uplo, int n, float alpha, const float *A, int lda, const float *x, int incx, float beta, float *y, int incy) performs the matrix-vector operation y = alpha*A*x + beta*y Alpha and beta are single precision scalars, and x and y are single precision vectors, each with n elements. A is a symmetric n x n matrix consisting of single precision elements that is stored in either upper or lower storage mode. Input ----- uplo specifies whether the upper or lower triangular part of the array A is to be referenced. If uplo == 'U' or 'u', the symmetric matrix A is stored in upper storage mode, i.e. only the upper triangular part of A is to be referenced while the lower triangular part of A is to be inferred. If uplo == 'L' or 'l', the symmetric matrix A is stored in lower storage mode, i.e. only the lower triangular part of A is to be referenced while the upper triangular part of A is to be inferred. n specifies the number of rows and the number of columns of the symmetric matrix A. n must be at least zero. alpha single precision scalar multiplier applied to A*x. A single precision array of dimensions (lda, n). If uplo == 'U' or 'u', the leading n x n upper triangular part of the array A must contain the upper triangular part of the symmetric matrix and the strictly lower triangular part of A is not referenced. If uplo == 'L' or 'l', the leading n x n lower triangular part of the array A must contain the lower triangular part of the symmetric matrix and the strictly upper triangular part of A is not referenced. lda leading dimension of A. It must be at least max (1, n). x single precision array of length at least (1 + (n - 1) * abs(incx)). incx storage spacing between elements of x. incx must not be zero. beta single precision scalar multiplier applied to vector y. y single precision array of length at least (1 + (n - 1) * abs(incy)). If beta is zero, y is not read. incy storage spacing between elements of y. incy must not be zero. Output ------ y updated according to y = alpha*A*x + beta*y Reference: http://www.netlib.org/blas/ssymv.f Error status for this function can be retrieved via cublasGetError(). Error Status ------------ CUBLAS_STATUS_NOT_INITIALIZED if CUBLAS library has not been initialized CUBLAS_STATUS_INVALID_VALUE if n < 0, or if incx or incy == 0 CUBLAS_STATUS_EXECUTION_FAILED if function failed to launch on GPU
public static void cublasSsymv(char uplo, int n, float alpha, String A, int lda, String x, int incx, float beta, String y, int incy)
public static void cublasSsyr(char uplo, int n, float alpha, String x, int offsetx, int incx, String A, int offsetA, int lda)
void cublasSsyr (char uplo, int n, float alpha, const float *x, int incx, float *A, int lda) performs the symmetric rank 1 operation A = alpha * x * transpose(x) + A, where alpha is a single precision scalar, x is an n element single precision vector and A is an n x n symmetric matrix consisting of single precision elements. Matrix A is stored in column major format, and lda is the leading dimension of the two-dimensional array containing A. Input ----- uplo specifies whether the matrix data is stored in the upper or the lower triangular part of array A. If uplo = 'U' or 'u', then only the upper triangular part of A may be referenced. If uplo = 'L' or 'l', then only the lower triangular part of A may be referenced. n specifies the number of rows and columns of the matrix A. It must be at least 0. alpha single precision scalar multiplier applied to x * transpose(x) x single precision array of length at least (1 + (n - 1) * abs(incx)) incx specifies the storage spacing between elements of x. incx must not be zero. A single precision array of dimensions (lda, n). If uplo = 'U' or 'u', then A must contain the upper triangular part of a symmetric matrix, and the strictly lower triangular part is not referenced. If uplo = 'L' or 'l', then A contains the lower triangular part of a symmetric matrix, and the strictly upper triangular part is not referenced. lda leading dimension of the two-dimensional array containing A. lda must be at least max(1, n). Output ------ A updated according to A = alpha * x * transpose(x) + A Reference: http://www.netlib.org/blas/ssyr.f Error status for this function can be retrieved via cublasGetError(). Error Status ------------ CUBLAS_STATUS_NOT_INITIALIZED if CUBLAS library has not been initialized CUBLAS_STATUS_INVALID_VALUE if n < 0, or incx == 0 CUBLAS_STATUS_EXECUTION_FAILED if function failed to launch on GPU
public static void cublasSsyr(char uplo, int n, float alpha, String x, int incx, String A, int lda)
public static void cublasSsyr2(char uplo, int n, float alpha, String x, int offsetx, int incx, String y, int offsety, int incy, String A, int offsetA, int lda)
void cublasSsyr2 (char uplo, int n, float alpha, const float *x, int incx, const float *y, int incy, float *A, int lda) performs the symmetric rank 2 operation A = alpha*x*transpose(y) + alpha*y*transpose(x) + A, where alpha is a single precision scalar, x and y are n element single precision vector and A is an n by n symmetric matrix consisting of single precision elements. Input ----- uplo specifies whether the matrix data is stored in the upper or the lower triangular part of array A. If uplo == 'U' or 'u', then only the upper triangular part of A may be referenced and the lower triangular part of A is inferred. If uplo == 'L' or 'l', then only the lower triangular part of A may be referenced and the upper triangular part of A is inferred. n specifies the number of rows and columns of the matrix A. It must be at least zero. alpha single precision scalar multiplier applied to x * transpose(y) + y * transpose(x). x single precision array of length at least (1 + (n - 1) * abs (incx)). incx storage spacing between elements of x. incx must not be zero. y single precision array of length at least (1 + (n - 1) * abs (incy)). incy storage spacing between elements of y. incy must not be zero. A single precision array of dimensions (lda, n). If uplo == 'U' or 'u', then A must contains the upper triangular part of a symmetric matrix, and the strictly lower triangular parts is not referenced. If uplo == 'L' or 'l', then A contains the lower triangular part of a symmetric matrix, and the strictly upper triangular part is not referenced. lda leading dimension of A. It must be at least max(1, n). Output ------ A updated according to A = alpha*x*transpose(y)+alpha*y*transpose(x)+A Reference: http://www.netlib.org/blas/ssyr2.f Error status for this function can be retrieved via cublasGetError(). Error Status ------------ CUBLAS_STATUS_NOT_INITIALIZED if CUBLAS library has not been initialized CUBLAS_STATUS_INVALID_VALUE if n < 0, incx == 0, incy == 0 CUBLAS_STATUS_EXECUTION_FAILED if function failed to launch on GPU
public static void cublasSsyr2(char uplo, int n, float alpha, String x, int incx, String y, int incy, String A, int lda)
public static void cublasStbmv(char uplo, char trans, char diag, int n, int k, String A, int offsetA, int lda, String x, int offsetx, int incx)
void cublasStbmv (char uplo, char trans, char diag, int n, int k, const float *A, int lda, float *x, int incx) performs one of the matrix-vector operations x = op(A) * x, where op(A) = A or op(A) = transpose(A). x is an n-element single precision vector, and A is an n x n, unit or non-unit upper or lower triangular band matrix consisting of single precision elements. Input ----- uplo specifies whether the matrix A is an upper or lower triangular band matrix. If uplo == 'U' or 'u', A is an upper triangular band matrix. If uplo == 'L' or 'l', A is a lower triangular band matrix. trans specifies op(A). If transa == 'N' or 'n', op(A) = A. If trans == 'T', 't', 'C', or 'c', op(A) = transpose(A). diag specifies whether or not matrix A is unit triangular. If diag == 'U' or 'u', A is assumed to be unit triangular. If diag == 'N' or 'n', A is not assumed to be unit triangular. n specifies the number of rows and columns of the matrix A. n must be at least zero. In the current implementation n must not exceed 4070. k specifies the number of super- or sub-diagonals. If uplo == 'U' or 'u', k specifies the number of super-diagonals. If uplo == 'L' or 'l', k specifies the number of sub-diagonals. k must at least be zero. A single precision array of dimension (lda, n). If uplo == 'U' or 'u', the leading (k + 1) x n part of the array A must contain the upper triangular band matrix, supplied column by column, with the leading diagonal of the matrix in row (k + 1) of the array, the first super-diagonal starting at position 2 in row k, and so on. The top left k x k triangle of the array A is not referenced. If uplo == 'L' or 'l', the leading (k + 1) x n part of the array A must constain the lower triangular band matrix, supplied column by column, with the leading diagonal of the matrix in row 1 of the array, the first sub-diagonal startingat position 1 in row 2, and so on. The bottom right k x k triangle of the array is not referenced. lda is the leading dimension of A. It must be at least (k + 1). x single precision array of length at least (1 + (n - 1) * abs(incx)). On entry, x contains the source vector. On exit, x is overwritten with the result vector. incx specifies the storage spacing for elements of x. incx must not be zero. Output ------ x updated according to x = op(A) * x Reference: http://www.netlib.org/blas/stbmv.f Error status for this function can be retrieved via cublasGetError(). Error Status ------------ CUBLAS_STATUS_NOT_INITIALIZED if CUBLAS library has not been initialized CUBLAS_STATUS_INVALID_VALUE if n < 0, n > 4070, k < 0, or incx == 0 CUBLAS_STATUS_EXECUTION_FAILED if function failed to launch on GPU
public static void cublasStbmv(char uplo, char trans, char diag, int n, int k, String A, int lda, String x, int incx)
public static void cublasStbsv(char uplo, char trans, char diag, int n, int k, String A, int offsetA, int lda, String x, int offsetx, int incx)
void cublasStbsv (char uplo, char trans, char diag, int n, int k, const float *A, int lda, float *X, int incx) solves one of the systems of equations op(A)*x = b, where op(A) is either op(A) = A or op(A) = transpose(A). b and x are n-element vectors, and A is an n x n unit or non-unit, upper or lower triangular band matrix with k + 1 diagonals. No test for singularity or near-singularity is included in this function. Such tests must be performed before calling this function. Input ----- uplo specifies whether the matrix is an upper or lower triangular band matrix as follows: If uplo == 'U' or 'u', A is an upper triangular band matrix. If uplo == 'L' or 'l', A is a lower triangular band matrix. trans specifies op(A). If trans == 'N' or 'n', op(A) = A. If trans == 'T', 't', 'C', or 'c', op(A) = transpose(A). diag specifies whether A is unit triangular. If diag == 'U' or 'u', A is assumed to be unit triangular; thas is, diagonal elements are not read and are assumed to be unity. If diag == 'N' or 'n', A is not assumed to be unit triangular. n specifies the number of rows and columns of the matrix A. n must be at least zero. k specifies the number of super- or sub-diagonals. If uplo == 'U' or 'u', k specifies the number of super-diagonals. If uplo == 'L' or 'l', k specifies the number of sub-diagonals. k must be at least zero. A single precision array of dimension (lda, n). If uplo == 'U' or 'u', the leading (k + 1) x n part of the array A must contain the upper triangular band matrix, supplied column by column, with the leading diagonal of the matrix in row (k + 1) of the array, the first super- diagonal starting at position 2 in row k, and so on. The top left k x k triangle of the array A is not referenced. If uplo == 'L' or 'l', the leading (k + 1) x n part of the array A must constain the lower triangular band matrix, supplied column by column, with the leading diagonal of the matrix in row 1 of the array, the first sub-diagonal starting at position 1 in row 2, and so on. The bottom right k x k triangle of the array is not referenced. x single precision array of length at least (1 + (n - 1) * abs(incx)). On entry, x contains the n-element right-hand side vector b. On exit, it is overwritten with the solution vector x. incx storage spacing between elements of x. incx must not be zero. Output ------ x updated to contain the solution vector x that solves op(A) * x = b. Reference: http://www.netlib.org/blas/stbsv.f Error status for this function can be retrieved via cublasGetError(). Error Status ------------ CUBLAS_STATUS_NOT_INITIALIZED if CUBLAS library has not been initialized CUBLAS_STATUS_INVALID_VALUE if incx == 0, n < 0, or n > 4070 CUBLAS_STATUS_EXECUTION_FAILED if function failed to launch on GPU
public static void cublasStbsv(char uplo, char trans, char diag, int n, int k, String A, int lda, String x, int incx)
public static void cublasStpmv(char uplo, char trans, char diag, int n, String AP, int offsetAP, String x, int offsetx, int incx)
void cublasStpmv (char uplo, char trans, char diag, int n, const float *AP, float *x, int incx); performs one of the matrix-vector operations x = op(A) * x, where op(A) = A, or op(A) = transpose(A). x is an n element single precision vector, and A is an n x n, unit or non-unit, upper or lower triangular matrix composed of single precision elements. Input ----- uplo specifies whether the matrix A is an upper or lower triangular matrix. If uplo == 'U' or 'u', then A is an upper triangular matrix. If uplo == 'L' or 'l', then A is a lower triangular matrix. trans specifies op(A). If transa == 'N' or 'n', op(A) = A. If trans == 'T', 't', 'C', or 'c', op(A) = transpose(A) diag specifies whether or not matrix A is unit triangular. If diag == 'U' or 'u', A is assumed to be unit triangular. If diag == 'N' or 'n', A is not assumed to be unit triangular. n specifies the number of rows and columns of the matrix A. n must be at least zero. AP single precision array with at least ((n * (n + 1)) / 2) elements. If uplo == 'U' or 'u', the array AP contains the upper triangular part of the symmetric matrix A, packed sequentially, column by column; that is, if i <= j, then A[i,j] is stored in AP[i+(j*(j+1)/2)]. If uplo == 'L' or 'L', the array AP contains the lower triangular part of the symmetric matrix A, packed sequentially, column by column; that is, if i >= j, then A[i,j] is stored in AP[i+((2*n-j+1)*j)/2]. x single precision array of length at least (1 + (n - 1) * abs(incx)). On entry, x contains the source vector. On exit, x is overwritten with the result vector. incx specifies the storage spacing for elements of x. incx must not be zero. Output ------ x updated according to x = op(A) * x, Reference: http://www.netlib.org/blas/stpmv.f Error status for this function can be retrieved via cublasGetError(). Error Status ------------ CUBLAS_STATUS_NOT_INITIALIZED if CUBLAS library has not been initialized CUBLAS_STATUS_INVALID_VALUE if incx == 0 or if n < 0 CUBLAS_STATUS_EXECUTION_FAILED if function failed to launch on GPU
public static void cublasStpmv(char uplo, char trans, char diag, int n, String AP, String x, int incx)
public static void cublasStpsv(char uplo, char trans, char diag, int n, String AP, int offsetAP, String x, int offsetx, int incx)
void cublasStpsv (char uplo, char trans, char diag, int n, const float *AP, float *X, int incx) solves one of the systems of equations op(A)*x = b, where op(A) is either op(A) = A or op(A) = transpose(A). b and x are n element vectors, and A is an n x n unit or non-unit, upper or lower triangular matrix. No test for singularity or near-singularity is included in this function. Such tests must be performed before calling this function. Input ----- uplo specifies whether the matrix is an upper or lower triangular matrix as follows: If uplo == 'U' or 'u', A is an upper triangluar matrix. If uplo == 'L' or 'l', A is a lower triangular matrix. trans specifies op(A). If trans == 'N' or 'n', op(A) = A. If trans == 'T', 't', 'C', or 'c', op(A) = transpose(A). diag specifies whether A is unit triangular. If diag == 'U' or 'u', A is assumed to be unit triangular; thas is, diagonal elements are not read and are assumed to be unity. If diag == 'N' or 'n', A is not assumed to be unit triangular. n specifies the number of rows and columns of the matrix A. n must be at least zero. In the current implementation n must not exceed 4070. AP single precision array with at least ((n*(n+1))/2) elements. If uplo == 'U' or 'u', the array AP contains the upper triangular matrix A, packed sequentially, column by column; that is, if i <= j, then A[i,j] is stored is AP[i+(j*(j+1)/2)]. If uplo == 'L' or 'L', the array AP contains the lower triangular matrix A, packed sequentially, column by column; that is, if i >= j, then A[i,j] is stored in AP[i+((2*n-j+1)*j)/2]. When diag = 'U' or 'u', the diagonal elements of A are not referenced and are assumed to be unity. x single precision array of length at least (1 + (n - 1) * abs(incx)). On entry, x contains the n-element right-hand side vector b. On exit, it is overwritten with the solution vector x. incx storage spacing between elements of x. It must not be zero. Output ------ x updated to contain the solution vector x that solves op(A) * x = b. Reference: http://www.netlib.org/blas/stpsv.f Error status for this function can be retrieved via cublasGetError(). Error Status ------------ CUBLAS_STATUS_NOT_INITIALIZED if CUBLAS library has not been initialized CUBLAS_STATUS_INVALID_VALUE if incx == 0, n < 0, or n > 4070 CUBLAS_STATUS_EXECUTION_FAILED if function failed to launch on GPU
public static void cublasStpsv(char uplo, char trans, char diag, int n, String AP, String x, int incx)
public static void cublasStrmv(char uplo, char trans, char diag, int n, String A, int offsetA, int lda, String x, int offsetx, int incx)
void cublasStrmv (char uplo, char trans, char diag, int n, const float *A, int lda, float *x, int incx); performs one of the matrix-vector operations x = op(A) * x, where op(A) = = A, or op(A) = transpose(A). x is an n-element single precision vector, and A is an n x n, unit or non-unit, upper or lower, triangular matrix composed of single precision elements. Input ----- uplo specifies whether the matrix A is an upper or lower triangular matrix. If uplo = 'U' or 'u', then A is an upper triangular matrix. If uplo = 'L' or 'l', then A is a lower triangular matrix. trans specifies op(A). If transa = 'N' or 'n', op(A) = A. If trans = 'T', 't', 'C', or 'c', op(A) = transpose(A) diag specifies whether or not matrix A is unit triangular. If diag = 'U' or 'u', A is assumed to be unit triangular. If diag = 'N' or 'n', A is not assumed to be unit triangular. n specifies the number of rows and columns of the matrix A. n must be at least zero. A single precision array of dimension (lda, n). If uplo = 'U' or 'u', the leading n x n upper triangular part of the array A must contain the upper triangular matrix and the strictly lower triangular part of A is not referenced. If uplo = 'L' or 'l', the leading n x n lower triangular part of the array A must contain the lower triangular matrix and the strictly upper triangular part of A is not referenced. When diag = 'U' or 'u', the diagonal elements of A are not referenced either, but are are assumed to be unity. lda is the leading dimension of A. It must be at least max (1, n). x single precision array of length at least (1 + (n - 1) * abs(incx) ). On entry, x contains the source vector. On exit, x is overwritten with the result vector. incx specifies the storage spacing for elements of x. incx must not be zero. Output ------ x updated according to x = op(A) * x, Reference: http://www.netlib.org/blas/strmv.f Error status for this function can be retrieved via cublasGetError(). Error Status ------------ CUBLAS_STATUS_NOT_INITIALIZED if CUBLAS library has not been initialized CUBLAS_STATUS_INVALID_VALUE if incx == 0 or if n < 0 CUBLAS_STATUS_EXECUTION_FAILED if function failed to launch on GPU
public static void cublasStrmv(char uplo, char trans, char diag, int n, String A, int lda, String x, int incx)
public static void cublasStrsv(char uplo, char trans, char diag, int n, String A, int offsetA, int lda, String x, int offsetx, int incx)
void cublasStrsv (char uplo, char trans, char diag, int n, const float *A, int lda, float *x, int incx) solves a system of equations op(A) * x = b, where op(A) is either A or transpose(A). b and x are single precision vectors consisting of n elements, and A is an n x n matrix composed of a unit or non-unit, upper or lower triangular matrix. Matrix A is stored in column major format, and lda is the leading dimension of the two-diemnsional array containing A. No test for singularity or near-singularity is included in this function. Such tests must be performed before calling this function. Input ----- uplo specifies whether the matrix data is stored in the upper or the lower triangular part of array A. If uplo = 'U' or 'u', then only the upper triangular part of A may be referenced. If uplo = 'L' or 'l', then only the lower triangular part of A may be referenced. trans specifies op(A). If transa = 'n' or 'N', op(A) = A. If transa = 't', 'T', 'c', or 'C', op(A) = transpose(A) diag specifies whether or not A is a unit triangular matrix like so: if diag = 'U' or 'u', A is assumed to be unit triangular. If diag = 'N' or 'n', then A is not assumed to be unit triangular. n specifies the number of rows and columns of the matrix A. It must be at least 0. In the current implementation n must be <= 4070. A is a single precision array of dimensions (lda, n). If uplo = 'U' or 'u', then A must contains the upper triangular part of a symmetric matrix, and the strictly lower triangular parts is not referenced. If uplo = 'L' or 'l', then A contains the lower triangular part of a symmetric matrix, and the strictly upper triangular part is not referenced. lda is the leading dimension of the two-dimensional array containing A. lda must be at least max(1, n). x single precision array of length at least (1 + (n - 1) * abs(incx)). On entry, x contains the n element right-hand side vector b. On exit, it is overwritten with the solution vector x. incx specifies the storage spacing between elements of x. incx must not be zero. Output ------ x updated to contain the solution vector x that solves op(A) * x = b. Reference: http://www.netlib.org/blas/strsv.f Error status for this function can be retrieved via cublasGetError(). Error Status ------------ CUBLAS_STATUS_NOT_INITIALIZED if CUBLAS library has not been initialized CUBLAS_STATUS_INVALID_VALUE if incx == 0 or if n < 0 or n > 4070 CUBLAS_STATUS_EXECUTION_FAILED if function failed to launch on GPU
public static void cublasStrsv(char uplo, char trans, char diag, int n, String A, int lda, String x, int incx)
public static void cublasSgemm(char transa, char transb, int m, int n, int k, float alpha, String A, int offsetA, int lda, String B, int offsetB, int ldb, float beta, String C, int offsetC, int ldc)
void cublasSgemm (char transa, char transb, int m, int n, int k, float alpha, const float *A, int lda, const float *B, int ldb, float beta, float *C, int ldc) computes the product of matrix A and matrix B, multiplies the result by a scalar alpha, and adds the sum to the product of matrix C and scalar beta. sgemm() performs one of the matrix-matrix operations: C = alpha * op(A) * op(B) + beta * C, where op(X) is one of op(X) = X or op(X) = transpose(X) alpha and beta are single precision scalars, and A, B and C are matrices consisting of single precision elements, with op(A) an m x k matrix, op(B) a k x n matrix, and C an m x n matrix. Matrices A, B, and C are stored in column major format, and lda, ldb, and ldc are the leading dimensions of the two-dimensional arrays containing A, B, and C. Input ----- transa specifies op(A). If transa = 'n' or 'N', op(A) = A. If transa = 't', 'T', 'c', or 'C', op(A) = transpose(A) transb specifies op(B). If transb = 'n' or 'N', op(B) = B. If transb = 't', 'T', 'c', or 'C', op(B) = transpose(B) m number of rows of matrix op(A) and rows of matrix C n number of columns of matrix op(B) and number of columns of C k number of columns of matrix op(A) and number of rows of op(B) alpha single precision scalar multiplier applied to op(A)op(B) A single precision array of dimensions (lda, k) if transa = 'n' or 'N'), and of dimensions (lda, m) otherwise. When transa = 'N' or 'n' then lda must be at least max( 1, m ), otherwise lda must be at least max(1, k). lda leading dimension of two-dimensional array used to store matrix A B single precision array of dimensions (ldb, n) if transb = 'n' or 'N'), and of dimensions (ldb, k) otherwise. When transb = 'N' or 'n' then ldb must be at least max (1, k), otherwise ldb must be at least max (1, n). ldb leading dimension of two-dimensional array used to store matrix B beta single precision scalar multiplier applied to C. If 0, C does not have to be a valid input C single precision array of dimensions (ldc, n). ldc must be at least max (1, m). ldc leading dimension of two-dimensional array used to store matrix C Output ------ C updated based on C = alpha * op(A)*op(B) + beta * C Reference: http://www.netlib.org/blas/sgemm.f Error status for this function can be retrieved via cublasGetError(). Error Status ------------ CUBLAS_STATUS_NOT_INITIALIZED if CUBLAS library has not been initialized CUBLAS_STATUS_INVALID_VALUE if any of m, n, or k are < 0 CUBLAS_STATUS_EXECUTION_FAILED if function failed to launch on GPU
public static void cublasSgemm(char transa, char transb, int m, int n, int k, float alpha, String A, int lda, String B, int ldb, float beta, String C, int ldc)
public static void cublasSsymm(char side, char uplo, int m, int n, float alpha, String A, int offsetA, int lda, String B, int offsetB, int ldb, float beta, String C, int offsetC, int ldc)
void cublasSsymm (char side, char uplo, int m, int n, float alpha, const float *A, int lda, const float *B, int ldb, float beta, float *C, int ldc); performs one of the matrix-matrix operations C = alpha * A * B + beta * C, or C = alpha * B * A + beta * C, where alpha and beta are single precision scalars, A is a symmetric matrix consisting of single precision elements and stored in either lower or upper storage mode, and B and C are m x n matrices consisting of single precision elements. Input ----- side specifies whether the symmetric matrix A appears on the left side hand side or right hand side of matrix B, as follows. If side == 'L' or 'l', then C = alpha * A * B + beta * C. If side = 'R' or 'r', then C = alpha * B * A + beta * C. uplo specifies whether the symmetric matrix A is stored in upper or lower storage mode, as follows. If uplo == 'U' or 'u', only the upper triangular part of the symmetric matrix is to be referenced, and the elements of the strictly lower triangular part are to be infered from those in the upper triangular part. If uplo == 'L' or 'l', only the lower triangular part of the symmetric matrix is to be referenced, and the elements of the strictly upper triangular part are to be infered from those in the lower triangular part. m specifies the number of rows of the matrix C, and the number of rows of matrix B. It also specifies the dimensions of symmetric matrix A when side == 'L' or 'l'. m must be at least zero. n specifies the number of columns of the matrix C, and the number of columns of matrix B. It also specifies the dimensions of symmetric matrix A when side == 'R' or 'r'. n must be at least zero. alpha single precision scalar multiplier applied to A * B, or B * A A single precision array of dimensions (lda, ka), where ka is m when side == 'L' or 'l' and is n otherwise. If side == 'L' or 'l' the leading m x m part of array A must contain the symmetric matrix, such that when uplo == 'U' or 'u', the leading m x m part stores the upper triangular part of the symmetric matrix, and the strictly lower triangular part of A is not referenced, and when uplo == 'U' or 'u', the leading m x m part stores the lower triangular part of the symmetric matrix and the strictly upper triangular part is not referenced. If side == 'R' or 'r' the leading n x n part of array A must contain the symmetric matrix, such that when uplo == 'U' or 'u', the leading n x n part stores the upper triangular part of the symmetric matrix and the strictly lower triangular part of A is not referenced, and when uplo == 'U' or 'u', the leading n x n part stores the lower triangular part of the symmetric matrix and the strictly upper triangular part is not referenced. lda leading dimension of A. When side == 'L' or 'l', it must be at least max(1, m) and at least max(1, n) otherwise. B single precision array of dimensions (ldb, n). On entry, the leading m x n part of the array contains the matrix B. ldb leading dimension of B. It must be at least max (1, m). beta single precision scalar multiplier applied to C. If beta is zero, C does not have to be a valid input C single precision array of dimensions (ldc, n) ldc leading dimension of C. Must be at least max(1, m) Output ------ C updated according to C = alpha * A * B + beta * C, or C = alpha * B * A + beta * C Reference: http://www.netlib.org/blas/ssymm.f Error status for this function can be retrieved via cublasGetError(). Error Status ------------ CUBLAS_STATUS_NOT_INITIALIZED if CUBLAS library has not been initialized CUBLAS_STATUS_INVALID_VALUE if m or n are < 0 CUBLAS_STATUS_EXECUTION_FAILED if function failed to launch on GPU
public static void cublasSsymm(char side, char uplo, int m, int n, float alpha, String A, int lda, String B, int ldb, float beta, String C, int ldc)
public static void cublasSsyrk(char uplo, char trans, int n, int k, float alpha, String A, int offsetA, int lda, float beta, String C, int offsetC, int ldc)
void cublasSsyrk (char uplo, char trans, int n, int k, float alpha, const float *A, int lda, float beta, float *C, int ldc) performs one of the symmetric rank k operations C = alpha * A * transpose(A) + beta * C, or C = alpha * transpose(A) * A + beta * C. Alpha and beta are single precision scalars. C is an n x n symmetric matrix consisting of single precision elements and stored in either lower or upper storage mode. A is a matrix consisting of single precision elements with dimension of n x k in the first case, and k x n in the second case. Input ----- uplo specifies whether the symmetric matrix C is stored in upper or lower storage mode as follows. If uplo == 'U' or 'u', only the upper triangular part of the symmetric matrix is to be referenced, and the elements of the strictly lower triangular part are to be infered from those in the upper triangular part. If uplo == 'L' or 'l', only the lower triangular part of the symmetric matrix is to be referenced, and the elements of the strictly upper triangular part are to be infered from those in the lower triangular part. trans specifies the operation to be performed. If trans == 'N' or 'n', C = alpha * transpose(A) + beta * C. If trans == 'T', 't', 'C', or 'c', C = transpose(A) * A + beta * C. n specifies the number of rows and the number columns of matrix C. If trans == 'N' or 'n', n specifies the number of rows of matrix A. If trans == 'T', 't', 'C', or 'c', n specifies the columns of matrix A. n must be at least zero. k If trans == 'N' or 'n', k specifies the number of rows of matrix A. If trans == 'T', 't', 'C', or 'c', k specifies the number of rows of matrix A. k must be at least zero. alpha single precision scalar multiplier applied to A * transpose(A) or transpose(A) * A. A single precision array of dimensions (lda, ka), where ka is k when trans == 'N' or 'n', and is n otherwise. When trans == 'N' or 'n', the leading n x k part of array A must contain the matrix A, otherwise the leading k x n part of the array must contains the matrix A. lda leading dimension of A. When trans == 'N' or 'n' then lda must be at least max(1, n). Otherwise lda must be at least max(1, k). beta single precision scalar multiplier applied to C. If beta izs zero, C does not have to be a valid input C single precision array of dimensions (ldc, n). If uplo == 'U' or 'u', the leading n x n triangular part of the array C must contain the upper triangular part of the symmetric matrix C and the strictly lower triangular part of C is not referenced. On exit, the upper triangular part of C is overwritten by the upper trinagular part of the updated matrix. If uplo == 'L' or 'l', the leading n x n triangular part of the array C must contain the lower triangular part of the symmetric matrix C and the strictly upper triangular part of C is not referenced. On exit, the lower triangular part of C is overwritten by the lower trinagular part of the updated matrix. ldc leading dimension of C. It must be at least max(1, n). Output ------ C updated according to C = alpha * A * transpose(A) + beta * C, or C = alpha * transpose(A) * A + beta * C Reference: http://www.netlib.org/blas/ssyrk.f Error status for this function can be retrieved via cublasGetError(). Error Status ------------ CUBLAS_STATUS_NOT_INITIALIZED if CUBLAS library has not been initialized CUBLAS_STATUS_INVALID_VALUE if n < 0 or k < 0 CUBLAS_STATUS_EXECUTION_FAILED if function failed to launch on GPU
public static void cublasSsyrk(char uplo, char trans, int n, int k, float alpha, String A, int lda, float beta, String C, int ldc)
public static void cublasSsyr2k(char uplo, char trans, int n, int k, float alpha, String A, int offsetA, int lda, String B, int offsetB, int ldb, float beta, String C, int offsetC, int ldc)
void cublasSsyr2k (char uplo, char trans, int n, int k, float alpha, const float *A, int lda, const float *B, int ldb, float beta, float *C, int ldc) performs one of the symmetric rank 2k operations C = alpha * A * transpose(B) + alpha * B * transpose(A) + beta * C, or C = alpha * transpose(A) * B + alpha * transpose(B) * A + beta * C. Alpha and beta are single precision scalars. C is an n x n symmetric matrix consisting of single precision elements and stored in either lower or upper storage mode. A and B are matrices consisting of single precision elements with dimension of n x k in the first case, and k x n in the second case. Input ----- uplo specifies whether the symmetric matrix C is stored in upper or lower storage mode, as follows. If uplo == 'U' or 'u', only the upper triangular part of the symmetric matrix is to be referenced, and the elements of the strictly lower triangular part are to be infered from those in the upper triangular part. If uplo == 'L' or 'l', only the lower triangular part of the symmetric matrix is to be references, and the elements of the strictly upper triangular part are to be infered from those in the lower triangular part. trans specifies the operation to be performed. If trans == 'N' or 'n', C = alpha * A * transpose(B) + alpha * B * transpose(A) + beta * C, If trans == 'T', 't', 'C', or 'c', C = alpha * transpose(A) * B + alpha * transpose(B) * A + beta * C. n specifies the number of rows and the number columns of matrix C. If trans == 'N' or 'n', n specifies the number of rows of matrix A. If trans == 'T', 't', 'C', or 'c', n specifies the columns of matrix A. n must be at least zero. k If trans == 'N' or 'n', k specifies the number of rows of matrix A. If trans == 'T', 't', 'C', or 'c', k specifies the number of rows of matrix A. k must be at least zero. alpha single precision scalar multiplier. A single precision array of dimensions (lda, ka), where ka is k when trans == 'N' or 'n', and is n otherwise. When trans == 'N' or 'n', the leading n x k part of array A must contain the matrix A, otherwise the leading k x n part of the array must contain the matrix A. lda leading dimension of A. When trans == 'N' or 'n' then lda must be at least max(1, n). Otherwise lda must be at least max(1,k). B single precision array of dimensions (lda, kb), where kb is k when trans == 'N' or 'n', and is n otherwise. When trans == 'N' or 'n', the leading n x k part of array B must contain the matrix B, otherwise the leading k x n part of the array must contain the matrix B. ldb leading dimension of N. When trans == 'N' or 'n' then ldb must be at least max(1, n). Otherwise ldb must be at least max(1, k). beta single precision scalar multiplier applied to C. If beta is zero, C does not have to be a valid input. C single precision array of dimensions (ldc, n). If uplo == 'U' or 'u', the leading n x n triangular part of the array C must contain the upper triangular part of the symmetric matrix C and the strictly lower triangular part of C is not referenced. On exit, the upper triangular part of C is overwritten by the upper trinagular part of the updated matrix. If uplo == 'L' or 'l', the leading n x n triangular part of the array C must contain the lower triangular part of the symmetric matrix C and the strictly upper triangular part of C is not referenced. On exit, the lower triangular part of C is overwritten by the lower trinagular part of the updated matrix. ldc leading dimension of C. Must be at least max(1, n). Output ------ C updated according to alpha*A*transpose(B) + alpha*B*transpose(A) + beta*C or alpha*transpose(A)*B + alpha*transpose(B)*A + beta*C Reference: http://www.netlib.org/blas/ssyr2k.f Error status for this function can be retrieved via cublasGetError(). Error Status ------------ CUBLAS_STATUS_NOT_INITIALIZED if CUBLAS library has not been initialized CUBLAS_STATUS_INVALID_VALUE if n < 0 or k < 0 CUBLAS_STATUS_EXECUTION_FAILED if function failed to launch on GPU
public static void cublasSsyr2k(char uplo, char trans, int n, int k, float alpha, String A, int lda, String B, int ldb, float beta, String C, int ldc)
public static void cublasStrmm(char side, char uplo, char transa, char diag, int m, int n, float alpha, String A, int offsetA, int lda, String B, int offsetB, int ldb)
void cublasStrmm (char side, char uplo, char transa, char diag, int m, int n, float alpha, const float *A, int lda, const float *B, int ldb) performs one of the matrix-matrix operations B = alpha * op(A) * B, or B = alpha * B * op(A) where alpha is a single-precision scalar, B is an m x n matrix composed of single precision elements, and A is a unit or non-unit, upper or lower, triangular matrix composed of single precision elements. op(A) is one of op(A) = A or op(A) = transpose(A) Matrices A and B are stored in column major format, and lda and ldb are the leading dimensions of the two-dimensonials arrays that contain A and B, respectively. Input ----- side specifies whether op(A) multiplies B from the left or right. If side = 'L' or 'l', then B = alpha * op(A) * B. If side = 'R' or 'r', then B = alpha * B * op(A). uplo specifies whether the matrix A is an upper or lower triangular matrix. If uplo = 'U' or 'u', A is an upper triangular matrix. If uplo = 'L' or 'l', A is a lower triangular matrix. transa specifies the form of op(A) to be used in the matrix multiplication. If transa = 'N' or 'n', then op(A) = A. If transa = 'T', 't', 'C', or 'c', then op(A) = transpose(A). diag specifies whether or not A is unit triangular. If diag = 'U' or 'u', A is assumed to be unit triangular. If diag = 'N' or 'n', A is not assumed to be unit triangular. m the number of rows of matrix B. m must be at least zero. n the number of columns of matrix B. n must be at least zero. alpha single precision scalar multiplier applied to op(A)*B, or B*op(A), respectively. If alpha is zero no accesses are made to matrix A, and no read accesses are made to matrix B. A single precision array of dimensions (lda, k). k = m if side = 'L' or 'l', k = n if side = 'R' or 'r'. If uplo = 'U' or 'u' the leading k x k upper triangular part of the array A must contain the upper triangular matrix, and the strictly lower triangular part of A is not referenced. If uplo = 'L' or 'l' the leading k x k lower triangular part of the array A must contain the lower triangular matrix, and the strictly upper triangular part of A is not referenced. When diag = 'U' or 'u' the diagonal elements of A are no referenced and are assumed to be unity. lda leading dimension of A. When side = 'L' or 'l', it must be at least max(1,m) and at least max(1,n) otherwise B single precision array of dimensions (ldb, n). On entry, the leading m x n part of the array contains the matrix B. It is overwritten with the transformed matrix on exit. ldb leading dimension of B. It must be at least max (1, m). Output ------ B updated according to B = alpha * op(A) * B or B = alpha * B * op(A) Reference: http://www.netlib.org/blas/strmm.f Error status for this function can be retrieved via cublasGetError(). Error Status ------------ CUBLAS_STATUS_NOT_INITIALIZED if CUBLAS library has not been initialized CUBLAS_STATUS_INVALID_VALUE if m or n < 0 CUBLAS_STATUS_EXECUTION_FAILED if function failed to launch on GPU
public static void cublasStrmm(char side, char uplo, char transa, char diag, int m, int n, float alpha, String A, int lda, String B, int ldb)
public static void cublasStrsm(char side, char uplo, char transa, char diag, int m, int n, float alpha, String A, int offsetA, int lda, String B, int offsetB, int ldb)
void cublasStrsm (char side, char uplo, char transa, char diag, int m, int n, float alpha, const float *A, int lda, float *B, int ldb) solves one of the matrix equations op(A) * X = alpha * B, or X * op(A) = alpha * B, where alpha is a single precision scalar, and X and B are m x n matrices that are composed of single precision elements. A is a unit or non-unit, upper or lower triangular matrix, and op(A) is one of op(A) = A or op(A) = transpose(A) The result matrix X overwrites input matrix B; that is, on exit the result is stored in B. Matrices A and B are stored in column major format, and lda and ldb are the leading dimensions of the two-dimensonials arrays that contain A and B, respectively. Input ----- side specifies whether op(A) appears on the left or right of X as follows: side = 'L' or 'l' indicates solve op(A) * X = alpha * B. side = 'R' or 'r' indicates solve X * op(A) = alpha * B. uplo specifies whether the matrix A is an upper or lower triangular matrix as follows: uplo = 'U' or 'u' indicates A is an upper triangular matrix. uplo = 'L' or 'l' indicates A is a lower triangular matrix. transa specifies the form of op(A) to be used in matrix multiplication as follows: If transa = 'N' or 'N', then op(A) = A. If transa = 'T', 't', 'C', or 'c', then op(A) = transpose(A). diag specifies whether or not A is a unit triangular matrix like so: if diag = 'U' or 'u', A is assumed to be unit triangular. If diag = 'N' or 'n', then A is not assumed to be unit triangular. m specifies the number of rows of B. m must be at least zero. n specifies the number of columns of B. n must be at least zero. alpha is a single precision scalar to be multiplied with B. When alpha is zero, then A is not referenced and B need not be set before entry. A is a single precision array of dimensions (lda, k), where k is m when side = 'L' or 'l', and is n when side = 'R' or 'r'. If uplo = 'U' or 'u', the leading k x k upper triangular part of the array A must contain the upper triangular matrix and the strictly lower triangular matrix of A is not referenced. When uplo = 'L' or 'l', the leading k x k lower triangular part of the array A must contain the lower triangular matrix and the strictly upper triangular part of A is not referenced. Note that when diag = 'U' or 'u', the diagonal elements of A are not referenced, and are assumed to be unity. lda is the leading dimension of the two dimensional array containing A. When side = 'L' or 'l' then lda must be at least max(1, m), when side = 'R' or 'r' then lda must be at least max(1, n). B is a single precision array of dimensions (ldb, n). ldb must be at least max (1,m). The leading m x n part of the array B must contain the right-hand side matrix B. On exit B is overwritten by the solution matrix X. ldb is the leading dimension of the two dimensional array containing B. ldb must be at least max(1, m). Output ------ B contains the solution matrix X satisfying op(A) * X = alpha * B, or X * op(A) = alpha * B Reference: http://www.netlib.org/blas/strsm.f Error status for this function can be retrieved via cublasGetError(). Error Status ------------ CUBLAS_STATUS_NOT_INITIALIZED if CUBLAS library has not been initialized CUBLAS_STATUS_INVALID_VALUE if m or n < 0 CUBLAS_STATUS_EXECUTION_FAILED if function failed to launch on GPU
public static void cublasStrsm(char side, char uplo, char transa, char diag, int m, int n, float alpha, String A, int lda, String B, int ldb)
public static void cublasCgemm(char transa, char transb, int m, int n, int k, JCuComplex alpha, String A, int offsetA, int lda, String B, int offsetB, int ldb, JCuComplex beta, String C, int offsetC, int ldc)
void cublasCgemm (char transa, char transb, int m, int n, int k, cuComplex alpha, const cuComplex *A, int lda, const cuComplex *B, int ldb, cuComplex beta, cuComplex *C, int ldc) performs one of the matrix-matrix operations C = alpha * op(A) * op(B) + beta*C, where op(X) is one of op(X) = X or op(X) = transpose or op(X) = conjg(transpose(X)) alpha and beta are single-complex scalars, and A, B and C are matrices consisting of single-complex elements, with op(A) an m x k matrix, op(B) a k x n matrix and C an m x n matrix. Input ----- transa specifies op(A). If transa == 'N' or 'n', op(A) = A. If transa == 'T' or 't', op(A) = transpose(A). If transa == 'C' or 'c', op(A) = conjg(transpose(A)). transb specifies op(B). If transa == 'N' or 'n', op(B) = B. If transb == 'T' or 't', op(B) = transpose(B). If transb == 'C' or 'c', op(B) = conjg(transpose(B)). m number of rows of matrix op(A) and rows of matrix C. It must be at least zero. n number of columns of matrix op(B) and number of columns of C. It must be at least zero. k number of columns of matrix op(A) and number of rows of op(B). It must be at least zero. alpha single-complex scalar multiplier applied to op(A)op(B) A single-complex array of dimensions (lda, k) if transa == 'N' or 'n'), and of dimensions (lda, m) otherwise. lda leading dimension of A. When transa == 'N' or 'n', it must be at least max(1, m) and at least max(1, k) otherwise. B single-complex array of dimensions (ldb, n) if transb == 'N' or 'n', and of dimensions (ldb, k) otherwise ldb leading dimension of B. When transb == 'N' or 'n', it must be at least max(1, k) and at least max(1, n) otherwise. beta single-complex scalar multiplier applied to C. If beta is zero, C does not have to be a valid input. C single precision array of dimensions (ldc, n) ldc leading dimension of C. Must be at least max(1, m). Output ------ C updated according to C = alpha*op(A)*op(B) + beta*C Reference: http://www.netlib.org/blas/cgemm.f Error status for this function can be retrieved via cublasGetError(). Error Status ------------ CUBLAS_STATUS_NOT_INITIALIZED if CUBLAS library has not been initialized CUBLAS_STATUS_INVALID_VALUE if any of m, n, or k are < 0 CUBLAS_STATUS_EXECUTION_FAILED if function failed to launch on GPU
public static void cublasCgemm(char transa, char transb, int m, int n, int k, JCuComplex alpha, String A, int lda, String B, int ldb, JCuComplex beta, String C, int ldc)
public static void cublasDaxpy(int n, double alpha, String x, int offsetx, int incx, String y, int offsety, int incy)
void cublasDaxpy (int n, double alpha, const double *x, int incx, double *y, int incy) multiplies double-precision vector x by double-precision scalar alpha and adds the result to double-precision vector y; that is, it overwrites double-precision y with double-precision alpha * x + y. For i = 0 to n-1, it replaces y[ly + i * incy] with alpha * x[lx + i * incx] + y[ly + i*incy], where lx = 1 if incx >= 0, else lx = 1 + (1 - n) * incx; ly is defined in a similar way using incy. Input ----- n number of elements in input vectors alpha double-precision scalar multiplier x double-precision vector with n elements incx storage spacing between elements of x y double-precision vector with n elements incy storage spacing between elements of y Output ------ y double-precision result (unchanged if n <= 0) Reference: http://www.netlib.org/blas/daxpy.f Error status for this function can be retrieved via cublasGetError(). Error Status ------------ CUBLAS_STATUS_NOT_INITIALIZED if CUBLAS library was not initialized CUBLAS_STATUS_ARCH_MISMATCH if invoked on device without DP support CUBLAS_STATUS_EXECUTION_FAILED if function failed to launch on GPU
public static void cublasDaxpy(int n, double alpha, String x, int incx, String y, int incy)
public static void cublasDcopy(int n, String x, int offsetx, int incx, String y, int offsety, int incy)
void cublasDcopy (int n, const double *x, int incx, double *y, int incy) copies the double-precision vector x to the double-precision vector y. For i = 0 to n-1, copies x[lx + i * incx] to y[ly + i * incy], where lx = 1 if incx >= 0, else lx = 1 + (1 - n) * incx, and ly is defined in a similar way using incy. Input ----- n number of elements in input vectors x double-precision vector with n elements incx storage spacing between elements of x y double-precision vector with n elements incy storage spacing between elements of y Output ------ y contains double precision vector x Reference: http://www.netlib.org/blas/dcopy.f Error status for this function can be retrieved via cublasGetError(). Error Status ------------ CUBLAS_STATUS_NOT_INITIALIZED if CUBLAS library has not been initialized CUBLAS_STATUS_ARCH_MISMATCH if invoked on device without DP support CUBLAS_STATUS_EXECUTION_FAILED if function failed to launch on GPU
public static void cublasDcopy(int n, String x, int incx, String y, int incy)
public static void cublasDrot(int n, String x, int offsetx, int incx, String y, int offsety, int incy, double sc, double ss)
void cublasDrot (int n, double *x, int incx, double *y, int incy, double sc, double ss) multiplies a 2x2 matrix ( sc ss) with the 2xn matrix ( transpose(x) ) (-ss sc) ( transpose(y) ) The elements of x are in x[lx + i * incx], i = 0 ... n - 1, where lx = 1 if incx >= 0, else lx = 1 + (1 - n) * incx, and similarly for y using ly and incy. Input ----- n number of elements in input vectors x double-precision vector with n elements incx storage spacing between elements of x y double-precision vector with n elements incy storage spacing between elements of y sc element of rotation matrix ss element of rotation matrix Output ------ x rotated vector x (unchanged if n <= 0) y rotated vector y (unchanged if n <= 0) Reference http://www.netlib.org/blas/drot.f Error status for this function can be retrieved via cublasGetError(). Error Status ------------ CUBLAS_STATUS_NOT_INITIALIZED if CUBLAS library has not been initialized CUBLAS_STATUS_ARCH_MISMATCH if invoked on device without DP support CUBLAS_STATUS_EXECUTION_FAILED if function failed to launch on GPU
public static void cublasDrot(int n, String x, int incx, String y, int incy, double sc, double ss)
public static void cublasDrotg(String sa, int offsetsa, String sb, int offsetsb, String sc, int offsetsc, String ss, int offsetss)
void cublasDrotg (double *sa, double *sb, double *sc, double *ss) constructs the Givens tranformation ( sc ss ) G = ( ) , scˆ2 + ssˆ2 = 1, (-ss sc ) which zeros the second entry of the 2-vector transpose(sa, sb). The quantity r = (+/-) sqrt (saˆ2 + sbˆ2) overwrites sa in storage. The value of sb is overwritten by a value z which allows sc and ss to be recovered by the following algorithm: if z=1 set sc = 0.0 and ss = 1.0 if abs(z) < 1 set sc = sqrt(1-zˆ2) and ss = z if abs(z) > 1 set sc = 1/z and ss = sqrt(1-scˆ2) The function drot (n, x, incx, y, incy, sc, ss) normally is called next to apply the transformation to a 2 x n matrix. Input ----- sa double-precision scalar sb double-precision scalar Output ------ sa double-precision r sb double-precision z sc double-precision result ss double-precision result Reference: http://www.netlib.org/blas/drotg.f This function does not set any error status.
public static void cublasDrotg(String sa, String sb, String sc, String ss)
public static void cublasDscal(int n, double alpha, String x, int offsetx, int incx)
void cublasDscal (int n, double alpha, double *x, int incx) replaces double-precision vector x with double-precision alpha * x. For i = 0 to n-1, it replaces x[lx + i * incx] with alpha * x[lx + i * incx], where lx = 1 if incx >= 0, else lx = 1 + (1 - n) * incx. Input ----- n number of elements in input vector alpha double-precision scalar multiplier x double-precision vector with n elements incx storage spacing between elements of x Output ------ x double-precision result (unchanged if n <= 0 or incx <= 0) Reference: http://www.netlib.org/blas/dscal.f Error status for this function can be retrieved via cublasGetError(). Error Status ------------ CUBLAS_STATUS_NOT_INITIALIZED if CUBLAS library was not initialized CUBLAS_STATUS_ARCH_MISMATCH if invoked on device without DP support CUBLAS_STATUS_EXECUTION_FAILED if function failed to launch on GPU
public static void cublasDscal(int n, double alpha, String x, int incx)
public static void cublasDswap(int n, String x, int offsetx, int incx, String y, int offsety, int incy)
void cublasDswap (int n, double *x, int incx, double *y, int incy) replaces double-precision vector x with double-precision alpha * x. For i = 0 to n - 1, it replaces x[ix + i * incx] with alpha * x[ix + i * incx], where ix = 1 if incx >= 0, else ix = 1 + (1 - n) * incx. Input ----- n number of elements in input vectors alpha double-precision scalar multiplier x double-precision vector with n elements incx storage spacing between elements of x Output ------ x double precision result (unchanged if n <= 0 or incx <= 0) Reference: http://www.netlib.org/blas/dswap.f Error status for this function can be retrieved via cublasGetError(). Error Status ------------ CUBLAS_STATUS_NOT_INITIALIZED if CUBLAS library has not been initialized CUBLAS_STATUS_ARCH_MISMATCH if invoked on device without DP support CUBLAS_STATUS_EXECUTION_FAILED if function failed to launch on GPU
public static void cublasDswap(int n, String x, int incx, String y, int incy)
public static int cublasIdamax(int n, String x, int offsetx, int incx)
int idamax (int n, const double *x, int incx) finds the smallest index of the maximum magnitude element of double- precision vector x; that is, the result is the first i, i = 0 to n - 1, that maximizes abs(x[1 + i * incx])). Input ----- n number of elements in input vector x double-precision vector with n elements incx storage spacing between elements of x Output ------ returns the smallest index (0 if n <= 0 or incx <= 0) Reference: http://www.netlib.org/blas/idamax.f Error status for this function can be retrieved via cublasGetError(). Error Status ------------ CUBLAS_STATUS_NOT_INITIALIZED if CUBLAS library has not been initialized CUBLAS_STATUS_ARCH_MISMATCH if invoked on device without DP support CUBLAS_STATUS_EXECUTION_FAILED if function failed to launch on GPU
public static int cublasIdamax(int n, String x, int incx)
public static int cublasIdamin(int n, String x, int offsetx, int incx)
int idamin (int n, const double *x, int incx) finds the smallest index of the minimum magnitude element of double- precision vector x; that is, the result is the first i, i = 0 to n - 1, that minimizes abs(x[1 + i * incx])). Input ----- n number of elements in input vector x double-precision vector with n elements incx storage spacing between elements of x Output ------ returns the smallest index (0 if n <= 0 or incx <= 0) Reference: http://www.netlib.org/scilib/blass.f Error status for this function can be retrieved via cublasGetError(). Error Status ------------ CUBLAS_STATUS_NOT_INITIALIZED if CUBLAS library has not been initialized CUBLAS_STATUS_ARCH_MISMATCH if invoked on device without DP support CUBLAS_STATUS_EXECUTION_FAILED if function failed to launch on GPU
public static int cublasIdamin(int n, String x, int incx)
public static void cublasDgemv(char trans, int m, int n, double alpha, String A, int offsetA, int lda, String x, int offsetx, int incx, double beta, String y, int offsety, int incy)
cublasDgemv (char trans, int m, int n, double alpha, const double *A, int lda, const double *x, int incx, double beta, double *y, int incy) performs one of the matrix-vector operations y = alpha * op(A) * x + beta * y, where op(A) is one of op(A) = A or op(A) = transpose(A) where alpha and beta are double precision scalars, x and y are double precision vectors, and A is an m x n matrix consisting of double precision elements. Matrix A is stored in column major format, and lda is the leading dimension of the two-dimensional array in which A is stored. Input ----- trans specifies op(A). If transa = 'n' or 'N', op(A) = A. If trans = trans = 't', 'T', 'c', or 'C', op(A) = transpose(A) m specifies the number of rows of the matrix A. m must be at least zero. n specifies the number of columns of the matrix A. n must be at least zero. alpha double precision scalar multiplier applied to op(A). A double precision array of dimensions (lda, n) if trans = 'n' or 'N'), and of dimensions (lda, m) otherwise. lda must be at least max(1, m) and at least max(1, n) otherwise. lda leading dimension of two-dimensional array used to store matrix A x double precision array of length at least (1 + (n - 1) * abs(incx)) when trans = 'N' or 'n' and at least (1 + (m - 1) * abs(incx)) otherwise. incx specifies the storage spacing between elements of x. incx must not be zero. beta double precision scalar multiplier applied to vector y. If beta is zero, y is not read. y double precision array of length at least (1 + (m - 1) * abs(incy)) when trans = 'N' or 'n' and at least (1 + (n - 1) * abs(incy)) otherwise. incy specifies the storage spacing between elements of x. incx must not be zero. Output ------ y updated according to alpha * op(A) * x + beta * y Reference: http://www.netlib.org/blas/dgemv.f Error status for this function can be retrieved via cublasGetError(). Error Status ------------ CUBLAS_STATUS_NOT_INITIALIZED if CUBLAS library has not been initialized CUBLAS_STATUS_INVALID_VALUE if m or n are < 0, or if incx or incy == 0 CUBLAS_STATUS_ARCH_MISMATCH if invoked on device without DP support CUBLAS_STATUS_EXECUTION_FAILED if function failed to launch on GPU
public static void cublasDgemv(char trans, int m, int n, double alpha, String A, int lda, String x, int incx, double beta, String y, int incy)
public static void cublasDger(int m, int n, double alpha, String x, int offsetx, int incx, String y, int offsety, int incy, String A, int offsetA, int lda)
cublasDger (int m, int n, double alpha, const double *x, int incx, const double *y, int incy, double *A, int lda) performs the symmetric rank 1 operation A = alpha * x * transpose(y) + A, where alpha is a double precision scalar, x is an m element double precision vector, y is an n element double precision vector, and A is an m by n matrix consisting of double precision elements. Matrix A is stored in column major format, and lda is the leading dimension of the two-dimensional array used to store A. Input ----- m specifies the number of rows of the matrix A. It must be at least zero. n specifies the number of columns of the matrix A. It must be at least zero. alpha double precision scalar multiplier applied to x * transpose(y) x double precision array of length at least (1 + (m - 1) * abs(incx)) incx specifies the storage spacing between elements of x. incx must not be zero. y double precision array of length at least (1 + (n - 1) * abs(incy)) incy specifies the storage spacing between elements of y. incy must not be zero. A double precision array of dimensions (lda, n). lda leading dimension of two-dimensional array used to store matrix A Output ------ A updated according to A = alpha * x * transpose(y) + A Reference: http://www.netlib.org/blas/dger.f Error status for this function can be retrieved via cublasGetError(). Error Status ------------ CUBLAS_STATUS_NOT_INITIALIZED if CUBLAS library has not been initialized CUBLAS_STATUS_INVALID_VALUE if n < 0, incx == 0, incy == 0 CUBLAS_STATUS_ARCH_MISMATCH if invoked on device without DP support CUBLAS_STATUS_EXECUTION_FAILED if function failed to launch on GPU
public static void cublasDger(int m, int n, double alpha, String x, int incx, String y, int incy, String A, int lda)
public static void cublasDsyr(char uplo, int n, double alpha, String x, int offsetx, int incx, String A, int offsetA, int lda)
void cublasDsyr (char uplo, int n, double alpha, const double *x, int incx, double *A, int lda) performs the symmetric rank 1 operation A = alpha * x * transpose(x) + A, where alpha is a double precision scalar, x is an n element double precision vector and A is an n x n symmetric matrix consisting of double precision elements. Matrix A is stored in column major format, and lda is the leading dimension of the two-dimensional array containing A. Input ----- uplo specifies whether the matrix data is stored in the upper or the lower triangular part of array A. If uplo = 'U' or 'u', then only the upper triangular part of A may be referenced. If uplo = 'L' or 'l', then only the lower triangular part of A may be referenced. n specifies the number of rows and columns of the matrix A. It must be at least 0. alpha double precision scalar multiplier applied to x * transpose(x) x double precision array of length at least (1 + (n - 1) * abs(incx)) incx specifies the storage spacing between elements of x. incx must not be zero. A double precision array of dimensions (lda, n). If uplo = 'U' or 'u', then A must contain the upper triangular part of a symmetric matrix, and the strictly lower triangular part is not referenced. If uplo = 'L' or 'l', then A contains the lower triangular part of a symmetric matrix, and the strictly upper triangular part is not referenced. lda leading dimension of the two-dimensional array containing A. lda must be at least max(1, n). Output ------ A updated according to A = alpha * x * transpose(x) + A Reference: http://www.netlib.org/blas/dsyr.f Error status for this function can be retrieved via cublasGetError(). Error Status ------------ CUBLAS_STATUS_NOT_INITIALIZED if CUBLAS library has not been initialized CUBLAS_STATUS_INVALID_VALUE if n < 0, or incx == 0 CUBLAS_STATUS_ARCH_MISMATCH if invoked on device without DP support CUBLAS_STATUS_EXECUTION_FAILED if function failed to launch on GPU
public static void cublasDsyr(char uplo, int n, double alpha, String x, int incx, String A, int lda)
public static void cublasDtrsv(char uplo, char trans, char diag, int n, String A, int offsetA, int lda, String x, int offsetx, int incx)
void cublasDtrsv (char uplo, char trans, char diag, int n, const double *A, int lda, double *x, int incx) solves a system of equations op(A) * x = b, where op(A) is either A or transpose(A). b and x are double precision vectors consisting of n elements, and A is an n x n matrix composed of a unit or non-unit, upper or lower triangular matrix. Matrix A is stored in column major format, and lda is the leading dimension of the two-diemnsional array containing A. No test for singularity or near-singularity is included in this function. Such tests must be performed before calling this function. Input ----- uplo specifies whether the matrix data is stored in the upper or the lower triangular part of array A. If uplo = 'U' or 'u', then only the upper triangular part of A may be referenced. If uplo = 'L' or 'l', then only the lower triangular part of A may be referenced. trans specifies op(A). If transa = 'n' or 'N', op(A) = A. If transa = 't', 'T', 'c', or 'C', op(A) = transpose(A) diag specifies whether or not A is a unit triangular matrix like so: if diag = 'U' or 'u', A is assumed to be unit triangular. If diag = 'N' or 'n', then A is not assumed to be unit triangular. n specifies the number of rows and columns of the matrix A. It must be at least 0. In the current implementation n must be <= 2040. A is a double precision array of dimensions (lda, n). If uplo = 'U' or 'u', then A must contains the upper triangular part of a symmetric matrix, and the strictly lower triangular parts is not referenced. If uplo = 'L' or 'l', then A contains the lower triangular part of a symmetric matrix, and the strictly upper triangular part is not referenced. lda is the leading dimension of the two-dimensional array containing A. lda must be at least max(1, n). x double precision array of length at least (1 + (n - 1) * abs(incx)). On entry, x contains the n element right-hand side vector b. On exit, it is overwritten with the solution vector x. incx specifies the storage spacing between elements of x. incx must not be zero. Output ------ x updated to contain the solution vector x that solves op(A) * x = b. Reference: http://www.netlib.org/blas/dtrsv.f Error status for this function can be retrieved via cublasGetError(). Error Status ------------ CUBLAS_STATUS_NOT_INITIALIZED if CUBLAS library has not been initialized CUBLAS_STATUS_INVALID_VALUE if incx == 0 or if n < 0 or n > 2040 CUBLAS_STATUS_ARCH_MISMATCH if invoked on device without DP support CUBLAS_STATUS_EXECUTION_FAILED if function failed to launch on GPU
public static void cublasDtrsv(char uplo, char trans, char diag, int n, String A, int lda, String x, int incx)
public static void cublasDgemm(char transa, char transb, int m, int n, int k, double alpha, String A, int offsetA, int lda, String B, int offsetB, int ldb, double beta, String C, int offsetC, int ldc)
void cublasDgemm (char transa, char transb, int m, int n, int k, double alpha, const double *A, int lda, const double *B, int ldb, double beta, double *C, int ldc) computes the product of matrix A and matrix B, multiplies the result by scalar alpha, and adds the sum to the product of matrix C and scalar beta. It performs one of the matrix-matrix operations: C = alpha * op(A) * op(B) + beta * C, where op(X) = X or op(X) = transpose(X), and alpha and beta are double-precision scalars. A, B and C are matrices consisting of double-precision elements, with op(A) an m x k matrix, op(B) a k x n matrix, and C an m x n matrix. Matrices A, B, and C are stored in column-major format, and lda, ldb, and ldc are the leading dimensions of the two-dimensional arrays containing A, B, and C. Input ----- transa specifies op(A). If transa == 'N' or 'n', op(A) = A. If transa == 'T', 't', 'C', or 'c', op(A) = transpose(A). transb specifies op(B). If transb == 'N' or 'n', op(B) = B. If transb == 'T', 't', 'C', or 'c', op(B) = transpose(B). m number of rows of matrix op(A) and rows of matrix C; m must be at least zero. n number of columns of matrix op(B) and number of columns of C; n must be at least zero. k number of columns of matrix op(A) and number of rows of op(B); k must be at least zero. alpha double-precision scalar multiplier applied to op(A) * op(B). A double-precision array of dimensions (lda, k) if transa == 'N' or 'n', and of dimensions (lda, m) otherwise. If transa == 'N' or 'n' lda must be at least max(1, m), otherwise lda must be at least max(1, k). lda leading dimension of two-dimensional array used to store matrix A. B double-precision array of dimensions (ldb, n) if transb == 'N' or 'n', and of dimensions (ldb, k) otherwise. If transb == 'N' or 'n' ldb must be at least max (1, k), otherwise ldb must be at least max(1, n). ldb leading dimension of two-dimensional array used to store matrix B. beta double-precision scalar multiplier applied to C. If zero, C does not have to be a valid input C double-precision array of dimensions (ldc, n); ldc must be at least max(1, m). ldc leading dimension of two-dimensional array used to store matrix C. Output ------ C updated based on C = alpha * op(A)*op(B) + beta * C. Reference: http://www.netlib.org/blas/sgemm.f Error status for this function can be retrieved via cublasGetError(). Error Status ------------ CUBLAS_STATUS_NOT_INITIALIZED if CUBLAS was not initialized CUBLAS_STATUS_INVALID_VALUE if m < 0, n < 0, or k < 0 CUBLAS_STATUS_ARCH_MISMATCH if invoked on device without DP support CUBLAS_STATUS_EXECUTION_FAILED if function failed to launch on GPU
public static void cublasDgemm(char transa, char transb, int m, int n, int k, double alpha, String A, int lda, String B, int ldb, double beta, String C, int ldc)
public static void cublasDtrsm(char side, char uplo, char transa, char diag, int m, int n, double alpha, String A, int offsetA, int lda, String B, int offsetB, int ldb)
void cublasDtrsm (char side, char uplo, char transa, char diag, int m, int n, double alpha, const double *A, int lda, double *B, int ldb) solves one of the matrix equations op(A) * X = alpha * B, or X * op(A) = alpha * B, where alpha is a double precision scalar, and X and B are m x n matrices that are composed of double precision elements. A is a unit or non-unit, upper or lower triangular matrix, and op(A) is one of op(A) = A or op(A) = transpose(A) The result matrix X overwrites input matrix B; that is, on exit the result is stored in B. Matrices A and B are stored in column major format, and lda and ldb are the leading dimensions of the two-dimensonials arrays that contain A and B, respectively. Input ----- side specifies whether op(A) appears on the left or right of X as follows: side = 'L' or 'l' indicates solve op(A) * X = alpha * B. side = 'R' or 'r' indicates solve X * op(A) = alpha * B. uplo specifies whether the matrix A is an upper or lower triangular matrix as follows: uplo = 'U' or 'u' indicates A is an upper triangular matrix. uplo = 'L' or 'l' indicates A is a lower triangular matrix. transa specifies the form of op(A) to be used in matrix multiplication as follows: If transa = 'N' or 'N', then op(A) = A. If transa = 'T', 't', 'C', or 'c', then op(A) = transpose(A). diag specifies whether or not A is a unit triangular matrix like so: if diag = 'U' or 'u', A is assumed to be unit triangular. If diag = 'N' or 'n', then A is not assumed to be unit triangular. m specifies the number of rows of B. m must be at least zero. n specifies the number of columns of B. n must be at least zero. alpha is a double precision scalar to be multiplied with B. When alpha is zero, then A is not referenced and B need not be set before entry. A is a double precision array of dimensions (lda, k), where k is m when side = 'L' or 'l', and is n when side = 'R' or 'r'. If uplo = 'U' or 'u', the leading k x k upper triangular part of the array A must contain the upper triangular matrix and the strictly lower triangular matrix of A is not referenced. When uplo = 'L' or 'l', the leading k x k lower triangular part of the array A must contain the lower triangular matrix and the strictly upper triangular part of A is not referenced. Note that when diag = 'U' or 'u', the diagonal elements of A are not referenced, and are assumed to be unity. lda is the leading dimension of the two dimensional array containing A. When side = 'L' or 'l' then lda must be at least max(1, m), when side = 'R' or 'r' then lda must be at least max(1, n). B is a double precision array of dimensions (ldb, n). ldb must be at least max (1,m). The leading m x n part of the array B must contain the right-hand side matrix B. On exit B is overwritten by the solution matrix X. ldb is the leading dimension of the two dimensional array containing B. ldb must be at least max(1, m). Output ------ B contains the solution matrix X satisfying op(A) * X = alpha * B, or X * op(A) = alpha * B Reference: http://www.netlib.org/blas/dtrsm.f Error status for this function can be retrieved via cublasGetError(). Error Status ------------ CUBLAS_STATUS_NOT_INITIALIZED if CUBLAS library has not been initialized CUBLAS_STATUS_INVALID_VALUE if m or n < 0 CUBLAS_STATUS_ARCH_MISMATCH if invoked on device without DP support CUBLAS_STATUS_EXECUTION_FAILED if function failed to launch on GPU
public static void cublasDtrsm(char side, char uplo, char transa, char diag, int m, int n, double alpha, String A, int lda, String B, int ldb)
public static void cublasDtrmm(char side, char uplo, char transa, char diag, int m, int n, double alpha, String A, int offsetA, int lda, String B, int offsetB, int ldb)
void cublasDtrmm (char side, char uplo, char transa, char diag, int m, int n, double alpha, const double *A, int lda, const double *B, int ldb) performs one of the matrix-matrix operations B = alpha * op(A) * B, or B = alpha * B * op(A) where alpha is a double-precision scalar, B is an m x n matrix composed of double precision elements, and A is a unit or non-unit, upper or lower, triangular matrix composed of double precision elements. op(A) is one of op(A) = A or op(A) = transpose(A) Matrices A and B are stored in column major format, and lda and ldb are the leading dimensions of the two-dimensonials arrays that contain A and B, respectively. Input ----- side specifies whether op(A) multiplies B from the left or right. If side = 'L' or 'l', then B = alpha * op(A) * B. If side = 'R' or 'r', then B = alpha * B * op(A). uplo specifies whether the matrix A is an upper or lower triangular matrix. If uplo = 'U' or 'u', A is an upper triangular matrix. If uplo = 'L' or 'l', A is a lower triangular matrix. transa specifies the form of op(A) to be used in the matrix multiplication. If transa = 'N' or 'n', then op(A) = A. If transa = 'T', 't', 'C', or 'c', then op(A) = transpose(A). diag specifies whether or not A is unit triangular. If diag = 'U' or 'u', A is assumed to be unit triangular. If diag = 'N' or 'n', A is not assumed to be unit triangular. m the number of rows of matrix B. m must be at least zero. n the number of columns of matrix B. n must be at least zero. alpha double precision scalar multiplier applied to op(A)*B, or B*op(A), respectively. If alpha is zero no accesses are made to matrix A, and no read accesses are made to matrix B. A double precision array of dimensions (lda, k). k = m if side = 'L' or 'l', k = n if side = 'R' or 'r'. If uplo = 'U' or 'u' the leading k x k upper triangular part of the array A must contain the upper triangular matrix, and the strictly lower triangular part of A is not referenced. If uplo = 'L' or 'l' the leading k x k lower triangular part of the array A must contain the lower triangular matrix, and the strictly upper triangular part of A is not referenced. When diag = 'U' or 'u' the diagonal elements of A are no referenced and are assumed to be unity. lda leading dimension of A. When side = 'L' or 'l', it must be at least max(1,m) and at least max(1,n) otherwise B double precision array of dimensions (ldb, n). On entry, the leading m x n part of the array contains the matrix B. It is overwritten with the transformed matrix on exit. ldb leading dimension of B. It must be at least max (1, m). Output ------ B updated according to B = alpha * op(A) * B or B = alpha * B * op(A) Reference: http://www.netlib.org/blas/dtrmm.f Error status for this function can be retrieved via cublasGetError(). Error Status ------------ CUBLAS_STATUS_NOT_INITIALIZED if CUBLAS library has not been initialized CUBLAS_STATUS_INVALID_VALUE if m or n < 0 CUBLAS_STATUS_ARCH_MISMATCH if invoked on device without DP support CUBLAS_STATUS_EXECUTION_FAILED if function failed to launch on GPU
public static void cublasDtrmm(char side, char uplo, char transa, char diag, int m, int n, double alpha, String A, int lda, String B, int ldb)
public static void cublasDsymm(char side, char uplo, int m, int n, double alpha, String A, int offsetA, int lda, String B, int offsetB, int ldb, double beta, String C, int offsetC, int ldc)
void cublasDsymm (char side, char uplo, int m, int n, double alpha, const double *A, int lda, const double *B, int ldb, double beta, double *C, int ldc); performs one of the matrix-matrix operations C = alpha * A * B + beta * C, or C = alpha * B * A + beta * C, where alpha and beta are double precision scalars, A is a symmetric matrix consisting of double precision elements and stored in either lower or upper storage mode, and B and C are m x n matrices consisting of double precision elements. Input ----- side specifies whether the symmetric matrix A appears on the left side hand side or right hand side of matrix B, as follows. If side == 'L' or 'l', then C = alpha * A * B + beta * C. If side = 'R' or 'r', then C = alpha * B * A + beta * C. uplo specifies whether the symmetric matrix A is stored in upper or lower storage mode, as follows. If uplo == 'U' or 'u', only the upper triangular part of the symmetric matrix is to be referenced, and the elements of the strictly lower triangular part are to be infered from those in the upper triangular part. If uplo == 'L' or 'l', only the lower triangular part of the symmetric matrix is to be referenced, and the elements of the strictly upper triangular part are to be infered from those in the lower triangular part. m specifies the number of rows of the matrix C, and the number of rows of matrix B. It also specifies the dimensions of symmetric matrix A when side == 'L' or 'l'. m must be at least zero. n specifies the number of columns of the matrix C, and the number of columns of matrix B. It also specifies the dimensions of symmetric matrix A when side == 'R' or 'r'. n must be at least zero. alpha double precision scalar multiplier applied to A * B, or B * A A double precision array of dimensions (lda, ka), where ka is m when side == 'L' or 'l' and is n otherwise. If side == 'L' or 'l' the leading m x m part of array A must contain the symmetric matrix, such that when uplo == 'U' or 'u', the leading m x m part stores the upper triangular part of the symmetric matrix, and the strictly lower triangular part of A is not referenced, and when uplo == 'U' or 'u', the leading m x m part stores the lower triangular part of the symmetric matrix and the strictly upper triangular part is not referenced. If side == 'R' or 'r' the leading n x n part of array A must contain the symmetric matrix, such that when uplo == 'U' or 'u', the leading n x n part stores the upper triangular part of the symmetric matrix and the strictly lower triangular part of A is not referenced, and when uplo == 'U' or 'u', the leading n x n part stores the lower triangular part of the symmetric matrix and the strictly upper triangular part is not referenced. lda leading dimension of A. When side == 'L' or 'l', it must be at least max(1, m) and at least max(1, n) otherwise. B double precision array of dimensions (ldb, n). On entry, the leading m x n part of the array contains the matrix B. ldb leading dimension of B. It must be at least max (1, m). beta double precision scalar multiplier applied to C. If beta is zero, C does not have to be a valid input C double precision array of dimensions (ldc, n) ldc leading dimension of C. Must be at least max(1, m) Output ------ C updated according to C = alpha * A * B + beta * C, or C = alpha * B * A + beta * C Reference: http://www.netlib.org/blas/dsymm.f Error status for this function can be retrieved via cublasGetError(). Error Status ------------ CUBLAS_STATUS_NOT_INITIALIZED if CUBLAS library has not been initialized CUBLAS_STATUS_INVALID_VALUE if m or n are < 0 CUBLAS_STATUS_ARCH_MISMATCH if invoked on device without DP support CUBLAS_STATUS_EXECUTION_FAILED if function failed to launch on GPU
public static void cublasDsymm(char side, char uplo, int m, int n, double alpha, String A, int lda, String B, int ldb, double beta, String C, int ldc)
public static void cublasDsyrk(char uplo, char trans, int n, int k, double alpha, String A, int offsetA, int lda, double beta, String C, int offsetC, int ldc)
void cublasDsyrk (char uplo, char trans, int n, int k, double alpha, const double *A, int lda, double beta, double *C, int ldc) performs one of the symmetric rank k operations C = alpha * A * transpose(A) + beta * C, or C = alpha * transpose(A) * A + beta * C. Alpha and beta are double precision scalars. C is an n x n symmetric matrix consisting of double precision elements and stored in either lower or upper storage mode. A is a matrix consisting of double precision elements with dimension of n x k in the first case, and k x n in the second case. Input ----- uplo specifies whether the symmetric matrix C is stored in upper or lower storage mode as follows. If uplo == 'U' or 'u', only the upper triangular part of the symmetric matrix is to be referenced, and the elements of the strictly lower triangular part are to be infered from those in the upper triangular part. If uplo == 'L' or 'l', only the lower triangular part of the symmetric matrix is to be referenced, and the elements of the strictly upper triangular part are to be infered from those in the lower triangular part. trans specifies the operation to be performed. If trans == 'N' or 'n', C = alpha * transpose(A) + beta * C. If trans == 'T', 't', 'C', or 'c', C = transpose(A) * A + beta * C. n specifies the number of rows and the number columns of matrix C. If trans == 'N' or 'n', n specifies the number of rows of matrix A. If trans == 'T', 't', 'C', or 'c', n specifies the columns of matrix A. n must be at least zero. k If trans == 'N' or 'n', k specifies the number of rows of matrix A. If trans == 'T', 't', 'C', or 'c', k specifies the number of rows of matrix A. k must be at least zero. alpha double precision scalar multiplier applied to A * transpose(A) or transpose(A) * A. A double precision array of dimensions (lda, ka), where ka is k when trans == 'N' or 'n', and is n otherwise. When trans == 'N' or 'n', the leading n x k part of array A must contain the matrix A, otherwise the leading k x n part of the array must contains the matrix A. lda leading dimension of A. When trans == 'N' or 'n' then lda must be at least max(1, n). Otherwise lda must be at least max(1, k). beta double precision scalar multiplier applied to C. If beta izs zero, C does not have to be a valid input C double precision array of dimensions (ldc, n). If uplo = 'U' or 'u', the leading n x n triangular part of the array C must contain the upper triangular part of the symmetric matrix C and the strictly lower triangular part of C is not referenced. On exit, the upper triangular part of C is overwritten by the upper trinagular part of the updated matrix. If uplo = 'L' or 'l', the leading n x n triangular part of the array C must contain the lower triangular part of the symmetric matrix C and the strictly upper triangular part of C is not referenced. On exit, the lower triangular part of C is overwritten by the lower trinagular part of the updated matrix. ldc leading dimension of C. It must be at least max(1, n). Output ------ C updated according to C = alpha * A * transpose(A) + beta * C, or C = alpha * transpose(A) * A + beta * C Reference: http://www.netlib.org/blas/dsyrk.f Error status for this function can be retrieved via cublasGetError(). Error Status ------------ CUBLAS_STATUS_NOT_INITIALIZED if CUBLAS library has not been initialized CUBLAS_STATUS_INVALID_VALUE if n < 0 or k < 0 CUBLAS_STATUS_ARCH_MISMATCH if invoked on device without DP support CUBLAS_STATUS_EXECUTION_FAILED if function failed to launch on GPU
public static void cublasDsyrk(char uplo, char trans, int n, int k, double alpha, String A, int lda, double beta, String C, int ldc)
public static void cublasDsyr2k(char uplo, char trans, int n, int k, double alpha, String A, int offsetA, int lda, String B, int offsetB, int ldb, double beta, String C, int offsetC, int ldc)
void cublasDsyr2k (char uplo, char trans, int n, int k, double alpha, const double *A, int lda, const double *B, int ldb, double beta, double *C, int ldc) performs one of the symmetric rank 2k operations C = alpha * A * transpose(B) + alpha * B * transpose(A) + beta * C, or C = alpha * transpose(A) * B + alpha * transpose(B) * A + beta * C. Alpha and beta are double precision scalars. C is an n x n symmetric matrix consisting of double precision elements and stored in either lower or upper storage mode. A and B are matrices consisting of double precision elements with dimension of n x k in the first case, and k x n in the second case. Input ----- uplo specifies whether the symmetric matrix C is stored in upper or lower storage mode, as follows. If uplo == 'U' or 'u', only the upper triangular part of the symmetric matrix is to be referenced, and the elements of the strictly lower triangular part are to be infered from those in the upper triangular part. If uplo == 'L' or 'l', only the lower triangular part of the symmetric matrix is to be references, and the elements of the strictly upper triangular part are to be infered from those in the lower triangular part. trans specifies the operation to be performed. If trans == 'N' or 'n', C = alpha * A * transpose(B) + alpha * B * transpose(A) + beta * C, If trans == 'T', 't', 'C', or 'c', C = alpha * transpose(A) * B + alpha * transpose(B) * A + beta * C. n specifies the number of rows and the number columns of matrix C. If trans == 'N' or 'n', n specifies the number of rows of matrix A. If trans == 'T', 't', 'C', or 'c', n specifies the columns of matrix A. n must be at least zero. k If trans == 'N' or 'n', k specifies the number of rows of matrix A. If trans == 'T', 't', 'C', or 'c', k specifies the number of rows of matrix A. k must be at least zero. alpha double precision scalar multiplier. A double precision array of dimensions (lda, ka), where ka is k when trans == 'N' or 'n', and is n otherwise. When trans == 'N' or 'n', the leading n x k part of array A must contain the matrix A, otherwise the leading k x n part of the array must contain the matrix A. lda leading dimension of A. When trans == 'N' or 'n' then lda must be at least max(1, n). Otherwise lda must be at least max(1,k). B double precision array of dimensions (lda, kb), where kb is k when trans == 'N' or 'n', and is n otherwise. When trans == 'N' or 'n', the leading n x k part of array B must contain the matrix B, otherwise the leading k x n part of the array must contain the matrix B. ldb leading dimension of N. When trans == 'N' or 'n' then ldb must be at least max(1, n). Otherwise ldb must be at least max(1, k). beta double precision scalar multiplier applied to C. If beta is zero, C does not have to be a valid input. C double precision array of dimensions (ldc, n). If uplo == 'U' or 'u', the leading n x n triangular part of the array C must contain the upper triangular part of the symmetric matrix C and the strictly lower triangular part of C is not referenced. On exit, the upper triangular part of C is overwritten by the upper trinagular part of the updated matrix. If uplo == 'L' or 'l', the leading n x n triangular part of the array C must contain the lower triangular part of the symmetric matrix C and the strictly upper triangular part of C is not referenced. On exit, the lower triangular part of C is overwritten by the lower trinagular part of the updated matrix. ldc leading dimension of C. Must be at least max(1, n). Output ------ C updated according to alpha*A*transpose(B) + alpha*B*transpose(A) + beta*C or alpha*transpose(A)*B + alpha*transpose(B)*A + beta*C Reference: http://www.netlib.org/blas/dsyr2k.f Error status for this function can be retrieved via cublasGetError(). Error Status ------------ CUBLAS_STATUS_NOT_INITIALIZED if CUBLAS library has not been initialized CUBLAS_STATUS_INVALID_VALUE if n < 0 or k < 0 CUBLAS_STATUS_ARCH_MISMATCH if invoked on device without DP support CUBLAS_STATUS_EXECUTION_FAILED if function failed to launch on GPU
public static void cublasDsyr2k(char uplo, char trans, int n, int k, double alpha, String A, int lda, String B, int ldb, double beta, String C, int ldc)
public static void cublasZgemm(char transa, char transb, int m, int n, int k, JCuDoubleComplex alpha, String A, int offsetA, int lda, String B, int offsetB, int ldb, JCuDoubleComplex beta, String C, int offsetC, int ldc)
void cublasZgemm (char transa, char transb, int m, int n, int k, cuDoubleComplex alpha, const cuDoubleComplex *A, int lda, const cuDoubleComplex *B, int ldb, cuDoubleComplex beta, cuDoubleComplex *C, int ldc) zgemm performs one of the matrix-matrix operations C = alpha * op(A) * op(B) + beta*C, where op(X) is one of op(X) = X or op(X) = transpose or op(X) = conjg(transpose(X)) alpha and beta are double-complex scalars, and A, B and C are matrices consisting of double-complex elements, with op(A) an m x k matrix, op(B) a k x n matrix and C an m x n matrix. Input ----- transa specifies op(A). If transa == 'N' or 'n', op(A) = A. If transa == 'T' or 't', op(A) = transpose(A). If transa == 'C' or 'c', op(A) = conjg(transpose(A)). transb specifies op(B). If transa == 'N' or 'n', op(B) = B. If transb == 'T' or 't', op(B) = transpose(B). If transb == 'C' or 'c', op(B) = conjg(transpose(B)). m number of rows of matrix op(A) and rows of matrix C. It must be at least zero. n number of columns of matrix op(B) and number of columns of C. It must be at least zero. k number of columns of matrix op(A) and number of rows of op(B). It must be at least zero. alpha double-complex scalar multiplier applied to op(A)op(B) A double-complex array of dimensions (lda, k) if transa == 'N' or 'n'), and of dimensions (lda, m) otherwise. lda leading dimension of A. When transa == 'N' or 'n', it must be at least max(1, m) and at least max(1, k) otherwise. B double-complex array of dimensions (ldb, n) if transb == 'N' or 'n', and of dimensions (ldb, k) otherwise ldb leading dimension of B. When transb == 'N' or 'n', it must be at least max(1, k) and at least max(1, n) otherwise. beta double-complex scalar multiplier applied to C. If beta is zero, C does not have to be a valid input. C double precision array of dimensions (ldc, n) ldc leading dimension of C. Must be at least max(1, m). Output ------ C updated according to C = alpha*op(A)*op(B) + beta*C Reference: http://www.netlib.org/blas/zgemm.f Error status for this function can be retrieved via cublasGetError(). Error Status ------------ CUBLAS_STATUS_NOT_INITIALIZED if CUBLAS library has not been initialized CUBLAS_STATUS_INVALID_VALUE if any of m, n, or k are < 0 CUBLAS_STATUS_ARCH_MISMATCH if invoked on device without DP support CUBLAS_STATUS_EXECUTION_FAILED if function failed to launch on GPU
public static void cublasZgemm(char transa, char transb, int m, int n, int k, JCuDoubleComplex alpha, String A, int lda, String B, int ldb, JCuDoubleComplex beta, String C, int ldc)
|
Parallel Colt 0.7.2 | |||||||||
PREV CLASS NEXT CLASS | FRAMES NO FRAMES | |||||||||
SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD |