ATP-Net
Abstract
1 Introduction
Conventional image sampling systems must satisfy the classical Nyquist-Shannon sampling theorem: the sampling rate must be no less than twice the bandwidth of the signal to guarantee that the original continuous signal can be completely reconstructed from its samples. However, the samples are usually redundant, and a compression operation, which is conventionally computationally complex, is indispensable for removing the redundancy. In some band-restricted applications, e.g., massive data storage and wireless transmission, this limitation of the Nyquist-Shannon theorem becomes a major problem.
The emerging technology of compressive sensing (CS) gives rise to a new paradigm for capturing and representing compressible signals, especially for image acquisition and reconstruction, at a rate significantly below the Nyquist rate. That is, CS theory enables a signal to be recovered from far fewer measurements than suggested by the Nyquist-Shannon sampling theorem when the signal is sparse in some domain. CS is characterized by performing the sampling and compression processes jointly: it employs nonadaptive linear projections that preserve the structure of the signal, which is then reconstructed from these projections by an optimization process. In this way, CS is capable of breaking through the limitation of the Nyquist-Shannon theorem.
Yet another problem remains: conventional compression methods are usually time-consuming and beyond the processing capacity of the data acquisition devices in some image processing applications, which have relatively poor hardware/software configurations and are battery-powered. How to design an appropriate sampling matrix is one of the major problems in current compressed sensing research. In traditional compressed sensing, random matrices, such as random Gaussian and random Bernoulli matrices, are often used as measurement matrices because they satisfy the Restricted Isometry Property (RIP) with high probability.
Moreover, a traditional sampling matrix cannot express the structural characteristics of the signal, and a random matrix is difficult to construct in hardware. Additionally, the difficulty of storing these matrices and certain physical constraints on the measurement process make it challenging to realize compressed sensing in practice. When arbitrary matrices are multiplied with high-dimensional signal vectors, the lack of a fast matrix multiplication algorithm results in high computational cost. In this paper, we design a deep neural network (DNN) to automatically learn the required sampling matrix, which is a ternary matrix (+1, 0, -1), to reduce storage and computational costs.
Recently, deep learning has demonstrated superior performance in computer vision[]. Many image compressed sensing reconstruction methods have been developed, such as the stacked denoising autoencoder (SDA)[], non-iterative reconstruction using a CNN (ReconNet)[], and the iterative shrinkage-thresholding algorithm based network (ISTA-Net)[]. Most traditional methods exploit some structured sparsity as an image prior and then solve a sparsity-regularized optimization problem iteratively, as in basis pursuit (BP), total variation (TV), and group sparse representation (GSR). However, the traditional methods usually suffer from high computational complexity, and they also face the challenges of choosing an optimal transform and tuning the parameters of their solvers. Non-iterative algorithms, with deep learning methods as an example, can dramatically reduce time complexity while achieving impressive reconstruction performance, and they outperform optimization-based algorithms such as BP, TV, and GSR.
In this paper, we propose a deep learning approach to learn an optimized ternary sampling matrix and a non-linear reconstruction mapping from measurements to the original signal. Inspired by the block compressed sensing smooth projection Landweber algorithm (BCS-SPL), which is made up of compressed sampling, initial reconstruction and nonlinear signal reconstruction, we design an end-to-end deep neural network to implement the corresponding processes. Firstly, in order to obtain a ternary sampling matrix with low computation cost and small storage, sparse and binary constraints are applied within the proposed network architecture, so that the sampling matrix can be learned automatically without arduous manual design. Secondly, the reconstruction network consists of two parts, i.e., an initial reconstruction network and a deep reconstruction network. The former uses a convolution layer, a depth separable convolution (DSC) for initial recovery, and a pixel shuffle layer, which combines the image blocks produced by block-based sampling to generate the initial reconstructed image. The latter further refines the initial reconstructed image to obtain better reconstruction quality by using dilated convolution and a deep residual network. Experimental results demonstrate the performance and the feasibility of our idea.
2 Related Work
Instead of sampling the entire signal, the CS theory samples a signal
To reduce the computational complexity, some greedy algorithms have been proposed for CS reconstruction []. The greedy iterative algorithm aims to find each non-zero coefficient through local optimization in each iteration, including Matching Pursuit (MP) [], Orthogonal Matching Pursuit (OMP) [], Stagewise Orthogonal Matching Pursuit (StOMP) [] and other various improved algorithms of OMP []. Compared to convex-programming approaches, methods of this type exhibit relatively low computational complexity at the cost of lower reconstruction quality.
In the field of image CS, some existing works have established more complex models by exploring image priors []. Li C et al. [] proposed the total variation (TV) regularized constraint, which replaces the sparsity-based constraint to enhance local smoothness. Zhang J et al. [] further proposed group sparse representation (GSR) for image compressed sensing recovery by enforcing image sparsity and non-local self-similarity simultaneously.
For sparse representation-based image CS reconstruction methods, a sampling matrix, usually a random measurement matrix, must be established first. But random matrices usually suffer from many problems, such as high computation cost, vast storage and uncertain reconstruction quality.
Peng H and Mi Y et al. [] proposed the P-tensor Product CS, which reduces the dimension of the sampling matrix to decrease its storage requirement at the expense of more computing resources. Lu W et al. [] proposed a binary method that determines the distribution of the nonzero elements of a binary sampling matrix by minimizing the coherence parameters with a greedy method. A binarized matrix alleviates the storage pressure through low-bit storage and, through binarization (+1, -1), converts multiplication into addition and subtraction to reduce the computational burden. But these matrices are still data-independent and ignore the characteristics of the data. Nguyen D M et al. [] proposed a deep learning solution that obtains very sparse ternary projections, which can be efficiently exploited in hardware implementations of CS. However, this three-valued method degrades the recovery performance, which falls far below that of current state-of-the-art methods.
The sensing matrix
In the past decade, many deep learning-based methods have also been developed for image CS reconstruction. Mousavi et al. [] used a stacked denoising autoencoder (SDA) to capture statistical dependencies and learn the representation from training data. Their method achieves a reconstruction quality comparable to traditional state-of-the-art algorithms while significantly cutting the reconstruction time, which opens up a new direction for CS. Kulkarni et al. [] proposed a deep learning-based CS reconstruction model, named ReconNet, which employs a Convolutional Neural Network (CNN) instead of SDA's fully connected network to reduce the number of parameters and the computational complexity.
By combining traditional algorithms with deep learning, Zhang J et al. proposed ISTA-Net [], which is based on the iterative shrinkage-thresholding algorithm (ISTA), to reconstruct images. It uses nonlinear operations to solve the proximal mapping associated with the sparsifying transform, which reduces the computing time and improves the peak signal-to-noise ratio of the restored image in some specific image areas. However, the sampling matrix of ISTA-Net is still a random matrix that is independent of the original signal.
The sampling matrices of the aforementioned methods are stored as floating-point numbers, which imposes tremendous storage and computation pressure on hardware implementations. W Shi et al. proposed CSNet [], which uses a deep residual convolutional neural network to realize CS and to construct a binary sampling matrix. Although these techniques have partially improved the restoration quality, in the initial restoration stage of block compressed sensing the image blocks are only spliced together and the relationship among blocks is neglected, resulting in a longer convergence time during model training.
As discussed above, a CS framework consists of two parts, i.e., a sampling network and a reconstruction network, corresponding to an encoding network and a decoding network, respectively. On the encoding side, the problem is that the sampling matrix is data-independent, and its inability to express the structural characteristics of the data affects the quality of recovery. In addition, the sampling matrix incurs high computation cost and vast storage. On the decoding side, how to reduce the block effect and further improve the quality of recovery for methods that use block compressed sensing remains a constant concern.
Therefore, in this paper, an end-to-end deep learning-based compressed sensing algorithm is proposed, which involves both the encoding network and the decoding network. On the encoding side, we combine binarization with attention-based pruning to make the sampling matrix ternary. The decoding side consists of two networks, an initial reconstruction network using depth separable convolution and a deep reconstruction network using residual connections and dilated convolution. With such a structural design, we can mitigate the block effect and improve the quality of reconstruction.
3 Attention-based Ternary Projection for Compressed Sensing
In this section, the proposed end-to-end deep learning framework for compressed sensing, namely Attention-based ternary projection network (ATP-Net), is described in detail.
The ATP-Net architecture consists of three parts, the sampling network, the initial reconstruction network and the deep reconstruction network, with block-based sampling adopted to improve sampling efficiency and reduce hardware overhead, as illustrated in Fig. 1. This method provides a general framework that utilizes the attention mechanism to obtain the ternary matrix and realize image reconstruction. As a result, there is only a small loss in reconstruction performance compared to full-precision sampling, accompanied by a reduction of the hardware pressure of the sampling matrix, i.e., its storage and computation pressure.
3.1 Sampling Network
3.1.1 Block-based Compressed Sampling
Unlike traditional block-based compressed sensing (BCS), which samples the input images in non-overlapping blocks, ATP-Net employs a convolutional layer instead of a fully connected layer for spatial sampling of the input images. Mathematically, the BCS sampling can be expressed as
$$
y={{\Phi }^{big}}\otimes x
\tag{3}
$$
where $\otimes$ represents the convolution operation and the sampling matrix
Different sampling rates can be achieved by changing the number of output channels of the sampling network, which is a convolutional layer. The number of output channels of the sampling layer is calculated as
$$
out\_channels = blocksize \times blocksize \times subrate \times image\_channel
\tag{5}
$$
Here, we take a grayscale image as an example: with an image channel count of 1, a sampling rate of 0.1 and a BCS block size of 32, the product is 32 × 32 × 0.1 × 1 = 102.4, so the sampling layer has 102 output channels.
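As a quick check of Eq. (5), the following minimal sketch computes the channel count for the grayscale example. The function name `sampling_out_channels` is hypothetical, and dropping the fractional part is an assumption: the text only shows that 102.4 becomes 102, which rounding to the nearest integer would also give.

```python
def sampling_out_channels(block_size: int, subrate: float, image_channels: int = 1) -> int:
    """Number of output channels of the sampling layer, following Eq. (5)."""
    exact = block_size * block_size * subrate * image_channels
    # Assumption: the fractional part is dropped. round(..., 6) only removes
    # floating-point noise before the truncation.
    return int(round(exact, 6))


# Grayscale example from the text: block size 32, sampling rate 0.1.
print(sampling_out_channels(32, 0.1))   # 102
print(sampling_out_channels(32, 0.25))  # 256
```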
3.1.2 Attention-based Ternary Projection
Traditionally, binarization methods, which seek to binarize the weights of DNN models [], are used to address the storage and computational issues. Although binarized weights reduce the hardware pressure, they introduce an unacceptable loss of accuracy. A ternary network reduces the storage and computation pressure on the hardware by converting 32-bit floating-point numbers to low-bit three-valued weights (+1, 0, -1), which also converts multiplication operations into additions and subtractions. Ternary networks thus seek a balance between full-precision and binary-weight networks. In general, a ternary matrix can be obtained in two steps, namely the binarization step and the attention-based pruning step.
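To illustrate why ternary weights remove multiplications, the sketch below computes a projection with a (+1, 0, -1) matrix using only additions and subtractions. The function name `ternary_project` and the small matrix are made up for illustration and are not taken from ATP-Net.

```python
def ternary_project(phi, x):
    """Compute y = phi @ x for a ternary matrix phi without any multiplication."""
    y = []
    for row in phi:
        acc = 0.0
        for w, v in zip(row, x):
            if w == 1:
                acc += v      # +1 entry: add the sample
            elif w == -1:
                acc -= v      # -1 entry: subtract the sample
            # 0 entry: pruned weight, the sample is skipped entirely
        y.append(acc)
    return y


phi = [[1, 0, -1, 0],
       [0, -1, 0, 1]]          # hypothetical 2 x 4 ternary sampling matrix
x = [0.5, 2.0, 1.5, 4.0]
print(ternary_project(phi, x))  # [-1.0, 2.0]
```

Each entry takes one of three values, so it can be stored in two bits instead of a 32-bit float.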
The binarization step aims to convert floating point values into binarized values, i.e., mapping original weights
Therefore, during forward propagation, the current floating-point weight matrix is binarized, with each entry determined by formula (6). In the parameter update stage, the gradient is calculated with respect to the corresponding binarized weights and is then used to update the floating-point weights.
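The forward/update scheme described above can be sketched as follows for a single linear measurement with a squared-error loss. This is only an illustration under stated assumptions: formula (6) is not reproduced in this text, so a plain sign function is assumed for the binarization, and the names `sign` and `train_step` are hypothetical. Passing the gradient of the binarized weights directly to the floating-point weights is commonly called a straight-through estimator.

```python
def sign(w):
    # Assumed binarization rule: +1 for non-negative weights, -1 otherwise.
    return 1.0 if w >= 0 else -1.0


def train_step(w_float, x, target, lr):
    """One update: forward with binarized weights, update the float weights."""
    w_bin = [sign(w) for w in w_float]               # forward pass uses binary weights
    y = sum(wb * xi for wb, xi in zip(w_bin, x))
    err = y - target
    loss = err ** 2
    grad_bin = [2.0 * err * xi for xi in x]          # dLoss/dw_bin
    # The gradient w.r.t. the binarized weights is applied to the float weights.
    w_new = [w - lr * g for w, g in zip(w_float, grad_bin)]
    return w_new, loss


w, loss = train_step([0.3, -0.2], x=[1.0, 2.0], target=1.0, lr=0.1)
print(loss)                    # 4.0
print([sign(v) for v in w])    # [1.0, 1.0]: the second weight flipped sign
```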
In the attention-based pruning step of ATP-Net, the importance of the weights is first calculated based on the attention mechanism and is then used to prune the unimportant parameters, with the purpose of obtaining a mask matrix, whose element
Calculation of the mask matrix is also divided into two steps.
i) In the first step, due to the requirements of linear sampling, we establish the relationship between the attention mechanism and the weight matrix of the sampling layer through the following equation
$$
F(x)=y_{atten}=\hat{W}\otimes x
\tag{7}
$$
where $y_{atten}$ is the output tensor of the attention layer,
ii) In the second step, the importance matrix can be calculated through the following equation
$$
Y=\hat{W}\ominus W
\tag{8}
$$
where
3.1.3 Scaling Factor Alpha
In order to compensate for the loss of precision caused by the ternary matrix, we use an adaptive scaling factor, alpha, to enhance the expressive ability of the ternary matrix. This scaling factor is learned adaptively during model training.
3.2 Reconstruction Network
3.2.1 Depth Separable Convolution
In order to fully express the relationship between the various feature maps, the initial reconstruction module uses a depth separable convolution layer in place of the standard convolution layer. The depth separable convolution consists of two steps, depthwise convolution and point-wise convolution. Different convolution kernels can be used for each input feature map, which can express both the relationship within a feature map and that between the feature maps of each layer.
By decoupling the channel dimension of the feature maps from their spatial dimensions, it is possible to obtain not only the local features within each channel, but also the features across channels. The depth separable convolution thus extracts the high-level features of the input tensor by combining depthwise convolution and point-wise convolution in two steps.
For depthwise convolution, the schematic diagram of the convolution mode is shown in Figure 2. Each filter in the depthwise convolution is responsible for a single channel. Unlike a conventional convolution, which convolves all input channels and then sums the results, a depthwise convolution processes each channel with only one filter. The number of feature maps generated by this process is therefore the same as the number of input channels, i.e., the number of channels does not change.
Figure 2
For the point-wise convolution, a convolution kernel of size 1 is introduced to perform a conventional convolution upon the input, pixel by pixel. As a result, the point-wise convolution learns the weights of the input tensor in the channel direction and generates a new feature map as output. In general, the horizontal and vertical dimensions of the input tensor do not change through the module, which uses depth separable convolution without adding boundary filling.
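The parameter saving of this two-step factorization can be made concrete with a small count. The sketch below compares a standard convolution with a depthwise plus point-wise pair; biases are ignored, and the channel counts and kernel size are arbitrary illustrative values, not the ATP-Net configuration.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    # Every output channel has its own k x k kernel for every input channel.
    return k * k * c_in * c_out


def depth_separable_params(c_in: int, c_out: int, k: int) -> int:
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution mixing the channels
    return depthwise + pointwise


print(standard_conv_params(64, 64, 3))    # 36864
print(depth_separable_params(64, 64, 3))  # 4672
```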
3.2.2 Dilated Convolution
Dilated convolutions and ordinary ones differ in the convolution kernel. The latter uses a compact kernel, while the former can control the internal spacing of its convolution kernel through the expansion rate, which can be described mathematically as follows:
$$
y_{i}=\sum\limits_{k=1}^{K}x_{i+k\times r}\,w_{k}
\tag{10}
$$
where
Figure 3 is a schematic diagram of the two convolutions, with their kernel structures shown in panels (a) and (b), respectively. It can be seen from the figure that a dilated convolution is equivalent to an ordinary convolution whose kernel has been expanded by inserting zeros. Consequently, the dilated convolution can enlarge the receptive field without introducing additional parameters.
Figure 3
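A one-dimensional version of Eq. (10) can be sketched as below. The function names are hypothetical, outputs are only computed where all indices are valid (no boundary filling), and `kernel_span` uses the standard span formula (K - 1) × r + 1 for a kernel of K taps with expansion rate r.

```python
def dilated_conv1d(x, w, r):
    """y_i = sum_{k=1..K} x[i + k*r] * w_k, for every i with valid indices."""
    K = len(w)
    n_out = len(x) - K * r
    return [
        sum(x[i + k * r] * w[k - 1] for k in range(1, K + 1))
        for i in range(n_out)
    ]


def kernel_span(K, r):
    # Number of input positions covered by one dilated kernel.
    return (K - 1) * r + 1


x = [1, 2, 3, 4, 5, 6, 7, 8]
w = [1, 1, 1]
print(dilated_conv1d(x, w, r=1))  # [9, 12, 15, 18, 21]
print(dilated_conv1d(x, w, r=2))  # [15, 18]
print(kernel_span(3, 1), kernel_span(3, 2))  # 3 5
```

With the same three weights, the expansion rate 2 widens the span from 3 to 5 input positions.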
Dilated convolutions also play a beneficial role in removing the block effect of block compressed sensing. Because the block effect is caused by the loss of context information and the resulting edge effect, it can be reduced by adding dilated convolution, which simultaneously improves the reconstruction quality.
However, continuous use of dilated convolutions triggers a chessboard (gridding) problem. The reason is that, in the output feature map of a dilated convolution, the neighboring pixels of a given pixel are computed from different, independent sets of input pixels. The problem arises when consecutive dilated convolutional layers use expansion rates that share a common divisor greater than 1. Although the method of Hybrid Dilated Convolution [] can be used, an excessive expansion rate requires a large boundary filling, leading to unnecessary operations. For this reason, dilated convolutions are interspersed with basic blocks to avoid direct cascading and the chessboard problem.
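The chessboard effect can be checked numerically in one dimension. The sketch below collects the input offsets that contribute to a single output position after a cascade of 3-tap dilated convolutions. The function names and the two rate combinations are chosen purely for illustration and are not the rates used in ATP-Net.

```python
def covered_offsets(rates, kernel_size=3):
    """Input offsets reachable from one output position through the cascade."""
    half = kernel_size // 2
    offsets = {0}
    for r in rates:
        offsets = {o + k * r for o in offsets for k in range(-half, half + 1)}
    return offsets


def coverage(rates, kernel_size=3):
    offsets = covered_offsets(rates, kernel_size)
    span = max(offsets) - min(offsets) + 1
    return len(offsets), span   # (positions actually used, width of the span)


print(coverage([2, 2, 2]))  # (7, 13): only even offsets contribute
print(coverage([1, 2, 5]))  # (17, 17): every offset in the span contributes
```

With rates 2, 2, 2 (common divisor 2), nearly half of the positions inside the receptive field are never used, which is the source of the chessboard pattern.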
3.2.3 Reconstruction Network Architecture
The initial reconstruction part, in which a depth separable convolution is used as the basic structure, obtains a simple reconstruction of the image from the compressed vector. Depth separable convolution is used instead of ordinary convolution because an ordinary convolution merely expresses the features within each layer of feature maps and ignores the data relevance among feature maps. However, the input of the initial reconstruction module is the measurement produced by block compressed sampling, and there is a clear physical meaning between these layers, i.e., the spatial association between different sampling blocks. Therefore, by using depth separable convolutions, the features between layers can be learned to estimate the correlation between sampling blocks, and the features within layers can be learned as well, which mitigates the block edge effect caused by block sampling and also reduces the number of parameters of the model.
The deep reconstruction part uses a stack of basic blocks based on channel splicing, 2D convolution, and residual connections to further optimize the quality of the image reconstruction. The residual connections within the blocks and between the blocks are used to avoid the training problems induced by excessive network depth. At the same time, we add a dilated convolutional layer before one or several basic blocks. With a dilated convolutional layer, the spacing between the elements of the convolution kernel can be increased by setting a certain dilation rate, which expands the receptive field. Compared with traditional downsampling methods, such as adding a pooling layer, dilated convolution can increase the receptive field without degrading the resolution and is able to capture more contextual information. Placing dilated convolution layers between groups of basic blocks helps to circumvent the problem caused by the uninterrupted use of dilated convolution.
The output of initial reconstruction,
After being processed by the first convolutional layer, the signal goes through multiple basic blocks and dilated convolutions one by one for feature learning, which can be formally expressed as
$$
X_{i}^{base}=F(g(X_{i}^{c})+X_{i-1}^{base})
\tag{12}
$$
$$
X^{d}=F(W_{d}\cdot X_{i}^{c}+b_{d})
\tag{13}
$$
where $X_{i}^{base}$ represents the output result of the i-th basic block,
The composition of the basic block in the deep reconstruction module is inspired by the ESRGAN [] series of networks in the field of image super-resolution, and adopts the idea of the Dense Block used in their internal basic structure, named RRDB (Residual in Residual Dense Block). The input and output of each layer are concatenated to form the input of the succeeding layer, as illustrated by Figure 4, which shows the detailed structure of the basic block.
Figure 4
In Figure 4, DS-Conv represents the depth separable convolutional layer introduced in Section 3.2.1. In and Out represent the numbers of input and output channels of the convolutional layer, respectively, and Concatenate represents the channel concatenation of two tensors. The processing of the basic block, from input to output, can be expressed as
$$
y_{1}=F(W_{d}^{1}\cdot x+b_{d}^{1})
\tag{14}
$$
$$
y_{2}=F(W_{d}^{2}\cdot (y_{1}\oplus x)+b_{d}^{2})
\tag{15}
$$
$$
y_{3}=F(W_{d}^{3}\cdot (y_{2}\oplus y_{1}\oplus x)+b_{d}^{3})
\tag{16}
$$
$$
g(x)=y_{3}+\beta x
\tag{17}
$$
where
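The data flow of Eqs. (14)-(17) can be traced with a toy example in which each tensor is a flat list of channel values and each convolution is replaced by a small linear map. This is only a sketch under stated assumptions: $F$ is assumed to be a ReLU activation, $\oplus$ is modelled as list concatenation, and all weights, the value of $\beta$ and the function names are made up for illustration.

```python
def relu(v):
    return [max(0.0, a) for a in v]


def linear(weights, bias, v):
    # Stand-in for W . v + b; each row of `weights` produces one output channel.
    return [sum(w * a for w, a in zip(row, v)) + b for row, b in zip(weights, bias)]


def basic_block(x, layers, beta):
    (w1, b1), (w2, b2), (w3, b3) = layers
    y1 = relu(linear(w1, b1, x))             # Eq. (14)
    y2 = relu(linear(w2, b2, y1 + x))        # Eq. (15): input is y1 concatenated with x
    y3 = relu(linear(w3, b3, y2 + y1 + x))   # Eq. (16): input is y2, y1 and x concatenated
    return [a + beta * c for a, c in zip(y3, x)]   # Eq. (17): scaled residual


layers = [
    ([[1, 0], [0, 1]], [0, 0]),                          # 2 -> 2 channels
    ([[1, 0, 0, 0], [0, 0, 1, 0]], [0, 0]),              # 4 -> 2 channels
    ([[1, 1, 1, 1, 1, 1], [0, 0, 0, 0, 0, 1]], [0, 0]),  # 6 -> 2 channels
]
print(basic_block([1.0, -2.0], layers, beta=0.5))  # [2.5, -1.0]
```

Note how the input width of each layer grows (2, 4, 6 channels) because every layer sees the concatenation of all earlier outputs and the block input.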
4 Experimental Results
4.1 Dataset
We use the BSDS500 database [], which consists of 500 images, for training and the LIVE1 database [] for validation. In order to enhance the diversity of the data and improve the performance of the model, TorchVision [] is utilized to preprocess the data and perform data augmentation. Each input training image is randomly cropped to 96×96 and randomly flipped in both the horizontal and vertical directions, each with a probability of 50%. Finally, grayscale conversion is performed to obtain the final input data that is fed to ATP-Net. Meanwhile, we select Set11[], which consists of 11 grayscale images, to evaluate the proposed method.
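The cropping and flipping steps can be sketched in plain Python on an image stored as a list of rows. This is not the TorchVision pipeline used in the experiments; the function names are hypothetical, the image is assumed to be single-channel already, and a seeded `random.Random` is used only to make the sketch reproducible.

```python
import random


def random_crop(img, size, rng):
    """Cut a size x size patch at a random position."""
    h, w = len(img), len(img[0])
    top = rng.randint(0, h - size)
    left = rng.randint(0, w - size)
    return [row[left:left + size] for row in img[top:top + size]]


def random_flip(img, rng, p=0.5):
    if rng.random() < p:
        img = [row[::-1] for row in img]   # horizontal flip
    if rng.random() < p:
        img = img[::-1]                    # vertical flip
    return img


def augment(img, size, rng):
    return random_flip(random_crop(img, size, rng), rng)


rng = random.Random(0)
image = [[r * 1000 + c for c in range(120)] for r in range(100)]  # dummy 100 x 120 image
patch = augment(image, 96, rng)
print(len(patch), len(patch[0]))  # 96 96
```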
4.2 Training Details of the ATP-Net Model
In the experiments, the ATP-Net model is built and trained with PyTorch, with hyperparameters set as follows: the block size of the sampling part is 32, the initial learning rate is $1\times 10^{-4}$, the batch size is 32, and the learning rate is reduced to 10% every 100 epochs. It should be noted that the height and width of the input image must be divisible by 32, since the block size of block sampling is set to 32 pixels. Otherwise, the input image has to be cropped. If the height or width of the input image is less than 32 pixels, no output can be obtained.
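The two rules above can be written down as a short sketch. It assumes a step decay in which the rate is multiplied by 0.1 after every 100 epochs (one reading of the schedule described in the text), and it assumes that cropping keeps the largest size divisible by the block size. Both function names are hypothetical.

```python
def learning_rate(epoch, base_lr=1e-4, step=100, gamma=0.1):
    # Assumed step decay: multiply by gamma once per completed `step` epochs.
    return base_lr * gamma ** (epoch // step)


def cropped_size(height, width, block=32):
    """Largest size not exceeding the input that is divisible by the block size."""
    return (height // block) * block, (width // block) * block


print(learning_rate(0), learning_rate(99))  # 0.0001 0.0001
print(cropped_size(481, 321))               # (480, 320)
print(cropped_size(20, 64))                 # (0, 64): height below 32, no output
```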
4.3 Comparison with State of the Art
In this paper, the performance of ATP-Net is evaluated by both objective and subjective assessment, i.e., peak signal-to-noise ratio (PSNR) and visual evaluation, with the TVAL3 algorithm, the D-AMP algorithm, ReconNet and DR2-Net selected as references. For the Set11 dataset, the sampling rates (MR) of 0.25, 0.1, 0.04 and 0.01 are used. The Set11 dataset is mainly composed of natural images, including classic images used in the field of image measurement, such as Lena, as well as the animated image Flinstones, which tests the model's reconstruction capability on unnatural images.
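For reference, PSNR follows the standard definition $10\log_{10}(MAX^2/MSE)$. The sketch below implements it for images stored as lists of rows, assuming 8-bit images with a peak value of 255; the function name `psnr` and the toy inputs are illustrative only.

```python
import math


def psnr(reference, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    squared_errors = [
        (a - b) ** 2
        for row_a, row_b in zip(reference, reconstructed)
        for a, b in zip(row_a, row_b)
    ]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf   # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


print(psnr([[0]], [[255]]))            # 0.0 (worst case for 8-bit data)
print(psnr([[10, 20]], [[10, 20]]))    # inf
```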
Table 1 lists the PSNR of different algorithms at various sampling rates. It can be seen from Table 1 that the improvement of ATP-Net is more pronounced at lower sampling rates and tends to become less pronounced as the sampling rate increases, which implies that ATP-Net is particularly well suited to low sampling rates.
Table 1
| Image | Method | MR=0.25 | MR=0.10 | MR=0.04 | MR=0.01 |
|---|---|---|---|---|---|
| Barbara | TVAL3 | 24.19 | 21.88 | 18.98 | 11.94 |
|  | D-AMP | 25.08 | 21.23 | 16.37 | 5.48 |
|  | ReconNet | 23.25 | 21.89 | 20.38 | 18.61 |
|  | DR2-Net | 25.77 | 22.69 | 20.7 | 18.65 |
|  | ATP-Net | 26.81 | 24.18 | 22.94 | 21.36 |
| Fingerprint | TVAL3 | 22.7 | 18.69 | 16.04 | 10.35 |
|  | D-AMP | 25.17 | 17.15 | 13.82 | 4.66 |
|  | ReconNet | 25.57 | 20.75 | 16.91 | 14.82 |
|  | DR2-Net | 27.65 | 22.03 | 17.4 | 14.73 |
|  | ATP-Net | 28.91 | 25.42 | 19.55 | 16.1 |
| Flinstones | TVAL3 | 24.05 | 18.88 | 14.88 | 9.75 |
|  | D-AMP | 25.02 | 16.94 | 12.93 | 4.33 |
|  | ReconNet | 22.45 | 18.92 | 16.3 | 13.96 |
|  | DR2-Net | 26.19 | 21.09 | 16.93 | 14.01 |
|  | ATP-Net | 27.56 | 24.42 | 19.36 | 16.35 |
| Lena | TVAL3 | 28.67 | 24.16 | 19.46 | 11.87 |
|  | D-AMP | 28 | 22.51 | 16.52 | 5.73 |
|  | ReconNet | 26.54 | 23.83 | 21.28 | 17.87 |
|  | DR2-Net | 29.42 | 25.39 | 22.13 | 19.97 |
|  | ATP-Net | 31.05 | 28.53 | 24.97 | 21.7 |
| Monarch | TVAL3 | 27.77 | 21.16 | 16.73 | 11.09 |
|  | D-AMP | 26.39 | 19 | 14.57 | 6.2 |
|  | ReconNet | 24.31 | 21.1 | 18.19 | 15.39 |
|  | DR2-Net | 27.95 | 23.1 | 18.93 | 15.33 |
|  | ATP-Net | 30.17 | 27.55 | 22.32 | 20.47 |
| Parrots | TVAL3 | 27.17 | 23.13 | 18.88 | 11.44 |
|  | D-AMP | 26.86 | 21.64 | 15.78 | 5.09 |
|  | ReconNet | 25.59 | 22.63 | 20.27 | 17.63 |
|  | DR2-Net | 28.73 | 23.94 | 21.16 | 18.01 |
|  | ATP-Net | 30.52 | 27.63 | 23.99 | 21.85 |
| Boats | TVAL3 | 28.81 | 23.86 | 19.2 | 11.86 |
|  | D-AMP | 29.26 | 21.9 | 16.01 | 5.34 |
|  | ReconNet | 27.3 | 24.15 | 21.36 | 18.49 |
|  | DR2-Net | 30.09 | 25.58 | 22.11 | 18.67 |
|  | ATP-Net | 31.47 | 28.85 | 24.66 | 21.68 |
| Cameraman | TVAL3 | 25.69 | 21.91 | 18.3 | 11.97 |
|  | D-AMP | 24.41 | 20.35 | 15.11 | 5.64 |
|  | ReconNet | 23.15 | 21.28 | 19.26 | 17.11 |
|  | DR2-Net | 25.62 | 22.46 | 19.84 | 17.08 |
|  | ATP-Net | 28.05 | 25.35 | 22.19 | 20.03 |
| Foreman | TVAL3 | 35.42 | 28.69 | 20.63 | 10.97 |
|  | D-AMP | 35.45 | 25.51 | 16.27 | 3.84 |
|  | ReconNet | 29.47 | 27.09 | 23.72 | 20.04 |
|  | DR2-Net | 33.53 | 29.2 | 25.34 | 20.59 |
|  | ATP-Net | 35.28 | 33.71 | 30.07 | 25.68 |
| House | TVAL3 | 32.08 | 26.29 | 20.94 | 11.86 |
|  | D-AMP | 33.64 | 24.55 | 16.91 | 5 |
|  | ReconNet | 28.46 | 26.69 | 22.58 | 19.31 |
|  | DR2-Net | 31.83 | 27.53 | 23.92 | 19.61 |
|  | ATP-Net | 34.08 | 31.86 | 27.41 | 23.17 |
| Peppers | TVAL3 | 29.62 | 22.64 | 18.21 | 11.35 |
|  | D-AMP | 29.84 | 21.39 | 16.13 | 5.79 |
|  | ReconNet | 24.77 | 22.15 | 19.56 | 16.82 |
|  | DR2-Net | 28.49 | 23.73 | 20.32 | 16.9 |
|  | ATP-Net | 31.04 | 27.56 | 23.35 | 20.47 |
| Mean PSNR | TVAL3 | 27.84 | 22.84 | 18.39 | 11.31 |
|  | D-AMP | 28.17 | 21.14 | 15.49 | 5.19 |
|  | ReconNet | 25.54 | 22.68 | 19.99 | 17.27 |
|  | DR2-Net | 28.66 | 24.32 | 20.8 | 17.44 |
|  | ATP-Net | 30.4 | 27.73 | 23.71 | 20.55 |
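The size of the improvement can be read off the Mean PSNR rows directly. The short sketch below copies the DR2-Net and ATP-Net means from Table 1 and prints the difference per sampling rate; the variable names are illustrative only.

```python
# Mean PSNR values (dB) copied from the "Mean PSNR" rows of Table 1.
mean_psnr = {
    "DR2-Net": {0.25: 28.66, 0.10: 24.32, 0.04: 20.8, 0.01: 17.44},
    "ATP-Net": {0.25: 30.4, 0.10: 27.73, 0.04: 23.71, 0.01: 20.55},
}

# Gain of ATP-Net over DR2-Net at each sampling rate, rounded to two decimals.
gain = {
    mr: round(mean_psnr["ATP-Net"][mr] - mean_psnr["DR2-Net"][mr], 2)
    for mr in mean_psnr["ATP-Net"]
}

for mr, g in gain.items():
    print(f"MR={mr}: +{g} dB")
```

The mean gain is smallest at the highest sampling rate (MR = 0.25) and roughly 3 dB or more at the three lower rates.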
Figure 5 shows the reconstruction results of the ATP-Net model on the Set11 dataset, with the original images in (a) and the images reconstructed by ATP-Net at a sampling rate of 0.25 in (b). It can be seen that the 11 images in the Set11 dataset as a whole have a fairly satisfactory reconstruction quality, with a few being polluted by a certain amount of noise due to the loss caused by ternarization. For the animated image Flinstones, the reconstruction quality is relatively ordinary, since this type of image differs from the natural images that are usually used as the training dataset. Table 1 shows the corresponding PSNR values.
5 Conclusion
In this paper, we propose a ternary sampling matrix method based on the attention mechanism and implement a compressed sensing algorithm named ATP-Net. The model ternarizes the sampling matrix to alleviate the large storage footprint and low computational efficiency of the sampling matrix in compressed sensing: starting from the binarized sampling matrix, it uses the attention mechanism to perceive the importance of the weights. Experimental results on the Set11 dataset illustrate that ATP-Net outperforms the state-of-the-art models, e.g., the TVAL3 algorithm, the D-AMP algorithm, ReconNet and DR2-Net, both in terms of PSNR and in subjective comparison.