Papers by Ali Azarpeyvand

IEEE Access
Crime prediction in video-surveillance systems is required to prevent incidents and protect assets. To this end, our article proposes the first artificial intelligence approach for Robbery Behavior Potential (RBP) prediction and detection in an indoor camera. Our method is based on three detection modules, namely head cover, crowd, and loitering detection, for timely action and robbery prevention. The first two modules are implemented by retraining the YOLOv5 model on our manually annotated dataset. In addition, we introduce a novel definition for the loitering detection module, which is based on the DeepSORT algorithm. A fuzzy inference machine renders expert knowledge as rules and then makes the final decision about the predicted robbery potential. This is laborious due to the different manners of robbers, the different angles of surveillance cameras, and the low resolution of video images. We conducted our experiment on real-world video surveillance images, reaching an F1-score of 0.537. To make an experimental comparison with other related works, we define a threshold value for RBP to evaluate video images as a robbery detection problem. Under this assumption, the experimental results show that the proposed method performs significantly better than existing robbery detection methods, with an F1-score of 0.607. We strongly believe that applying the proposed method in a surveillance camera control center could reduce robbery losses by predicting and preventing robbery incidents. In addition, the situational awareness of the human operator is enhanced and more cameras can be managed.

arXiv (Cornell University), May 14, 2022
Recent advancements in machine learning achieved by Deep Neural Networks (DNNs) have been significant. While demonstrating high accuracy, DNNs are associated with a huge number of parameters and computations, which leads to high memory usage and energy consumption. As a result, deploying DNNs on devices with constrained hardware resources poses significant challenges. To overcome this, various compression techniques have been widely employed to optimize DNN accelerators. A promising approach is quantization, in which the full-precision values are stored in low bit-width precision. Quantization not only reduces memory requirements but also replaces high-cost operations with low-cost ones. DNN quantization offers flexibility and efficiency in hardware design, making it a widely adopted technique in various methods. Since quantization has been extensively utilized in previous works, there is a need for an integrated report that provides an understanding, analysis, and comparison of different quantization approaches. Consequently, we present a comprehensive survey of quantization concepts and methods, with a focus on image classification. We describe clustering-based quantization methods and explore the use of a scale factor parameter for approximating full-precision values. Moreover, we thoroughly review the training of quantized DNNs, including the use of the straight-through estimator and quantized regularization. We explain the replacement of floating-point operations with low-cost bitwise operations in a quantized DNN and the sensitivity of different layers to quantization. Furthermore, we highlight the evaluation metrics for quantized methods and important benchmarks in the image classification task. We also present the accuracy of the state-of-the-art methods on CIFAR-10 and ImageNet.
This paper attempts to make the readers familiar with the basic and advanced concepts of quantization, introduce important works in DNN quantization, and highlight challenges for future research in this field.
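To make the scale-factor idea mentioned in this abstract concrete, the following is a minimal sketch of uniform symmetric quantization in plain Python. It is an illustrative assumption, not a method taken from the survey: the function names `quantize_symmetric` and `dequantize`, the choice of a single per-tensor scale, and the sample weights are all hypothetical.

```python
def quantize_symmetric(values, num_bits=8):
    """Map floats to signed integer codes using one shared scale factor.

    Hypothetical helper for illustration; not code from the survey.
    """
    if num_bits < 2:
        raise ValueError("num_bits must be at least 2")
    qmax = 2 ** (num_bits - 1) - 1          # largest code, e.g. 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest code and clamp to the representable range.
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Approximate the original full-precision values as code * scale."""
    return [c * scale for c in codes]


# Assumed example weights, chosen only to show the round trip.
weights = [0.50, -1.27, 0.03, 0.90]
codes, scale = quantize_symmetric(weights, num_bits=8)
approx = dequantize(codes, scale)
```

In this sketch each reconstructed value differs from the original by at most half a quantization step (`scale / 2`), which is the approximation error that a scale factor trades for the lower bit-width.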
Logarithm-approximate floating-point multiplier
Microelectronics Journal
Analysis of reliability parameters of conveyor belt joints

Significantly improving human detection in low-resolution images by retraining YOLOv3
2021 26th International Computer Conference, Computer Society of Iran (CSICC)
Human detection in images is a crucial task due to its usage in different areas including person detection and identification, abnormal surveillance, and crowd counting. The low resolution of image sequences taken by stationary outdoor surveillance cameras makes this task very challenging. Detecting humans with deep learning techniques is more powerful than traditional methods due to their ability to learn high-level deeper features and their high detection accuracy and speed. Therefore, this paper proposes a method for human detection in low-resolution images based on YOLOv3. We prepare a dataset of low-resolution images collected by outdoor surveillance cameras and annotate them manually. Next, we retrain YOLOv3 to obtain an improved model for low-resolution images. The model achieves an F1-score of 0.804 for human detection on low-resolution test images.
An ultra-fast multi-objective optimization algorithm for VLIW architecture
2016 IEEE East-West Design & Test Symposium (EWDTS), 2016
In this paper, a novel ultra-fast multi-objective optimization algorithm for VLIW architecture design space exploration is proposed. This method, which is based on design space pruning, is applicable to any architecture objectives such as the issue width, the number of ALUs, and the number of register file clusters. The proposed method can be used to optimize the configuration to meet various design constraints. Having defined several distinct objectives in this study, we deploy our heuristic method to optimize the design in terms of performance and cost. Implementing the algorithm for three sample applications resulted in a substantial speed improvement (over 35,000 times faster) and negligible error (up to 3.5%).

Current Genomics
Systems biology problems such as whole-genome network construction from large-scale gene expression data are sophisticated and time-consuming. Therefore, using sequential algorithms is not feasible for obtaining a solution in an acceptable amount of time. Today, by using massively parallel computing, it is possible to infer large-scale gene regulatory networks. Recently, establishing gene regulatory networks from large-scale datasets has drawn noticeable attention from researchers in the fields of parallel computing and systems biology. In this paper, we attempt to provide a more detailed overview of the recent parallel algorithms for constructing gene regulatory networks. Firstly, the fundamentals of gene regulatory network inference and the challenges of large-scale datasets are given. Secondly, a detailed description of four parallel frameworks and libraries, namely CUDA, OpenMP, MPI, and Hadoop, is provided. Thirdly, parallel algorithms are reviewed. Finally, some conclusions and guidelines for parallel reverse engineering are described.
Journal of Iranian Association of Electrical and Electronics Engineers, 2021

Data-Driven and Knowledge-Based Algorithms for Gene Network Reconstruction on High-Dimensional Data
IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2020
Previous efforts in gene network reconstruction have mainly focused on data-driven modeling, with little attention paid to knowledge-based approaches. Leveraging prior knowledge, however, is a promising paradigm that has been gaining momentum in the network reconstruction and computational biology research communities. This paper proposes two new algorithms for reconstructing a gene network from expression profiles with and without prior knowledge in small-sample and high-dimensional settings. First, using tools from statistical estimation theory, particularly the empirical Bayesian approach, the current research estimates a covariance matrix via the shrinkage method. Second, the estimated covariance matrix is employed in the penalized normal likelihood method to select the Gaussian graphical model. This formulation allows the application of prior knowledge in the covariance estimation, as well as in the Gaussian graphical model selection. Experimental results on simulated and real datasets show that, compared to state-of-the-art methods, the proposed algorithms achieve better results in terms of both PR and ROC curves. Finally, the present work applies its method to the RNA-seq data of human gastric atrophy patients, which was obtained from the EMBL-EBI database. The source codes and relevant data can be downloaded from: https://github.com/AbbaszadehO/DKGN.

Journal of Electronic Testing, 2018
Advances in VLSI technology have made circuits more vulnerable to faults. The architectural vulnerability factor (AVF) reflects the probability that a transient fault eventually causes an error in the circuit output. This factor represents the system's vulnerability to transient faults and is used to compare different fault-tolerant designs or architectures. In this paper, we introduce a simulation-based fault injection framework developed to evaluate the AVF of different adder hardware description models at various abstraction levels. Then, we identify the most beneficial abstraction level for evaluating the vulnerability of a design. Finally, exploiting our fault injection framework, we compare the inherent fault tolerance of eight well-known adders. We explore the design space of different adder architectures while considering both delay and area constraints. To the best of our knowledge, this comparative study is not covered elsewhere in the literature.

Analog Integrated Circuits and Signal Processing, 2017
The parallel structure of matrix multipliers makes them fascinating candidates to benefit from memristors' high-density architecture. This paper first explains a memristor-based analog vector-matrix multiplier suitable for approximate computing. Given the existence of fast and efficient converters, namely DACs and ADCs, in the field of approximate computing and the programmability of memristors, the presented vector-matrix multiplier is combined with digital circuits, which leads to a matrix-matrix multiplier as an extension. In this work, opamp characteristics such as power and speed, the distribution of matrix elements, and memristor faults have been considered, and their effects on the performance, accuracy, and efficiency of the proposed multiplier have been analyzed. Also, a new structure for handling negative numbers has been proposed. All the circuits have been simulated using the Ngspice mixed-signal circuit simulator in a C++ programming environment. The simulation results revealed that the multiplier's analog core brought gains in terms of performance and energy when acceptable ranges of inaccuracy in the results could be tolerated.

IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2018
In this paper, we demonstrate the feasibility of building a memristor-based approximate accelerator to be used in cooperation with general-purpose x86 processors. First, an integrated full system simulator is developed for simultaneous simulation of any multicrossbar architecture as an accelerator for x86 processors, which is performed by coupling a cycle-accurate Marss x86 processor simulator with the Ngspice mixed-level/mixed-signal circuit simulator. Then, a novel mixed-signal memristor-based architecture is presented for multiplying floating-point signed complex numbers. The presented multiplier is extended for accelerating convolutional neural networks and finally, it is tightly integrated with the pipeline of a generic x86 processor. To validate the accelerator, it is first used to multiply different matrices that vary in size and distribution. Then, it is used to accelerate tiny-dnn, an open-source C++ implementation of deep learning neural networks. The memristor-based accelerator provides more than 100× speedup and energy saving for a 64 × 64 matrix-matrix multiplication, with an accuracy of 90%. Using the accelerated tiny-dnn for MNIST database classification, more than 10× speedup and energy saving along with 95.51% pattern recognition accuracy is achieved.

Analog Integrated Circuits and Signal Processing, 2018
This paper presents a mixed-signal implementation of a complex-valued FIR filter bank using a memristor-based approximate multiplier. First, a memristor-based vector-matrix multiplier was developed for complex number multiplications and then it was extended to perform the convolution operation. Finally, it was used for a filtering application to study the potential capability of memristor-based multipliers for accelerating digital signal processing. To evaluate the design, datasets provided by the TDFIR kernel of the HPEC benchmark suite, which include complex computations, were used. For simulations, the Ngspice mixed-signal circuit simulator has been employed as a shared library in a C++ programming environment. According to simulation results, using 6-bit accuracy for a 64 × 64 filter bank, the performance can be increased up to ~3 GFLOPS while presenting an efficiency of ~4 GFLOPS/W. The results show that memristor-based mixed-signal architectures can be considered favorable candidates for accelerating digital signal processing.

Hardware Implementation of a Chaos Based Image Encryption Using High-Level Synthesis
2021 29th Iranian Conference on Electrical Engineering (ICEE)
In recent years, the use of digital images in data transmission networks and digital devices has increased greatly. Therefore, the security of images in transmission or storage has become very important due to security attacks. For this reason, cryptography is used to protect images from unauthorized access and attacks. In this paper, an image encryption algorithm using chaos theory is proposed in which, besides providing the appropriate level of security, the speed of encryption operations is also considered. The 1D tent map is used for the encryption process. Security analyses have been performed to verify its efficiency in terms of security. The hardware implementation of the algorithm was performed on an FPGA with the help of high-level synthesis (HLS), architectural techniques, and time and resource optimization. Based on our experimental results, the hardware designed for image encryption is robust in terms of security and can achieve an encryption and decryption time of 21.14 milliseconds for an image with a size of 256 × 256.
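As a rough illustration of how a 1D tent map can drive an encryption step, the sketch below generates a keystream from tent-map iterations and XORs it with the data bytes. This is not the algorithm from the paper, which is not described in the abstract: the function names, the initial value `x0`, the parameter `mu`, and the byte-extraction rule are all assumptions made for the example, and the scheme is not secure as written.

```python
def tent_map(x, mu=1.9999):
    """One iteration of the 1D tent map on the interval [0, 1]."""
    return mu * x if x < 0.5 else mu * (1.0 - x)


def tent_xor(data, x0=0.37, mu=1.9999):
    """XOR each byte with a keystream byte taken from the tent-map orbit.

    Hypothetical toy cipher: (x0, mu) play the role of the key, and the
    same call decrypts because XOR is its own inverse.
    """
    x = x0
    out = bytearray()
    for byte in data:
        x = tent_map(x, mu)
        key_byte = int(x * 256) & 0xFF   # assumed rule: scale the state to one byte
        out.append(byte ^ key_byte)
    return bytes(out)


# Assumed stand-in for a row of 8-bit pixel values.
pixels = bytes(range(16))
cipher = tent_xor(pixels)
restored = tent_xor(cipher)
```

The parameter is kept slightly below 2 on purpose: with `mu` exactly 2, binary floating-point arithmetic drives the orbit to 0 after a few dozen iterations, so the keystream would degenerate.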

Reliability-aware cross-layer custom instruction screening
Bias Temperature Instability (BTI) and process variation introduce remarkable unpredictability to Custom Instructions (CIs) manufactured in nano-scale technology. Moreover, shrinking the feature size to nanometer levels makes soft errors another critical issue for CIs. To tackle these factors, we propose a reliability-aware cross-layer CI screening method. By adding an intermediate phase between the CI generation and CI selection phases, this method enables designers to prune the outputs of the generation phase in order to guarantee that synthesized CIs meet the required reliability constraints. For this purpose, a holistic framework is developed to analyze the combined effects of BTI, process variation, and soft errors on the CIs by linking circuit-level and system-level information. Based on this information collected from different layers of abstraction, the screening method prunes those CIs which cannot meet the reliability constraints. Experiments illustrate that BTI-unaware CI selection techniques may not meet the desired lifetime because of the BTI-induced delay shift of CIs. Moreover, according to the results, a remarkable percentage of CIs are vulnerable to soft errors and should not be fed into the CI selection phase.
Reliability aware throughput management of chip multi-processor architecture via thread migration
The Journal of Supercomputing, 2016
Fast and Accurate Architectural Vulnerability Analysis for Embedded Processors Using Instruction Vulnerability Factor
Microprocessors and Microsystems, 2016
Energy/throughput trade-off in a fully asynchronous NoC for GALS-based MPSoC architectures
5th International Conference on Design & Technology of Integrated Systems in Nanoscale Era, 2010
... router as long as there is enough space in the FIFO of neighbor's input port (the FIFO depth is 8). ... Figure 2. Asynchronous NoC infrastructure ... latency, and throughput, we have modeled the NoC switch and its corresponding links in gate-level using VERILOG hardware description ...

Reliability considerations in dynamic voltage and frequency scaling schemes
5th International Conference on Design & Technology of Integrated Systems in Nanoscale Era, 2010
Dynamic voltage and frequency scaling (DVFS) is an effective method for controlling both the energy and performance of a system. Since the rate of radiation-induced transient faults depends on operating frequency and supply voltage, DVFS techniques have recently been shown to compromise electronic system reliability. Therefore, ignoring the effects of voltage scaling on the fault rate could considerably decrease system reliability. In this paper, we propose a formula for accurate analytic modeling of the soft error rate (SER) of a system in which the frequency and voltage are scaled using a DVFS algorithm. The simulation experiments show that, compared with other published models, the results of the proposed model are 30% closer to the results of an SER estimation algorithm that considers electrical and logical masking for ISCAS'85 circuits.
A Source-Based Multicast Scheme in IEEE 802.16 Mesh Mode
International Journal of Wireless and Microwave Technologies, 2012