Papers by Parvin Razzaghi

Can Deep Learning Models Select Portrait Images Like Humans? : Subjective and Objective Approaches
The efficiency of portrait image selection and analysis systems depends entirely on the quality of the face image, which in turn depends on various factors. Since real-time manual selection of high-quality portrait photos from a sequence of frames or images is usually impossible, automatic methods can be useful for selecting photos, especially in large collections. On the other hand, existing automatic methods may not be able to perform like humans in portrait classification. These methods may consider only specific factors, such as emotional state or gaze direction, to select an image. In this work, we tried to simulate human choices for intelligent systems in portrait images and investigate whether our model can act like a human in choosing portraits. To achieve this goal, a large collection of facial images was collected, and under a subjective quality assessment study, each of the 200 images was judged by more than 80 people. The results obtained from this study were use...
An essential research topic at the heart of visual recognition is automatic captioning of images. This involves designing an algorithm that takes the image as input and generates a natural language description succinctly describing all or parts of the image. Such a system has wide-ranging applications, such as annotating images and exploiting natural-language descriptions to search for images or texts. This changed significantly with the availability of large-scale annotated data, such as the ImageNet dataset, and the application of deep learning techniques, specifically convolutional neural networks (CNNs) and recurrent neural networks (RNNs). This has led to the successful application of such deep networks to various other tasks, including image captioning and image-text retrieval.
TripletMultiDTI: Multimodal representation learning in drug-target interaction prediction with triplet loss function
Expert Systems with Applications
Multivariate pattern recognition by machine learning methods
Elsevier eBooks, 2023
Multimodal brain tumor detection using multimodal deep transfer learning
Applied Soft Computing
Introduction to python
Elsevier eBooks, 2023

Cornell University - arXiv, Oct 25, 2022
Detection of out-of-distribution samples is one of the critical tasks for real-world applications of computer vision. The advancement of deep learning has enabled us to analyze real-world data that contain unexplained samples, accentuating the need to detect out-of-distribution instances more than before. GAN-based approaches have been widely used to address this problem due to their ability to perform distribution fitting; however, they are accompanied by training instability and mode collapse. We propose a simple yet efficient reconstruction-based method that avoids adding complexities to compensate for the limitations of GAN models while outperforming them. Unlike previous reconstruction-based works that utilize only the reconstruction error or only the generated samples, our proposed method incorporates both of them simultaneously in the detection task. Our model, which we call "Connective Novelty Detection", has two subnetworks: an autoencoder and a binary classifier. The autoencoder learns the representation of the positive class by reconstructing its samples. Then, the model creates negative and connected positive examples using real and generated samples. Negative instances are generated by manipulating the real data, so their distribution is close to the positive class, which yields a more accurate boundary for the classifier. To boost the robustness of the detection to reconstruction error, connected positive samples are created by combining the real and generated samples. Finally, the binary classifier is trained using connected positive and negative examples. We demonstrate a considerable improvement in novelty detection over state-of-the-art methods on the MNIST and Caltech-256 datasets.

TripletMultiDTI: Multimodal Representation Learning in Drug-Target Interaction Prediction
Background: In drug discovery, drug-target interaction (DTI) plays a crucial role. Identifying DTI in a wet-lab experiment is time-consuming, labor-intensive, and costly. Using reliable computational methods to predict DTI mitigates the enormous costs and time of drug discovery. Deep learning-based methods for predicting DTI have recently gained more attention. Results: In this paper, a new multimodal approach to DTI is proposed. It is shown that a discriminative feature representation of the drug-target pair plays the main role in multimodal DTI prediction. To achieve this goal, we propose a new multimodal approach that utilizes triplet loss jointly with the prediction loss. The proposed approach is abbreviated as TripletMultiDTI. The proposed approach has two main contributions: a new architecture that fuses the multimodal knowledge to predict interaction affinity labels and a new loss function that utilizes the triplet loss. Triplet loss encourages clustering of feature space su...
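As a rough illustration of using a triplet loss jointly with a prediction loss, the sketch below adds a weighted triplet term to a squared-error prediction term. It is not the paper's implementation: the Euclidean distance, the squared-error prediction loss, and the `weight` and `margin` parameters are all assumptions made for this example, and the function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: pull the positive closer to the anchor than
    the negative by at least `margin`; zero once that is satisfied."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def joint_loss(pred, target, anchor, positive, negative, weight=0.5, margin=1.0):
    """Prediction loss plus a weighted triplet term.

    Assumption: squared error stands in for the prediction loss, and
    `weight` is a hypothetical trade-off hyperparameter.
    """
    prediction_loss = (pred - target) ** 2
    return prediction_loss + weight * triplet_loss(anchor, positive, negative, margin)
```

When the negative is already farther from the anchor than the positive by more than the margin, the triplet term vanishes and only the prediction loss remains.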

Proteochemometrics modeling for prediction of the interactions between caspase isoforms and their inhibitors
Molecular Diversity
Caspases (cysteine-aspartic proteases) play critical roles in inflammation and programmed cell death in the form of necroptosis, apoptosis, and pyroptosis. These enzymes are named for their cysteine protease activity: a cysteine in the active site acts as a nucleophile to attack and cleave target proteins on the C-terminal side of aspartic acid residues. Caspases are grouped by physiological activity according to substrate structure and specificity. In apoptosis, however, the division of caspases into initiator caspases (caspase 2, 8, 9, and 10) and executioner caspases (caspase 3, 6, and 7) is essential. The present study aimed to perform proteochemometrics modeling to generalize the data on caspases and thereby predict ligand-protein interactions. In this study, we employed protein and ligand descriptors. Protein descriptors were computed using the Protr R package, while PaDEL-Descriptor was employed for the computation of ligand descriptors. In addition, NCA (Neighborhood Component Analysis) was used for descriptor selection, and SVR, decision tree, and ensemble methods were utilized for the proteochemometrics modeling. This study shows that the ensemble model outperforms the other models in terms of the R2, Q2, and RMSE criteria.
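For readers unfamiliar with the evaluation criteria named above, the following minimal sketch computes RMSE and the coefficient of determination (R2) from true and predicted values. It uses the textbook definitions, not code from the study; Q2 is commonly obtained by applying the same R2 formula to cross-validated predictions, though the exact variant used in the paper is not stated here. The function names are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    return 1.0 - ss_res / ss_tot
```

A higher R2 (or Q2) and a lower RMSE indicate a better fit, which is the sense in which one model is compared against another.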

Data Science: From Research to Application, 2020
Nowadays, automobile manufacturers make efforts to develop ways to make cars fully safe. Monitoring the driver's actions with computer vision techniques to detect driving mistakes in real time, and then planning for autonomous driving to avoid vehicle collisions, is one of the most important issues investigated in machine vision and Intelligent Transportation Systems (ITS). The main goal of this study is to prevent accidents caused by fatigue, drowsiness, and driver distraction. To avoid these incidents, this paper proposes an integrated safety system that continuously monitors the driver's attention and the vehicle's surroundings, and finally decides whether the actual steering control status is safe or not. For this purpose, we equipped an ordinary car called FARAZ with a vision system consisting of four mounted cameras, along with a universal car tool for communicating with surrounding factory-installed sensors and other car systems and for sending commands to actuators. The proposed system leverages a scene understanding pipeline using deep convolutional encoder-decoder networks and a driver state detection pipeline. We have been identifying and assessing domestic capabilities for developing technologies for ordinary vehicles, in order to manufacture smart cars and also to provide an intelligent system that increases safety and assists the driver in various conditions and situations.

Discriminative Kernel Matrix for Domain Adaptation
Electrical Engineering (ICEE), Iranian Conference on, 2018
In this paper, we investigate unsupervised domain transfer learning, in which the target samples have no labels while the source samples are all labeled. In our approach, the target and source samples are transferred to a new domain, and each target sample is reconstructed from a linear combination of the source samples in the new transformed domain. Low-rank and sparse constraints are imposed on the reconstruction coefficient matrix, which maintains the local and global structure of the samples in the transferred domain. In this paper, the information content of the reconstruction coefficient matrix is utilized in order to consider the discriminative ability of the source samples. Here, we utilize a max-margin classifier in which the kernel matrix is defined using the reconstruction coefficient matrix. To evaluate the proposed method, it is applied to the Office and Caltech-256 datasets. The experimental results show that our proposed approach performs better than the state-of-the-art approaches.

Similarity based context for nonparametric scene parsing
2017 Iranian Conference on Electrical Engineering (ICEE), 2017
Scene parsing is an important research area in computer vision which aims to provide a semantic label for each pixel in an image. In this paper, we propose a new approach to non-parametric scene parsing. Typical non-parametric scene parsing approaches have two main steps: retrieving images similar to the test image and transferring labels from the retrieved images to the test image. In our approach, in the label transferring step, we use an objective function in which object-level and context-level information are incorporated. The main contribution of this paper is a new contextual term which is adapted to the similarity distance measure employed in the retrieval stage. Also, we propose a new adaptive weighting procedure which balances the effectiveness of the object-level and context-level terms in the objective function. To evaluate the proposed approach, it is applied to the MSRC-21 dataset. The obtained results show that our approach outperforms comparable state-of-the-art non-parametric approaches.

Modality adaptation in multimodal data
Expert Systems with Applications, 2021
Recently, multimodal data has received much attention. In classical machine learning, it is assumed that all data comes from one modality, while in multimodal machine learning, the information comes from different modalities. In multimodal machine learning, transferring or fusing knowledge from different modalities is an important step. Hence, in these steps, the different marginal distributions of the modalities should be taken into account. However, in recent years, modality adaptation has not received enough attention. The motivation of this work is to consider modality adaptation to effectively encode the shared common or complementary knowledge in multimodal data. To reduce the modality shift, we present a new perspective on the modality adaptation algorithm. In multimodal data, applying existing domain adaptation techniques to reduce the modality shift raises a problem, because those techniques are not sufficiently capable of preserving complementary knowledge. Our proposed modality adaptation is designed such that it simultaneously considers both the shared and complementary knowledge of each modality while preserving the discriminative ability of each modality in the label space. To evaluate the proposed approach, we have applied it to two different multimodal applications: multi-view object detection and RGBD image semantic segmentation. Our results show that the proposed modality adaptation technique is successful in transferring and fusing knowledge.

Next Frame Prediction Using Flow Fields
Data Science: From Research to Application, 2020
Next frame prediction is a challenging task in computer vision and video prediction. Despite the long history of research in video processing, the next frame prediction problem has rarely been investigated and is still in its early stages. In next frame prediction, the main goal is to design a model which automatically generates the next frame using a sequence of previous frames. In videos, in most cases, a large portion of the current frame is similar to the previous frames and only a small portion of the frame has a motion field. This leads us to utilize the optic flow field. To do so, a Laplacian pyramid of convolutional networks and adversarial learning are used to predict simultaneously the optic flow and the gray content of the next frame. To evaluate the proposed approach, it is applied to the UCF101 dataset. The obtained results show that our approach achieves better performance.
One of the important factors in driving safety at night is the headlight. The most common system on the market is based on manual switching between low and high beam. In this paper, we design an array of LEDs with low power consumption, small size, longevity, and rapid response time; the system can detect and recognize objects using deep learning and computer vision.
An Affine Invariant Descriptor for Action Recognition

ArXiv, 2020
This paper considers the task of matching images and sentences by learning a visual-textual embedding space for cross-modal retrieval. Finding such a space is a challenging task since the features and representations of text and image are not comparable. In this work, we introduce an end-to-end deep multimodal convolutional-recurrent network for learning both vision and language representations simultaneously to infer image-text similarity. The model learns which pairs are a match (positive) and which ones are a mismatch (negative) using a hinge-based triplet ranking. To learn the joint representations, we leverage our newly extracted collection of tweets from Twitter. The main characteristic of our dataset is that the images and tweets are not standardized in the way the benchmarks are. Furthermore, there can be a higher semantic correlation between the pictures and tweets, in contrast to benchmarks in which the descriptions are well-organized. Experimental results on MS-COCO benchm...
Incorporation of High Level Information in Images Retrieval

Pattern Analysis and Applications, 2012
The aim of this paper is to introduce a new descriptor for the spatio-temporal volume (STV). Human motion is completely represented by the STV (action volume), which is constructed by stacking human silhouettes over consecutive frames. The action volume comprehensively contains spatial and temporal information about an action. The main contribution of this paper is a new affine invariant action volume descriptor based on a function of spherical harmonic coefficients. That is, it is invariant under rotation, non-uniform scaling, and translation. In the 3D shape analysis literature, there have been a few attempts to use coefficients of spherical harmonics to describe a 3D shape. However, those descriptors are only rotation invariant, not affine invariant. In addition, the proposed approach employs a parametric form of spherical harmonics that handles genus zero surfaces regardless of whether they are stellar or not. Another contribution of this paper is the way that the action volume is constructed. We applied the proposed descriptor to the KTH, Weizmann, IXMAS, and Robust datasets and compared the performance of our algorithm to competing methods available in the literature. The results of our experiments show that our method has performance comparable to the most successful recent algorithms.

Journal of Visual Communication and Image Representation, 2014
In this paper, a new hierarchical approach for object detection is proposed. Object detection methods based on the Implicit Shape Model (ISM) efficiently handle deformable objects, occlusion, and clutter. The structure of each object in ISM is defined by a spring-like graph. We introduce a hierarchical ISM in which the structure of each object is defined by a hierarchical star graph. The hierarchical ISM has two layers. In the first layer, a set of local ISMs is used to model object parts. In the second layer, the structure of the parts with respect to the object center is modeled by a global ISM. In the proposed approach, the obtained parts for each object category have high discriminative ability. Therefore, our approach does not require a verification stage. We applied the proposed approach to some datasets and compared the performance of our algorithm to comparable methods. The results show that our method has superior performance.