Papers by Tomonari Masada

Lecture Notes in Computer Science, 2017
This paper proposes a new method for estimating the word probabilities in latent Dirichlet allocation (LDA). LDA uses a Dirichlet distribution as the prior for the per-document topic discrete distributions. While another Dirichlet prior can be introduced for the per-topic word discrete distributions, point estimation may lead to a better evaluation result, e.g. in terms of test perplexity. This paper proposes a method for the point estimation of the per-topic word probabilities in LDA by using a multilayer perceptron (MLP). Our point estimation is performed in an online manner by mini-batch gradient ascent. We compared our method to a baseline using a perceptron with no hidden layers and also to collapsed Gibbs sampling (CGS). The evaluation experiment showed that the test perplexity of CGS could not be improved upon in almost all cases. However, there were situations where our method achieved a better perplexity than the baseline. We also discuss the use of our method as a word embedding.
Lecture Notes in Computer Science
This paper proposes a method of scoring sequences generated by a recurrent neural network (RNN) for automatic Tanka composition. Our method gives each sequence a score based on topic assignments provided by latent Dirichlet allocation (LDA). When many word tokens in a sequence are assigned to the same topic, we give the sequence a high score. While sequences can also be scored by using RNN output probabilities, the sequences with large probabilities tend to share many of the same subsequences and thus lack diversity. The experimental results, where we scored Japanese Tanka poems generated by an RNN, show that the top-ranked sequences selected by our method were likely to contain a wider variety of subsequences than those selected by RNN output probabilities.
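The scoring idea can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the rule used here (the share of tokens assigned to the most frequent topic) is an assumption, and `topic_concentration_score` and the sample topic assignments are hypothetical.

```python
from collections import Counter

def topic_concentration_score(topic_assignments):
    """Share of tokens assigned to the most common topic (assumed scoring rule)."""
    if not topic_assignments:
        return 0.0
    counts = Counter(topic_assignments)
    return max(counts.values()) / len(topic_assignments)

# Hypothetical per-token LDA topic IDs for three generated sequences.
candidates = {
    "seq_a": [2, 2, 2, 5, 2],
    "seq_b": [0, 1, 2, 3, 4],
    "seq_c": [7, 7, 1, 7, 1],
}

# Rank sequences so that topically concentrated ones come first.
ranked = sorted(
    candidates,
    key=lambda name: topic_concentration_score(candidates[name]),
    reverse=True,
)
```

Because the score depends only on topic assignments and not on RNN output probabilities, two high-scoring sequences need not share subsequences.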
Acceleration of Phase-Only Correlation Using a GPU (ITS Image Processing, Visual Media, and General)
ITE Technical Report, 2009
In this paper, we propose a clustering method for disambiguating abbreviated author names appearing in citation data by finding the correct full name for each instance of an abbreviated name. We use the standard naive Bayes mixture model and the two-variable mixture model, which is a newly proposed model having two hidden variables. In the experiment, we used the DBLP data set and selected for evaluation 47 abbreviated author names, each corresponding to 50 or more full names. The results show that our model
Accelerating Collapsed Variational Bayesian Inference for Latent Dirichlet Allocation with Nvidia CUDA Compatible Devices
NAOSITE: Nagasaki University's Academic Output SITE
In this paper, we propose an acceleration of collapsed variational Bayesian (CVB) inference for latent Dirichlet allocation (LDA) by using Nvidia CUDA compatible devices. While LDA is an efficient Bayesian multi-topic document model, it requires complicated computations for parameter estimation in comparison with simpler document models, e.g. probabilistic latent semantic indexing. Therefore, we accelerate CVB inference, an efficient deterministic inference method for LDA, with Nvidia CUDA. In the evaluation experiments, we used a set of 50,000 documents and a set of 10,000 images. We obtained inference results comparable to those of sequential CVB inference.

Context-Dependent Token-Wise Variational Autoencoder for Topic Modeling
Current Trends in Web Engineering, 2020
This paper proposes a new variational autoencoder (VAE) for topic models. Variational inference (VI) for Bayesian models approximates the true posterior distribution by maximizing a lower bound of the log marginal likelihood. We can implement VI as a VAE by using a neural network, called the encoder, and running it over observations to produce the approximate posterior parameters. However, VAE often suffers from latent variable collapse, where the approximate posterior degenerates to a local optimum that merely mimics the prior due to the over-minimization of the KL-divergence between the approximate posterior and the prior. To address this problem for topic modeling, we propose a new VAE. Since we marginalize out topic probabilities by following the method of Mimno et al., our VAE minimizes a KL-divergence that has not been considered in the existing VAE. Further, we draw samples from the variational posterior for each word token separately. This sampling for Monte Carlo integration is performed with the Gumbel-softmax trick by using document-specific context information. We empirically investigated whether our new VAE could mitigate the difficulty arising from latent variable collapse. The experimental results showed that our VAE improved on the existing VAE for half of the data sets in terms of perplexity or of normalized pairwise mutual information.
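As a rough illustration of the Gumbel-softmax trick mentioned above, the sketch below draws one relaxed one-hot topic sample for a single word token. It shows only the generic trick, not the paper's model: the logits, the temperature, and the name `gumbel_softmax_sample` are assumptions, and the document-specific context is not modeled.

```python
import math
import random

def gumbel_softmax_sample(logits, temperature, rng):
    """Draw one relaxed one-hot sample: softmax((logits + Gumbel noise) / temperature)."""
    noisy = []
    for logit in logits:
        u = max(rng.random(), 1e-12)        # keep u away from 0 to avoid log(0)
        gumbel = -math.log(-math.log(u))    # standard Gumbel(0, 1) noise
        noisy.append((logit + gumbel) / temperature)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    m = max(noisy)
    exps = [math.exp(x - m) for x in noisy]
    total = sum(exps)
    return [e / total for e in exps]

rng = random.Random(0)
token_topic_logits = [0.5, 1.5, -0.3, 0.0]  # hypothetical logits for one word token
sample = gumbel_softmax_sample(token_topic_logits, temperature=0.5, rng=rng)
```

Lower temperatures push the sample toward a one-hot vector, while higher temperatures make it smoother.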
21-4 Development of prototype system for inspection of abnormal lung sounds
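The naive Bayes classifier is a representative probabilistic method for document classification. However, it gives satisfactory classification accuracy only when combined with smoothing. Moreover, the smoothing parameters must be chosen appropriately according to the characteristics of the document set. In this paper, we focus on Dirichlet mixtures as an effective probabilistic framework that requires no parameter tuning and gives sufficient classification accuracy for diverse document sets. Applications of Dirichlet mixtures have been widely studied in language processing and image processing. In particular, studies in language processing have also conducted experiments on real document data. However, the evaluation often relies on perplexity, a purely theoretical measure. In contrast, in text mining and information retrieval, document classification is often evaluated by accuracy computed by comparison with ground-truth labels. In this paper, with a view to applications in multilingual text mining, we conduct document classification experiments using the English 20 newsgroups data set and Korean Web news documents, and clarify the qualitative and quantitative differences between the classifier based on Dirichlet mixtures and the naive Bayes classifier.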

We propose a method for supporting query refinement using topical term clusters. First, we propose a new term weighting method that can extract terms strongly related to a specific topic, because a document set retrieved with an ambiguous query may include divergent topics. Our formulation of term weighting is based on the statistics of term co-occurrence. Then, we generate term clusters using extracted terms, and rerank the documents in the search results by using each term cluster as a query. This clustering procedure is intended to isolate each topic as a set of related terms. In our experiments, we evaluated our term weighting method by checking: 1) whether each of the top-ranked document sets corresponds to one topic; and 2) whether some of the top-ranked document sets cover all the topics included in the synthesized document set. The results of our experiment show that our method outperforms the existing term weighting methods MI, KLD, CHI-square and RSV.

In this paper, we propose a clustering method for author name disambiguation in citation data. Most article citations give the first names of authors only as initials. Therefore, when constructing a bibliographic database, we need to disambiguate the abbreviated author names and find the correct full name for each of them. In this paper, we obtain a clustering of citation data, which consist of three fields, i.e., co-author names, title words, and journal or proceedings title words, with two probabilistic models. One model is a standard naive Bayes mixture model. For each citation record, we regard the most probable value of the hidden variable as the ID of the cluster to which the record belongs. Then all abbreviated name instances appearing in the same citation data cluster are taken as abbreviations of the same full name. The other is a newly proposed model, which has two hidden variables. We partition citation data into clusters according to the most probable combinati...

Adversarial Learning for Topic Models
This paper proposes adversarial learning for topic models. The adversarial learning we consider here is a method of density ratio estimation using a neural network called a discriminator. In generative adversarial networks (GANs), we train the discriminator to estimate the density ratio between the true data distribution and the generator distribution. Likewise, in variational inference (VI) for Bayesian probabilistic models, we can train a discriminator to estimate the density ratio between the approximate posterior distribution and the prior distribution. With adversarial learning in VI, we can adopt an implicit distribution as the approximate posterior. This paper proposes adversarial learning for latent Dirichlet allocation (LDA) to improve the expressiveness of the approximate posterior. Our experimental results showed that the quality of extracted topics was improved in terms of test perplexity.

Difference between Similars: A Novel Method to Use Topic Models for Sensor Data Analysis
We propose a novel method to use the topics obtained by topic modeling for sensor data analysis. This paper describes a case study where we perform an exploratory data analysis of manufacturing sensor data by using latent Dirichlet allocation (LDA) as a tool to discover remarkable change patterns. Our target is a set of time-series data originating from the sensors installed in a closed factory environment. Each sensor gives a different type of measurement of the same manufacturing process, which is operated repeatedly in a lot-by-lot manner. We first discretize the data based on the histogram of sensor measurements and construct a bag-of-words representation. We then apply LDA to discover change patterns across tens of thousands of lots. When we apply LDA to natural language documents, the resulting topics are widely different from each other because the documents intrinsically show considerable diversity. In contrast, our data, which come from the repeatedly operated manufacturing...
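The discretization step can be sketched as follows. The abstract does not specify the binning scheme, so equal-width bins, the `sensor_binK` word format, and the sample readings are all assumptions made for illustration; each lot plays the role of one document.

```python
from collections import Counter

def bin_index(value, lo, hi, n_bins):
    """Map a value to an equal-width histogram bin in [0, n_bins - 1]."""
    if hi == lo:
        return 0
    idx = int((value - lo) / (hi - lo) * n_bins)
    return min(max(idx, 0), n_bins - 1)  # clamp so that value == hi lands in the last bin

def lot_to_bag_of_words(lot, ranges, n_bins=4):
    """Turn one lot (sensor name -> list of readings) into word counts."""
    bag = Counter()
    for sensor, readings in lot.items():
        lo, hi = ranges[sensor]
        for value in readings:
            bag[f"{sensor}_bin{bin_index(value, lo, hi, n_bins)}"] += 1
    return bag

# Hypothetical lot with two sensors; in practice the ranges would be
# computed from the measurements of all lots.
lot = {"temp": [20.0, 21.0, 29.9, 30.0], "pressure": [1.0, 1.0, 2.0]}
ranges = {"temp": (20.0, 30.0), "pressure": (1.0, 2.0)}
bag = lot_to_bag_of_words(lot, ranges)
```

The resulting per-lot word counts can then be passed to any LDA implementation that accepts bag-of-words input.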

Document Modeling with Implicit Approximate Posterior Distributions
This paper proposes a Bayesian probabilistic document model whose variational inference is achieved by using an implicit approximate posterior distribution. The proposed model generates a set of documents as follows. First, we draw a noise vector from the multivariate standard normal distribution. Second, we use the noise vector as an input to a neural network to obtain a parameter vector of a multinomial distribution. Finally, we draw word counts from the multinomial distribution to generate a document. This generative story is similar to that of NVDM. However, we use an implicit approximate posterior distribution in the variational Bayesian inference for our model. The inference for NVDM is achieved with VAE, which does not use an implicit distribution for approximating the posterior. Our main contributions are to provide an example of a Bayesian document model whose posterior can be effectively approximated by an implicit distribution and to show that our model is comparable to LDA in t...
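The three-step generative story can be sketched in a few lines. This is only an illustration under stated assumptions: the "neural network" is a toy single linear layer followed by a softmax, and the weights, vocabulary size, and document length are hypothetical rather than taken from the paper.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def generate_document(weights, bias, doc_length, rng):
    """Noise -> toy one-layer network -> multinomial parameters -> word counts."""
    noise_dim = len(weights[0])
    # Step 1: draw a noise vector from the standard normal distribution.
    noise = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    # Step 2: map the noise to multinomial parameters over the vocabulary.
    logits = [sum(w * z for w, z in zip(row, noise)) + b
              for row, b in zip(weights, bias)]
    word_probs = softmax(logits)
    # Step 3: draw word counts from the multinomial distribution.
    counts = [0] * len(word_probs)
    for word_id in rng.choices(range(len(word_probs)), weights=word_probs, k=doc_length):
        counts[word_id] += 1
    return word_probs, counts

rng = random.Random(0)
vocab_size, noise_dim = 6, 3  # hypothetical sizes
weights = [[rng.uniform(-1, 1) for _ in range(noise_dim)] for _ in range(vocab_size)]
bias = [0.0] * vocab_size
word_probs, counts = generate_document(weights, bias, doc_length=50, rng=rng)
```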

2020 International Conference on Information and Communication Technology Convergence (ICTC), 2020
The main motivation of this paper is to improve the naturalness of a Myanmar text-to-speech system using an end-to-end generative model called Tacotron. We introduce an open-source implementation of a highly natural-sounding Myanmar text-to-speech system. The paper has four main parts: speech corpus creation, data pre-processing, application of the end-to-end generative model, and speech synthesis. First, we develop a speech corpus of 8k sentences from a large set of news articles, novels, daily usage and travel-related expressions. Second, we use a syllable segmenter and a text normalizer for data pre-processing. Third, we apply the end-to-end generative model Tacotron, which synthesizes speech directly from the sequence of text characters. Finally, we use the Griffin-Lim algorithm to convert the corresponding text into the output speech. For the subjective evaluation, we compare our synthesized speech output with the original recording speech in b...
Lecture Notes in Computer Science, 2018

Proceedings of the 13th International Conference on Enterprise Information Systems, 2011
This paper provides experimental results showing how we can use maximal substrings as elementary features in document clustering. We extract maximal substrings, i.e., the substrings each giving a smaller number of occurrences even after adding only one character at its head or tail, from the given document set and represent each document as a bag of maximal substrings after reducing the variety of maximal substrings by a simple frequency-based selection. This extraction can be done in an unsupervised manner. Our experiment aims to compare bag of maximal substrings representation with bag of words representation in document clustering. For clustering documents, we utilize Dirichlet compound multinomials, a Bayesian version of multinomial mixtures, and measure the results by F-score. Our experiment showed that maximal substrings were as effective as words extracted by a dictionary-based morphological analysis for Korean documents. For Chinese documents, maximal substrings were not so effective as words extracted by a supervised segmentation based on conditional random fields. However, one fourth of the clustering results given by bag of maximal substrings representation achieved F-scores better than the mean F-score given by bag of words representation. It can be said that the use of maximal substrings achieved an acceptable performance in document clustering.
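The definition of a maximal substring can be made concrete with a brute-force sketch. This is not the extraction algorithm used in the paper, which is not described here; the length cap `max_len` and the frequency threshold `min_count` are assumptions standing in for the frequency-based selection.

```python
from collections import Counter

def maximal_substrings(documents, max_len=8, min_count=2):
    """Return substrings whose count drops when any one character is added at head or tail."""
    counts = Counter()
    for doc in documents:
        for start in range(len(doc)):
            # Count substrings up to max_len + 1 so extensions of length-max_len ones are seen.
            for end in range(start + 1, min(len(doc), start + max_len + 1) + 1):
                counts[doc[start:end]] += 1
    not_maximal = set()
    for sub, c in counts.items():
        if len(sub) < 2:
            continue
        if counts[sub[1:]] == c:    # adding a head character kept the count
            not_maximal.add(sub[1:])
        if counts[sub[:-1]] == c:   # adding a tail character kept the count
            not_maximal.add(sub[:-1])
    return {
        s: c for s, c in counts.items()
        if len(s) <= max_len and c >= min_count and s not in not_maximal
    }

result = maximal_substrings(["abracadabra"])
```

In `"abracadabra"`, the substring `"abr"` is not maximal because `"abra"` occurs equally often, whereas `"abra"` is maximal because every one-character extension occurs only once.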

International Journal of Advanced Research in Artificial Intelligence, 2016
This paper proposes a new method for query expansion based on bidirectional extraction of phrases as word n-grams from research paper titles. The proposed method aims to extract information relevant to users' needs and interests and thus to provide a useful system for technical paper retrieval. The outcome of the proposed method is a set of trigrams that can be used as phrases for query expansion. First, word trigrams are extracted from research paper titles. Second, a co-occurrence graph of the extracted trigrams is constructed. To construct the co-occurrence graph, the direction of edges is considered in two ways: forward and reverse. In the forward and reverse co-occurrence graphs, the trigrams point to other trigrams appearing after and before them in a paper title, respectively. Third, Jaccard similarity is computed between trigrams as the weight of the graph edge. Fourth, the weighted version of PageRank is applied. Consequently, the following two types of phrases can be obtained as the trigrams associated with the higher PageRank scores. The trigrams of the first type, which are obtained from the forward co-occurrence graph, can form a more specific query when users add a technical word or words before them. Those of the other type, obtained from the reverse co-occurrence graph, can form a more specific query when users add a technical word or words after them. The extracted phrases are evaluated as additional features in a paper title classification task using SVM. The experimental results show that the classification accuracy is higher than that achieved when only the standard TF-IDF text features are used. Moreover, the trigrams extracted by the proposed method can be utilized to expand query words in research paper retrieval.

Extraction of Proper Names from Myanmar Text Using Latent Dirichlet Allocation
2016 Conference on Technologies and Applications of Artificial Intelligence (TAAI), 2016
This paper proposes a method for extracting proper names from Myanmar text by using latent Dirichlet allocation (LDA). Our method aims to extract proper names that provide important information on the contents of Myanmar text. Our method consists of two steps. In the first step, we extract topic words from Myanmar news articles by using LDA. In the second step, we apply post-processing, because the resulting topic words contain some noisy words. Our post-processing first eliminates the topic words whose prefixes are Myanmar digits and whose suffixes are noun and verb particles. We then remove the duplicate words and discard the topic words that are contained in the existing dictionary. Consequently, we obtain words as candidates for proper names, namely personal names, geographical names, unique object names, organization names, single event names, and so on. The evaluation is performed from both the subjective and quantitative perspectives. From the subjective perspective, we compare the accuracy of proper names extracted by our method with those extracted by latent semantic indexing (LSI) and a rule-based method. It is shown that both LSI and our method can improve on the accuracy obtained by the rule-based method. However, our method can provide more interesting proper names than LSI. From the quantitative perspective, we use the extracted proper names as additional features in K-means clustering. The experimental results show that the document clusters given by our method are better than those given by LSI and the rule-based method in precision, recall and F-score.

A Simple Stochastic Gradient Variational Bayes for the Correlated Topic Model
Lecture Notes in Computer Science, 2016
This paper proposes a new inference for the correlated topic model (CTM) [3]. CTM is an extension of LDA [4] for modeling correlations among latent topics. The proposed inference is an instance of the stochastic gradient variational Bayes (SGVB) [7, 8]. By constructing the inference network with the diagonal logistic normal distribution, we achieve a simple inference. In particular, there is no need to invert the covariance matrix explicitly. We performed a comparison with LDA in terms of predictive perplexity. Two inferences for LDA are considered: the collapsed Gibbs sampling (CGS) [5] and the collapsed variational Bayes with a zero-order Taylor expansion approximation (CVB0) [1]. While CVB0 for LDA gave the best result, the proposed inference achieved perplexities comparable with those of CGS for LDA.