fly51fly / aicoco

Selected Weibo posts from “爱可可-爱生活”

Today's Academic Horizon

fly51fly opened this issue · comments

arXiv Paper Tracking

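astro-ph.IM - Instrumentation and Methods for Astrophysics
cond-mat.mtrl-sci - Materials Science
cs.AI - Artificial Intelligence
cs.CE - Computational Engineering, Finance, and Science
cs.CL - Computation and Language
cs.CR - Cryptography and Security
cs.CV - Computer Vision and Pattern Recognition
cs.DC - Distributed, Parallel, and Cluster Computing
cs.GR - Graphics
cs.IR - Information Retrieval
cs.IT - Information Theory
cs.LG - Machine Learning
cs.OS - Operating Systems
cs.RO - Robotics
cs.SI - Social and Information Networks
math.ST - Statistics Theory
physics.comp-ph - Computational Physics
stat.AP - Applications (Statistics)
stat.ME - Methodology (Statistics)
stat.ML - Machine Learning (Statistics)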

• [astro-ph.IM]QuasarNET: Human-level spectral classification and redshifting with Deep Neural Networks
• [cond-mat.mtrl-sci]Fast and accessible first-principles calculations of vibrational properties of materials
• [cs.AI]ExIt-OOS: Towards Learning from Planning in Imperfect Information Games
• [cs.AI]Modeling OWL with Rules: The ROWL Protege Plugin
• [cs.AI]OWLAx: A Protege Plugin to Support Ontology Axiomatization through Diagramming
• [cs.AI]Reasoning about Actions and State Changes by Injecting Commonsense Knowledge
• [cs.AI]Rule-based OWL Modeling with ROWLTab Protege Plugin
• [cs.CE]Symbolic regression based genetic approximations of the Colebrook equation for flow friction
• [cs.CL]A Quantum Many-body Wave Function Inspired Language Modeling Approach
• [cs.CL]Acquiring Annotated Data with Cross-lingual Explicitation for Implicit Discourse Relation Classification
• [cs.CL]Attaining the Unattainable? Reassessing Claims of Human Parity in Neural Machine Translation
• [cs.CL]Comparative Studies of Detecting Abusive Language on Twitter
• [cs.CL]Correcting Length Bias in Neural Machine Translation
• [cs.CL]Direct Output Connection for a High-Rank Language Model
• [cs.CL]Generalize Symbolic Knowledge With Neural Rule Engine
• [cs.CL]Grammar Induction with Neural Language Models: An Unusual Replication
• [cs.CL]Hard Non-Monotonic Attention for Character-Level Transduction
• [cs.CL]KDSL: a Knowledge-Driven Supervised Learning Framework for Word Sense Disambiguation
• [cs.CL]Learning Neural Templates for Text Generation
• [cs.CL]Learning a Policy for Opportunistic Active Learning
• [cs.CL]Learning to Attend On Essential Terms: An Enhanced Retriever-Reader Model for Scientific Question Answering
• [cs.CL]Learning to adapt: a meta-learning approach for speaker adaptation
• [cs.CL]Modeling Empathy and Distress in Reaction to News Stories
• [cs.CL]Multi-Source Syntactic Neural Machine Translation
• [cs.CL]Notes on Deep Learning for NLP
• [cs.CL]Pronoun Translation in English-French Machine Translation: An Analysis of Error Types
• [cs.CL]Retrieval-Based Neural Code Generation
• [cs.CL]Semi-Supervised Training for Improving Data Efficiency in End-to-End Speech Synthesis
• [cs.CL]Story Ending Generation with Incremental Encoding and Commonsense Knowledge
• [cs.CL]Towards a Better Metric for Evaluating Question Generation Systems
• [cs.CL]Zero-Shot Adaptive Transfer for Conversational Language Understanding
• [cs.CR]Backdoor Embedding in Convolutional Neural Network Models via Invisible Perturbation
• [cs.CR]VirtualIdentity: Privacy-Preserving User Profiling
• [cs.CV]AAD: Adaptive Anomaly Detection through traffic surveillance videos
• [cs.CV]Artifacts Detection and Error Block Analysis from Broadcasted Videos
• [cs.CV]Automated Scene Flow Data Generation for Training and Verification
• [cs.CV]CNN-PS: CNN-based Photometric Stereo for General Non-Convex Surfaces
• [cs.CV]Deep Chronnectome Learning via Full Bidirectional Long Short-Term Memory Networks for MCI Diagnosis
• [cs.CV]Deep Lidar CNN to Understand the Dynamics of Moving Vehicles
• [cs.CV]Dense Scene Flow from Stereo Disparity and Optical Flow
• [cs.CV]Interpretable Intuitive Physics Model
• [cs.CV]PPF-FoldNet: Unsupervised Learning of Rotation Invariant 3D Local Descriptors
• [cs.CV]Super-Resolution for Hyperspectral and Multispectral Image Fusion Accounting for Seasonal Spectral Variability
• [cs.CV]The Impact of Preprocessing on Deep Representations for Iris Recognition on Unconstrained Environments
• [cs.CV]Towards Effective Deep Embedding for Zero-Shot Learning
• [cs.CV]iCAN: Instance-Centric Attention Network for Human-Object Interaction Detection
• [cs.DC]A study of integer sorting on multicores
• [cs.DC]Self-stabilizing Overlays for high-dimensional Monotonic Searchability
• [cs.GR]Differential and integral invariants under Mobius transformation
• [cs.IR]Analyze Unstructured Data Patterns for Conceptual Representation
• [cs.IR]Centroid estimation based on symmetric KL divergence for Multinomial text classification problem
• [cs.IR]Large-Scale Cover Song Detection in Digital Music Libraries Using Metadata, Lyrics and Audio Features
• [cs.IR]Recommendation Through Mixtures of Heterogeneous Item Relationships
• [cs.IR]Understanding Latent Factors Using a GWAP
• [cs.IT]A Radix-M Construction for Complementary Sets
• [cs.IT]Analysis of Frequency Agile Radar via Compressed Sensing
• [cs.IT]Asymptotically Optimal Codes Correcting Fixed-Length Duplication Errors in DNA Storage Systems
• [cs.IT]Capacity of Locally Recoverable Codes
• [cs.IT]Decentralized Detection with Robust Information Privacy Protection
• [cs.IT]Space-Time Block Coding Based Beamforming for Beam Squint Compensation
• [cs.LG]A Coordinate-Free Construction of Scalable Natural Gradient
• [cs.LG]A Unified Analysis of Stochastic Momentum Methods for Deep Learning
• [cs.LG]DP-ADMM: ADMM-based Distributed Learning with Differential Privacy
• [cs.LG]Gaussian Mixture Generative Adversarial Networks for Diverse Datasets, and the Unsupervised Clustering of Images
• [cs.LG]Group calibration is a byproduct of unconstrained learning
• [cs.LG]IEA: Inner Ensemble Average within a convolutional neural network
• [cs.LG]Learning End-to-end Autonomous Driving using Guided Auxiliary Supervision
• [cs.LG]Rational Neural Networks for Approximating Jump Discontinuities of Graph Convolution Operator
• [cs.LG]Searching Toward Pareto-Optimal Device-Aware Neural Architectures
• [cs.LG]Semi-Metrification of the Dynamic Time Warping Distance
• [cs.LG]Theoretical Linear Convergence of Unfolded ISTA and its Practical Weights and Thresholds
• [cs.LG]Towards Reproducible Empirical Research in Meta-Learning
• [cs.OS]Profiling and Improving the Duty-Cycling Performance of Linux-based IoT Devices
• [cs.RO]A Variational Feature Encoding Method of 3D Object for Probabilistic Semantic SLAM
• [cs.RO]Baidu Apollo Auto-Calibration System - An Industry-Level Data-Driven and Learning based Vehicle Longitude Dynamic Calibrating Algorithm
• [cs.RO]Configuration Space Singularities of The Delta Manipulator
• [cs.RO]Design of an Autonomous Precision Pollination Robot
• [cs.RO]RoI-based Robotic Grasp Detection in Object Overlapping Scenes Using Convolutional Neural Network
• [cs.RO]Robot_gym: accelerated robot training through simulation in the cloud with ROS and Gazebo
• [cs.SI]Asymptotic analysis of the Friedkin-Johnsen model when the matrix of the susceptibility weights approaches the identity matrix
• [cs.SI]On Microtargeting Socially Divisive Ads: A Case Study of Russia-Linked Ad Campaigns on Facebook
• [cs.SI]Uncovering intimate and casual relationships from mobile phone communication
• [math.ST]A Divergence Proof for Latuszynski's Counter-Example Approaching Infinity with Probability "Near" One
• [math.ST]Differentially Private Change-Point Detection
• [math.ST]Maximum likelihood estimator and its consistency for an $(L,1)$ random walk in a parametric random environment
• [math.ST]Minimal inference from incomplete 2x2-tables
• [math.ST]Quadratic Discriminant Analysis under Moderate Dimension
• [physics.comp-ph]High-Performance Multi-Mode Ptychography Reconstruction on Distributed GPUs
• [stat.AP]An Introduction to Inductive Statistical Inference -- from Parameter Estimation to Decision-Making
• [stat.AP]Reducing post-surgery recovery bed occupancy through an analytical prediction model
• [stat.ME]Accelerating Parallel Tempering: Quantile Tempering Algorithm (QuanTA)
• [stat.ME]Adaptative significance levels in normal mean hypothesis testing
• [stat.ME]Optimal shrinkage covariance matrix estimation under random sampling from elliptical distributions
• [stat.ML]Discriminative Learning of Similarity and Group Equivariant Representations
• [stat.ML]Extreme Value Theory for Open Set Classification - GPD and GEV Classifiers
• [stat.ML]Nested multi-instance classification
• [stat.ML]Physically-inspired Gaussian processes for transcriptional regulation in Drosophila melanogaster

·····································

• [astro-ph.IM]QuasarNET: Human-level spectral classification and redshifting with Deep Neural Networks
Nicolas Busca, Christophe Balland
http://arxiv.org/abs/1808.09955v1

We introduce QuasarNET, a deep convolutional neural network that performs classification and redshift estimation of astrophysical spectra with human-expert accuracy. We pose these two tasks as a feature detection problem: the presence or absence of spectral features determines the class, and their wavelength determines the redshift, much as human experts proceed. When run on BOSS data to identify quasars through their emission lines, QuasarNET defines a sample that is $99.51\pm0.03$% pure and $99.52\pm0.03$% complete, well above the requirements of many analyses using these data. QuasarNET significantly reduces the problem of line confusion that induces catastrophic redshift failures to below 0.2%. We also extend QuasarNET to classify spectra with broad absorption line (BAL) features, achieving an accuracy of $98.0\pm0.4$% for recognizing BAL and $97.0\pm0.2$% for rejecting non-BAL quasars. QuasarNET is trained on data of low signal-to-noise and medium resolution, typical of current and future astrophysical surveys, and could be easily applied to classify spectra from current and upcoming surveys such as eBOSS, DESI and 4MOST.

• [cond-mat.mtrl-sci]Fast and accessible first-principles calculations of vibrational properties of materials
Timur Bazhirov, E. X. Abot
http://arxiv.org/abs/1808.10011v1

We present example applications of an approach to first-principles calculations of vibrational properties of materials implemented within the Exabyte.io platform. We deploy models based on Density Functional Perturbation Theory to extract the phonon dispersion relations and densities of states for an example set of 35 samples and find the results to be in agreement with prior similar calculations. We construct modeling workflows that are accessible, accurate, and efficient with respect to the human time involved. This is achieved through efficient parallelization of the tasks for the individual vibrational modes. We report achieved speedups in the 10-100 range, approximately, and maximum attainable speedups in the 30-300 range, correspondingly. We analyze the execution times on current, up-to-date computational infrastructure centrally available from a public cloud provider. Results and all associated data, including the materials and simulation workflows, are made available online in an accessible, repeatable and extensible setting.

• [cs.AI]ExIt-OOS: Towards Learning from Planning in Imperfect Information Games
Andy Kitchen, Michela Benedetti
http://arxiv.org/abs/1808.10120v1

The current state of the art in playing many important perfect information games, including Chess and Go, combines planning and deep reinforcement learning with self-play. We extend this approach to imperfect information games and present ExIt-OOS, a novel approach to playing imperfect information games within the Expert Iteration framework and inspired by AlphaZero. We use Online Outcome Sampling (OOS), an online search algorithm for imperfect information games, in place of MCTS. While training online, our neural strategy is used to improve the accuracy of playouts in OOS, allowing a learning and planning feedback loop for imperfect information games.

• [cs.AI]Modeling OWL with Rules: The ROWL Protege Plugin
Md. Kamruzzaman Sarker, David Carral, Adila A. Krisnadhi, Pascal Hitzler
http://arxiv.org/abs/1808.10104v1

In our experience, some ontology users find it much easier to convey logical statements using rules rather than OWL (or description logic) axioms. Based on recent theoretical developments on transformations between rules and description logics, we develop ROWL, a Protege plugin that allows users to enter OWL axioms by way of rules; the plugin then automatically converts these rules into OWL DL axioms if possible, and prompts the user in case such a conversion is not possible without weakening the semantics of the rule.

• [cs.AI]OWLAx: A Protege Plugin to Support Ontology Axiomatization through Diagramming
Md. Kamruzzaman Sarker, Adila A. Krisnadhi, Pascal Hitzler
http://arxiv.org/abs/1808.10105v1

Once the conceptual overview, in terms of a somewhat informal class diagram, has been designed in the course of engineering an ontology, the process of adding many of the appropriate logical axioms is mostly a routine task. We provide a Protege plugin which supports this task, together with a visual user interface, based on established methods for ontology design pattern modeling.

• [cs.AI]Reasoning about Actions and State Changes by Injecting Commonsense Knowledge
Niket Tandon, Bhavana Dalvi Mishra, Joel Grus, Wen-tau Yih, Antoine Bosselut, Peter Clark
http://arxiv.org/abs/1808.10012v1

Comprehending procedural text, e.g., a paragraph describing photosynthesis, requires modeling actions and the state changes they produce, so that questions about entities at different timepoints can be answered. Although several recent systems have shown impressive progress in this task, their predictions can be globally inconsistent or highly improbable. In this paper, we show how the predicted effects of actions in the context of a paragraph can be improved in two ways: (1) by incorporating global, commonsense constraints (e.g., a non-existent entity cannot be destroyed), and (2) by biasing reading with preferences from large-scale corpora (e.g., trees rarely move). Unlike earlier methods, we treat the problem as a neural structured prediction task, allowing hard and soft constraints to steer the model away from unlikely predictions. We show that the new model significantly outperforms earlier systems on a benchmark dataset for procedural text comprehension (+8% relative gain), and that it also avoids some of the nonsensical predictions that earlier systems make.

• [cs.AI]Rule-based OWL Modeling with ROWLTab Protege Plugin
Md. Kamruzzaman Sarker, Adila Krisnadhi, David Carral, Pascal Hitzler
http://arxiv.org/abs/1808.10108v1

It has been argued that it is much easier to convey logical statements using rules rather than OWL (or description logic (DL)) axioms. Based on recent theoretical developments on transformations between rules and DLs, we have developed ROWLTab, a Protege plugin that allows users to enter OWL axioms by way of rules; the plugin then automatically converts these rules into OWL 2 DL axioms if possible, and prompts the user in case such a conversion is not possible without weakening the semantics of the rule. In this paper, we present ROWLTab, together with a user evaluation of its effectiveness compared to entering axioms using the standard Protege interface. Our evaluation shows that modeling with ROWLTab is much quicker than the standard interface, while at the same time, also less prone to errors for hard modeling tasks.

• [cs.CE]Symbolic regression based genetic approximations of the Colebrook equation for flow friction
Pavel Praks, Dejan Brkic
http://arxiv.org/abs/1808.10394v1

Widely used in hydraulics, the Colebrook equation for flow friction implicitly relates the input parameters, namely the Reynolds number and the relative roughness of the inner pipe surface, to the unknown output parameter, the flow friction factor. In this paper, a few explicit approximations to the Colebrook equation are generated using the ability of artificial intelligence to find patterns that connect input and output parameters explicitly, knowing only the raw numbers and not their nature or the physical law that connects them. The fact that the genetic programming tool used does not know the structure of the Colebrook equation, which is based on a computationally expensive logarithmic law, is exploited to obtain approximations with a better structure that is less demanding to calculate but still accurate enough. All generated approximations have a low computational cost because they contain a limited number of logarithmic forms, used for normalization of input parameters or for acceleration, yet they are also sufficiently accurate. The relative error in the friction factor is, in the best case, up to 0.13% with only two logarithmic forms used. As the second logarithm can be accurately approximated by the Padé approximation, practically the same error is also obtained using only one logarithm.
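
For context, the implicit form that the abstract refers to is commonly written as 1/sqrt(f) = -2 log10(eps/3.7 + 2.51/(Re sqrt(f))), where f is the friction factor, Re the Reynolds number, and eps the relative roughness. The sketch below is not one of the paper's approximations. It is a plain fixed-point iteration on x = 1/sqrt(f), the kind of reference solution an explicit approximation would be compared against. The function name, starting guess, and tolerance are illustrative assumptions, and the equation is only meaningful for turbulent flow.

```python
import math


def colebrook_friction_factor(reynolds, rel_roughness, tol=1e-12, max_iter=100):
    """Solve the implicit Colebrook equation for the Darcy friction factor.

    Iterates on x = 1/sqrt(f), so each step costs one logarithm:
        x_new = -2 * log10(rel_roughness / 3.7 + 2.51 * x / reynolds)
    """
    x = 8.0  # assumed starting guess, corresponds to f of about 0.0156
    for _ in range(max_iter):
        x_new = -2.0 * math.log10(rel_roughness / 3.7 + 2.51 * x / reynolds)
        converged = abs(x_new - x) < tol
        x = x_new
        if converged:
            break
    return 1.0 / (x * x)


if __name__ == "__main__":
    f = colebrook_friction_factor(reynolds=1e5, rel_roughness=1e-4)
    print(f"friction factor: {f:.5f}")
```

For Re = 1e5 and a relative roughness of 1e-4 the iteration settles on a friction factor of roughly 0.0185 after a handful of steps. Each step evaluates one logarithm, which is the cost that explicit approximations aim to reduce.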

• [cs.CL]A Quantum Many-body Wave Function Inspired Language Modeling Approach
Peng Zhang, Zhan Su, Lipeng Zhang, Benyou Wang, Dawei Song
http://arxiv.org/abs/1808.09891v2

The recently proposed quantum language model (QLM) aimed at a principled approach to modeling term dependency by applying quantum probability theory. The latest development for a more effective QLM has adopted word embeddings as a kind of global dependency information and integrated the quantum-inspired idea in a neural network architecture. While these quantum-inspired LMs are theoretically more general and also practically effective, they have two major limitations. First, they have not taken into account the interaction among words with multiple meanings, which is common and important in understanding natural language text. Second, the integration of the quantum-inspired LM with the neural network was mainly for effective training of parameters, yet lacked a theoretical foundation accounting for such integration. To address these two issues, in this paper, we propose a Quantum Many-body Wave Function (QMWF) inspired language modeling approach. The QMWF inspired LM can adopt the tensor product to model the aforesaid interaction among words. It also enables us to reveal the inherent necessity of using a Convolutional Neural Network (CNN) in QMWF language modeling. Furthermore, our approach delivers a simple algorithm to represent and match text/sentence pairs. Systematic evaluation shows the effectiveness of the proposed QMWF-LM algorithm, in comparison with the state-of-the-art quantum-inspired LMs and a couple of CNN-based methods, on three typical Question Answering (QA) datasets.

• [cs.CL]Acquiring Annotated Data with Cross-lingual Explicitation for Implicit Discourse Relation Classification
Wei Shi, Frances Yung, Vera Demberg
http://arxiv.org/abs/1808.10290v1

Implicit discourse relation classification is one of the most challenging and important tasks in discourse parsing, due to the lack of connectives as strong linguistic cues. A principal bottleneck to further improvement is the shortage of training data (ca. 16k instances in the PDTB). Shi et al. (2017) proposed to acquire additional data by exploiting connectives in translation: human translators mark discourse relations which are implicit in the source language explicitly in the translation. Using back-translations of such explicitated connectives improves discourse relation parsing performance. This paper addresses the open question of whether the choice of the translation language matters, and whether multiple translations into different languages can be effectively used to improve the quality of the additional data.

• [cs.CL]Attaining the Unattainable? Reassessing Claims of Human Parity in Neural Machine Translation
Antonio Toral, Sheila Castilho, Ke Hu, Andy Way
http://arxiv.org/abs/1808.10432v1

We reassess a recent study (Hassan et al., 2018) that claimed that machine translation (MT) has reached human parity for the translation of news from Chinese into English, using pairwise ranking and considering three variables that were not taken into account in that previous study: the language in which the source side of the test set was originally written, the translation proficiency of the evaluators, and the provision of inter-sentential context. If we consider only original source text (i.e. not translated from another language, or translationese), then we find evidence showing that human parity has not been achieved. We compare the judgments of professional translators against those of non-experts and discover that those of the experts result in higher inter-annotator agreement and better discrimination between human and machine translations. In addition, we analyse the human translations of the test set and identify important translation issues. Finally, based on these findings, we provide a set of recommendations for future human evaluations of MT.

• [cs.CL]Comparative Studies of Detecting Abusive Language on Twitter
Younghun Lee, Seunghyun Yoon, Kyomin Jung
http://arxiv.org/abs/1808.10245v1

The context-dependent nature of online aggression makes annotating large collections of data extremely difficult. Previously studied datasets in abusive language detection have been insufficient in size to efficiently train deep learning models. Recently, Hate and Abusive Speech on Twitter, a dataset much greater in size and reliability, has been released. However, this dataset has not been comprehensively studied to its potential. In this paper, we conduct the first comparative study of various learning models on Hate and Abusive Speech on Twitter, and discuss the possibility of using additional features and context data for improvements. Experimental results show that a bidirectional GRU network trained on word-level features, with Latent Topic Clustering modules, is the most accurate model, scoring 0.805 F1.

• [cs.CL]Correcting Length Bias in Neural Machine Translation
Kenton Murray, David Chiang
http://arxiv.org/abs/1808.10006v1

We study two problems in neural machine translation (NMT). First, in beam search, whereas a wider beam should in principle help translation, it often hurts NMT. Second, NMT has a tendency to produce translations that are too short. Here, we argue that these problems are closely related and both rooted in label bias. We show that correcting the brevity problem almost eliminates the beam problem; we compare some commonly-used methods for doing this, finding that a simple per-word reward works well; and we introduce a simple and quick way to tune this reward using the perceptron algorithm.
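
The per-word reward mentioned in the abstract can be illustrated with a small rescoring sketch: each hypothesis is scored as its log-probability plus a constant reward for every output token. The candidate list, the reward value, and the function name below are hypothetical, and the perceptron-based tuning of the reward that the authors describe is not reproduced here.

```python
def rescore_with_word_reward(hypotheses, reward):
    """Pick the best hypothesis under score = log_prob + reward * length.

    hypotheses: list of (tokens, log_prob) pairs, e.g. from a beam search.
    reward: per-word bonus; 0.0 recovers plain log-probability ranking.
    """
    return max(hypotheses, key=lambda h: h[1] + reward * len(h[0]))[0]


# Hypothetical candidates: the shorter one has the higher raw log-probability.
candidates = [
    (["a", "cat"], -2.0),
    (["a", "cat", "sat", "down"], -3.5),
]

short_pick = rescore_with_word_reward(candidates, reward=0.0)
long_pick = rescore_with_word_reward(candidates, reward=1.0)

if __name__ == "__main__":
    print(short_pick)  # ['a', 'cat']
    print(long_pick)   # ['a', 'cat', 'sat', 'down']
```

With no reward the two-token output wins on raw log-probability (-2.0 against -3.5). With a reward of 1.0 per word the scores become 0.0 and 0.5, so the longer output wins, which is the brevity correction the abstract describes.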

• [cs.CL]Direct Output Connection for a High-Rank Language Model
Sho Takase, Jun Suzuki, Masaaki Nagata
http://arxiv.org/abs/1808.10143v1

This paper proposes a state-of-the-art recurrent neural network (RNN) language model that combines probability distributions computed not only from a final RNN layer but also from middle layers. Our proposed method raises the expressive power of a language model based on the matrix factorization interpretation of language modeling introduced by Yang et al. (2018). The proposed method improves the current state-of-the-art language model and achieves the best score on the Penn Treebank and WikiText-2, which are the standard benchmark datasets. Moreover, we show that our proposed method contributes to two application tasks: machine translation and headline generation. Our code is publicly available at: https://github.com/nttcslab-nlp/doc_lm.
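
As a rough illustration of combining distributions from several layers, the sketch below forms a weighted mixture of softmax distributions, one per layer. This is an assumption about the general shape of such a combination and not the authors' implementation. The logits, the mixture weights, and the function names are made up for the example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]


def mix_layer_distributions(layer_logits, mixture_weights):
    """Weighted mixture of per-layer softmax distributions.

    layer_logits: one logit vector per layer, all over the same vocabulary.
    mixture_weights: non-negative weights that sum to 1.
    """
    if not math.isclose(sum(mixture_weights), 1.0):
        raise ValueError("mixture weights must sum to 1")
    dists = [softmax(logits) for logits in layer_logits]
    vocab_size = len(dists[0])
    return [
        sum(w * dist[k] for w, dist in zip(mixture_weights, dists))
        for k in range(vocab_size)
    ]


# Hypothetical logits over a three-word vocabulary.
middle_layer_logits = [2.0, 0.5, -1.0]
final_layer_logits = [0.1, 1.5, 0.3]
mixed = mix_layer_distributions(
    [middle_layer_logits, final_layer_logits], mixture_weights=[0.3, 0.7]
)
```

Because each component is a valid distribution and the weights sum to 1, the mixture is again a valid distribution. A mixture of softmaxes is not restricted to the low-rank log-probability matrix that a single softmax layer produces, which is the "high-rank" motivation in Yang et al. (2018).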

• [cs.CL]Generalize Symbolic Knowledge With Neural Rule Engine
Shen Li, Hengru Xu, Zhengdong Lu
http://arxiv.org/abs/1808.10326v1

Neural-symbolic learning aims to take advantage of both neural networks and symbolic knowledge to build better intelligent systems. As neural networks have dominated the state-of-the-art results in a wide range of NLP tasks, improving the performance of neural models by integrating symbolic knowledge has attracted considerable attention. Unlike existing work, this paper investigates the combination of these two powerful paradigms from the knowledge-driven side. We propose the Neural Rule Engine (NRE), which can learn knowledge explicitly from logic rules and then generalize them implicitly with neural networks. NRE is implemented with neural module networks in which each module represents an action of the logic rule. The experiments show that NRE can greatly improve the generalization abilities of logic rules, with a significant increase in recall. Meanwhile, the precision is still maintained at a high level.

• [cs.CL]Grammar Induction with Neural Language Models: An Unusual Replication
Phu Mon Htut, Kyunghyun Cho, Samuel R. Bowman
http://arxiv.org/abs/1808.10000v1

A substantial thread of recent work on latent tree learning has attempted to develop neural network models with parse-valued latent variables and train them on non-parsing tasks, in the hope of having them discover interpretable tree structure. In a recent paper, Shen et al. (2018) introduce such a model and report near-state-of-the-art results on the target task of language modeling, and the first strong latent tree learning result on constituency parsing. In an attempt to reproduce these results, we discover issues that make the original results hard to trust, including tuning and even training on what is effectively the test set. Here, we attempt to reproduce these results in a fair experiment and to extend them to two new datasets. We find that the results of this work are robust: All variants of the model under study outperform all latent tree learning baselines, and perform competitively with symbolic grammar induction systems. We find that this model represents the first empirical success for latent tree learning, and that neural network language modeling warrants further study as a setting for grammar induction.

• [cs.CL]Hard Non-Monotonic Attention for Character-Level Transduction
Shijie Wu, Pamela Shapiro, Ryan Cotterell
http://arxiv.org/abs/1808.10024v1

Character-level string-to-string transduction is an important component of various NLP tasks. The goal is to map an input string to an output string, where the strings may be of different lengths and have characters taken from different alphabets. Recent approaches have used sequence-to-sequence models with an attention mechanism to learn which parts of the input string the model should focus on during the generation of the output string. Both soft attention and hard monotonic attention have been used, but hard non-monotonic attention has only been used in other sequence modeling tasks such as image captioning and has required a stochastic approximation to compute the gradient. In this work, we introduce an exact, polynomial-time algorithm for marginalizing over the exponential number of non-monotonic alignments between two strings, showing that hard attention models can be viewed as neural reparameterizations of the classical IBM Model 1. We compare soft and hard non-monotonic attention experimentally and find that the exact algorithm significantly improves performance over the stochastic approximation and outperforms soft attention.
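
The reason an exact polynomial-time sum is possible can be seen in a toy setting: when each output position picks its alignment independently, the sum over all alignments of a product of per-position terms factorizes into a product of per-position sums, as in IBM Model 1. The sketch below checks this numerically against brute-force enumeration. The probability tables are random placeholders and all names are illustrative. The paper's neural parameterization is not reproduced.

```python
import itertools
import math
import random

rng = random.Random(0)
SRC_LEN, TGT_LEN = 3, 4


def normalized(row):
    total = sum(row)
    return [v / total for v in row]


# align[i][j]: probability that output position i attends to input position j.
align = [normalized([rng.random() for _ in range(SRC_LEN)]) for _ in range(TGT_LEN)]
# emit[i][j]: probability of the observed output symbol at i given alignment j.
emit = [[rng.random() for _ in range(SRC_LEN)] for _ in range(TGT_LEN)]


def brute_force_likelihood(align, emit):
    """Sum over all SRC_LEN ** TGT_LEN alignments (exponential time)."""
    total = 0.0
    for alignment in itertools.product(range(SRC_LEN), repeat=TGT_LEN):
        total += math.prod(align[i][j] * emit[i][j] for i, j in enumerate(alignment))
    return total


def factored_likelihood(align, emit):
    """Same quantity in O(SRC_LEN * TGT_LEN): product of per-position sums."""
    return math.prod(
        sum(a * e for a, e in zip(align_row, emit_row))
        for align_row, emit_row in zip(align, emit)
    )


brute = brute_force_likelihood(align, emit)
factored = factored_likelihood(align, emit)
```

The two values agree up to floating-point rounding, while the factored form visits each (output, input) pair once instead of enumerating all 81 alignments.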

• [cs.CL]KDSL: a Knowledge-Driven Supervised Learning Framework for Word Sense Disambiguation
Shi Yin, Yi Zhou, Chenguang Li, Shangfei Wang, Jianmin Ji, Xiaoping Chen, Ruili Wang
http://arxiv.org/abs/1808.09888v2

We propose KDSL, a new word sense disambiguation (WSD) framework that utilizes knowledge to automatically generate sense-labeled data for supervised learning. First, from WordNet, we automatically construct a semantic knowledge base called DisDict, which provides refined feature words that highlight the differences among word senses, i.e., synsets. Second, we automatically generate new sense-labeled data by DisDict from unlabeled corpora. Third, these generated data, together with manually labeled data and unlabeled data, are fed to a neural framework conducting supervised and unsupervised learning jointly to model the semantic relations among synsets, feature words and their contexts. The experimental results show that KDSL outperforms several representative state-of-the-art methods on various major benchmarks. Interestingly, it performs relatively well even when manually labeled data is unavailable, thus providing a promising new backoff strategy for WSD.

• [cs.CL]Learning Neural Templates for Text Generation
Sam Wiseman, Stuart M. Shieber, Alexander M. Rush
http://arxiv.org/abs/1808.10122v1

While neural encoder-decoder models have had significant empirical success in text generation, there remain several unaddressed problems with this style of generation. Encoder-decoder models are largely (a) uninterpretable, and (b) difficult to control in terms of their phrasing or content. This work proposes a neural generation system using a hidden semi-Markov model (HSMM) decoder, which learns latent, discrete templates jointly with learning to generate. We show that this model learns useful templates, and that these templates make generation both more interpretable and controllable. Furthermore, we show that this approach scales to real data sets and achieves strong performance nearing that of encoder-decoder text generation models.

• [cs.CL]Learning a Policy for Opportunistic Active Learning
Aishwarya Padmakumar, Peter Stone, Raymond J. Mooney
http://arxiv.org/abs/1808.10009v1

Active learning identifies data points to label that are expected to be the most useful in improving a supervised model. Opportunistic active learning incorporates active learning into interactive tasks that constrain possible queries during interactions. Prior work has shown that opportunistic active learning can be used to improve grounding of natural language descriptions in an interactive object retrieval task. In this work, we use reinforcement learning for such an object retrieval task, to learn a policy that effectively trades off task completion with model improvement that would benefit future tasks.

• [cs.CL]Learning to Attend On Essential Terms: An Enhanced Retriever-Reader Model for Scientific Question Answering
Jianmo Ni, Chenguang Zhu, Weizhu Chen, Julian McAuley
http://arxiv.org/abs/1808.09492v2

Scientific Question Answering (SQA) is a challenging open-domain task which requires the capability to understand questions and choices, collect useful information, and reason over evidence. Previous work typically formulates this task as a reading comprehension or entailment problem given evidence retrieved from search engines. However, existing techniques struggle to retrieve indirectly related evidence when no directly related evidence is provided, especially for complex questions where it is hard to parse precisely what the question asks. In this paper we propose a retriever-reader model that learns to attend on essential terms during the question answering process. We build 1) an "essential-term-aware retriever", which first identifies the most important words in a question, then reformulates the queries and searches for related evidence, and 2) an "enhanced reader" to distinguish between essential terms and distracting words to predict the answer. We experimentally evaluate our model on the ARC dataset where it outperforms the existing state-of-the-art model by 7.4%.

• [cs.CL]Learning to adapt: a meta-learning approach for speaker adaptation
Ondřej Klejch, Joachim Fainberg, Peter Bell
http://arxiv.org/abs/1808.10239v1

The performance of automatic speech recognition systems can be improved by adapting an acoustic model to compensate for the mismatch between training and testing conditions, for example by adapting to unseen speakers. The success of speaker adaptation methods relies on selecting weights that are suitable for adaptation and using good adaptation schedules to update these weights in order not to overfit to the adaptation data. In this paper we investigate a principled way of adapting all the weights of the acoustic model using meta-learning. We show that the meta-learner can learn to perform supervised and unsupervised speaker adaptation and that it outperforms a strong baseline adapting LHUC parameters when adapting a DNN AM with 1.5M parameters. We also report initial experiments on adapting TDNN AMs, where the meta-learner achieves comparable performance with LHUC.

• [cs.CL]Modeling Empathy and Distress in Reaction to News Stories
Sven Buechel, Anneke Buffone, Barry Slaff, Lyle Ungar, João Sedoc
http://arxiv.org/abs/1808.10399v1

Computational detection and understanding of empathy is an important factor in advancing human-computer interaction. Yet to date, text-based empathy prediction has the following major limitations: It underestimates the psychological complexity of the phenomenon, adheres to a weak notion of ground truth where empathic states are ascribed by third parties, and lacks a shared corpus. In contrast, this contribution presents the first publicly available gold standard for empathy prediction. It is constructed using a novel annotation methodology which reliably captures empathy assessments by the writer of a statement using multi-item scales. This is also the first computational work distinguishing between multiple forms of empathy, empathic concern, and personal distress, as recognized throughout psychology. Finally, we present experimental results for three different predictive models, of which a CNN performs the best.

• [cs.CL]Multi-Source Syntactic Neural Machine Translation
Anna Currey, Kenneth Heafield
http://arxiv.org/abs/1808.10267v1

We introduce a novel multi-source technique for incorporating source syntax into neural machine translation using linearized parses. This is achieved by employing separate encoders for the sequential and parsed versions of the same source sentence; the resulting representations are then combined using a hierarchical attention mechanism. The proposed model improves over both seq2seq and parsed baselines by over 1 BLEU on the WMT17 English-German task. Further analysis shows that our multi-source syntactic model is able to translate successfully without any parsed input, unlike standard parsed methods. In addition, performance does not deteriorate as much on long sentences as for the baselines.

• [cs.CL]Notes on Deep Learning for NLP
Antoine J. -P. Tixier
http://arxiv.org/abs/1808.09772v2

My notes on Deep Learning for NLP.

• [cs.CL]Pronoun Translation in English-French Machine Translation: An Analysis of Error Types
Christian Hardmeier, Liane Guillou
http://arxiv.org/abs/1808.10196v1

Pronouns are a long-standing challenge in machine translation. We present a study of the performance of a range of rule-based, statistical and neural MT systems on pronoun translation based on an extensive manual evaluation using the PROTEST test suite, which enables a fine-grained analysis of different pronoun types and sheds light on the difficulties of the task. We find that the rule-based approaches in our corpus perform poorly as a result of oversimplification, whereas SMT and early NMT systems exhibit significant shortcomings due to a lack of awareness of the functional and referential properties of pronouns. A recent Transformer-based NMT system with cross-sentence context shows very promising results on non-anaphoric pronouns and intra-sentential anaphora, but there is still considerable room for improvement in examples with cross-sentence dependencies.

• [cs.CL]Retrieval-Based Neural Code Generation
Shirley Anugrah Hayati, Raphael Olivier, Pravalika Avvaru, Pengcheng Yin, Anthony Tomasic, Graham Neubig
http://arxiv.org/abs/1808.10025v1

In models to generate program source code from natural language, representing this code in a tree structure has been a common approach. However, existing methods often fail to generate complex code correctly due to a lack of ability to memorize large and complex structures. We introduce ReCode, a method based on subtree retrieval that makes it possible to explicitly reference existing code examples within a neural code generation model. First, we retrieve sentences that are similar to input sentences using a dynamic-programming-based sentence similarity scoring method. Next, we extract n-grams of action sequences that build the associated abstract syntax tree. Finally, we increase the probability of actions that cause the retrieved n-gram action subtree to be in the predicted code. We show that our approach improves the performance on two code generation tasks by up to +2.6 BLEU.

• [cs.CL]Semi-Supervised Training for Improving Data Efficiency in End-to-End Speech Synthesis
Yu-An Chung, Yuxuan Wang, Wei-Ning Hsu, Yu Zhang, RJ Skerry-Ryan
http://arxiv.org/abs/1808.10128v1

Although end-to-end text-to-speech (TTS) models such as Tacotron have shown excellent results, they typically require a sizable set of high-quality <text, audio> pairs for training, which are expensive to collect. In this paper, we propose a semi-supervised training framework to improve the data efficiency of Tacotron. The idea is to allow Tacotron to utilize textual and acoustic knowledge contained in large, publicly-available text and speech corpora. Importantly, these external data are unpaired and potentially noisy. Specifically, first we embed each word in the input text into word vectors and condition the Tacotron encoder on them. We then use an unpaired speech corpus to pre-train the Tacotron decoder in the acoustic domain. Finally, we fine-tune the model using available paired data. We demonstrate that the proposed framework enables Tacotron to generate intelligible speech using less than half an hour of paired training data.

• [cs.CL]Story Ending Generation with Incremental Encoding and Commonsense Knowledge
Jian Guan, Yansen Wang, Minlie Huang
http://arxiv.org/abs/1808.10113v1

Story ending generation is a strong indication of story comprehension. This task requires not only understanding the context clues, which play the most important role in planning the plot, but also handling implicit knowledge to make a reasonable, coherent story. In this paper, we devise a novel model for story ending generation. The model adopts an incremental encoding scheme with multi-source attention to deal with context clues spanning the story context. In addition, the model is empowered with commonsense knowledge through multi-source attention to produce reasonable story endings. Experiments show that our model can generate more reasonable story endings than state-of-the-art baselines.

• [cs.CL]Towards a Better Metric for Evaluating Question Generation Systems
Preksha Nema, Mitesh M. Khapra
http://arxiv.org/abs/1808.10192v1

There has always been criticism for using n-gram based similarity metrics, such as BLEU, NIST, etc., for evaluating the performance of NLG systems. However, these metrics continue to remain popular and are recently being used for evaluating the performance of systems which automatically generate questions from documents, knowledge graphs, images, etc. Given the rising interest in such automatic question generation (AQG) systems, it is important to objectively examine whether these metrics are suitable for this task. In particular, it is important to verify whether such metrics used for evaluating AQG systems focus on answerability of the generated question by preferring questions which contain all relevant information such as question type (Wh-types), entities, relations, etc. In this work, we show that current automatic evaluation metrics based on n-gram similarity do not always correlate well with human judgments about answerability of a question. To alleviate this problem and as a first step towards better evaluation metrics for AQG, we introduce a scoring function to capture answerability and show that when this scoring function is integrated with existing metrics, they correlate significantly better with human judgments. The scripts and data developed as a part of this work are made publicly available at https://github.com/PrekshaNema25/Answerability-Metric.

• [cs.CL]Zero-Shot Adaptive Transfer for Conversational Language Understanding
Sungjin Lee, Rahul Jha
http://arxiv.org/abs/1808.10059v1

Conversational agents such as Alexa and Google Assistant constantly need to increase their language understanding capabilities by adding new domains. A massive amount of labeled data is required for training each new domain. While domain adaptation approaches alleviate the annotation cost, prior approaches suffer from increased training time and suboptimal concept alignments. To tackle this, we introduce a novel Zero-Shot Adaptive Transfer method for slot tagging that utilizes the slot description for transferring reusable concepts across domains, and enjoys efficient training without any explicit concept alignments. Extensive experimentation over a dataset of 10 domains relevant to our commercial personal digital assistant shows that our model outperforms previous state-of-the-art systems by a large margin, and achieves an even higher improvement in the low data regime.

• [cs.CR]Backdoor Embedding in Convolutional Neural Network Models via Invisible Perturbation
Cong Liao, Haoti Zhong, Anna Squicciarini, Sencun Zhu, David Miller
http://arxiv.org/abs/1808.10307v1

Deep learning models have consistently outperformed traditional machine learning models in various classification tasks, including image classification. As such, they have become increasingly prevalent in many real world applications including those where security is of great concern. Such popularity, however, may attract attackers to exploit the vulnerabilities of the deployed deep learning models and launch attacks against security-sensitive applications. In this paper, we focus on a specific type of data poisoning attack, which we refer to as a backdoor injection attack. The main goal of the adversary performing such an attack is to generate and inject a backdoor into a deep learning model that can be triggered to recognize certain embedded patterns with a target label of the attacker's choice. Additionally, a backdoor injection attack should occur in a stealthy manner, without undermining the efficacy of the victim model. Specifically, we propose two approaches for generating a backdoor that is hardly perceptible yet effective in poisoning the model. We consider two attack settings, with backdoor injection carried out either before model training or during model updating. We carry out extensive experimental evaluations under various assumptions on the adversary model, and demonstrate that such attacks can be effective and achieve a high attack success rate (above 90%) at a small cost of model accuracy loss (below 1%) with a small injection rate (around 1%), even under the weakest assumption wherein the adversary has no knowledge either of the original training data or the classifier model.

• [cs.CR]VirtualIdentity: Privacy-Preserving User Profiling
Sisi Wang, Wing-Sea Poon, Golnoosh Farnadi, Caleb Horst, Kebra Thompson, Michael Nickels, Rafael Dowsley, Anderson C. A. Nascimento, Martine De Cock
http://arxiv.org/abs/1808.10151v1

User profiling from user generated content (UGC) is a common practice that supports the business models of many social media companies. Existing systems require that the UGC is fully exposed to the module that constructs the user profiles. In this paper we show that it is possible to build user profiles without ever accessing the user's original data, and without exposing the trained machine learning models for user profiling -- which are the intellectual property of the company -- to the users of the social media site. We present VirtualIdentity, an application that uses secure multi-party cryptographic protocols to detect the age, gender and personality traits of users by classifying their user-generated text and personal pictures with trained support vector machine models in a privacy-preserving manner.

• [cs.CV]AAD: Adaptive Anomaly Detection through traffic surveillance videos
Mohammmad Farhadi Bajestani, Seyed Soroush Heidari Rahmat Abadi, Seyed Mostafa Derakhshandeh Fard, Roozbeh Khodadadeh
http://arxiv.org/abs/1808.10044v1

Anomaly detection through video analysis is of great importance for detecting anomalous vehicle or human behavior at a traffic intersection. While most existing works use neural networks and conventional machine learning methods based on a provided dataset, we use object recognition (Faster R-CNN) to identify object labels and their corresponding locations in the video scene as the first step of anomaly detection. Then, optical flow is used to identify adaptive traffic flows in each region of the frame. In essence, we propose an alternative method for unusual activity detection using an adaptive anomaly detection framework. Compared to the baseline method described in the reference paper, our method is more efficient and yields comparable accuracy.

• [cs.CV]Artifacts Detection and Error Block Analysis from Broadcasted Videos
Md Mehedi Hasan, Tasneem Rahman, Kiok Ahn, Oksam Chae
http://arxiv.org/abs/1808.10086v1

With the advancement of IPTV and HDTV technology, previously subtle errors in videos are now becoming more prominent because of structure-oriented and compression-based artifacts. In this paper, we focus on the development of a real-time video quality check system. Lightweight edge gradient magnitude information is incorporated to acquire the statistical information, and the distorted frames are then estimated based on the characteristics of their surrounding frames. We then apply the prominent texture patterns to classify them into different block errors and analyze them not only for video error detection but also for error concealment, restoration and retrieval. Finally, experiments on prominent datasets and broadcast videos show that the proposed algorithm detects errors efficiently for video broadcast and surveillance applications in terms of computation time and analysis of distorted frames.

• [cs.CV]Automated Scene Flow Data Generation for Training and Verification
Oliver Wasenmüller, René Schuster, Didier Stricker, Karl Leiss, Jürger Pfister, Oleksandra Ganus, Julian Tatsch, Artem Savkin, Nikolas Brasch
http://arxiv.org/abs/1808.10232v1

Scene flow describes the 3D position as well as the 3D motion of each pixel in an image. Scene flow algorithms are the basis for many state-of-the-art autonomous or automated driving functions. For verification and training, large amounts of ground truth data are required, which are not available for real data. In this paper, we demonstrate a technology to create synthetic data with dense and precise scene flow ground truth.

• [cs.CV]CNN-PS: CNN-based Photometric Stereo for General Non-Convex Surfaces
Satoshi Ikehata
http://arxiv.org/abs/1808.10093v1

Most conventional photometric stereo algorithms inversely solve a BRDF-based image formation model. However, the actual imaging process is often far more complex due to the global light transport on the non-convex surfaces. This paper presents a photometric stereo network that directly learns relationships between the photometric stereo input and surface normals of a scene. To handle an unordered, arbitrary number of input images, we merge all the input data into an intermediate representation called the observation map, which has a fixed shape and can be fed into a CNN. To improve both training and prediction, we take into account the rotational pseudo-invariance of the observation map that is derived from the isotropic constraint. For training the network, we create a synthetic photometric stereo dataset that is generated by a physics-based renderer, so that the global light transport is considered. Our experimental results on both synthetic and real datasets show that our method outperforms conventional BRDF-based photometric stereo algorithms, especially when scenes are highly non-convex.

• [cs.CV]Deep Chronnectome Learning via Full Bidirectional Long Short-Term Memory Networks for MCI Diagnosis
Weizheng Yan, Han Zhang, Jing Sui, Dinggang Shen
http://arxiv.org/abs/1808.10383v1

Brain functional connectivity (FC) extracted from resting-state fMRI (RS-fMRI) has become a popular approach for disease diagnosis, where discriminating subjects with mild cognitive impairment (MCI) from normal controls (NC) is still one of the most challenging problems. Dynamic functional connectivity (dFC), consisting of time-varying spatiotemporal dynamics, may characterize "chronnectome" diagnostic information for improving MCI classification. However, most of the current dFC studies are based on detecting discrete major brain status via spatial clustering, which ignores rich spatiotemporal dynamics contained in such chronnectome. We propose Deep Chronnectome Learning for exhaustively mining the comprehensive information, especially the hidden higher-level features, i.e., the dFC time series that may add critical diagnostic power for MCI classification. To this end, we devise a new Fully-connected Bidirectional Long Short-Term Memory Network (Full-BiLSTM) to effectively learn the periodic brain status changes using both past and future information for each brief time segment and then fuse them to form the final output. We have applied our method to a rigorously built large-scale multi-site database (i.e., with 164 data from NCs and 330 from MCIs, which can be further augmented by 25 folds). Our method outperforms other state-of-the-art approaches with an accuracy of 73.6% under solid cross-validations. We also made extensive comparisons among multiple variants of LSTM models. The results suggest high feasibility of our method with promising value also for other brain disorder diagnoses.

• [cs.CV]Deep Lidar CNN to Understand the Dynamics of Moving Vehicles
Victor Vaquero, Alberto Sanfeliu, Francesc Moreno-Noguer
http://arxiv.org/abs/1808.09526v2

Perception technologies in Autonomous Driving are experiencing their golden age due to the advances in Deep Learning. Yet, most of these systems rely on the semantically rich information of RGB images. Deep Learning solutions applied to the data of other sensors typically mounted on autonomous cars (e.g. lidars or radars) are not explored much. In this paper we propose a novel solution to understand the dynamics of moving vehicles in the scene from only lidar information. The main challenge of this problem stems from the fact that we need to disambiguate the proprio-motion of the 'observer' vehicle from that of the external 'observed' vehicles. For this purpose, we devise a CNN architecture which at testing time is fed with pairs of consecutive lidar scans. However, in order to properly learn the parameters of this network, during training we introduce a series of so-called pretext tasks which also leverage image data. These tasks include semantic information about vehicleness and a novel lidar-flow feature which combines standard image-based optical flow with lidar scans. We obtain very promising results and show that including distilled image information only during training improves the inference results of the network at test time, even when image data is no longer used.

• [cs.CV]Dense Scene Flow from Stereo Disparity and Optical Flow
René Schuster, Oliver Wasenmüller, Didier Stricker
http://arxiv.org/abs/1808.10146v1

Scene flow describes 3D motion in a 3D scene. It can either be modeled as a single task, or it can be reconstructed from the auxiliary tasks of stereo depth and optical flow estimation. While the second method can achieve real-time performance by using real-time auxiliary methods, it will typically produce non-dense results. In this representation of a basic combination approach for scene flow estimation, we will tackle the problem of non-density by interpolation.
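
As a rough illustration of the combination approach mentioned in the abstract, the sketch below reconstructs the 3D motion of one pixel from two disparity values and one optical flow vector. It assumes a rectified stereo pair with a pinhole camera; the function names and the camera parameters (`f`, `baseline`, `cx`, `cy`) are hypothetical and not taken from the paper, and the interpolation step that the authors use to fill non-dense results is not shown.

```python
def backproject(u, v, disparity, f, baseline, cx, cy):
    """Lift pixel (u, v) with a stereo disparity to a 3D point (X, Y, Z).

    Assumes a rectified stereo rig: depth Z = f * baseline / disparity.
    """
    if disparity <= 0:
        raise ValueError("disparity must be positive")
    z = f * baseline / disparity
    x = (u - cx) * z / f
    y = (v - cy) * z / f
    return (x, y, z)


def scene_flow(u, v, d0, flow, d1, f, baseline, cx, cy):
    """3D motion of the point seen at pixel (u, v) in the first frame.

    d0   : disparity at (u, v) in the first frame
    flow : optical flow (du, dv) that moves the pixel into the second frame
    d1   : disparity at the flow target in the second frame
    """
    du, dv = flow
    p0 = backproject(u, v, d0, f, baseline, cx, cy)
    p1 = backproject(u + du, v + dv, d1, f, baseline, cx, cy)
    return tuple(b - a for a, b in zip(p0, p1))


# Hypothetical camera: 1000 px focal length, 0.5 m baseline.
motion = scene_flow(600, 300, 50.0, (10.0, 0.0), 50.0,
                    f=1000.0, baseline=0.5, cx=500.0, cy=300.0)
```

A pixel gets a scene flow vector only when both disparities and the flow are valid at that location, which is one way to see why such a combination is typically non-dense.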

• [cs.CV]Interpretable Intuitive Physics Model
Tian Ye, Xiaolong Wang, James Davidson, Abhinav Gupta
http://arxiv.org/abs/1808.10002v1

Humans have a remarkable ability to use physical commonsense and predict the effect of collisions. But do they understand the underlying factors? Can they predict if the underlying factors have changed? Interestingly, in most cases humans can predict the effects of similar collisions with different conditions such as changes in mass, friction, etc. It is postulated this is primarily because we learn to model physics with meaningful latent variables. This does not imply we can estimate the precise values of these meaningful variables (estimate exact values of mass or friction). Inspired by this observation, we propose an interpretable intuitive physics model where specific dimensions in the bottleneck layers correspond to different physical properties. In order to demonstrate that our system models these underlying physical properties, we train our model on collisions of different shapes (cube, cone, cylinder, spheres etc.) and test on collisions of unseen combinations of shapes. Furthermore, we demonstrate our model generalizes well even when similar scenes are simulated with different underlying properties.

• [cs.CV]PPF-FoldNet: Unsupervised Learning of Rotation Invariant 3D Local Descriptors
Haowen Deng, Tolga Birdal, Slobodan Ilic
http://arxiv.org/abs/1808.10322v1

We present PPF-FoldNet for unsupervised learning of 3D local descriptors on pure point cloud geometry. Based on the folding-based auto-encoding of well known point pair features, PPF-FoldNet offers many desirable properties: it necessitates neither supervision, nor a sensitive local reference frame, benefits from point-set sparsity, is end-to-end, fast, and can extract powerful rotation invariant descriptors. Thanks to a novel feature visualization, its evolution can be monitored to provide interpretable insights. Our extensive experiments demonstrate that despite having six degree-of-freedom invariance and lack of training labels, our network achieves state of the art results in standard benchmark datasets and outperforms its competitors when rotations and varying point densities are present. PPF-FoldNet achieves 9% higher recall on standard benchmarks, 23% higher recall when rotations are introduced into the same datasets and finally, a margin of >35% is attained when point density is significantly decreased.

• [cs.CV]Super-Resolution for Hyperspectral and Multispectral Image Fusion Accounting for Seasonal Spectral Variability
Ricardo Augusto Borsoi, Tales Imbiriba, José Carlos Moreira Bermudez
http://arxiv.org/abs/1808.10072v1

Image fusion combines data from different heterogeneous sources to obtain more precise information about an underlying scene. Hyperspectral-multispectral (HS-MS) image fusion is currently attracting great interest in remote sensing since it allows the generation of high spatial resolution HS images, circumventing the main limitation of this imaging modality. Existing HS-MS fusion algorithms, however, neglect the spectral variability often existing between images acquired at different time instants. This time difference causes variations in spectral signatures of the underlying constituent materials due to different acquisition and seasonal conditions. This paper introduces a novel HS-MS image fusion strategy that combines an unmixing-based formulation with an explicit parametric model for typical spectral variability between the two images. Simulations with synthetic and real data show that the proposed strategy leads to a significant performance improvement under spectral variability and state-of-the-art performance otherwise.

• [cs.CV]The Impact of Preprocessing on Deep Representations for Iris Recognition on Unconstrained Environments
Luiz A. Zanlorensi, Eduardo Luz, Rayson Laroca, Alceu S. Britto Jr., Luiz S. Oliveira, David Menotti
http://arxiv.org/abs/1808.10032v1

The iris is widely used as a biometric trait because of its high level of distinction and uniqueness. Nowadays, one of the major research challenges lies in the recognition of iris images obtained in the visible spectrum under unconstrained environments. In this scenario, the acquired iris images are affected by capture distance, rotation, blur, motion blur, low contrast and specular reflection, creating noise that disturbs iris recognition systems. Besides delineating the iris region, preprocessing techniques such as normalization and segmentation of noisy iris images are usually employed to minimize these problems. But these techniques inevitably introduce some errors. In this context, we propose the use of deep representations, more specifically architectures based on VGG and ResNet-50 networks, for dealing with the images with and without iris segmentation and normalization. We use transfer learning from the face domain and also propose a specific data augmentation technique for iris images. Our results show that the approach using non-normalized and only circle-delimited iris images reaches a new state of the art in the official protocol of the NICE.II competition, a subset of the UBIRIS database, one of the most challenging databases on unconstrained environments, reporting an average Equal Error Rate (EER) of 13.98% which represents an absolute reduction of about 5%.
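
For readers unfamiliar with the Equal Error Rate reported above, here is a minimal sketch of how an EER can be approximated from matcher scores. It is a generic threshold sweep, not the NICE.II evaluation protocol; the function name and the example scores are made up for illustration, and higher scores are assumed to mean a more likely match.

```python
def equal_error_rate(genuine, impostor):
    """Approximate the EER by sweeping over every observed score as a threshold.

    genuine  : similarity scores for same-identity comparisons
    impostor : similarity scores for different-identity comparisons
    Returns (eer, threshold) at the threshold where the false accept rate
    and the false reject rate are closest to each other.
    """
    if not genuine or not impostor:
        raise ValueError("both score lists must be non-empty")
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)  # false accepts
        frr = sum(s < t for s in genuine) / len(genuine)     # false rejects
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0, t)
    return best[1], best[2]


# Made-up scores, only to show the call.
eer, threshold = equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1])
```

With finite score sets the two error rates rarely cross exactly, so the sketch reports their average at the closest threshold; interpolating between neighbouring thresholds is a common refinement.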

• [cs.CV]Towards Effective Deep Embedding for Zero-Shot Learning
Lei Zhang, Peng Wang, Lingqiao Liu, Chunhua Shen, Wei Wei, Yannning Zhang, Anton Van Den Hengel
http://arxiv.org/abs/1808.10075v1

Zero-shot learning (ZSL) attempts to recognize visual samples of unseen classes by virtue of the semantic descriptions of those classes. We posit that the key to ZSL is to exploit an effective embedding space where 1) visual samples can be tightly centred around the semantic descriptions of classes that they belong to; 2) visual samples of different classes are separated from each other with a large enough margin. Towards this goal, we present a simple but surprisingly effective deep embedding model. In our model, we separately embed visual samples and semantic descriptions into a latent intermediate space such that visual samples not only coincide with associated semantic descriptions, but also can be correctly discriminated by a trainable linear classifier. By doing this, visual samples can be tightly centred around associated semantic descriptions and more importantly, they can be separated from other semantic descriptions with a large margin, thus leading to a new state-of-the-art for ZSL. Furthermore, due to lacking training samples, the generalization capacity of the learned embedding space to unseen classes can be further improved. To this end, we propose to upgrade our model with a refining strategy which progressively calibrates the embedding space based upon some test samples chosen from unseen classes with high-confidence pseudo labels, and ultimately improves the generalization capacity greatly. Experimental results on five benchmarks demonstrate the great advantage of our model over current state-of-the-art competitors. For example, on AwA1 dataset, our model improves the recognition accuracy on unseen classes by 16.9% in conventional ZSL setting and even by 38.6% in the generalized ZSL setting.

• [cs.CV]iCAN: Instance-Centric Attention Network for Human-Object Interaction Detection
Chen Gao, Yuliang Zou, Jia-Bin Huang
http://arxiv.org/abs/1808.10437v1

Recent years have witnessed rapid progress in detecting and recognizing individual object instances. To understand the situation in a scene, however, computers need to recognize how humans interact with surrounding objects. In this paper, we tackle the challenging task of detecting human-object interactions (HOI). Our core idea is that the appearance of a person or an object instance contains informative cues on which relevant parts of an image to attend to for facilitating interaction prediction. To exploit these cues, we propose an instance-centric attention module that learns to dynamically highlight regions in an image conditioned on the appearance of each instance. Such an attention-based network allows us to selectively aggregate features relevant for recognizing HOIs. We validate the efficacy of the proposed network on the Verb in COCO and HICO-DET datasets and show that our approach compares favorably with the state of the art.

• [cs.DC]A study of integer sorting on multicores
Alexandros V. Gerbessiotis
http://arxiv.org/abs/1808.10292v1

Integer sorting on multicores and GPUs can be realized by a variety of approaches that include variants of distribution-based methods such as radix-sort, comparison-oriented algorithms such as deterministic regular sampling and random sampling parallel sorting, and network-based algorithms such as Batcher's bitonic sorting algorithm. In this work we present an experimental study of integer sorting on multicore processors. We have implemented serial and parallel radix-sort for various radixes, deterministic regular oversampling and random oversampling parallel sorting, and also some previously little explored or unexplored variants of bitonic-sort and odd-even transposition sort. The study uses multithreading and multiprocessing parallel programming libraries with the C language implementations working under Open MPI, MulticoreBSP, and BSPlib utilizing the same source code. A secondary objective is to attempt to model the performance of these algorithm implementations under the MBSP (Multi-memory BSP) model. We first provide some general high-level observations on the performance of these implementations. If we can conclude anything, it is that accurate prediction of performance by taking into consideration architecture dependent features such as the structure and characteristics of multiple memory hierarchies is difficult and more often than not untenable. To some degree this is affected by the overhead imposed by the high-level library used in the programming effort. We can still, however, draw some reliable conclusions and reason about the performance of these implementations using the MBSP model, thus making MBSP useful and usable.
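
To make the "radix-sort for various radixes" idea concrete, here is a minimal serial sketch of least-significant-digit radix sort in which the radix is a parameter. It is written in Python for brevity and is not the paper's C implementation; the function name and the `radix_bits` parameter are illustrative assumptions, and only non-negative integers are handled.

```python
def radix_sort(values, radix_bits=8):
    """Serial LSD radix sort for non-negative integers.

    radix_bits selects the radix (2 ** radix_bits buckets per pass);
    a larger radix means fewer passes but a larger counting array.
    """
    out = list(values)
    if not out:
        return out
    if min(out) < 0:
        raise ValueError("only non-negative integers are supported")
    radix = 1 << radix_bits
    mask = radix - 1
    max_value = max(out)
    shift = 0
    while (max_value >> shift) > 0:
        # Count how many keys fall into each digit bucket.
        counts = [0] * radix
        for v in out:
            counts[(v >> shift) & mask] += 1
        # Exclusive prefix sum turns counts into bucket start offsets.
        total = 0
        for d in range(radix):
            counts[d], total = total, total + counts[d]
        # Stable scatter into the output array for this pass.
        tmp = [0] * len(out)
        for v in out:
            d = (v >> shift) & mask
            tmp[counts[d]] = v
            counts[d] += 1
        out = tmp
        shift += radix_bits
    return out


data = [170, 45, 75, 90, 802, 24, 2, 66]
sorted_small_radix = radix_sort(data, radix_bits=4)
sorted_large_radix = radix_sort(data, radix_bits=16)
```

The count, prefix-sum and scatter phases are the parts a parallel variant would typically split across cores, which is where memory-hierarchy effects of the kind discussed in the abstract come into play.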

• [cs.DC]Self-stabilizing Overlays for high-dimensional Monotonic Searchability
Michael Feldmann, Christina Kolb, Christian Scheideler
http://arxiv.org/abs/1808.10300v1

We extend the concept of monotonic searchability for self-stabilizing systems from one to multiple dimensions. A system is self-stabilizing if it can recover to a legitimate state from any initial illegal state. These kinds of systems are most often used in distributed applications. Monotonic searchability provides guarantees when searching for nodes while the recovery process is going on. More precisely, if a search request started at some node $u$ succeeds in reaching its destination $v$, then all future search requests from $u$ to $v$ succeed as well. Although there already exists a self-stabilizing protocol for a two-dimensional topology and a universal approach for monotonic searchability, it is not clear how both of these concepts fit together effectively. The latter concept even comes with some restrictive assumptions on messages, which is not the case for our protocol. We propose a simple novel protocol for a self-stabilizing two-dimensional quadtree that satisfies monotonic searchability. Our protocol can easily be extended to higher dimensions and offers routing in $\mathcal O(\log n)$ hops for any search request.

• [cs.GR]Differential and integral invariants under Mobius transformation
He Zhang, Hanlin Mo, You Hao, Qi Li, Hua Li
http://arxiv.org/abs/1808.10083v1

One of the most challenging problems in the domain of 2-D images or 3-D shapes is handling non-rigid deformation. From the perspective of transformation groups, the conformal transformation is a key part of the diffeomorphism. According to the Liouville Theorem, an important part of the conformal transformation is the Mobius transformation, so we focus on the Mobius transformation and propose two differential expressions that are invariant under 2-D and 3-D Mobius transformations respectively. Next, we analyze the absoluteness and relativity of invariance on them and their components. After that, we propose integral invariants under Mobius transformation based on the two differential expressions. Finally, we propose a conjecture about the structure of differential invariants under conformal transformation according to our observation on the composition of the above two differential invariants.

• [cs.IR]Analyze Unstructured Data Patterns for Conceptual Representation
Aboubakr Aqle, Dena Al-Thani, Ali Jaoua
http://arxiv.org/abs/1808.10259v1

Online news media provides aggregated news and stories from different sources all over the world, along with up-to-date news coverage. The main goal of this study is a solution that serves as a homogeneous source for the news and represents the news in a new conceptual framework. Furthermore, the user can quickly find different updated news through the designed interface. The mobile app implementation is based on modeling the multi-level conceptual analysis discipline. The main concepts of any domain are discovered from the hidden unstructured data analyzed by the proposed solution. Concepts are discovered by analyzing data patterns and are structured into a tree-based interface that lets the end user navigate easily through the discovered news concepts. Our final experimental results show that analyzing the news before displaying it to the end user, and restructuring the final output in a conceptual multi-level structure, produces a new display frame in which the end user can find information related to their interests.

• [cs.IR]Centroid estimation based on symmetric KL divergence for Multinomial text classification problem
Jiangning Chen, Heinrich Matzinger, Haoyan Zhai, Mi Zhou
http://arxiv.org/abs/1808.10261v1

We define a new method to estimate centroid for text classification based on the symmetric KL-divergence between the distribution of words in training documents and their class centroids. Experiments on several standard data sets indicate that the new method achieves substantial improvements over the traditional classifiers.
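As a rough illustration of the distance involved, the sketch below computes a symmetric KL divergence between word distributions and assigns a document to the closest class centroid. It is not the centroid estimator proposed in the paper: the definition KL(p||q) + KL(q||p), the additive smoothing constant, and the names `normalize`, `symmetric_kl` and `nearest_centroid` are all assumptions made for this example.

```python
import math


def normalize(counts, smoothing=1e-9):
    """Turn raw word counts into a probability vector.

    The small additive smoothing (an assumption of this sketch) keeps every
    entry positive so that the logarithms below are defined.
    """
    total = sum(counts) + smoothing * len(counts)
    return [(c + smoothing) / total for c in counts]


def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def symmetric_kl(p, q):
    """One common symmetrization: KL(p || q) + KL(q || p)."""
    return kl(p, q) + kl(q, p)


def nearest_centroid(doc_counts, centroids):
    """Return the label whose centroid is closest in symmetric KL."""
    p = normalize(doc_counts)
    return min(
        centroids,
        key=lambda label: symmetric_kl(p, normalize(centroids[label])),
    )


# Hypothetical three-word vocabulary and two class centroids.
centroids = {"sports": [50, 10, 1], "politics": [1, 10, 50]}
print(nearest_centroid([5, 1, 0], centroids))
```

Here the centroids are simply given as count vectors; how they should be estimated from training documents is exactly what the paper addresses.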

• [cs.IR]Large-Scale Cover Song Detection in Digital Music Libraries Using Metadata, Lyrics and Audio Features
Albin Andrew Correya, Romain Hennequin, Mickaël Arcos
http://arxiv.org/abs/1808.10351v1

Cover song detection is a very relevant task in Music Information Retrieval (MIR) studies and has been mainly addressed using audio-based systems. Despite its potential impact in industrial contexts, low performance and lack of scalability have prevented such systems from being adopted in practice for large applications. In this work, we investigate whether textual music information (such as metadata and lyrics) can be used along with audio for the large-scale cover identification problem in a wide digital music library. We benchmark this problem using standard text and state-of-the-art audio similarity measures. Our studies show that these methods can significantly increase the accuracy and scalability of cover detection systems on the Million Song Dataset (MSD) and Second Hand Song (SHS) datasets. By only leveraging standard tf-idf based text similarity measures on song titles and lyrics, we achieved an absolute increase of 35.5% in mean average precision compared to the current scalable audio content-based state-of-the-art methods on MSD. These experimental results suggest that researchers can be encouraged to identify and leverage more sophisticated NLP-based techniques to improve current cover song identification systems in digital music libraries with metadata.
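A tf-idf based title similarity of the kind mentioned above might look roughly like the following. This is a minimal sketch, not the authors' pipeline: the whitespace tokenizer, the smoothed idf formula, and the sample titles are assumptions chosen for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse tf-idf vectors (dicts) for a list of strings.

    Tokenization is plain lowercase whitespace splitting, and the idf uses a
    smoothed form log((1 + n) / (1 + df)) + 1; both are assumptions.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append(
            {t: tf[t] * (math.log((1 + n) / (1 + df[t])) + 1.0) for t in tf}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical titles: a track, a candidate cover, and an unrelated track.
titles = ["Yesterday", "Yesterday (live cover)", "Bohemian Rhapsody"]
vecs = tfidf_vectors(titles)
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

Titles that share tokens score above zero, while titles with no tokens in common score exactly zero, which is what makes such a measure cheap to use as a first-pass filter over a large library.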

• [cs.IR]Recommendation Through Mixtures of Heterogeneous Item Relationships
Wang-Cheng Kang, Mengting Wan, Julian McAuley
http://arxiv.org/abs/1808.10031v1

Recommender Systems have proliferated as general-purpose approaches to model a wide variety of consumer interaction data. Specific instances make use of signals ranging from user feedback, item relationships, geographic locality, social influence (etc.). Typically, research proceeds by showing that making use of a specific signal (within a carefully designed model) allows for higher-fidelity recommendations on a particular dataset. Of course, the real situation is more nuanced, in which a combination of many signals may be at play, or favored in different proportion by individual users. Here we seek to develop a framework that is capable of combining such heterogeneous item relationships by simultaneously modeling (a) what modality of recommendation is a user likely to be susceptible to at a particular point in time; and (b) what is the best recommendation from each modality. Our method borrows ideas from mixtures-of-experts approaches as well as knowledge graph embeddings. We find that our approach naturally yields more accurate recommendations than alternatives, while also providing intuitive `explanations' behind the recommendations it provides.

• [cs.IR]Understanding Latent Factors Using a GWAP
Johannes Kunkel, Benedikt Loepp, Jürgen Ziegler
http://arxiv.org/abs/1808.10260v1

Recommender systems relying on latent factor models often appear as black boxes to their users. Semantic descriptions for the factors might help to mitigate this problem. Achieving this automatically is, however, a non-straightforward task due to the models' statistical nature. We present an output-agreement game that represents factors by means of sample items and motivates players to create such descriptions. A user study shows that the collected output actually reflects real-world characteristics of the factors.

• [cs.IT]A Radix-M Construction for Complementary Sets
Srdjan Z. Budisin
http://arxiv.org/abs/1808.10400v1

We extend the paraunitary (PU) theory for complementary pairs to complementary sets and complete complementary codes (CCC) by proposing a new PU construction. A special, but very important case of complementary sets (and CCC), based on standard delays, is analyzed in detail and a new 'Radix-M generator' (RM-G) is presented. The RM-G can be viewed as a generalization of the Boolean generator for complementary pairs. An efficient correlator for standard complementary sets and CCC is also presented. Finally, examples of polyphase, QAM and hexagonal PU sets of three sequences are given.

• [cs.IT]Analysis of Frequency Agile Radar via Compressed Sensing
Tianyao Huang, Yimin Liu, Xingyu Xu, Yonina C. Eldar, Xiqin Wang
http://arxiv.org/abs/1808.09124v2

Frequency agile radar (FAR) is known to have excellent electronic counter-countermeasures (ECCM) performance and the potential to realize spectrum sharing in dense electromagnetic environments. Many compressed sensing (CS) based algorithms have been developed for joint range and Doppler estimation in FAR. This paper considers theoretical analysis of FAR via CS algorithms. In particular, we analyze the properties of the sensing matrix, which is a highly structured random matrix. We then derive bounds on the number of recoverable targets. Numerical simulations and field experiments validate the theoretical findings and demonstrate the effectiveness of CS approaches to FAR.

• [cs.IT]Asymptotically Optimal Codes Correcting Fixed-Length Duplication Errors in DNA Storage Systems
Mladen Kovačević, Vincent Y. F. Tan
http://arxiv.org/abs/1808.10328v1

A (tandem) duplication of length $ k $ is an insertion of an exact copy of a substring of length $ k $ next to its original position. This and related types of impairments are of relevance in modeling communication in the presence of synchronization errors, as well as in several information storage applications. We demonstrate that Levenshtein's construction of binary codes correcting insertions of zeros is, with minor modifications, applicable also to channels with arbitrary alphabets and with duplication errors of arbitrary (but fixed) length $ k $. Furthermore, we derive bounds on the cardinality of optimal $ q $-ary codes correcting up to $ t $ duplications of length $ k $, and establish the following corollaries in the asymptotic regime of growing block-length: 1.) the presented family of codes is optimal for every $ q, t, k $, in the sense of the asymptotic scaling of code redundancy; 2.) the upper bound, when specialized to $ q = 2 $, $ k = 1 $, improves upon Levenshtein's bound for every $ t \geq 3 $; 3.) the bounds coincide for $ t = 1 $, thus yielding the exact asymptotic behavior of the size of optimal single-duplication-correcting codes.
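To make the error model concrete, the sketch below applies a single tandem duplication of length k to a string, following the definition at the start of the abstract. The function name `tandem_duplicate` and the 0-based start index are assumptions of this example; the code construction itself is not shown.

```python
def tandem_duplicate(s, i, k):
    """Insert a copy of the substring s[i:i+k] directly after the original.

    i is a 0-based start index; the result is always k symbols longer than s.
    """
    if k <= 0 or i < 0 or i + k > len(s):
        raise ValueError("substring s[i:i+k] must lie inside s and k must be positive")
    return s[: i + k] + s[i : i + k] + s[i + k :]


print(tandem_duplicate("ACGT", 1, 2))  # ACGCGT: the block "CG" is repeated
print(tandem_duplicate("0110", 0, 1))  # 00110: the first symbol is repeated
```

With k = 1 over a binary alphabet, a duplication just lengthens a run of identical symbols, which is the setting the abstract compares against Levenshtein's bound.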

• [cs.IT]Capacity of Locally Recoverable Codes
Arya Mazumdar
http://arxiv.org/abs/1808.10262v1

Motivated by applications in distributed storage, the notion of a locally recoverable code (LRC) was introduced a few years back. In an LRC, any coordinate of a codeword is recoverable by accessing only a small number of other coordinates. While different properties of LRCs have been well-studied, their performance on channels with random erasures or errors has been mostly unexplored. In this note, we analyze the performance of LRCs over such stochastic channels. In particular, for input-symmetric discrete memoryless channels, we give a tight characterization of the gap to Shannon capacity when LRCs are used over the channel.

• [cs.IT]Decentralized Detection with Robust Information Privacy Protection
Meng Sun, Wee Peng Tay
http://arxiv.org/abs/1808.10082v1

We consider a decentralized detection network whose aim is to infer a public hypothesis of interest. However, the raw sensor observations also allow the fusion center to infer private hypotheses that we wish to protect. We consider the case where there are an uncountable number of private hypotheses belonging to an uncertainty set, and develop local privacy mappings at every sensor so that the sanitized sensor information minimizes the Bayes error of detecting the public hypothesis at the fusion center, while achieving information privacy for all private hypotheses. We introduce the concept of a most favorable hypothesis (MFH) and show how to find an MFH in the set of private hypotheses. By protecting the information privacy of the MFH, information privacy for every other private hypothesis is also achieved. We provide an iterative algorithm to find the optimal local privacy mappings, and derive some theoretical properties of these privacy mappings. Simulation results demonstrate that our proposed approach allows the fusion center to infer the public hypothesis with low error while protecting the information privacy of all the private hypotheses.

• [cs.IT]Space-Time Block Coding Based Beamforming for Beam Squint Compensation
Ximei Liu, Deli Qiao
http://arxiv.org/abs/1808.10117v1

In this paper, the beam squint problem, which causes significant variations in radiated beam gain over frequencies in millimeter wave communication systems, is investigated. A constant modulus beamformer design, which is formulated to maximize the expected average beam gain within the bandwidth with limited variation over frequencies within the bandwidth, is proposed. A semidefinite relaxation (SDR) method is developed to solve the optimization problem under the constant modulus constraints. Depending on the eigenvalues of the optimal solution, either direct beamforming or transmit diversity based beamforming is employed for data transmissions. Numerical results show that the proposed transmission scheme can compensate for beam squint effectively and improve system throughput. Overall, a transmission scheme for beam squint compensation in wideband wireless communication systems is provided.

• [cs.LG]A Coordinate-Free Construction of Scalable Natural Gradient
Kevin Luk, Roger Grosse
http://arxiv.org/abs/1808.10340v1

Most neural networks are trained using first-order optimization methods, which are sensitive to the parameterization of the model. Natural gradient descent is invariant to smooth reparameterizations because it is defined in a coordinate-free way, but tractable approximations are typically defined in terms of coordinate systems, and hence may lose the invariance properties. We analyze the invariance properties of the Kronecker-Factored Approximate Curvature (K-FAC) algorithm by constructing the algorithm in a coordinate-free way. We explicitly construct a Riemannian metric under which the natural gradient matches the K-FAC update; invariance to affine transformations of the activations follows immediately. We extend our framework to analyze the invariance properties of K-FAC applied to convolutional networks and recurrent neural networks, as well as metrics other than the usual Fisher metric.

• [cs.LG]A Unified Analysis of Stochastic Momentum Methods for Deep Learning
Yan Yan, Tianbao Yang, Zhe Li, Qihang Lin, Yi Yang
http://arxiv.org/abs/1808.10396v1

Stochastic momentum methods have been widely adopted in training deep neural networks. However, their theoretical analysis of convergence of the training objective and the generalization error for prediction is still under-explored. This paper aims to bridge the gap between practice and theory by analyzing the stochastic gradient (SG) method, and the stochastic momentum methods including two famous variants, i.e., the stochastic heavy-ball (SHB) method and the stochastic variant of Nesterov's accelerated gradient (SNAG) method. We propose a framework that unifies the three variants. We then derive the convergence rates of the norm of gradient for the non-convex optimization problem, and analyze the generalization performance through the uniform stability approach. Particularly, the convergence analysis of the training objective exhibits that SHB and SNAG have no advantage over SG. However, the stability analysis shows that the momentum term can improve the stability of the learned model and hence improve the generalization performance. These theoretical insights verify the common wisdom and are also corroborated by our empirical analysis on deep learning.
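For reference, the three update rules named above can be written in a few lines. The sketch below uses the textbook forms with a deterministic gradient on a one-dimensional quadratic; it is not the unified framework of the paper, and the function names, step size and momentum values are assumptions for illustration. In the stochastic setting, `grad` would return a noisy gradient estimate instead.

```python
def sgd(grad, x0, lr, steps):
    """Plain gradient steps: x <- x - lr * grad(x)."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x


def heavy_ball(grad, x0, lr, beta, steps):
    """Heavy-ball momentum: x_{t+1} = x_t - lr * g_t + beta * (x_t - x_{t-1})."""
    x_prev, x = x0, x0
    for _ in range(steps):
        x_next = x - lr * grad(x) + beta * (x - x_prev)
        x_prev, x = x, x_next
    return x


def nesterov(grad, x0, lr, beta, steps):
    """Nesterov's method: y_{t+1} = x_t - lr * g_t; x_{t+1} = y_{t+1} + beta * (y_{t+1} - y_t)."""
    x, y_prev = x0, x0
    for _ in range(steps):
        y = x - lr * grad(x)
        x = y + beta * (y - y_prev)
        y_prev = y
    return x


def grad(x):
    # Gradient of f(x) = 0.5 * x**2, whose minimizer is x = 0.
    return x


print(sgd(grad, 1.0, 0.1, 200))
print(heavy_ball(grad, 1.0, 0.1, 0.5, 200))
print(nesterov(grad, 1.0, 0.1, 0.5, 200))
```

Setting `beta = 0` in either momentum variant recovers the plain gradient step, which is one simple way to see why the three methods can be treated within a single parameterized family.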

• [cs.LG]DP-ADMM: ADMM-based Distributed Learning with Differential Privacy
Zonghao Huang, Rui Hu, Yanmin Gong, Eric Chan-Tin
http://arxiv.org/abs/1808.10101v1

Distributed machine learning is making great changes in a wide variety of domains but also brings privacy risk from the information exchanged during the learning process. This paper focuses on a class of regularized empirical risk minimization problems, and develops a privacy-preserving distributed learning algorithm. We use the Alternating Direction Method of Multipliers (ADMM) to decentralize the learning algorithm, and apply Gaussian mechanisms locally to guarantee differential privacy. However, simply combining ADMM and local randomization mechanisms would result in a non-convergent algorithm with poor performance, especially when the introduced noise is large to guarantee a low total privacy loss. Besides, this approach cannot be applied to learning problems with non-smooth objective functions. To address these concerns, we propose an improved ADMM-based differentially private distributed learning algorithm: DP-ADMM, where an approximate augmented Lagrangian function and Gaussian mechanisms with time-varying variance are utilized. We also apply the moment accountant method to bound the total privacy loss. Our theoretical analysis proves that DP-ADMM can be applied to a general class of convex learning problems, provides a differential privacy guarantee, and achieves an $O(1/\sqrt{t})$ rate of convergence, where $t$ is the number of iterations. Our evaluations demonstrate that our approach can achieve good accuracy and effectiveness even with a low total privacy leakage.

• [cs.LG]Gaussian Mixture Generative Adversarial Networks for Diverse Datasets, and the Unsupervised Clustering of Images
Matan Ben-Yosef, Daphna Weinshall
http://arxiv.org/abs/1808.10356v1

Generative Adversarial Networks (GANs) have been shown to produce realistic-looking synthetic images with remarkable success, yet their performance seems less impressive when the training set is highly diverse. In order to provide a better fit to the target data distribution when the dataset includes many different classes, we propose a variant of the basic GAN model, called Gaussian Mixture GAN (GM-GAN), where the probability distribution over the latent space is a mixture of Gaussians. We also propose a supervised variant which is capable of conditional sample synthesis. In order to evaluate the model's performance, we propose a new scoring method which separately takes into account two (typically conflicting) measures - diversity vs. quality of the generated data. Through a series of empirical experiments, using both synthetic and real-world datasets, we quantitatively show that GM-GANs outperform baselines, both when evaluated using the commonly used Inception Score, and when evaluated using our own alternative scoring method. In addition, we qualitatively demonstrate how the unsupervised variant of GM-GAN tends to map latent vectors sampled from different Gaussians in the latent space to samples of different classes in the data space. We show how this phenomenon can be exploited for the task of unsupervised clustering, and provide quantitative evaluation showing the superiority of our method for the unsupervised clustering of image datasets. Finally, we demonstrate a feature which further sets our model apart from other GAN models: the option to control the quality-diversity trade-off by altering, post-training, the probability distribution of the latent space. This allows one to sample higher quality and lower diversity samples, or vice versa, according to one's needs.
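The latent-space idea can be sketched independently of any GAN code: instead of drawing a latent vector from one standard Gaussian, first pick a mixture component and then sample around its mean. The sketch below assumes uniform mixing weights, isotropic components with a shared standard deviation `sigma`, and hand-picked means; none of these choices is taken from the paper, and the generator network is omitted.

```python
import random


def sample_latent(means, sigma, rng):
    """Draw one latent vector from a mixture of isotropic Gaussians.

    Returns (component_index, latent_vector). Components are chosen
    uniformly at random, which is an assumption of this sketch.
    """
    k = rng.randrange(len(means))
    z = [rng.gauss(m, sigma) for m in means[k]]
    return k, z


rng = random.Random(0)
means = [[0.0, 0.0], [5.0, 5.0], [-5.0, 5.0]]  # hypothetical component means

# A smaller sigma concentrates samples near the component means; a larger
# sigma spreads them out. This mirrors the post-training control over the
# latent distribution described in the abstract.
for sigma in (0.1, 1.0):
    k, z = sample_latent(means, sigma, rng)
    print(sigma, k, z)
```

The component index `k` is also what an unsupervised clustering readout would use: samples generated from the same component are treated as belonging to the same cluster.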

• [cs.LG]Group calibration is a byproduct of unconstrained learning
Lydia T. Liu, Max Simchowitz, Moritz Hardt
http://arxiv.org/abs/1808.10013v1

Much recent work on fairness in machine learning has focused on how well a score function is calibrated in different groups within a given population, where each group is defined by restricting one or more sensitive attributes. We investigate to what extent group calibration follows from unconstrained empirical risk minimization on its own, without the need for any explicit intervention. We show that under reasonable conditions, the deviation from satisfying group calibration is bounded by the excess loss of the empirical risk minimizer relative to the Bayes optimal score function. As a corollary, it follows that empirical risk minimization can simultaneously achieve calibration for many groups, a task that prior work deferred to highly complex algorithms. We complement our results with a lower bound, and a range of experimental findings. Our results challenge the view that group calibration necessitates an active intervention, suggesting that often we ought to think of it as a byproduct of unconstrained machine learning.

• [cs.LG]IEA: Inner Ensemble Average within a convolutional neural network
Abduallah A. Mohamed, Christian Claudel
http://arxiv.org/abs/1808.10350v1

Ensemble learning is a method of combining multiple trained models to improve model accuracy. We introduce the use of such methods, specifically ensemble averaging, inside Convolutional Neural Network (CNN) architectures. Replacing the single convolutional neural layers (CNLs) inside the CNN architecture with an Inner Ensemble Average (IEA) of multiple CNLs increases the accuracy of the CNN. A visual and a similarity score analysis of the features generated from IEA explains why it boosts the model performance. Empirical results using different benchmarking datasets and well-known deep model architectures show that IEA outperforms the ordinary CNL used in CNNs.

• [cs.LG]Learning End-to-end Autonomous Driving using Guided Auxiliary Supervision
Ashish Mehta, Adithya Subramanian, Anbumani Subramanian
http://arxiv.org/abs/1808.10393v1

Learning to drive faithfully in highly stochastic urban settings remains an open problem. To that end, we propose a Multi-task Learning from Demonstration (MT-LfD) framework which uses supervised auxiliary task prediction to guide the main task of predicting the driving commands. Our framework involves an end-to-end trainable network for imitating the expert demonstrator's driving commands. The network intermediately predicts visual affordances and action primitives through direct supervision, which provides the aforementioned auxiliary supervised guidance. We demonstrate that such joint learning and supervised guidance facilitates hierarchical task decomposition, helping the agent learn faster and achieve better driving performance, and increases the transparency of the otherwise black-box end-to-end network. We run our experiments to validate the MT-LfD framework in CARLA, an open-source urban driving simulator. We introduce multiple non-player agents in CARLA and induce temporal noise in them for realistic stochasticity.

• [cs.LG]Rational Neural Networks for Approximating Jump Discontinuities of Graph Convolution Operator
Zhiqian Chen, Feng Chen, Rongjie Lai, Xuchao Zhang, Chang-Tien Lu
http://arxiv.org/abs/1808.10073v1

For node-level graph encoding, an important recent state-of-the-art method is the graph convolutional network (GCN), which nicely integrates local vertex features and graph topology in the spectral domain. However, current studies suffer from several drawbacks: (1) graph CNNs rely on Chebyshev polynomial approximation, which results in oscillatory approximation at jump discontinuities; (2) increasing the order of the Chebyshev polynomial can reduce the oscillation issue, but also incurs unaffordable computational cost; (3) Chebyshev polynomials require degree $\Omega(\mathrm{poly}(1/\epsilon))$ to approximate a jump signal such as $|x|$, while a rational function only needs $\mathcal{O}(\mathrm{poly}\log(1/\epsilon))$ \cite{liang2016deep,telgarsky2017neural}. However, it is non-trivial to apply rational approximation without increasing computational complexity due to the denominator. In this paper, the superiority of rational approximation is exploited for graph signal recovery. RationalNet is proposed to integrate rational functions and neural networks. We show that a rational function of eigenvalues can be rewritten as a function of the graph Laplacian, which avoids multiplication by the eigenvector matrix. Focusing on the analysis of approximation on the graph convolution operation, a graph signal regression task is formulated. Under the graph signal regression task, its time complexity can be significantly reduced by the graph Fourier transform. To overcome the local minimum problem of neural network models, a relaxed Remez algorithm is utilized to initialize the weight parameters. The convergence rate of RationalNet and polynomial-based methods on jump signals is analyzed for a theoretical guarantee. Extensive experimental results demonstrate that our approach can effectively characterize jump discontinuities, outperforming competing methods by a substantial margin on both synthetic and real-world graphs.

• [cs.LG]Searching Toward Pareto-Optimal Device-Aware Neural Architectures
An-Chieh Cheng, Jin-Dong Dong, Chi-Hung Hsu, Shu-Huan Chang, Min Sun, Shih-Chieh Chang, Jia-Yu Pan, Yu-Ting Chen, Wei Wei, Da-Cheng Juan
http://arxiv.org/abs/1808.09830v2

Recent breakthroughs in Neural Architectural Search (NAS) have achieved state-of-the-art performance in many tasks such as image classification and language understanding. However, most existing works only optimize for model accuracy and largely ignore other important factors imposed by the underlying hardware and devices, such as latency and energy, when making inference. In this paper, we first introduce the problem of NAS and provide a survey of recent works. Then we take a deep dive into two recent advancements in extending NAS into multiple-objective frameworks: MONAS and DPP-Net. Both MONAS and DPP-Net are capable of optimizing accuracy and other objectives imposed by devices, searching for neural architectures that can be best deployed on a wide spectrum of devices: from embedded systems and mobile devices to workstations. Experimental results are poised to show that architectures found by MONAS and DPP-Net achieve Pareto optimality w.r.t. the given objectives for various devices.

• [cs.LG]Semi-Metrification of the Dynamic Time Warping Distance
Brijnesh J. Jain
http://arxiv.org/abs/1808.09964v1

The dynamic time warping (dtw) distance fails to satisfy the triangle inequality and the identity of indiscernibles. As a consequence, the dtw-distance is not warping-invariant, which in turn results in peculiarities in data mining applications. This article converts the dtw-distance to a semi-metric and shows that its canonical extension is warping-invariant. Empirical results indicate that the nearest-neighbor classifier in the proposed semi-metric space performs comparably to the same classifier in the standard dtw-space. To overcome the undesirable peculiarities of dtw-spaces, this result suggests further exploring the semi-metric space for data mining applications.
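The two failures named at the start of the abstract are easy to reproduce with a small dynamic-programming implementation. The sketch below uses the absolute difference as local cost and sums the costs along the optimal warping path; that cost choice and the example sequences are assumptions of this illustration, and the semi-metric proposed in the article is not implemented here.

```python
def dtw(x, y):
    """Dynamic time warping distance between two numeric sequences.

    D[i][j] holds the minimal accumulated cost of aligning x[:i] with y[:j],
    using |a - b| as the local cost (an assumption of this sketch).
    """
    n, m = len(x), len(y)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


# Identity of indiscernibles fails: two different sequences at distance 0.
print(dtw([1, 2], [1, 1, 2]))

# Triangle inequality fails: d(x, z) exceeds d(x, y) + d(y, z).
x, y, z = [0], [1, 2], [2, 3, 3]
print(dtw(x, z), dtw(x, y) + dtw(y, z))
```

In the first example the second sequence is just a time-stretched copy of the first, so warping aligns them at zero cost even though they are not equal. In the second, the direct distance from `x` to `z` is 8 while the detour through `y` costs only 3 + 3 = 6.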

• [cs.LG]Theoretical Linear Convergence of Unfolded ISTA and its Practical Weights and Thresholds
Xiaohan Chen, Jialin Liu, Zhangyang Wang, Wotao Yin
http://arxiv.org/abs/1808.10038v1

In recent years, unfolding iterative algorithms as neural networks has become an empirical success in solving sparse recovery problems. However, its theoretical understanding is still immature, which prevents us from fully utilizing the power of neural networks. In this work, we study unfolded ISTA (Iterative Shrinkage Thresholding Algorithm) for sparse signal recovery. We introduce a weight structure that is necessary for asymptotic convergence to the true sparse signal. With this structure, unfolded ISTA can attain a linear convergence, which is better than the sublinear convergence of ISTA/FISTA in general cases. Furthermore, we propose to incorporate thresholding in the network to perform support selection, which is easy to implement and able to boost the convergence rate both theoretically and empirically. Extensive simulations, including sparse vector recovery and a compressive sensing experiment on real image data, corroborate our theoretical results and demonstrate their practical usefulness.
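As background, the classical ISTA iteration that the unfolded networks are derived from fits in a few lines. This sketch shows plain ISTA for the problem min 0.5 * ||Ax - b||^2 + lam * ||x||_1 with a fixed step size; it does not include the learned weights, the weight structure, or the support-selection thresholding proposed in the paper, and the tiny test problem is an assumption for illustration.

```python
def soft_threshold(v, t):
    """Shrink v toward zero by t; values within [-t, t] become exactly 0."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def ista(A, b, lam, step, iters):
    """Classical ISTA for min_x 0.5 * ||A x - b||^2 + lam * ||x||_1.

    A is a list of rows. The step size should be at most 1 / L, where L is
    the largest eigenvalue of A^T A; the caller is responsible for that.
    """
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(iters):
        # Residual r = A x - b and gradient g = A^T r of the smooth part.
        r = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(m)]
        g = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        # Gradient step followed by the shrinkage (proximal) step.
        x = [soft_threshold(x[j] - step * g[j], step * lam) for j in range(n)]
    return x


# With A equal to the identity, the minimizer is soft_threshold(b, lam).
A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
b = [3.0, -0.5, 1.5]
print(ista(A, b, lam=1.0, step=1.0, iters=5))
```

Unfolding treats each pass through the loop body as one network layer and replaces the fixed matrices and thresholds with trainable parameters, which is the setting the convergence results above apply to.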

• [cs.LG]Towards Reproducible Empirical Research in Meta-Learning
Adriano Rivolli, Luís P. F. Garcia, Carlos Soares, Joaquin Vanschoren, André C. P. L. F. de Carvalho
http://arxiv.org/abs/1808.10406v1

Meta-learning is increasingly used to support the recommendation of machine learning algorithms and their configurations. Such recommendations are made based on meta-data, consisting of performance evaluations of algorithms on prior datasets, as well as characterizations of these datasets. These characterizations, also called meta-features, describe properties of the data which are predictive of the performance of machine learning algorithms trained on them. Unfortunately, despite being used in a large number of studies, meta-features are not uniformly described and computed, making many empirical studies irreproducible and hard to compare. This paper aims to remedy this by systematizing and standardizing data characterization measures used in meta-learning, and performing an in-depth analysis of their utility. Moreover, it presents MFE, a new tool for extracting meta-features from datasets, identifies more subtle reproducibility issues in the literature, and proposes guidelines for data characterization that strengthen reproducible empirical research in meta-learning.

• [cs.OS]Profiling and Improving the Duty-Cycling Performance of Linux-based IoT Devices
Immanuel Amirtharaj, Tai Groot, Behnam Dezfouli
http://arxiv.org/abs/1808.10097v1

Minimizing the energy consumption of Linux-based devices is an essential step towards their wide deployment in various IoT scenarios. Energy saving methods such as duty-cycling aim to address this constraint by limiting the amount of time the device is powered on. In this work we study and reduce the amount of time a Linux-based IoT device is powered on to accomplish its tasks. We analyze the processes of system boot up and shutdown on two platforms, the Raspberry Pi 3 and Zero Wireless, and enhance duty-cycling performance by identifying and disabling time-consuming or unnecessary units initialized in the userspace. We also study whether SD card speed and SD card capacity utilization affect boot up duration and energy consumption. In addition, we propose Pallex, a parallel execution framework built on top of the systemd init system to run a user application concurrently with userspace initialization. We validate the performance impact of Pallex when applied to various IoT application scenarios: (i) capturing an image, (ii) capturing and encrypting an image, (iii) capturing and classifying an image using the k-nearest neighbor algorithm, and (iv) capturing images and sending them to a cloud server. Our results show that system lifetime is increased by 18.3%, 16.8%, 13.9% and 30.2%, for these application scenarios, respectively.

• [cs.RO]A Variational Feature Encoding Method of 3D Object for Probabilistic Semantic SLAM
H. W. Yu, B. H. Lee
http://arxiv.org/abs/1808.10180v1

This paper presents a feature encoding method of complex 3D objects for high-level semantic features. Recent object recognition methods have become important for semantic simultaneous localization and mapping (SLAM). However, there is a lack of consideration of the probabilistic observation model for 3D objects, as the shape of a 3D object basically follows a complex probability distribution. Furthermore, since a mobile robot equipped with a range sensor observes only a single view, much information about the object shape is discarded. These limitations are the major obstacles to semantic SLAM and view-independent loop closure using 3D object shapes as features. In order to enable numerical analysis for Bayesian inference, we approximate the true observation model of 3D objects with tractable distributions. Since the observation likelihood can be obtained from the generative model, we formulate the true generative model for 3D objects with Bayesian networks. To capture these complex distributions, we apply a variational auto-encoder. To analyze the approximated distributions and encoded features, we perform classification with maximum likelihood estimation and shape retrieval.

• [cs.RO]Baidu Apollo Auto-Calibration System - An Industry-Level Data-Driven and Learning based Vehicle Longitude Dynamic Calibrating Algorithm
Fan Zhu, Lin Ma, Xin Xu, Dingfeng Guo, Xiao Cui, Qi Kong
http://arxiv.org/abs/1808.10134v1

For any autonomous driving vehicle, the control module determines its road performance and safety, i.e., its precision and stability should stay within a carefully designed range. Nonetheless, control algorithms require vehicle dynamics (such as longitudinal dynamics) as inputs, which, unfortunately, are hard to calibrate in real time. As a result, to achieve reasonable performance, most, if not all, research-oriented autonomous vehicles are calibrated manually, one by one. Since manual calibration is not sustainable once vehicles enter the mass production stage for industrial purposes, we here introduce a machine-learning based auto-calibration system for autonomous driving vehicles. In this paper, we show how we build a data-driven longitudinal calibration procedure using machine learning techniques. We first generated offline calibration tables from human driving data. The offline table serves as an initial guess for later use and needs only twenty minutes of data collection and processing. We then used an online-learning algorithm to appropriately update the initial table (the offline table) based on real-time performance analysis. This longitudinal auto-calibration system has been deployed to more than one hundred Baidu Apollo self-driving vehicles (including hybrid family vehicles and electronic delivery-only vehicles) since April 2018. By August 27, 2018, it had been tested for more than two thousand hours and ten thousand kilometers (6,213 miles) and has proven to be effective.

• [cs.RO]Configuration Space Singularities of The Delta Manipulator
Marc Diesse
http://arxiv.org/abs/1808.10064v1

We investigate the configuration space of the Delta-Manipulator, identify 24 points in the configuration space where the Jacobian of the Constraint Equations loses rank, and show that these are not manifold points of the Real Algebraic Set defined by the Constraint Equations.

• [cs.RO]Design of an Autonomous Precision Pollination Robot
Nicholas Ohi, Kyle Lassak, Ryan Watson, Jared Strader, Yixin Du, Chizhao Yang, Gabrielle Hedrick, Jennifer Nguyen, Scott Harper, Dylan Reynolds, Cagri Kilic, Jacob Hikes, Sarah Mills, Conner Castle, Benjamin Buzzo, Nicole Waterland, Jason Gross, Yong-Lak Park, Xin Li, Yu Gu
http://arxiv.org/abs/1808.10010v1

Precision robotic pollination systems can not only fill the gap of declining natural pollinators, but can also surpass them in efficiency and uniformity, helping to feed the fast-growing human population on Earth. This paper presents the design and ongoing development of an autonomous robot named "BrambleBee", which aims at pollinating bramble plants in a greenhouse environment. Partially inspired by the ecology and behavior of bees, BrambleBee employs state-of-the-art localization and mapping, visual perception, path planning, motion control, and manipulation techniques to create an efficient and robust autonomous pollination system.

• [cs.RO]RoI-based Robotic Grasp Detection in Object Overlapping Scenes Using Convolutional Neural Network
Hanbo Zhang, Xuguang Lan, Xinwen Zhou, Nanning Zheng
http://arxiv.org/abs/1808.10313v1

Grasp detection is an essential skill for widespread use of robots. Recent works demonstrate the advanced performance of Convolutional Neural Network (CNN) on robotic grasp detection. However, a significant shortcoming of existing grasp detection algorithms is that they all ignore the affiliation between grasps and targets. In this paper, we propose a robotic grasp detection algorithm based on Region of Interest (RoI) to simultaneously detect targets and their grasps in object overlapping scenes. Our proposed algorithm uses Regions of Interest (RoIs) to detect grasps while doing classification and location regression of targets. To train the network, we contribute a much bigger multi-object grasp dataset than Cornell Grasp Dataset, which is based on Visual Manipulation Relationship Dataset. Experimental results demonstrate that our algorithm achieves 24.0% miss rate at 1FPPI and 70.5% mAP with grasp on our dataset. Robotic experiments demonstrate that our proposed algorithm can help robots grasp specified target in multi-object scenes at 84% success rate.

• [cs.RO]Robot_gym: accelerated robot training through simulation in the cloud with ROS and Gazebo
Víctor Mayoral Vilches, Alejandro Hernández Cordero, Asier Bilbao Calvo, Irati Zamalloa Ugarte, Risto Kojcev
http://arxiv.org/abs/1808.10369v1

Rather than programming, training allows robots to achieve behaviors that generalize better and are capable of responding to real-world needs. However, such training requires a large amount of experimentation, which is not always feasible for a physical robot. In this work, we present robot_gym, a framework to accelerate robot training through simulation in the cloud that makes use of roboticists' tools, simplifying the development and deployment processes on real robots. We show that, for simple tasks, simple 3DoF robots require more than 140 attempts to learn. For more complex 6DoF robots, the number of attempts increases to more than 900 for the same task. We demonstrate that our framework, for simple tasks, accelerates the robot training time by more than 33% while maintaining similar levels of accuracy and repeatability.

• [cs.SI]Asymptotic analysis of the Friedkin-Johnsen model when the matrix of the susceptibility weights approaches the identity matrix
Alfredo Pironti
http://arxiv.org/abs/1808.10379v1

In this paper we analyze the Friedkin-Johnsen model of opinions [1] when the coefficients weighting the agent susceptibilities to interpersonal influence approach 1. We will show that in this case, under suitable assumptions, the model converges to a quasi-consensus condition among the agents. In general the achieved consensus value will be different to the one obtained by the corresponding DeGroot model [2].
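The behaviour described in this abstract can be illustrated with a minimal sketch of the Friedkin-Johnsen recursion in its commonly stated form, x(t+1) = Λ W x(t) + (I − Λ) x(0), where W is a row-stochastic influence matrix and Λ is the diagonal matrix of susceptibilities. The matrix `W`, the initial opinions `x0`, and the choice of one common susceptibility for all agents are hypothetical illustration values, not taken from the paper.

```python
def friedkin_johnsen(W, lam, x0, steps=500):
    """Iterate x(t+1) = Lambda W x(t) + (I - Lambda) x(0) for a fixed number of steps."""
    n = len(x0)
    x = list(x0)
    for _ in range(steps):
        x = [
            lam[i] * sum(W[i][j] * x[j] for j in range(n)) + (1.0 - lam[i]) * x0[i]
            for i in range(n)
        ]
    return x


# Hypothetical 3-agent example: a row-stochastic influence matrix and initial opinions.
W = [
    [0.50, 0.50, 0.00],
    [0.25, 0.50, 0.25],
    [0.00, 0.50, 0.50],
]
x0 = [0.0, 0.5, 1.0]

for s in (0.5, 0.9, 0.999):
    x = friedkin_johnsen(W, [s] * 3, x0)
    print(s, [round(v, 4) for v in x], "spread:", round(max(x) - min(x), 4))
```

In this toy example the gap between the largest and smallest final opinion shrinks from about 0.67 at susceptibility 0.5 to about 0.002 at 0.999, which matches the quasi-consensus behaviour the abstract describes. It does not reproduce the paper's asymptotic analysis or its comparison with the DeGroot consensus value.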

• [cs.SI]On Microtargeting Socially Divisive Ads: A Case Study of Russia-Linked Ad Campaigns on Facebook
Filipe N. Ribeiro, Koustuv Saha, Mahmoudreza Babaei, Lucas Henrique, Johnnatan Messias, Oana Goga, Fabricio Benevenuto, Krishna P. Gummadi, Elissa M. Redmiles
http://arxiv.org/abs/1808.09218v2

Targeted advertising is meant to improve the efficiency of matching advertisers to their customers. However, targeted advertising can also be abused by malicious advertisers to efficiently reach people susceptible to false stories, stoke grievances, and incite social conflict. Since targeted ads are not seen by non-targeted and non-vulnerable people, malicious ads are likely to go unreported and their effects undetected. This work examines a specific case of malicious advertising, exploring the extent to which political ads from the Russian Internet Research Agency (IRA) run prior to 2016 U.S. elections exploited Facebook's targeted advertising infrastructure to efficiently target ads on divisive or polarizing topics (e.g., immigration, race-based policing) at vulnerable sub-populations. In particular, we do the following: (a) We conduct U.S. census-representative surveys to characterize how users with different political ideologies report, approve, and perceive truth in the content of the IRA ads. Our surveys show that many ads are 'divisive': they elicit very different reactions from people belonging to different socially salient groups. (b) We characterize how these divisive ads are targeted to sub-populations that feel particularly aggrieved by the status quo. Our findings support existing calls for greater transparency of content and targeting of political ads. (c) We particularly focus on how the Facebook ad API facilitates such targeting. We show how the enormous amount of personal data Facebook aggregates about users and makes available to advertisers enables such malicious targeting.

• [cs.SI]Uncovering intimate and casual relationships from mobile phone communication
Mikaela Irene D. Fudolig, Daniel Monsivais, Kunal Bhattacharya, Hang-Hyun Jo, Kimmo Kaski
http://arxiv.org/abs/1808.10166v1

We analyze a large-scale mobile phone call dataset with the metadata of the mobile phone users, including age, gender, and billing locality, to uncover the nature of relationships between peers or individuals of similar ages. We show that in addition to the age and gender of users, the information about the ranks of users to each other in their egocentric networks is crucial in characterizing intimate and casual relationships of peers. The opposite-gender pairs in intimate relationships are found to show the highest levels of call frequency and daily regularity, consistent with small-scale studies on romantic partners. This is followed by the same-gender pairs in intimate relationships, while the lowest call frequency and daily regularity are observed for the pairs in casual relationships. We also find that older pairs tend to call less frequently and less regularly than younger pairs, while the average call durations exhibit a more complex dependence on age. We expect that a more detailed analysis can help us better characterize the nature of peer relationships and distinguish various types of relations, such as siblings, friends, and romantic partners, more clearly.

• [math.ST]A Divergence Proof for Latuszynski's Counter-Example Approaching Infinity with Probability "Near" One
Yufan Li
http://arxiv.org/abs/1808.10121v1

This note is a technical supplement to the following paper: \citep{latuszynski2013adaptive}. In that paper, the authors explore various convergence conditions for adaptive Gibbs samplers. A significant portion of the paper seeks to disprove a set of convergence conditions proposed in an earlier paper: \citep{levine2006optimizing}. This is done by proving that the constructed counter-example (essentially a state-dependent, time-dependent random walk on $\mathbb{R}^2$) approaches infinity with probability larger than $0$. The authors noted that, according to their numerical simulation, it is very likely that the random walk approaches infinity with probability $1$ (see Proposition 3.2, Remark 3.3). But they also noted that, due to technicalities, they were only able to prove that the process tends to infinity with probability strictly larger than $0$ (Remark 3.3). Upon checking their proof, we notice that their approach may be simplified and that an alternative approach yields a stronger result. We detail our method and result here out of technical interest.

• [math.ST]Differentially Private Change-Point Detection
Rachel Cummings, Sara Krehbiel, Yajun Mei, Rui Tuo, Wanrong Zhang
http://arxiv.org/abs/1808.10056v1

The change-point detection problem seeks to identify distributional changes at an unknown change-point k* in a stream of data. This problem appears in many important practical settings involving personal data, including biosurveillance, fault detection, finance, signal detection, and security systems. The field of differential privacy offers data analysis tools that provide powerful worst-case privacy guarantees. We study the statistical problem of change-point detection through the lens of differential privacy. We give private algorithms for both online and offline change-point detection, analyze these algorithms theoretically, and provide empirical validation of our results.
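As a rough illustration of the offline setting, the sketch below scores every candidate change-point by the log-likelihood ratio of the data after it, perturbs each score with Laplace noise, and reports the index with the largest noisy score. This is the generic "report noisy max" pattern, not necessarily the paper's algorithm. The Gaussian pre- and post-change models, the helper names, and the `noise_scale` value are assumptions made for illustration. The scale required for a formal differential privacy guarantee depends on a sensitivity bound that this sketch does not derive (the Gaussian log-likelihood ratio is unbounded).

```python
import random


def log_lik_ratio(x, mu0, mu1, sigma=1.0):
    """log N(x; mu1, sigma^2) - log N(x; mu0, sigma^2)."""
    return ((x - mu0) ** 2 - (x - mu1) ** 2) / (2.0 * sigma ** 2)


def laplace(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    if scale == 0.0:
        return 0.0
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_change_point(data, mu0, mu1, noise_scale, rng):
    """Return the 0-based index k maximising the noisy suffix log-likelihood ratio."""
    n = len(data)
    scores = [0.0] * n
    suffix = 0.0
    # scores[k] = sum of log-likelihood ratios over data[k:], built from the right.
    for k in range(n - 1, -1, -1):
        suffix += log_lik_ratio(data[k], mu0, mu1)
        scores[k] = suffix
    noisy = [s + laplace(noise_scale, rng) for s in scores]
    return max(range(n), key=lambda k: noisy[k])


# Hypothetical stream whose mean shifts from 0 to 5 at index 4.
data = [0.0, 0.0, 0.0, 0.0, 5.0, 5.0, 5.0, 5.0]
rng = random.Random(0)
print(noisy_change_point(data, mu0=0.0, mu1=5.0, noise_scale=0.0, rng=rng))
print(noisy_change_point(data, mu0=0.0, mu1=5.0, noise_scale=2.0, rng=rng))
```

With `noise_scale=0.0` the function reduces to the non-private maximum-likelihood estimate of the change-point; larger scales trade accuracy for stronger perturbation.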

• [math.ST]Maximum likelihood estimator and its consistency for an $(L,1)$ random walk in a parametric random environment
Hua-Ming Wang, Meijuan Zhang
http://arxiv.org/abs/1808.10092v1

Consider an $(L,1)$ random walk in an i.i.d. random environment, where the environment involves a certain parameter. We obtain the maximum likelihood estimator (MLE) of the environment parameter, which can be written as a functional of a multitype branching process with immigration in a random environment (BPIRE). Because the offspring distributions of the involved multitype BPIRE are of the linear fractional type, the limit invariant distribution of the multitype BPIRE can be computed explicitly. As a result, we obtain the consistency of the MLE. Our result is a generalization of Comets et al. [Stochastic Process. Appl. 2014, 124, 268-288].

• [math.ST]Minimal inference from incomplete 2x2-tables
Li-Chun Zhang, Raymond L. Chambers
http://arxiv.org/abs/1808.10185v1

Estimates based on 2x2 tables of frequencies are widely used in statistical applications. However, in many cases these tables are incomplete in the sense that the data required to compute the frequencies for a subset of the cells defining the table are unavailable. Minimal inference addresses those situations where this incompleteness leads to target parameters for these tables that are interval, rather than point, identifiable. In particular, we develop the concept of corroboration as a measure of the statistical evidence in the observed data that is not based on likelihoods. The corroboration function identifies the parameter values that are the hardest to refute, i.e., those values which, under repeated sampling, remain interval identified. This enables us to develop a general approach to inference from incomplete 2x2 tables when the additional assumptions required to support a likelihood-based approach cannot be sustained based on the data available. This minimal inference approach then provides a foundation for further analysis that aims at making sharper inference supported by plausible external beliefs.

• [math.ST]Quadratic Discriminant Analysis under Moderate Dimension
Qing Yang, Guang Cheng
http://arxiv.org/abs/1808.10065v1

Quadratic discriminant analysis (QDA) is a simple method to classify a subject into two populations, and was proven to perform as well as the Bayes rule when the data dimension p is fixed. The main purpose of this paper is to examine the empirical and theoretical behaviors of QDA where p grows proportionally to the sample sizes without imposing any structural assumption on the parameters. The first finding in this moderate dimension regime is that QDA can perform as poorly as random guessing even when the two populations deviate significantly. This motivates a generalized version of QDA that automatically adapts to dimensionality. Under a finite fourth moment condition, we derive misclassification rates for both the generalized QDA and the optimal one. A direct comparison reveals one "easy" case where the difference between two rates converges to zero and one "hard" case where that converges to some strictly positive constant. For the latter, a divide-and-conquer approach over dimension (rather than sample) followed by a screening procedure is proposed to narrow the gap. Various numerical studies are conducted to back up the proposed methodology.

• [physics.comp-ph]High-Performance Multi-Mode Ptychography Reconstruction on Distributed GPUs
Zhihua Dong, Yao-Lung L. Fang, Xiaojing Huang, Hanfei Yan, Sungsoo Ha, Wei Xu, Yong S. Chu, Stuart I. Campbell, Meifeng Lin
http://arxiv.org/abs/1808.10375v1

Ptychography is an emerging imaging technique that is able to provide wavelength-limited spatial resolution from specimen with extended lateral dimensions. As a scanning microscopy method, a typical two-dimensional image requires a number of data frames. As a diffraction-based imaging technique, the real-space image has to be recovered through iterative reconstruction algorithms. Due to these two inherent aspects, a ptychographic reconstruction is generally a computation-intensive and time-consuming process, which limits the throughput of this method. We report an accelerated version of the multi-mode difference map algorithm for ptychography reconstruction using multiple distributed GPUs. This approach leverages available scientific computing packages in Python, including mpi4py and PyCUDA, with the core computation functions implemented in CUDA C. We find that interestingly even with MPI collective communications, the weak scaling in the number of GPU nodes can still remain nearly constant. Most importantly, for realistic diffraction measurements, we observe a speedup ranging from a factor of $10$ to $10^3$ depending on the data size, which reduces the reconstruction time remarkably from hours to typically about 1 minute and is thus critical for real-time data processing and visualization.

• [stat.AP]An Introduction to Inductive Statistical Inference -- from Parameter Estimation to Decision-Making
Henk van Elst
http://arxiv.org/abs/1808.10173v1

These lecture notes aim at a post-Bachelor audience with a background at an introductory level in Applied Mathematics and Applied Statistics. They discuss the logic and methodology of the Bayes-Laplace approach to inductive statistical inference that places common sense and the guiding lines of the scientific method at the heart of systematic analyses of quantitative-empirical data. Following an exposition of exactly solvable cases of single- and two-parameter estimation, the main focus is laid on Markov Chain Monte Carlo (MCMC) simulations on the basis of Gibbs sampling and Hamiltonian Monte Carlo sampling of posterior joint probability distributions for regression parameters occurring in generalised linear models. The modelling of fixed as well as of varying effects (varying intercepts) is considered, and the simulation of posterior predictive distributions is outlined. The issues of model comparison with Bayes factors and the assessment of models' relative posterior predictive accuracy with information entropy-based criteria DIC and WAIC are addressed. Concluding, a conceptual link to the behavioural subjective expected utility representation of a single decision-maker's choice behaviour in static one-shot decision problems is established. Codes for MCMC simulations of multi-dimensional posterior joint probability distributions with the JAGS and Stan packages implemented in the statistical software R are provided. The lecture notes are fully hyperlinked. They direct the reader to original scientific research papers and to pertinent biographical information.

• [stat.AP]Reducing post-surgery recovery bed occupancy through an analytical prediction model
Belinda Spratt, Erhan Kozan
http://arxiv.org/abs/1808.10132v1

Operations Research approaches to surgical scheduling are becoming increasingly popular in both theory and practice. Often these models neglect stochasticity in order to reduce the computational complexity of the problem. In this paper, historical data is used to examine the occupancy of post-surgery recovery spaces as a function of the initial surgical case sequence. We show that the number of patients in the recovery space is well modelled by a Poisson binomial random variable. A mixed integer nonlinear programming model for the surgical case sequencing problem is presented that reduces the maximum expected occupancy in post-surgery recovery spaces. Given the complexity of the problem, Simulated Annealing is used to produce good solutions in short amounts of computational time. Computational experiments are performed to compare the methodology here to a full year of historical data. The solution techniques presented are able to reduce maximum expected recovery occupancy by 18% on average. This reduction alleviates a large amount of stress on staff in the post-surgery recovery spaces and improves the quality of care provided to patients.
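The Poisson binomial distribution mentioned in this abstract is the distribution of a sum of independent Bernoulli variables with unequal success probabilities. A minimal sketch of computing its probability mass function by repeated convolution follows; the per-patient probabilities of occupying a recovery bed are hypothetical illustration values, not data from the paper.

```python
def poisson_binomial_pmf(probs):
    """Return [P(N = 0), ..., P(N = n)] for N = sum of independent Bernoulli(p_i)."""
    pmf = [1.0]  # With no patients, occupancy is 0 with probability 1.
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)   # this patient is not in recovery
            nxt[k + 1] += mass * p       # this patient is in recovery
        pmf = nxt
    return pmf


# Hypothetical probabilities that each of three patients is in recovery at a given time.
probs = [0.9, 0.5, 0.2]
pmf = poisson_binomial_pmf(probs)
expected = sum(k * mass for k, mass in enumerate(pmf))
print([round(m, 4) for m in pmf], "expected occupancy:", round(expected, 4))
```

The expected occupancy equals the sum of the individual probabilities, so a sequencing model can evaluate candidate case orders by recomputing these probabilities per time slot. How the paper derives them from historical data is not shown here.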

• [stat.ME]Accelerating Parallel Tempering: Quantile Tempering Algorithm (QuanTA)
Nicholas G. Tawn, Gareth O. Roberts
http://arxiv.org/abs/1808.10415v1

Using MCMC to sample from a target distribution $\pi(x)$ on a $d$-dimensional state space can be a difficult and computationally expensive problem. Particularly when the target exhibits multimodality, traditional methods can fail to explore the entire state space, which results in biased sample output. Methods to overcome this issue include the parallel tempering algorithm, which utilises an augmented state space approach to help the Markov chain traverse regions of low probability density and reach other modes. This method suffers from the curse of dimensionality, which dramatically slows the transfer of mixing information from the auxiliary targets to the target of interest as $d \rightarrow \infty$. This paper introduces a novel prototype algorithm, QuanTA, that uses a Gaussian motivated transformation in an attempt to accelerate the mixing through the temperature schedule of a parallel tempering algorithm. This new algorithm is accompanied by a comprehensive theoretical analysis quantifying the improved efficiency and scalability of the approach, concluding that under weak regularity conditions the new approach gives accelerated mixing through the temperature schedule. Empirical evidence of the effectiveness of this new algorithm is illustrated on canonical examples.
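For context, the sketch below shows the swap move of standard parallel tempering, the baseline that QuanTA aims to accelerate; it does not implement QuanTA itself. Two chains target $\pi(x)^{\beta_i}$ and $\pi(x)^{\beta_j}$, and a proposed exchange of their states is accepted with probability $\min\{1, \exp[(\beta_i - \beta_j)(\log\pi(x_j) - \log\pi(x_i))]\}$. The bimodal `log_target` and the inverse temperatures are hypothetical choices for illustration.

```python
import math


def log_target(x):
    """Unnormalised log density of an equal mixture of N(-3, 1) and N(3, 1)."""
    a = -0.5 * (x + 3.0) ** 2
    b = -0.5 * (x - 3.0) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def swap_acceptance(x_i, x_j, beta_i, beta_j):
    """Metropolis acceptance probability for exchanging states between two tempered chains."""
    log_ratio = (beta_i - beta_j) * (log_target(x_j) - log_target(x_i))
    return math.exp(min(0.0, log_ratio))


# Cold chain (beta = 1.0) sits between the modes, hot chain (beta = 0.2) sits on a mode:
# the swap moves the cold chain to higher density and is always accepted.
print(swap_acceptance(x_i=0.0, x_j=3.0, beta_i=1.0, beta_j=0.2))

# The reverse situation: the swap would move the cold chain off its mode.
print(swap_acceptance(x_i=3.0, x_j=0.0, beta_i=1.0, beta_j=0.2))
```

In high dimensions, adjacent temperatures must be spaced closely for such swaps to be accepted at a useful rate, which is the scalability issue the abstract refers to.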

• [stat.ME]Adaptative significance levels in normal mean hypothesis testing
Alejandra Estefanía Patiño Hoyos, Victor Fossaluza
http://arxiv.org/abs/1808.10019v1

The Full Bayesian Significance Test (FBST) for precise hypotheses was presented by Pereira and Stern (1999) as a Bayesian alternative to the traditional significance test based on the p-value. The FBST uses the evidence in favor of the null hypothesis ($H_0$), calculated as the complement of the posterior probability of the highest posterior density region that is tangent to the set defined by $H_0$. An important practical issue for the implementation of the FBST is the determination of the evidence threshold at which $H_0$ is rejected. In classical significance tests, the most widely used measure for rejecting a hypothesis is the p-value. It is known that the p-value decreases as the sample size increases, so setting a single significance level usually leads to rejection of $H_0$. In the FBST procedure, the evidence in favor of $H_0$ exhibits the same behavior as the p-value when the sample size increases. This suggests that the cut-off point that defines the rejection of $H_0$ in the FBST should be a function of the sample size. In this work, we focus on the case of two-sided normal mean hypothesis testing and present a method to find a cut-off value for the evidence in the FBST by minimizing a linear combination of the type I error probability and the expected type II error probability for a given sample size.

• [stat.ME]Optimal shrinkage covariance matrix estimation under random sampling from elliptical distributions
Esa Ollila, Elias Raninen
http://arxiv.org/abs/1808.10188v1

This paper considers the problem of estimating a high-dimensional (HD) covariance matrix when the sample size is smaller, or not much larger, than the dimensionality of the data, which could potentially be very large. We develop a regularized sample covariance matrix (RSCM) estimator which can be applied in commonly occurring sparse data problems. The proposed RSCM estimator is based on estimators of the unknown optimal (oracle) shrinkage parameters that yield the minimum mean squared error (MMSE) between the RSCM and the true covariance matrix when the data is sampled from an unspecified elliptically symmetric distribution. We propose two variants of the RSCM estimator, which differ in how they estimate the underlying sphericity parameter involved in the theoretical optimal shrinkage parameter. The performance of the proposed RSCM estimators is evaluated with numerical simulation studies. In particular, when the sample sizes are low, the proposed RSCM estimators often show a significant improvement over the conventional RSCM estimator by Ledoit and Wolf (2004). We further evaluate the performance of the proposed estimators in classification and portfolio optimization problems with real data, wherein the proposed methods are able to outperform the benchmark methods.
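A regularized sample covariance matrix of the linear-shrinkage type can be sketched as $\hat{\Sigma} = \beta S + \alpha I$, where $S$ is the sample covariance matrix. The sketch below assumes this form and takes $\beta$ as a hand-picked input with $\alpha = (1 - \beta)\,\mathrm{tr}(S)/p$, which keeps the trace unchanged. It does not implement the paper's estimators of the oracle shrinkage parameters, and the data matrix is a hypothetical example with fewer samples than dimensions.

```python
def sample_covariance(X):
    """Unbiased sample covariance of the rows of X (n observations, p variables)."""
    n, p = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(p)]
    S = [[0.0] * p for _ in range(p)]
    for row in X:
        d = [row[j] - mean[j] for j in range(p)]
        for a in range(p):
            for b in range(p):
                S[a][b] += d[a] * d[b] / (n - 1)
    return S


def rscm(S, beta):
    """Shrink S toward a scaled identity: beta * S + alpha * I with alpha = (1 - beta) * tr(S) / p."""
    p = len(S)
    eta = sum(S[i][i] for i in range(p)) / p
    alpha = (1.0 - beta) * eta
    return [
        [beta * S[a][b] + (alpha if a == b else 0.0) for b in range(p)]
        for a in range(p)
    ]


# Hypothetical data: n = 2 observations in p = 3 dimensions, so S has rank 1.
X = [[1.0, 2.0, 3.0], [2.0, 4.0, 7.0]]
S = sample_covariance(X)
for row in rscm(S, beta=0.5):
    print(row)
```

With $0 \le \beta < 1$ and a nonzero trace, the added $\alpha I$ term makes the estimate positive definite even when $S$ is singular, which is why shrinkage helps when the sample size is smaller than the dimension.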

• [stat.ML]Discriminative Learning of Similarity and Group Equivariant Representations
Shubhendu Trivedi
http://arxiv.org/abs/1808.10078v1

One of the most fundamental problems in machine learning is to compare examples: Given a pair of objects we want to return a value which indicates degree of (dis)similarity. Similarity is often task specific, and pre-defined distances can perform poorly, leading to work in metric learning. However, being able to learn a similarity-sensitive distance function also presupposes access to a rich, discriminative representation for the objects at hand. In this dissertation we present contributions towards both ends. In the first part of the thesis, assuming good representations for the data, we present a formulation for metric learning that makes a more direct attempt to optimize for the k-NN accuracy as compared to prior work. We also present extensions of this formulation to metric learning for kNN regression, asymmetric similarity learning and discriminative learning of Hamming distance. In the second part, we consider a situation where we are on a limited computational budget i.e. optimizing over a space of possible metrics would be infeasible, but access to a label aware distance metric is still desirable. We present a simple, and computationally inexpensive approach for estimating a well motivated metric that relies only on gradient estimates, discussing theoretical and experimental results. In the final part, we address representational issues, considering group equivariant convolutional neural networks (GCNNs). Equivariance to symmetry transformations is explicitly encoded in GCNNs; a classical CNN being the simplest example. In particular, we present a SO(3)-equivariant neural network architecture for spherical data, that operates entirely in Fourier space, while also providing a formalism for the design of fully Fourier neural networks that are equivariant to the action of any continuous compact group.

• [stat.ML]Extreme Value Theory for Open Set Classification - GPD and GEV Classifiers
Edoardo Vignotto, Sebastian Engelke
http://arxiv.org/abs/1808.09902v2

Classification tasks usually assume that all possible classes are present during the training phase. This is restrictive if the algorithm is used over a long time and possibly encounters samples from unknown classes. The recently introduced extreme value machine, a classifier motivated by extreme value theory, addresses this problem and achieves competitive performance in specific cases. We show that this algorithm can fail when the geometries of known and unknown classes differ. To overcome this problem, we propose two new algorithms relying on approximations from extreme value theory. We show the effectiveness of our classifiers in simulations and on the LETTER and MNIST data sets.

• [stat.ML]Nested multi-instance classification
Alexander Stec, Diego Klabjan, Jean Utke
http://arxiv.org/abs/1808.10430v1

There are classification tasks that take as inputs groups of images rather than single images. In order to address such situations, we introduce a nested multi-instance deep network. The approach is generic in that it is applicable to general data instances, not just images. The network has several convolutional neural networks grouped together at different stages. This primarily differs from previous works in that we organize instances into relevant groups that are treated differently. We also introduce a method to replace missing instances, which successfully creates neutral input instances and consistently outperforms standard fill-in methods in real-world use cases. In addition, we propose a method for manual dropout when a whole group of instances is missing, which allows us to use richer training data and obtain higher accuracy at the end of training. With specific pretraining, we find that the model works to great effect on our real-world and public datasets in comparison to baseline methods, justifying the different treatment among groups of instances.

• [stat.ML]Physically-inspired Gaussian processes for transcriptional regulation in Drosophila melanogaster
Andrés F. López-Lopera, Nicolas Durrande, Mauricio A. Alvarez
http://arxiv.org/abs/1808.10026v1

The regulatory process in Drosophila melanogaster is thoroughly studied for understanding several principles in systems biology. Since transcriptional regulation of the Drosophila depends on spatiotemporal interactions between mRNA expressions and gap-gene proteins, proper physically-inspired stochastic models are required to describe the existing link between both biological quantities. Many studies have shown that the use of Gaussian processes (GPs) and differential equations yields promising inference results when modelling regulatory processes. In order to exploit the benefits of GPs, two types of physically-inspired GPs based on the reaction-diffusion equation are further investigated in this paper. The main difference between the two approaches lies in where the GP prior is placed: either over mRNA expressions or over protein concentrations. Contrary to other stochastic frameworks, discretising the spatial domain is not required here. Both GP models are tested under different conditions depending on the availability of biological data. Finally, their performances are assessed using a high-resolution dataset describing the blastoderm stage of the early embryo of Drosophila.

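cs.AI - Artificial Intelligence
cs.CL - Computation and Language
cs.CR - Cryptography and Security
cs.CV - Computer Vision and Pattern Recognition
cs.DC - Distributed, Parallel, and Cluster Computing
cs.DS - Data Structures and Algorithms
cs.ET - Emerging Technologies
cs.IR - Information Retrieval
cs.IT - Information Theory
cs.LG - Machine Learning
cs.LO - Logic in Computer Science
cs.NE - Neural and Evolutionary Computing
cs.RO - Robotics
cs.SI - Social and Information Networks
eess.IV - Image and Video Processing
math.ST - Statistics Theory
physics.bio-ph - Biological Physics
q-bio.PE - Populations and Evolution
stat.AP - Applications (Statistics)
stat.ME - Methodology (Statistics)
stat.ML - Machine Learning (Statistics)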

• [cs.AI]Boosting Binary Optimization via Binary Classification: A Case Study of Job Shop Scheduling
• [cs.AI]Gibson Env: Real-World Perception for Embodied Agents
• [cs.AI]Multi-Hop Knowledge Graph Reasoning with Reward Shaping
• [cs.AI]Using a Game Engine to Simulate Critical Incidents and Data Collection by Autonomous Drones
• [cs.CL]An Empirical Analysis of the Role of Amplifiers, Downtoners, and Negations in Emotion Classification in Microblogs
• [cs.CL]Beyond Weight Tying: Learning Joint Input-Output Embeddings for Neural Machine Translation
• [cs.CL]Bottom-Up Abstractive Summarization
• [cs.CL]Cognate-aware morphological segmentation for multilingual neural translation
• [cs.CL]Do Language Models Understand Anything? On the Ability of LSTMs to Understand Negative Polarity Items
• [cs.CL]Ensemble Sequence Level Training for Multimodal MT: OSU-Baidu WMT18 Multimodal Machine Translation System Report
• [cs.CL]Explicit State Tracking with Semi-Supervision for Neural Dialogue Generation
• [cs.CL]Extracting Keywords from Open-Ended Business Survey Questions
• [cs.CL]How agents see things: On visual representations in an emergent language game
• [cs.CL]Imitation Learning for Neural Morphological String Transduction
• [cs.CL]Learning to Describe Differences Between Pairs of Similar Images
• [cs.CL]Multimodal Continuous Turn-Taking Prediction Using Multiscale RNNs
• [cs.CL]Retrieve-and-Read: Multi-task Learning of Information Retrieval and Reading Comprehension
• [cs.CL]Spherical Latent Spaces for Stable Variational Autoencoders
• [cs.CL]The MeMAD Submission to the WMT18 Multimodal Translation Task
• [cs.CR]Improve Blockchain Performance using Graph Data Structure and Parallel Mining
• [cs.CV]A Unified Mammogram Analysis Method via Hybrid Deep Supervision
• [cs.CV]Fully Dense UNet for 2D Sparse Photoacoustic Tomography Artifact Removal
• [cs.CV]MobiBits: Multimodal Mobile Biometric Database
• [cs.CV]Multi-Cell Multi-Task Convolutional Neural Networks for Diabetic Retinopathy Grading Kang
• [cs.CV]Seeing Colors: Learning Semantic Text Encoding for Classification
• [cs.CV]Spoofing PRNU Patterns of Iris Sensors while Preserving Iris Recognition
• [cs.DC]Scalable Manifold Learning for Big Data with Apache Spark
• [cs.DS]Graph reduction by local variation
• [cs.ET]Learning in Memristive Neural Network Architectures using Analog Backpropagation Circuits
• [cs.IR]Content-based feature exploration for transparent music recommendation using self-attentive genre classification
• [cs.IR]Spectral Collaborative Filtering
• [cs.IT]Impact of Device Orientation on Error Performance of LiFi Systems
• [cs.LG]A Multi-layer Gaussian Process for Motor Symptom Estimation in People with Parkinson's Disease
• [cs.LG]A novel extension of Generalized Low-Rank Approximation of Matrices based on multiple-pairs of transformations
• [cs.LG]A novel graph-based model for hybrid recommendations in cold-start scenarios
• [cs.LG]APES: a Python toolbox for simulating reinforcement learning environments
• [cs.LG]Adaptation and Robust Learning of Probabilistic Movement Primitives
• [cs.LG]Application of Self-Play Reinforcement Learning to a Four-Player Game of Imperfect Information
• [cs.LG]Bayesian Classifier for Route Prediction with Markov Chains
• [cs.LG]Directed Exploration in PAC Model-Free Reinforcement Learning
• [cs.LG]Learning Data-adaptive Nonparametric Kernels
• [cs.LG]Open Source Dataset and Machine Learning Techniques for Automatic Recognition of Historical Graffiti
• [cs.LG]Proximity Forest: An effective and scalable distance-based classifier for time series
• [cs.LG]Tensor Embedding: A Supervised Framework for Human Behavioral Data Mining and Prediction
• [cs.LO]Finite LTL Synthesis with Environment Assumptions and Quality Measures
• [cs.NE]Algoritmos Genéticos Aplicado ao Problema de Roteamento de Veículos
• [cs.NE]Autonomous Configuration of Network Parameters in Operating Systems using Evolutionary Algorithms
• [cs.RO]Bioinspired Straight Walking Task-Space Planner
• [cs.RO]Gradual Collective Upgrade of a Swarm of Autonomous Buoys for Dynamic Ocean Monitoring
• [cs.RO]Modified Self-Organized Task Allocation in a Group of Robots
• [cs.RO]PythonRobotics: a Python code collection of robotics algorithms
• [cs.SI]Diversity, Topology, and the Risk of Node Re-identification in Labeled Social Graphs
• [cs.SI]Influence Dynamics and Consensus in an Opinion-Neighborhood based Modified Vicsek-like Social Network
• [cs.SI]Securing Tag-based recommender systems against profile injection attacks: A comparative study
• [eess.IV]Automatic Lung Cancer Prediction from Chest X-ray Images Using Deep Learning Approach
• [math.ST]Asymptotic Seed Bias in Respondent-driven Sampling
• [math.ST]Bayesian quadrature and energy minimization for space-filling design
• [math.ST]Determining the signal dimension in second order source separation
• [math.ST]On Second Order Conditions in the Multivariate Block Maxima and Peak over Threshold Method
• [math.ST]Sup-norm adaptive simultaneous drift estimation for ergodic diffusions
• [physics.bio-ph]Maximum Entropy Principle Analysis in Network Systems with Short-time Recordings
• [q-bio.PE]A global model for predicting the arrival of imported dengue infections
• [stat.AP]Penalized Component Hub Models
• [stat.AP]The Causal Effect of Answer Changing on Multiple-Choice Items
• [stat.AP]Understanding the Characteristics of Frequent Users of Emergency Departments: What Role Do Medical Conditions Play?
• [stat.ME]An explicit mean-covariance parameterization for multivariate response linear regression
• [stat.ME]Gaussian process regression for survival time prediction with genome-wide gene expression
• [stat.ME]Generalized probabilistic principal component analysis of correlated data
• [stat.ME]Improved Chebyshev inequality: new probability bounds with known supremum of PDF
• [stat.ML]Data-driven discovery of PDEs in complex datasets
• [stat.ML]On the Minimal Supervision for Training Any Binary Classifier from Only Unlabeled Data
• [stat.ML]Speaker Fluency Level Classification Using Machine Learning Techniques

·····································

• [cs.AI]Boosting Binary Optimization via Binary Classification: A Case Study of Job Shop Scheduling
Oleg V. Shylo, Hesam Shams
http://arxiv.org/abs/1808.10813v1

Many optimization techniques evaluate solutions consecutively, where the next candidate for evaluation is determined by the results of previous evaluations. For example, these include iterative methods, "black box" optimization algorithms, simulated annealing, evolutionary algorithms and tabu search, to name a few. When solving an optimization problem, these algorithms evaluate a large number of solutions, which raises the following question: Is it possible to learn something about the optimum using these solutions? In this paper, we define this "learning" question in terms of a logistic regression model and explore its predictive accuracy computationally. The proposed model uses a collection of solutions to predict the components of the optimal solutions. To illustrate the utility of such predictions, we embed the logistic regression model into the tabu search algorithm for the job shop scheduling problem. The resulting framework is simple to implement, yet provides a significant boost to the performance of the standard tabu search.

• [cs.AI]Gibson Env: Real-World Perception for Embodied Agents
Fei Xia, Amir Zamir, Zhi-Yang He, Alexander Sax, Jitendra Malik, Silvio Savarese
http://arxiv.org/abs/1808.10654v1

Developing visual perception models for active agents and sensorimotor control is cumbersome in the physical world, as existing algorithms are too slow to learn efficiently in real time and robots are fragile and costly. This has given rise to learning-in-simulation, which in turn raises the question of whether the results transfer to the real world. In this paper, we are concerned with the problem of developing real-world perception for active agents, propose the Gibson Virtual Environment for this purpose, and showcase sample perceptual tasks learned therein. Gibson is based on virtualizing real spaces, rather than using artificially designed ones, and currently includes over 1400 floor spaces from 572 full buildings. The main characteristics of Gibson are: I. being from the real world and reflecting its semantic complexity, II. having an internal synthesis mechanism, "Goggles", which enables deploying the trained models in the real world without further domain adaptation, III. embodiment of agents, making them subject to the constraints of physics and space.

• [cs.AI]Multi-Hop Knowledge Graph Reasoning with Reward Shaping
Xi Victoria Lin, Richard Socher, Caiming Xiong
http://arxiv.org/abs/1808.10568v1

Multi-hop reasoning is an effective approach for query answering (QA) over incomplete knowledge graphs (KGs). The problem can be formulated in a reinforcement learning (RL) setup, where a policy-based agent sequentially extends its inference path until it reaches a target. However, in an incomplete KG environment, the agent receives low-quality rewards corrupted by false negatives in the training data, which harms generalization at test time. Furthermore, since no golden action sequence is used for training, the agent can be misled by spurious search trajectories that incidentally lead to the correct answer. We propose two modeling advances to address both issues: (1) we reduce the impact of false negative supervision by adopting a pretrained one-hop embedding model to estimate the reward of unobserved facts; (2) we counter the sensitivity to spurious paths of on-policy RL by forcing the agent to explore a diverse set of paths using randomly generated edge masks. Our approach significantly improves over existing path-based KGQA models on several benchmark datasets and is comparable or better than embedding-based models.
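
The two ideas in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the clipping of the score to [0, 1], and the rule of always keeping at least one edge are assumptions made for illustration, and `embedding_score` stands in for the output of whatever pretrained one-hop embedding model is used.

```python
import random


def shaped_reward(is_observed_fact, embedding_score):
    """Return 1.0 for a fact observed in the KG, else a soft estimate in [0, 1]."""
    if is_observed_fact:
        return 1.0
    # Hypothetical input: the score a pretrained one-hop embedding model
    # assigns to the unobserved fact, clipped to the unit interval.
    return max(0.0, min(1.0, embedding_score))


def mask_edges(outgoing_edges, drop_rate, rng):
    """Randomly hide a fraction of outgoing edges to force diverse exploration."""
    kept = [edge for edge in outgoing_edges if rng.random() >= drop_rate]
    # Assumption: never leave the agent without any action to take.
    return kept if kept else [rng.choice(outgoing_edges)]


rng = random.Random(0)
edges = [("born_in", "Paris"), ("works_for", "Acme"), ("lives_in", "Lyon")]
visible = mask_edges(edges, drop_rate=0.5, rng=rng)
```

At each step the agent would sample its next action only from `visible`, so repeated rollouts see different subsets of paths.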

• [cs.AI]Using a Game Engine to Simulate Critical Incidents and Data Collection by Autonomous Drones
David L. Smyth, Frank G. Glavin, Michael G. Madden
http://arxiv.org/abs/1808.10784v1

Using a game engine, we have developed a virtual environment which models important aspects of critical incident scenarios. We focused on modelling phenomena relating to the identification and gathering of key forensic evidence, in order to develop and test a system which can handle chemical, biological, radiological/nuclear or explosive (CBRNe) events autonomously. This allows us to build and validate AI-based technologies, which can be trained and tested in our custom virtual environment before being deployed in real-world scenarios. We have used our virtual scenario to rapidly prototype a system which can use simulated Remote Aerial Vehicles (RAVs) to gather images from the environment for the purpose of mapping. Our environment provides us with an effective medium through which we can develop and test various AI methodologies for critical incident scene assessment, in a safe and controlled manner.

• [cs.CL]An Empirical Analysis of the Role of Amplifiers, Downtoners, and Negations in Emotion Classification in Microblogs
Florian Strohm, Roman Klinger
http://arxiv.org/abs/1808.10653v1

The effect of amplifiers, downtoners, and negations has been studied in general and particularly in the context of sentiment analysis. However, there is only limited work which aims at transferring the results and methods to discrete classes of emotions, e.g., joy, anger, fear, sadness, surprise, and disgust. For instance, it is not straightforward to interpret which emotion the phrase "not happy" expresses. With this paper, we aim at obtaining a better understanding of such modifiers in the context of emotion-bearing words and their impact on document-level emotion classification, namely, microposts on Twitter. We select an appropriate scope detection method for modifiers of emotion words, incorporate it in a document-level emotion classification model as an additional bag of words and show that this approach improves the performance of emotion classification. In addition, we build a term weighting approach based on the different modifiers into a lexical model for the analysis of the semantics of modifiers and their impact on emotion meaning. We show that amplifiers separate emotions expressed with an emotion-bearing word more clearly from other secondary connotations. Downtoners have the opposite effect. In addition, we discuss the meaning of negations of emotion-bearing words. For instance, we show empirically that "not happy" is closer to sadness than to anger and that fear-expressing words in the scope of downtoners often express surprise.

• [cs.CL]Beyond Weight Tying: Learning Joint Input-Output Embeddings for Neural Machine Translation
Nikolaos Pappas, Lesly Miculicich Werlen, James Henderson
http://arxiv.org/abs/1808.10681v1

Tying the weights of the target word embeddings with the target word classifiers of neural machine translation models leads to faster training and often to better translation quality. Given the success of this parameter sharing, we investigate other forms of sharing in between no sharing and hard equality of parameters. In particular, we propose a structure-aware output layer which captures the semantic structure of the output space of words within a joint input-output embedding. The model is a generalized form of weight tying which shares parameters but allows learning a more flexible relationship with input word embeddings and allows the effective capacity of the output layer to be controlled. In addition, the model shares weights across output classifiers and translation contexts which allows it to better leverage prior knowledge about them. Our evaluation on English-to-Finnish and English-to-German datasets shows the effectiveness of the method against strong encoder-decoder baselines trained with or without weight tying.
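
As a point of reference, plain (hard) weight tying can be sketched in a few lines of Python. This shows only the baseline the abstract starts from, not the proposed structure-aware output layer; the function name and the toy values are made up for illustration.

```python
def logits_tied(hidden, embeddings):
    """Score each vocabulary word by the dot product of the decoder hidden
    state with that word's embedding, so the output classifier weights are
    the embedding matrix itself."""
    return [sum(h * e for h, e in zip(hidden, row)) for row in embeddings]


# Toy vocabulary of 3 words with 2-dimensional embeddings (made-up values).
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hidden = [2.0, 3.0]
scores = logits_tied(hidden, embeddings)
```

Because the same matrix serves both as the embedding table and as the classifier, no separate output weight matrix has to be learned; the approaches "in between" mentioned in the abstract relax this hard equality.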

• [cs.CL]Bottom-Up Abstractive Summarization
Sebastian Gehrmann, Yuntian Deng, Alexander M. Rush
http://arxiv.org/abs/1808.10792v1

Neural network-based methods for abstractive summarization produce outputs that are more fluent than other techniques, but which can be poor at content selection. This work proposes a simple technique for addressing this issue: use a data-efficient content selector to over-determine phrases in a source document that should be part of the summary. We use this selector as a bottom-up attention step to constrain the model to likely phrases. We show that this approach improves the ability to compress text, while still generating fluent summaries. This two-step process is both simpler and higher performing than other end-to-end content selection models, leading to significant improvements on ROUGE for both the CNN-DM and NYT corpus. Furthermore, the content selector can be trained with as little as 1,000 sentences, making it easy to transfer a trained summarizer to a new domain.

• [cs.CL]Cognate-aware morphological segmentation for multilingual neural translation
Stig-Arne Grönroos, Sami Virpioja, Mikko Kurimo
http://arxiv.org/abs/1808.10791v1

This article describes the Aalto University entry to the WMT18 News Translation Shared Task. We participate in the multilingual subtrack with a system trained under the constrained condition to translate from English to both Finnish and Estonian. The system is based on the Transformer model. We focus on improving the consistency of morphological segmentation for words that are similar orthographically, semantically, and distributionally; such words include etymological cognates, loan words, and proper names. For this, we introduce Cognate Morfessor, a multilingual variant of the Morfessor method. We show that our approach improves the translation quality particularly for Estonian, which has fewer resources for training the translation model.

• [cs.CL]Do Language Models Understand Anything? On the Ability of LSTMs to Understand Negative Polarity Items
Jaap Jumelet, Dieuwke Hupkes
http://arxiv.org/abs/1808.10627v1

In this paper, we attempt to link the inner workings of a neural language model to linguistic theory, focusing on a complex phenomenon well discussed in formal linguistics: (negative) polarity items. We briefly discuss the leading hypotheses about the licensing contexts that allow negative polarity items and evaluate to what extent a neural language model has the ability to correctly process a subset of such constructions. We show that the model finds a relation between the licensing context and the negative polarity item and appears to be aware of the scope of this context, which we extract from a parse tree of the sentence. With this research, we hope to pave the way for other studies linking formal linguistics to deep learning.

• [cs.CL]Ensemble Sequence Level Training for Multimodal MT: OSU-Baidu WMT18 Multimodal Machine Translation System Report
Renjie Zheng, Yilin Yang, Mingbo Ma, Liang Huang
http://arxiv.org/abs/1808.10592v1

This paper describes multimodal machine translation systems developed jointly by Oregon State University and Baidu Research for WMT 2018 Shared Task on multimodal translation. In this paper, we introduce a simple approach to incorporate image information by feeding image features to the decoder side. We also explore different sequence level training methods including scheduled sampling and reinforcement learning which lead to substantial improvements. Our systems ensemble several models using different architectures and training methods and achieve the best performance for three subtasks: En-De and En-Cs in task 1 and (En+De+Fr)-Cs task 1B.

• [cs.CL]Explicit State Tracking with Semi-Supervision for Neural Dialogue Generation
Xisen Jin, Wenqiang Lei, Zhaochun Ren, Hongshen Chen, Shangsong Liang, Yihong Zhao, Dawei Yin
http://arxiv.org/abs/1808.10596v1

The task of dialogue generation aims to automatically provide responses given previous utterances. Tracking dialogue states is an important ingredient in dialogue generation for estimating users' intention. However, the expensive nature of state labeling and the weak interpretability make dialogue state tracking a challenging problem for both task-oriented and non-task-oriented dialogue generation: for generating responses in task-oriented dialogues, state tracking is usually learned from manually annotated corpora, where the human annotation is expensive for training; for generating responses in non-task-oriented dialogues, most existing work neglects explicit state tracking due to the unlimited number of dialogue states. In this paper, we propose the semi-supervised explicit dialogue state tracker (SEDST) for neural dialogue generation. To this end, our approach has two core ingredients: CopyFlowNet and posterior regularization. Specifically, we propose an encoder-decoder architecture, named CopyFlowNet, to represent an explicit dialogue state with a probabilistic distribution over the vocabulary space. To optimize the training procedure, we apply a posterior regularization strategy to integrate indirect supervision. Extensive experiments conducted on both task-oriented and non-task-oriented dialogue corpora demonstrate the effectiveness of our proposed model. Moreover, we find that our proposed semi-supervised dialogue state tracker achieves performance comparable to state-of-the-art supervised learning baselines in the state tracking procedure.

• [cs.CL]Extracting Keywords from Open-Ended Business Survey Questions
Barbara McGillivray, Gard Jenset, Dominik Heil
http://arxiv.org/abs/1808.10685v1

Open-ended survey data constitute an important basis in research as well as for making business decisions. Collecting and manually analysing free-text survey data is generally more costly than collecting and analysing survey data consisting of answers to multiple-choice questions. Yet free-text data allow for new content to be expressed beyond predefined categories and are a very valuable source of new insights into people's opinions. At the same time, surveys always make ontological assumptions about the nature of the entities that are researched, and this has vital ethical consequences. Human interpretations and opinions can only be properly ascertained in their richness using textual data sources; if these sources are analyzed appropriately, the essential linguistic nature of humans and social entities is safeguarded. Natural Language Processing (NLP) offers possibilities for meeting this ethical business challenge by automating the analysis of natural language and thus allowing for insightful investigations of human judgements. We present a computational pipeline for analysing large amounts of responses to open-ended questions in surveys and extracting keywords that appropriately represent people's opinions. This pipeline addresses the need to perform such tasks outside the scope of both commercial software and bespoke analysis, exceeds the performance of state-of-the-art systems, and performs this task in a transparent way that allows for scrutinising and exposing potential biases in the analysis. Following the principle of Open Data Science, our code is open-source and generalizable to other datasets.

• [cs.CL]How agents see things: On visual representations in an emergent language game
Diane Bouchacourt, Marco Baroni
http://arxiv.org/abs/1808.10696v1

There is growing interest in the language developed by agents interacting in emergent-communication settings. Earlier studies have focused on the agents' symbol usage, rather than on their representation of visual input. In this paper, we consider the referential games of Lazaridou et al. (2017), and investigate the representations the agents develop during their evolving interaction. We find that the agents establish successful communication by inducing visual representations that almost perfectly align with each other, but, surprisingly, do not capture the conceptual properties of the objects depicted in the input images. We conclude that, if we care about developing language-like communication systems, we must pay more attention to the visual semantics agents associate to the symbols they use.

• [cs.CL]Imitation Learning for Neural Morphological String Transduction
Peter Makarov, Simon Clematide
http://arxiv.org/abs/1808.10701v1

We employ imitation learning to train a neural transition-based string transducer for morphological tasks such as inflection generation and lemmatization. Previous approaches to training this type of model either rely on an external character aligner for the production of gold action sequences, which results in a suboptimal model due to the unwarranted dependence on a single gold action sequence despite spurious ambiguity, or require warm starting with an MLE model. Our approach only requires a simple expert policy, eliminating the need for a character aligner or warm start. It also addresses familiar MLE training biases and leads to strong and state-of-the-art performance on several benchmarks.

• [cs.CL]Learning to Describe Differences Between Pairs of Similar Images
Harsh Jhamtani, Taylor Berg-Kirkpatrick
http://arxiv.org/abs/1808.10584v1

In this paper, we introduce the task of automatically generating text to describe the differences between two similar images. We collect a new dataset by crowd-sourcing difference descriptions for pairs of image frames extracted from video-surveillance footage. Annotators were asked to succinctly describe all the differences in a short paragraph. As a result, our novel dataset provides an opportunity to explore models that align language and vision, and capture visual salience. The dataset may also be a useful benchmark for coherent multi-sentence generation. We perform a first-pass visual analysis that exposes clusters of differing pixels as a proxy for object-level differences. We propose a model that captures visual salience by using a latent variable to align clusters of differing pixels with output sentences. We find that, for both single-sentence and multi-sentence generation, the proposed model outperforms the models that use attention alone.

• [cs.CL]Multimodal Continuous Turn-Taking Prediction Using Multiscale RNNs
Matthew Roddy, Gabriel Skantze, Naomi Harte
http://arxiv.org/abs/1808.10785v1

In human conversational interactions, turn-taking exchanges can be coordinated using cues from multiple modalities. To design spoken dialog systems that can conduct fluid interactions it is desirable to incorporate cues from separate modalities into turn-taking models. We propose that there is an appropriate temporal granularity at which modalities should be modeled. We design a multiscale RNN architecture to model modalities at separate timescales in a continuous manner. Our results show that modeling linguistic and acoustic features at separate temporal rates can be beneficial for turn-taking modeling. We also show that our approach can be used to incorporate gaze features into turn-taking models.

• [cs.CL]Retrieve-and-Read: Multi-task Learning of Information Retrieval and Reading Comprehension
Kyosuke Nishida, Itsumi Saito, Atsushi Otsuka, Hisako Asano, Junji Tomita
http://arxiv.org/abs/1808.10628v1

This study considers the task of machine reading at scale (MRS) wherein, given a question, a system first performs the information retrieval (IR) task of finding relevant passages in a knowledge source and then carries out the reading comprehension (RC) task of extracting an answer span from the passages. Previous MRS studies, in which the IR component was trained without considering answer spans, struggled to accurately find a small number of relevant passages from a large set of passages. In this paper, we propose a simple and effective approach that incorporates the IR and RC tasks by using supervised multi-task learning in order that the IR component can be trained by considering answer spans. Experimental results on the standard benchmark, answering SQuAD questions using the full Wikipedia as the knowledge source, showed that our model achieved state-of-the-art performance. Moreover, we thoroughly evaluated the individual contributions of our model components with our new Japanese dataset and SQuAD. The results showed significant improvements in the IR task and provided a new perspective on IR for RC: it is effective to teach which part of the passage answers the question rather than to give only a relevance score to the whole passage.

• [cs.CL]Spherical Latent Spaces for Stable Variational Autoencoders
Jiacheng Xu, Greg Durrett
http://arxiv.org/abs/1808.10805v1

A hallmark of variational autoencoders (VAEs) for text processing is their combination of powerful encoder-decoder models, such as LSTMs, with simple latent distributions, typically multivariate Gaussians. These models pose a difficult optimization problem: there is an especially bad local optimum where the variational posterior always equals the prior and the model does not use the latent variable at all, a kind of "collapse" which is encouraged by the KL divergence term of the objective. In this work, we experiment with another choice of latent distribution, namely the von Mises-Fisher (vMF) distribution, which places mass on the surface of the unit hypersphere. With this choice of prior and posterior, the KL divergence term now only depends on the variance of the vMF distribution, giving us the ability to treat it as a fixed hyperparameter. We show that doing so not only averts the KL collapse, but consistently gives better likelihoods than Gaussians across a range of modeling conditions, including recurrent language modeling and bag-of-words document modeling. An analysis of the properties of our vMF representations shows that they learn richer and more nuanced structures in their latent representations than their Gaussian counterparts.

• [cs.CL]The MeMAD Submission to the WMT18 Multimodal Translation Task
Stig-Arne Grönroos, Benoit Huet, Mikko Kurimo, Jorma Laaksonen, Bernard Merialdo, Phu Pham, Mats Sjöberg, Umut Sulubacak, Jörg Tiedemann, Raphael Troncy, Raúl Vázquez
http://arxiv.org/abs/1808.10802v1

This paper describes the MeMAD project entry to the WMT Multimodal Machine Translation Shared Task. We propose adapting the Transformer neural machine translation (NMT) architecture to a multi-modal setting. In this paper, we also describe the preliminary experiments with text-only translation systems leading us up to this choice. We have the top scoring system for English-to-German and fifth for English-to-French, according to the automatic metrics for flickr18. Our experiments show that the effect of the visual features in our system is small. Our largest gains come from the quality of the underlying text-only NMT system. We find that appropriate use of additional data is effective.

• [cs.CR]Improve Blockchain Performance using Graph Data Structure and Parallel Mining
Jia Kan, Shangzhe Chen, Xin Huang
http://arxiv.org/abs/1808.10810v1

Blockchain technology is ushering in another breakout year, yet the challenges of blockchain remain to be solved. This paper analyzes the features of the blockchain-based Bitcoin and Bitcoin-NG systems and proposes an improved method of implementing blockchain systems that replaces the original chain structure with a graph data structure, named GraphChain. Each block represents a transaction and contains the balance status of the traders. Additionally, all transactions in the Bitcoin system are packaged by only one miner, which results in a lot of wasted effort, so another way to improve resource utilization is to replace the original competition among miners with election and parallel mining. The researchers simulated a blockchain with graph structure and parallel mining in Python, and propose a new conceptual graph model which can improve both capacity and performance.

• [cs.CV]A Unified Mammogram Analysis Method via Hybrid Deep Supervision
Rongzhao Zhang, Han Zhang, Albert C. S. Chung
http://arxiv.org/abs/1808.10646v1

Automatic mammogram classification and mass segmentation play a critical role in a computer-aided mammogram screening system. In this work, we present a unified mammogram analysis framework for both whole-mammogram classification and segmentation. Our model is designed based on a deep U-Net with residual connections, and equipped with the novel hybrid deep supervision (HDS) scheme for end-to-end multi-task learning. As an extension of deep supervision (DS), HDS not only can force the model to learn more discriminative features like DS, but also seamlessly integrates segmentation and classification tasks into one model, thus the model can benefit from both pixel-wise and image-wise supervisions. We extensively validate the proposed method on the widely-used INbreast dataset. Ablation study corroborates that pixel-wise and image-wise supervisions are mutually beneficial, evidencing the efficacy of HDS. The results of 5-fold cross validation indicate that our unified model matches state-of-the-art performance on both mammogram segmentation and classification tasks, which achieves an average segmentation Dice similarity coefficient (DSC) of 0.85 and a classification accuracy of 0.89. The code is available at https://github.com/angrypudding/hybrid-ds.
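
For readers unfamiliar with the segmentation metric reported above, the Dice similarity coefficient of two binary masks P and T is 2|P ∩ T| / (|P| + |T|). The following is a minimal Python sketch on flattened masks; the function name and the handling of two empty masks are assumptions for illustration, not taken from the paper's code.

```python
def dice_coefficient(pred, target):
    """Dice = 2*|P ∩ T| / (|P| + |T|) for flat binary masks (lists of 0/1)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    # With 0/1 entries, the elementwise product counts overlapping pixels.
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


predicted_mask = [1, 1, 0, 0]
reference_mask = [1, 0, 0, 0]
dsc = dice_coefficient(predicted_mask, reference_mask)
```

In this toy case one pixel overlaps, the masks contain two and one foreground pixels respectively, and the score is 2·1 / (2 + 1).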

• [cs.CV]Fully Dense UNet for 2D Sparse Photoacoustic Tomography Artifact Removal
Steven Guan, Amir Khan, Siddhartha Sikdar, Parag V. Chitnis
http://arxiv.org/abs/1808.10848v1

Photoacoustic imaging is an emerging imaging modality that is based upon the photoacoustic effect. In photoacoustic tomography (PAT), the induced acoustic pressure waves are measured by an array of detectors and used to reconstruct an image of the initial pressure distribution. A common challenge faced in PAT is that the measured acoustic waves can only be sparsely sampled. Reconstructing sparsely sampled data using standard methods results in severe artifacts that obscure information within the image. We propose a novel convolutional neural network (CNN) architecture termed Fully Dense UNet (FD-UNet) for removing artifacts from 2D PAT images reconstructed from sparse data and compare the proposed CNN with the standard UNet in terms of reconstructed image quality.

• [cs.CV]MobiBits: Multimodal Mobile Biometric Database
Ewelina Bartuzi, Katarzyna Roszczewska, Mateusz Trokielewicz, Radosław Białobrzeski
http://arxiv.org/abs/1808.10710v1

This paper presents a novel database comprising representations of five different biometric characteristics, collected in a mobile, unconstrained or semi-constrained setting with three different mobile devices, including characteristics previously unavailable in existing datasets, namely hand images, thermal hand images, and thermal face images, all acquired with a mobile, off-the-shelf device. In addition to this collection of data, we perform an extensive set of experiments providing insight into the benchmark recognition performance that can be achieved with these data, carried out with existing commercial and academic biometric solutions. To our knowledge, this is the first mobile biometric database to introduce samples of biometric traits such as thermal hand images and thermal face images. We hope that this contribution will make a valuable addition to the already existing databases and enable new experiments and studies in the field of mobile authentication. The MobiBits database is made publicly available to the research community at no cost for non-commercial purposes.

• [cs.CV]Multi-Cell Multi-Task Convolutional Neural Networks for Diabetic Retinopathy Grading
Kang Zhou, Zaiwang Gu, Wen Liu, Weixin Luo, Jun Cheng, Shenghua Gao, Jiang Liu
http://arxiv.org/abs/1808.10564v1

Diabetic Retinopathy (DR) is a non-negligible eye disease among patients with Diabetes Mellitus, and automatic retinal image analysis algorithms for DR screening are in high demand. Retinal images have very high resolution: small pathological tissues can be detected only in high-resolution images, and large local receptive fields are required to identify late-stage disease. However, directly training a very deep neural network on high-resolution images is both computationally expensive and difficult because of the vanishing/exploding gradient problem. We therefore propose a Multi-Cell architecture which gradually increases the depth of the deep neural network and the resolution of the input image, which both speeds up training and improves the classification accuracy. Further, the different stages of DR progress gradually, which means the labels of different stages are related. To account for the relationships between images of different stages, we propose a Multi-Task learning strategy which predicts the label with both classification and regression. Experimental results on the Kaggle dataset show that our method achieves a Kappa of 0.841 on the test set, which ranks 4th among all state-of-the-art methods. Further, our Multi-Cell Multi-Task Convolutional Neural Networks (M²CNN) solution is a general framework, which can be readily integrated with many other deep neural network architectures.

• [cs.CV]Seeing Colors: Learning Semantic Text Encoding for Classification
Shah Nawaz, Alessandro Calefati, Muhammad Kamran Janjua, Ignazio Gallo
http://arxiv.org/abs/1808.10822v1

The question we answer with this work is: can we convert a text document into an image in order to exploit the best image classification models for classifying documents? To answer this question, we present a novel text classification method which converts a text document into an encoded image, using word embeddings and the capabilities of Convolutional Neural Networks (CNNs), which have been successfully employed in image classification. We evaluate our approach by obtaining promising results on some well-known benchmark datasets for text classification. This work allows the application of many of the advanced CNN architectures developed for Computer Vision to Natural Language Processing. We test the proposed approach on a multi-modal dataset, proving that it is possible to use a single deep model to represent text and image in the same feature space.

• [cs.CV]Spoofing PRNU Patterns of Iris Sensors while Preserving Iris Recognition
Sudipta Banerjee, Vahid Mirjalili, Arun Ross
http://arxiv.org/abs/1808.10765v1

The principle of Photo Response Non-Uniformity (PRNU) is used to link an image with its source, i.e., the sensor that produced it. In this work, we investigate if it is possible to modify an iris image acquired using one sensor in order to spoof the PRNU noise pattern of a different sensor. In this regard, we develop an image perturbation routine that iteratively modifies blocks of pixels in the original iris image such that its PRNU pattern approaches that of a target sensor. Experiments indicate the efficacy of the proposed perturbation method in spoofing PRNU patterns present in an iris image whilst still retaining its biometric content.

• [cs.DC]Scalable Manifold Learning for Big Data with Apache Spark
Frank Schoeneman, Jaroslaw Zola
http://arxiv.org/abs/1808.10776v1

Non-linear spectral dimensionality reduction methods, such as Isomap, remain an important technique for learning manifolds. However, due to computational complexity, exact manifold learning using Isomap is currently impossible for large-scale data. In this paper, we propose a distributed memory framework implementing end-to-end exact Isomap under the Apache Spark model. We show how each critical step of the Isomap algorithm can be efficiently realized using the basic Spark model, without the need to provision data in secondary storage. We show how the entire method can be implemented using PySpark, offloading compute-intensive linear algebra routines to BLAS. Through experimental results, we demonstrate excellent scalability of our method, and we show that it can process datasets orders of magnitude larger than what is currently possible, using a 25-node parallel cluster.

• [cs.DS]Graph reduction by local variation
Andreas Loukas
http://arxiv.org/abs/1808.10650v1

How can we reduce the size of a graph without significantly altering its basic properties? We approach the graph reduction problem from the perspective of restricted similarity, a modification of a well-known measure for graph approximation. Our choice is motivated by the observation that restricted similarity implies strong spectral guarantees and can be used to prove statements about certain unsupervised learning problems. The paper then focuses on coarsening, a popular type of graph reduction. We derive sufficient conditions for a small graph to approximate a larger one in the sense of restricted similarity. Our theoretical findings give rise to a novel quasi-linear algorithm. Compared to both standard and advanced graph reduction methods, the proposed algorithm finds coarse graphs of improved quality -often by a large margin- without sacrificing speed.

• [cs.ET]Learning in Memristive Neural Network Architectures using Analog Backpropagation Circuits
Olga Krestinskaya, Khaled Nabil Salama, Alex Pappachen James
http://arxiv.org/abs/1808.10631v1

The on-chip implementation of learning algorithms would speed up the training of neural networks in crossbar arrays. The circuit-level design and implementation of the backpropagation algorithm using gradient descent for neural network architectures is an open problem. In this paper, we propose analog backpropagation learning circuits for various memristive learning architectures, such as Deep Neural Network (DNN), Binary Neural Network (BNN), Multiple Neural Network (MNN), Hierarchical Temporal Memory (HTM) and Long-Short Term Memory (LSTM). The circuit design and verification are done using TSMC 180nm CMOS process models and TiO2-based memristor models. The application-level validations of the system are done using the XOR problem and the MNIST character and Yale face image databases.

• [cs.IR]Content-based feature exploration for transparent music recommendation using self-attentive genre classification
Seungjin Lee, Juheon Lee, Kyogu Lee
http://arxiv.org/abs/1808.10600v1

Interpretation of retrieved results is an important issue in music recommender systems, particularly from a user perspective. In this study, we investigate the methods for providing interpretability of content features using self-attention. We extract lyric features with the self-attentive genre classification model trained on 140,000 tracks of lyrics. Likewise, we extract acoustic features using the acoustic model with self-attention trained on 120,000 tracks of acoustic signals. The experimental results show that the proposed methods provide the characteristics that are interpretable in terms of both lyrical and musical contents. We demonstrate this by visualizing the attention weights, and by presenting the most similar songs found using lyric or audio features.

• [cs.IR]Spectral Collaborative Filtering
Lei Zheng, Chun-Ta Lu, Fei Jiang, Jiawei Zhang, Philip S. Yu
http://arxiv.org/abs/1808.10523v1

Despite the popularity of Collaborative Filtering (CF), CF-based methods are haunted by the cold-start problem, which has a significantly negative impact on users' experiences with Recommender Systems (RS). In this paper, to overcome the aforementioned drawback, we first formulate the relationships between users and items as a bipartite graph. Then, we propose a new spectral convolution operation directly performing in the spectral domain, where not only the proximity information of a graph but also the connectivity information hidden in the graph are revealed. With the proposed spectral convolution operation, we build a deep recommendation model called Spectral Collaborative Filtering (SpectralCF). Benefiting from the rich information of connectivity existing in the spectral domain, SpectralCF is capable of discovering deep connections between users and items and therefore alleviates the cold-start problem for CF. To the best of our knowledge, SpectralCF is the first CF-based method directly learning from the spectral domains of user-item bipartite graphs. We apply our method on several standard datasets. It is shown that SpectralCF significantly outperforms state-of-the-art models. Code and data are available at https://github.com/lzheng21/SpectralCF.

• [cs.IT]Impact of Device Orientation on Error Performance of LiFi Systems
Mohammad Dehghani Soltani, Ardimas Andi Purwita, Iman Tavakkolnia, Harald Haas, Majid Safari
http://arxiv.org/abs/1808.10476v1

Most studies on optical wireless communications (OWCs) have neglected the effect of random orientation in their performance analysis due to the lack of a proper model for the random orientation. Our recent empirical-based research illustrates that the random orientation follows a Laplace distribution for static user equipment (UE). In this paper, we analyze the device orientation and assess its importance on system performance. The probability of establishing a line-of-sight link is investigated and the probability density function (PDF) of signal-to-noise ratio (SNR) for a randomly-oriented device is derived. By means of the PDF of SNR, the bit-error ratio (BER) of DC biased optical orthogonal frequency division multiplexing (DCO-OFDM) in additive white Gaussian noise (AWGN) channels is evaluated. A closed form approximation for the BER of UE with random orientation is presented which shows a good match with Monte-Carlo simulation results.

• [cs.LG]A Multi-layer Gaussian Process for Motor Symptom Estimation in People with Parkinson's Disease
Muriel Lang, Urban Fietzek, Jakob Fröhner, Franz M. J. Pfister, Daniel Pichler, Kian Abedinpour, Terry T. Um, Dana Kulić, Satoshi Endo, Sandra Hirche
http://arxiv.org/abs/1808.10663v1

The assessment of Parkinson's disease (PD) poses a significant challenge as it is influenced by various factors which lead to a complex and fluctuating symptom manifestation. Thus, a frequent and objective PD assessment is highly valuable for effective health management of people with Parkinson's disease (PwP). Here, we propose a method for monitoring PwP by stochastically modeling the relationships between their wrist movements during unscripted daily activities and corresponding annotations about clinical displays of movement abnormalities. We approach the estimation of PD motor signs by independently modeling and hierarchically stacking Gaussian process models for three classes of commonly observed movement abnormalities in PwP including tremor, (non-tremulous) bradykinesia, and (non-tremulous) dyskinesia. We use clinically adopted severity measures as annotations for training the models, thus allowing our multi-layer Gaussian process prediction models to estimate not only their presence but also their severities. The experimental validation of our approach demonstrates strong agreement of the model predictions with these PD annotations. Our results show the proposed method produces promising results in objective monitoring of movement abnormalities of PD in the presence of arbitrary and unknown voluntary motions, and makes an important step towards continuous monitoring of PD in the home environment.

• [cs.LG]A novel extension of Generalized Low-Rank Approximation of Matrices based on multiple-pairs of transformations
Soheil Ahmadi, Mansoor Rezghi
http://arxiv.org/abs/1808.10632v1

Dimension reduction is a main step in the learning process and plays an essential role in many applications. The most popular methods in this field, such as SVD, PCA, and LDA, can only be applied to vector data. This means that higher-order data such as matrices or, more generally, tensors must be folded into a vector. This folding increases the probability of overfitting, and some important spatial features may be ignored. To tackle these issues, methods such as GLRAM, MPCA, and MLDA have been proposed which work directly on data in their own format. In these methods the spatial relationships among data are preserved and the probability of overfitting is reduced. Their time and space complexity are also lower than those of vector-based methods. However, because multilinear methods have fewer parameters, they have a much smaller search space in which to find an optimal answer than vector-based approaches. To overcome this drawback of multilinear methods like GLRAM, we propose a new method which is a general form of GLRAM and which, while preserving its merits, has a larger search space. We have done extensive experiments to show that our proposed method works better than GLRAM. Also, applying this approach to other multilinear dimension reduction methods like MPCA and MLDA is straightforward.

• [cs.LG]A novel graph-based model for hybrid recommendations in cold-start scenarios
Cesare Bernardis, Maurizio Ferrari Dacrema, Paolo Cremonesi
http://arxiv.org/abs/1808.10664v1

Cold-start is a very common and still open problem in the Recommender Systems literature. Since cold-start items do not have any interactions, collaborative algorithms are not applicable. One of the main strategies is to use pure or hybrid content-based approaches, which usually yield lower recommendation quality than collaborative ones. Some techniques to optimize the performance of this type of approach have been studied in the recent past. One of them is called feature weighting, which assigns to every feature a real value, called a weight, that estimates its importance. Statistical techniques for feature weighting commonly used in Information Retrieval, like TF-IDF, have been adapted for Recommender Systems, but they often do not provide sufficient quality improvements. More recent approaches, FBSM and LFW, estimate weights by leveraging collaborative information via machine learning, in order to learn the importance of a feature based on other users' opinions. Models of this type have shown promising results compared to the classic statistical analyses cited previously. We propose a novel graph, feature-based machine learning model to face the cold-start item scenario, learning the relevance of features from probabilities of item-based collaborative filtering algorithms.
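
To make the notion of feature weighting concrete, here is a minimal sketch of the statistical baseline the abstract mentions, not of the proposed graph model. It assumes binary item features, so the term-frequency part of TF-IDF is constant and only the inverse document frequency remains. The function names and the toy catalogue are hypothetical.

```python
import math


def idf_feature_weights(item_features):
    """Assign each feature the weight log(N / df), where df counts items having it."""
    n_items = len(item_features)
    document_frequency = {}
    for features in item_features.values():
        for feature in set(features):
            document_frequency[feature] = document_frequency.get(feature, 0) + 1
    return {f: math.log(n_items / df) for f, df in document_frequency.items()}


def weighted_similarity(features_a, features_b, weights):
    """Dot product of two weighted binary feature vectors."""
    shared = set(features_a) & set(features_b)
    return sum(weights[f] ** 2 for f in shared)


if __name__ == "__main__":
    catalogue = {
        "film_a": ["drama", "english"],
        "film_b": ["comedy", "english"],
        "film_c": ["drama", "english"],
    }
    weights = idf_feature_weights(catalogue)
    # "english" appears on every item, so it carries no weight.
    print(weights)
    print(weighted_similarity(catalogue["film_a"], catalogue["film_c"], weights))
```

A feature shared by every item gets weight zero and cannot distinguish items, which is the behaviour that learned weights (as in FBSM, LFW, or the proposed model) aim to improve on by using collaborative signals.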

• [cs.LG]APES: a Python toolbox for simulating reinforcement learning environments
Aqeel Labash, Ardi Tampuu, Tambet Matiisen, Jaan Aru, Raul Vicente
http://arxiv.org/abs/1808.10692v1

Assisted by neural networks, reinforcement learning agents have been able to solve increasingly complex tasks over the last years. The simulation environment in which the agents interact is an essential component in any reinforcement learning problem. The environment simulates the dynamics of the agents' world and hence provides feedback to their actions in terms of state observations and external rewards. To ease the design and simulation of such environments, this work introduces APES, a highly customizable and open source package in Python to create 2D grid-world environments for reinforcement learning problems. APES equips agents with algorithms to simulate any field of vision, allows the creation and positioning of items and rewards according to user-defined rules, and supports the interaction of multiple agents.

• [cs.LG]Adaptation and Robust Learning of Probabilistic Movement Primitives
Sebastian Gomez-Gonzalez, Gerhard Neumann, Bernhard Schölkopf, Jan Peters
http://arxiv.org/abs/1808.10648v1

Probabilistic representations of movement primitives open important new possibilities for machine learning in robotics. These representations are able to capture the variability of the demonstrations from a teacher as a probability distribution over trajectories, providing a sensible region of exploration and the ability to adapt to changes in the robot environment. However, to be able to capture variability and correlations between different joints, a probabilistic movement primitive requires the estimation of a larger number of parameters than its deterministic counterparts, which focus on modeling only the mean behavior. In this paper, we make use of prior distributions over the parameters of a probabilistic movement primitive to make robust estimates of the parameters with few training instances. In addition, we introduce general purpose operators to adapt movement primitives in joint and task space. The proposed training method and adaptation operators are tested in a coffee preparation task and in a robot table tennis task. In the coffee preparation task we evaluate the generalization performance to changes in the location of the coffee grinder and brewing chamber in a target area, achieving the desired behavior after only two demonstrations. In the table tennis task we evaluate the hit and return rates, outperforming previous approaches while using fewer task-specific heuristics.

• [cs.LG]Application of Self-Play Reinforcement Learning to a Four-Player Game of Imperfect Information
Henry Charlesworth
http://arxiv.org/abs/1808.10442v1

We introduce a new virtual environment for simulating a card game known as "Big 2". This is a four-player game of imperfect information with a relatively complicated action space (being allowed to play 1,2,3,4 or 5 card combinations from an initial starting hand of 13 cards). As such it poses a challenge for many current reinforcement learning methods. We then use the recently proposed "Proximal Policy Optimization" algorithm to train a deep neural network to play the game, purely learning via self-play, and find that it is able to reach a level which outperforms amateur human players after only a relatively short amount of training time and without needing to search a tree of future game states.

• [cs.LG]Bayesian Classifier for Route Prediction with Markov Chains
Jonathan P. Epperlein, Julien Monteil, Mingming Liu, Yingqi Gu, Sergiy Zhuk, Robert Shorten
http://arxiv.org/abs/1808.10705v1

We present here a general framework and a specific algorithm for predicting the destination, route, or more generally a pattern, of an ongoing journey, building on the recent work of [Y. Lassoued, J. Monteil, Y. Gu, G. Russo, R. Shorten, and M. Mevissen, "Hidden Markov model for route and destination prediction," in IEEE International Conference on Intelligent Transportation Systems, 2017]. In the presented framework, known journey patterns are modelled as stochastic processes, emitting the road segments visited during the journey, and the ongoing journey is predicted by updating the posterior probability of each journey pattern given the road segments visited so far. In this contribution, we use Markov chains as models for the journey patterns, and consider the prediction as final once one of the posterior probabilities crosses a predefined threshold. Despite the simplicity of both, examples run on a synthetic dataset demonstrate high accuracy of the predictions.
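
The update rule described in this abstract can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: each journey pattern is a first-order Markov chain stored as a plain dictionary with hypothetical keys `initial` and `transitions`, and the function names, segment names and the threshold value are invented for the example.

```python
import math


def step_log_likelihood(chain, previous_segment, segment):
    """Log-probability that `chain` emits `segment` after `previous_segment`."""
    if previous_segment is None:
        p = chain["initial"].get(segment, 0.0)
    else:
        p = chain["transitions"].get(previous_segment, {}).get(segment, 0.0)
    return math.log(p) if p > 0 else float("-inf")


def predict_pattern(chains, priors, segments, threshold=0.9):
    """Update the posterior over patterns segment by segment.

    Returns (pattern, segments_seen, posterior). `pattern` is None if no
    posterior probability reaches `threshold` before the segments run out.
    """
    log_post = {name: math.log(priors[name]) for name in chains}
    posterior = dict(priors)
    previous = None
    seen = 0
    for seen, segment in enumerate(segments, start=1):
        for name, chain in chains.items():
            log_post[name] += step_log_likelihood(chain, previous, segment)
        peak = max(log_post.values())
        if peak == float("-inf"):
            # No known pattern can explain the observed segments.
            return None, seen, {name: 0.0 for name in chains}
        unnormalised = {k: math.exp(v - peak) for k, v in log_post.items()}
        total = sum(unnormalised.values())
        posterior = {k: u / total for k, u in unnormalised.items()}
        best = max(posterior, key=posterior.get)
        if posterior[best] >= threshold:
            return best, seen, posterior
        previous = segment
    return None, seen, posterior


if __name__ == "__main__":
    chains = {
        "home_to_work": {"initial": {"a": 1.0},
                         "transitions": {"a": {"b": 1.0}, "b": {"c": 0.9, "d": 0.1}}},
        "home_to_gym": {"initial": {"a": 1.0},
                        "transitions": {"a": {"b": 1.0}, "b": {"c": 0.1, "d": 0.9}}},
    }
    priors = {"home_to_work": 0.5, "home_to_gym": 0.5}
    print(predict_pattern(chains, priors, ["a", "b", "d"], threshold=0.8))
```

Working in log space keeps the product of many small transition probabilities from underflowing on long journeys. In the toy example the two patterns share their first two segments, so the posterior only moves once the third segment is observed.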

• [cs.LG]Directed Exploration in PAC Model-Free Reinforcement Learning
Min-hwan Oh, Garud Iyengar
http://arxiv.org/abs/1808.10552v1

We study an exploration method for model-free RL that generalizes the counter-based exploration bonus methods and takes into account long term exploratory value of actions rather than a single step look-ahead. We propose a model-free RL method that modifies Delayed Q-learning and utilizes the long-term exploration bonus with provable efficiency. We show that our proposed method finds a near-optimal policy in polynomial time (PAC-MDP), and also provide experimental evidence that our proposed algorithm is an efficient exploration method.

• [cs.LG]Learning Data-adaptive Nonparametric Kernels
Fanghui Liu, Xiaolin Huang, Chen Gong, Jie Yang, Li Li
http://arxiv.org/abs/1808.10724v1

Traditional kernels or their combinations are often not sufficiently flexible to fit the data in complicated practical tasks. In this paper, we present a Data-Adaptive Nonparametric Kernel (DANK) learning framework by imposing an adaptive matrix on the kernel/Gram matrix in an entry-wise strategy. Since we do not specify the formulation of the adaptive matrix, each entry in it can be directly and flexibly learned from the data. Therefore, the solution space of the learned kernel is largely expanded, which makes DANK flexible to adapt to the data. Specifically, the proposed kernel learning framework can be seamlessly embedded to support vector machines (SVM) and support vector regression (SVR), which has the capability of enlarging the margin between classes and reducing the model generalization error. Theoretically, we demonstrate that the objective function of our devised model is gradient-Lipschitz continuous. Thereby, the training process for kernel and parameter learning in SVM/SVR can be efficiently optimized in a unified framework. Further, to address the scalability issue in DANK, a decomposition-based scalable approach is developed, of which the effectiveness is demonstrated by both empirical studies and theoretical guarantees. Experimentally, our method outperforms other representative kernel learning based algorithms on various classification and regression benchmark datasets.

• [cs.LG]Open Source Dataset and Machine Learning Techniques for Automatic Recognition of Historical Graffiti
Nikita Gordienko, Peng Gang, Yuri Gordienko, Wei Zeng, Oleg Alienin, Oleksandr Rokovyi, Sergii Stirenko
http://arxiv.org/abs/1808.10862v1

Machine learning techniques are presented for automatic recognition of the historical letters (XI-XVIII centuries) carved on the stone walls of St. Sophia cathedral in Kyiv (Ukraine). A new image dataset of these carved Glagolitic and Cyrillic letters (CGCL) was assembled and pre-processed for recognition and prediction by machine learning methods. The dataset consists of more than 4000 images for 34 types of letters. Exploratory data analysis of the CGCL and notMNIST datasets showed that the carved letters can hardly be differentiated by dimensionality reduction methods, for example by t-distributed stochastic neighbor embedding (tSNE), because letters are represented less clearly by stone carving than by handwriting. Multinomial logistic regression (MLR) and 2D convolutional neural network (CNN) models were applied. For the MLR model, the area under the curve (AUC) values of the receiver operating characteristic (ROC) were not lower than 0.92 and 0.60 for notMNIST and CGCL, respectively. The CNN model gave AUC values close to 0.99 for both notMNIST and CGCL (despite the much smaller size and lower quality of CGCL in comparison to notMNIST) under the condition of highly lossy data augmentation. The CGCL dataset has been published as an open source resource for the data science community.

• [cs.LG]Proximity Forest: An effective and scalable distance-based classifier for time series
Benjamin Lucas, Ahmed Shifaz, Charlotte Pelletier, Lachlan O'Neill, Nayyar Zaidi, Bart Goethals, Francois Petitjean, Geoffrey I. Webb
http://arxiv.org/abs/1808.10594v1

Research into the classification of time series has made enormous progress in the last decade. The UCR time series archive has played a significant role in challenging and guiding the development of new learners for time series classification. The largest dataset in the UCR archive holds only 10 thousand time series, which may explain why the primary research focus has been on creating algorithms that have high accuracy on relatively small datasets. This paper introduces Proximity Forest, an algorithm that learns accurate models from datasets with millions of time series, and classifies a time series in milliseconds. The models are ensembles of highly randomized Proximity Trees. Whereas conventional decision trees branch on attribute values (and usually perform poorly on time series), Proximity Trees branch on the proximity of time series to one exemplar time series or another, allowing us to leverage the decades of work on developing relevant measures for time series. Proximity Forest gains both efficiency and accuracy by stochastic selection of both exemplars and similarity measures. Our work is motivated by recent time series applications that provide orders of magnitude more time series than the UCR benchmarks. Our experiments demonstrate that Proximity Forest is highly competitive on the UCR archive: it ranks among the most accurate classifiers while being significantly faster. We demonstrate on a 1M time series Earth observation dataset that Proximity Forest retains this accuracy on datasets that are many orders of magnitude greater than those in the UCR repository, while learning its models at least 100,000 times faster than the current state-of-the-art models Elastic Ensemble and COTE.
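
The branching idea can be illustrated with a single split. The sketch below is a simplification, not the published algorithm: it assumes one randomly drawn exemplar per class, uses plain Euclidean distance as a stand-in for the elastic time series measures the abstract alludes to, and evaluates only one candidate split. The names `proximity_split` and `euclidean` are hypothetical.

```python
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length series (stand-in measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def proximity_split(series, labels, measure=euclidean, rng=random):
    """Route every series to the branch of its nearest randomly chosen exemplar.

    Returns (exemplars, branches): `exemplars` maps each class label to the
    exemplar series drawn for it, and `branches` maps each class label to the
    indices of the series that fell closest to that exemplar.
    """
    by_class = {}
    for s, y in zip(series, labels):
        by_class.setdefault(y, []).append(s)

    # Stochastic selection: one exemplar drawn at random from each class.
    exemplars = {y: rng.choice(members) for y, members in by_class.items()}

    branches = {y: [] for y in exemplars}
    for index, s in enumerate(series):
        nearest = min(exemplars, key=lambda y: measure(s, exemplars[y]))
        branches[nearest].append(index)
    return exemplars, branches


if __name__ == "__main__":
    data = [[0, 0, 0], [0, 1, 0], [5, 5, 5], [5, 6, 5]]
    classes = ["low", "low", "high", "high"]
    print(proximity_split(data, classes, rng=random.Random(0)))
```

A tree would apply this split recursively to each branch, and passing a different `measure` per node is one way to mimic the random choice of similarity measure described in the abstract.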

• [cs.LG]Tensor Embedding: A Supervised Framework for Human Behavioral Data Mining and Prediction
Homa Hosseinmardi, Amir Ghasemian, Shrikanth Narayanan, Kristina Lerman, Emilio Ferrara
http://arxiv.org/abs/1808.10867v1

Today's densely instrumented world offers tremendous opportunities for continuous acquisition and analysis of multimodal sensor data providing temporal characterization of an individual's behaviors. Is it possible to efficiently couple such rich sensor data with predictive modeling techniques to provide contextual and insightful assessments of individual performance and wellbeing? Prediction of different aspects of human behavior from these noisy, incomplete, and heterogeneous bio-behavioral temporal data is a challenging problem, beyond unsupervised discovery of latent structures. We propose a Supervised Tensor Embedding (STE) algorithm for high-dimensional multimodal data with joint decomposition of input and target variable. Furthermore, we show that feature selection helps to reduce the contamination in the prediction and increase the performance. The efficiency of the methods was tested on two different real-world datasets.

• [cs.LO]Finite LTL Synthesis with Environment Assumptions and Quality Measures
Alberto Camacho, Meghyn Bienvenu, Sheila A. McIlraith
http://arxiv.org/abs/1808.10831v1

In this paper, we investigate the problem of synthesizing strategies for linear temporal logic (LTL) specifications that are interpreted over finite traces -- a problem that is central to the automated construction of controllers, robot programs, and business processes. We study a natural variant of the finite LTL synthesis problem in which strategy guarantees are predicated on specified environment behavior. We further explore a quantitative extension of LTL that supports specification of quality measures, utilizing it to synthesize high-quality strategies. We propose new notions of optimality and associated algorithms that yield strategies that best satisfy specified quality measures. Our algorithms utilize an automata-game approach, positioning them well for future implementation via existing state-of-the-art techniques.

• [cs.NE]Algoritmos Genéticos Aplicado ao Problema de Roteamento de Veículos
Felipe F. Müller, Luis A. A. Meira
http://arxiv.org/abs/1808.10866v1

Routing problems are often faced by companies who serve customers through vehicles. Such problems have a challenging structure to optimize, despite the recent advances in combinatorial optimization. The goal of this project is to study and propose optimization algorithms for vehicle routing problems (VRP). The focus will be on the problem variant in which the length of the route is restricted by a constant. A real problem will be tackled: optimization of postmen's routes. This problem was modeled as multi-objective on a road map with 25 vehicles and 30,000 deliveries per day.

• [cs.NE]Autonomous Configuration of Network Parameters in Operating Systems using Evolutionary Algorithms
Bartosz Gembala, Anis Yazidi, Hårek Haugerud, Stefano Nichele
http://arxiv.org/abs/1808.10733v1

By default, the Linux network stack is not configured for high-speed large file transfer. The reason behind this is to save memory resources. It is possible to tune the Linux network stack by increasing the network buffer size for high-speed networks that connect server systems in order to handle more network packets. However, there are also several other TCP/IP parameters that can be tuned in an Operating System (OS). In this paper, we leverage Genetic Algorithms (GAs) to devise a system which learns from the history of the network traffic and uses this knowledge to optimize the current performance by adjusting the parameters. This can be done for a standard Linux kernel using sysctl or /proc. For a Virtual Machine (VM), virtually any type of OS can be installed and an image can swiftly be compiled and deployed. Because a VM is a sandboxed environment, risky configurations can be tested without the danger of harming the system. Different scenarios for network parameter configurations are thoroughly tested, and a throughput increase of up to 65% is achieved compared to the default Linux configuration.

• [cs.RO]Bioinspired Straight Walking Task-Space Planner
Carlo Tiseo, Kalyana C Veluvolu, Wei Tech Ang
http://arxiv.org/abs/1808.10799v1

Although attention to bipedal locomotion has increased over the last decades, robots are still far behind human locomotor abilities. Their performance limitations can be partially attributed to the hardware, but the primary constraint has been the poor understanding of bipedal dynamics. Based on the recently developed model of potential energy for bipedal structures, this work proposes a task-space planner for human-like straight locomotion. The proposed architecture is based on the potential energy model and employs locomotor strategies obtained from human data as a reference for optimal behaviour. The model generates the CoM trajectory, the foot swing trajectory and the base of support from knowledge of the desired speed, initial posture, height, weight, number of steps and the angle between the foot and the ground during heel-strike. The data show that the proposed architecture can generate behaviour in line with human walking strategies for both the CoM and the foot swing. Although the planned trajectory is not smooth compared to human trajectories, the proposed model significantly reduces the error in the estimation of the CoM vertical trajectory. Moreover, since the planner is able to generate a single stride in less than 140 ms and sequences of 10 strides in less than 600 ms, it allows online task-space planning for locomotion. Lastly, the proposed architecture is also supported by analogies with current theories on human motor control of locomotion.

• [cs.RO]Gradual Collective Upgrade of a Swarm of Autonomous Buoys for Dynamic Ocean Monitoring
Francesco Vallegra, David Mateo, Grgur Tokić, Roland Bouffanais, Dick K. P. Yue
http://arxiv.org/abs/1808.10617v1

Swarms of autonomous surface vehicles equipped with environmental sensors and decentralized communications bring a new wave of attractive possibilities for the monitoring of dynamic features in oceans and other waterbodies. However, a key challenge in swarm robotics design is the efficient collective operation of heterogeneous systems. We present both theoretical analysis and field experiments on the responsiveness in dynamic area coverage of a collective of 22 autonomous buoys, where 4 units are upgraded to a new design that allows them to move 80% faster than the rest. This system is able to react on timescales of about a minute to changes in areas on the order of a few thousand square meters. We have observed that this partial upgrade of the system significantly increases its average responsiveness, without necessarily improving the spatial uniformity of the deployment. These experiments show that the autonomous buoy designs and the cooperative control rule described in this work provide an efficient, flexible, and scalable solution for the pervasive and persistent monitoring of water environments.

• [cs.RO]Modified Self-Organized Task Allocation in a Group of Robots
Chang Liu
http://arxiv.org/abs/1808.10444v1

This paper introduces a modified self-organized task allocation algorithm, where robots are assigned to pick up one of the two types of object. This paper also demonstrates both algorithms by showing the simulation results of the conventional self-organized task allocation algorithm and the simulation results of its modification.

• [cs.RO]PythonRobotics: a Python code collection of robotics algorithms
Atsushi Sakai, Daniel Ingram, Joseph Dinius, Karan Chawla, Antonin Raffin, Alexis Paques
http://arxiv.org/abs/1808.10703v1

This paper describes an Open Source Software (OSS) project: PythonRobotics. This is a collection of robotics algorithms implemented in the Python programming language. The focus of the project is on autonomous navigation, and the goal is for beginners in robotics to understand the basic ideas behind each algorithm. In this project, algorithms which are practical and widely used in both academia and industry are selected. Each algorithm is written in Python3 and only depends on some common modules for readability, portability and ease of use. It includes intuitive animations to understand the behavior of the simulation.

• [cs.SI]Diversity, Topology, and the Risk of Node Re-identification in Labeled Social Graphs
Sameera Horawalavithana, Clayton Gandy, Juan Arroyo Flores, John Skvoretz, Adriana Iamnitchi
http://arxiv.org/abs/1808.10837v1

Real network datasets provide significant benefits for understanding phenomena such as information diffusion or network evolution. Yet the privacy risks raised from sharing real graph datasets, even when stripped of user identity information, are significant. When nodes have associated attributes, the privacy risks increase. In this paper we quantitatively study the impact of binary node attributes on node privacy by employing machine-learning-based re-identification attacks and exploring the interplay between graph topology and attribute placement. Our experiments show that the population's diversity on the binary attribute consistently degrades anonymity.

• [cs.SI]Influence Dynamics and Consensus in an Opinion-Neighborhood based Modified Vicsek-like Social Network
Narayani Vedam, Debasish Ghose
http://arxiv.org/abs/1808.10716v1

We propose a modified Vicsek-like model to study influence dynamics and opinion formation in social networks. We work on the premise that opinions of members of a group may be considered to be analogous to the direction of motion of a particle in space. The opinions are susceptible to change under the influence of familiar individuals who maintain similar beliefs. This is unlike the bounded-confidence models which solely rely on interactions based on closeness of opinions. The influence network evolves either when similar-minded individuals acquaint or when they fall out over their beliefs. This yields an adaptive network to which are assigned dynamic centrality scores and varying influence strengths. A mix of individuals - rigid and flexible - is assumed to constitute groups - liberal and conservative. We analyse emergent group behaviours subject to different initial conditions, agent types, their densities and tolerances. The model accurately predicts the role of rigid agents in hampering consensus. Also, a few structural properties of the dynamic network, which result as a consequence of the proposed model have been established.

• [cs.SI]Securing Tag-based recommender systems against profile injection attacks: A comparative study
Georgios Pitsilis, Heri Ramampiaro, Helge Langseth
http://arxiv.org/abs/1808.10550v1

This work addresses challenges related to attacks on social tagging systems, which often come in the form of malicious annotations or profile injection attacks. In particular, we study various countermeasures against two types of threats for such systems, the Overload and the Piggyback attacks. The studied countermeasures include baseline classifiers, such as a Naive Bayes filter and a Support Vector Machine, as well as a deep learning-based approach. Our evaluation, performed over synthetic spam data generated from del.icio.us, shows that in most cases the deep learning-based approach provides the best protection against threats.

• [eess.IV]Automatic Lung Cancer Prediction from Chest X-ray Images Using Deep Learning Approach
Worawate Ausawalaithong, Sanparith Marukatat, Arjaree Thirach, Theerawit Wilaiprasitporn
http://arxiv.org/abs/1808.10858v1

Since cancer is curable when diagnosed at an early stage, lung cancer screening plays an important role in preventive care. Although both low dose computed tomography (LDCT) and computed tomography (CT) scans provide more medical information than normal chest x-rays, there is very limited access to these technologies in rural areas. Recently, there has been a trend toward using computer-aided diagnosis (CADx) to assist in the screening and diagnosis of cancer from biomedical images. In this study, the 121-layer convolutional neural network, also known as DenseNet-121 by G. Huang et al., along with a transfer learning scheme, was explored as a means to classify lung cancer using chest X-ray images. The model was trained on a lung nodules dataset before training on the lung cancer dataset to alleviate the problem of a small dataset. The proposed model yields a mean accuracy of 74.43$\pm$6.01%, a mean specificity of 74.96$\pm$9.85%, and a mean sensitivity of 74.68$\pm$15.33%. The proposed model also provides a heatmap for identifying the location of the lung nodule. These findings are promising for further development of chest x-ray-based lung cancer diagnosis using the deep learning approach. Moreover, these findings solve the problem of a small dataset.
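
The three reported metrics can be computed from confusion-matrix counts as in the minimal sketch below. The function names and the example counts are hypothetical and are not taken from the paper; the per-fold summary only illustrates how a "mean ± standard deviation" figure is typically formed.

```python
from statistics import mean, stdev


def screening_metrics(tp, fp, tn, fn):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)  # true-positive rate: cancers correctly flagged
    specificity = tn / (tn + fp)  # true-negative rate: healthy cases correctly cleared
    return accuracy, sensitivity, specificity


def summarize_folds(values):
    """Return (mean, sample standard deviation) of a metric across folds."""
    return mean(values), stdev(values)


# Hypothetical counts, for illustration only.
accuracy, sensitivity, specificity = screening_metrics(tp=30, fp=10, tn=40, fn=20)
```

Reporting sensitivity and specificity separately matters in screening, because accuracy alone can hide a model that misses most positive cases.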

• [math.ST]Asymptotic Seed Bias in Respondent-driven Sampling
Yuling Yan, Bret Hanlon, Sebastien Roch, Karl Rohe
http://arxiv.org/abs/1808.10593v1

Respondent-driven sampling (RDS) collects a sample of individuals in a networked population by incentivizing the sampled individuals to refer their contacts into the sample. This iterative process is initialized from some seed node(s). Sometimes, this selection creates a large amount of seed bias. Other times, the seed bias is small. This paper gains a deeper understanding of this bias by characterizing its effect on the limiting distribution of various RDS estimators. Using classical tools and results from multi-type branching processes (Kesten and Stigum, 1966), we show that the seed bias is negligible for the Generalized Least Squares (GLS) estimator and non-negligible for both the inverse probability weighted and Volz-Heckathorn (VH) estimators. In particular, we show that (i) above a critical threshold, VH converges to a non-trivial mixture distribution, where the mixture component depends on the seed node, and the mixture distribution is possibly multi-modal. Moreover, (ii) GLS converges to a Gaussian distribution independent of the seed node, under a certain condition on the Markov process. Numerical experiments with both simulated data and empirical social networks suggest that these results appear to hold beyond the Markov conditions of the theorems.
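
For orientation, the VH estimator is commonly written as an inverse-degree-weighted mean of the sampled outcomes. The sketch below follows that common form; the function name and the toy data are hypothetical, and the GLS estimator studied in the paper is not shown.

```python
def volz_heckathorn(outcomes, degrees):
    """Inverse-degree-weighted mean of sampled outcomes.

    outcomes: observed values y_i for each sampled individual
    degrees:  self-reported network degrees d_i (must be positive)
    """
    if len(outcomes) != len(degrees):
        raise ValueError("outcomes and degrees must have the same length")
    if any(d <= 0 for d in degrees):
        raise ValueError("degrees must be positive")
    weights = [1.0 / d for d in degrees]
    return sum(w * y for w, y in zip(weights, outcomes)) / sum(weights)


# Toy sample: high-degree respondents are down-weighted because a
# referral chain is more likely to reach them.
estimate = volz_heckathorn(outcomes=[1, 0, 1], degrees=[1, 2, 4])
```

In this toy sample the plain mean is 2/3, while the weighted estimate is 5/7, which shows how the degree correction shifts the result.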

• [math.ST]Bayesian quadrature and energy minimization for space-filling design
Luc Pronzato, Anatoly Zhigljavsky
http://arxiv.org/abs/1808.10722v1

A standard objective in computer experiments is to approximate the behaviour of an unknown function on a compact domain from a few evaluations inside the domain. When little is known about the function, space-filling design is advisable: typically, points of evaluation spread out across the available space are obtained by minimizing a geometrical (for instance, covering radius) or a discrepancy criterion measuring distance to uniformity. The paper investigates connections between design for integration (quadrature design), construction of the (continuous) BLUE for the location model, space-filling design, and minimization of energy (kernel discrepancy) for signed measures. Integrally strictly positive definite kernels define strictly convex energy functionals, with an equivalence between the notions of potential and directional derivative, showing the strong relation between discrepancy minimization and more traditional design of optimal experiments. In particular, kernel herding algorithms, which are special instances of vertex-direction methods used in optimal design, can be applied to the construction of point sequences with suitable space-filling properties.
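
A minimal sketch of greedy kernel herding on a one-dimensional grid is given below, assuming a Gaussian kernel and a uniform target approximated by the candidate grid itself. The bandwidth, grid, and function names are illustrative choices, not the authors' settings.

```python
import math


def gaussian_kernel(x, y, bandwidth=0.2):
    """Gaussian kernel; the bandwidth is an arbitrary illustrative value."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def kernel_herding(candidates, n_points, kernel=gaussian_kernel):
    """Greedily pick n_points from candidates to approximate the uniform
    measure on the candidate set in the kernel (energy) sense."""
    # Potential of the target measure at each candidate point.
    potential = [
        sum(kernel(c, other) for other in candidates) / len(candidates)
        for c in candidates
    ]
    selected = []
    for n in range(n_points):
        best, best_score = None, -math.inf
        for c, p in zip(candidates, potential):
            # Penalize closeness to points that were already chosen.
            repulsion = sum(kernel(c, s) for s in selected) / (n + 1)
            score = p - repulsion
            if score > best_score:
                best, best_score = c, score
        selected.append(best)
    return selected


grid = [i / 100 for i in range(101)]  # candidate points on [0, 1]
design = kernel_herding(grid, 5)
```

Each step trades off the target potential against repulsion from earlier points, which is what spreads the sequence across the domain.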

• [math.ST]Determining the signal dimension in second order source separation
Joni Virta, Klaus Nordhausen
http://arxiv.org/abs/1808.10669v1

While an important topic in practice, the estimation of the number of non-noise components in blind source separation has received little attention in the literature. Recently, two bootstrap-based techniques for estimating the dimension were proposed, and although very efficient, they suffer from the long computation times caused by the resampling. We approach the problem from a large sample viewpoint and develop an asymptotic test for the true dimension. Our test statistic based on second-order temporal information has a very simple limiting distribution under the null hypothesis and requires no parameters to estimate. Comparisons to the resampling-based estimates show that the asymptotic test provides comparable error rates with significantly faster computation time. An application to sound recording data is used to illustrate the method in practice.

• [math.ST]On Second Order Conditions in the Multivariate Block Maxima and Peak over Threshold Method
Axel Bücher, Stanislav Volgushev, Nan Zou
http://arxiv.org/abs/1808.10828v1

Second order conditions provide a natural framework for establishing asymptotic results about estimators for tail related quantities. Such conditions are typically tailored to the estimation principle at hand, and may be vastly different for estimators based on the block maxima (BM) method or the peak-over-threshold (POT) approach. In this paper we provide details on the relationship between typical second order conditions for BM and POT methods in the multivariate case. We show that the two conditions typically imply each other, but with a possibly different second order parameter. The latter implies that, depending on the data generating process, one of the two methods can attain faster convergence rates than the other. The class of multivariate Archimax copulas is examined in detail; we find that this class contains models for which the second order parameter is smaller for the BM method and vice versa. The theory is illustrated by a small simulation study.

• [math.ST]Sup-norm adaptive simultaneous drift estimation for ergodic diffusions
Cathrine Aeckerle-Willems, Claudia Strauch
http://arxiv.org/abs/1808.10660v1

We consider the question of estimating the drift and the invariant density for a large class of scalar ergodic diffusion processes, based on continuous observations, in $\sup$-norm loss. The unknown drift $b$ is supposed to belong to a nonparametric class of smooth functions of unknown order. We suggest an adaptive approach which allows the construction of drift estimators attaining minimax optimal $\sup$-norm rates of convergence. In addition, we prove a Donsker theorem for the classical kernel estimator of the invariant density and establish its semiparametric efficiency. Finally, we combine both results and propose a fully data-driven bandwidth selection procedure which simultaneously yields both a rate-optimal drift estimator and an asymptotically efficient estimator of the invariant density of the diffusion. Crucial tools for our investigation are uniform exponential inequalities for empirical processes of diffusions.

• [physics.bio-ph]Maximum Entropy Principle Analysis in Network Systems with Short-time Recordings
Zhi-Qin John Xu, Jennifer Crodelle, Douglas Zhou, David Cai
http://arxiv.org/abs/1808.10506v1

In many realistic systems, maximum entropy principle (MEP) analysis provides an effective characterization of the probability distribution of network states. However, to implement the MEP analysis, a sufficiently long data recording is often required, e.g., hours of spiking recordings of neurons in neuronal networks. The issue of whether the MEP analysis can be successfully applied to network systems with data from short recordings has yet to be fully addressed. In this work, we investigate relationships underlying the probability distributions, moments, and effective interactions in the MEP analysis and then show that, with short recordings of network dynamics, the MEP analysis can be applied to reconstructing probability distributions of network states under the condition of asynchronous activity of nodes in the network. Using spike trains obtained from both Hodgkin-Huxley neuronal networks and electrophysiological experiments, we verify our results and demonstrate that MEP analysis provides a tool to investigate the neuronal population coding properties, even for short recordings.

• [q-bio.PE]A global model for predicting the arrival of imported dengue infections
Jessica Liebig, Cassie Jansen, Dean Paini, Lauren Gardner, Raja Jurdak
http://arxiv.org/abs/1808.10591v1

With approximately half of the world's population at risk of contracting dengue, this mosquito-borne disease is of global concern. International travellers significantly contribute to dengue's rapid and large-scale spread by importing the disease from endemic into non-endemic countries. To prevent future outbreaks, knowledge about the arrival time and location of dengue infected travellers is crucial. We propose a network model that predicts the monthly number of dengue infected air passengers arriving at any given airport, considering international air travel volumes, incidence rates and temporal infection dynamics. We verify the model's output with dengue notification data from Australia and Europe. Our findings shed light onto dengue importation routes and further reveal country-specific reporting rates that are likely attributable to differences in awareness and notification policies.

• [stat.AP]Penalized Component Hub Models
Charles Weko, Yunpeng Zhao
http://arxiv.org/abs/1808.10563v1

Social network analysis presupposes that observed social behavior is influenced by an unobserved network. Traditional approaches to inferring the latent network use pairwise descriptive statistics that rely on a variety of measures of co-occurrence. While these techniques have proven useful in a wide range of applications, the literature does not describe the generating mechanism of the observed data from the network. In a previous article, the authors presented a technique which used a finite mixture model as the connection between the unobserved network and the observed social behavior. This model assumed that each group was the result of a star graph on a subset of the population. Thus, each group was the result of a leader who selected members of the population to be in the group. They called these hub models. This approach treats the network values as parameters of a model. However, this leads to a general challenge in estimating parameters which must be addressed. For small datasets there can be far more parameters to estimate than there are observations. Under these conditions, the estimated network can be unstable. In this article, we propose a solution which penalizes the number of nodes which can exert a leadership role. We implement this as a pseudo-Expectation Maximization algorithm. We demonstrate this technique through a series of simulations which show that when the number of leaders is sparse, parameter estimation is improved. Further, we apply this technique to a dataset of animal behavior and an example of recommender systems.

• [stat.AP]The Causal Effect of Answer Changing on Multiple-Choice Items
Yongnam Kim
http://arxiv.org/abs/1808.10577v1

The causal effect of changing initial answers on final scores is a long-standing puzzle in the educational and psychological measurement literature. This paper formalizes the question using the standard framework for causal inference, the potential outcomes framework. Our clear definitions of the treatment and corresponding counterfactuals, expressed with potential outcomes, allow us to estimate the causal effect of answer changing even without any study designs or modeling examinees' answer change behaviors. We separately define the average treatment effect and the average treatment effect on the treated, and show that each effect can be directly computed from the proportions of examinees' answer changing patterns. Our findings show that the traditional method in the literature of comparing the proportions of "wrong to right" and "right to wrong" patterns (a method which has recently been criticized) indeed correctly estimates the sign of the average answer changing effect, but only for those examinees who actually changed their initial responses; this does not take into account those who retained their responses. We illustrate our procedures by reanalyzing van der Linden, Jeon, and Ferrara's (2011) data. The results show that the answer changing effect is heterogeneous: it is positive for examinees who changed their initial responses but negative for those who did not. We discuss theoretical and practical implications of our findings.
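
The traditional comparison mentioned in the abstract can be sketched as follows. This is only the "wrong to right" versus "right to wrong" difference among examinees who changed an answer; the paper's own estimators are not reproduced here. The function name, the inclusion of a "wrong to wrong" count in the denominator, and the example counts are assumptions made for illustration.

```python
def changer_effect(n_wrong_to_right, n_right_to_wrong, n_wrong_to_wrong=0):
    """Difference between the share of changes that gained a point and the
    share that lost a point, among all answer changes.

    A change from one wrong answer to another leaves the score unchanged,
    so it only enters the denominator.
    """
    n_changed = n_wrong_to_right + n_right_to_wrong + n_wrong_to_wrong
    if n_changed == 0:
        raise ValueError("no answer changes observed")
    return (n_wrong_to_right - n_right_to_wrong) / n_changed


# Hypothetical counts, for illustration only.
effect = changer_effect(n_wrong_to_right=50, n_right_to_wrong=20, n_wrong_to_wrong=30)
```

A positive value means that, among observed changes, more points were gained than lost; by construction it says nothing about examinees who kept their initial answers.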

• [stat.AP]Understanding the Characteristics of Frequent Users of Emergency Departments: What Role Do Medical Conditions Play?
Jens Rauch, Jens Hüsers, Birgit Babitsch, Ursula Hübner
http://arxiv.org/abs/1808.10618v1

Frequent users of emergency departments (ED) pose a significant challenge to hospital emergency services. Despite a wealth of studies in this field, it is poorly understood which medical conditions lead to frequent attendance. We examine (1) what ambulatory care sensitive conditions (ACSC) are linked to frequent use, (2) how frequent users can be clustered into subgroups with respect to their diagnoses, acuity and admittance, and (3) whether frequent use is related to higher acuity or admission rate. We identified several ACSC that highly increase the risk for heavy ED use, extracted four major diagnosis subgroups and found no significant effect for either acuity or admission rate. Our study indicates that especially patients in need of (nursing) care form subgroups of frequent users, which implies that quality of care services might be crucial for tackling frequent use. Hospitals are advised to regularly analyze their ED data in the EHR to better align resources.

• [stat.ME]An explicit mean-covariance parameterization for multivariate response linear regression
Aaron J. Molstad, Guangwei Weng, Charles R. Doss, Adam J. Rothman
http://arxiv.org/abs/1808.10558v1

We develop a new method to fit the multivariate response linear regression model that exploits a parametric link between the regression coefficient matrix and the error covariance matrix. Specifically, we assume that the correlations between entries in the multivariate error random vector are proportional to the cosines of the angles between their corresponding regression coefficient matrix columns, so as the angle between two regression coefficient matrix columns decreases, the correlation between the corresponding errors increases. This assumption can be motivated through an error-in-variables formulation. We propose a novel non-convex weighted residual sum of squares criterion which exploits this parameterization and admits a new class of penalized estimators. The optimization is solved with an accelerated proximal gradient descent algorithm. Extensions to scenarios where responses are missing or some covariates are measured without error are also proposed. We use our method to study the association between gene expression and copy-number variations measured on patients with glioblastoma multiforme. An R package implementing our method, MCMVR, is available online.
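
The quantity at the heart of this assumption, the cosine of the angle between two columns of the regression coefficient matrix, can be computed as in the sketch below. The function name and the toy matrix are hypothetical; the proportionality constant and the full error covariance construction are not specified in the abstract and are not modeled here.

```python
import math


def column_cosines(coef):
    """Pairwise cosines between the columns of a coefficient matrix.

    coef is a list of rows (predictors x responses); each column holds the
    coefficients for one response. A column of all zeros has no defined
    angle and raises ZeroDivisionError.
    """
    columns = list(zip(*coef))
    norms = [math.sqrt(sum(v * v for v in col)) for col in columns]
    q = len(columns)
    return [
        [
            sum(a * b for a, b in zip(columns[j], columns[k])) / (norms[j] * norms[k])
            for k in range(q)
        ]
        for j in range(q)
    ]


# Toy matrix with 2 predictors and 3 responses.
B = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
cosines = column_cosines(B)
```

Here the first two responses have orthogonal coefficient columns (cosine 0), while the third sits at 45 degrees from each, so under the stated assumption its error would be positively correlated with both.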

• [stat.ME]Gaussian process regression for survival time prediction with genome-wide gene expression
Aaron J. Molstad, Li Hsu, Wei Sun
http://arxiv.org/abs/1808.10541v1

Predicting the survival time of a cancer patient based on his/her genome-wide gene expression remains a challenging problem. For certain types of cancer, the effects of gene expression on survival are both weak and abundant, so identifying nonzero effects with reasonable accuracy is difficult. As an alternative to methods that use variable selection, we propose a Gaussian process accelerated failure time model to predict survival time using genome-wide or pathway-wide gene expression data. Using a Monte Carlo EM algorithm, we jointly impute censored log-survival time and estimate model parameters. We demonstrate the performance of our method and its advantage over existing methods in both simulations and real data analysis. The real data that we analyze were collected from 513 patients with kidney renal clear cell carcinoma and include survival time, demographic/clinical variables, and expression of more than 20,000 genes. Our method is widely applicable as it can accommodate right, left, and interval censored outcomes; and provides a natural way to combine multiple types of high-dimensional -omics data. An R package implementing our method is available online.

• [stat.ME]Generalized probabilistic principal component analysis of correlated data
Mengyang Gu, Weining Shen
http://arxiv.org/abs/1808.10868v1

Principal component analysis (PCA) is a well-established tool in machine learning and data processing. Tipping and Bishop (1999) proposed a probabilistic formulation of PCA (PPCA) by showing that the principal axes in PCA are equivalent to the maximum marginal likelihood estimator of the factor loading matrix in a latent factor model for the observed data, assuming that the latent factors are independently distributed as standard normal distributions. However, the independence assumption may be unrealistic for many scenarios such as modeling multiple time series, spatial processes, and functional data, where the output variables are correlated. In this paper, we introduce the generalized probabilistic principal component analysis (GPPCA) to study the latent factor model of multiple correlated outcomes, where each factor is modeled by a Gaussian process. The proposed method provides a probabilistic solution of the latent factor model with scalable computation. In particular, we derive the maximum marginal likelihood estimator of the factor loading matrix and the predictive distribution of the output. Based on the explicit expression of the precision matrix in the marginal likelihood, the number of computational operations is linear in the number of output variables. Moreover, with the use of the Matérn covariance function, the number of computational operations is also linear in the number of time points for modeling multiple time series, without any approximation to the likelihood function. We discuss the connection of the GPPCA with other approaches such as the PCA and PPCA, and highlight the advantage of GPPCA in terms of the practical relevance, estimation accuracy and computational convenience. Numerical studies confirm the excellent finite-sample performance of the proposed approach.

• [stat.ME]Improved Chebyshev inequality: new probability bounds with known supremum of PDF
Tomohiro Nishiyama
http://arxiv.org/abs/1808.10770v1

In this paper, we derive new probability bounds for Chebyshev's inequality if the supremum of the probability density function is known. This result holds for one-dimensional or multivariate continuous probability distributions with finite mean and variance (covariance matrix). We also show that a similar result holds for specific discrete probability distributions.
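
As a baseline for comparison, the sketch below checks the classical Chebyshev bound $P(|X-\mu| \ge k\sigma) \le 1/k^2$ against an empirical tail frequency. The improved bound from the paper is not stated in the abstract and is not implemented here; the function names, the uniform sample, and the seed are illustrative assumptions.

```python
import random
from statistics import fmean, pstdev


def chebyshev_bound(k):
    """Classical Chebyshev upper bound on P(|X - mu| >= k * sigma)."""
    if k <= 0:
        raise ValueError("k must be positive")
    return min(1.0, 1.0 / k ** 2)


def empirical_tail(sample, k):
    """Fraction of the sample lying at least k standard deviations from its mean."""
    mu = fmean(sample)
    sigma = pstdev(sample)
    return sum(abs(x - mu) >= k * sigma for x in sample) / len(sample)


rng = random.Random(0)  # fixed seed so the run is repeatable
sample = [rng.random() for _ in range(10_000)]  # uniform on [0, 1)
tail = empirical_tail(sample, 1.5)
bound = chebyshev_bound(1.5)
```

For a bounded-density distribution such as the uniform, the empirical tail sits well below the classical bound, which is the kind of slack that extra knowledge about the density can be used to tighten.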

• [stat.ML]Data-driven discovery of PDEs in complex datasets
Jens Berg, Kaj Nyström
http://arxiv.org/abs/1808.10788v1

Many processes in science and engineering can be described by partial differential equations (PDEs). Traditionally, PDEs are derived by considering first principles of physics to derive the relations between the involved physical quantities of interest. A different approach is to measure the quantities of interest and use deep learning to reverse engineer the PDEs which describe the physical process. In this paper we use machine learning, and deep learning in particular, to discover PDEs hidden in complex data sets from measurement data. We include examples of data from a known model problem, and real data from weather station measurements. We show how necessary transformations of the input data amount to coordinate transformations in the discovered PDE, and we elaborate on feature and model selection. It is shown that the dynamics of a non-linear, second order PDE can be accurately described by an ordinary differential equation which is automatically discovered by our deep learning algorithm. Even more interestingly, we show that similar results apply in the context of more complex simulations of the Swedish temperature distribution.

• [stat.ML]On the Minimal Supervision for Training Any Binary Classifier from Only Unlabeled Data
Nan Lu, Gang Niu, Aditya K. Menon, Masashi Sugiyama
http://arxiv.org/abs/1808.10585v1

Empirical risk minimization (ERM), with a proper loss function and regularization, is the common practice of supervised classification. In this paper, we study training an arbitrary (from linear to deep) binary classifier from only unlabeled (U) data by ERM rather than by clustering in the geometric space. A two-step ERM is considered: first an unbiased risk estimator is designed, and then the empirical training risk is minimized. This approach is advantageous in that we can also evaluate the empirical validation risk, which is indispensable for hyperparameter tuning when some validation data is split from U training data instead of labeled test data. We prove that designing such an estimator is impossible given a single set of U data, but it becomes possible given two sets of U data with different class priors. This answers a fundamental question in weakly-supervised learning, namely what the minimal supervision is for training any binary classifier from only U data. Since the proposed learning method is based on unbiased risk estimates, the asymptotic consistency of the learned classifier is certainly guaranteed. Experiments demonstrate that the proposed method could successfully train deep models like ResNet and outperform state-of-the-art methods for learning from two sets of U data.

• [stat.ML]Speaker Fluency Level Classification Using Machine Learning Techniques
Alan Preciado-Grijalva, Ramon F. Brena
http://arxiv.org/abs/1808.10556v1

Level assessment for foreign language students is necessary for placing them in the right level group. Because interviewing students is a very time-consuming task, we propose to automate the evaluation of speaker fluency level using machine learning techniques. This work presents an audio processing system capable of classifying the level of fluency of non-native English speakers using five different machine learning models. As a first step, we have built our own dataset, which consists of labeled audio conversations in English between people in different fluency classes (low, intermediate, high). We segment the audio conversations into 5 s non-overlapping audio clips and perform feature extraction on them. We start by extracting Mel cepstral coefficients from the audio; 20 coefficients proved to be an appropriate quantity for our data. We then extract zero-crossing rate, root mean square energy and spectral flux features, and show that this improves model performance. Out of a total of 1424 audio segments, with 70% training data and 30% test data, one of our trained models (support vector machine) achieved a classification accuracy of 94.39%, whereas the other four models passed an 89% classification accuracy threshold.
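
The segmentation step and two of the simpler features can be sketched with the standard library alone, as below. The function names, the 100 Hz sample rate, and the synthetic sine signal are illustrative assumptions; Mel cepstral coefficients and spectral flux, which need a spectral transform, are left out.

```python
import math


def split_into_clips(samples, sample_rate, clip_seconds=5):
    """Cut a signal into non-overlapping clips, dropping any trailing partial clip."""
    clip_len = int(sample_rate * clip_seconds)
    n_clips = len(samples) // clip_len
    return [samples[i * clip_len:(i + 1) * clip_len] for i in range(n_clips)]


def zero_crossing_rate(clip):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum((a >= 0) != (b >= 0) for a, b in zip(clip, clip[1:]))
    return crossings / (len(clip) - 1)


def rms_energy(clip):
    """Root mean square amplitude of the clip."""
    return math.sqrt(sum(x * x for x in clip) / len(clip))


# Synthetic 12 s signal: a 2 Hz sine sampled at a hypothetical 100 Hz.
sample_rate = 100
signal = [math.sin(2 * math.pi * 2 * t / sample_rate) for t in range(12 * sample_rate)]
clips = split_into_clips(signal, sample_rate)
features = [(zero_crossing_rate(c), rms_energy(c)) for c in clips]
```

Each clip yields one feature vector, so the 12 s toy signal produces two vectors and the final 2 s remainder is discarded.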