My research focuses on developing fundamental machine learning algorithms for computer vision and robot perception problems such as 3D object detection, 3D scene understanding, and human activity recognition. I have built machine learning systems that learn semantic, spatial, and temporal structures from multiple sources, such as cameras and RGB-D sensors, for applications such as assistive robots and self-driving cars.
Our Watch-n-Patch work models human activities and learns the complex relations among their actions given only a completely unlabeled set of RGB-D videos. The novelty of our approach is its ability to model long-range action relations in a temporal sequence by considering pairwise action co-occurrence and temporal relations. Discovering these relations is useful for recognizing actions, anticipating future actions, and detecting forgotten actions. We also provide a new large-scale RGB-D activity video dataset recorded with the Kinect v2. Our algorithm has enabled robots to help the elderly and the disabled: the robot can observe people and remind them when they forget a task around the house, making their lives easier.
Publications: CVPR 15', journal paper under review.
Our work focuses on sparse coding techniques that produce good data representations for different problems in computer science. For image clustering, we propose a novel matrix factorization algorithm that adds a local coordinate constraint to ensure the sparseness of the obtained representations. For image retrieval, we present a content-based image retrieval system built on a novel query formulation that considers three aspects: Object, Background and Condition. We apply sparse coding to obtain good bag-of-words image features, which yield an effective ranking method for returning the desired search results. For document clustering, we use a Poisson distribution to model the word-count features of a text for sparse coding, and we propose a novel sparse constrained Poisson regression algorithm to solve the resulting optimization problem.
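To make the Poisson idea concrete, here is a minimal sketch of the kind of objective such a model might minimize: a Poisson negative log-likelihood on word counts plus an L1 sparsity penalty on the code. The function name, the penalty weight `lam`, and the toy dictionary are illustrative assumptions, not the formulation or solver from the paper.

```python
import math

def poisson_sparse_objective(counts, dictionary, code, lam=0.1, eps=1e-12):
    """Poisson negative log-likelihood (constant term dropped) plus an L1 penalty.

    counts     : observed word counts, one per vocabulary term
    dictionary : nonnegative basis, one row per vocabulary term
    code       : nonnegative sparse coefficients to be evaluated
    lam, eps   : illustrative values, not settings from the paper
    """
    # Expected count of each term under the code: mu_i = sum_j D[i][j] * s[j].
    rates = [sum(d * s for d, s in zip(row, code)) for row in dictionary]
    # -log p(x | mu) = mu - x * log(mu) + log(x!); log(x!) does not depend on the code.
    nll = sum(mu - x * math.log(mu + eps) for x, mu in zip(counts, rates))
    # The L1 term encourages most coefficients to be zero.
    return nll + lam * sum(abs(s) for s in code)

counts = [2, 0, 1]
dictionary = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
good = poisson_sparse_objective(counts, dictionary, [2.0, 0.0])
bad = poisson_sparse_objective(counts, dictionary, [0.0, 2.0])
```

In this toy case the first code reproduces the observed counts exactly, so its objective is lower than that of the second code, which predicts zero occurrences of a word that was observed twice.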
Publications: TIP 13', ACM MM 12', IEEE Big Data 13'.
Bag-of-words features play an important role in image recognition. Our work, Bilevel Visual Words Coding, learns bag-of-words image features that balance representation ability, discriminative power, and efficiency. To make coding efficient under these criteria, we propose an online method that learns a projection from local descriptors to the visual words in the codebook. After projection, coding is completed efficiently by a low-dimensional localized soft-assignment. Our method improves image classification performance on standard benchmarks.
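As a rough illustration of the last step, the sketch below shows a generic localized soft-assignment: a descriptor is coded only over its k nearest visual words, with weights that decay with distance. It does not include the learned projection, and the function name and the values of `k` and `beta` are assumptions chosen for the example.

```python
import math

def localized_soft_assignment(descriptor, codebook, k=2, beta=1.0):
    """Code one descriptor over its k nearest visual words.

    k and beta are illustrative values, not settings from the paper.
    """
    # Squared Euclidean distance from the descriptor to every visual word.
    dists = [sum((x - c) ** 2 for x, c in zip(descriptor, word))
             for word in codebook]
    # Keep only the k nearest words; all other coefficients stay zero.
    nearest = sorted(range(len(codebook)), key=lambda i: dists[i])[:k]
    # Subtract the smallest distance before exponentiating for numerical stability.
    d_min = dists[nearest[0]]
    weights = {i: math.exp(-beta * (dists[i] - d_min)) for i in nearest}
    total = sum(weights.values())
    code = [0.0] * len(codebook)
    for i, w in weights.items():
        code[i] = w / total
    return code

codebook = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
code = localized_soft_assignment([0.2, 0.0], codebook)
```

Here the descriptor lies closest to the first visual word, so that word receives the largest weight, the second word receives the remainder, and the distant third word gets exactly zero.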
Publications: IJCAI 13'.
Our work focuses on the unsupervised learning of human actions directly from a set of unlabeled data. We propose a Randomly Projected Binary Feature method to detect human copy events in a large-scale set of videos. Copy event detection is useful for surveillance, copyright protection, and security. Using the fast similarity computation of the proposed feature, we present a novel keyframe-based copy retrieval method to retrieve video copies from a large video database. Our work on video copy detection achieved strong results (ranking 2nd and 4th on two evaluated metrics) in the TREC Video Retrieval Evaluation 2011, which is sponsored by the National Institute of Standards and Technology with additional support from other U.S. government agencies.
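One common way to obtain binary features from random projections is to keep only the sign of each projection. The sketch below shows that generic idea; the Gaussian projection, the bit count, and the function names are assumptions for illustration and may differ from the feature used in the paper.

```python
import random

def make_projection(in_dim, n_bits, seed=0):
    """Draw one random Gaussian direction per output bit (illustrative choice)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(n_bits)]

def binarize(feature, projection):
    """Pack the sign of each random projection into a single integer code."""
    code = 0
    for row in projection:
        dot = sum(w * x for w, x in zip(row, feature))
        # One bit per projection: 1 if the feature lies on the positive side.
        code = (code << 1) | (1 if dot >= 0 else 0)
    return code

projection = make_projection(in_dim=4, n_bits=16)
frame_feature = [0.3, -1.2, 0.8, 0.1]
code_a = binarize(frame_feature, projection)
# Doubling the feature keeps every sign, so the binary code is unchanged.
code_b = binarize([2.0 * x for x in frame_feature], projection)
```

Because only signs are kept, the code is invariant to positive rescaling of the input, and two codes can be compared by counting differing bits.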
Publications: CVPR 12'.
Extracting good visual features is the first and most important step in many computer vision problems. Binary features allow extremely fast similarity computation using Hamming distance and are much more storage-efficient than floating-point features. Our proposed Convolutional Treelets Binary Feature approach not only improves the representation ability of the features but also significantly improves computation speed. This is very useful for real-time and large-scale problems, and can be applied to many practical problems such as object tracking, video copy detection, and 3D reconstruction. Work based on our keypoint recognition algorithm achieved strong visual tracking results (ranking 2nd and 3rd on two evaluated metrics) in the international Visual Object Tracking VOT2013 challenge.
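The speed advantage of binary features comes from the fact that Hamming distance reduces to an XOR followed by a bit count. A minimal sketch, with made-up 8-bit codes and hypothetical function names:

```python
def hamming(a, b):
    """Number of differing bits between two binary codes stored as integers."""
    return bin(a ^ b).count("1")

def nearest(query, database):
    """Index of the database code with the smallest Hamming distance to the query."""
    return min(range(len(database)), key=lambda i: hamming(query, database[i]))

# Toy 8-bit codes; real descriptors typically use hundreds of bits.
database = [0b10110010, 0b01101100, 0b10110110]
query = 0b10110011
best = nearest(query, database)  # index 0, which differs from the query in one bit
```

The storage saving is equally direct: a 256-bit code occupies 32 bytes, whereas a descriptor of 128 single-precision floats occupies 512 bytes.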
Publications: ECCV 12'.
Human Centred Object Co-Segmentation
Chenxia Wu, Jiemi Zhang, Ashutosh Saxena, Silvio Savarese.
Cornell Tech Report, 2016. [ARXIV]
Watch-Bot: Unsupervised Learning for Reminding Humans of Forgotten Actions
Chenxia Wu, Jiemi Zhang, Bart Selman, Silvio Savarese, Ashutosh Saxena.
IEEE International Conference on Robotics and Automation (ICRA), 2016. [PDF][PROJECT]
Watch-n-Patch: Unsupervised Understanding of Actions and Relations.
Chenxia Wu, Jiemi Zhang, Silvio Savarese, Ashutosh Saxena.
Computer Vision and Pattern Recognition (CVPR), 2015. [PDF]
Nonnegative Local Coordinate Factorization for Image Representation.
Yan Chen, Jiemi Zhang, Deng Cai, Wei Liu, Xiaofei He.
IEEE Transactions on Image Processing (TIP), 2013. [PDF]
Bilevel Visual Words Coding for Image Classification.
Jiemi Zhang, Chenxia Wu, Jianke Zhu, Deng Cai.
International Joint Conference on Artificial Intelligence (IJCAI), 2013. [PDF]
Sparse Poisson Coding for High Dimensional Document Clustering.
Chenxia Wu, Haiqin Yang, Jianke Zhu, Jiemi Zhang, Irwin King, Michael R. Lyu.
IEEE International Conference on Big Data, 2013. [PDF]
Search Web Images Using Objects, Backgrounds and Conditions.
Jiemi Zhang, Chenxia Wu, Deng Cai.
ACM International Conference on Multimedia (ACM MM), 2012. [PDF]
A Convolutional Treelets Binary Feature Approach to Fast Keypoint Recognition.
Chenxia Wu, Jianke Zhu, Jiemi Zhang, Chun Chen, Deng Cai.
European Conference on Computer Vision (ECCV), 2012. [PDF]
A Content-based Video Copy Detection Method with Randomly Projected Binary Features.
Chenxia Wu, Jianke Zhu, Jiemi Zhang.
Computer Vision and Pattern Recognition (CVPR) Workshop on Large-Scale Video Search and Mining, 2012. [PDF]