We propose a 6DoF object pose estimation framework. Directly regressing object poses in the full 6D pose space is difficult because the search space is large. Instead, we first employ a 2D object detector to localize objects in the image plane, and then predict poses within the detected local regions. Moreover, the detected bounding boxes can be used to filter out inaccurate object segmentations or to restrict the regions that need to be segmented. The bounding boxes can thus facilitate object keypoint localization or reduce the computational cost of the subsequent pose estimation network, since poses are now estimated in local regions. We train a separate network for each object.
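To make the role of the bounding box concrete, the following is a minimal sketch of the two uses described above: cropping the image to the detected region before pose estimation, and discarding segmentation pixels that fall outside the box. It is an illustration under stated assumptions, not the implementation used in this work. The helper names (`clip_box`, `crop_to_box`, `filter_mask_by_box`), the `(x1, y1, x2, y2)` pixel box convention with exclusive upper bounds, and the `margin` parameter are all hypothetical, and the detector output is replaced by a hand-written box.

```python
import numpy as np


def clip_box(box, height, width, margin=0):
    """Expand an (x1, y1, x2, y2) box by `margin` pixels and clip it to the image."""
    x1, y1, x2, y2 = box
    x1 = max(0, int(x1) - margin)
    y1 = max(0, int(y1) - margin)
    x2 = min(width, int(x2) + margin)
    y2 = min(height, int(y2) + margin)
    return x1, y1, x2, y2


def crop_to_box(image, box, margin=0):
    """Return the local region fed to the pose network and its top-left offset.

    The offset is needed to map keypoints predicted in the crop back to
    full-image coordinates.
    """
    h, w = image.shape[:2]
    x1, y1, x2, y2 = clip_box(box, h, w, margin)
    return image[y1:y2, x1:x2], (x1, y1)


def filter_mask_by_box(mask, box, margin=0):
    """Keep only the segmentation pixels that lie inside the (expanded) box."""
    h, w = mask.shape[:2]
    x1, y1, x2, y2 = clip_box(box, h, w, margin)
    filtered = np.zeros_like(mask)
    filtered[y1:y2, x1:x2] = mask[y1:y2, x1:x2]
    return filtered


# Toy inputs standing in for a real image, segmentation, and detection.
image = np.zeros((480, 640, 3), dtype=np.uint8)
mask = np.zeros((480, 640), dtype=bool)
mask[100:200, 150:250] = True   # pixels on the object
mask[400:410, 600:610] = True   # spurious segmentation far from the object
box = (140, 90, 260, 210)       # assumed detector output for the object

crop, offset = crop_to_box(image, box, margin=10)
clean_mask = filter_mask_by_box(mask, box, margin=10)
```

In this toy case the pose network would see a 140 × 140 crop instead of the 480 × 640 image, and the spurious segmentation blob outside the box is removed while the object pixels are kept.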