This paper considers the implementation of non-blocking concurrent objects on shared-memory multiprocessors. Real multiprocessors have properties not present in theoretical models; these properties can be exploited to design non-blocking protocols that are more efficient in practice than those allowed by theoretical models. These new protocols rely on the operating system to take action when a thread of control is delayed during its non-blocking update. We illustrate the effectiveness of this approach by presenting two protocols that address two factors hindering the performance of Herlihy's standard non-blocking protocol [Herlihy 90, Herlihy 91a]: the resources wasted by attempted non-blocking operations that fail, and the cost of data copying. We demonstrate the importance of these factors experimentally and show how they can be reduced using protocols that rely on operating system support. To reduce the overhead of failing non-blocking operations, our first protocol maintains information about the utilization of the shared object; experiments show that this protocol performs better than the known alternatives. To reduce the cost of data copying, we introduce a second, optimistic protocol that avoids copying except when a thread of control is delayed during its attempted update.
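
To make the two cost factors concrete, the following is a minimal sketch of a copy-based non-blocking update in the general style the abstract attributes to the standard protocol: copy the object, modify the private copy, and atomically install it. It is an illustration under stated assumptions, not the protocol as published. In particular, it uses Java's `AtomicReference.compareAndSet` as a stand-in for whatever atomic primitive the actual protocol uses, it relies on garbage collection to reclaim old versions, and every name in it (`CopyingObject`, `update`, `failedAttempts`, `demo`) is hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

// Hypothetical sketch (all names are ours): a shared object updated by
// copying it, modifying the copy, and atomically swinging a pointer.
final class CopyingObject {
    // Pointer to the current version; a published version is never mutated.
    private final AtomicReference<int[]> current;
    // Counts attempts whose work was thrown away because another thread won.
    final AtomicInteger failedAttempts = new AtomicInteger();

    CopyingObject(int[] initial) {
        current = new AtomicReference<>(initial.clone());
    }

    void update(Consumer<int[]> op) {
        while (true) {
            int[] seen = current.get();   // read the current version
            int[] copy = seen.clone();    // cost: copy the entire object
            op.accept(copy);              // apply the operation privately
            // Install the copy only if no other update intervened
            // (compareAndSet compares references, not contents).
            if (current.compareAndSet(seen, copy)) {
                return;
            }
            failedAttempts.incrementAndGet(); // cost: copy and work are wasted
        }
    }

    int[] snapshot() {
        return current.get().clone();
    }

    // Runs `threads` threads, each incrementing slot 0 `perThread` times,
    // and returns the final value of slot 0.
    static int demo(int threads, int perThread) {
        CopyingObject obj = new CopyingObject(new int[64]);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    obj.update(a -> a[0]++);
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return obj.snapshot()[0];
    }

    public static void main(String[] args) {
        System.out.println(demo(4, 10_000)); // prints 40000: no increment is lost
    }
}
```

Every call to `update` pays for a full copy of the object, and every failed `compareAndSet` discards both the copy and the work done on it. These correspond to the two factors named above. How often attempts fail depends on scheduling, so the value of `failedAttempts` varies from run to run.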