Construction robots have drawn increased attention as a potential means of improving construction safety and productivity. However, ensuring safe human-robot collaboration in dynamic and unstructured construction workspaces remains challenging. On construction sites, multiple entities collaborate dynamically, and the situational context among them evolves continually. Construction robots must therefore be able to visually understand the context of a scene (i.e., the semantic relations among surrounding entities) so that they can collaborate safely with humans, much as the human visual system allows people to do. Toward this end, this study builds a unique deep neural network architecture and develops a construction-specialized model by experimenting with multiple fine-tuning scenarios. The study also evaluates the model's performance on data from real construction operations in order to examine its potential for real-world applications. The results showed promising performance for the tuned model: recall@5 reached 92% on the training dataset and 67% on the validation dataset. The proposed method, which supports construction co-robots with holistic scene understanding, is expected to contribute to safer human-robot collaboration in construction.
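The text above does not define the recall@5 metric, so the following is only a minimal sketch under an assumed definition that is common in visual relation detection: for one image, recall@K is the fraction of ground-truth (subject, relation, object) triplets that appear among the model's K highest-ranked predicted triplets. The function name `recall_at_k`, the triplet representation, and the entity and relation labels are all hypothetical and chosen for illustration; they are not taken from the study.

```python
from typing import Sequence, Set, Tuple

# Assumed representation of a semantic relation: (subject, relation, object).
Triplet = Tuple[str, str, str]


def recall_at_k(ranked_predictions: Sequence[Triplet],
                ground_truth: Set[Triplet],
                k: int = 5) -> float:
    """Fraction of ground-truth triplets found in the top-k predictions.

    `ranked_predictions` must be sorted from most to least confident.
    This is an assumed definition, not necessarily the one used in the study.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if not ground_truth:
        raise ValueError("ground_truth must contain at least one triplet")
    top_k = set(ranked_predictions[:k])
    return len(top_k & ground_truth) / len(ground_truth)


# Hypothetical predictions for one construction scene, most confident first.
predictions = [
    ("worker", "operates", "excavator"),
    ("worker", "near", "truck"),
    ("excavator", "loads", "truck"),
    ("worker", "wears", "helmet"),
    ("truck", "on", "road"),
    ("worker", "guides", "truck"),
]

# Hypothetical ground-truth relations for the same scene.
truth = {
    ("worker", "operates", "excavator"),
    ("excavator", "loads", "truck"),
    ("worker", "guides", "truck"),
}

score_at_5 = recall_at_k(predictions, truth, k=5)  # 2 of 3 relations recovered
score_at_6 = recall_at_k(predictions, truth, k=6)  # all 3 relations recovered
```

Under this assumed definition, a dataset-level figure would typically be obtained by averaging the per-image scores, although other aggregation choices (such as pooling all triplets across images) are also possible.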