A reinforcement learning approach to age of information in multi-user networks with HARQ