Well, loss (error) is loss, and if it increases, that's not good. The math looks okay, and the sigmoid activation with binary cross-entropy loss certainly *should* be fine.
The real problem is that the input features and targets give data a single node cannot possibly learn: three points where the first is True, move a little and the next is False, move a little more and the next is True. A single node can only draw one threshold, so it can never fit that alternating pattern.
Add more data and the pattern may become learnable, or add a hidden node so the network can bend its decision boundary. Either way, the math is good; it's just unfortunate that it runs on data that causes the divergence.
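Here's a quick sketch of what I mean (assuming a setup roughly like yours: one sigmoid unit, mean BCE, plain gradient descent, and three 1-D points — the exact values are my invention). With the alternating True/False/True labels the loss gets stuck well above zero, while the same code on labels a single threshold *can* separate keeps driving the loss down:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(x, y, lr=0.5, steps=5000):
    """Fit a single sigmoid unit p = sigmoid(w*x + b) by gradient descent
    on mean binary cross-entropy; return the final loss."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        p = sigmoid(w * x + b)
        # gradients of mean BCE w.r.t. w and b
        w -= lr * np.mean((p - y) * x)
        b -= lr * np.mean(p - y)
    p = np.clip(sigmoid(w * x + b), 1e-12, 1 - 1e-12)  # avoid log(0)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

x = np.array([0.0, 1.0, 2.0])
alternating = train(x, np.array([1.0, 0.0, 1.0]))  # True, False, True
separable   = train(x, np.array([0.0, 0.0, 1.0]))  # one threshold works

print(alternating)  # stuck near the best achievable loss, ~0.64
print(separable)    # keeps shrinking toward 0
```

For the alternating labels the problem is convex with its minimum at w = 0, b = ln 2, i.e. the unit gives up on the input and just predicts p = 2/3 everywhere, so the loss can never go below about 0.64 no matter how long you train.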