Algorithms like Adam use a different step size for each coordinate. That's the "coordinate-wise" part. Those step sizes are computed from the gradients observed so far, so if you ran the same algorithm on different data, you would get different step sizes. That's the "data-dependent" part.
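To make this concrete, here is a minimal sketch using AdaGrad (simpler than Adam, but the same idea) on a tiny least-squares problem. The function name `adagrad_step_sizes` and the two toy datasets are made up for illustration; they are not part of the project. It shows that each coordinate ends up with its own step size, and that those step sizes change when the data changes.

```python
import math

def adagrad_step_sizes(samples, lr=0.1, eps=1e-8):
    """Run one pass of AdaGrad on least squares; return the final per-coordinate step sizes."""
    dim = len(samples[0][0])
    w = [0.0] * dim
    sum_sq = [0.0] * dim  # running sum of squared gradients, one entry per coordinate
    for x, y in samples:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        grad = [err * xi for xi in x]  # gradient of 0.5 * (w.x - y)^2 with respect to w
        for i in range(dim):
            sum_sq[i] += grad[i] ** 2
            step = lr / (math.sqrt(sum_sq[i]) + eps)  # coordinate-wise step size
            w[i] -= step * grad[i]
    return [lr / (math.sqrt(s) + eps) for s in sum_sq]

# Hypothetical toy data: feature 0 is large in dataset A, feature 1 is large in dataset B.
data_a = [([10.0, 0.1], 1.0), ([8.0, 0.2], 0.5), ([12.0, 0.1], 1.5)]
data_b = [([0.1, 10.0], 1.0), ([0.2, 8.0], 0.5), ([0.1, 12.0], 1.5)]

steps_a = adagrad_step_sizes(data_a)
steps_b = adagrad_step_sizes(data_b)
print("step sizes on A:", steps_a)
print("step sizes on B:", steps_b)
```

On dataset A the first coordinate sees larger gradients, so it gets the smaller step size; on dataset B it is the other way around. Same algorithm, same hyperparameters, different step sizes, purely because of the data.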

Hope this clarifies the confusion. Let me know otherwise!

## Project topic

" AdaGrad / Adam / signSGD: Can you suggest/try different data-dependent coordinate-wise learning rates schemes and compare them?"What does " data-dependent" means? If we work with images, any type of image would fit?

Ok, that's clear! Thank you!

