Hi Dalila, thanks for reminding me about the scope of MI.
According to:
MI(feature; target) = Entropy(feature) - Entropy(feature | target)
a feature's entropy can be unbounded, so the upper bound of MI is unbounded as well. (Content updated.)
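To make the formula concrete, here is a small sketch (the 2x2 joint distribution is a made-up example) that computes MI on a toy discrete feature/target pair as Entropy(feature) minus Entropy(feature | target):

```python
import numpy as np

# Toy joint distribution P(feature, target) over 2 x 2 outcomes (invented numbers)
joint = np.array([[0.4, 0.1],
                  [0.1, 0.4]])

p_x = joint.sum(axis=1)  # marginal P(feature)
p_y = joint.sum(axis=0)  # marginal P(target)

def entropy(p):
    """Shannon entropy in bits, skipping zero-probability outcomes."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

h_x = entropy(p_x)  # Entropy(feature)
# Entropy(feature | target) = sum over y of P(y) * Entropy(feature | target=y)
h_x_given_y = sum(p_y[j] * entropy(joint[:, j] / p_y[j])
                  for j in range(len(p_y)))

mi = h_x - h_x_given_y
print(round(mi, 4))  # about 0.278 bits for this joint distribution
```

Note the value here is finite only because the toy feature has two outcomes; with a continuous or unbounded feature, Entropy(feature), and hence MI, has no fixed upper bound.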
For your second question: similar to choosing the number of components in PCA, it is up to us to decide what percentage of features to keep based on real needs. If my data contains millions of features, I might keep only the top 10% to reduce the performance pressure while still covering the majority of the variance.
If my data contains only dozens of features, I might use all of them.
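As one possible way to apply that "top percentage" idea in practice (a sketch, assuming scikit-learn is available; the dataset here is synthetic), you can rank features by MI with the target and keep a fixed percentile:

```python
# Sketch: keep the top 10% of features ranked by mutual information
# with the target, analogous to keeping enough PCA components.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectPercentile, mutual_info_classif

# Synthetic data: 200 samples, 50 features, only 5 of them informative
X, y = make_classification(n_samples=200, n_features=50,
                           n_informative=5, random_state=0)

selector = SelectPercentile(mutual_info_classif, percentile=10)
X_reduced = selector.fit_transform(X, y)
print(X_reduced.shape)  # 10% of 50 features -> 5 columns kept
```

The percentile is the knob the answer above refers to: with millions of features you would set it low for performance, and with dozens you could simply skip the selection step.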
Hope this answers your questions.
Thanks again.