If you are using Visual Studio Code, it is easy to enable both code highlighting and math by installing one extension: Markdown All in One.
Create a new Markdown document with a name ending in
.md. To enable code highlighting, surround code with ``` (three backticks; the backtick key is usually under the Esc key), like this:
To select data from a pandas DataFrame, we can use
df_data['column'], and can also use
df_data.loc[:, 'column'], and yes, can also use
pd.eval(), and don't forget
df_data.query(). If the above is not enough, there is a package called numexpr, and many more.
The Zen of Python says:
There should be one — and preferably only one — obvious way to do it.
Hey pandas DataFrame, is there one best and obvious way to select data? Let's go through 10 ways one by one and see if we can find the answer.
Say we have a sample pandas DataFrame:
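As a minimal sketch of the selection styles mentioned above, assuming a small made-up DataFrame (the column names, index labels, and values are hypothetical and chosen only for illustration):

```python
import pandas as pd

# Hypothetical sample data; the columns and values are made up.
df_data = pd.DataFrame(
    {"name": ["Ann", "Bob", "Cid"], "age": [25, 32, 41]},
    index=["a", "b", "c"],
)

age_column = df_data["age"]          # select a column by label
age_via_loc = df_data.loc[:, "age"]  # the same column through .loc
row_b = df_data.loc["b"]             # select a row by index label

# Three ways to keep only the rows where age > 30.
older_mask = df_data[df_data["age"] > 30]
older_query = df_data.query("age > 30")
older_eval = df_data[
    pd.eval("df_data.age > 30", local_dict={"df_data": df_data})
]

print(older_query)
```

All three filters should return the same rows; which one reads best is largely a matter of taste and of how complex the condition is.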
When dealing with text data, we want to measure the importance of a word to a document within a full text collection. One of the most intuitive solutions is to count how many times the word appears: the higher the count, the more important the word. But raw word counts favor long documents; after all, a longer document contains more words.
We need another solution that appropriately measures the importance of a word in the overall context. TF-IDF is one of the effective solutions, and it also serves as a building block of modern search engines like Google.
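To make the idea concrete, here is a minimal sketch of one common TF-IDF variant. It assumes term frequency is the word count divided by the document length, and inverse document frequency is log(N / number of documents containing the word). The function name tf_idf and the sample sentences are hypothetical; real libraries often use smoothed variants of these formulas.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {word: score} dict per document (unsmoothed TF-IDF)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each word appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append(
            {
                word: (count / total) * math.log(n_docs / doc_freq[word])
                for word, count in counts.items()
            }
        )
    return scores


docs = [
    "the cat sat",
    "the dog barked loudly at the cat",
    "the bird sang",
]
scores = tf_idf(docs)
print(scores[0])
```

A word such as "the" that appears in every document gets a score of zero, while a word found in only one document scores highest, regardless of how long the documents are.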
I like the idea that we need to rethink technology and modernization from the perspective of human beings. Take Times Square in New York: they fixed the traffic jams by simply blocking some unnecessary roads and junctions, then rebuilding those areas as pedestrian streets.
But on the other hand, the trend toward technology may be unstoppable.
Thousands of years ago, Socrates insisted that writing destroys memory and weakens the mind, and even doubted the merit of introducing ‘letters’. But nowadays, none of us can live without writing, reading, and books.
Like iPads, Macs and computers, my kid is also super…
Whenever there is a programming speed competition, Python usually ends up near the bottom. Some say that is because Python is an interpreted language, and all interpreted languages are slow. But Java is also interpreted in a sense: its bytecode is executed by the JVM. As shown in this benchmark, Java is much faster than Python.
Here is a sample that demonstrates Python's slowness, using a traditional for loop to compute reciprocals:
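import numpy as np
# One million random integers in [1, 100)
values = np.random.randint(1, 100, size=1000000)
output = np.empty(len(values))
# Compute each reciprocal one element at a time (slow in pure Python)
for i in range(len(values)):
    output[i] = 1.0 / values[i]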
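For comparison, a common way to avoid the per-element loop is NumPy's vectorized arithmetic. The sketch below is an illustration under that assumption, not necessarily the approach the article goes on to take; the function names are hypothetical, and a smaller array is used so the loop version finishes quickly.

```python
import numpy as np


def reciprocal_loop(values):
    """Element-by-element reciprocal using a plain Python loop."""
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output


def reciprocal_vectorized(values):
    """Same computation, expressed as one array operation."""
    return 1.0 / values


rng = np.random.default_rng(seed=0)
values = rng.integers(1, 100, size=10_000)

loop_result = reciprocal_loop(values)
vector_result = reciprocal_vectorized(values)
print(np.allclose(loop_result, vector_result))
```

Both functions produce the same numbers; the vectorized version pushes the loop into compiled code, which is where the speedup usually comes from. Timing both with timeit on your own machine is the simplest way to see the difference.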
Warning: this piece is purely for fun, with no intention to offend anyone, including data scientists.
If your title is Data “Scientist”, do you ever doubt yourself: am I really a “Scientist”, am I really working on “Science”, or am I just a data analyst?
Recently, I came across a tweet, which says:
Offend a Data “Scientist” with one tweet — Ben Lindsay
Then came many amusing replies, like this one:
machine learning is just regression with extra steps — Mike Henry
This one is from my peer, an underrated tweet:
Matplotlib was born in the 2000s, at a time when pandas did not yet exist. (That is why Matplotlib does not natively support the pandas DataFrame.) The library is old, but is it outdated? Can we still use Matplotlib to create nice, modern and user-friendly data visualizations, even in 2021? Let's try it out.
The official site’s Usage Guide says we can create a chart in two styles.
import matplotlib.pyplot as plt
fig, ax = plt.subplots()  # create a figure containing a single axes
ax.plot([1, 2, 3, 4], [1, 4, 2, 3])  # plot some data on the axes
import matplotlib.pyplot as plt
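The two styles the Usage Guide describes are the object-oriented style and the pyplot style. Here is a minimal side-by-side sketch of both, assuming the non-interactive Agg backend so it runs without a display; the titles and variable names are illustrative only.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: render off-screen, no GUI needed
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [1, 4, 2, 3]

# Object-oriented style: create the figure and axes, then call methods on them.
fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_title("Object-oriented style")

# pyplot style: let pyplot track the "current" figure and axes implicitly.
plt.figure()
plt.plot(x, y)
plt.title("pyplot style")
ax2 = plt.gca()  # the axes pyplot created behind the scenes

print(ax.get_title(), "|", ax2.get_title())
```

The object-oriented style is usually easier to manage once a figure has several subplots, because every call names the axes it acts on.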
Regular expressions trace their roots to regular-language theory from the 1940s and 1950s. They could be one of the oldest computer languages on our planet (maybe in the universe), and they are still in massive use. Many years from now, other cool programming languages may pop up, but I strongly believe my two kids will still need to learn RE when they grow up, no matter what programming language they use then.
In this article, I try to explain the core RE concepts and the ambiguous usages that may lead to confusion and errors.
One of the most common uses of regular expressions in Python is capturing…
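As a small illustration, assuming "capturing" here refers to capturing groups, the sketch below pulls parts out of a string with Python's built-in re module. The sample strings and group names are made up.

```python
import re

# Numbered groups: each pair of parentheses captures one piece of the match.
date_pattern = re.compile(r"(\d{4})-(\d{2})-(\d{2})")
match = date_pattern.search("Released on 2021-03-15.")
year, month, day = match.groups()
print(year, month, day)

# Named groups: (?P<name>...) lets you refer to a capture by name.
email_match = re.search(
    r"(?P<user>[\w.]+)@(?P<domain>[\w.]+)",
    "contact: jane.doe@example.com",
)
print(email_match.group("user"), email_match.group("domain"))
```

Note that search returns None when nothing matches, so real code should check the result before calling groups or group.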
Python as an interpreter, a program, won't do much without importing external modules or packages. Understanding how Python imports modules and packages is helpful in almost every scenario.
All code in this article was written and tested on Linux (Ubuntu); Windows and macOS should (hopefully) be similar.
When we use
pip to install a package:
pip install <pkg_name>
the package typically goes to a system-wide folder.
“System-wide” here means the installed packages are accessible to all Python programs.
When we use the
import keyword to import a package, Python loops through the path list in
sys.path to load it.
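The search path can be inspected directly. The sketch below uses only the standard library; the choice of json as the module to look up is arbitrary.

```python
import importlib.util
import sys

# sys.path is an ordinary list of directories (and zip files) searched in order.
for entry in sys.path:
    print(entry)

# find_spec reports where a top-level module would be loaded from,
# without actually importing it.
spec = importlib.util.find_spec("json")
print(spec.name, "->", spec.origin)
```

Because sys.path is searched in order, a module that appears in an earlier directory shadows one with the same name in a later directory.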
Lots of people are guessing why a productivity-tool company wants to buy such a toy-like, childish and nonsensical app. Does Microsoft want to use TikTok as a channel to sell Microsoft 365? Create a TikTok for Business and profit more? Expand its market into social media and counter the competition from Facebook and Google? Save TikTok from the ban?
But I feel all of the above may be only partly true; there must be some other reasons.
Leonardo da Vinci finished the painting around 1519, but it was not as valuable as it is today until 1911. …
Daddy of two kids, husband and Data Scientist @ Microsoft, Redmond.