I recently became interested in how we can programmatically solve the 15 puzzle. The 15 puzzle is a sliding puzzle consisting of a 4 × 4 board of tiles numbered from 1 to 15, with one empty space. The tiles are shuffled, and the goal is to slide them around until they are in order, i.e. the numbered tiles run from 1 to 15 starting from the top-left corner, left to right and top to bottom, with the empty space at the bottom-right corner.
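As a minimal sketch (my own illustration, not the article's solver), the board can be represented as a flat tuple read left to right, top to bottom, with 0 standing for the empty space; generating the legal moves from a state is then a small amount of index arithmetic:

```python
# Board state: tuple of 16 numbers, 0 = empty space.
GOAL = tuple(range(1, 16)) + (0,)  # 1..15 followed by the blank

def neighbours(state):
    """Return every state reachable by sliding one tile into the blank."""
    moves = []
    blank = state.index(0)
    row, col = divmod(blank, 4)
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + dr, col + dc
        if 0 <= r < 4 and 0 <= c < 4:
            tile = r * 4 + c
            board = list(state)
            board[blank], board[tile] = board[tile], board[blank]
            moves.append(tuple(board))
    return moves
```

Any search algorithm (BFS, IDA*, etc.) can then be layered on top of this state representation.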
I recently discovered an application called Heynote, and I think it is a great tool for developers.
It is common to have multiple GitHub accounts, such as one for personal use and one for work. Managing GitHub repositories requires a developer to set up SSH keys on their computer. However, this becomes non-trivial when one has to work with multiple accounts representing different identities. This blog post describes how one can easily manage multiple GitHub repositories from different accounts on the same computer.
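The usual approach is to give each identity its own key and its own host alias in the SSH config. The aliases and key filenames below are my own examples, not the post's:

```
# ~/.ssh/config — one entry per GitHub identity
Host github-personal
    HostName github.com
    User git
    IdentityFile ~/.ssh/id_ed25519_personal
    IdentitiesOnly yes

Host github-work
    HostName github.com
    User git
    IdentityFile ~/.ssh/id_ed25519_work
    IdentitiesOnly yes
```

Repositories are then cloned using the alias in place of github.com, e.g. `git clone git@github-work:company/repo.git`, so each repository is tied to the right key.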
Like most people, I first knew about Andy Weir through the movie The Martian. It was a great sci-fi movie with the science presented accurately. Therefore, when I learned that Andy Weir's latest science-fiction novel, Project Hail Mary, had been released, I immediately bought the book and started reading.
I have recently been reading Daniel Kahneman's new book Noise (Chinese edition:《雜訊》). Daniel Kahneman won the 2002 Nobel Memorial Prize in Economic Sciences. His previous book, Thinking, Fast and Slow (Chinese edition:《快思慢想》), based on years of research with his long-time collaborator Amos Tversky, introduced the two modes of human thinking and drew an enormous response. If you have not read that book, you can start with Kahneman's 2011 talk at Google. This new book focuses on the noise that arises in the decision-making process for various reasons, leading people to give completely different judgments even when facing the same problem. He points out that this kind of noise in decision-making is everywhere, and the examples in the book are truly eye-opening.
It is common for the different projects you are working on to depend on different versions of Python. That is why pyenv comes in very handy for Python developers, as it lets you switch between Python versions easily. With pyenv-virtualenv, it can also be used together with virtualenv to create isolated development environments for different projects with different dependencies.
To use a pre-trained BERT model, we need to convert the input data into an appropriate format so that each sentence can be sent to the pre-trained model to obtain the corresponding embedding. This article introduces how this can be done using the modules and functions available in Hugging Face's transformers package (https://huggingface.co/transformers/index.html).
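The shape of that workflow looks roughly like the sketch below (my own example, not the article's code; it assumes the `bert-base-uncased` checkpoint, but any BERT checkpoint works the same way):

```python
# Tokenise sentences and obtain contextual embeddings from a pre-trained BERT.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

sentences = ["Tries are useful.", "BERT produces contextual embeddings."]
# Pad to the longest sentence in the batch and return PyTorch tensors
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# One vector per token; the [CLS] token's vector is often used as a
# sentence-level representation
embeddings = outputs.last_hidden_state       # shape: (batch, seq_len, hidden)
sentence_vecs = embeddings[:, 0, :]          # shape: (batch, hidden)
```

For sentence similarity tasks, mean-pooling over the token vectors (masking out padding) is another common choice besides taking the [CLS] vector.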
A trie is a very useful data structure. It is commonly used to represent a dictionary for looking up words in a vocabulary.
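A minimal trie can be built out of nested dictionaries, with a sentinel key marking where a word ends. This is my own sketch, not the post's implementation; the sentinel `"$"` assumes that character never appears inside a word:

```python
END = "$"  # sentinel key marking the end of a word

def trie_insert(root, word):
    """Insert a word, creating one dict node per character."""
    node = root
    for ch in word:
        node = node.setdefault(ch, {})
    node[END] = True

def trie_contains(root, word):
    """Return True only if the exact word was inserted (not just a prefix)."""
    node = root
    for ch in word:
        if ch not in node:
            return False
        node = node[ch]
    return END in node

vocab = {}
for w in ("tea", "ten", "to"):
    trie_insert(vocab, w)
```

Dropping the final `END in node` check turns `trie_contains` into a prefix test, which is what makes tries so useful for autocomplete-style lookups.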
Matplotlib's default fonts do not include glyphs for Unicode characters such as Chinese, Japanese and Korean, so these characters cannot be displayed out of the box. This post introduces two different methods to allow these characters to be shown in graphs.
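One common fix is to point matplotlib's rcParams at a font that contains CJK glyphs before plotting. The sketch below is my own; the font name "Noto Sans CJK TC" is an assumption and should be replaced with whatever CJK font is installed on your system:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, safe for scripts
import matplotlib.pyplot as plt

plt.rcParams["font.family"] = "sans-serif"
# Put a CJK-capable font first in the fallback list
plt.rcParams["font.sans-serif"] = ["Noto Sans CJK TC", "SimHei", "sans-serif"]
plt.rcParams["axes.unicode_minus"] = False  # keep the minus sign renderable

fig, ax = plt.subplots()
ax.set_title("中文標題")  # renders correctly if a listed font is available
fig.savefig("demo.png")
```

If matplotlib silently falls back to boxes (tofu), clearing its font cache after installing a new font usually resolves it.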
PyCon HK 2018 was held on 23rd-24th November 2018 at Cyberport. I gave a talk on how to deploy machine learning models in Python. The slides of the talk can be found at the link: http://talks.albertauyeung.com/pycon2018-deploy-ml-models/.
N-grams are contiguous sequences of n items in a sentence. N can be 1, 2 or any other positive integer, although usually we do not consider very large values of N because such long n-grams rarely appear in many different places.
When performing machine learning tasks related to natural language processing, we usually need to generate n-grams from input sentences. For example, in text classification tasks, in addition to using each individual token found in the corpus, we may want to add bi-grams or tri-grams as features to represent our documents. This post describes several different ways to generate n-grams quickly from input sentences in Python.
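One compact way to do this (my own sketch; the post compares several) is to zip successively shifted slices of the token list:

```python
def ngrams(tokens, n):
    """Return the n-grams of a token list as tuples, in order of appearance."""
    return list(zip(*(tokens[i:] for i in range(n))))

tokens = "the quick brown fox".split()
bigrams = ngrams(tokens, 2)
# [('the', 'quick'), ('quick', 'brown'), ('brown', 'fox')]
```

Because `zip` stops at the shortest slice, sentences shorter than n simply yield an empty list, with no special-case handling needed.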
PyCon HK 2017 was held on 3rd-4th November 2017 at the City University of Hong Kong. I gave a talk on using gradient boosting machines in Python to perform machine learning. The slides of the talk can be found at the link: http://talks.albertauyeung.com/pycon2017-gradient-boosting/.
pandas is one of the most commonly used Python libraries in data analysis and machine learning. It is versatile and can be used to handle many different types of data. Before feeding a model with training data, one would most probably pre-process the data and perform feature extraction on data stored as a pandas DataFrame. I have been using pandas extensively in my work, and have recently discovered that the time required to manipulate data stored in a DataFrame can vary hugely depending on the method used.
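A small illustration of the point (my own example, not the post's benchmark): computing the same derived column row by row with `apply()` versus with vectorised arithmetic gives identical results, but the vectorised form is typically orders of magnitude faster on large frames:

```python
import pandas as pd

df = pd.DataFrame({"price": [10.0, 20.0, 30.0], "qty": [1, 2, 3]})

# Row-wise apply: flexible, but invokes a Python function per row
total_apply = df.apply(lambda row: row["price"] * row["qty"], axis=1)

# Vectorised arithmetic: the same result, computed in optimised C code
total_vec = df["price"] * df["qty"]
```

The gap widens with the number of rows, so on real datasets the choice between these two styles can dominate the pre-processing time.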
In natural language processing, it is a common task to extract words or phrases of particular types from a given sentence or paragraph. For example, when performing analysis of a corpus of news articles, we may want to know which countries are mentioned in the articles, and how many articles are related to each of these countries.
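In its simplest form this can be done with a dictionary (gazetteer) lookup; the sketch below is a deliberately naive illustration of mine with a three-country gazetteer, whereas a real pipeline would use a full country list or a trained NER model:

```python
GAZETTEER = {"France", "Japan", "Brazil"}  # tiny illustrative country list

def countries_mentioned(text):
    """Return the sorted list of gazetteer countries appearing in the text."""
    # Strip basic punctuation so that "Japan," still matches "Japan"
    tokens = (tok.strip(".,;:!?") for tok in text.split())
    return sorted(set(tokens) & GAZETTEER)

article = "Trade talks between Japan and France resumed this week."
```

Counting articles per country is then just a matter of aggregating these per-article lists over the corpus.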
It probably goes without saying that there is too much information on the Web nowadays. Search engines help to some extent, but it is even better to have interesting content recommended to us automatically, without our asking. Indeed, from something as simple as a list of the most popular questions and answers on Quora to the more personalized suggestions we receive on Amazon, recommendations are offered to us all over the Web.
I.
The year 1905 was a year full of breakthroughs in physics. Within this single short year, Albert Einstein published five papers on the photoelectric effect, molecular motion, and relativity. This year came to be known as the "miracle year" (Annus Mirabilis) of physics, or of Einstein. During this period, when he pursued physics research alongside his day job, Einstein was living in Bern, Switzerland. Bern became the place where Einstein rose to fame, and the city's name has become inseparably linked with that of the greatest scientist of the twentieth century.
Taking the train from Lausanne to Montreux, looking along the shore of Lake Geneva (Lac Léman), one can see a massive castle standing by the water. This ancient castle is no minor attraction. Its old, stately architecture and the beautiful scenery around it naturally draw many visitors, and it became even more renowned thanks to a poem by the English poet Lord Byron. This castle is the Château de Chillon.