We live in the era of big data. With the growth of the internet, data is growing rapidly in both volume and variability. Processing big data can be a headache because it naturally takes a long time to run. Apache Spark (or simply Spark) is one of the popular tools for processing big data.
Spark is a unified analytics engine for large-scale data processing. With Spark, we can process data quickly and distribute processing tasks across multiple computers. People also use Spark because it can be used from popular programming languages such as Python, Scala, Java, R and…
In this article, I will show you an example of JSON data processing. The goal is to combine a folder of JSON files, each containing a single observation, into one JSON file. …
Dealing with spatial data in spatial analysis can be cumbersome. Spatial data is usually stored as point, line, or polygon coordinates that can be drawn on a map. Thankfully, there is a Python package called GeoPandas designed to make working with geospatial data in Python easier.
I recently worked on a spatial data analysis project. In that project, GeoPandas helped me label coordinates with their corresponding areas. Labelling areas is important for identifying whether there is a spatial pattern in the data. In this article, I will discuss specifically how to attach an area label to given coordinates.
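A minimal sketch of that labelling step is a spatial join: each point inherits the attributes of the polygon that contains it. The area names and coordinates below are invented for illustration, not taken from the project.

```python
# Sketch: label point coordinates with the area (polygon) they fall inside,
# using a GeoPandas spatial join. All names and geometries are mock data.
import geopandas as gpd
from shapely.geometry import Point, Polygon

# Polygons representing named areas (hypothetical boundaries).
areas = gpd.GeoDataFrame(
    {"area_name": ["North", "South"]},
    geometry=[
        Polygon([(0, 0), (10, 0), (10, 10), (0, 10)]),
        Polygon([(0, -10), (10, -10), (10, 0), (0, 0)]),
    ],
    crs="EPSG:4326",
)

# Observed coordinates to be labelled (hypothetical sites).
points = gpd.GeoDataFrame(
    {"site": ["a", "b"]},
    geometry=[Point(5, 5), Point(5, -5)],
    crs="EPSG:4326",
)

# Each point gets the area_name of the polygon containing it.
labelled = gpd.sjoin(points, areas, how="left", predicate="within")
print(labelled[["site", "area_name"]])
```

Both GeoDataFrames should share the same coordinate reference system before the join, which is why `crs` is set explicitly on each.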
The tech industry is growing rapidly, and as it grows, many discussions and articles about tech are being published. Three topics are particularly interesting to me:
I will discuss these topics by posing one question for each. To answer those questions, I will mainly use the StackOverflow Survey 2020 data. Around 60,000 developers from around the world participated in the survey. It contains questions on topics relevant to software developers such as education, career, and tech culture.
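The kind of analysis this implies is mostly grouping and counting survey responses with pandas. The snippet below sketches that on a tiny mock table; the column names and rows are stand-ins, not the real survey schema.

```python
# Sketch: the typical first-cut aggregation on survey data with pandas.
# The columns and values here are mock stand-ins for the real survey file.
import pandas as pd

survey = pd.DataFrame({
    "Respondent": [1, 2, 3, 4],
    "EdLevel": ["Bachelor's", "Master's", "Bachelor's", "Self-taught"],
    "YearsCode": [3, 10, 5, 2],
})

# Share of respondents per education level.
ed_share = survey["EdLevel"].value_counts(normalize=True)
print(ed_share)

# Average coding experience per education level.
years_by_ed = survey.groupby("EdLevel")["YearsCode"].mean()
print(years_by_ed)
```

With the real survey CSV, the same pattern applies after `pd.read_csv(...)`, usually with some cleaning of free-text and multi-select columns first.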
A Data Science Enthusiast