In this article, I am going to explain the Modular ETL Architecture in detail. ETL is a broad concept that covers moving data from various sources to destinations while applying transformations along the way. This is an advanced article that assumes the reader has a substantial understanding of how ETL is implemented using tools like SSIS, the underlying working principle, and how to deploy multiple packages using SSIS. It is extremely important to implement a well-designed ETL architecture for your organization’s workload; otherwise, it might lead to performance degradation along with other challenges. To keep things simple, I will only explain the Modular ETL Architecture in this article, which will be followed by a detailed hands-on tutorial in the next article, “Implementing Modular Architecture in ETL using SSIS”.
Aveek Das
- Getting started with PostgreSQL on Docker - August 12, 2022
- Getting started with Spatial Data in PostgreSQL - January 13, 2022
- An overview of Power BI Incremental Refresh - December 6, 2021
An introduction to SSIS Data Lineage concepts
September 3, 2020
In this article, I am going to discuss SSIS data lineage concepts, which are often used while designing ETL workloads on a data warehouse. Although this article focuses on implementing data lineage using SSIS, the concepts are not confined to SSIS; they apply to any ETL tool that moves data from a source to a destination. In my previous article, Understanding Data Lineage in ETL, I discussed the general importance of data lineage concepts for any ETL tool. I would suggest you have a look at it if you want to understand how data lineage helps to track the source of a single record in the warehouse.
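To make the idea of tracking the source of a single record concrete, here is a minimal sketch in plain Python rather than SSIS. It assumes lineage is captured by attaching a few audit columns to every row during the load. The column names (`lineage_source_system`, `lineage_package`, `lineage_load_id`, `lineage_loaded_at`), the `add_lineage` helper, and the sample source and package names are all hypothetical and chosen only for illustration.

```python
from datetime import datetime, timezone
from uuid import uuid4


def add_lineage(rows, source_system, package_name):
    """Return copies of the rows with lineage columns attached."""
    load_id = str(uuid4())  # one identifier shared by every row in this load
    loaded_at = datetime.now(timezone.utc).isoformat()
    return [
        {
            **row,
            "lineage_source_system": source_system,
            "lineage_package": package_name,
            "lineage_load_id": load_id,
            "lineage_loaded_at": loaded_at,
        }
        for row in rows
    ]


# Hypothetical rows extracted from a source system.
source_rows = [
    {"customer_id": 1, "name": "Alice"},
    {"customer_id": 2, "name": "Bob"},
]

warehouse_rows = add_lineage(source_rows, "crm_db", "LoadCustomers.dtsx")

# Trace a single warehouse record back to where it came from.
record = warehouse_rows[0]
print(record["lineage_source_system"], record["lineage_package"])
```

Because every row in one run shares the same load identifier, a record in the warehouse can be traced back to both its source system and the specific load that wrote it.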
Understanding Data Lineage in ETL
September 3, 2020
In this article, I am going to explain what Data Lineage in ETL is and how to implement it. In the modern world, where companies deal with a huge amount of data every day, managing and monitoring this data efficiently is a challenge. There are systems that generate data every second, and this data is processed and delivered to a final reporting or monitoring tool for analysis. To process this data, we use a variety of ETL tools, which make the data transformation possible in a managed way.
Exploring databases in Python using Pandas
August 24, 2020
In this article, I am going to cover in detail working with databases in Python using Pandas and SQLAlchemy. This is a part of the series Learn Pandas in Python where I talk about the various techniques to work with the Pandas module in Python.
Introduction to SQLAlchemy in Pandas Dataframe
August 20, 2020
In this article, I am going to demonstrate how to connect to databases using a pandas dataframe object. Pandas in Python uses a module known as SQLAlchemy to connect to various databases and perform database operations. In the previous article in this series, “Learn Pandas in Python”, I explained how to get up and running with the dataframe object in pandas. Using the dataframe object, you can easily start working with your structured datasets in much the same way as relational tables. I would suggest you have a look at that article in case you are new to pandas and want to learn more about the dataframe object.
Working with Pandas Dataframes in Python
August 19, 2020
In this article, I am going to explain in detail the Pandas Dataframe objects in Python. In the previous article in this series, Learn Pandas in Python, I explained what pandas is and how to install it on our development machines. I also explained the use of pandas along with other important libraries for analyzing data with more ease. Pandas provides a dataframe object that makes working with data relatively easier, as it provides a tabular interface for the data in it. People who are already familiar with relational databases will find similarities between a table in the database and the dataframe object in pandas.
Deploy serverless applications using the AWS SAM CLI
August 18, 2020
In this article, we are going to learn to deploy serverless applications to the AWS Cloud using the AWS SAM CLI. This article is a part of the three-article series “Develop and Deploy Serverless Applications with AWS SAM CLI”. If you have some idea about how to develop and test your serverless applications locally using the AWS SAM CLI, then you may proceed with this article. However, if you want to learn more about developing and running your code locally, I would strongly recommend reading the previous articles of this series, Getting started with the AWS SAM CLI and Set up a local serverless environment using the AWS SAM CLI, which explain in detail the various configurations required to start and run the serverless functions on your local machine.
Set up a local serverless environment using the AWS SAM CLI
August 18, 2020
In this article, we are going to work on setting up your local development environment for creating serverless applications using the AWS SAM CLI. This article is a part of the three-article series “Develop and Deploy Serverless Applications with AWS SAM CLI”. If you already know the working principle of the AWS SAM CLI, you may proceed with this article; otherwise, I would highly recommend reading my previous article in the series, Getting started with the AWS SAM CLI, where I introduce the AWS Serverless Application Model and its workflow.
Getting started with the AWS SAM CLI
August 17, 2020
In this article, we will learn the concept of the AWS SAM CLI. This is a part of the three-article series “Develop and Deploy Serverless Applications with AWS SAM CLI”. SAM, short for Serverless Application Model, is a framework provided by Amazon Web Services that can be used to build applications on the local machine and deploy them directly to AWS Lambda.
Getting started with Jupyter Notebooks
August 14, 2020
In this article, I am going to explain what Jupyter Notebooks are and how to install them on your machine. Further, I will demonstrate how to use these notebooks in Visual Studio Code to perform data analysis and other development activities. Jupyter Notebook is an open-source platform with which you can create and share documents that contain live code, equations, and visualizations along with formatted text. Most importantly, these notebooks can be run in the web browser by just starting a server and using it. This open-source project is maintained by the team at Project Jupyter.
Getting started with Pandas in Python
August 5, 2020
In this article, I am going to explain how to use Pandas in Python. Pandas is one of the most popular Python modules for data manipulation and analysis. Basically, it provides an easy interface to interact with flowing data and apply transformations to it on the go. This module is covered under the BSD license and can be used for free. You can download this module by visiting the website or by installing it through the Python package manager.
Getting started with Amazon S3 and Python
July 31, 2020
In this article, I am going to explain what Amazon S3 is and how to connect to it using Python. This article is aimed at beginners who are trying to get their hands on Python and work within the AWS ecosystem. AWS, as you might know, is one of the largest cloud providers along with Microsoft Azure and Google Cloud Platform. Amazon offers a lot of services, including Amazon S3. Amazon S3, short for Amazon Simple Storage Service, is a storage service that enables users to store any kind of file. It is designed to make web-scale computing easier for developers.
Working with Power BI Data Models in Visual Studio Code
July 30, 2020
In this article, I am going to introduce the Tabular Object Model (TOM) in the Power BI Data Model and explain how this model can be accessed outside of the Power BI environment. For more info about the Tabular Object Model in the Power BI Data Model, please read this article. In this tutorial, we are going to use Visual Studio Code to write a simple .NET console application and access the Tabular Object Model from the Power BI file. With this knowledge, programmers and BI developers can not only view the underlying model in the Power BI Data Model but also enhance the data model programmatically by writing a few lines of code. This can be taken further by automating the creation of Power BI models with the help of the Tabular Object Model library in .NET.
Diving deep with complex Data Structures
July 29, 2020
In my previous article, Understanding common Data Structures, I covered the most commonly used data structures in software programming. In this article, let us get into more detail about other data structures that are a bit more complex than the ones already discussed but are also used quite often while designing software applications. Here, we will look into the following data structures.
Understanding the SQL MERGE statement
July 27, 2020
In this article, I am going to give a detailed explanation of how to use the SQL MERGE statement in SQL Server. MERGE is a very popular statement that can handle inserts, updates, and deletes all in a single statement, without having to write separate logic for each of these. You can specify the conditions under which you expect the MERGE statement to insert, update, or delete rows.
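As a rough illustration of the insert, update, and delete behavior described above, here is a small Python sketch that mimics MERGE-like semantics on in-memory dictionaries. It is not T-SQL and not how SQL Server implements MERGE; the `merge_rows` function and the sample data are hypothetical. The comments map each branch to the corresponding T-SQL `WHEN` clause.

```python
def merge_rows(target, source):
    """Apply MERGE-like semantics to target (a dict keyed by id), in place."""
    actions = {"inserted": [], "updated": [], "deleted": []}

    for key, row in source.items():
        if key not in target:
            # WHEN NOT MATCHED BY TARGET THEN INSERT
            target[key] = row
            actions["inserted"].append(key)
        elif target[key] != row:
            # WHEN MATCHED (and the row differs) THEN UPDATE
            target[key] = row
            actions["updated"].append(key)

    for key in list(target):
        if key not in source:
            # WHEN NOT MATCHED BY SOURCE THEN DELETE
            del target[key]
            actions["deleted"].append(key)

    return actions


# Hypothetical target table and incoming source rows, keyed by id.
target = {1: "Alice", 2: "Bob", 3: "Carol"}
source = {1: "Alice", 2: "Robert", 4: "Dave"}

actions = merge_rows(target, source)
print(target)   # {1: 'Alice', 2: 'Robert', 4: 'Dave'}
print(actions)  # {'inserted': [4], 'updated': [2], 'deleted': [3]}
```

A single call synchronizes the target with the source, which is the convenience MERGE offers over writing separate INSERT, UPDATE, and DELETE logic.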
Understanding common Data Structures
July 15, 2020
In this article, I am going to walk you through the concepts of the common Data Structures that every student or colleague working with computers should be aware of. Data structures form an integral part of any system or database design. They are an interesting and intuitive concept that you can apply anywhere. Through this article, I aim to introduce beginners to the concepts of Data Structures and to refresh them for colleagues who have been in the industry for years. This will also help you understand some database concepts more easily once you have a grasp of these concepts.
Migrate Power BI reports between multiple workspaces
July 14, 2020
In this article, I am going to explain how we can create a Power BI report using Power BI Desktop and then publish it to a Power BI service workspace. Once the report is published to a workspace, we can also migrate the same report across several workspaces, and all of this can be done programmatically by using the Power BI REST APIs. This article is especially targeted at Power BI admins or DevOps teams whose task is to migrate dashboards between environments like Development, QA, and Production without any manual intervention.
Getting started with SSISDB
July 13, 2020
In this article, I am going to explain how to start using the SSISDB database, also known as the SSIS Catalog database. In my previous article, I provided a detailed overview of this SSIS catalog database. I would recommend you read that article before moving on if you want to understand how the SSIS catalog database works. The SSIS catalog database is a single database in which you can deploy all your SSIS packages and then organize and manage those packages centrally.
Deploy Python apps to Azure Functions using Visual Studio Code
July 10, 2020
In this article, we are going to build a small Python application and deploy it to Azure Functions. The development and the deployment will be done using Visual Studio Code (VS Code). As you might be aware, VS Code is one of the most widely used and preferred code editors for programmers. It is a cross-platform tool, which means you can install it on the operating system of your choice, including Windows, Linux, and macOS.
How to debug Python scripts in Visual Studio Code
July 8, 2020
In this article, I am going to explain how we can easily debug Python scripts using Visual Studio Code (VS Code). In my previous article on this topic, I explained how to set up a development environment to start coding in Python. I would definitely recommend reading that article if you have not set up your Visual Studio Code environment yet. This article can be considered a continuation of those steps, as I am assuming that you are already programming in Python using VS Code.
Introduction to the SSIS Catalog database (SSISDB)
July 7, 2020
In this article, I am going to explain in detail the SSIS catalog that can be used to deploy SQL Server Integration Services (SSIS) projects. Using this catalog, developers and database administrators can easily deploy their integration services projects and manage them after deployment. The SSIS Catalog database was introduced in SQL Server 2012, and prior to that users had the following three options to deploy their SSIS packages:
Add users to a Power BI workspace using PowerShell
July 6, 2020
In this article, I am going to explain how to add users to a Power BI workspace using Power BI PowerShell. As you already know, Power BI is a business intelligence tool from Microsoft with which we can build graphical reports and dashboards that make sense of the data residing in the database. Power BI also has a web interface known as the Power BI service, which can be used to share these reports and dashboards and collaborate on them with multiple users inside or outside the organization.
Understanding SSIS memory usage
July 3, 2020
In this article, I am going to explain in detail SSIS memory usage and how we can optimize our tasks and the data flow to get the maximum benefit from this in-memory tool. As you might be aware, SSIS, also known as SQL Server Integration Services, is a data integration tool provided by Microsoft that ships with the SQL Server editions. SSIS is an enterprise-scale, in-memory data integration tool that can be used to move data between different databases or different servers in a convenient and manageable way.
Setting up Visual Studio Code for Python Development
July 1, 2020
In this article, I am going to explain how to set up Visual Studio Code for Python development. Visual Studio Code, popularly known as VS Code, is a free and open-source code editor developed by Microsoft. It is preferred by developers across all the major programming languages due to its flexibility and integrated development tools like debugging and IntelliSense. Visual Studio Code is available for download on all the major operating systems, including Windows, Linux, and macOS. You can visit https://visualstudio.microsoft.com/ to download it based on the OS you are using.
Getting started with Azure Function Apps
June 25, 2020
In this article, I am going to explain how to get started with Azure Function Apps. In my previous article, An introduction to Serverless Applications, I discussed the serverless architecture and the various cloud offerings for developing serverless applications. As we know, in a serverless architecture, users only write the business logic code, and everything else is taken care of by the cloud provider. This helps businesses quickly implement solutions and ship them to customers with better quality. Another important point about serverless applications is that they scale on demand, which means that as developers, we no longer need to monitor resources or scale them up manually when the number of executions increases.