Data Science Dojo is offering Locust for FREE on Azure Marketplace, packaged with a pre-configured Python interpreter and a Locust web server for load testing.


Why and when do we perform testing? 

Testing is the evaluation and confirmation that a software application or product performs as intended. Its purpose is to determine whether the application satisfies business requirements and whether the product is market ready. Applications can be subjected to automated testing to check that they meet these requirements. In this method of software testing, testing tools execute scripted sequences of steps.

The merits of automated testing are: 

  • Bugs can be avoided 
  • Development costs can be reduced 
  • Performance can be tuned until it meets requirements
  • Application quality can be enhanced 
  • Development time can be saved 

Testing is traditionally one of the final phases of the SDLC (Software Development Life Cycle).

What is load testing and why choose Locust?  

Performance testing is one of several types of software testing. Load testing is a form of performance testing that evaluates how an application behaves under real-life load conditions. It involves the following stages:

  • Define crucial metrics and scenarios 
  • Plan the test load model 
  • Write test scenarios 
  • Execute test by swarming load 
  • Analyze the test results 

Locust is a modern load testing framework. The major reason senior testers prefer it over other tools like JMeter is that it uses an event-based approach to testing rather than a thread-based one. This consumes fewer resources and thus saves costs.
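To make the event-based idea concrete, here is a minimal sketch using Python's standard asyncio library. Locust itself builds on gevent rather than asyncio, so this is a stand-in for the general technique and not Locust's implementation. The names simulated_user, swarm and run_swarm are made up for illustration, and the sleep call stands in for a real HTTP request.

```python
import asyncio
import threading


async def simulated_user(user_id: int, results: list) -> None:
    """One simulated user: wait on fake network I/O, then record a result."""
    # asyncio.sleep stands in for an HTTP request; while this user waits,
    # the event loop is free to run other simulated users.
    await asyncio.sleep(0.01)
    results.append((user_id, threading.get_ident()))


async def swarm(user_count: int) -> list:
    """Run all simulated users concurrently on a single event loop."""
    results: list = []
    await asyncio.gather(*(simulated_user(i, results) for i in range(user_count)))
    return results


def run_swarm(user_count: int) -> list:
    """Synchronous entry point that starts the event loop."""
    return asyncio.run(swarm(user_count))


if __name__ == "__main__":
    results = run_swarm(1000)
    threads_used = {thread_id for _, thread_id in results}
    print(f"{len(results)} simulated users served by {len(threads_used)} OS thread(s)")
```

Every simulated user here shares one operating-system thread, which is why an event-based tool can typically simulate many users with less memory than a design that dedicates a thread to each user.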

Challenges faced by QA teams  

Before such tools were available, the job of testing teams was not as easy as it is now. Simulating a large number of users to direct load at a website was expensive and time-consuming.

Apart from this, monitoring the testing process in real time was not common either. Complete analytics were usually produced only after the whole testing process had concluded, which again required patience.

Testers needed a platform through which they could evaluate the quality of a product and its compliance with the specified requirements under different loads, without the prolonged wait and high expense.

Working of Locust 

Locust is an open-source, web-based load testing tool. It is written in Python and is used to evaluate the behavior of a web application under load. Load testing is a critical part of any business's quality assurance process because it helps ensure that the website stays up during an influx of traffic, which ultimately contributes to the success of the company. With Locust, web testers can determine how many concurrent users a website can withstand. With the power of Python, you can develop a set of test scenarios and functions that imitate many users, and observe performance charts in the web UI.


Figure 1: A sample Locust file


The self.client.get function points to the pages of a website that you want to target. You can find this code file and further breakdown here. The host domain, the number of users, and the spawn rate for the load test are supplied in the web interface. After you run the locust command, the web server starts on port 8089.
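The relationship between the number of users and the spawn rate can be sketched with a little arithmetic. The helpers below are hypothetical and are not part of Locust; they assume a steady, linear ramp-up in which the spawn rate is the number of users started per second.

```python
def active_users(total_users: int, spawn_rate: float, elapsed_seconds: float) -> int:
    """Users running after `elapsed_seconds`, assuming a linear ramp-up."""
    if total_users < 0 or spawn_rate <= 0 or elapsed_seconds < 0:
        raise ValueError("total_users and elapsed_seconds must be >= 0, spawn_rate > 0")
    # The ramp never exceeds the requested total number of users.
    return min(total_users, int(spawn_rate * elapsed_seconds))


def ramp_up_seconds(total_users: int, spawn_rate: float) -> float:
    """Seconds needed before all requested users are running."""
    if spawn_rate <= 0:
        raise ValueError("spawn_rate must be > 0")
    return total_users / spawn_rate


if __name__ == "__main__":
    for second in (1, 5, 10, 20):
        print(second, active_users(total_users=100, spawn_rate=10, elapsed_seconds=second))
    print("full load after", ramp_up_seconds(100, 10), "seconds")
```

A lower spawn rate gives the target application a gentler warm-up, while a higher one reaches full load sooner and behaves more like a sudden traffic spike.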


Figure 2: Locust web interface


It also allows you to capture different metrics during the testing process in real-time. 


Figure 3: Graphs with metrics visualizations


Key characteristics of Locust 


  • Executing the file starts an interactive, user-friendly web UI through which you can perform load testing
  • Locust is an open-source load-testing tool. It is extremely useful for web app testers, QA teams and software testing managers 
  • You can capture various metrics like response time, visualized in charts in real-time as the testing occurs 
  • Write test code in the pre-configured Python interpreter to check that your application sustains the required throughput and availability
  • You can easily scale up the number of users for extensive production level load testing of web applications 


What Data Science Dojo provides 


Locust instance packaged by Data Science Dojo comes with a pre-configured python interpreter to write test files, and a Locust web UI server to generate the desired amount of load at specific rates without the burden of installation.  

Features included in this offer:  

  • VM configured with Locust application which can start a web server with rich UX/UI 
  • Provides several interactive metrics graphs to visualize the testing results 
  • Provides real-time monitoring support 
  • Ability to download requests statistics, failures, exceptions, and test reports 
  • Feature to swarm multiple users at the desired spawn rate 
  • Support for python language to write complex workflows 
  • Utilizes event-based approach to use fewer resources 

With Locust, load testing has become easier than ever. It saves businesses time and cost, as QA engineers and web testers can now run tests with a few clicks and a few lines of simple code.





Locust can be used to test any web application. By swarming many clients spawned at a specific rate, you can verify that a website can handle concurrent users. For extensive load testing, you can use multiple cores on an Azure Virtual Machine. The Locust web interface also calculates metrics for every test run and visualizes them. This might slow down the server if you have hundreds upon hundreds of active test units requesting multiple pages. CPU and RAM usage may also be affected, but an Azure Virtual Machine can be sized to absorb this load.

At Data Science Dojo, we deliver data science education, consulting, and technical services to increase the power of data.  


 

 

Data Science Dojo is offering Apache Druid for FREE on Azure Marketplace, packaged with a pre-configured Druid web environment that supports various data sources.

What is data ingestion? 

Data ingestion is the process of moving data from one or more sources to a target destination for further processing and analysis. This data can originate from a range of sources, including data lakes, IoT devices, on-premises databases, and other applications, and land in various environments, such as a cloud data warehouse or our very own Druid data store.


Online Analytical Processing (OLAP) is a method for quickly answering multidimensional analytical queries. OLAP systems are widely used in BI and data science applications. They ingest data in real time, whether streaming or in batches, to produce analytics. OLAP systems usually maintain a data warehouse with redundancy and keep time series of datasets. They need to compute customized queries at high speed.


Backend services of Apache Druid  

  1. Middle Manager: This process is responsible for ingesting the data 
  2. Broker: This process is responsible for receiving queries from external clients
  3. Coordinator: It assigns segments to specific nodes 
  4. Overlord: It assigns ingestion tasks to middle managers 
  5. Historical: It handles the storage and querying of data 
  6. Router: Optional component to provide single API gateway for coordinators, overlords and brokers 

Obstacles for data engineers & developers 

Collecting and maintaining data from different sources was a demanding task for data engineers and developers. Organizing and monitoring the schema was another challenge when dealing with huge volumes of data. Responding efficiently to complex OLAP queries and quick calculations was a nightmare.

In this scenario, a unified environment that handles ad-hoc queries, management of different datasets, time series of data, and quick ingestion from various sources, all from one place, would be enough to tackle these challenges.

Methodology of Apache Druid  

Apache Druid is an interactive, real-time database backend for ingesting, maintaining, and segmenting data from a variety of sources, either streaming or in batches, which makes it flexible. It is a scalable distributed system that processes queries in parallel and stores datasets in a column-based structure, in which the columns describe the properties of the ingested data.

Druid stores the data safely in deep storage and provides indexing and time-based partitioning for faster filtering and searching performance. Users can query the ingested datasets with Druid’s optimized SQL engine. It also provides automatic summarization and algorithmic approximation of data. 
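The automatic summarization mentioned above is commonly called rollup: at ingestion time, rows that share the same truncated timestamp and the same dimension values are combined into one pre-aggregated row. The following is a rough sketch of that idea in plain Python, not Druid's implementation. The field names timestamp, page and bytes, and the rollup helper itself, are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def rollup(events, granularity="hour"):
    """Pre-aggregate raw events by truncated timestamp and dimension value."""
    totals = defaultdict(lambda: {"count": 0, "bytes": 0})
    for event in events:
        ts = datetime.fromisoformat(event["timestamp"])
        # Truncate the timestamp to the chosen time bucket.
        if granularity == "hour":
            bucket = ts.replace(minute=0, second=0, microsecond=0)
        elif granularity == "day":
            bucket = ts.replace(hour=0, minute=0, second=0, microsecond=0)
        else:
            raise ValueError(f"unsupported granularity: {granularity}")
        # Rows with the same bucket and the same dimension collapse into one.
        key = (bucket.isoformat(), event["page"])
        totals[key]["count"] += 1
        totals[key]["bytes"] += event["bytes"]
    return dict(totals)


if __name__ == "__main__":
    raw_events = [
        {"timestamp": "2023-01-01T10:05:00", "page": "home", "bytes": 100},
        {"timestamp": "2023-01-01T10:45:00", "page": "home", "bytes": 250},
        {"timestamp": "2023-01-01T10:50:00", "page": "pricing", "bytes": 80},
        {"timestamp": "2023-01-01T11:01:00", "page": "home", "bytes": 40},
    ]
    for key, metrics in sorted(rollup(raw_events).items()):
        print(key, metrics)
```

The trade-off is that pre-aggregated rows take less space and are faster to scan, but individual raw events can no longer be recovered from them.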

Druid Architecture


Major features   

  • Apache Druid has a fast and optimized user interface. Druid UI makes it easy to supervise, refresh and troubleshoot your datasets. The column-oriented organization provides ease of control to the users 
  • Any ingested data can be subjected to queries with the help of an in-browser SQL editor. It delivers the results with low latency 
  • It is an open-source tool. Developers, data engineers, DevOps, companies focusing on web and mobile analytics, solutions architects who want to monitor network performance, and anyone interested in data science can use this offer 
  • Druid provides the feature of maintaining logs of each activity. In case of failure of any operation, the logs are updated, and the user can check them on the same web server 
  • You can monitor the status of your column-oriented datasets via the web server


What does Data Science Dojo provide?  

The Apache Druid instance packaged by Data Science Dojo serves as a pre-configured data store for managing and monitoring ingested data, with SQL support for querying, without the burden of installation. It offers efficient storage, fast filtering on data dimensions, and queries with sub-second average response times. It supports ingestion from a variety of data sources.

Features included in this offer:  

  • A Druid service that is easily accessible from the web, having a rich user interface 
  • Easy to operate and user friendly 
  • In-browser SQL coding environment to query ingested data sets 
  • Low latency automated data aggregations and approximations using algorithms 
  • Quick responsiveness and high uptime 
  • Time-based data partitioning 
  • Feature of schema configuration and data tuning at the time of ingestion 

Our instance of Apache Druid supports the following data sources: 

  • Apache Kafka 
  • HDFS 
  • HTTP(s) 
  • Local disk 
  • Azure Event Hub 
  • Paste Data 
  • Other custom sources 

By specifying credentials and adding extensions, you can also ingest from:

  • Azure Data Lake 
  • Google Cloud Storage 
  • Amazon S3 & Kinesis 


Apache Druid is mainly used for OLAP systems because of its time-series data ingestion and the way its services perform indexing and respond to queries in real time. It has a flexible and fault-tolerant architecture. When coupled with Microsoft cloud services, responsiveness and processing speed exceed those of traditional local setups because data-intensive computations are performed in the cloud rather than locally.

Install the Apache Druid offer now from the Azure Marketplace by Data Science Dojo. 


   

   


Data Science Dojo is offering Metabase for FREE on Azure Marketplace, packaged with a web-accessible Metabase Open Source server.

Metabase query



Organizations often adopt strategies to improve how well their products sell. One such strategy is to use historical business data to identify key patterns for a product and then make decisions accordingly. However, this work is demanding, costly, and requires domain experts. Metabase bridges that skills gap. It provides marketing and business professionals with an easy-to-use query builder notebook to extract the required data and visualize it with just a few clicks, without any SQL coding.

What is Metabase, and what is a question?

Metabase is an open-source business intelligence framework that provides a web interface to import data from diverse databases and then analyze and visualize it with a few clicks. Metabase is built around questions and the answers to them. They form the foundation of everything else that it provides.


A question is any kind of query that you want to run on your data. Once you have specified the query functions in the notebook editor, you can visualize the query results. After that, you can save the question for reuse and turn it into a data model for business-specific purposes.

Challenges for businesses  

For businesses that lack expert analysts, engineers, and a substantial IT department, it was costly and time-consuming either to hire new domain experts or to have managers learn to code and then explore and visualize data themselves. Apart from that, few existing applications offered connections to diverse data sources, which was also a challenge.

In this regard, a straightforward interactive tool that even beginners could adopt immediately to get the job done would be the ideal solution.

Data analytics with Metabase  

Metabase is built around questions, which are essentially queries, and data models, which are special saved questions. It provides an easy-to-use notebook through which users can gather raw data, filter it, join tables, summarize information, and add other customizations without any need for SQL coding.
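To show what such a question amounts to, here is a small sketch of the SQL that a filter, join and summarize question might correspond to, run against an in-memory SQLite database from Python's standard library. It is illustrative only: the orders and products tables, their columns, and the helper names are hypothetical, and the SQL that Metabase actually generates may differ.

```python
import sqlite3


def build_sample_db() -> sqlite3.Connection:
    """Create a tiny in-memory database with made-up sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE products (id INTEGER PRIMARY KEY, category TEXT);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            product_id INTEGER,
            total REAL,
            status TEXT
        );
        INSERT INTO products VALUES (1, 'Gadget'), (2, 'Widget');
        INSERT INTO orders VALUES
            (1, 1, 20.0, 'paid'),
            (2, 1, 30.0, 'paid'),
            (3, 2, 15.0, 'paid'),
            (4, 2, 99.0, 'refunded');
        """
    )
    return conn


# The three notebook steps expressed as SQL:
#   join      -> JOIN products
#   filter    -> WHERE status = 'paid'
#   summarize -> COUNT / SUM grouped by category
QUESTION_SQL = """
    SELECT p.category, COUNT(*) AS order_count, SUM(o.total) AS revenue
    FROM orders AS o
    JOIN products AS p ON p.id = o.product_id
    WHERE o.status = 'paid'
    GROUP BY p.category
    ORDER BY revenue DESC
"""


def run_question(conn: sqlite3.Connection) -> list:
    """Run the question and return one row per product category."""
    return conn.execute(QUESTION_SQL).fetchall()


if __name__ == "__main__":
    for row in run_question(build_sample_db()):
        print(row)
```

In the notebook editor, each of these clauses is picked from a menu instead of being typed, which is what makes the tool approachable for users who do not write SQL.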

Users can select the dimensions of columns from tables and then create various visualizations and embed them in different sub-dashboards. Metabase is frequently utilized for pitching business proposals to executive decision-makers because the visualizations are very simple to achieve from raw data. 


Figure 1: A visualization on sample data 


Figure 2:  Query builder notebook


Major characteristics 

  • Metabase delivers a notebook that enables users to select data, join it with other tables, filter it, and perform other operations just by clicking on options instead of writing a SQL query
  • In case of complex queries, a user can also use an in-built optimized SQL editor 
  • The choice to select from various data sources like PostgreSQL, MongoDB, Spark SQL, Druid, etc., makes Metabase flexible and adaptable 
  • Under the Metabase admin dashboard, users can troubleshoot the logs regarding different tasks and jobs 
  • Supports public sharing, which lets admins create publicly viewable links for Questions and Dashboards

What Data Science Dojo has for you  

Metabase instance packaged by Data Science Dojo serves as an open-source easy-to-use web interface for data analytics without the burden of installation. It contains numerous pre-designed visualization categories waiting for data.

It has a query builder that is used to create questions (customized queries) with a few clicks. In our service, users can also use an in-browser SQL editor for complex queries. Any user who wants to identify the impact of their product from raw business data can use this tool.

Features included in this offer:  

  • A rich web interface running Metabase: Open Source 
  • A no-code query building notebook editor 
  • In-browser optimized SQL editor for complex queries 
  • Beautiful interactive visualizations 
  • Ability to create data models 
  • Email configuration and Slack support 
  • Shareability feature 
  • Easy specification for metrics and segments 
  • Feature to download query results in CSV, XLSX and JSON format 

Our instance supports the following major databases: 

  • Druid 
  • PostgreSQL 
  • MySQL 
  • SQL Server 
  • Amazon Redshift 
  • BigQuery
  • Snowflake 
  • Google Analytics 
  • H2 
  • MongoDB 
  • Presto 
  • Spark SQL 
  • SQLite 


Metabase is business intelligence software that is especially beneficial for marketing and product managers. By making it possible to share analytics with various teams within an enterprise, Metabase makes it simple for developers to create reports and collaborate on projects. Because it runs on Microsoft cloud services, responsiveness and processing speed are better than in a traditional desktop environment.


 

 

Data Science Dojo is offering Countly for FREE on Azure Marketplace, packaged with a web-accessible Countly Server.

Purpose of product analytics  

Product analytics is a comprehensive collection of mechanisms for evaluating the performance of digital ventures created by product teams and managers. 

Businesses often need to measure the metrics and impact of their products, for example how the audience engages with a product: how many visitors are reading a particular page or clicking on a specific button. This gives insight into what decisions need to be taken about the product: whether it should be modified, removed, or kept as it is. Countly makes this work easier by providing a centralized web analytics environment to track user engagement with a product and monitor its health.
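The kind of question raised above, how many visitors viewed a page or clicked a button, comes down to counting events. The following is a rough sketch in plain Python, not Countly's API or data model: the event dictionaries, their type and target fields, and the summarize_events helper are all hypothetical and exist only for illustration.

```python
from collections import Counter


def summarize_events(events):
    """Count page views and button clicks from a list of raw events."""
    page_views = Counter()
    button_clicks = Counter()
    for event in events:
        if event["type"] == "view":
            page_views[event["target"]] += 1
        elif event["type"] == "click":
            button_clicks[event["target"]] += 1
        # Any other event type is ignored in this sketch.
    return page_views, button_clicks


if __name__ == "__main__":
    sample_events = [
        {"type": "view", "target": "/pricing"},
        {"type": "view", "target": "/pricing"},
        {"type": "view", "target": "/home"},
        {"type": "click", "target": "signup-button"},
    ]
    views, clicks = summarize_events(sample_events)
    print("most viewed page:", views.most_common(1))
    print("button clicks:", dict(clicks))
```

An analytics server performs this sort of aggregation continuously and at scale, so that the counts are already available when a dashboard is opened.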


Challenges for individuals  

Many platforms require developers to write code to visualize analytics, which is not only time-consuming but also costly. At the application level, an app crash comes as a shock and is followed by the time-consuming task of determining the root cause. At the corporate level, current and past data need to be analyzed properly to secure the company's future, which requires robust analysis that anyone can easily perform. This was a challenge for many organizations.

Countly analytics 

Countly enables users to monitor and analyze the performance of their applications in real time, irrespective of the platform. It compiles data from numerous sources and presents it in a way that makes it easier for business analysts and managers to evaluate app usage and client behavior. It offers a customizable dashboard with the freedom to innovate and improve your products to meet important business and revenue objectives, while also ensuring privacy by design. It is a world leader in product analytics, tracking more than 1.5 billion unique identities on more than 16,000 applications and more than 2,000 servers worldwide.


Figure 1: Analytics based on type of technology



Figure 2: Analytics based on user activity



Figure 3: Analytics based on views


Major characteristics 

  • Interactive web interface: User-friendly web environment with customizable dashboards for easy accessibility along with pre-designed metrics and visualizations 
  • Platform-independent: Supports web analytics, mobile app analytics, and desktop application analytics for macOS and Windows 
  • Alerts and email reporting: Sends alerts based on metric changes and provides custom email reporting
  • Users’ role and access manager: Provides global administrators the ability to manage users, groups, and their roles and permissions 
  • Logs Management: Maintains server and audit logs on the web server regarding user actions on data 

What Data Science Dojo has for you  

Countly Server packaged by Data Science Dojo provides a web analytics service that delivers real-time insights about your product, whether it is a web application, a mobile app, or a desktop application, without the burden of installation. It comes with numerous pre-configured metrics and visualization templates to import data and observe trends. It helps businesses understand application usage and determine how clients respond to their apps.

Features included in this offer:  

  • A VM configured with Countly Server: Community Edition accessible from a web browser 
  • Ability to track user analytics, user loyalty, session analytics, technology, and geo insights  
  • Easy-to-use customizable dashboard 
  • Logs manager 
  • Alerting and reporting feature 
  • User permissions and roles manager 
  • Built-in Countly DB viewer 
  • Cache management 
  • Flexibility to define data limits 


Countly makes it possible to analyze data in real time. It is highly extensible and offers features for managing operations such as alerting, reporting, logging, and job management. Analytics throughput can be increased by using multiple cores on an Azure Virtual Machine. Countly can also handle applications on different platforms at once. This might slow down the server if you have thousands upon thousands of active client requests across different applications. CPU and RAM usage may also be affected, but an Azure Virtual Machine can be sized to absorb this load.


 

 


Data Science Dojo is offering RStudio for FREE on Azure Marketplace packaged with a pre-installed running version of R alongside other language backends to simplify Data Science. 


What is data science? 


Data Science is one of the fastest-growing areas of work in the industry. Harvard Business Review called data scientist the “sexiest job of the 21st century”.

Data science combines mathematics and statistics, programming, advanced analytics, machine learning, and AI to uncover valuable insights hidden in an organization’s data. These insights can be used to guide businesses in planning and decision making. The data science lifecycle involves data collection (ingestion), data pre-processing and wrangling, predictive data analysis via machine learning, and finally communication of outcomes for future strategies.


Challenges faced by developers 


Individuals who were learning or pursuing Data Science and Machine Learning through R found it difficult to code and develop models using only a terminal or command line interface. Another challenge was that developers who wanted to run extensive, compute-heavy ML operations often did not have enough computation power to do so locally.

In these circumstances, an interactive environment configured with R can help users gain hands-on experience with machine learning, data analysis, and other statistical operations.

Working with RStudio 


RStudio is an open-source tool that gives you an easy-to-use coding IDE in the cloud with the R programming language pre-installed, so you can start your data mining and analytics work. It is integrated with a set of modules that make code development, scientific computing, and graphical work easier and more productive. This tool allows developers to perform a variety of technical tasks such as predictive modeling, clustering, multivariate querying, stock market forecasting, spam filtering, recommendation systems, malware and anomaly detection, image recognition, and medical diagnosis.


Web interface of RStudio Server executing a demo R function


Key attributes 


  • Provides an in-browser coding environment with syntax suggestions, autocomplete code feature and smart indentation 
  • Provides the user with an easy-to-use free coding platform accessible at the local web server, powered by Azure machines 
  • Apart from its primary R backend, RStudio also supports other popular languages and formats such as Python, SQL, HTML, CSS, JS, C, Quarto and a few others
  • Built-in debugging with toggleable breakpoints to detect and fix issues quickly
  • As the computations are carried out on Microsoft’s cloud servers, there is no memory or performance pressure on the company’s own devices
  • In order to optimize the workload, the RAM and compute power can be scaled accordingly, thanks to Azure services 


What Data Science Dojo has for you 


The RStudio instance packaged by Data Science Dojo provides an in-browser coding environment with a running version of R pre-deployed in it, reducing the burden of installation. With an interactive user-friendly GUI-based application, developers can perform Machine Learning tasks with comfort and flexibility.  

  • A browser based RStudio environment up and running with R pre-deployed 
  • Convenient accessibility and navigation 
  • Ability to work with different language scripts simultaneously 
  • Rich graphics and interactive environment 
  • Support for git and version control 
  • Code consoles to run code interactively, with full support for rich output 
  • Integrated R documentation and user help 
  • Readily available cheat sheets to get started 

Our instance supports the following backends: 

  • R 
  • Python 
  • HTML 
  • CSS 
  • JavaScript 
  • Quarto 
  • C 
  • SQL 
  • Shell 
  • Markdown and Header files 




RStudio provides customers with an easy-to-use environment to gain hands-on experience with Machine Learning and Data Science. The responsiveness and processing speed are much better than the traditional desktop environment as it uses Microsoft cloud services. It comes with built-in support for git and version control.

Several variants of the R script can be executed in RStudio. It allows users to work on a variety of language backends at the same time with smart observability of variables and values side by side. The documentation and user support are incorporated into the tool to make it easy for developers to code. 



  



Data Science Dojo is offering Apache Superset for FREE on Azure Marketplace packaged with pre-installed SQL lab and interactive visualizations to get started. 


What is Business Intelligence?  


Business Intelligence (BI) is based on the idea of using data to drive action. It aims to give business leaders actionable insights through data processing and analytics. For instance, a business analyzes its KPIs (Key Performance Indicators) to identify its strengths and weaknesses. Decision-makers can then determine in which departments the organization can work to increase efficiency.

Recently, two elements of BI have led to dramatic improvements in metrics like speed and efficiency. The two elements are:


  • Automation  
  • Data Visualization  


Apache Superset focuses mainly on the latter, which has changed the course of business insights.


But what were the challenges faced by analysts before there were popular exploratory tools like Superset?  


Challenges of Data Analysts


Scalability, framework compatibility, and the absence of business-specific customization were a few of the challenges faced by data analysts. Apart from that, exploring and visualizing petabytes of data would at times cause the system to crash or hang.

In these circumstances, a tool able to query data according to business needs and visualize it in various charts and plots was required. Additionally, a system scalable and elastic enough to handle and explore large volumes of data would be an ideal solution.


Data Analytics with Superset  


Apache Superset is an open-source tool that equips you with a web-based environment for interactive data analytics, visualization, and exploration. It provides a vast collection of vibrant and interactive visualizations, charts, and tables. You can customize the layouts and the dynamic dashboard elements and apply quick filtering, which makes it flexible and user-friendly. Apache Superset is extremely beneficial for businesses and researchers who want to identify key trends and patterns in raw data to aid the decision-making process.


Video Game Sales Analytics with different visualizations



It is a powerhouse of SQL, as it not only allows connections to several databases but also provides an in-browser SQL editor called SQL Lab.

SQL Lab: an in-browser powerful SQL editor pre-configured for faster querying


Key attributes  


  • Superset delivers an interactive UI that enriches the plots, charts, and other diagrams. You can customize your dashboard and canvas as per requirement. The hover feature and side-by-side layout make it coherent  
  • An open-source easy-to-use tool with a no-code environment. Drag and drop and one-click alterations make it more user-friendly  
  • Contains a powerful built-in SQL editor to query data from any database quickly  
  • The choice to select from various databases like Druid, Hive, MySQL, SparkSQL, etc., and the ability to connect additional databases makes Superset flexible and adaptable  
  • Built-in functionality to create alerts and notifications that check specific conditions on a schedule
  • Superset provides a section about managing different users and their roles and permissions. It also has a tab for logging the ongoing events  


What does Data Science Dojo have for you  


Superset instance packaged by Data Science Dojo serves as a web-accessible no-code environment with miscellaneous analysis capabilities without the burden of installation. It has many samples of chart and dataset projects to get started. In our service users can customize dashboards and canvas as per business needs.

It comes with drag-and-drop support, which makes it user-friendly and easy to use. Users can create different visualizations to detect key trends in any volume of data.


What is included in this offer:  


  • A VM configured with a web-accessible Superset application  
  • Many sample charts and datasets to get started  
  • In-browser optimized SQL editor called SQL Lab  
  • User access and roles manager  
  • Alert and report feature  
  • Drag-and-drop support
  • Built-in event logging


Our instance supports the following major databases:  


  • Druid  
  • Hive  
  • SparkSQL  
  • MySQL  
  • PostgreSQL  
  • Presto  
  • Oracle  
  • SQLite  
  • Trino  
  • Apart from these, any data engine that has a Python DB-API driver and a SQLAlchemy dialect can be connected
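For readers unfamiliar with the term, a Python DB-API driver is a module that follows the standard connect, cursor, execute and fetch pattern defined in PEP 249. The sketch below uses sqlite3 from the standard library as one such driver. The sales table and the query_rows helper are hypothetical, and this is not how Superset itself is configured: there, a database is added by supplying a SQLAlchemy connection URI.

```python
import sqlite3


def query_rows(connection, sql, params=()):
    """Run a query through the generic DB-API cursor interface."""
    cursor = connection.cursor()
    try:
        cursor.execute(sql, params)
        return cursor.fetchall()
    finally:
        cursor.close()


if __name__ == "__main__":
    # sqlite3.connect is this driver's DB-API entry point; other drivers
    # expose their own connect() with driver-specific arguments.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("north", 120.0), ("south", 80.0), ("north", 30.0)],
    )
    # The "?" placeholder style is specific to sqlite3; other drivers may
    # use a different parameter style such as "%s".
    print(query_rows(conn, "SELECT SUM(amount) FROM sales WHERE region = ?", ("north",)))
    conn.close()
```

Because drivers share this interface, a tool that speaks DB-API through SQLAlchemy can work with many different databases without engine-specific query code.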




The resources needed to explore and visualize large volumes of data were one area of concern when working in traditional desktop environments. Another was ad-hoc SQL querying of data across different database connections. With our Superset instance, both concerns are put to rest.

When coupled with Microsoft cloud services, its responsiveness and processing speed exceed those of traditional desktop setups because data-intensive computations are performed in the cloud rather than locally. It has a lightweight semantic layer and a cloud-native architecture.



 




