Data Hubs

Introduction to Data Science

June 3, 2017

We have all been hearing the terms Data Science and Data Scientist more and more these days as the occupation grows in popularity. I thought I would shed some light on this area of science, which may interest the suitably skilled readers of my blog.

Data Science is one of the hottest topics in computing and on the Internet nowadays. People and corporations have been gathering data from applications, systems, and devices for years, and now is the time to analyze it. The worldwide adoption of the Internet of Things has further widened the scope for analyzing and acting on the huge volumes of data accumulated from these devices in near real time.

As the standard Wikipedia definition goes, “Data science, also known as data-driven science, is an interdisciplinary field about scientific methods, processes and systems to extract knowledge or insights from data in various forms, either structured or unstructured, similar to data mining.”

Data Science requires the following skillset:

  • Hacking Skills
  • Mathematics and Statistical Knowledge
  • Substantive Scientific Expertise


[Image Source: From this article by Berkeley Science Review.]

Data Science Process:

The Data Science process involves collecting raw data, processing it, cleaning it, analyzing it with models and algorithms, and visualizing the results for presentation. The process is illustrated in the flowchart from Wikipedia below.


[Data science process flowchart, source wikipedia]
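To make these stages concrete, here is a tiny illustrative sketch in C# (the CSV file name and its deviceId,temperature layout are hypothetical): it collects raw data, cleans out malformed rows, computes a simple aggregate, and prints a summary that could feed a chart or dashboard.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Linq;

class DataScienceProcessDemo
{
    static void Main()
    {
        // Collect: raw data, e.g. "deviceId,temperature" lines (hypothetical file).
        var rawRows = File.ReadAllLines("sensor-readings.csv").Skip(1);

        // Process + clean: drop malformed rows, parse the rest.
        var readings = rawRows
            .Select(line => line.Split(','))
            .Where(cols => cols.Length == 2 &&
                           double.TryParse(cols[1], NumberStyles.Float, CultureInfo.InvariantCulture, out _))
            .Select(cols => new { Device = cols[0], Temp = double.Parse(cols[1], CultureInfo.InvariantCulture) });

        // Analyze: a simple aggregate model - average temperature per device.
        var summary = readings
            .GroupBy(r => r.Device)
            .Select(g => new { Device = g.Key, Avg = g.Average(r => r.Temp) });

        // Visualize/communicate: printed here; in practice fed to a chart or dashboard.
        foreach (var s in summary)
            Console.WriteLine($"{s.Device}: {s.Avg:F1}");
    }
}
```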

Who are Data Scientists?

Data scientists use their data and analytical ability to find and interpret rich data sources; manage large amounts of data despite hardware, software, and bandwidth constraints; merge data sources; ensure consistency of datasets; create visualizations to aid in understanding data; build mathematical models using the data; and present and communicate the data insights/findings.

They are often expected to produce answers in days rather than months, work by exploratory analysis and rapid iteration, and to produce and present results with dashboards (displays of current values) rather than papers/reports, as statisticians normally do.

Importance of Data Science and Data Scientist:

“This hot new field promises to revolutionize industries from business to government, health care to academia.”

The New York Times

Harvard Business Review has called Data Scientist the sexiest job of the 21st century.

McKinsey & Company projects a global excess demand of 1.5 million new data scientists.

What skills are required of a Data Scientist? Let me share a visualization: a brain dump of the essential skill requirements for a modern Data Scientist.

[Image: brain dump of modern Data Scientist skills]

So what are you waiting for? If you are rightly skilled, get yourself enrolled in a Data Science course.


Azure in Germany–a complete EU cloud computing solution

May 18, 2017

My earlier article on Azure in China piqued my interest in looking for other country- or region-specific independent cloud data center offerings. I came across the Azure US Government instance (similar to Amazon's GovCloud) and the Azure Germany data center. In this article I will cover only Azure Germany.

What is Azure Germany?

Just like the regional regulatory requirements in China, Germany wanted a completely locally owned and managed Azure data center to meet EU/EFTA/UK requirements. This also ensures stricter access control and data access policy measures. The approach enables organizations doing business in the EU/EFTA and the UK to better harness the power of cloud computing.

  • All customer data and related applications and hardware reside in Germany
  • Geo-replication between datacenters in Germany supports business continuity
  • Highly secured datacenters provide 24×7 monitoring
  • Meets public sector and restricted-industry requirements
  • Follows all compliance requirements for the EU/EFTA and the UK
  • Lower cost, locally accessible from your business locations in Germany/EU

“Azure Germany is an isolated Azure instance in Germany, independent from other public clouds.”
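Because Azure Germany is a sovereign, isolated instance, its service endpoints differ from global Azure; storage accounts, for example, use the core.cloudapi.de suffix instead of core.windows.net. A minimal sketch, assuming the classic WindowsAzure.Storage client library (account name and key are placeholders to replace before running):

```csharp
using System;
using Microsoft.WindowsAzure.Storage;

class AzureGermanyStorageDemo
{
    static void Main()
    {
        // Azure Germany storage endpoints end in core.cloudapi.de instead of core.windows.net.
        // Account name/key below are placeholders; replace them with real values before running.
        var connectionString =
            "DefaultEndpointsProtocol=https;" +
            "AccountName=<your-account>;" +
            "AccountKey=<your-base64-key>;" +
            "EndpointSuffix=core.cloudapi.de";

        var account = CloudStorageAccount.Parse(connectionString);

        // e.g. https://<your-account>.blob.core.cloudapi.de/
        Console.WriteLine(account.BlobEndpoint);
    }
}
```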

Who controls it?

An independent data trustee controls access to all customer data in the Azure Germany datacenters. T-Systems International GmbH, a subsidiary of Deutsche Telekom and an experienced, well-respected IT provider incorporated in Germany, serves as trustee, preventing disclosure of data to third parties except as the customer directs or as required by German law.

** Even Microsoft does not have access to customer data or the datacenters without approval from and supervision by the German data trustee.

What about compliance?

Azure Germany has an ongoing commitment to maintaining the strictest data protection measures, so organizations can store and manage customer data in compliance with applicable German laws and regulations, as well as key international standards. Additional compliance standards and controls that address the unique role of the German data trustee will be audited over time. Refer to: Microsoft Trust Center compliance.

[Source: Microsoft Azure]


IoT Hub vs Event Hub–A quick comparison

December 11, 2016

With this article I am trying to give you a bird's-eye-view comparison of Azure IoT Hub and Azure Event Hub, so that those of you who feel there is nothing new in IoT Hub can see the differences for yourselves.

For the purposes of this article, I have put together a side-by-side comparison of some important and desirable features of an IoT ingestion platform.

  • Communication: IoT Hub supports both device-to-cloud and cloud-to-device bidirectional communication; Event Hub supports only device-to-cloud communication.
  • State management: IoT Hub can maintain device state using Device Twins and query it whenever needed; Event Hub has no equivalent.
  • Protocol support: IoT Hub supports AMQP 1.0, AMQP over WebSockets, MQTT 3.1.1, MQTT over WebSockets, and HTTP 1.1; Event Hub supports only AMQP 1.0, AMQP over WebSockets, and HTTP 1.1.
  • Protocol extensions: IoT Hub offers the IoT protocol gateway, a customizable implementation for bridging industrial protocols; Event Hub has no equivalent.
  • Security: IoT Hub gives each device its own identity, easily revocable through the IoT Hub device management portal; Event Hub offers shared access policies with only limited revocation capabilities.
  • Monitoring and operations: IoT Hub provides a rich set of device management features; you can individually enable, disable, or provision devices, rotate security keys, and pinpoint individual device problems easily. Event Hub provides no per-device metrics, only high-level aggregated metrics.
  • Scalability: IoT Hub scales to thousands or millions of simultaneously connected devices; Event Hub is limited to about 5,000 simultaneous connections per the Azure Service Bus quotas, though it can partition messages to work within those quotas.
  • SDK and developer support: IoT Hub has very good integration SDKs and developer support; the Azure IoT device SDK and IoT gateway SDK cover almost all device and OS platforms and current languages such as C#, Node.js, Java, and Python, alongside direct MQTT, AMQP, and REST-based HTTP APIs, with very detailed documentation. Event Hub offers .NET, Java, and C clients, plus AMQP and HTTP API interfaces.
  • File/image upload: IoT Hub lets devices upload files, images, or snapshots to the cloud and define a workflow for processing them; Event Hub has no equivalent.
  • Message routing: IoT Hub offers decent routing out of the box, with up to 10 endpoints and advanced rules defining how messages are routed; Event Hub requires additional programming and hosting to achieve the same.
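To make the bidirectional-communication difference concrete, here is a minimal device-side sketch using the Azure IoT device SDK for C# (Microsoft.Azure.Devices.Client); the connection string is a placeholder and error handling is omitted.

```csharp
using System;
using System.Text;
using System.Threading.Tasks;
using Microsoft.Azure.Devices.Client;

class IotHubDeviceDemo
{
    // Placeholder: device-scoped connection string from the IoT Hub device registry.
    const string ConnectionString =
        "HostName=<your-hub>.azure-devices.net;DeviceId=<device>;SharedAccessKey=<key>";

    static async Task Main()
    {
        var device = DeviceClient.CreateFromConnectionString(ConnectionString, TransportType.Mqtt);

        // Device-to-cloud: send one telemetry message.
        var telemetry = new Message(Encoding.UTF8.GetBytes("{\"temperature\": 21.5}"));
        await device.SendEventAsync(telemetry);

        // Cloud-to-device: wait (up to 30 seconds) for a command sent from the back end.
        Message command = await device.ReceiveAsync(TimeSpan.FromSeconds(30));
        if (command != null)
        {
            Console.WriteLine(Encoding.UTF8.GetString(command.GetBytes()));
            await device.CompleteAsync(command); // acknowledge so IoT Hub removes it from the queue
        }
    }
}
```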

From this comparison you can see that IoT Hub is the right candidate for most IoT solution needs, since Event Hub lacks certain capabilities that are essential for an IoT ingestion point. If you only need to send messages to the cloud and do not require the richer capabilities IoT Hub provides, Event Hub is a perfectly good choice.
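For that simpler telemetry-only scenario, sending to an Event Hub looks roughly like this sketch (based on the EventHubClient from the older WindowsAzure.ServiceBus package; connection string and hub name are placeholders):

```csharp
using System;
using System.Text;
using Microsoft.ServiceBus.Messaging;

class EventHubSendDemo
{
    static void Main()
    {
        // Placeholders: Event Hubs namespace connection string and the event hub name.
        const string connectionString =
            "Endpoint=sb://<your-namespace>.servicebus.windows.net/;SharedAccessKeyName=<policy>;SharedAccessKey=<key>";
        const string eventHubName = "telemetry";

        var client = EventHubClient.CreateFromConnectionString(connectionString, eventHubName);

        // Telemetry ingestion is one-way: events flow in, no per-device commands flow back.
        var payload = Encoding.UTF8.GetBytes("{\"deviceId\":\"sensor-01\",\"temperature\":21.5}");
        client.Send(new EventData(payload));

        Console.WriteLine("Event sent.");
        client.Close();
    }
}
```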

Remember, with more power comes more responsibility; that is what IoT Hub intends to give you.

Hope this overview was helpful. Please feel free to comment or start a discussion any time, and do share your feedback on this article.

Redis Cache–Azure Plans

August 13, 2016


Azure Redis Cache is a secure data cache based on the open-source Redis, offered as a fully managed service by Microsoft. That means you do not have to bear the burden of managing servers, software patches, and so on.

What is Redis Cache?

Redis is an open source (BSD licensed), in-memory data structure store, used as a database, cache and message broker. It supports data structures such as strings, hashes, lists, sets, sorted sets with range queries, bitmaps, hyperloglogs and geospatial indexes with radius queries. Redis has built-in replication, Lua scripting, LRU eviction, transactions and different levels of on-disk persistence, and provides high availability via Redis Sentinel and automatic partitioning with Redis Cluster.

You can run atomic operations on these types, like appending to a string; incrementing the value in a hash; pushing an element to a list; computing set intersection, union and difference; or getting the member with highest ranking in a sorted set.
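A few of these atomic operations, sketched with the StackExchange.Redis client against a local Redis instance (keys and values are made up for illustration):

```csharp
using System;
using System.Linq;
using StackExchange.Redis;

class RedisAtomicOpsDemo
{
    static void Main()
    {
        // Placeholder connection; for a local Redis this is simply host:port.
        var mux = ConnectionMultiplexer.Connect("localhost:6379");
        IDatabase db = mux.GetDatabase();

        db.StringAppend("greeting", "Hello ");
        db.StringAppend("greeting", "Redis");               // append to a string

        db.HashIncrement("page:stats", "views", 1);          // increment a value in a hash

        db.ListLeftPush("recent:events", "user-signed-in");  // push an element to a list

        db.SetAdd("online", new RedisValue[] { "alice", "bob" });
        db.SetAdd("premium", new RedisValue[] { "bob", "carol" });
        var both = db.SetCombine(SetOperation.Intersect, "online", "premium"); // set intersection

        db.SortedSetAdd("scores", "alice", 120);
        db.SortedSetAdd("scores", "bob", 300);
        var top = db.SortedSetRangeByRank("scores", 0, 0, Order.Descending);   // highest-ranked member

        Console.WriteLine($"intersection: {string.Join(",", both)}  top scorer: {top.First()}");
    }
}
```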

In order to achieve its outstanding performance, Redis works with an in-memory dataset. Depending on your use case, you can persist it either by dumping the dataset to disk every once in a while, or by appending each command to a log. Persistence can be optionally disabled, if you just need a feature-rich, networked, in-memory cache.

Redis also supports trivial-to-setup master-slave asynchronous replication, with very fast non-blocking first synchronization, auto-reconnection with partial resynchronization on net split.

5 High-level Use Cases of Redis Cache

1. Session Cache
One of the most apparent use cases for Redis is using it as a session cache. The advantage of using Redis over other session stores, such as Memcached, is that Redis offers persistence. You can keep your application's user, role, and authorization permission lists in Redis Cache for faster access.
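A minimal session-cache sketch with StackExchange.Redis; the key layout, payload, and 20-minute timeout are illustrative assumptions:

```csharp
using System;
using StackExchange.Redis;

class SessionCacheDemo
{
    static void Main()
    {
        var db = ConnectionMultiplexer.Connect("localhost:6379").GetDatabase();

        // Store the user's roles/permissions under a session key with a 20-minute lifetime.
        string sessionKey = "session:42f1c9";
        db.StringSet(sessionKey, "{\"user\":\"alice\",\"roles\":[\"admin\",\"editor\"]}",
                     expiry: TimeSpan.FromMinutes(20));

        // Later requests read the session and refresh its expiry (sliding expiration).
        RedisValue session = db.StringGet(sessionKey);
        db.KeyExpire(sessionKey, TimeSpan.FromMinutes(20));

        Console.WriteLine(session);
    }
}
```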

2. Full Page Cache (FPC)
Outside of your basic session tokens, Redis provides a very easy FPC platform to operate in. Going back to consistency, even across restarts of Redis instances, with disk persistence your users won't see a decrease in speed for their page loads.
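A full-page cache boils down to the same cache-aside pattern keyed by URL; in the sketch below the RenderPage helper is a hypothetical stand-in for your real page rendering:

```csharp
using System;
using StackExchange.Redis;

class FullPageCacheDemo
{
    static readonly IDatabase Db = ConnectionMultiplexer.Connect("localhost:6379").GetDatabase();

    static string GetPage(string url)
    {
        string cacheKey = "page:" + url;

        // Serve the rendered HTML from Redis when present...
        RedisValue cached = Db.StringGet(cacheKey);
        if (cached.HasValue) return cached;

        // ...otherwise render it (hypothetical helper) and cache it for 5 minutes.
        string html = RenderPage(url);
        Db.StringSet(cacheKey, html, expiry: TimeSpan.FromMinutes(5));
        return html;
    }

    static string RenderPage(string url) =>
        $"<html><body>Rendered {url} at {DateTime.UtcNow:O}</body></html>";

    static void Main() => Console.WriteLine(GetPage("/products/1"));
}
```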

3. Queues
Taking advantage of Redis' in-memory storage engine to do list and set operations makes it an amazing platform for a message queue. Interacting with Redis as a queue should feel native to anyone used to push/pop operations on lists in programming languages such as C#, Python, Java, and PHP.
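A bare-bones work queue on a Redis list, sketched with StackExchange.Redis (which intentionally does not expose blocking pops such as BRPOP, so the consumer polls):

```csharp
using System;
using System.Threading;
using StackExchange.Redis;

class RedisQueueDemo
{
    static void Main()
    {
        var db = ConnectionMultiplexer.Connect("localhost:6379").GetDatabase();
        const string queue = "jobs";

        // Producer: push work items onto the head of the list.
        db.ListLeftPush(queue, "resize-image:1001");
        db.ListLeftPush(queue, "resize-image:1002");

        // Consumer: pop from the tail (FIFO overall), polling because this client
        // deliberately does not expose blocking pops.
        while (true)
        {
            RedisValue job = db.ListRightPop(queue);
            if (job.IsNullOrEmpty) break;        // queue drained; a real worker would sleep and retry
            Console.WriteLine($"processing {job}");
            Thread.Sleep(100);
        }
    }
}
```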

4. Leaderboards/Counting
Redis does an amazing job at increments and decrements since it’s in-memory. Sets and sorted sets also make our lives easier when trying to do these kinds of operations, and Redis just so happens to offer both of these data structures.
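A leaderboard on a sorted set, sketched with StackExchange.Redis (player names and scores are made up):

```csharp
using System;
using StackExchange.Redis;

class LeaderboardDemo
{
    static void Main()
    {
        var db = ConnectionMultiplexer.Connect("localhost:6379").GetDatabase();
        const string board = "leaderboard:weekly";

        // Atomically add points for players; the sorted set creates members as needed.
        db.SortedSetIncrement(board, "alice", 50);
        db.SortedSetIncrement(board, "bob", 80);
        db.SortedSetIncrement(board, "alice", 40);

        // Top 3 players, highest score first.
        foreach (var entry in db.SortedSetRangeByRankWithScores(board, 0, 2, Order.Descending))
            Console.WriteLine($"{entry.Element}: {entry.Score}");
    }
}
```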

5. Pub/Sub
The use cases for Pub/Sub are truly boundless. You can use it for social network connections, for triggering scripts based on Pub/Sub events, and even for a chat system built on Redis Pub/Sub!
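And a minimal pub/sub round trip with StackExchange.Redis (the chat channel name is arbitrary):

```csharp
using System;
using System.Threading;
using StackExchange.Redis;

class PubSubDemo
{
    static void Main()
    {
        var mux = ConnectionMultiplexer.Connect("localhost:6379");
        ISubscriber sub = mux.GetSubscriber();

        // Subscriber: handle chat messages as they arrive.
        sub.Subscribe("chat:lobby", (channel, message) =>
            Console.WriteLine($"[{channel}] {message}"));

        // Publisher: any connected client can broadcast to the channel.
        sub.Publish("chat:lobby", "hello from Redis pub/sub");

        Thread.Sleep(500); // give the asynchronous handler a moment to fire before exiting
    }
}
```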

[Courtesy: ObjectRocket]

Finally, coming back to the context of this blog, here is the essential pricing model from Microsoft.

Azure Redis Cache is available in three tiers:

  • Basic—Single node, multiple sizes, ideal for development/test and non-critical workloads. The Basic tier has no SLA.
  • Standard—A replicated cache in a two-node primary/secondary configuration managed by Microsoft, with a high-availability SLA.
  • Premium—All of the Standard tier features, plus better performance than Basic and Standard tier caches, support for bigger workloads, disaster recovery, Redis persistence, Redis cluster, and enhanced security and isolation through virtual network deployment.
  • ** Basic and Standard caches are available in sizes up to 53 GB (250 MB, 1 GB, 2.8 GB, 6 GB, 13 GB, 26 GB, 53 GB).
  • ** Premium caches are available in sizes up to 530 GB, with more on request.

[Courtesy: Microsoft]
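Whichever tier you choose, connecting from .NET looks the same; a sketch with StackExchange.Redis pointing at the cache's SSL endpoint on port 6380 (cache name and access key are placeholders):

```csharp
using System;
using StackExchange.Redis;

class AzureRedisConnectDemo
{
    static void Main()
    {
        // Placeholders: the cache name and access key come from the Azure portal.
        // Azure Redis Cache accepts SSL connections on port 6380.
        var mux = ConnectionMultiplexer.Connect(
            "<your-cache>.redis.cache.windows.net:6380,password=<access-key>,ssl=True,abortConnect=False");

        IDatabase db = mux.GetDatabase();
        db.StringSet("ping", DateTime.UtcNow.ToString("O"));
        Console.WriteLine(db.StringGet("ping"));
    }
}
```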
