Şükrü Ozan
Deep Learning/Machine Learning Engineer @ digiMOST GmbH | R&D Center Director @ AdresGezgini Inc.


As the R&D Center Director at AdresGezgini Inc., I oversee the company's digital marketing solutions and web-based software development projects. With a team of certified professionals and a track record of high customer satisfaction, we stay at the forefront of internet technology trends and offer consulting services to thousands of businesses. In addition to my industry experience, I hold a Ph.D. in Electrical and Electronics Engineering from the İzmir Institute of Technology, where I worked as a Research Assistant from 2004 to 2011. I have also been a visiting lecturer at local universities, teaching courses in image processing and Python programming. Currently, I am also working as a Deep Learning/Machine Learning Engineer at digiMOST GmbH, a subsidiary of AdresGezgini Inc. located in Marl, Germany. Outside of work, I am a father of two and a full-time learner.

Work Experience

digiMOST GmbH
2023 – Present
Deep Learning/Machine Learning Engineer
The digiMOST GmbH team consists of highly competent engineers and software developers specializing in web software and digital marketing. Our goal is to define effective solution strategies and simplify daily tasks. Thanks to a high level of customer satisfaction, we have been working with small and medium-sized enterprises (SMEs) for 10 years and are constantly expanding our team.
  • digiMOST is a subsidiary of AdresGezgini Inc.
  • digiMOST is located in Marl, Germany.
AdresGezgini Inc.
2017 – Present
R&D Center Director
AdresGezgini Inc.'s core business areas are digital marketing solutions and web-based software development projects. Our sector remains one of the brightest of the future, transforming traditional promotional campaigns and business management solutions. The company is in a strong position thanks to its certified professional staff, accumulated knowledge, and high level of customer satisfaction. AdresGezgini follows the latest trends in internet technologies through its experience and its R&D projects. By keeping its workflow processes constantly up to date with the latest technology, AdresGezgini offers consulting and project-based services on new internet technologies to the thousands of businesses in its portfolio.
  • The company was accredited as an R&D center by the Ministry of Industry and Technology of the Republic of Turkey in 2017.
Local Public and Private Universities
2017 – 2019
Visiting Lecturer
Between 2017 and 2019, I gave lectures at three universities. These were image processing courses in which I also taught Python programming. Students used the Pillow library to read and manipulate images.
  • CSE3113 Introduction to Image Processing at Manisa Celal Bayar University Computer Engineering Department in 2017.
  • MEE404 Machine Vision in Mechatronics at İzmir Katip Çelebi University Mechatronics Engineering Department in 2018.
  • COMP4360 Image Processing at Yaşar University Computer Engineering Department in 2019.
  • I used GitHub Pages to prepare the course websites.
AdresGezgini Inc.
2015 – Present
Vice President
Since 2016, I have been the vice president of AdresGezgini Inc., formerly known as AdresGezgini Ltd.
AdresGezgini Ltd.
2006 – 2016
I co-founded AdresGezgini Ltd. in 2006.
İzmir Institute of Technology Electrical and Electronics Engineering Department
2004 – 2011
Research Assistant
I was a research assistant at the Electrical and Electronics Engineering Department of İzmir Institute of Technology.


  • DEVELOPMENT OF A SYSTEM TO DETECT FAKE CALLS BY CLASSIFYING RECORDINGS OF CALL CENTER TELEPHONE CALLS THROUGH MACHINE LEARNING: This is a TEYDEB 1507 project. Some customer representatives were making fake calls to certain phones and filling most of their call-time quota. Since most of these calls rely on non-conversational sounds, I reduced the problem to a machine learning problem and nearly completely solved the corresponding fraud problem in our call center. Excluding myself, my team consists of a machine learning engineer with a Ph.D., a web programmer with an M.Sc. degree, two quality managers, and one test/reporting officer.
  • DEVELOPMENT OF A SYSTEM THAT PROVIDES AUTOMATIC DETECTION OF NEGATIVE CALL CENTER CALLS WITH DEEP ARTIFICIAL NEURAL NETWORK BASED CLASSIFICATION ALGORITHMS: In terms of customer satisfaction, it is essential to take action as early as possible. Once we detect a negative emotion in a call between a customer and a customer representative, it most probably reflects a problem faced by the customer. It is not easy for managers to be aware of these situations, especially under COVID-19-related work-from-home regulations. Hence, the project aims to detect emotional anomalies in customer calls and report them to managers as soon as possible. Excluding myself, my team consists of a junior machine learning engineer, a web programmer, two quality managers, a team manager, and one test/reporting officer. The proposed project will be an add-on to the calltech.app service explained in the previous project title.
  • GENERAL-PURPOSE CHATBOT APPLICATION THAT CAN PRODUCE MEANINGFUL DIALOGUE WITH MACHINE LEARNING: In this project, I aimed to develop an alternative to the paid third-party customer engagement software we were using in our company. The software we developed lets customer representatives engage with website visitors through a chat interface. The system can also lead website visitors to possible answers to their questions using pre-defined multiple-choice questions. A key feature of this project is that it also includes a chatbot with artificial intelligence. Using real chat scripts accumulated over the last 6+ years, we trained a BERT model to automatically give the most appropriate answers to most website visitors' questions. The system runs 24/7 on our websites, and in the absence of a customer representative, the chatbot takes over and answers basic questions from website visitors. Without the AI chatbot part, the system can be used directly on any website; the chatbot itself requires dedicated training for different organizations. Excluding myself, my team consists of a machine learning engineer, a web programmer, a web designer, and 11 employees responsible for data labeling, testing, and reporting tasks.
  • DEVELOPMENT OF A SYSTEM THAT CAN PERFORM SPEECH-TO-TEXT TRANSLATION FOR THE TURKISH LANGUAGE WITH STATE-OF-THE-ART DEEP LEARNING ARCHITECTURES AND WORKS WITH A SOFTWARE-AS-A-SERVICE (SAAS) MODEL: I have been planning to build a speech-to-text tool for the Turkish language since 2017. For this reason, in 2018 we published the linguo.app website, which we use either to label short audio recordings or to voice short sentences. We now have nearly 100K labeled samples ready to be used to train a speech-to-text algorithm for Turkish. We will primarily use it to convert our call center conversations to text, but the system can also be offered as an API service for people seeking an alternative tool for Turkish speech-to-text conversion. Excluding myself, my team consists of a machine learning engineer, a web programmer, a web designer, three engineers for testing and reporting, and ten part-time personnel for data labeling.
  • DEVELOPMENT OF A DEEP ARTIFICIAL NEURAL NETWORK ARCHITECTURE-BASED SYSTEM FOR DETERMINING THE NUMBER OF SPEAKERS IN AUDIO RECORDINGS AND SEPARATING SPEAKER VOICES: This project aims to perform speaker diarization on call center recordings, which are recorded as a single channel. This problem is currently a very active topic in the field of audio processing, and deep learning methods are the prominent contemporary approach to it.


Contact

Dieselstraße 7
Marl, Nordrhein-Westfalen 45770, Germany
+49 (211) 546922714


Education

  • 2006 – 2015
    İzmir Institute of Technology
    Electrical and Electronics Engineering Department (Ph.D.)

  • 2002 – 2006
    İzmir Institute of Technology
    Electrical and Electronics Engineering Department (M.Sc.)

  • 1997 – 2002
    Middle East Technical University
    Electrical and Electronics Engineering Department (B.Sc.)


Publications

Performance Comparison of Pre-trained Models for Speech-to-Text in Turkish: Whisper-Small and Wav2Vec2-XLS-R-300M
Bilgisayar Bilimleri ve Mühendisliği Dergisi (2023, Vol. 16, No. 2)
22 October 2023

In this study, the performance of Whisper-Small and Wav2Vec2-XLS-R-300M, two pre-trained multilingual speech-to-text models, was examined for the Turkish language. Mozilla Common Voice version 11.0, an open-source dataset prepared in the Turkish language, was used in the study. The multilingual models Whisper-Small and Wav2Vec2-XLS-R-300M were fine-tuned with this dataset, which contains a small amount of data, and their speech-to-text performance was compared. WER values were calculated as 0.28 and 0.16 for the Wav2Vec2-XLS-R-300M and Whisper-Small models, respectively. In addition, the performance of the models was examined with test data prepared from call center records that were not included in the training and validation datasets.
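The word error rate (WER) reported above is the word-level Levenshtein distance between reference and hypothesis, divided by the reference length. A minimal, self-contained sketch of this metric (illustrative only, not the evaluation code used in the study):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[len(ref)][len(hyp)] / len(ref)

print(wer("merhaba nasılsın bugün", "merhaba nasılsın dün"))  # 1 substitution / 3 words
```

A WER of 0.16, as reported for Whisper-Small, thus means roughly one word-level error for every six reference words.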

Performance Evaluation of a Pretrained BERT Model for Automatic Text Classification
Journal of Artificial Intelligence and Data Science
30 June 2023

This study presents a pre-trained BERT model applied to texts that are automatically extracted from website URLs to classify them according to industry. To do so, the related dataset is first obtained from different kinds of websites by web scraping. Then, the dataset is cleaned and labeled with the relevant industries among 42 different categories. The pre-trained BERT model, which was trained on 101,000 advertisement texts in one of our previous ad-text classification studies, is used to classify the texts. Classification performance metrics are then used to evaluate the pre-trained BERT model on the test set, and 0.98 average accuracy and 0.67 average F1 score for 12 different categories are obtained. The method can be used to test the compatibility of texts to be used in online advertising networks with the advertiser's industry. In this way, the suitability of the texts within the industry, an important component in determining the quality of online advertising, can be tested automatically.
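The average F1 score quoted above is a macro average: per-class F1 scores computed from the test predictions and then averaged, so small classes weigh as much as large ones. A stdlib-only sketch of the computation (toy labels, not the paper's data):

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1: mean of per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
    return sum(f1s) / len(f1s)

score = macro_f1(["retail", "retail", "auto", "auto"],
                 ["retail", "auto", "auto", "auto"])
```

This also explains why a 0.98 average accuracy can coexist with a 0.67 average F1: rare classes with poor recall drag the macro F1 down while barely affecting accuracy.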

Detection of Negative Calls in Call Centers with Convolutional Neural Networks
Gazi Üniversitesi Bilişim Teknolojileri Dergisi
31 January 2023

In this study, we focus on the automatic evaluation of telephone conversations between call center employees and customers as positive or negative. The dataset used in the study includes telephone conversations between call center employees and customers in the company. It contains 10411 three-second call center records; 5408 of them are positive records and 5003 of them are negative records that include arguments, anger, and insults. To obtain meaningful features for emotion recognition from voice records, MFCC features were extracted from each record. The proposed CNN architecture is trained with MFCC features to classify call center records as positive or negative. The proposed CNN model showed 86.1% training accuracy and 77.3% validation accuracy, and it achieved 69.4% classification accuracy on the test data. This study aims to increase customer satisfaction through automatic analysis of conversations in call centers and notification of quality managers about negative records.
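Since the dataset consists of fixed three-second records, a longer call must first be split into such windows before MFCC extraction. A hypothetical helper sketching that windowing step (the 16 kHz sample rate is an assumption, not stated in the abstract):

```python
def split_into_windows(samples, sample_rate=16000, window_seconds=3):
    """Split a raw sample list into consecutive fixed-length windows.

    Only complete windows are kept; a trailing partial window is discarded.
    """
    win = sample_rate * window_seconds
    return [samples[i:i + win] for i in range(0, len(samples) - win + 1, win)]

call = list(range(16000 * 10))       # a 10-second call at an assumed 16 kHz
windows = split_into_windows(call)   # three complete 3-second windows
```

Each window would then be converted to MFCC features and fed to the CNN independently, so one call yields several positive/negative decisions.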

Training Data Generation for U-Net Based MRI Image Segmentation using Level-Set Methods
Journal of Mathematical Sciences and Modelling
02 December 2022

Image segmentation has been a well-addressed problem in pattern recognition for the last few decades. As a sub-problem of image segmentation, the background separation in biomedical images generated by magnetic resonance imaging (MRI) has also been of interest in the applied mathematics literature. The level set evolution of active contours idea can successfully be applied to MRI images to extract the region of interest (ROI) as a crucial preprocessing step for medical image analysis. In this study, we use the classical level set solution to create binary masks of various brain MRI images in which black implies background and white implies the ROI. We further used the MRI image and mask image pairs to train a deep neural network (DNN) architecture called U-Net, which has been proven to be a successful model for biomedical image segmentation. Our experiments have shown that a properly trained U-Net can match the performance of the level set method. Hence, we were able to successfully train a U-Net using automatically generated input and label data. The trained network can detect the ROI in MRI images faster than the level-set method and can be used as a preprocessing tool for more enhanced medical image analysis studies.
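The binary masks described above come from thresholding the sign of the level set function: points where the function is non-negative lie inside the evolved contour (ROI, white), the rest is background (black). A toy sketch with a hand-written circular level set, standing in for the evolved contour of the actual method:

```python
import math

def binary_mask(width, height, cx, cy, r):
    """Threshold a circular level set phi(x, y) = r - dist((x, y), (cx, cy)).

    phi >= 0 inside the contour -> 1 (ROI), phi < 0 outside -> 0 (background).
    """
    return [[1 if r - math.hypot(x - cx, y - cy) >= 0 else 0
             for x in range(width)] for y in range(height)]

mask = binary_mask(9, 9, 4, 4, 3)  # a 9x9 mask with a white disc as the ROI
```

In the study the contour is evolved on real MRI data rather than fixed by hand, but the resulting mask pairs are used exactly like this: image as U-Net input, mask as the label.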

A novel semi-supervised framework for call center agent malpractice detection via neural feature learning
Expert Systems with Applications
01 December 2022

Call center agents are the first point of contact for customers and are responsible for providing customer service. However, some agents may not be able to provide the required service due to various reasons. In this study, we propose a novel semi-supervised framework for call center agent malpractice detection. The proposed framework consists of two main steps. In the first step, we use a deep neural network to learn the features of the call center agent's speech. In the second step, we use a semi-supervised learning method to detect the malpractice of the call center agent. We evaluate the proposed framework on a real-world dataset and show that the proposed framework outperforms the state-of-the-art methods.

Performance Trade-Off for Bert Based Multi-Domain Multilingual Chatbot Architectures
Journal of Artificial Intelligence and Data Science
30 December 2021

Text classification is a natural language processing (NLP) problem that aims to classify previously unseen texts. In this study, the Bidirectional Encoder Representations from Transformers (BERT) architecture is preferred for text classification. The classification is aimed explicitly at a chatbot that can give automated responses to website visitors' queries. BERT is trained to reduce the need for RAM and storage by replacing multiple separate models for different chatbots on a server with a single model. Moreover, since a pre-trained multilingual BERT model is preferred, the system further reduces resource needs while handling multiple chatbots in multiple languages simultaneously. The model mainly determines a class for a given input text. The classes correspond to specific answers from a database, and the bot selects an answer and replies. For multiple chatbots, a special masking operation is performed to select a response from within the corresponding answer bank of a chatbot. We tested the proposed model on 13 simultaneous classification problems over a dataset of three different languages, Turkish, English, and German, with 333 classes. We reported the accuracies for individually trained models and the proposed model, together with the savings in system resources.
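The masking operation described above can be pictured as follows: one shared classifier head covers every bot's classes, and at inference time the logits of classes belonging to other bots are set to negative infinity before selecting an answer. A stdlib-only sketch with a hypothetical class-index layout (the real model's head and indexing are not specified in the abstract):

```python
import math

def masked_answer(logits, allowed_classes):
    """Pick the most likely class among those owned by the current bot."""
    # Mask out classes that belong to other chatbots' answer banks.
    masked = [x if i in allowed_classes else float("-inf")
              for i, x in enumerate(logits)]
    # Numerically stable softmax over the masked logits.
    peak = max(masked)
    exp = [math.exp(x - peak) for x in masked]
    total = sum(exp)
    probs = [e / total for e in exp]
    return max(allowed_classes, key=lambda i: probs[i])

logits = [2.0, 5.0, 1.0, 4.0]            # shared head over 4 classes
print(masked_answer(logits, {0, 2, 3}))  # class 1 belongs to another bot -> 3
```

The design choice is that the largest raw logit (class 1 here) is irrelevant if it is outside the current bot's answer bank; masking keeps one multilingual model serving many bots without cross-bot answer leakage.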

Deep Feature Generation for Author Identification
Celal Bayar University Journal of Science
28 June 2021

Identifying the authors of a given set of texts is a well-studied and complicated task. It requires thorough knowledge of different authors' writing styles and the ability to discriminate between them. As the main contribution of this paper, we propose to perform this task using machine learning and deep learning methods, state-of-the-art algorithms used in numerous complex natural language processing (NLP) problems. We used a text corpus of daily newspaper columns written by thirty authors for our experiments. The experimental results proved that document embeddings trained via a neural network architecture achieve cutting-edge accuracy in learning writing styles and identifying the authors of given writings, even though the dataset has a considerably unbalanced distribution. We present our experimental results and open-source our code for interested readers and NLP enthusiasts as a GitHub repository, so they can reproduce and confirm the results and modify them according to their own needs.

Case studies on using natural language processing techniques in customer relationship management software
Journal of Intelligent Information Systems
17 September 2020

How can we use a text corpus stored in a customer relationship management (CRM) database for data mining and segmentation? To answer this question, we adopted state-of-the-art methods commonly used in the natural language processing (NLP) literature, such as word embeddings, and in the deep learning literature, such as recurrent neural networks (RNNs). We used the text notes from a CRM system taken by customer representatives of an internet ads consultancy agency between 2009 and 2020. We trained word embeddings on the corresponding text corpus and showed that these word embeddings could be used directly for data mining and in RNN architectures (deep learning frameworks built with long short-term memory (LSTM) units) for more comprehensive segmentation objectives. The obtained results prove that we can mine valuable information from the structured text data populated in a CRM. Hence, any CRM can be equipped with useful NLP features once the problem definitions are correctly built and the solution methods conveniently implemented.
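Using word embeddings "directly for data mining", as described above, typically starts from vector similarity: notes or terms whose embedding vectors point in similar directions are grouped together. A minimal cosine-similarity sketch with made-up 3-dimensional vectors (real embeddings from the study would have hundreds of dimensions):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    # math.hypot with multiple arguments (Python 3.8+) gives the L2 norm.
    return dot / (math.hypot(*u) * math.hypot(*v))

fatura = [0.9, 0.1, 0.2]   # hypothetical embedding for "fatura" (invoice)
odeme  = [0.8, 0.2, 0.1]   # hypothetical embedding for "ödeme" (payment)
tatil  = [0.1, 0.9, 0.3]   # hypothetical embedding for "tatil" (holiday)
assert cosine(fatura, odeme) > cosine(fatura, tatil)
```

Segmentation can then build on such similarities, e.g. clustering customers whose CRM notes have nearby average embeddings, before the heavier LSTM-based models are brought in.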

Reconstruction of geometrical and reflection properties of surfaces by using structured light imaging techniques
Turkish Journal of Electrical Engineering & Computer Sciences
01 January 2018

When a robust and dense surface reconstruction is aimed, structured light imaging techniques are usually much appreciated. In this paper we propose a method to reconstruct both geometrical and reflective properties of surfaces by using structured light imaging. We use a technique where a camera and a projector are both treated as viewing devices. They are calibrated in the same manner. Each visible point can be correctly located on both image planes without solving a correspondence problem; hence, a dense reconstruction can be obtained. Since both the camera and the projector are explicitly calibrated, lighting and viewing directions can be identified for each surface point. It is also possible to measure reflected radiance by using high dynamic range (HDR) images for each surface point. The lighting and viewing directions that are known after calibration are combined with the reflected radiance and the incoming irradiance measurements to determine the bidirectional reflectance distribution function (BRDF) values of the material at the reconstructed surface points. We illustrate the reconstruction of surface reflection properties of sample surfaces by fitting the Phong BRDF model to the BRDF measurements.

[Ph.D. Thesis] Joint reconstruction of surface geometry and reflection properties by using image based methods
Izmir Institute of Technology (Iztech)
01 July 2015

In this thesis, we aim to capture realistic geometrical descriptions of real-world scenes and objects with a special effort to characterize reflection properties. After a brief review of the stereo imaging literature, we show our contributions to enhancing stereo matching performance by identifying and eliminating specular surface reflections. The identification of specular reflection can be done both passively and actively. We use dichromatic-based methods to identify and eliminate specular reflections passively, and we utilize polarization imaging methods to do the same job actively. In this work we also study structured-light-based methods, which can give better reconstruction results than stereo imaging methods. We propose three laser scanners equipped with a pair of line lasers and a method to calibrate these systems. Another convenient way to obtain good surface reconstruction results using structured light is to use projectors as light sources that can project complicated patterns. We show our results from a digital camera-projector-based scanning system as well. This system can robustly generate a very dense reconstruction of surfaces. We also use the projector-based scanning system to determine surface reflection properties. Using high dynamic range imaging (HDRI) techniques makes it possible for us to estimate scene radiance values. Since we can determine the incoming and outgoing light directions, we are able to measure bidirectional reflectance distribution function (BRDF) values from reconstructed surface points for corresponding directions. If the sample surface has not only diffuse reflection components but also a sufficient amount of specular highlights, it is possible to approximate the BRDF corresponding to the surface by fitting an analytical BRDF model to the measured data. In our work, we preferred the Phong BRDF model.
Finally, we present results with rendered synthetic images where the parameter values of the Phong model were estimated using scans of real objects.

Calibration of double stripe 3D laser scanner systems using planarity and orthogonality constraints
Digital Signal Processing
01 January 2014

In this study, 3D scanning systems that utilize a pair of laser stripes are studied. Three types of scanning systems are implemented to scan environments, rough surfaces of near-planar objects, and small 3D objects. These scanners make use of double laser stripes to minimize the undesired effect of occlusions. Calibration of these scanning systems is crucially important for the alignment of 3D points which are reconstructed from different stripes. In this paper, the main focus is on the calibration problem, following a treatment on the pre-processing of stripe projections using dynamic programming and the localization of 2D image points with sub-pixel accuracy. The 3D points corresponding to the laser stripes are used in an optimization procedure that imposes geometrical constraints such as coplanarity and orthogonality. It is shown that the calibration procedure proposed here significantly improves the alignment of 3D points scanned using two laser stripes.

Enhancing stereo matching performance by colour normalisation and specularity removal
IET Electronics Letters
29 September 2011

A method to enhance the performance of stereo matching is presented. The position of the specular light reflection on an object surface varies due to the change in the position of the camera, light source, object or all combined. Additionally, there may be situations exhibiting a colour shift owing to a change in the light source chromaticity or camera white balance settings. These variations cause misleading results when stereo matching algorithms are applied. In this reported work, a single-image-based statistical method is used to normalise source images. This process effectively eliminates non-saturated specularities regardless of their positions on the object. The effect of specularity removal is tested on stereo image pairs.

[M.Sc. Thesis] A case study on logging visual activities: Chess game
Izmir Institute of Technology (Iztech)

Automatically recognizing and analyzing visual activities in complex environments is a challenging and open-ended problem. In this thesis, this problem domain is visited in a chess game scenario where the rules, actions, and environment are well defined. The purpose is to detect and observe a FIDE (Fédération Internationale des Échecs) compatible chess board and to generate a log file of the moves made by human players. A series of basic image processing operations is applied to perform the desired task. The first step of automatically detecting the chess board is followed by locating the positions of the pieces. After the initial setup is established, every move made by a player is automatically detected and verified. A PC-CAM connected to a PC is used to acquire images, and the Intel Open Source Computer Vision Library (OpenCV) is used in the software implementation.


Musical Instruments: Fretless Classical Guitar, Classical Guitar, Kopuz, Oğur Sazı